
Bandwidth usage way too high for limited data plan #9081

Open
oskarth opened this issue Oct 1, 2019 · 26 comments

Comments

@oskarth oskarth commented Oct 1, 2019

Problem

As a user with a limited data plan, I want the bandwidth usage to be substantially lower, so that I can use Status on 3G/4G with a limited data plan.

Details

App version: 0.13.1 (2019080817)
OS: iOS
Node version: v.0.25.0-beta.0
Mailserver: mail-03.gc-us-central1-a.eth.beta

On iOS, the current period under Cellular Data shows the following for similar apps:

  • Status 14.8 GB
  • Telegram 1.8 GB
  • Line 884 MB
  • Fastmail 210 MB
  • Signal 203 MB

Note that all apps other than Status have attachments and images in them.

To calibrate for usage, here are the corresponding numbers from Screen Time for the last 7 days:

  • Telegram 2h 29m
  • Status 1h 18m
  • Line 1h 8m
  • Fastmail 27m
  • Signal 11m

Note that in Line and Signal I'm not in any public channels, but in Telegram I'm in several that are a lot more noisy than Status.

Compared to Telegram, Line and Signal, this means we currently consume 10-20x more bandwidth, without attachments. As a user, this is an unacceptable experience.

Implementation

As a somewhat representative user, albeit one with a limited data plan, I care more about this than about cover traffic/metadata protection.

Acceptance Criteria

Bandwidth usage reduced 10-20x so that it is within a factor of three of comparable apps, like Telegram, Line and Signal.

Notes

In light of the current financial situation, timeline, and growing the core app user base, it might be the case that we partition the problem into two:

a) Continue long-term 'fundamental' research in conjunction with other projects to develop a better alternative (Block.Science/Swarm/Nym/libp2p)
b) A band-aid to help with adoption and traction of Status the app, without as strong metadata/decentralization guarantees, a la Infura-for-chat (basically what we already have with the mailserver)

Side note: I also searched for 'bandwidth' in open issues and couldn't find a relevant one, which is a bit surprising given that it's a very common user complaint, anecdotally. User feedback not making its way into concrete problem descriptions? cc @rachelhamlin @hesterbruikman

Future Steps

Replace Whisper.

@cammellos cammellos commented Oct 1, 2019

@oskarth Which version of the app are you running? Could you please add it to the description?

@oskarth oskarth commented Oct 1, 2019

@cammellos done

@rachelhamlin rachelhamlin commented Oct 1, 2019

Side note: I also searched for 'bandwidth' in open issues and couldn't find a relevant one, which is a bit surprising given that it's a very common user complaint, anecdotally.

It's actually on my mind, but not captured, you're right @oskarth. To justify that slightly, we haven't had mental bandwidth to focus on much outside of SNT utility, multi-account, keycard and bugs this year—until now. So bandwidth will be a topic during our Oct 15 planning session (discuss post TK today).

User feedback not making its way into concrete problem descriptions?

The issue of capturing user feedback is something that I very much hope to prioritize now that @andremedeiros is coming onboard to help with the dev process.

b) A band-aid to help with adoption and traction of Status the app, without as strong metadata/decentralization guarantees, a la Infura-for-chat (basically what we already have with the mailserver)

What kind of sacrifice are we willing to make here? Let's discuss in janitors.

@oskarth oskarth commented Oct 1, 2019

It's actually on my mind, but not captured, you're right @oskarth. To justify that slightly, we haven't had mental bandwidth to focus on much outside of SNT utility, multi-account, keycard and bugs this year—until now. So bandwidth will be a topic during our Oct 15 planning session (discuss post TK today).

Yeah, that's fair. I think it's a larger systemic issue though, as users' feedback doesn't make its way into GitHub issues. Perhaps because it is too intimidating? Or they give feedback in other forums and then there's a lack of follow-up? Something regarding the community link is missing here, not quite sure what. cc @jonathanbarker @j-zerah FYI.

What kind of sacrifice are we willing to make here? Let's discuss in janitors.

I'd like this to be an open discussion, but we can bring it up there as well.

@cammellos cammellos commented Oct 1, 2019

Also worth noting: the version used for the bandwidth comparison is still listening to the old shared topic, which will be disabled for v1.
From the bandwidth tests https://docs.google.com/spreadsheets/d/13kffxZaPnvULoy5Qh5sZSI2551KusCLJWUcdhKyobkE/edit#gid=0 , that version is ~6 times more bandwidth-hungry than v1 (94 MB vs 15 MB), although the benchmarks are to be taken with a pinch of salt.
I'm currently working on having them automated, so we can better tune them and record them.

@andremedeiros andremedeiros commented Oct 1, 2019

I'm currently working on having them automated, so we can better tune them and record them.

Commendable effort, @cammellos! What does this involve and how hard would it be to get this to run as part of the test suite?

@hesterbruikman hesterbruikman commented Oct 1, 2019

A thought on the UI side:
We already implemented 'fetch messages' to allow for more user-controlled bandwidth use. We could expand this to channels (cc @errorists), after exploring other options to save bandwidth while retaining all functionality.

Regarding feedback, I will check in the #statusphere / ambassadors channel to see whether they recognize the issue. As reliability and notifications, which are crucial for on-the-go use, have developed a bad rep, it could also be that the majority of our user base is relying on wifi / at-home experimentation with Status. Just a theory.

@cammellos cammellos commented Oct 2, 2019

@andremedeiros We are only going to test status-protocol-go, as automating the testing of status-react is much harder (so we won't be testing mailserver interactions until that code is ported).

The strategy I am following is to have two clients (or more, but for now just two) interact with each other for a specified amount of time / number of messages. Both clients will be dockerized and run through docker-compose; at the end of the tests, metrics for each container can be collected with docker stats.

We can probably get it into the test suite (status-protocol-go) fairly easily; the only dependencies would be docker/compose and golang. It would take some more time to make it a red/green test (it's more of a benchmark). Also, we don't have isolated network conditions for now, so results depend on the overall traffic of the network, but we can take that into account when measuring.
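
For the collection step, something like this minimal Go sketch is all that's needed on top of docker-compose; it assumes the two client containers are named "alice" and "bob" (illustrative names, not the actual compose service names):

```go
// Take a single snapshot of cumulative network I/O for the two benchmark
// containers after the run. Container names are placeholders.
package main

import (
	"fmt"
	"os/exec"
)

func main() {
	out, err := exec.Command(
		"docker", "stats", "--no-stream",
		"--format", "{{.Name}}: {{.NetIO}}",
		"alice", "bob",
	).Output()
	if err != nil {
		fmt.Println("docker stats failed:", err)
		return
	}
	fmt.Print(string(out))
}
```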

@andremedeiros andremedeiros commented Oct 2, 2019

That makes perfect sense, @cammellos. Thank you.

@jonathanbarker jonathanbarker commented Oct 3, 2019

Or they give feedback in other forums and then there's a lack of follow-up? Something regarding the community link is missing here, not quite sure what. cc @jonathanbarker @j-zerah FYI.

Re: capturing user/community feedback - have we considered an "engagement survey" type mechanism for our main community users? Similar to the one for core contributors, but focused on the feedback they have on Status products, features, etc.

@hesterbruikman hesterbruikman commented Oct 3, 2019

We have had those in the past and it surely is time to bring them back! It was always a bit of a one-off, never a solid mechanism that balances effort and output well.

Regarding bandwidth, a quick poll in #statusphere brought no alarming responses from 3 active community members/contributors, all estimating roughly 1 GB per month going to Status. Not to say that it's not a problem :)

@corpetty corpetty commented Oct 7, 2019

one plan: have a separate type of public chat that isn't as private (with respect to traffic analysis). You could just have it as an option and have some UI element that notes what kind of public channel it is.

This could also be the default, and then we can set up relays that allow people to communicate in these at lower bandwidth cost (similar to @jakubgs's bridge).

@cammellos cammellos commented Oct 8, 2019

Here's a rough tool to check for bandwidth:
https://github.com/status-im/status-protocol-go-bandwidth-test

one plan: have a separate type of public chat that isn't as private (with respect to traffic analysis). You could just have it as an option and have some UI element that notes what kind of public channel it is.

I don't believe the issue is due to public chats (the previous bandwidth tests showed high usage even without joining any public chat); it's mainly due to the discovery topics, as you receive messages that were not sent to you. In public chats the chance of that is lower (there's a chance that your bloom filter matches some other topic, but it's probably not huge). So unless we completely bypass Whisper I'm not sure we can optimize those much, but it's worth having a look.

@corpetty corpetty commented Oct 8, 2019

Have we ever tried playing with dynamic tuning of the bloom filters based on user preference? It's basically a sliding scale of how much you poll for, based on how much information you want to give the server you're asking. If a user doesn't care about that, they can at least minimize the amount of "extra stuff" they're getting.
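
To illustrate the idea, here is a rough, self-contained Go sketch of such a sliding scale: a Whisper-style 512-bit bloom filter (3 bits per 4-byte topic) widened with decoy topics, trading bandwidth for darkness. The bit-mapping below is illustrative, not the actual Whisper code:

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

const bloomLen = 64 // 512 bits, as in Whisper v6

// addTopic sets three pseudo-random bits derived from the 4-byte topic.
func addTopic(bloom []byte, topic [4]byte) {
	h := sha256.Sum256(topic[:])
	for i := 0; i < 3; i++ {
		bit := binary.BigEndian.Uint16(h[i*2:]) % 512
		bloom[bit/8] |= 1 << (bit % 8)
	}
}

// buildBloom adds the real topics plus `decoys` random topics: more decoys
// means more darkness, but also more unwanted traffic matching the filter.
func buildBloom(topics [][4]byte, decoys int) []byte {
	bloom := make([]byte, bloomLen)
	for _, t := range topics {
		addTopic(bloom, t)
	}
	for i := 0; i < decoys; i++ {
		var t [4]byte
		rand.Read(t[:])
		addTopic(bloom, t)
	}
	return bloom
}

func main() {
	mine := [][4]byte{{0x01, 0x02, 0x03, 0x04}}
	fmt.Printf("tight filter:  %x\n", buildBloom(mine, 0))
	fmt.Printf("darker filter: %x\n", buildBloom(mine, 8))
}
```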

@cammellos cammellos commented Oct 8, 2019

That's a fairly big change; it means we would be fundamentally changing how Whisper works. Say you don't even use a bloom filter and just pass a list of topics (no darkness, but the best bandwidth): you still have an issue with the shared topic (currently each user is assigned to a random bucket based on their public key, with n = 5000 buckets).

We also have a personal topic that can be used instead of the partitioned one, which is the user's public key, but at that point any darkness is gone, so it makes little sense to use Whisper.
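
For context, a minimal sketch of what that bucketing looks like; the naming and hashing below are assumptions for illustration, not the actual status-go derivation:

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"math/big"
)

const nPartitions = 5000

// partitionedTopic maps a public key to one of 5000 buckets; everyone in the
// same bucket shares a 4-byte Whisper topic, so a listener only learns the
// bucket, not the exact recipient.
func partitionedTopic(pubKey []byte) [4]byte {
	bucket := new(big.Int).Mod(new(big.Int).SetBytes(pubKey), big.NewInt(nPartitions))
	name := fmt.Sprintf("contact-discovery-%d", bucket) // hypothetical topic name
	sum := sha256.Sum256([]byte(name))                  // illustrative hash choice
	var topic [4]byte
	copy(topic[:], sum[:4]) // Whisper topics are 4 bytes
	return topic
}

func main() {
	pk := []byte{0x04, 0xde, 0xad, 0xbe, 0xef} // toy public key bytes
	fmt.Printf("bucket topic: 0x%x\n", partitionedTopic(pk))
}
```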

I think we need to understand a bit better where the consumption is coming from. Is it coming from extra messages that you don't care about? Is it coming from the fact that you receive multiple copies of each message? Or is it just Whisper overhead? etc.

Once we understand the dynamics better we can see what we can do and where it's best to optimize, imo.

@corpetty corpetty commented Oct 8, 2019

Thoroughly agreed.

@adambabik adambabik commented Oct 10, 2019

I think we need to understand a bit better where the consumption is coming from. Is it coming from extra messages that you don't care about? Is it coming from the fact that you receive multiple copies of each message? Or is it just Whisper overhead? etc.

We have quite granular Whisper metrics that can answer most of these questions: https://github.com/status-im/whisper/blob/master/whisperv6/metrics.go. For example, we have envelopeAddedCounter and envelopeNewAddedCounter, whose difference tells us how many duplicates we receive, and envelopeErrNoBloomMatchCounter, which tells us how many messages did not match the bloom filter.

What we would need to do is expose them in the app, because as far as I know they are currently used only by statusd running on our servers.
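
A minimal sketch of the duplicate calculation, assuming the counters are registered in the default go-ethereum metrics registry; the metric names used here are assumptions, not necessarily the names in metrics.go:

```go
package main

import (
	"fmt"

	"github.com/ethereum/go-ethereum/metrics"
)

// counterValue looks up a counter by name, returning 0 if it isn't registered.
func counterValue(name string) int64 {
	if c, ok := metrics.DefaultRegistry.Get(name).(metrics.Counter); ok {
		return c.Count()
	}
	return 0
}

func main() {
	added := counterValue("whisper/envelopeAdded")       // hypothetical metric name
	newAdded := counterValue("whisper/envelopeNewAdded") // hypothetical metric name
	fmt.Printf("duplicate envelopes received: %d\n", added-newAdded)
}
```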

Many open source projects, like Firefox, collect stats and send them to centralized servers only if the user agrees. Maybe we can have a similar strategy? It should be opt-in, of course.

@jakubgs jakubgs commented Oct 10, 2019

We had a short meeting today about the bandwidth testing and I've noted down some things:

Knowledge

  • The status-protocol-go-bandwidth-test by Andrea can already do some basic tests
    • Runs separate processes in Docker
    • Can control duration, number of messages, number of peers
  • What we want is a way to run these kinds of tests and generate reports
  • We want the ability to compare those results across time to look for improvements or regressions
  • We are not including traffic to and from mailservers because that part is Clojure-only for now
  • Making traffic realistic could be nice, but the real thing we care about is bandwidth used

Tasks

  • Add more granular metrics for status-protocol-go and whisper to measure:
    • Message drop rate & delivery success
    • Noise from mis-deliveries
    • Ratios of 1-to-1, public, and private group messages
  • Find a reporting format, preferably fed to Prometheus
    • This will require lower cardinality, i.e. aggregating message metrics (see the sketch at the end of this comment)
  • Run tests for various volumes to check complexity
  • Run tests periodically to measure improvements/regressions

I will start working on those probably next week, as I have to finish some other stuff.
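
As a rough sketch of the low-cardinality aggregation mentioned above, one counter per chat kind (rather than per message or per topic) keeps the series count tiny whatever backend we pick; metric names are illustrative:

```go
package main

import (
	"github.com/prometheus/client_golang/prometheus"
)

var (
	messagesReceived = prometheus.NewCounterVec(
		prometheus.CounterOpts{
			Name: "status_messages_received_total",
			Help: "Messages received, aggregated by chat kind.",
		},
		[]string{"kind"}, // "one_to_one", "public", "private_group"
	)
	bytesReceived = prometheus.NewCounterVec(
		prometheus.CounterOpts{
			Name: "status_message_bytes_received_total",
			Help: "Payload bytes received, aggregated by chat kind.",
		},
		[]string{"kind"},
	)
)

func main() {
	prometheus.MustRegister(messagesReceived, bytesReceived)

	// Record one public-chat message of 420 bytes; in the benchmark this
	// would be called from the message-handling path.
	messagesReceived.WithLabelValues("public").Inc()
	bytesReceived.WithLabelValues("public").Add(420)
}
```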

@adambabik adambabik commented Oct 10, 2019

Find a reporting format, preferably fed to Prometheus

I'm not sure I would recommend pull-based tools for load testing, unless these load tests will be fairly long. Also, because Prometheus scrapes data periodically, it can miss some fluctuations which might be interesting for us. Maybe write to InfluxDB? Having all the data points can also be an advantage.
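
If we go that way, writing a point per received envelope is straightforward with the v1 client library; a minimal sketch with made-up database/measurement names:

```go
package main

import (
	"log"
	"time"

	client "github.com/influxdata/influxdb1-client/v2"
)

func main() {
	c, err := client.NewHTTPClient(client.HTTPConfig{Addr: "http://localhost:8086"})
	if err != nil {
		log.Fatal(err)
	}
	defer c.Close()

	bp, err := client.NewBatchPoints(client.BatchPointsConfig{
		Database:  "bandwidth_benchmarks", // hypothetical database
		Precision: "ns",
	})
	if err != nil {
		log.Fatal(err)
	}

	pt, err := client.NewPoint(
		"envelope_received", // hypothetical measurement
		map[string]string{"peer": "alice"},
		map[string]interface{}{"size_bytes": 512},
		time.Now(),
	)
	if err != nil {
		log.Fatal(err)
	}
	bp.AddPoint(pt)

	if err := c.Write(bp); err != nil {
		log.Fatal(err)
	}
}
```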

@jakubgs jakubgs commented Oct 10, 2019

I did consider InfluxDB too; we can see what works better. I'd agree that a push rather than a pull scheme would work better for benchmarks.

@oskarth oskarth commented Oct 18, 2019

Discuss post: https://discuss.status.im/t/fixing-whisper-for-great-profit/1419
Theoretical model numbers: https://htmlpreview.github.io/?https://github.com/vacp2p/research/blob/master/whisper_scalability/report.html
Waku mode draft: status-im/specs#54

@jakubgs any luck with the above?

Also, it'd be great if we could figure out where other traffic might be coming from, i.e. things that aren't captured by the above model. For example, I remember some benchmark saying we spend 20% of traffic on Infura, which seems insane but makes sense given the lack of transaction indexing (?). This means it might become the bottleneck with Waku mode in place, which would hint at attacking the indexing problem, e.g. with something algorithmic like @yenda was working on, or indexing a la The Graph as @bgits suggested.

@yenda yenda commented Oct 18, 2019

@oskarth On a new account there would only be a handful of calls to Infura, afaik. The heavy stuff only happens when there are transactions to recover.

@jakubgs jakubgs commented Oct 24, 2019

Here's an update on the current state of my work on this:

Metrics

Add more granular metrics for status-protocol-go and whisper to measure

Currently I'm not sure how to fix the version issue; it should be fixed in status-go, but to figure out how to do that correctly I'll have to talk to Adam.

Storage

Find a reporting format, preferably fed to Prometheus

  • status-im/infra-ci#10 - Researching use of InfluxDB for benchmarking metrics
    • If we use it we'd need some kind of abstraction layer above both InfluxDB and Prometheus

I also looked at Pushgateway for Prometheus as an alternative, but that is still dependent on the Prometheus pull rate/interval and would not represent the real-time creation of the metrics generated by the benchmark.

Orchestration

Run tests for various volumes to check complexity
Run tests periodically to measure improvements/regressions

After investigating the status-protocol-go-bandwidth-test package by Andrea, I don't think there's anything wrong with his simple approach of just spawning the processes with his run.sh script. Though it might be a bit nicer if we used something like Supervisord with the numprocs setting, or systemd with instantiated services, to orchestrate multiple processes in a more manageable way.

@jakubgs jakubgs commented Nov 6, 2019

According to Adam, the best way to collect these metrics would be to subscribe to the envelopeFeed:
https://github.com/status-im/whisper/blob/39d4d0a14f/whisperv6/whisper.go#L178-L182
and listen for the EventEnvelopeReceived event:
https://github.com/status-im/whisper/blob/39d4d0a14f/whisperv6/events.go#L19
This would allow me to collect envelope metrics (size, counts) in InfluxDB without having to modify the whisper repo itself.
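
Roughly, the collection loop could look like the sketch below; field names on EnvelopeEvent are taken from the linked code but may differ slightly, and the InfluxDB write is left out:

```go
package main

import (
	"fmt"

	whisper "github.com/status-im/whisper/whisperv6"
)

// trackEnvelopes counts received envelopes per peer; the real version would
// also record sizes/topics and flush the numbers to InfluxDB.
func trackEnvelopes(w *whisper.Whisper) {
	events := make(chan whisper.EnvelopeEvent, 100)
	sub := w.SubscribeEnvelopeEvents(events)
	defer sub.Unsubscribe()

	perPeer := make(map[string]int)
	for ev := range events {
		if ev.Event != whisper.EventEnvelopeReceived {
			continue
		}
		perPeer[ev.Peer.String()]++
		fmt.Printf("envelope %s from %s (total from peer: %d)\n",
			ev.Hash.Hex(), ev.Peer.String(), perPeer[ev.Peer.String()])
	}
}

func main() {
	w := whisper.New(&whisper.DefaultConfig) // standalone node, just for illustration
	trackEnvelopes(w)
}
```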

@jakubgs jakubgs commented Nov 19, 2019

I've added a Topic attribute to Envelope in status-im/whisper#38.
