
refactor: improve peers performance #1262

Merged (19 commits) on Oct 24, 2019

Conversation

@hacdias (Member) commented Oct 20, 2019

This refactors how peer locations are fetched/updated: new locations are now fetched every one second (originally three), so any changes will only be seen after that interval. This dramatically reduced the CPU usage on my end and still keeps the functionality.

As stated by @olizilla in #1072:

We do a lot of work in peer-locations.js to expose the internal logic of the queuing of geoip look-ups as state in redux. We are not visualising the queue, nor do we plan to, so I suspect the code could be written more simply. We could extend the work in #1071 to encapsulate the geoip look-up queue and caching in a class, and simply have it periodically update the redux state with batches of locations as they are found, rather than have all the queuing logic managed in redux.
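(For readers who want the gist of that approach, here is a minimal sketch of the batching idea, not the actual peer-locations.js code; the class, field, and callback names are illustrative.)

  // Sketch: encapsulate the geoip look-up queue in a class and let a periodic
  // reactor pull batches of resolved locations into redux state.
  class LocationBatcher {
    constructor (lookup) {
      this.lookup = lookup      // async fn: address -> location details
      this.locations = {}       // peerId -> resolved location
      this.pending = new Set()  // peerIds with a look-up already in flight
    }

    // Called on an interval (e.g. every second). Starts look-ups for unknown
    // peers and returns everything resolved so far as one batch.
    async findLocations (peers) {
      for (const { peer, addr } of peers) {
        if (this.locations[peer] || this.pending.has(peer)) continue
        this.pending.add(peer)
        this.lookup(addr)
          .then(loc => { this.locations[peer] = loc })
          .catch(() => {})                        // failed look-ups handled separately
          .finally(() => this.pending.delete(peer))
      }
      return { ...this.locations }
    }
  }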

@lidel @autonome, can you take a look at this one, please?


closes #1273

@hacdias (Member Author) commented Oct 20, 2019

I still need to fix the tests. However, I don't think that'll be an easy task, taking into account how much I've changed. Perhaps I'll focus my tests on the PeerLocationResolver class.

@olizilla (Member) left a comment

Tested it and works well!

Some comments inline to make things a little more comprehensible for the reader.

As a stretch goal, it'd be lovely to animate the addition of nodes to the map separately from discovering their location, so that we could show nodes continually popping in and fading out of the map all the time, rather than flashing changes on 3s intervals.

Three inline review comments on src/bundles/peer-locations.js (outdated, resolved).
  @@ -300,44 +148,65 @@ class PeerLocationResolver
      ...opts.cache
    })

    this.peersToAddrs = {}
    this.failedAddrs = []
Member commented:

failedAddrs and peersToAddrs are both unbounded caches... if we need them consider using an LRU cache with a max size.

Member Author replied:

I'm just storing them to avoid calling the same lookup over and over after it has failed. I've changed the max size to 500; when it gets full, we delete the first 100 entries. I don't think it's worth implementing a full LRU cache, but this follows the same principle.
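(A minimal sketch of that bounded-cache idea, using the numbers mentioned above; the helper names are illustrative, not the actual implementation.)

  // Sketch: cap the failed-address list at 500 entries and drop the oldest
  // 100 when it fills up, approximating LRU eviction without a cache library.
  const MAX_FAILED = 500
  const EVICT_COUNT = 100

  function rememberFailure (failedAddrs, addr) {
    if (failedAddrs.length >= MAX_FAILED) {
      failedAddrs.splice(0, EVICT_COUNT)  // forget the 100 oldest failures
    }
    failedAddrs.push(addr)
  }

  function hasFailedBefore (failedAddrs, addr) {
    return failedAddrs.includes(addr)
  }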

@hacdias (Member Author) commented Oct 21, 2019

@olizilla, I just made a few updates to fix and improve the things you mentioned. I'll see what I can do about the animations in a different PR.

@autonome commented:

This is awesome, thanks so much for digging further into the root cause of the resource consumption.

Is there a specific reason why three seconds is the default?

@hacdias (Member Author) commented Oct 21, 2019

uh, I said 3? I set it to 1. Not a specific reason, but it's much smoother now...

@autonome commented:

It was originally three and you changed it to one? I was thinking the other way... can we go to 5s or higher? ;) If we had a list of specific reasons why people need to know, it would be easier to reason about than picking arbitrary numbers.

But as long as it's not burning laptop battery, 1s is fine for now!

  const unique = new Set(allCoord.map(JSON.stringify))
  return Array.from(unique).map(JSON.parse)

  staleAfter: ms.seconds(1),
  retryAfter: ms.seconds(1),


It would be good to have the interval as a const at the top of the file, and documented.

Member Author replied:

Just added!
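(A sketch of what that could look like at the top of the file; the constant name is illustrative, and `ms` is assumed to be the same milliseconds helper used in the snippet above.)

  // Assumption: `ms` is the `milliseconds` helper already used for ms.seconds(1) above.
  import ms from 'milliseconds'

  // How often peer locations are refreshed and how long a cached entry is
  // considered fresh. Keeping this as one documented constant at the top of
  // the file makes the interval easy to find and tune.
  const UPDATE_EVERY = ms.seconds(1)

  // ...then, where the options are built:
  //   staleAfter: UPDATE_EVERY,
  //   retryAfter: UPDATE_EVERY,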

@hacdias (Member Author) commented Oct 22, 2019

Basically, the difference would only make us see the changes later and add requests to the queue later. I don't believe it influences the amount of rendering that much, but I'm not sure. Either way, it's a simple change.

@lidel (Member) commented Oct 22, 2019

IIUC we are fetching info for 10 locations at the same time:

  // Max number of locations to retrieve concurrently
  opts.concurrency = opts.concurrency || 10

@hacdias do you remember if we picked this number for any particular reason?
I'd like to decrease this to 4 to stay safely below current XHR/HTTP 1.1 concurrency limits in browsers (IIRC it's around 6 concurrent requests to the same host in Chromium).
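(For illustration, the proposed change amounts to lowering that default, leaving headroom under the browser's per-host connection limit.)

  // Max number of locations to retrieve concurrently.
  // Browsers allow roughly 6 concurrent HTTP/1.1 requests per host, so 4
  // leaves breathing room for other calls to the same API endpoint.
  opts.concurrency = opts.concurrency || 4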

@hacdias (Member Author) commented Oct 22, 2019

@lidel the default is 10, but we are fetching 20! It was changed in #1071.

@hacdias (Member Author) commented Oct 22, 2019

Green light to merge?

@lidel (Member) left a comment

@lidel the default is 10, but we are fetching 20! It was changed in #1071.

Hm... I don't think this is capable of doing 20 🙃

To be specific, the queue concurrency can be at 20, but fetch calls to the HTTP API are throttled to a maximum of 4-6 at a time by the browser itself, and fetch calls over that limit are put in a "pending" state by the browser until existing ones finish.

When you load the Peers screen for the first time, the geoip cache gets populated fast because it's fetched from localhost, but you can observe how the browser freezes requests into the "pending" state when network speed is artificially slowed down:

(screenshot)

I propose changing the queue concurrency to 4 as part of this PR: AFAIK there is no gain from anything higher than that, and by limiting it to 4 rather than 6 we leave some breathing room for requests other than these. It would also help with #882.

Plus two asks to simplify the code below.

Inline review comments on package.json and src/bundles/peer-locations.js (outdated, resolved).
@hacdias (Member Author) commented Oct 22, 2019

@lidel done!

@lidel (Member) left a comment

Thanks!

I noticed a potential regression of the problem described in #887.

In short, geolocation of peers should be limited to the Peers screen. Geoip data should be downloaded only if the user is on the Peers screen, but right now it gets fetched even on the Status screen (see the gating sketch after this comment):

incredible load when Web UI is opened

First load of the Status screen in this PR:

(screenshot)

First load of the Status screen in Web UI v2.5.7:

(screenshot)

I did not dig into this further, but we need to address this before merging.

P.S. I am testing the screen with ~4k peers; this may not be visible with a regular setup. FYI, my node's settings:

		"ConnMgr": {
			"GracePeriod": "3m",
			"HighWater": 5000,
			"LowWater": 4000,
			"Type": "basic"
		},
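(A rough sketch of the intended gating, independent of the actual bundle wiring; `getCurrentRoute`, `resolver`, and `updateState` are hypothetical stand-ins, not real ipfs-webui APIs.)

  // Sketch: only run geoip look-ups while the Peers screen is the active route,
  // so opening the Status screen does not trigger thousands of requests.
  function shouldFetchLocations (getCurrentRoute) {
    return getCurrentRoute().startsWith('/peers')
  }

  async function tick (getCurrentRoute, resolver, peers, updateState) {
    if (!shouldFetchLocations(getCurrentRoute)) return  // e.g. on the Status screen: do nothing
    const locations = await resolver.findLocations(peers)
    updateState(locations)
  }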

Inline review comment on src/bundles/peer-locations.js (outdated, resolved).
@hacdias (Member Author) commented Oct 23, 2019

Both of those are now fixed @lidel!

@lidel (Member) commented Oct 23, 2019

Hm, for some reason the map and location column are not updated now.
@hacdias, are you able to reproduce? (I got the same result in Firefox and Brave.)

@hacdias (Member Author) commented Oct 23, 2019

Can't reproduce on Firefox Dev Edition, nor on Chrome.

@hacdias (Member Author) commented Oct 23, 2019

@lidel any update on this?

@lidel (Member) left a comment

OK, I was trying to figure out what is going wrong around 4000 peers.
I added a small mitigation in ab83def (longer brain dump below, in case it's useful).
I need to look into other things, but can circle back to this if needed.
Apart from two inline questions, LGTM.

Notes on performance with >4000 peers

CPU is still high (when run with thousands of peers)

Until the geoip cache is populated, the CPU load remains high at all times (only on the Peers tab):

(screenshot)

Another thing is that it can take a long time between executions of findLocations: sometimes 4 minutes, sometimes 30 seconds. Is this a slowdown due to CPU usage, or the effect of some trigger logic? Are we running it only when the peer list changes?

findLocations pauses around geoipCache.get

In my case, at >4000 peers, the initial call to findLocations (empty geoipCache) in Brave took 165442ms to finish (over 2 minutes!). That is why I did not see any dots on the map. Those appear later, after subsequent findLocations calls return values from cache, so it took over 2 minutes for me to see anything during the initial load.

If I replace geoipCache with an in-memory Map, the time to process 4k peers in findLocations drops to under 4 seconds. It seems that reading and writing concurrently to the cache backed by idb-keyval causes a periodic slowdown around await this.geoipCache.get(ipv4Addr) every 20-30 peers.

Processing a single peer should take around 1ms, but every 20th-30th peer stalls for 15 to over 20 seconds. With more than a thousand peers, this compounds into a significant delay.

On the UX side, I mitigated the slowness of the initial update with thousands of peers in ab83def.
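(To illustrate the diagnosis above, a sketch of fronting the idb-keyval-backed cache with an in-memory Map so repeated reads skip IndexedDB; this only illustrates why the in-memory comparison was faster, it is not what the PR ended up doing.)

  // Sketch: memory-first read-through cache in front of the persistent one.
  class MemoryFirstCache {
    constructor (persistentCache) {
      this.persistent = persistentCache  // e.g. the idb-keyval backed geoipCache
      this.memory = new Map()
    }

    async get (key) {
      if (this.memory.has(key)) return this.memory.get(key)
      const value = await this.persistent.get(key)  // hits IndexedDB only on a miss
      if (value !== undefined) this.memory.set(key, value)
      return value
    }

    async set (key, value) {
      this.memory.set(key, value)
      await this.persistent.set(key, value)  // still persisted across sessions
    }
  }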

Inline review comment on src/bundles/peer-locations.js (resolved).
This change improves the performance of the initial load
of the Peers screen when there are thousands of peers.

During the first three times PeerLocationResolver.findLocations is called,
we sort peers by latency and resolve geoip only for the closest subset.

This ensures the top of the list in the UI gets updated fast, while
thousands of more distant peers get resolved later.

closes #1273
Co-Authored-By: Marcin Rataj <lidel@lidel.org>
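(A minimal sketch of that pass-based optimization; the constants and field names are illustrative rather than the actual code in ab83def.)

  // Sketch: during the first few passes, resolve geoip only for the peers with
  // the lowest latency, so the visible top of the list updates quickly.
  const WARMUP_PASSES = 3
  const CLOSEST_SUBSET = 100  // illustrative batch size

  function pickPeersForPass (peers, pass) {
    if (pass >= WARMUP_PASSES) return peers  // later passes resolve everyone
    // operate on a copy so the caller's peer list is not reordered
    const byLatency = [...peers].sort(
      (a, b) => (a.latency ?? Infinity) - (b.latency ?? Infinity)
    )
    return byLatency.slice(0, CLOSEST_SUBSET)
  }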
@lidel (Member) commented Oct 24, 2019

Oops, broke tests, will fix shortly

Updated tests in 275553b, lint fixed in b08f3b5

Operate on a copy of peers and check if this.pass exists
before executing pass-based optimizations
@lidel (Member) left a comment

OK, I think this is a valuable improvement already. LGTM from me.
CPU performance in scenarios with thousands of peers can be tackled in separate PRs.

Successfully merging this pull request may close: Peers: prioritize geoip lookup of N peers with lowest Latency

4 participants