Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf(discv5): populate kbuckets & improved RLPx peering #7683

Merged
merged 10 commits into from Apr 19, 2024

Conversation

emhane
Copy link
Member

@emhane emhane commented Apr 16, 2024

Improves bootstrapping by starting to fill kbuckets at furthest log2distance from local node id, as these are easier to fill.

@emhane emhane added C-perf A change motivated by improving speed, memory usage or disk footprint A-discv5 Related to discv5 discovery labels Apr 16, 2024
@emhane emhane requested a review from joshieDo April 16, 2024 16:23
@emhane emhane changed the title feat(discv5): perf(discv5): bootstrap Apr 16, 2024
@emhane
Copy link
Member Author

emhane commented Apr 17, 2024

closed in favour of #7695

@emhane emhane closed this Apr 17, 2024
@emhane emhane deleted the emhane/rev-lookup-tgt branch April 17, 2024 17:53
@emhane emhane restored the emhane/rev-lookup-tgt branch April 17, 2024 18:23
@emhane
Copy link
Member Author

emhane commented Apr 17, 2024

will still be useful to reverse this, to iteratively get closer to own node id

@emhane emhane reopened this Apr 17, 2024
Copy link
Member

@Rjected Rjected left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I understanding correctly that this code is supposed to get discv5 to return a specific distance internally since it uses self.local_key.distance(target)? But instead of doing it for all log2 up to 255, we do it for half?

@emhane
Copy link
Member Author

emhane commented Apr 18, 2024

Am I understanding correctly that this code is supposed to get discv5 to return a specific distance internally since it uses self.local_key.distance(target)? But instead of doing it for all log2 up to 255, we do it for half?

exactly. it's unlikely to find peers for the closest buckets, 0-127. I think we should eventually extend it to cover all buckets though, if not already do that now tbh.

@emhane emhane changed the title perf(discv5): bootstrap perf(discv5): populate kbuckets Apr 18, 2024
@emhane emhane requested a review from Rjected April 18, 2024 16:40
@emhane emhane changed the title perf(discv5): populate kbuckets perf(discv5): populate kbuckets & improved RLPx peering Apr 19, 2024
@emhane
Copy link
Member Author

emhane commented Apr 19, 2024

Metrics on outgoing RLPx connections back the hypothesis that pseudorandom FINDNODE target selection, explicitly targeting each kbucket, is more beneficial over time than a completely random target in lookup queries that aim to discover RLPx peers. Outgoing RLPx connections are important for nodes behind NAT to be successful, as @joshieDo has been strongly advocating for this week.

This pr was run between 16.04 21:30 - 17.04 12:30, although targeting all kbuckets, not only the top half (half furthest away from local node id as latest commit decreased it to in fear of too much overhead targeting all). Between 17.04 19:30 - 18.04 19:30, the node was ran with target generated by let tgt = NodeId::random(), which under the hood is let tgt: [u8; 32] = rand::random(). Both runs started with empty known_peers.json file (RLPx peers saved from prev run), and on new non-default ports with --instance flag.

This panel shows the number of successfully established outgoing RLPx connections over time.
Screenshot 2024-04-19 at 12 36 37
Pseudorandom target selection, targeting each kbukcet index equal many times, outperforms random target selection. The former discovers equally many useful RLPx peers as the latter in almost half the time (15 hours as opposed to 24 hours).

Furthermore, pseudorandom target selection targeting each kbukcet is also cheaper than random target selection, as it leads to less peer churn.
Screenshot 2024-04-19 at 12 40 11
This metric doesn't account for unique peers, it's hence likely that random target selection has higher peer churn than the given pseudorandom target selection as it re-establishes sessions with recently evicted peers (sigp/discv5 library stores sessions in an LRU cache). This assumption is strengthened by looking at the first panel, which shows that the increased peer churn isn't leading to more outgoing RLPx connections.

Based on these metrics, I will increase the targeted kbucket range in the lookup query to stretch over the whole kbucket range, since peer churn is less than the completely random target selection commonly used.

Copy link
Member

@Rjected Rjected left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, thanks for the additional docs and explanation with metrics!

@emhane
Copy link
Member Author

emhane commented Apr 19, 2024

lgtm, thanks for the additional docs and explanation with metrics!

kinda got the "metrics or go home" mentality from you guys

@emhane emhane added this pull request to the merge queue Apr 19, 2024
Merged via the queue into main with commit 20568b8 Apr 19, 2024
27 checks passed
@emhane emhane deleted the emhane/rev-lookup-tgt branch April 19, 2024 20:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-discv5 Related to discv5 discovery C-perf A change motivated by improving speed, memory usage or disk footprint
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants