Speed up remove_stale_channels_and_tracking nontrivially #4080
Conversation
During startup, the lightning protocol forces us to fetch a ton of gossip for channels where there is a `channel_update` in only one direction. We then have to wait around a while until we can prune the crap because we don't know when the gossip sync has completed. Sadly, doing a large prune via `remove_stale_channels_and_tracking` is somewhat slow. Removing a large portion of our graph currently takes a bit more than 7.5 seconds on an i9-14900K, which can ultimately ~hang a node with a few fewer GHz ~forever. The bulk of this time is in our `IndexedMap` removals, where we walk the entire `keys` `Vec` to remove the entry, then shift it down after removing. Here we shift to a bulk removal model when removing channels, doing a single `Vec` iterate + shift. This reduces the same test to around 1.38 seconds on the same hardware.
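To illustrate the difference, here is a minimal sketch of a `Vec`-backed indexed map, not LDK's actual `IndexedMap` (the type name, bounds, and methods below are simplified stand-ins): per-entry removal scans and shifts the `keys` `Vec` once per removed entry, while bulk removal does everything in a single `retain` pass.

```rust
use std::collections::{HashMap, HashSet};
use std::hash::Hash;

// Minimal sketch of a Vec-indexed map; illustrative only, not LDK's `IndexedMap`.
struct VecIndexedMap<K: Hash + Eq, V> {
    map: HashMap<K, V>,
    keys: Vec<K>, // kept in order so iteration over entries is ordered
}

impl<K: Hash + Eq, V> VecIndexedMap<K, V> {
    // One-at-a-time removal: each call scans `keys` for the entry, then
    // `Vec::remove` shifts the tail down, so removing M of N entries is O(M * N).
    fn remove(&mut self, key: &K) -> Option<V> {
        let removed = self.map.remove(key);
        if removed.is_some() {
            if let Some(idx) = self.keys.iter().position(|k| k == key) {
                self.keys.remove(idx);
            }
        }
        removed
    }

    // Bulk removal: drop every requested entry with a single pass over `keys`,
    // shifting each surviving element at most once.
    fn remove_bulk(&mut self, to_remove: &HashSet<K>) {
        for key in to_remove {
            self.map.remove(key);
        }
        self.keys.retain(|k| !to_remove.contains(k));
    }
}
```

The per-entry path pays the scan-and-shift cost once per removed channel, which is where the blowup in a large prune comes from; the bulk variant amortizes it into one pass over the key list.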
👋 Thanks for assigning @tnull as a reviewer!
Force-pushed from fbb4cf3 to f1c166b (Compare)
During startup, the lightning protocol forces us to fetch a ton of gossip for channels where there is a `channel_update` in only one direction. We then have to wait around a while until we can prune the crap because we don't know when the gossip sync has completed. Sadly, doing a large prune via `remove_stale_channels_and_tracking` is somewhat slow. Removing a large portion of our graph currently takes a bit more than 7.5 seconds on an i9-14900K, which can ultimately ~hang a node with a few fewer GHz ~forever. The bulk of this time is in our `IndexedMap` removals, where we walk the entire `keys` `Vec` to remove the entry, then shift it down after removing. In the previous commit we shifted to a bulk removal model for channels; here we do the same for nodes. This reduces the same test to around 340 milliseconds on the same hardware.
Force-pushed from f1c166b to 28b526a (Compare)
Codecov Report ❌ Patch coverage is
Additional details and impacted files:

@@            Coverage Diff             @@
##             main    #4080      +/-   ##
==========================================
- Coverage   88.63%   88.62%   -0.01%
==========================================
  Files         176      176
  Lines      131920   132146     +226
  Branches   131920   132146     +226
==========================================
+ Hits       116927   117116     +189
- Misses      12325    12359      +34
- Partials     2668     2671       +3
==========================================
Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
LGTM
Maybe this could also get backported?
We could; I hadn't considered it much because it changes the public API, but it only adds a few methods on
self.removed_node_counters.lock().unwrap().reserve(channels_removed_bulk.len());
let mut nodes_to_remove = hash_set_with_capacity(channels_removed_bulk.len());
If the point of these is to avoid reallocations, should we allocate 2x the channels?
Hmm, good point. It's probably fine, though - we're very unlikely to remove two nodes for every channel we remove (that would imply all of our removals are nodes that have a single channel to another node that also only has a single channel, i.e. it's all pairs of nodes that don't connect to the rest of the network). If we assume any kind of connectivity, even one node slot per removed channel is way overkill, and ultimately the cost of a single additional allocation + memmove isn't all that high.
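To make that bound concrete, here is a purely illustrative sketch (not code from this PR; the function name, plain-`u64` node ids, and counts map are made up): removing a channel decrements its two endpoints' channel counts, and a node is only pruned when its count hits zero, so at most 2x the removed channels can be pruned, and only when every removed channel joins two nodes that each had exactly one channel.

```rust
use std::collections::{HashMap, HashSet};

// Illustrative only: given each node's current channel count and the removed
// channels (as endpoint pairs), collect the nodes that end up with zero
// channels. Assumes the counts are consistent with `removed`.
fn orphaned_nodes(
    mut channel_count: HashMap<u64, usize>,
    removed: &[(u64, u64)],
) -> HashSet<u64> {
    // Reserving one slot per removed channel (as the diff above does) is almost
    // always enough; exceeding it costs at most one extra allocation + memmove.
    let mut orphans = HashSet::with_capacity(removed.len());
    for (a, b) in removed {
        for node in [a, b] {
            if let Some(cnt) = channel_count.get_mut(node) {
                *cnt -= 1;
                if *cnt == 0 {
                    orphans.insert(*node);
                }
            }
        }
    }
    orphans
}
```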
Backported in #4143
During startup, the lightning protocol forces us to fetch a ton of gossip for channels where there is a `channel_update` in only one direction. We then have to wait around a while until we can prune the crap because we don't know when the gossip sync has completed. Sadly, doing a large prune via `remove_stale_channels_and_tracking` is somewhat slow. Removing a large portion of our graph currently takes a bit more than 7.5 seconds on an i9-14900K, which can ultimately ~hang a node with a few fewer GHz ~forever. The bulk of this time is in our `IndexedMap` removals, where we walk the entire `keys` `Vec` to remove the entry, then shift it down after removing. Here we shift to a bulk removal model when removing channels, doing a single `Vec` iterate + shift. This reduces the same test to around 340 milliseconds on the same hardware.

Fixes #4070