Conversation

TheBlueMatt
Collaborator

During startup, the lightning protocol forces us to fetch a ton of
gossip for channels where there is a `channel_update` in only one
direction. We then have to wait around a while until we can prune
the crap because we don't know when the gossip sync has completed.

Sadly, doing a large prune via `remove_stale_channels_and_tracking`
is somewhat slow. Removing a large portion of our graph currently
takes a bit more than 7.5 seconds on an i9-14900K, which can
effectively hang a node with a few fewer GHz essentially forever.

The bulk of this time is in our `IndexedMap` removals, where we
walk the entire `keys` `Vec` to locate the entry, then shift the
remaining elements down after removing it.

Here we shift to a bulk removal model when removing channels, doing
a single `Vec` iterate + shift. This reduces the same test to
around 340 milliseconds on the same hardware.

Fixes #4070
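
For context, `IndexedMap` keeps its entries in a `HashMap` plus an ordered `keys` `Vec`. A minimal stand-in (illustrative names only, not LDK's actual implementation) shows why removing entries one at a time gets expensive:

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Toy version of a map that also keeps an ordered list of its keys,
/// roughly in the spirit of LDK's `IndexedMap` (not the real API).
struct ToyIndexedMap<K: Hash + Eq + Clone, V> {
	map: HashMap<K, V>,
	keys: Vec<K>, // kept ordered so iteration/range queries stay sorted
}

impl<K: Hash + Eq + Clone, V> ToyIndexedMap<K, V> {
	/// Per-entry removal: a linear scan of `keys` to find the key, then a
	/// tail shift via `Vec::remove`, so removing k of n entries costs
	/// O(n * k) in total - the pattern that made large prunes slow.
	fn remove(&mut self, key: &K) -> Option<V> {
		let removed = self.map.remove(key);
		if removed.is_some() {
			if let Some(pos) = self.keys.iter().position(|k| k == key) {
				self.keys.remove(pos); // shifts every later key down by one
			}
		}
		removed
	}
}
```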

During startup, the lightning protocol forces us to fetch a ton of
gossip for channels where there is a `channel_update` in only one
direction. We then have to wait around a while until we can prune
the crap because we don't know when the gossip sync has completed.

Sadly, doing a large prune via `remove_stale_channels_and_tracking`
is somewhat slow. Removing a large portion of our graph currently
takes a bit more than 7.5 seconds on an i9-14900K, which can
effectively hang a node with a few fewer GHz essentially forever.

The bulk of this time is in our `IndexedMap` removals, where we
walk the entire `keys` `Vec` to locate the entry, then shift the
remaining elements down after removing it.

Here we shift to a bulk removal model when removing channels, doing
a single `Vec` iterate + shift. This reduces the same test to
around 1.38 seconds on the same hardware.
@TheBlueMatt TheBlueMatt added this to the 0.2 milestone Sep 18, 2025
@ldk-reviews-bot

ldk-reviews-bot commented Sep 18, 2025

👋 Thanks for assigning @tnull as a reviewer!
I'll wait for their review and will help manage the review process.
Once they submit their review, I'll check if a second reviewer would be helpful.

During startup, the lightning protocol forces us to fetch a ton of
gossip for channels where there is a `channel_update` in only one
direction. We then have to wait around a while until we can prune
the crap because we don't know when the gossip sync has completed.

Sadly, doing a large prune via `remove_stale_channels_and_tracking`
is somewhat slow. Removing a large portion of our graph currently
takes a bit more than 7.5 seconds on an i9-14900K, which can
effectively hang a node with a few fewer GHz essentially forever.

The bulk of this time is in our `IndexedMap` removals, where we
walk the entire `keys` `Vec` to locate the entry, then shift the
remaining elements down after removing it.

In the previous commit we shifted to a bulk removal model for
channels; here we do the same for nodes. This reduces the same test
to around 340 milliseconds on the same hardware.
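
A minimal sketch of the bulk-removal idea (illustrative only, not the exact methods these commits add to `IndexedMap`): collect everything to remove first, then compact the ordered key list in a single `Vec::retain` pass, so k removals out of n keys cost one walk of the `Vec` instead of k walks:

```rust
use std::collections::HashSet;

/// One-pass compaction of an ordered key list after a bulk removal.
/// `Vec::retain` walks the Vec once, shifting kept keys down as it goes.
fn compact_keys(keys: &mut Vec<u64>, removed: &HashSet<u64>) {
	keys.retain(|k| !removed.contains(k));
}

fn main() {
	let mut keys: Vec<u64> = (0..10).collect();
	let removed: HashSet<u64> = [1, 4, 7].into_iter().collect();
	compact_keys(&mut keys, &removed);
	assert_eq!(keys, vec![0, 2, 3, 5, 6, 8, 9]);
}
```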

codecov bot commented Sep 18, 2025

Codecov Report

❌ Patch coverage is 97.43590% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 88.62%. Comparing base (c71334e) to head (28b526a).
⚠️ Report is 24 commits behind head on main.

Files with missing lines Patch % Lines
lightning/src/util/indexed_map.rs 94.44% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4080      +/-   ##
==========================================
- Coverage   88.63%   88.62%   -0.01%     
==========================================
  Files         176      176              
  Lines      131920   132146     +226     
  Branches   131920   132146     +226     
==========================================
+ Hits       116927   117116     +189     
- Misses      12325    12359      +34     
- Partials     2668     2671       +3     
Flag Coverage Δ
fuzzing 21.57% <0.00%> (-0.03%) ⬇️
tests 88.46% <97.43%> (-0.01%) ⬇️

@tnull tnull self-requested a review September 18, 2025 06:06
Contributor

@tnull tnull left a comment

LGTM

Contributor

@tnull tnull left a comment

Maybe this could also get backported?

@TheBlueMatt
Collaborator Author

We could; I hadn't considered it much because it changes the public API, but it only adds a few methods on `IndexedMap`, so I'll tag it.

Comment on lines +2394 to +2395
self.removed_node_counters.lock().unwrap().reserve(channels_removed_bulk.len());
let mut nodes_to_remove = hash_set_with_capacity(channels_removed_bulk.len());
Contributor

If the point of these is to avoid reallocations, should we allocate 2x the channels?

Collaborator Author

Hmm, good point. It's probably fine, though - we're very unlikely to remove two nodes for every channel we remove (that would imply all of our removals are nodes that have a single channel to another node that also only has a single channel, i.e. it's all pairs of nodes that don't connect to the rest of the network). If we assume any kind of connectivity, even one node per removed channel is way overkill, and ultimately the cost of a single additional allocation + memmove isn't all that high.
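
For illustration only (hypothetical names and types, not the PR's actual code), the capacity trade-off discussed above looks roughly like this, reserving one slot per removed channel:

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical sketch: given the channels removed in bulk, figure out which
/// nodes lost their last channel. The real logic lives in LDK's
/// `NetworkGraph::remove_stale_channels_and_tracking` with its own types,
/// counters and locking.
fn collect_nodes_to_remove(
	channels_removed_bulk: &[(u64, (u64, u64))], // (scid, (node_a, node_b))
	node_channel_count: &mut HashMap<u64, usize>,
) -> HashSet<u64> {
	// Reserve one slot per removed channel: the worst case (a removal set made
	// entirely of isolated node pairs) could need 2x, but with any realistic
	// connectivity 1x already over-reserves, and growing the set once only
	// costs one extra allocation plus a rehash.
	let mut nodes_to_remove = HashSet::with_capacity(channels_removed_bulk.len());
	for (_scid, (node_a, node_b)) in channels_removed_bulk {
		for node in [*node_a, *node_b] {
			if let Some(count) = node_channel_count.get_mut(&node) {
				*count -= 1;
				if *count == 0 {
					nodes_to_remove.insert(node);
				}
			}
		}
	}
	nodes_to_remove
}
```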

@TheBlueMatt TheBlueMatt merged commit 50391d3 into lightningdevkit:main Sep 18, 2025
25 checks passed
@TheBlueMatt
Collaborator Author

Backported in #4143
