Conversation

TheBlueMatt
Collaborator

During startup, the lightning protocol forces us to fetch a ton of
gossip for channels where there is a `channel_update` in only one
direction. We then have to wait around a while until we can prune
the crap because we don't know when the gossip sync has completed.

Sadly, doing a large prune via `remove_stale_channels_and_tracking`
is somewhat slow. Removing a large portion of our graph currently
takes a bit more than 7.5 seconds on an i9-14900K, which can
effectively hang a node with a few fewer GHz essentially forever.

The bulk of this time is in our `IndexedMap` removals, where we
walk the entire `keys` `Vec` to locate the entry, then shift the
remaining elements down after removing it.

Here we shift to a bulk removal model when removing channels, doing
a single `Vec` iterate + shift. This reduces the same test to
around 340 milliseconds on the same hardware.

Fixes #4070
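
For context, `IndexedMap` keeps its entries in a `HashMap` plus an ordered `keys` `Vec`. A minimal stand-in (illustrative names only, not LDK's actual implementation) shows why removing entries one at a time gets expensive:

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Toy version of a map that also keeps an ordered list of its keys,
/// roughly in the spirit of LDK's `IndexedMap` (not the real API).
struct ToyIndexedMap<K: Hash + Eq + Clone, V> {
	map: HashMap<K, V>,
	keys: Vec<K>, // kept ordered so iteration/range queries stay sorted
}

impl<K: Hash + Eq + Clone, V> ToyIndexedMap<K, V> {
	/// Per-entry removal: a linear scan of `keys` to find the key, then a
	/// tail shift via `Vec::remove`, so removing k of n entries costs
	/// O(n * k) in total - the pattern that made large prunes slow.
	fn remove(&mut self, key: &K) -> Option<V> {
		let removed = self.map.remove(key);
		if removed.is_some() {
			if let Some(pos) = self.keys.iter().position(|k| k == key) {
				self.keys.remove(pos); // shifts every later key down by one
			}
		}
		removed
	}
}
```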

During startup, the lightning protocol forces us to fetch a ton of
gossip for channels where there is a `channel_update` in only one
direction. We then have to wait around a while until we can prune
the crap because we don't know when the gossip sync has completed.

Sadly, doing a large prune via `remove_stale_channels_and_tracking`
is somewhat slow. Removing a large portion of our graph currently
takes a bit more than 7.5 seconds on an i9-14900K, which can
effectively hang a node with a few fewer GHz essentially forever.

The bulk of this time is in our `IndexedMap` removals, where we
walk the entire `keys` `Vec` to locate the entry, then shift the
remaining elements down after removing it.

Here we shift to a bulk removal model when removing channels, doing
a single `Vec` iterate + shift. This reduces the same test to
around 1.38 seconds on the same hardware.
@TheBlueMatt TheBlueMatt added this to the 0.2 milestone Sep 18, 2025
@ldk-reviews-bot

ldk-reviews-bot commented Sep 18, 2025

👋 Thanks for assigning @tnull as a reviewer!
I'll wait for their review and will help manage the review process.
Once they submit their review, I'll check if a second reviewer would be helpful.

During startup, the lightning protocol forces us to fetch a ton of
gossip for channels where there is a `channel_update` in only one
direction. We then have to wait around a while until we can prune
the crap because we don't know when the gossip sync has completed.

Sadly, doing a large prune via `remove_stale_channels_and_tracking`
is somewhat slow. Removing a large portion of our graph currently
takes a bit more than 7.5 seconds on an i9-14900K, which can
effectively hang a node with a few fewer GHz essentially forever.

The bulk of this time is in our `IndexedMap` removals, where we
walk the entire `keys` `Vec` to locate the entry, then shift the
remaining elements down after removing it.

In the previous commit we shifted to a bulk removal model for
channels; here we do the same for nodes. This reduces the same test
to around 340 milliseconds on the same hardware.
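
A minimal sketch of the bulk-removal idea (illustrative only, not the exact methods these commits add to `IndexedMap`): collect everything to remove first, then compact the ordered key list in a single `Vec::retain` pass, so k removals out of n keys cost one walk of the `Vec` instead of k walks:

```rust
use std::collections::HashSet;

/// One-pass compaction of an ordered key list after a bulk removal.
/// `Vec::retain` walks the Vec once, shifting kept keys down as it goes.
fn compact_keys(keys: &mut Vec<u64>, removed: &HashSet<u64>) {
	keys.retain(|k| !removed.contains(k));
}

fn main() {
	let mut keys: Vec<u64> = (0..10).collect();
	let removed: HashSet<u64> = [1, 4, 7].into_iter().collect();
	compact_keys(&mut keys, &removed);
	assert_eq!(keys, vec![0, 2, 3, 5, 6, 8, 9]);
}
```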

codecov bot commented Sep 18, 2025

Codecov Report

❌ Patch coverage is 97.43590% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 88.62%. Comparing base (c71334e) to head (28b526a).
⚠️ Report is 24 commits behind head on main.

Files with missing lines Patch % Lines
lightning/src/util/indexed_map.rs 94.44% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4080      +/-   ##
==========================================
- Coverage   88.63%   88.62%   -0.01%     
==========================================
  Files         176      176              
  Lines      131920   132146     +226     
  Branches   131920   132146     +226     
==========================================
+ Hits       116927   117116     +189     
- Misses      12325    12359      +34     
- Partials     2668     2671       +3     
Flag Coverage Δ
fuzzing 21.57% <0.00%> (-0.03%) ⬇️
tests 88.46% <97.43%> (-0.01%) ⬇️

@tnull tnull self-requested a review September 18, 2025 06:06
Contributor

@tnull tnull left a comment

LGTM

Contributor

@tnull tnull left a comment

Maybe this could also get backported?

@TheBlueMatt
Collaborator Author

We could; I hadn't considered it much because it changes the public API, but it only adds a few methods on `IndexedMap`, so I'll tag it.

Comment on lines +2394 to +2395
self.removed_node_counters.lock().unwrap().reserve(channels_removed_bulk.len());
let mut nodes_to_remove = hash_set_with_capacity(channels_removed_bulk.len());
Contributor

If the point of these is to avoid reallocations, should we allocate 2x the channels?

Collaborator Author

Hmm, good point. It's probably fine, though - we're very unlikely to remove two nodes for every channel we remove (that would imply all of our removals are nodes that have a single channel to another node that also only has a single channel, i.e. it's all pairs of nodes that don't connect to the rest of the network). If we assume any kind of connectivity, even one node per removed channel is way overkill, and ultimately the cost of a single additional allocation + memmove isn't all that high.
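
For illustration only (hypothetical names and types, not the PR's actual code), the capacity trade-off discussed above looks roughly like this, reserving one slot per removed channel:

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical sketch: given the channels removed in bulk, figure out which
/// nodes lost their last channel. The real logic lives in LDK's
/// `NetworkGraph::remove_stale_channels_and_tracking` with its own types,
/// counters and locking.
fn collect_nodes_to_remove(
	channels_removed_bulk: &[(u64, (u64, u64))], // (scid, (node_a, node_b))
	node_channel_count: &mut HashMap<u64, usize>,
) -> HashSet<u64> {
	// Reserve one slot per removed channel: the worst case (a removal set made
	// entirely of isolated node pairs) could need 2x, but with any realistic
	// connectivity 1x already over-reserves, and growing the set once only
	// costs one extra allocation plus a rehash.
	let mut nodes_to_remove = HashSet::with_capacity(channels_removed_bulk.len());
	for (_scid, (node_a, node_b)) in channels_removed_bulk {
		for node in [*node_a, *node_b] {
			if let Some(count) = node_channel_count.get_mut(&node) {
				*count -= 1;
				if *count == 0 {
					nodes_to_remove.insert(node);
				}
			}
		}
	}
	nodes_to_remove
}
```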

@TheBlueMatt TheBlueMatt merged commit 50391d3 into lightningdevkit:main Sep 18, 2025
25 checks passed
@TheBlueMatt
Collaborator Author

Backported in #4143
