
Conversation

@TheBlueMatt
Collaborator

A user pointed out, when looking to upgrade to LDK 0.2, that the
lazy flag is actually quite important for performance when using
a MonitorUpdatingPersister, especially in synchronous persistence
mode.

Thus, we add it back here.

@TheBlueMatt TheBlueMatt added this to the 0.2 milestone Oct 30, 2025

@TheBlueMatt TheBlueMatt linked an issue Oct 30, 2025 that may be closed by this pull request
@TheBlueMatt TheBlueMatt requested a review from tnull October 30, 2025 00:21
@codecov

codecov bot commented Oct 30, 2025

Codecov Report

❌ Patch coverage is 56.25000% with 21 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.84%. Comparing base (02a9af9) to head (0f9548b).
⚠️ Report is 8 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| lightning-persister/src/fs_store.rs | 52.38% | 5 Missing and 5 partials ⚠️ |
| lightning/src/util/persist.rs | 68.75% | 3 Missing and 2 partials ⚠️ |
| lightning-background-processor/src/lib.rs | 0.00% | 2 Missing ⚠️ |
| lightning/src/util/test_utils.rs | 60.00% | 2 Missing ⚠️ |
| lightning-liquidity/src/lsps2/service.rs | 0.00% | 1 Missing ⚠️ |
| lightning-liquidity/src/lsps5/service.rs | 0.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4189      +/-   ##
==========================================
- Coverage   88.87%   88.84%   -0.04%     
==========================================
  Files         180      180              
  Lines      137863   137870       +7     
  Branches   137863   137870       +7     
==========================================
- Hits       122522   122485      -37     
- Misses      12532    12573      +41     
- Partials     2809     2812       +3     
Flag Coverage Δ
fuzzing 21.44% <0.00%> (+0.58%) ⬆️
tests 88.68% <56.25%> (-0.04%) ⬇️


/// potentially get lost on crash after the method returns. Therefore, this flag should only be
/// set for `remove` operations that can be safely replayed at a later time.
///
/// All removal operations must complete in a consistent total order with [`Self::write`]s
@tnull tnull (Contributor) Oct 30, 2025
I'm still not sure if this would even work. For example in FilesystemStore, if we simply call remove and leave the decision on when to sync the changes to disk to the OS, how could we be certain that the ordering is preserved? IIRC we basically concluded this can't be guaranteed, especially since different guarantees on different platforms might vary?

@tnull tnull (Contributor) Oct 30, 2025

Note that man 2 unlink states:

> If the name was the last link to a file but any processes still have the file open, the file will remain in existence until the last file descriptor referring to it is closed.

That means that if we have a concurrent read, we may defer the actual deletion, allowing it to interact with following writes, e.g.:

| t1   | t2     | t3     |
| READ | unlink |        |
| READ | ...    | WRITE  |
| READ | ...    |        |
| READ | ...    | SYNC   |
| READ | SYNC   |        |

Although, given we use rename for write, I do wonder if the unlink would simply get lost here as it would apply to the original file that is dropped already anyways?

Contributor:

To avoid this, maybe it is possible to constrain lazy removes to keys that won't ever be written again in the future?

I've read the context of this PR now, and it seems the perf issue was around monitor updates. I think those are never re-written?

Collaborator Author:

> I'm still not sure if this would even work. For example in FilesystemStore, if we simply call remove and leave the decision on when to sync the changes to disk to the OS, how could we be certain that the ordering is preserved?

Filesystems provide an order; the only thing they don't provide without an fsync is any kind of guarantee that it's on disk. I don't think this is a problem.

> Although, given we use rename for write, I do wonder if the unlink would simply get lost here as it would apply to the original file that is dropped already anyways?

Yes, that is how it should work on any reasonable filesystem. In theory it's possible for some filesystems to fail the rename part of the write because the file still exists, but that's unrelated to the remove; that's just the read and write happening at the same time.
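To make the mechanics being debated here concrete, below is a minimal sketch of the tempfile-then-rename write and an unsynced (lazy) unlink. It is illustrative only, assumes POSIX semantics, and is not the actual `FilesystemStore` code.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Write in the tempfile-then-rename style: `rename` atomically (re)creates
/// the directory entry for `dest`, independent of any earlier unlink of the
/// old entry for the same path.
fn write_via_rename(dest: &Path, buf: &[u8]) -> io::Result<()> {
	let tmp = dest.with_extension("tmp");
	let mut file = File::create(&tmp)?;
	file.write_all(buf)?;
	file.sync_all()?; // the data itself is made durable
	fs::rename(&tmp, dest)?; // atomic swap of the directory entry
	// A non-lazy store would additionally fsync the parent directory so the
	// rename (and any prior unlink) survives a crash.
	Ok(())
}

/// Lazy remove: unlink without fsyncing the parent directory. The entry is
/// gone immediately in the filesystem's view, so ordering against later
/// writes is preserved, but the removal itself may not survive a crash.
/// That is only acceptable if the removal can safely be replayed later.
fn lazy_remove(dest: &Path) -> io::Result<()> {
	fs::remove_file(dest)
}
```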

Contributor:

> Filesystems provide an order; the only thing they don't provide without an fsync is any kind of guarantee that it's on disk. I don't think this is a problem.

A file that should have been removed but is still there: is that not a problem?

Collaborator Author:

That's the explicit point of the `lazy` flag: it allows a store to not guarantee that the entry will be removed if there's a crash or ill-timed restart.

Contributor:

Discussed this offline; man 3p unlink had me convinced this is safe to do on POSIX.
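For readers skimming this thread: the method under discussion is the store's `remove` with the re-added `lazy` flag. A rough sketch of the synchronous shape follows (paraphrased for illustration; the namespace parameters and exact naming may differ from the trait as merged in this PR):

```rust
use std::io;

/// Rough shape of the synchronous store trait under discussion (paraphrased;
/// not the exact LDK trait definition).
pub trait KvStoreSketch {
	/// Persists `buf` under the given key.
	fn write(
		&self, primary_namespace: &str, secondary_namespace: &str, key: &str, buf: &[u8],
	) -> Result<(), io::Error>;

	/// Removes the given key. If `lazy` is set, the removal may be lost on a
	/// crash after this returns, so it must only be used for removals that
	/// can safely be replayed later (e.g. stale monitor updates). Removals
	/// must still complete in a consistent total order with writes to the
	/// same key.
	fn remove(
		&self, primary_namespace: &str, secondary_namespace: &str, key: &str, lazy: bool,
	) -> Result<(), io::Error>;
}
```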


@tnull tnull requested a review from joostjager October 30, 2025 09:42
@joostjager joostjager (Contributor) left a comment

Are there more details on why/when the lazy flag is important?

@TheBlueMatt
Collaborator Author

In this case it's important for perf when removing a large number of monitor updates on an fsstore (requiring an fsync for each can add up rather substantially, e.g. if we're removing 1k monitor updates). But thinking about it more, I think it's also important for the same case in the async design: if you have a KVStore that handles ordering (e.g. like the locks in the fsstore/vss store), then the lazy flag allows you to spawn-and-forget a removal, rather than the callsite having to "block" waiting on the removal to finish.
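A rough illustration of the spawn-and-forget point above, assuming a hypothetical async store whose backend already orders operations per key; the trait and function names here are made up for the example and are not the LDK API:

```rust
use std::future::Future;
use std::io;
use std::pin::Pin;
use std::sync::Arc;

/// Hypothetical async store for illustration; not the actual LDK trait.
trait AsyncRemove: Send + Sync + 'static {
	fn remove(
		&self, key: String, lazy: bool,
	) -> Pin<Box<dyn Future<Output = io::Result<()>> + Send>>;
}

/// Non-lazy removal: the caller has to wait for completion before it can
/// rely on the entry being durably gone.
async fn remove_durably<S: AsyncRemove>(store: &S, key: String) -> io::Result<()> {
	store.remove(key, false).await
}

/// Lazy removal: the caller can spawn-and-forget, because losing the removal
/// on a crash is acceptable and the backend orders it against later writes
/// to the same key.
fn remove_lazily<S: AsyncRemove>(store: Arc<S>, key: String) {
	tokio::spawn(async move {
		let _ = store.remove(key, true).await;
	});
}
```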

@joostjager
Contributor

1k updates is a lot. Do you think it adds much over, let's say, 50 updates? Maybe that also makes the perf problem go away without the lazy flag...

@TheBlueMatt
Collaborator Author

1k updates seems entirely reasonable for a node doing lots of forwarding. ChannelMonitors can easily be a few thousand times larger than ChannelMonitorUpdates, so wanting to amortize over more ChannelMonitorUpdates seems very reasonable (the startup cost of more ChannelMonitorUpdates is pretty low, or at least it is if your KVStore read latency is low, or once we load them in parallel).

Not being able to pick a reasonable update count just because of an API limitation in how we do removals seems like a pretty weird limitation, no?

@joostjager
Contributor

> Not being able to pick a reasonable update count just because of an API limitation in how we do removals seems like a pretty weird limitation, no?

The question was just whether 1000 is reasonable, and you made it clear that it is 👍

joostjager previously approved these changes Oct 30, 2025
This reverts commit 561da4c.

A user pointed out, when looking to upgrade to LDK 0.2, that the
`lazy` flag is actually quite important for performance when using
a `MonitorUpdatingPersister`, especially in synchronous persistence
mode.

Thus, we add it back here.

Fixes lightningdevkit#4188
In the previous commit we reverted
561da4c. One of the motivations
for it (in addition to `lazy` removals being somewhat less, though
still arguably, useful in an async context) was that the ordering
requirements of `lazy` removals were somewhat unclear.

Here we simply default to the simplest safe option, requiring a
total order across all `write` and `remove` operations to the same
key, `lazy` or not.
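One simple way a store implementation can satisfy that total-order requirement is to serialize `write` and `remove` for the same key behind a per-key lock. A minimal sketch follows (illustrative only; the real stores have their own locking):

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Illustrative per-key serialization: any write or remove for a given key
/// takes that key's lock, so operations on the same key observe a total
/// order even when removes are lazy (i.e. not fsync'd).
#[derive(Default)]
struct PerKeyLocks {
	locks: Mutex<HashMap<String, Arc<Mutex<()>>>>,
}

impl PerKeyLocks {
	fn lock_for(&self, key: &str) -> Arc<Mutex<()>> {
		let mut map = self.locks.lock().unwrap();
		Arc::clone(map.entry(key.to_string()).or_default())
	}
}

fn write_then_lazy_remove(locks: &PerKeyLocks, key: &str) {
	// Both operations on the same key take the same lock, so a lazy remove
	// can never be reordered ahead of an earlier write (or vice versa).
	let key_lock = locks.lock_for(key);
	{
		let _held = key_lock.lock().unwrap();
		// ... perform the durable write for `key` here ...
	}
	{
		let _held = key_lock.lock().unwrap();
		// ... perform the (possibly lazy) remove for `key` here ...
	}
}
```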
@TheBlueMatt
Collaborator Author

Fixed rustfmt

$ git diff-tree -U1 9973d780a 0f9548bf8
diff --git a/lightning/src/util/persist.rs b/lightning/src/util/persist.rs
index 78fdba2113..5d34603c96 100644
--- a/lightning/src/util/persist.rs
+++ b/lightning/src/util/persist.rs
@@ -1084,4 +1084,3 @@ where
 				let latest_update_id = current_monitor.get_latest_update_id();
-				self
-					.cleanup_stale_updates_for_monitor_to(&monitor_key, latest_update_id, lazy)
+				self.cleanup_stale_updates_for_monitor_to(&monitor_key, latest_update_id, lazy)
 					.await?;

@TheBlueMatt
Collaborator Author

Only a rustfmt change since @joostjager ack'd, so landing.

@TheBlueMatt TheBlueMatt merged commit d53d6b4 into lightningdevkit:main Oct 30, 2025
23 of 25 checks passed
@TheBlueMatt TheBlueMatt mentioned this pull request Oct 30, 2025
@TheBlueMatt
Collaborator Author

Backported to 0.2 in #4193

@domZippilli
Contributor

🥳

@wvanlint
Contributor

Thanks for landing this!

I think the comments above covered everything. We use the MonitorUpdatingPersister with maximum_pending_updates = 1000 for efficiency, due to the high forwarding volume, and an update_persisted_channel call can trigger channel monitor update consolidation when maximum_pending_updates is reached. This consolidation results in maximum_pending_updates sequential KVStore::remove calls, which caused issues when performed in a non-lazy fashion. In our case in 0.1, it blocked the Tokio runtime (for ~7 ms * 1000 = 7 s), but I assume it will affect the caller in the async design as well, as Matt mentioned.

I was curious whether there are possible simplifications, such as all remove calls being considered lazy, or remove being constrained to keys that won't ever be written again in the future, as Joost mentioned. But I see there are requirements coming from #4059 (comment) as well.
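To put numbers on the consolidation cost described above: a non-lazy cleanup issues one synchronous remove (and thus roughly one fsync) per stale update, so at ~7 ms each, 1000 updates come to about 7 seconds of blocking. The loop shape is sketched below; this is illustrative only, not the `MonitorUpdatingPersister` internals, and the key format is made up.

```rust
use std::io;

/// Illustrative only: one remove call per stale update. With `lazy = false`
/// each call typically pays its own fsync (~7 ms in the report above), so
/// 1000 stale updates block for roughly 7 seconds; with `lazy = true` the
/// per-call fsync is skipped and any removals lost in a crash are simply
/// replayed on the next consolidation.
fn cleanup_stale_updates(
	remove: impl Fn(&str, bool) -> io::Result<()>, monitor_key: &str, latest_update_id: u64,
	lazy: bool,
) -> io::Result<()> {
	for update_id in 1..=latest_update_id {
		// Hypothetical key format, for illustration only.
		let key = format!("{}/{}", monitor_key, update_id);
		remove(&key, lazy)?;
	}
	Ok(())
}
```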

Successfully merging this pull request may close these issues: Add lazy persistence back to KVStore::delete