Defer ChainMonitor updates and persistence to flush() #4345

joostjager · 2026-01-26T09:55:50Z

Summary

Introduce DeferredChainMonitor, a wrapper around ChainMonitor that queues watch_channel and update_channel operations, returning InProgress until flush() is called. This enables batched persistence of monitor updates after ChannelManager persistence, ensuring that on restart the monitor state is never ahead of the ChannelManager state.

The Problem

There's a race condition that can cause channel force closures: if the node crashes after writing channel monitors but before writing the channel manager, the monitors will be ahead of the manager on restart. This can lead to state desync and force closures.

The Solution

By deferring monitor writes until after the channel manager is persisted (via flush()), we ensure the manager is always at least as up-to-date as the monitors.

Key changes:

- DeferredChainMonitor queues monitor operations and returns InProgress
- Calling flush() applies pending operations and persists monitors
- All ChainMonitor traits (Listen, Confirm, EventsProvider, etc.) are passed through, allowing drop-in replacement
- Background processor updated to capture pending count before ChannelManager persistence, then flush after persistence completes
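The queue-then-flush flow in the list above can be sketched as follows. This is a simplified model, not the code in this PR: `UpdateStatus`, `PendingOp`, `InnerMonitor`, `DeferredMonitor` and `persist_pass` are hypothetical stand-ins, and passing the captured count to `flush` is an assumption about how the count is used.

```rust
use std::collections::VecDeque;

/// Stand-in for a monitor update status; only the two variants needed here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum UpdateStatus {
    Completed,
    InProgress,
}

/// Simplified stand-in for the operations that get deferred.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum PendingOp {
    WatchChannel { channel_id: u64 },
    UpdateChannel { channel_id: u64, update_id: u64 },
}

/// Hypothetical inner monitor that "persists" as soon as it is called.
#[derive(Default)]
pub struct InnerMonitor {
    persisted: Vec<PendingOp>,
}

impl InnerMonitor {
    fn apply(&mut self, op: PendingOp) -> UpdateStatus {
        self.persisted.push(op);
        UpdateStatus::Completed
    }
}

/// Wrapper that queues operations instead of applying them right away.
#[derive(Default)]
pub struct DeferredMonitor {
    inner: InnerMonitor,
    queue: VecDeque<PendingOp>,
}

impl DeferredMonitor {
    pub fn watch_channel(&mut self, channel_id: u64) -> UpdateStatus {
        self.queue.push_back(PendingOp::WatchChannel { channel_id });
        UpdateStatus::InProgress
    }

    pub fn update_channel(&mut self, channel_id: u64, update_id: u64) -> UpdateStatus {
        self.queue.push_back(PendingOp::UpdateChannel { channel_id, update_id });
        UpdateStatus::InProgress
    }

    pub fn pending_count(&self) -> usize {
        self.queue.len()
    }

    /// Applies at most `count` queued operations, oldest first.
    /// Returns how many operations were applied.
    pub fn flush(&mut self, count: usize) -> usize {
        let n = count.min(self.queue.len());
        for op in self.queue.drain(..n) {
            self.inner.apply(op);
        }
        n
    }

    pub fn persisted(&self) -> &[PendingOp] {
        &self.inner.persisted
    }
}

/// One persistence pass in the order described above: capture the pending
/// count, persist the manager, then flush only the captured operations.
pub fn persist_pass(monitor: &mut DeferredMonitor, persist_manager: impl FnOnce()) -> usize {
    let pending = monitor.pending_count();
    persist_manager();
    monitor.flush(pending)
}
```

Capturing the count before the manager write presumably keeps operations that were queued while the manager was being persisted out of the current flush, so they wait for the next pass.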

Performance Impact

Multi-channel, multi-node load testing (using ldk-server chaos branch) shows no measurable throughput difference between deferred and direct persistence modes.

This is likely because forwarding and payment processing are already effectively single-threaded: the background processor batches all forwards for the entire node in a single pass, so the deferral overhead doesn't add any meaningful bottleneck to an already serialized path.

Alternative Designs Considered

Several approaches were explored to solve the monitor/manager persistence ordering problem:

1. Queue at KVStore level (#4310)

Introduces a QueuedKVStoreSync wrapper that queues all writes in memory, committing them in a single batch at chokepoints where data leaves the system (get_and_clear_pending_msg_events, get_and_clear_pending_events). This approach aims for true atomic multi-key writes but requires KVStore backends that support transactions (e.g., SQLite) - filesystem backends cannot achieve full atomicity.
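As a rough illustration of the queuing idea only (not the code in #4310), the sketch below buffers writes in memory and hands them to the backend as one batch. `BatchBackend`, `MemoryBackend` and `QueuedStore` are hypothetical names; whether the batch is actually atomic depends entirely on the backend, which is the limitation noted above for filesystem storage.

```rust
use std::collections::HashMap;

/// Hypothetical backend that can commit several keys in one call.
/// A transactional store could make this atomic; a plain filesystem cannot.
pub trait BatchBackend {
    fn write_batch(&mut self, batch: Vec<(String, Vec<u8>)>);
}

/// In-memory backend used only to make the sketch runnable.
#[derive(Default)]
pub struct MemoryBackend {
    pub data: HashMap<String, Vec<u8>>,
    pub commits: usize,
}

impl BatchBackend for MemoryBackend {
    fn write_batch(&mut self, batch: Vec<(String, Vec<u8>)>) {
        for (key, value) in batch {
            self.data.insert(key, value);
        }
        self.commits += 1;
    }
}

/// Wrapper that queues writes until `commit` is called.
pub struct QueuedStore<B: BatchBackend> {
    backend: B,
    queued: Vec<(String, Vec<u8>)>,
}

impl<B: BatchBackend> QueuedStore<B> {
    pub fn new(backend: B) -> Self {
        Self { backend, queued: Vec::new() }
    }

    /// Records the write in memory; nothing reaches the backend yet.
    pub fn write(&mut self, key: &str, value: &[u8]) {
        self.queued.push((key.to_string(), value.to_vec()));
    }

    /// Sends all queued writes to the backend as a single batch.
    /// Returns the number of writes committed.
    pub fn commit(&mut self) -> usize {
        let batch = std::mem::take(&mut self.queued);
        let n = batch.len();
        if n > 0 {
            self.backend.write_batch(batch);
        }
        n
    }

    pub fn backend(&self) -> &B {
        &self.backend
    }
}
```

In this model, `commit` would be called at the chokepoints named above, before messages or events leave the system.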

Trade-offs: Most general solution but requires changes to persistence boundaries and cannot fully close the desync gap with filesystem storage.

2. Queue at Persister level (#4317)

Updates MonitorUpdatingPersister to queue persist operations in memory, with actual writes happening on flush(). Adds flush() to the Persist trait and ChainMonitor.

Trade-offs: Only fixes the issue for MonitorUpdatingPersister; custom Persist implementations remain vulnerable to the race condition.

3. Queue internally in ChainMonitor (#4351)

Modifies ChainMonitor directly to queue operations internally, returning InProgress until flush() is called.

Trade-offs: Requires a very large number of test changes, since existing tests expect immediate persistence behavior.

ldk-reviews-bot · 2026-01-26T09:55:53Z

👋 Hi! I see this is a draft PR.
I'll wait to assign reviewers until you mark it as ready for review.
Just convert it out of draft status when you're ready for review!

joostjager · 2026-01-26T10:50:28Z

I added a DeferredChainMonitor wrapper instead of modifying ChainMonitor directly. The wrapper intercepts watch_channel and update_channel calls, queues them, and returns InProgress. When flush() is called, it applies the queued operations and persists them in order, after ChannelManager persistence. This approach keeps ChainMonitor unchanged, so existing tests that expect synchronous behavior continue to work without modification. Only the background processor and production code paths use the deferred wrapper, while the test suite can keep using ChainMonitor directly.

joostjager · 2026-01-26T14:04:57Z

I initially attempted to implement this as a thin adapter that would sit between the ChannelManager and an existing ChainMonitor, forwarding calls while deferring the Watch operations. However, when integrating with ldk-node, this approach quickly ran into Rust ownership and lifetime issues, since it required keeping both the original ChainMonitor and the wrapper around simultaneously. The current implementation takes a simpler approach: DeferredChainMonitor owns its own ChainMonitor internally and implements Deref to it, making it a complete drop-in replacement that can be instantiated with the same parameters as ChainMonitor while exposing all the same traits and methods.
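The owning-wrapper-plus-Deref pattern described here looks roughly like the sketch below. `Monitor` and `Deferred` are hypothetical stand-ins for ChainMonitor and DeferredChainMonitor, and the constructor parameter is invented for illustration.

```rust
use std::ops::Deref;

/// Hypothetical stand-in for the wrapped type.
pub struct Monitor {
    name: String,
}

impl Monitor {
    pub fn new(name: &str) -> Self {
        Self { name: name.to_string() }
    }

    pub fn name(&self) -> &str {
        &self.name
    }
}

/// Wrapper that owns its inner monitor, so callers never have to keep the
/// original and the wrapper alive side by side.
pub struct Deferred {
    inner: Monitor,
    queued: Vec<u64>,
}

impl Deferred {
    /// Takes the same parameters as `Monitor::new` and builds the inner value.
    pub fn new(name: &str) -> Self {
        Self { inner: Monitor::new(name), queued: Vec::new() }
    }

    /// Wrapper-only behavior: remember an operation id for later.
    pub fn queue(&mut self, id: u64) {
        self.queued.push(id);
    }

    pub fn queued_len(&self) -> usize {
        self.queued.len()
    }
}

/// Read-only methods of `Monitor` become callable on `Deferred` directly.
impl Deref for Deferred {
    type Target = Monitor;

    fn deref(&self) -> &Monitor {
        &self.inner
    }
}
```

With this shape, `Deferred::new("node").name()` resolves through Deref to the inner monitor's method. Trait implementations are not inherited through Deref, so traits still need explicit pass-through impls on the wrapper.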

codecov · 2026-01-26T15:54:27Z

Codecov Report

❌ Patch coverage is 86.30952% with 69 lines in your changes missing coverage. Please review.
✅ Project coverage is 86.09%. Comparing base (e8a9303) to head (b7d9730).
⚠️ Report is 2 commits behind head on main.

Files with missing lines	Patch %	Lines
lightning/src/chain/deferred.rs	87.78%	46 Missing and 7 partials ⚠️
lightning-background-processor/src/lib.rs	77.14%	12 Missing and 4 partials ⚠️

Additional details and impacted files

@@           Coverage Diff            @@
##             main    #4345    +/-   ##
========================================
  Coverage   86.09%   86.09%            
========================================
  Files         156      157     +1     
  Lines      102462   102932   +470     
  Branches   102462   102932   +470     
========================================
+ Hits        88213    88621   +408     
- Misses      11753    11808    +55     
- Partials     2496     2503     +7

Flag	Coverage Δ
tests	`86.09% <86.30%> (+<0.01%)`	⬆️

Extract common logic for listing monitor files into a helper function that filters out temporary .tmp files created during persistence operations. This simplifies test code and improves reliability on systems where directory iteration order is non-deterministic.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
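A helper of this kind might look like the following minimal sketch. The name `list_monitor_files` and the sorting step are assumptions for illustration; the commit only states that .tmp files are filtered out.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Lists regular files in `dir`, skipping any with a `.tmp` extension.
/// The result is sorted so callers do not depend on directory iteration
/// order (sorting is an assumption of this sketch).
pub fn list_monitor_files(dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut files = Vec::new();
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        let is_tmp = path.extension().and_then(|ext| ext.to_str()) == Some("tmp");
        if path.is_file() && !is_tmp {
            files.push(path);
        }
    }
    files.sort();
    Ok(files)
}
```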

Introduce a `DeferredChainMonitor` wrapper around `ChainMonitor` that queues `watch_channel` and `update_channel` operations, returning `InProgress` until `flush()` is called. This enables batched persistence of monitor updates after `ChannelManager` persistence, ensuring that on restart the monitor state is never ahead of the `ChannelManager` state.

Key changes:
- `DeferredChainMonitor` queues monitor operations and returns `InProgress`
- Calling `flush()` applies pending operations and persists monitors
- All `ChainMonitor` traits (Listen, Confirm, EventsProvider, etc.) are passed through, allowing drop-in replacement
- Background processor updated to capture pending count before `ChannelManager` persistence, then flush after persistence completes

Includes comprehensive tests covering the full channel lifecycle with payment flows using `DeferredChainMonitor`.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

joostjager · 2026-01-28T12:35:57Z

lightning/src/chain/deferred.rs

+	/// `update_channel` operations until `flush()` is called, using real
+	/// ChannelManagers and a complete channel open + payment flow.
+	#[test]
+	fn test_deferred_monitor_payment() {


I had to include a test that sets everything up from scratch, because the test infrastructure is built around ChainMonitor.

joostjager changed the title ~~Chain mon deferred writes~~ Defer ChainMonitor updates and persistence to flush() Jan 26, 2026

joostjager force-pushed the chain-mon-deferred-writes branch 3 times, most recently from 36a8b33 to 73c0a66 Compare January 26, 2026 13:59

joostjager force-pushed the chain-mon-deferred-writes branch 2 times, most recently from 5bd0ea3 to 0c005d0 Compare January 26, 2026 14:08

joostjager force-pushed the chain-mon-deferred-writes branch from 0c005d0 to bc1b327 Compare January 27, 2026 11:29

This was referenced Jan 27, 2026

Batched persistence with a queuing KVStore (PoC) #4310

Closed

Defer MonitorUpdatingPersister writes to flush() #4317

Closed

Defer ChainMonitor updates and persistence to flush() #4351

Closed

joostjager force-pushed the chain-mon-deferred-writes branch 2 times, most recently from ed6f0a3 to 360b6e5 Compare January 28, 2026 11:49

joostjager and others added 2 commits January 28, 2026 12:56

joostjager force-pushed the chain-mon-deferred-writes branch from 360b6e5 to b7d9730 Compare January 28, 2026 12:31
