Description
This is the Claude output of the following prompt:
write a design doc for a new approach to resolving HTLCs and payments.
Currently, the resolution of HTLCs (and decisions on when HTLCs can be forwarded) is the responsibility of Channel objects (a part of ChannelManager) until the channel is closed, and then the ChannelMonitor thereafter. This leads to some complexity around race conditions for HTLCs right around channel closure. Additionally, there is lots of complexity reconstructing the state of all HTLCs in the ChannelManager deserialization/loading logic.
Instead, we want to do all resolution in ChannelMonitors (in response to ChannelMonitorUpdates) and pass them back to ChannelManager in the form of MonitorEvents (similar to how HTLCs are resolved after channels are closed). In order to have reliable resolution, we'll need to keep MonitorEvents around in the ChannelMonitor until the ChannelManager has finished processing them - adding a new MonitorEvent resolution path through a new method (rather than via ChannelMonitorUpdates). This should remove the need for the RAA-blocking updates entirely - on restart instead of examining the set of HTLCs in monitors we can simply replay all the pending MonitorEvents.
Further, for claiming inbound (MPP) payments, the ChannelMonitor should push a pending MonitorEvent to replay that claim, but not provide it to the ChannelManager immediately. That way, if we crash and restart the MonitorEvents to claim the MPP parts will be replayed automatically and we can claim any MPP Parts that we didn't manage to claim before crashing. We'll still have to track once all channels part of the MPP payment have the preimage and delete the MonitorEvent, but should be able to remove some other on-load logic.
Finally, decisions on when an HTLC is able to be forwarded to the outbound channel should similarly be done using a MonitorEvent, avoiding more work in ChannelManager HTLC reconstruction.
Please write a thorough design doc detailing how much of the ChannelManager on-load complexity can go away, listing specific hunks of code and fields.
Design: Unified HTLC Resolution via ChannelMonitor
Status: Draft
Date: 2026-03-12 (rev 1)
Problem Statement
Currently, HTLC resolution (and decisions about when HTLCs can be forwarded) is
split between two systems:
- Channel objects (part of ChannelManager) handle HTLC resolution while
  the channel is open: tracking commitment state transitions, forwarding
  decisions, and preimage propagation.
- ChannelMonitors handle HTLC resolution after the channel is closed:
  watching for on-chain transactions, claiming outputs, and failing HTLCs back.
This split creates two major problems:
Race conditions around channel closure. An HTLC might be in the process of
being resolved in the Channel when the channel closes. The Channel buffers
HTLC actions in monitor_pending_* fields (channel.rs:3128–3131) while a
monitor update is in progress. If the channel is dropped during this window, we
don't know whether the ChannelMonitor has responsibility for those HTLCs. This
is explicitly called out as an open bug in channel.rs:3124–3127:
If a channel is drop'd, we don't know whether the ChannelMonitor is
ultimately responsible for some of the HTLCs here or not — we don't know
whether the update in question completed or not. We currently ignore these
fields entirely when force-closing a channel, but need to handle this somehow
or we run the risk of losing HTLCs!
Enormous complexity in ChannelManager deserialization. On restart, the
ChannelManager must reconstruct the state of all in-flight HTLCs by
cross-referencing Channel state, ChannelMonitor state, in-flight monitor
updates, blocked monitor updates, RAA-blocking actions, and various legacy maps.
This reconstruction logic spans ~1600 lines of from_channel_manager_data()
(channelmanager.rs:18635–20263) and is one of the most complex and error-prone
parts of LDK.
Proposed Solution
Move all HTLC resolution to ChannelMonitors, driven by
ChannelMonitorUpdates, with results communicated back to ChannelManager via
MonitorEvents. The ChannelManager becomes a pure routing/forwarding
engine that tells monitors what to do, and monitors tell the manager what
happened.
Core Principles
- ChannelMonitor is the sole authority on HTLC resolution state. Whether a
  channel is open or closed, the monitor decides when an HTLC is resolved and
  communicates this to the ChannelManager.
- MonitorEvents are persistent until acknowledged. The ChannelMonitor
  keeps MonitorEvents in its persistent state until the ChannelManager
  explicitly acknowledges them via a new method (not via ChannelMonitorUpdate).
- Restart == replay. On restart, the ChannelManager simply replays all
  pending (unacknowledged) MonitorEvents from all monitors. No reconstruction
  logic needed.
- Inbound MPP claims use deferred MonitorEvents. When claiming an MPP
  payment, the ChannelMonitor stores a MonitorEvent for the claim but does
  not provide it to the ChannelManager immediately. On restart, these events
  are replayed, allowing crash-safe MPP claiming without special on-load logic.
Current Architecture (What We Have Today)
HTLC Lifecycle in an Open Channel
When a channel is open, an inbound HTLC goes through these states
(channel.rs InboundHTLCState, lines 174–233):
RemoteAnnounced(InboundHTLCResolution)
│ commitment_signed received
▼
AwaitingRemoteRevokeToAnnounce(InboundHTLCResolution)
│ counterparty revoke_and_ack
▼
AwaitingAnnouncedRemoteRevoke(InboundHTLCResolution)
│ our revoke_and_ack + their commitment_signed
▼
Committed { update_add_htlc: InboundUpdateAdd }
│ HTLC now irrevocably committed; forwarding decision made
│ fail_htlc/fulfill_htlc
▼
LocalRemoved(InboundHTLCRemovalReason)
│ counterparty revoke_and_ack
▼
[removed from tracking]
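The transitions in the diagram above can be sketched as a small state machine. This is a minimal illustration, not LDK's code: the real InboundHTLCState variants carry payloads, and the `InboundState`, `Trigger`, and `advance` names below are hypothetical.

```rust
/// Simplified stand-in for the inbound HTLC states in the diagram.
/// Payloads (InboundHTLCResolution, InboundUpdateAdd, ...) are omitted.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum InboundState {
    RemoteAnnounced,
    AwaitingRemoteRevokeToAnnounce,
    AwaitingAnnouncedRemoteRevoke,
    Committed,
    LocalRemoved,
    Removed,
}

/// Hypothetical triggers, named after the arrows in the diagram.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Trigger {
    CommitmentSignedReceived,
    CounterpartyRevokeAndAck,
    OurRevokeAndTheirCommitment,
    FailOrFulfill,
}

/// Returns the next state, or `None` if `trigger` is not valid in `state`.
fn advance(state: InboundState, trigger: Trigger) -> Option<InboundState> {
    use InboundState::*;
    use Trigger::*;
    match (state, trigger) {
        (RemoteAnnounced, CommitmentSignedReceived) => Some(AwaitingRemoteRevokeToAnnounce),
        (AwaitingRemoteRevokeToAnnounce, CounterpartyRevokeAndAck) => {
            Some(AwaitingAnnouncedRemoteRevoke)
        }
        (AwaitingAnnouncedRemoteRevoke, OurRevokeAndTheirCommitment) => Some(Committed),
        (Committed, FailOrFulfill) => Some(LocalRemoved),
        (LocalRemoved, CounterpartyRevokeAndAck) => Some(Removed),
        _ => None,
    }
}
```

Feeding the five triggers in order takes an HTLC from `RemoteAnnounced` to `Removed`; any out-of-order trigger yields `None`.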
When the HTLC reaches Committed, the InboundUpdateAdd payload
(channel.rs:337–376) indicates its readiness:
- WithOnion { update_add_htlc }: onion not yet decoded, added to
  decode_update_add_htlcs for processing
- Forwarded { ... }: already forwarded to the outbound edge, onion pruned
- Legacy: pre-0.3 HTLC without onion persistence
The transition from AwaitingAnnouncedRemoteRevoke to Committed happens in
revoke_and_ack() (channel.rs:~8587), where WithOnion HTLCs are pushed to
monitor_pending_update_adds (channel.rs:8696) and eventually decoded via
process_pending_update_add_htlcs() (channelmanager.rs:7195–7535).
Forwarding Decisions
Decoded HTLCs flow through process_pending_htlc_forwards()
(channelmanager.rs:7558–7645):
- process_pending_update_add_htlcs() decodes onions from the
  decode_update_add_htlcs map (channelmanager.rs:2807)
- Decoded HTLCs are added to the forward_htlcs map
  (channelmanager.rs:2789–2791)
- forward_htlcs is drained; for each HTLC:
  - If short_chan_id != 0: process_forward_htlcs() sends it to an outbound
    channel via queue_add_htlc() (channelmanager.rs:7836–8105)
  - If short_chan_id == 0: process_receive_htlcs() handles it as a final
    payment
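The drain-and-dispatch step described above can be sketched as follows. This is only an illustration of the short_chan_id branch; `DecodedHtlc`, `Dispatch`, and `drain_forward_htlcs` are hypothetical names, and the real map holds much richer HTLC data.

```rust
use std::collections::HashMap;

/// Hypothetical, stripped-down stand-in for a decoded HTLC.
#[derive(Debug, PartialEq)]
struct DecodedHtlc {
    htlc_id: u64,
    amount_msat: u64,
}

/// Where a drained bucket of HTLCs ends up.
#[derive(Debug, PartialEq)]
enum Dispatch {
    /// Forward over the outbound channel with this short channel id.
    Forward { short_chan_id: u64, htlcs: Vec<DecodedHtlc> },
    /// Treat as a payment terminating at this node (short_chan_id == 0).
    Receive { htlcs: Vec<DecodedHtlc> },
}

/// Drains the map and classifies each bucket by its key.
fn drain_forward_htlcs(forward_htlcs: &mut HashMap<u64, Vec<DecodedHtlc>>) -> Vec<Dispatch> {
    let mut out: Vec<Dispatch> = forward_htlcs
        .drain()
        .map(|(short_chan_id, htlcs)| {
            if short_chan_id != 0 {
                Dispatch::Forward { short_chan_id, htlcs }
            } else {
                Dispatch::Receive { htlcs }
            }
        })
        .collect();
    // HashMap iteration order is unspecified; sort so callers see a stable order.
    out.sort_by_key(|d| match d {
        Dispatch::Receive { .. } => 0,
        Dispatch::Forward { short_chan_id, .. } => *short_chan_id,
    });
    out
}
```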
Monitor Update Blocking and the monitor_pending_* Fields
When a ChannelMonitorUpdate is being persisted, the Channel cannot proceed
with certain protocol messages. Pending work is buffered:
- monitor_pending_forwards: Vec<(PendingHTLCInfo, u64)> (channel.rs:3128):
  inbound HTLCs ready to forward
- monitor_pending_failures: Vec<(HTLCSource, PaymentHash, HTLCFailReason)>
  (channel.rs:3129): inbound HTLCs to fail backwards
- monitor_pending_finalized_fulfills: Vec<(HTLCSource, Option<AttributionData>)>
  (channel.rs:3130): fulfilled HTLCs awaiting acknowledgment (persisted, TLV 11)
- monitor_pending_update_adds: Vec<msgs::UpdateAddHTLC> (channel.rs:3131):
  inbound update_add messages awaiting onion decode
These are released via monitor_updating_restored() (channel.rs:9100–9234)
which returns a MonitorRestoreUpdates struct (channel.rs:1176–1197) containing:
pub struct MonitorRestoreUpdates {
pub raa: Option<msgs::RevokeAndACK>,
pub commitment_update: Option<msgs::CommitmentUpdate>,
pub commitment_order: RAACommitmentOrder,
pub accepted_htlcs: Vec<(PendingHTLCInfo, u64)>, // from monitor_pending_forwards
pub failed_htlcs: Vec<(HTLCSource, PaymentHash, HTLCFailReason)>,
pub finalized_claimed_htlcs: Vec<(HTLCSource, Option<AttributionData>)>,
pub pending_update_adds: Vec<msgs::UpdateAddHTLC>, // from monitor_pending_update_adds
pub funding_broadcastable: Option<Transaction>,
pub channel_ready: Option<msgs::ChannelReady>,
// ... other fields
}
Preimage Claiming (Forwarded Payments)
When a downstream channel receives a preimage:
- ChannelManager::claim_funds_internal() is called
- For the upstream (inbound) channel,
  Channel::get_update_fulfill_htlc_and_commit() (channel.rs:7106–7166)
  generates a ChannelMonitorUpdate with a PaymentPreimage step
  (channel.rs:7018–7025)
- A RAAMonitorUpdateBlockingAction::ForwardedPaymentInboundClaim
  (channelmanager.rs:1672–1677) is set on the downstream channel, blocking
  its next RAA monitor update
- A MonitorUpdateCompletionAction::EmitEventOptionAndFreeOtherChannel
  (channelmanager.rs:1474–1477) pairs Event::PaymentForwarded with
  unblocking the downstream channel via EventUnblockedChannel
  (channelmanager.rs:1414–1420)
- When the upstream monitor update completes,
  handle_monitor_update_completion_actions() (channelmanager.rs:10103–10255)
  emits the event and frees the RAA blocker
RAA-Blocking Infrastructure
The RAA-blocking system involves multiple types and fields:
Types:
- RAAMonitorUpdateBlockingAction (channelmanager.rs:1668–1700): Enum with
  ForwardedPaymentInboundClaim and ClaimedMPPPayment variants
- MonitorUpdateCompletionAction (channelmanager.rs:1454–1495): Enum with
  PaymentClaimed, EmitEventOptionAndFreeOtherChannel, and
  FreeDuplicateClaimImmediately variants
- EventCompletionAction::ReleaseRAAChannelMonitorUpdate
  (channelmanager.rs:1557–1562): Deferred RAA release on event processing
- EventUnblockedChannel (channelmanager.rs:1414–1420): Pointer to channel to
  unblock
- PendingChannelMonitorUpdate (channel.rs:1472–1478): Blocked update wrapper
Fields (PeerState, channelmanager.rs:1709–1782):
- in_flight_monitor_updates: BTreeMap<ChannelId, (OutPoint, Vec<ChannelMonitorUpdate>)>
  (line 1740)
- monitor_update_blocked_actions: BTreeMap<ChannelId, Vec<MonitorUpdateCompletionAction>>
  (line 1760)
- actions_blocking_raa_monitor_updates: BTreeMap<ChannelId, Vec<RAAMonitorUpdateBlockingAction>>
  (line 1765)
- closed_channel_monitor_update_ids: BTreeMap<ChannelId, u64> (line 1775)
Fields (ChannelContext, channel.rs):
- blocked_monitor_updates: Vec<PendingChannelMonitorUpdate> (line 3339)
Functions:
- raa_monitor_updates_held() (channelmanager.rs:12671–12688): Checks the
  actions_blocking_raa_monitor_updates map AND the pending events queue for
  ReleaseRAAChannelMonitorUpdate actions
- handle_monitor_update_release() (channelmanager.rs:~14962–15036): Removes
  blockers and unblocks the channel's blocked_monitor_updates queue
- revoke_and_ack(..., hold_mon_update: bool) (channel.rs:~8359): The
  hold_mon_update parameter conditionally blocks the resulting monitor update
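The two-source check that raa_monitor_updates_held() is described as performing might look roughly like the sketch below. It only illustrates the "blocker map OR pending event action" logic; the `ChanId` alias, the `PendingEventAction` enum, and the string-valued blockers are simplifying assumptions, not LDK types.

```rust
use std::collections::BTreeMap;

/// Hypothetical compact channel identifier.
type ChanId = u32;

/// Hypothetical stand-in for completion actions attached to queued events.
enum PendingEventAction {
    ReleaseRaaChannelMonitorUpdate { channel_id: ChanId },
    Other,
}

/// Returns true if RAA monitor updates for `channel_id` must be held, either
/// because an explicit blocker is registered or because a not-yet-handled
/// event will release the channel later.
fn raa_updates_held(
    blockers: &BTreeMap<ChanId, Vec<&'static str>>,
    pending_event_actions: &[PendingEventAction],
    channel_id: ChanId,
) -> bool {
    let blocked_by_action = blockers.get(&channel_id).map_or(false, |v| !v.is_empty());
    let blocked_by_event = pending_event_actions.iter().any(|a| {
        matches!(
            a,
            PendingEventAction::ReleaseRaaChannelMonitorUpdate { channel_id: c } if *c == channel_id
        )
    });
    blocked_by_action || blocked_by_event
}
```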
Inbound MPP Claiming
The MPP claim flow is particularly complex:
- User calls claim_funds(preimage) (channelmanager.rs:9206)
- begin_claiming_payment() moves payment from claimable_payments to
  pending_claiming_payments (channelmanager.rs:~1319–1380)
- For each MPP part, claim_mpp_part() (channelmanager.rs:9563+):
  a. Calls Channel::get_update_fulfill_htlc_and_commit() for open channels
  b. Creates ChannelMonitorUpdate with PaymentPreimage step + PaymentClaimDetails
  c. Sets up shared PendingMPPClaim (channelmanager.rs:1609–1612):
     pub(crate) struct PendingMPPClaim {
         channels_without_preimage: Vec<(PublicKey, ChannelId)>,
         channels_with_preimage: Vec<(PublicKey, ChannelId)>,
     }
  d. Creates RAAMonitorUpdateBlockingAction::ClaimedMPPPayment per channel
  e. Creates MonitorUpdateCompletionAction::PaymentClaimed per channel
- As each monitor update completes,
  handle_monitor_update_completion_actions() (channelmanager.rs:10147–10155)
  moves entries from channels_without_preimage to channels_with_preimage
- When channels_without_preimage is empty: free all RAA blockers, emit
  Event::PaymentClaimed
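The bookkeeping in the last two steps can be sketched as below. This is a simplified illustration: byte arrays stand in for PublicKey and ChannelId, and `mark_preimage_persisted` is a hypothetical helper name rather than an LDK method.

```rust
/// Simplified stand-ins: the real struct keys on (PublicKey, ChannelId).
type NodeId = [u8; 33];
type ChanId = [u8; 32];

#[derive(Debug, Default)]
struct PendingMppClaim {
    channels_without_preimage: Vec<(NodeId, ChanId)>,
    channels_with_preimage: Vec<(NodeId, ChanId)>,
}

impl PendingMppClaim {
    /// Records that the monitor update carrying the preimage completed for
    /// one channel. Returns `true` once no channel is still missing the
    /// preimage, i.e. when blockers can be freed and the claim event emitted.
    fn mark_preimage_persisted(&mut self, chan: (NodeId, ChanId)) -> bool {
        if let Some(idx) = self.channels_without_preimage.iter().position(|c| *c == chan) {
            let done = self.channels_without_preimage.swap_remove(idx);
            self.channels_with_preimage.push(done);
        }
        self.channels_without_preimage.is_empty()
    }
}
```

Because the struct is shared across channels (behind an Arc<Mutex<..>> in the description above), every completing monitor update would go through a helper like this one before deciding whether the payment is fully claimed.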
Supporting types:
- PendingMPPClaimPointer(Arc<Mutex<PendingMPPClaim>>) (line 1650): Shared
  pointer for cross-channel coordination
- MPPClaimHTLCSource (lines 1618–1623): Identifies each MPP part channel
- PaymentClaimDetails (lines 1637–1642): Stored in ChannelMonitor for
  restart claim replay
- HTLCClaimSource (lines 1590–1595): Deserialization-time equivalent of
  MPPClaimHTLCSource
HTLC Resolution After Channel Closure
After closure, the ChannelMonitor takes over:
- Watches for on-chain HTLC timeouts/claims (channelmonitor.rs:5257–5756)
- Creates MonitorEvent::HTLCEvent with preimage (line 6134) or without
  (line 5607) as HTLCs resolve on-chain
- Creates MonitorEvent::CommitmentTxConfirmed (line 5432) when commitment tx
  is detected
- ChannelManager::process_monitor_events_for_failover()
  (channelmanager.rs:13247–13373) consumes these events to fail/claim upstream
The MonitorEvent enum (channelmonitor.rs:188–227) currently has:
- HTLCEvent(HTLCUpdate): HTLC resolved on-chain (claim or timeout)
- HolderForceClosedWithInfo { reason, outpoint, channel_id }: we force-closed
- HolderForceClosed(OutPoint): legacy force-close
- CommitmentTxConfirmed(()): commitment tx confirmed on-chain
- Completed { funding_txo, channel_id, monitor_update_id }: monitor update
  persisted
Events are currently fire-and-forget: get_and_clear_pending_monitor_events()
(channelmonitor.rs:4373–4377) does a mem::swap to drain them.
The Painful On-Load Reconstruction
On deserialization, from_channel_manager_data() (channelmanager.rs:18635–20263)
must perform a vast reconstruction. Here is every section with exact line ranges:
Step 1: Channel vs. Monitor State Validation (lines 18688–18876)
For each deserialized FundedChannel, compare its commitment transaction
numbers against the corresponding ChannelMonitor:
channel.get_cur_holder_commitment_transaction_number()
> monitor.get_cur_holder_commitment_number()
|| channel.get_revoked_counterparty_commitment_transaction_number()
> monitor.get_min_seen_secret()
|| channel.get_cur_counterparty_commitment_transaction_number()
> monitor.get_cur_counterparty_commitment_number()
|| channel.context.get_latest_monitor_update_id()
< monitor.get_latest_update_id()
If the channel is behind the monitor: force-close with
ClosureReason::OutdatedChannelManager and fail any orphaned HTLCs not in the
monitor. This queues BackgroundEvent::MonitorUpdateRegeneratedOnStartup with
a ChannelForceClosed step.
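The comparison above can be wrapped in a small predicate. The sketch below assumes, as in LDK's channel code, that commitment numbers count down (so a larger number means an older state) while monitor update IDs count up; `ChannelView`, `MonitorView`, and `channel_is_stale` are hypothetical names standing in for the getters quoted above.

```rust
/// Hypothetical snapshot of the channel-side values compared at load time.
struct ChannelView {
    cur_holder_commitment_number: u64,
    revoked_counterparty_commitment_number: u64,
    cur_counterparty_commitment_number: u64,
    latest_monitor_update_id: u64,
}

/// Hypothetical snapshot of the monitor-side values compared at load time.
struct MonitorView {
    cur_holder_commitment_number: u64,
    min_seen_secret: u64,
    cur_counterparty_commitment_number: u64,
    latest_update_id: u64,
}

/// True if the persisted channel lags behind its monitor in any dimension,
/// which in the flow described above leads to a force-close on load.
fn channel_is_stale(chan: &ChannelView, mon: &MonitorView) -> bool {
    // Commitment numbers count down: "greater than" means "older than".
    chan.cur_holder_commitment_number > mon.cur_holder_commitment_number
        || chan.revoked_counterparty_commitment_number > mon.min_seen_secret
        || chan.cur_counterparty_commitment_number > mon.cur_counterparty_commitment_number
        // Update IDs count up: "less than" means "older than".
        || chan.latest_monitor_update_id < mon.latest_update_id
}
```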
Step 2: Closed Channel Monitor Processing (lines 18878–18935)
For monitors without a corresponding Channel (already closed), track their
latest update IDs in closed_channel_monitor_update_ids and queue force-close
monitor updates for monitors with state needing update.
Step 3: In-Flight Monitor Update Replay (lines 18970–19205)
The handle_in_flight_updates! macro (lines 18982–19048) processes each
in_flight_monitor_updates entry:
- Compare each update's update_id against monitor.get_latest_update_id()
- If all completed: queue BackgroundEvent::MonitorUpdatesComplete with
  highest_update_id_completed
- If some pending: retain only incomplete updates, queue as
  BackgroundEvent::MonitorUpdateRegeneratedOnStartup for replay
- Validate that channel's unblocked update ID doesn't exceed monitor's ID
This macro is invoked twice: once for open channels (lines ~19050–19096) and
once for remaining closed-channel updates (lines ~19097–19139).
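The triage performed by the macro can be sketched as a pure function over update IDs. This is an illustration under assumptions: updates are reduced to their IDs, `StartupAction` is a hypothetical stand-in for the two BackgroundEvent variants, and taking the highest in-flight ID as highest_update_id_completed is a guess at the intended value.

```rust
/// Hypothetical stand-in for the two background events queued on load.
#[derive(Debug, PartialEq)]
enum StartupAction {
    /// Every in-flight update already reached the monitor.
    UpdatesComplete { highest_update_id_completed: u64 },
    /// These update ids still need to be replayed against the monitor.
    Regenerate { pending_update_ids: Vec<u64> },
}

/// Classifies a channel's in-flight updates against the monitor's latest
/// applied update id. Returns `None` when nothing was in flight.
fn triage_in_flight(
    mut in_flight: Vec<u64>,
    monitor_latest_update_id: u64,
) -> Option<StartupAction> {
    let highest = in_flight.iter().copied().max()?;
    // Keep only updates the monitor has not yet applied.
    in_flight.retain(|id| *id > monitor_latest_update_id);
    if in_flight.is_empty() {
        Some(StartupAction::UpdatesComplete { highest_update_id_completed: highest })
    } else {
        Some(StartupAction::Regenerate { pending_update_ids: in_flight })
    }
}
```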
Step 4: Reconstruct/Deserialize Decision (lines 19207–19239)
The key branch: should we reconstruct HTLC state from monitors or use
persisted ChannelManager state?
// Non-test: always reconstruct for version >= RECONSTRUCT_HTLCS_FROM_CHANS_VERSION (2)
let reconstruct_manager_from_monitors = _version >= RECONSTRUCT_HTLCS_FROM_CHANS_VERSION;
// Test: random or controlled via env var
Step 5: HTLC Forwarding State Reconstruction (lines 19267–19362)
Two passes over all channel monitors:
First pass (lines 19267–19333): For each monitor with an open channel
(when reconstruct_manager_from_monitors):
- Call inbound_htlcs_pending_decode() (channel.rs:7439–7448) to get WithOnion
  HTLCs → populate decode_update_add_htlcs
- Call inbound_forwarded_htlcs() (channel.rs:7452–7507) to get
  already-forwarded HTLCs → populate already_forwarded_htlcs
- For closed channels: call insert_from_monitor_on_startup() for outbound
  payments, process preimage claims via pending_outbounds.claim_htlc()
Second pass (lines 19334–19512): For each monitor:
- For open channels with reconstruct_manager_from_monitors: call
  outbound_htlc_forwards() (channel.rs:7512–7533) and prune via
  dedup_decode_update_add_htlcs() and prune_forwarded_htlc()
- For closed channels: call get_all_current_outbound_htlcs() and
  reconcile_pending_htlcs_with_monitor() for each; also handle
  get_onchain_failed_outbound_htlcs() → failed_htlcs
Step 6: Preimage Claim Replay from Monitors (lines 19514–19591)
For each monitor (open or closed), find outbound HTLCs with preimages:
- Filter via get_all_current_outbound_htlcs() for HTLCs where
  preimage_opt.is_some()
- Check that the inbound edge's monitor still exists (not archived)
- Check claimable_balances().is_empty() to skip fully-resolved monitors
- Verify counterparty_node_id.is_some() (required since 0.0.124)
- Push to pending_claims_to_replay for later execution
Step 7: RAA-Blocking Restoration (lines 19695–19770)
Reconstruct actions_blocking_raa_monitor_updates from the persisted
monitor_update_blocked_actions_per_peer:
For each MonitorUpdateCompletionAction::EmitEventOptionAndFreeOtherChannel:
- Find the blocked channel's peer state
- Push blocking_action (an RAAMonitorUpdateBlockingAction) into
  actions_blocking_raa_monitor_updates[blocked_channel_id]
- Handle edge case: pre-0.1 MPP claims where a channel blocked itself
Step 8: HTLC Deduplication (lines 19772–19798)
When reconstruct_manager_from_monitors:
- Dedup failed_htlcs against decode_update_add_htlcs
- Dedup claimable_payments against decode_update_add_htlcs
- Choose between reconstructed maps vs legacy maps (lines 19800–19809)
Step 9: ChannelManager Construction (lines 19864–19926)
The ChannelManager struct is built with the reconstructed state, including
forward_htlcs, decode_update_add_htlcs, claimable_payments, and
pending_background_events.
Step 10: MPP Claim Replay from Monitor Preimages (lines 19928–20088)
For each monitor, call get_stored_preimages() to retrieve
(PaymentHash, (PaymentPreimage, Vec<PaymentClaimDetails>)):
- Cross-reference with already_forwarded_htlcs: if an inbound HTLC was
  forwarded to a downstream channel and the downstream has the preimage,
  push it to pending_claims_to_replay (lines 19935–19968)
- For each PaymentClaimDetails:
  - Dedup via processed_claims: HashSet<Vec<MPPClaimHTLCSource>>
  - Skip if already in pending_claiming_payments
  - Create fresh PendingMPPClaim with all channels in
    channels_without_preimage
  - Call begin_claiming_payment() + claim_mpp_part() for each part
    (lines 20001–20088)
Step 11: Legacy Preimage-Without-ClaimDetails Path (lines 20090–20196)
For preimages in monitors that have no PaymentClaimDetails (pre-0.3):
- Remove payment from claimable_payments
- For each HTLC part:
  - Call claim_htlc_while_disconnected_dropping_mon_update_legacy()
    on the channel (lines 20141–20146)
  - Call provide_payment_preimage_unsafe_legacy() directly on the monitor
    (line 20164); explicitly unsafe, noted as only for the upgrade path
- Push Event::PaymentClaimed manually
Step 12: Failed HTLC and Claim Execution (lines 20200–20257)
- Call fail_htlc_backwards_internal() for all failed_htlcs
- Fail any remaining already_forwarded_htlcs that weren't pruned
  (lines 20213–20227): these are HTLCs the inbound channel thought were
  forwarded but the outbound channel doesn't have, implying they were failed
- Call claim_funds_internal() for all pending_claims_to_replay
  (lines 20229–20257)
Step 13: Helper Functions (lines 20266–20352)
- prune_forwarded_htlc() (lines 20266–20281): Remove specific HTLC from
  already_forwarded_htlcs
- reconcile_pending_htlcs_with_monitor() (lines 20285–20352): Master dedup
  function that removes HTLCs from decode_update_add_htlcs,
  forward_htlcs_legacy, and pending_intercepted_htlcs_legacy when the
  monitor has taken responsibility
Proposed Architecture
New MonitorEvent Variants
Extend MonitorEvent (channelmonitor.rs:188) to cover all HTLC resolution
outcomes, not just post-close on-chain events:
pub enum MonitorEvent {
// Existing variants (retained)
HTLCEvent(HTLCUpdate),
HolderForceClosedWithInfo { .. },
HolderForceClosed(OutPoint),
CommitmentTxConfirmed(()),
Completed { .. },
// New variants
/// An HTLC was irrevocably committed to both commitment transactions and
/// can now be forwarded/received. Generated when the ChannelMonitor
/// processes a LatestHolderCommitment update containing the HTLC and the
/// counterparty's revocation for the prior state has been received.
///
/// Replaces the current flow where Channel pushes to
/// monitor_pending_update_adds → decode_update_add_htlcs → forward_htlcs.
HTLCAccepted {
channel_id: ChannelId,
htlc: msgs::UpdateAddHTLC,
},
/// A forwarded HTLC was claimed with a preimage. The ChannelManager should
/// propagate the preimage to the inbound edge.
///
/// Replaces the current flow where claim_funds_internal() directly drives
/// the inbound channel + sets up RAA blocking on the outbound channel.
ForwardedHTLCClaimed {
source: HTLCSource,
preimage: PaymentPreimage,
downstream_value_msat: u64,
},
/// An inbound MPP payment part has been durably claimed with a preimage.
/// This event is generated but NOT immediately surfaced — it is stored in
/// deferred_restart_events and only replayed on restart to enable
/// crash-safe MPP claiming without ChannelManager-side tracking.
///
/// Replaces PendingMPPClaim, PendingMPPClaimPointer, and the complex
/// on-load reconstruction in lines 19928-20088.
InboundMPPClaimPersisted {
payment_hash: PaymentHash,
preimage: PaymentPreimage,
htlc_source: HTLCPreviousHopData,
claim_details: PaymentClaimDetails,
},
}
New MonitorEvent Acknowledgment Path
Add a method on ChannelMonitor (and the chain::Watch trait) to acknowledge
processed events:
/// A unique identifier for a MonitorEvent, used for acknowledgment.
/// Monotonically increasing per-monitor counter.
pub struct MonitorEventId(u64);
impl ChannelMonitor {
/// Acknowledge that the given MonitorEvents have been processed by the
/// ChannelManager. The monitor will remove them from its persistent state.
///
/// This should be called after the ChannelManager has durably processed
/// the events (i.e., after the ChannelManager has been re-persisted with
/// the resulting state changes).
pub fn acknowledge_monitor_events(&self, up_to_id: MonitorEventId);
}
Each MonitorEvent gets a unique MonitorEventId (monotonic counter per
monitor). Events remain in the monitor's persistent state until acknowledged.
On restart, unacknowledged events are replayed.
This is deliberately not a ChannelMonitorUpdate — acknowledgments flow in
the opposite direction and don't need the same ordering guarantees. However,
acknowledging events does trigger a monitor re-persist (since the monitor's
serialized state changed).
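A minimal sketch of the persist-until-acknowledged queue, in contrast to the current drain-on-read behavior, might look like this. `PendingEvents` is a hypothetical container, `E` stands in for MonitorEvent, and treating `up_to_id` as inclusive is an assumption the design text does not spell out.

```rust
use std::collections::VecDeque;

/// Monotonically increasing per-monitor identifier, as proposed above.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct MonitorEventId(u64);

/// Hypothetical event store; `E` stands in for `MonitorEvent`.
struct PendingEvents<E> {
    next_id: u64,
    events: VecDeque<(MonitorEventId, E)>,
}

impl<E: Clone> PendingEvents<E> {
    fn new() -> Self {
        Self { next_id: 0, events: VecDeque::new() }
    }

    /// Stores an event under the next id and returns that id.
    fn push(&mut self, event: E) -> MonitorEventId {
        let id = MonitorEventId(self.next_id);
        self.next_id += 1;
        self.events.push_back((id, event));
        id
    }

    /// Returns copies of all unacknowledged events without removing them,
    /// so a second call (for example after a restart) sees the same events.
    fn pending(&self) -> Vec<(MonitorEventId, E)> {
        self.events.iter().cloned().collect()
    }

    /// Drops every event whose id is less than or equal to `up_to_id`
    /// (inclusive bound assumed for this sketch).
    fn acknowledge(&mut self, up_to_id: MonitorEventId) {
        while matches!(self.events.front(), Some((id, _)) if *id <= up_to_id) {
            self.events.pop_front();
        }
    }
}
```

In a real monitor the queue and `next_id` would be part of the serialized state, so that an acknowledgment only takes effect once the monitor has been re-persisted.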
New ChannelMonitorUpdateStep Variants
enum ChannelMonitorUpdateStep {
// Existing variants retained...
/// An HTLC has been irrevocably committed. The monitor should generate
/// an HTLCAccepted MonitorEvent. This step is sent when the Channel
/// determines the HTLC is in both commitment txns and the prior
/// counterparty state is revoked.
///
/// Replaces the monitor_pending_update_adds → decode_update_add_htlcs flow.
HTLCIrrevocablyCommitted {
update_add_htlc: msgs::UpdateAddHTLC,
},
/// The ChannelManager has decided to fulfill an HTLC with a preimage.
/// For forwarded HTLCs, the monitor should generate a ForwardedHTLCClaimed
/// event. The source identifies the inbound edge for preimage propagation.
///
/// This extends the existing PaymentPreimage step to carry source info.
FulfillHTLC {
htlc_id: u64,
preimage: PaymentPreimage,
source: HTLCSource,
},
}
Note: We may not need a new FailHTLC step. HTLC failures on open channels
still flow through normal commitment transaction negotiation. The monitor only
needs to handle failures post-close (which it already does via HTLCEvent
with payment_preimage: None).
Deferred MonitorEvents for MPP Claims
When the user calls claim_funds(preimage) for an MPP payment:
- The ChannelManager sends ChannelMonitorUpdates with
  PaymentPreimage + PaymentClaimDetails steps to each channel's monitor
  (same as today).
- Each ChannelMonitor, upon processing the preimage update, stores an
  InboundMPPClaimPersisted event in a new deferred_restart_events list
  (NOT in pending_monitor_events). This event is persisted with the monitor.
- On restart, the ChannelManager calls a new get_restart_events() method
  (or the existing get_and_clear_pending_monitor_events() is enhanced).
  Monitors return InboundMPPClaimPersisted events. The ChannelManager
  uses these to identify which MPP parts have been claimed and which haven't,
  then claims any missing parts.
- Once all MPP parts across all channels have the preimage durably stored
  (confirmed by all monitors having the InboundMPPClaimPersisted event),
  the ChannelManager acknowledges all the InboundMPPClaimPersisted
  events, removing them from the monitors.
This replaces the current on-load logic that iterates all monitors via
get_stored_preimages() and cross-references with claimable_payments /
pending_claiming_payments state (lines 19928–20088).
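The restart-time decision in the steps above reduces to a set comparison. The sketch below assumes each MPP part can be identified by a (channel, HTLC id) pair and that the full part list is recoverable from the replayed claim details; `PartId`, `parts_missing_claim`, and `ready_to_acknowledge` are hypothetical names.

```rust
use std::collections::BTreeSet;

/// Hypothetical identifier for one MPP part: (channel id, HTLC id).
type PartId = (u32, u64);

/// Given every part of a payment and the parts for which some monitor
/// replayed an InboundMPPClaimPersisted event, returns the parts that still
/// need the preimage, in sorted order.
fn parts_missing_claim(all_parts: &BTreeSet<PartId>, replayed: &BTreeSet<PartId>) -> Vec<PartId> {
    all_parts.difference(replayed).copied().collect()
}

/// True once every part has a persisted claim, i.e. when the deferred
/// events for this payment may be acknowledged and removed.
fn ready_to_acknowledge(all_parts: &BTreeSet<PartId>, replayed: &BTreeSet<PartId>) -> bool {
    all_parts.is_subset(replayed)
}
```

Under this scheme a crash between claiming part one and part two simply shows up as a non-empty result from `parts_missing_claim` on the next start.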
Resolution Flow (Open Channel — HTLC Acceptance)
ChannelManager Channel ChannelMonitor
| | |
|-- receive update_add_htlc --> | |
| |-- CMU: LatestHolder ->|
| | |
|<-- commitment_signed ---------| |
| |-- CMU: CommitSecret ->|
| | |
|<-- revoke_and_ack ------------| |
| | |
| [Channel confirms HTLC irrevocably committed] |
| |-- CMU: HTLCIrrev. -->|
| | Committed |
| |
|<------------- MonitorEvent::HTLCAccepted ------------|
| |
|-- [decode onion, forward/receive decision] |
| |
Resolution Flow (Claiming a Forwarded HTLC)
ChannelManager ChannelMonitor (downstream)
| |
|<-- MonitorEvent::HTLCEvent ------| (preimage from counterparty claim)
| (or during normal operation: |
| preimage arrives via |
| update_fulfill_htlc) |
| |
|-- CMU: FulfillHTLC + source ---->|
| |
| [Monitor stores preimage, generates ForwardedHTLCClaimed]
| |
|<-- MonitorEvent::ForwardedHTLC --|
| Claimed |
| |
|-- [send preimage to inbound |
| channel's monitor via CMU] |
| |
|-- [once inbound confirmed: |
| acknowledge event] |
| |
Resolution Flow (Restart)
ChannelManager (new) ChannelMonitor (from disk)
| |
|-- get_pending_monitor_events() ->|
| + get_restart_events() |
| |
|<-- [all unacknowledged events] --|
| (HTLCAccepted, ForwardedHTLCClaimed,
| InboundMPPClaimPersisted, etc.)
| |
|-- [process each event as if |
| receiving it for first time] |
| |
|-- acknowledge_monitor_events() ->|
| |
No reconstruction logic needed. The monitor state IS the source of truth.
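The restart sequence in the diagram can be sketched as a plain loop over monitors. Everything here is illustrative: `ReplayableMonitor`, `MemMonitor`, and `replay_on_restart` are hypothetical, events are reduced to strings, and a real implementation would persist the ChannelManager before acknowledging.

```rust
/// Hypothetical minimal monitor interface for the restart sequence.
trait ReplayableMonitor {
    /// All events not yet acknowledged, oldest first, with their ids.
    fn unacknowledged_events(&self) -> Vec<(u64, String)>;
    /// Drops all events with an id up to and including `id`.
    fn acknowledge_up_to(&mut self, id: u64);
}

/// In-memory stand-in for a monitor loaded from disk.
struct MemMonitor {
    events: Vec<(u64, String)>,
}

impl ReplayableMonitor for MemMonitor {
    fn unacknowledged_events(&self) -> Vec<(u64, String)> {
        self.events.clone()
    }
    fn acknowledge_up_to(&mut self, id: u64) {
        self.events.retain(|(event_id, _)| *event_id > id);
    }
}

/// Replays every pending event through `handle`, then acknowledges them.
fn replay_on_restart<M: ReplayableMonitor>(monitors: &mut [M], mut handle: impl FnMut(&str)) {
    for monitor in monitors.iter_mut() {
        let events = monitor.unacknowledged_events();
        for (_, event) in &events {
            // Processed exactly as if the event were seen for the first time.
            handle(event.as_str());
        }
        // Assumption: the manager's resulting state is durable at this point.
        if let Some((last_id, _)) = events.last() {
            monitor.acknowledge_up_to(*last_id);
        }
    }
}
```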
What Can Be Removed
Fields That Can Be Eliminated
In PeerState (channelmanager.rs:1709–1782)
| Field | Line | Why Removable |
|---|---|---|
| monitor_update_blocked_actions | 1760 | Completion actions move into monitor; ChannelManager no longer queues post-completion work |
| actions_blocking_raa_monitor_updates | 1765 | RAA blocking is eliminated entirely; safety comes from event acknowledgment |
| closed_channel_monitor_update_ids | 1775 | Monitors self-track their update IDs; ChannelManager no longer mirrors this for on-load dedup |
In ChannelContext (channel.rs:3120–3340)
| Field | Lines | Why Removable |
|---|---|---|
| monitor_pending_forwards | 3128 | Forwarding driven by MonitorEvent::HTLCAccepted; no buffering needed |
| monitor_pending_failures | 3129 | Failure propagation driven by MonitorEvent::HTLCEvent; no buffering needed |
| monitor_pending_finalized_fulfills | 3130 | Fulfill tracking moves to monitor's persistent events |
| monitor_pending_update_adds | 3131 | Replaced by MonitorEvent::HTLCAccepted |
| blocked_monitor_updates | 3339 | RAA blocking eliminated; all updates flow through immediately |
This also eliminates the race condition described in channel.rs:3124–3127.
In ChannelManagerData (channelmanager.rs:18013–18041)
| Field | Line | Why Removable |
|---|---|---|
| monitor_update_blocked_actions_per_peer | 18025–18026 | No more blocked actions to persist |
| in_flight_monitor_updates | 18030 | Monitor knows its own state; no need for CM to track |
| forward_htlcs_legacy | 18036 | Legacy map replaced by monitor events |
| pending_intercepted_htlcs_legacy | 18037 | Legacy map replaced by monitor events |
| decode_update_add_htlcs_legacy | 18038 | Legacy map replaced by monitor events |
In ChannelManager (runtime state, channelmanager.rs:2780–2820)
| Field | Line | Why Removable |
|---|---|---|
| decode_update_add_htlcs | 2807 | HTLCs-to-decode communicated via MonitorEvent::HTLCAccepted; onion decode happens inline |
Note: forward_htlcs (line 2789) and pending_intercepted_htlcs (line 2800)
are still needed for the forwarding pipeline. They are populated from monitor
events rather than from channel state.
Enums/Types That Can Be Simplified or Removed
| Type | Location | Why Removable |
|---|---|---|
| RAAMonitorUpdateBlockingAction | channelmanager.rs:1668–1700 | Entire enum: both variants serve RAA blocking which is eliminated |
| MonitorUpdateCompletionAction | channelmanager.rs:1454–1495 | EmitEventOptionAndFreeOtherChannel and FreeDuplicateClaimImmediately removed; PaymentClaimed simplified or moved to monitor |
| EventCompletionAction::ReleaseRAAChannelMonitorUpdate | channelmanager.rs:1557–1562 | No RAA blocking to release |
| EventUnblockedChannel | channelmanager.rs:1414–1420 | Only existed to carry RAA blocker info |
| PostMonitorUpdateChanResume | channelmanager.rs:1521–1538 | Drastically simplified: no more htlc_forwards, decode_update_add_htlcs, failed_htlcs fields needed |
| BackgroundEvent::MonitorUpdateRegeneratedOnStartup | channelmanager.rs:1397–1402 | No in-flight updates to regenerate on load |
| BackgroundEvent::MonitorUpdatesComplete | channelmanager.rs:1406–1410 | Simplified: completion tracking moves to monitor |
| PendingMPPClaim | channelmanager.rs:1609–1612 | Replaced by deferred InboundMPPClaimPersisted events |
| PendingMPPClaimPointer | channelmanager.rs:1650 | Goes with PendingMPPClaim |
| MPPClaimHTLCSource | channelmanager.rs:1618–1623 | Goes with MPP claim tracking |
| HTLCClaimSource | channelmanager.rs:1590–1595 | Only used during on-load reconstruction |
| MonitorRestoreUpdates | channel.rs:1176–1197 | Most fields removable: only raa, commitment_update, and protocol messages retained |
| InboundUpdateAdd::Forwarded | channel.rs:349–363 | Channel no longer needs to track forwarding state for on-load reconstruction |
Functions/Methods That Can Be Removed or Drastically Simplified
On-Load Reconstruction (the big win)
| Function/Section | Lines | Current Purpose | After Change |
|---|---|---|---|
| handle_in_flight_updates! macro | 18982–19048 | Replay in-flight monitor updates | Remove entirely: monitors know their own state |
| In-flight update handling for open channels | 19050–19096 | Match in-flight updates to open channels | Remove entirely |
| In-flight update handling for closed channels | 19097–19139 | Match in-flight updates to closed channels | Remove entirely |
| HTLC reconstruction from channels | 19274–19301 | Rebuild decode_update_add_htlcs and already_forwarded_htlcs from Channel state | Remove entirely: replaced by MonitorEvent::HTLCAccepted replay |
| Outbound forward dedup | 19334–19362 | Call outbound_htlc_forwards() and dedup_decode_update_add_htlcs() | Remove entirely |
| Outbound HTLC processing for closed channels | 19365–19512 | Cross-reference monitors with pending_outbound_payments, handle preimage claims and failures | Drastically simplify: monitor events carry all needed info; outbound payment tracking may still be needed for PaymentSent events |
| Preimage claim replay | 19514–19591 | Find preimages in monitors, check upstream monitors, queue for replay | Remove entirely: ForwardedHTLCClaimed events are replayed automatically |
| RAA-blocking restoration | 19695–19770 | Reconstruct actions_blocking_raa_monitor_updates from persisted monitor_update_blocked_actions_per_peer | Remove entirely |
| HTLC deduplication | 19772–19798 | Remove already-processed HTLCs from decode queues | Remove entirely: no decode queues to dedup |
| Legacy map selection | 19800–19809 | Choose between reconstructed and legacy maps | Remove entirely |
| dedup_decode_update_add_htlcs() | ~18525–18555 | Prevent double-forwarding by matching on prev_outbound_scid_alias and htlc_id | Remove entirely |
| prune_forwarded_htlc() | 20266–20281 | Remove forwarded HTLCs from tracking | Remove entirely |
| reconcile_pending_htlcs_with_monitor() | 20285–20352 | Master dedup function across decode_update_add_htlcs, forward_htlcs_legacy, pending_intercepted_htlcs_legacy | Remove entirely |
| MPP claim replay from get_stored_preimages() | 19928–20088 | Reconstruct PendingMPPClaim, call begin_claiming_payment() + claim_mpp_part() | Remove entirely: replaced by InboundMPPClaimPersisted replay |
| Legacy preimage path (no PaymentClaimDetails) | 20090–20196 | claim_htlc_while_disconnected_dropping_mon_update_legacy() + provide_payment_preimage_unsafe_legacy() | Remove entirely: no longer needed post-migration |
| Failed HTLC backwards propagation | 20200–20212 | fail_htlc_backwards_internal() for failed_htlcs | Simplified: some of this may still be needed for outbound payments, but forwarded-HTLC failures are handled by monitor events |
| Already-forwarded HTLC failure | 20213–20227 | Fail HTLCs that appear forwarded but are missing from outbound edge | Remove entirely: monitor events make this reconciliation unnecessary |
| Claim replay execution | 20229–20257 | claim_funds_internal() for pending_claims_to_replay | Remove entirely: replayed via MonitorEvents |
Estimated lines removed from from_channel_manager_data(): ~1200–1400 lines.
RAA-Blocking Infrastructure
| Function | Lines | Purpose | After Change |
|---|---|---|---|
| `raa_monitor_updates_held()` | 12671–12688 | Check `actions_blocking_raa_monitor_updates` + pending events for `ReleaseRAAChannelMonitorUpdate` | Remove entirely |
| `test_raa_monitor_updates_held()` | 12691–12707 | Test helper | Remove entirely |
| `get_and_clear_pending_raa_blockers()` | ~14935–14955 | Extract blockers for startup | Remove entirely |
| `handle_monitor_update_release()` | ~14962–15036 | Remove `RAAMonitorUpdateBlockingAction`, unblock channel's `blocked_monitor_updates` via `unblock_next_blocked_monitor_update()` | Remove entirely |
| `handle_monitor_update_completion_actions()` | 10103–10255 | Process `MonitorUpdateCompletionAction` variants: track `PendingMPPClaim` progress, emit events, free RAA blockers | Drastically simplify; only simple event emission remains |
| `handle_post_event_actions()` (ReleaseRAA path) | ~15040–15113 | When user handles `PaymentForwarded`, release the downstream channel's RAA via `EventCompletionAction::ReleaseRAAChannelMonitorUpdate` | Remove ReleaseRAA path |

Channel Methods (channel.rs)
| Method | Lines | Purpose | After Change |
|---|---|---|---|
| `monitor_updating_paused()` | 9079–9094 | Push pending forwards/failures/fulfills to `monitor_pending_*` fields | Remove entirely; no pending queues |
| `monitor_updating_restored()` | 9100–9234 | Drain `monitor_pending_*` fields into `MonitorRestoreUpdates` | Drastically simplify; only protocol message resends (`raa`, `commitment_update`) remain |
| `unblock_next_blocked_monitor_update()` | ~10755–10763 | Dequeue from `blocked_monitor_updates` | Remove entirely |
| `push_ret_blockable_mon_update()` | ~10768–10779 | Conditionally block or return monitor update | Remove entirely; updates always flow through |
| `on_startup_drop_completed_blocked_mon_updates_through()` | ~10784–10799 | Drop stale blocked updates on startup | Remove entirely |
| `get_latest_unblocked_monitor_update_id()` | ~4149–4154 | Track boundary of unblocked updates | Remove entirely; no blocking concept |
| `inbound_htlcs_pending_decode()` | 7439–7448 | Extract `WithOnion` HTLCs for on-load decode queue rebuild | Remove entirely |
| `inbound_forwarded_htlcs()` | 7452–7507 | Extract forwarded HTLCs for on-load `already_forwarded_htlcs` rebuild | Remove entirely |
| `has_legacy_inbound_htlcs()` | 7428–7435 | Detect pre-0.3 HTLC state (`InboundUpdateAdd::Legacy`) | Remove entirely (version migration) |
| `outbound_htlc_forwards()` | 7512–7533 | Extract outbound forwards for on-load dedup | Remove entirely |
| `claim_htlc_while_disconnected_dropping_mon_update_legacy()` | (channel.rs) | Legacy on-load claim that bypasses normal monitor update flow | Remove entirely |

Why RAA-Blocking Is Eliminated
The current RAA-blocking mechanism exists because:

- When a forwarded payment is claimed, the downstream channel's RAA monitor
  update (which, as a side effect of revoking the prior state, removes the
  preimage from one commitment transaction) must not complete before the
  upstream channel's monitor update (which adds the preimage) is durable.
  Otherwise, on restart, the preimage might be lost from the downstream
  monitor while the upstream monitor never received it.
- For MPP payments, all channel monitors must have the preimage before any of
  them can have it removed from a commitment transaction via revocation. The
  `PendingMPPClaim` shared pointer coordinates this across channels.

In the new architecture, this is handled naturally:

- The `ChannelMonitor` stores preimages in `payment_preimages`
  (channelmonitor.rs:1272) durably. The preimage is never "lost" from the
  monitor's state due to a revocation; it lives in a separate map.
- `ForwardedHTLCClaimed` events persist until the `ChannelManager` acknowledges
  them. The `ChannelManager` only acknowledges after confirming the preimage is
  durable on the inbound edge.
- For MPP, `InboundMPPClaimPersisted` events persist in each monitor until all
  parts are confirmed claimed. On restart, any missing parts are re-claimed.
- `ChannelMonitorUpdate`s flow through immediately: no `hold_mon_update`
  parameter on `revoke_and_ack()`, no `blocked_monitor_updates` queue. The
  safety guarantee comes from the acknowledgment path, not from blocking the
  update path.

Detailed Design: Inbound MPP Claiming
Current Approach (Complex)
1. User calls `claim_funds(preimage)` (channelmanager.rs:9206)
2. `begin_claiming_payment()` moves payment from `claimable_payments` to
   `pending_claiming_payments` (channelmanager.rs:~1319–1380)
3. For each MPP part, `claim_mpp_part()` (channelmanager.rs:9563+):
   a. Calls `Channel::get_update_fulfill_htlc_and_commit()` for open channels
   b. Creates `ChannelMonitorUpdate` with `PaymentPreimage` step + `PaymentClaimDetails`
   c. Sets up shared `PendingMPPClaim` (`channels_without_preimage` / `channels_with_preimage`)
   d. Creates `RAAMonitorUpdateBlockingAction::ClaimedMPPPayment` per channel
   e. Creates `MonitorUpdateCompletionAction::PaymentClaimed` per channel
4. As each monitor update completes,
   `handle_monitor_update_completion_actions()` (lines 10147–10155) moves
   entries between `PendingMPPClaim` lists
5. When all channels have preimage: free all RAA blockers, emit
   `Event::PaymentClaimed`
6. On restart: iterate all monitors' `get_stored_preimages()`, reconstruct
   `PendingMPPClaim`, dedup via `processed_claims`, call
   `begin_claiming_payment()` + `claim_mpp_part()` for each

New Approach (Simple)
1. User calls `claim_funds(preimage)`
2. `ChannelManager` sends `ChannelMonitorUpdate` with `PaymentPreimage` +
   `PaymentClaimDetails` to each channel's monitor
3. Each `ChannelMonitor`, upon processing the update:
   a. Stores the preimage in `payment_preimages`
   b. Stores an `InboundMPPClaimPersisted` event in `deferred_restart_events`
   c. For open channels: fulfills the HTLC in the commitment transaction
      normally
4. The `ChannelManager` tracks confirmed parts via `MonitorEvent::Completed`
5. Once all parts confirmed: emit `Event::PaymentClaimed`, acknowledge all
   `InboundMPPClaimPersisted` events
6. On restart: monitors replay `InboundMPPClaimPersisted` events →
   `ChannelManager` identifies which parts were claimed → claims missing
   parts → done

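The bookkeeping in steps 4 to 6 can be sketched in isolation. This is a minimal model under stated assumptions, not LDK code: `MppClaimTracker`, `missing_parts`, and the `u32` `ChannelId` alias are hypothetical names introduced only for illustration. It shows the manager emitting the claimed event exactly once, after every part has been confirmed, and working out on restart which parts still need a claim.

```rust
use std::collections::HashSet;

/// Hypothetical channel identifier used only in this sketch.
pub type ChannelId = u32;

/// Tracks which MPP parts still lack a durable preimage (hypothetical type).
pub struct MppClaimTracker {
    awaiting: HashSet<ChannelId>,
    claimed_emitted: bool,
}

impl MppClaimTracker {
    pub fn new(parts: &[ChannelId]) -> Self {
        Self { awaiting: parts.iter().copied().collect(), claimed_emitted: false }
    }

    /// Record that one channel's monitor update completed. Returns `true`
    /// exactly once: when the last outstanding part is confirmed, which is
    /// the point where the claimed event would be emitted and the deferred
    /// events acknowledged.
    pub fn on_monitor_completed(&mut self, channel: ChannelId) -> bool {
        self.awaiting.remove(&channel);
        if self.awaiting.is_empty() && !self.claimed_emitted {
            self.claimed_emitted = true;
            true
        } else {
            false
        }
    }
}

/// On restart: given the channels whose monitors replayed a persisted-claim
/// event, list the parts that still need to be claimed.
pub fn missing_parts(all_parts: &[ChannelId], persisted: &HashSet<ChannelId>) -> Vec<ChannelId> {
    all_parts.iter().copied().filter(|c| !persisted.contains(c)).collect()
}
```

Duplicate completions for the same channel are harmless in this model, which matters because a replay after restart may report a part that was already confirmed.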
What Goes Away
- `PendingMPPClaim` / `PendingMPPClaimPointer` (channelmanager.rs:1609–1663)
- `RAAMonitorUpdateBlockingAction::ClaimedMPPPayment` variant (line 1685)
- The completion-tracking in `handle_monitor_update_completion_actions()`
  (channelmanager.rs:10116–10223)
- On-load MPP reconstruction (channelmanager.rs:19928–20088, ~160 lines)
- Legacy preimage path (channelmanager.rs:20090–20196, ~106 lines)
- `MPPClaimHTLCSource`, `HTLCClaimSource`, `processed_claims` HashSet

Detailed Design: Forwarded HTLC Claiming
Current Approach
1. Downstream channel receives preimage from counterparty (via
   `update_fulfill_htlc` or on-chain claim)
2. `ChannelManager::claim_funds_internal()` is called
3. For the upstream (inbound) channel:
   a. `Channel::get_update_fulfill_htlc_and_commit()` generates a
      `ChannelMonitorUpdate` on the upstream channel with a `PaymentPreimage`
      step
   b. A `RAAMonitorUpdateBlockingAction::ForwardedPaymentInboundClaim`
      (channelmanager.rs:1672–1677) blocks the downstream channel's next RAA
   c. A `MonitorUpdateCompletionAction::EmitEventOptionAndFreeOtherChannel`
      (channelmanager.rs:1474–1477) is stored on the upstream channel's
      `monitor_update_blocked_actions`, pairing `Event::PaymentForwarded` with
      an `EventUnblockedChannel` that will free the downstream
4. When the upstream monitor update completes:
   `handle_monitor_update_completion_actions()` emits `PaymentForwarded` and
   calls `handle_monitor_update_release()` to remove the RAA blocker
5. On restart: reconstruct blockers from
   `monitor_update_blocked_actions_per_peer` (lines 19695–19770)

New Approach
1. Downstream channel receives preimage from counterparty
2. `ChannelManager` sends a `FulfillHTLC` `ChannelMonitorUpdate` to the
   downstream `ChannelMonitor`, including the `HTLCSource` identifying the
   upstream edge
3. Downstream `ChannelMonitor` generates `ForwardedHTLCClaimed` event
   (persistent until acknowledged)
4. `ChannelManager` receives `ForwardedHTLCClaimed`, sends preimage to
   upstream channel via `ChannelMonitorUpdate` with `PaymentPreimage` step
5. When upstream monitor confirms preimage storage (via
   `MonitorEvent::Completed`): `ChannelManager` acknowledges the
   `ForwardedHTLCClaimed` event on the downstream monitor
6. `ChannelManager` emits `Event::PaymentForwarded`
7. On restart: downstream monitor replays unacknowledged
   `ForwardedHTLCClaimed` → `ChannelManager` re-sends preimage to upstream →
   safe

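The ordering above (send upstream, wait for durability, only then acknowledge) can be written as a small state machine. This is a sketch under stated assumptions: `ClaimState`, `Input`, `Action`, and `step` are hypothetical names for illustration and do not exist in LDK. A replayed claim event simply re-sends the preimage upstream, and the downstream acknowledgment happens only after the upstream monitor reports completion.

```rust
/// Hypothetical per-claim state as seen by the manager.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ClaimState {
    /// No claim event seen yet.
    Idle,
    /// Preimage update sent upstream; waiting for it to become durable.
    AwaitingUpstream,
    /// Downstream event acknowledged; nothing is left to replay.
    Acknowledged,
}

/// Hypothetical inputs: the downstream claim event and the upstream
/// monitor's completion notification.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Input {
    ForwardedHtlcClaimed,
    UpstreamMonitorCompleted,
}

/// Hypothetical side effects the manager would perform.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Action {
    SendPreimageUpstream,
    AckDownstreamEvent,
    EmitPaymentForwarded,
}

/// Advance one claim by one input and return the actions to perform.
pub fn step(state: ClaimState, input: Input) -> (ClaimState, Vec<Action>) {
    match (state, input) {
        // First delivery, or a replay after restart: (re-)send the preimage.
        (ClaimState::Idle | ClaimState::AwaitingUpstream, Input::ForwardedHtlcClaimed) => {
            (ClaimState::AwaitingUpstream, vec![Action::SendPreimageUpstream])
        }
        // Upstream storage is durable: safe to acknowledge and emit.
        (ClaimState::AwaitingUpstream, Input::UpstreamMonitorCompleted) => (
            ClaimState::Acknowledged,
            vec![Action::AckDownstreamEvent, Action::EmitPaymentForwarded],
        ),
        // Anything else is a no-op in this model.
        (s, _) => (s, vec![]),
    }
}
```

The point of the model is that no transition acknowledges the downstream event before `UpstreamMonitorCompleted` has been observed, which is the property the RAA blocker enforces today.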
What Goes Away
- `RAAMonitorUpdateBlockingAction::ForwardedPaymentInboundClaim` variant
  (channelmanager.rs:1672–1677)
- `MonitorUpdateCompletionAction::EmitEventOptionAndFreeOtherChannel`
  (channelmanager.rs:1474–1477)
- `EventCompletionAction::ReleaseRAAChannelMonitorUpdate`
  (channelmanager.rs:1557–1562)
- `EventUnblockedChannel` struct (channelmanager.rs:1414–1447)
- The entire `blocked_monitor_updates` mechanism in `Channel` (channel.rs:3339)
- All `hold_mon_update` logic in `Channel::revoke_and_ack()`
  (channel.rs:~8675–8694)
- The RAA-blocking restoration on load (channelmanager.rs:19695–19770)

Detailed Design: HTLC Forwarding via MonitorEvent
Current Approach
When an HTLC becomes irrevocably committed in the Channel:
1. `revoke_and_ack()` (channel.rs:~8587) transitions it to `Committed` with
   `InboundUpdateAdd::WithOnion`
2. The `update_add_htlc` message is pushed to `monitor_pending_update_adds`
3. When monitor update completes, `monitor_updating_restored()` drains
   `monitor_pending_update_adds` into `MonitorRestoreUpdates::pending_update_adds`
4. `ChannelManager` puts them into `decode_update_add_htlcs` map (keyed by
   outbound SCID alias)
5. `process_pending_update_add_htlcs()` (channelmanager.rs:7195–7535) decodes
   each onion and routes to `forward_htlcs`
6. `process_pending_htlc_forwards()` (channelmanager.rs:7558–7645) forwards or
   receives

On restart, `inbound_htlcs_pending_decode()` extracts `WithOnion` HTLCs to
rebuild the `decode_update_add_htlcs` map, and complex deduplication prevents
double-forwarding.
New Approach
1. When `revoke_and_ack()` confirms an HTLC as irrevocably committed, the
   `Channel` sends a `ChannelMonitorUpdate` with `HTLCIrrevocablyCommitted`
   step containing the `update_add_htlc` message
2. The `ChannelMonitor` processes this step and generates a
   `MonitorEvent::HTLCAccepted` event (persistent until acknowledged)
3. `ChannelManager` receives `HTLCAccepted`, decodes the onion, and routes
   to `forward_htlcs` or handles as a final payment
4. After forwarding/receiving, `ChannelManager` acknowledges the event
5. On restart: unacknowledged `HTLCAccepted` events are replayed → onion is
   decoded again → forwarding happens again → idempotent (the downstream
   channel will reject the duplicate `update_add_htlc`)

What Goes Away
- `decode_update_add_htlcs` map (channelmanager.rs:2807)
- `monitor_pending_update_adds` field (channel.rs:3131)
- `monitor_pending_forwards` field (channel.rs:3128); forwarding is driven
  by events, not buffered in the channel
- `InboundUpdateAdd::WithOnion` variant; the monitor holds the raw
  `update_add_htlc` until acknowledged, so the channel doesn't need to
  retain it
- `InboundUpdateAdd::Forwarded` variant; no longer needed for on-load
  reconstruction
- `inbound_htlcs_pending_decode()` (channel.rs:7439–7448)
- `inbound_forwarded_htlcs()` (channel.rs:7452–7507)
- `outbound_htlc_forwards()` (channel.rs:7512–7533)
- All on-load dedup logic (`dedup_decode_update_add_htlcs`,
  `reconcile_pending_htlcs_with_monitor`, `prune_forwarded_htlc`)
- The `already_forwarded_htlcs` temporary map in `from_channel_manager_data()`
  (lines 19251–19254)

New ChannelMonitor State
The ChannelMonitorImpl (channelmonitor.rs:1199–1400) needs new fields:
```rust
pub(crate) struct ChannelMonitorImpl<Signer: EcdsaChannelSigner> {
    // ... existing fields ...

    /// MonitorEvents that have been generated but not yet acknowledged by the
    /// ChannelManager. These survive serialization and are replayed on restart.
    /// Replaces the fire-and-forget `pending_monitor_events` for new event types.
    pending_unacknowledged_events: Vec<(MonitorEventId, MonitorEvent)>,

    /// The next MonitorEventId to assign.
    next_event_id: u64,

    /// Deferred events (e.g., InboundMPPClaimPersisted) that should not be
    /// surfaced to the ChannelManager during normal operation but should be
    /// replayed on restart. These are stored separately so that
    /// get_and_clear_pending_monitor_events() doesn't return them.
    deferred_restart_events: Vec<(MonitorEventId, MonitorEvent)>,
}
```

The existing `pending_monitor_events: Vec<MonitorEvent>` field
(channelmonitor.rs:1283) is kept for backwards compatibility with existing
`MonitorEvent` variants (on-chain events) during the migration period, then
eventually deprecated.
Serialization Changes
pending_unacknowledged_events and deferred_restart_events must be
serialized as new TLV fields in ChannelMonitorImpl. The existing
MonitorEvent serialization (channelmonitor.rs:228–246) supports existing
variants; new variants need new TLV tags in the MonitorEvent enum.
get_and_clear_pending_monitor_events() Changes
The current implementation (channelmonitor.rs:4373–4377) does mem::swap to
drain events. In the new design:
```rust
impl<Signer: EcdsaChannelSigner> ChannelMonitorImpl<Signer> {
    fn get_pending_monitor_events(&self) -> Vec<(MonitorEventId, MonitorEvent)> {
        // Return copies of unacknowledged events without clearing.
        self.pending_unacknowledged_events.clone()
    }

    fn get_restart_events(&self) -> Vec<(MonitorEventId, MonitorEvent)> {
        // Called only on restart. Returns deferred events.
        self.deferred_restart_events.clone()
    }

    fn acknowledge_events(&mut self, up_to_id: MonitorEventId) {
        // Drop every event whose ID is at or below `up_to_id` from both queues.
        self.pending_unacknowledged_events.retain(|(id, _)| id.0 > up_to_id.0);
        self.deferred_restart_events.retain(|(id, _)| id.0 > up_to_id.0);
    }
}
```

The `ChannelManager` tracks the highest acknowledged `MonitorEventId` per
monitor (either in its own state or by querying the monitor) to distinguish
"new" from "already-processed" events during normal operation. A simple
approach: after processing new events, immediately acknowledge them (the
monitor will re-persist, and if the `ChannelManager` crashes before
re-persisting, the events will replay on restart, which is the desired
behavior).
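To make the read-without-clearing and acknowledge-up-to semantics concrete, here is a self-contained model that compiles without any LDK types. It is only a sketch: `EventQueue`, its `push` method, and the simplified `Event` enum are hypothetical stand-ins for the proposed monitor fields, and `MonitorEventId` is redefined locally as a plain `u64` wrapper.

```rust
/// Local stand-in for the proposed `MonitorEventId` (assumed to wrap a u64).
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct MonitorEventId(pub u64);

/// Simplified, hypothetical event payload; not the real `MonitorEvent`.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum Event {
    HtlcAccepted { htlc_id: u64 },
    ForwardedHtlcClaimed { htlc_id: u64 },
}

/// Hypothetical model of the unacknowledged-event queue.
#[derive(Default)]
pub struct EventQueue {
    pending: Vec<(MonitorEventId, Event)>,
    next_event_id: u64,
}

impl EventQueue {
    /// Queue an event under the next monotonically increasing ID.
    pub fn push(&mut self, event: Event) -> MonitorEventId {
        let id = MonitorEventId(self.next_event_id);
        self.next_event_id += 1;
        self.pending.push((id, event));
        id
    }

    /// Reading returns copies and leaves the queue untouched.
    pub fn get_pending(&self) -> Vec<(MonitorEventId, Event)> {
        self.pending.clone()
    }

    /// Drop every event with an ID at or below `up_to_id`.
    pub fn acknowledge(&mut self, up_to_id: MonitorEventId) {
        self.pending.retain(|(id, _)| *id > up_to_id);
    }
}
```

Because acknowledgment is a watermark rather than a per-event flag, one call can cover a whole batch, at the cost of requiring events to be processed in ID order.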
Interaction with Existing Constraints
"MonitorEvents MUST NOT be generated during update processing"
The existing constraint (channelmonitor.rs:~1274–1282) says:
MonitorEvents MUST NOT be generated during update processing, only generated
during chain data processing.
This constraint exists because of a race in ChainMonitor::update_channel
where the in-memory state is updated under a read-lock, but persistence hasn't
completed yet. If events were generated during update processing and consumed
before persistence, a restart would replay the update but the event would be
lost.
In the new design, this constraint is relaxed because:
- Events are persistent-until-acknowledged
- Even if an event is generated during update processing and the update isn't
  persisted, on restart the update will be replayed and the event regenerated
- The acknowledgment path ensures the `ChannelManager` won't "lose" events

However, we must ensure idempotent event generation: replaying a
`ChannelMonitorUpdate` must not duplicate events. The monitor should check
whether an event for a given HTLC already exists before generating a new one.
This is straightforward since events carry enough identifying information
(channel_id + htlc_id) for dedup.
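A minimal sketch of that check, assuming the key really is just the channel ID plus the HTLC ID. `PendingEvents`, `HtlcKey`, and `generate_once` are hypothetical names for illustration, not part of LDK.

```rust
use std::collections::HashSet;

/// Hypothetical dedup key: (channel_id, htlc_id), as suggested in the text.
pub type HtlcKey = ([u8; 32], u64);

/// Hypothetical model of a monitor's pending events with a dedup index.
#[derive(Default)]
pub struct PendingEvents {
    events: Vec<HtlcKey>,
    seen: HashSet<HtlcKey>,
}

impl PendingEvents {
    /// Queue an event for `key` unless one already exists. Returns `true`
    /// if a new event was queued, `false` for a replayed duplicate.
    pub fn generate_once(&mut self, key: HtlcKey) -> bool {
        if !self.seen.insert(key) {
            return false;
        }
        self.events.push(key);
        true
    }

    /// Number of events currently queued.
    pub fn pending_count(&self) -> usize {
        self.events.len()
    }
}
```

This model only deduplicates against keys it has already seen in memory. Whether the monitor must also remember keys for events that were acknowledged and removed, so that a late replay of the same update stays a no-op, is a question the text above leaves open.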
Chain Watch Trait
The chain::Watch trait's release_pending_monitor_events() method
(chain/mod.rs:345–347) needs to change:
```rust
/// Returns pending MonitorEvents with their IDs for acknowledgment tracking.
fn release_pending_monitor_events(&self)
    -> Vec<(OutPoint, ChannelId, Vec<(MonitorEventId, MonitorEvent)>, PublicKey)>;

/// Acknowledge processed events up to the given ID per monitor.
fn acknowledge_monitor_events(&self,
    channel_id: &ChannelId, up_to_id: MonitorEventId);
```

Migration Strategy
Backwards Compatibility
- Old `ChannelMonitor` state can be read by new code. New fields are additive
  TLVs with defaults (empty vecs, zero counter).
- Old `ChannelManager` state can still be loaded. On first load with new code,
  the on-load reconstruction runs one final time (the existing logic is
  retained behind the version check). After the `ChannelManager` is
  re-persisted, all state is in the new format.
- Bump `SERIALIZATION_VERSION` (channelmanager.rs:17246) and
  `RECONSTRUCT_HTLCS_FROM_CHANS_VERSION` (channelmanager.rs:17258) to gate
  new behavior.

Phased Approach
Phase 1: Persistent MonitorEvents with acknowledgment

- Add `MonitorEventId`, `pending_unacknowledged_events`,
  `deferred_restart_events` to `ChannelMonitorImpl`
- Add `acknowledge_monitor_events()` to `ChannelMonitor` and `chain::Watch`
- Change `get_and_clear_pending_monitor_events()` to not clear for new events
- Existing `MonitorEvent` variants (`HTLCEvent`, `CommitmentTxConfirmed`, etc.)
  continue to use the old fire-and-forget path during this phase

Phase 2: Move HTLC forwarding to monitor-driven events

- Add `HTLCIrrevocablyCommitted` step and `HTLCAccepted` event
- `Channel` generates the new step in `revoke_and_ack()` instead of pushing
  to `monitor_pending_update_adds`
- `ChannelManager` processes `HTLCAccepted` in the event loop
- Remove `decode_update_add_htlcs` map, `monitor_pending_update_adds`,
  `monitor_pending_forwards`
- Remove `inbound_htlcs_pending_decode()`, `inbound_forwarded_htlcs()`,
  `outbound_htlc_forwards()`, all dedup helpers

Phase 3: Move forwarded HTLC claiming to monitor events

- Add `FulfillHTLC` step and `ForwardedHTLCClaimed` event
- Remove RAA-blocking infrastructure entirely:
  `RAAMonitorUpdateBlockingAction`, `EventCompletionAction::ReleaseRAA`,
  `blocked_monitor_updates`, `hold_mon_update` parameter, etc.
- Remove `MonitorUpdateCompletionAction::EmitEventOptionAndFreeOtherChannel`

Phase 4: Move inbound MPP claiming to monitor-driven events

- Add `InboundMPPClaimPersisted` event
- Modify `claim_funds_internal()` to rely on deferred monitor events
- Remove `PendingMPPClaim`, `PendingMPPClaimPointer`,
  `ClaimedMPPPayment` RAA blocker
- Remove `MPPClaimHTLCSource`, `HTLCClaimSource`

Phase 5: Simplify on-load logic

- Replace all reconstruction logic with simple `MonitorEvent` replay loop
- Remove ~1200–1400 lines from `from_channel_manager_data()`
- Remove legacy map handling, legacy preimage paths, `_legacy` suffixed fields
- The on-load code reduces to:

```rust
for (channel_id, monitor) in channel_monitors {
    for (event_id, event) in monitor.get_pending_monitor_events() {
        process_monitor_event(event);
    }
    for (event_id, event) in monitor.get_restart_events() {
        process_monitor_event(event);
    }
}
```

Risks and Open Questions

- Monitor persistence size: Storing events until acknowledged increases the
  persistent monitor size. Events are relatively small (one per HTLC), but
  high-volume nodes could accumulate events if the `ChannelManager` is slow
  to persist. Mitigation: batch acknowledgments; keep events compact; bound
  the maximum number of unacknowledged events.
- Idempotent event generation: When a `ChannelMonitorUpdate` is replayed
  on restart, the monitor must not duplicate events. The implementation must
  check for existing events with the same HTLC identifier before generating
  new ones.
- Backwards compatibility on upgrade: First load with new code must bootstrap
  the new event state from the existing reconstruction. The existing
  reconstruction logic runs one final time, then the `ChannelManager` is
  re-persisted in the new format. This means the existing reconstruction
  code must be maintained (but can be feature-gated) until we're confident
  all users have upgraded.
- `InboundHTLCState` / `InboundUpdateAdd` simplification: The `Channel`
  still needs to track HTLCs for commitment transaction negotiation, but
  `InboundUpdateAdd::Forwarded` (which exists solely for on-load
  reconstruction) can be removed. The `Channel` can transition directly from
  `WithOnion` to "onion consumed" without tracking forwarding state.
- Timing of `HTLCAccepted` events: The monitor needs to know when an
  HTLC is irrevocably committed. Today, this is derived from the `Channel`'s
  `InboundHTLCState` machine. In the new design, the `Channel` sends an
  explicit `HTLCIrrevocablyCommitted` step at the right moment. The monitor
  doesn't need to replicate the state machine; it just needs to generate the
  event when it receives the step. This is simpler than having the monitor
  independently track HTLC lifecycle states.
- Performance: The current "clear on read" approach for `MonitorEvent`s is
  zero-cost at read time. Persistent events require cloning on read and
  additional serialization. However:
  - Events are small (~100 bytes each)
  - Acknowledgment can batch (one call covers many events)
  - The massive reduction in on-load complexity saves far more developer time
    than the small runtime cost
- Trampoline forwards: `HTLCSource::TrampolineForward` contains multiple
  `previous_hop_data` entries. The new event system must handle this: a
  single `ForwardedHTLCClaimed` event for a trampoline forward should
  carry all upstream sources. This is straightforward since the
  `HTLCSource` enum already handles this.

Summary
| Metric | Current | Proposed |
|---|---|---|
| On-load reconstruction lines | ~1600 (18635–20263) | ~50 (event replay loop) |
| RAA-blocking types/functions | ~15 (types, fields, methods) | 0 |
| `ChannelManager`-persisted HTLC state | 6+ maps/fields | 0 (all in monitors) |
| Restart correctness argument | Cross-reference Channel, Monitor, in-flight updates, blocked updates, RAA blockers, legacy maps | Replay unacknowledged events |
| `blocked_monitor_updates` mechanism | Complex FIFO queue with ordering constraints | Not needed |
| `monitor_pending_*` fields | 4 Vec fields with race condition bug (channel.rs:3124–3127) | 0; eliminated along with the bug |
| MPP claim coordination | `PendingMPPClaim` + `PendingMPPClaimPointer` + `Arc<Mutex>` + RAA blockers + completion actions | Deferred monitor events |
| Forward HTLC claiming safety | RAA blocking + event completion actions + `EventUnblockedChannel` | Persistent events + acknowledgment |
| Channel closure HTLC handoff | Race-prone (known bug) | Race-free; monitor already has all state |

The key insight is that persistent, acknowledged MonitorEvents replace
both the RAA-blocking mechanism (which existed to ensure preimages aren't
lost across monitor updates) and the on-load reconstruction logic (which
existed because MonitorEvents were fire-and-forget). By making events
durable and acknowledgment-driven, we get correctness by construction — the
monitor holds onto events until the ChannelManager has processed them, and
on restart we simply replay.
The channel closure race condition (channel.rs:3124–3127) disappears because
there are no monitor_pending_* fields to lose — the ChannelMonitor
generates events directly from its own state, which is always durable.