[Claude Slop] HTLC Event-based ChannelManager reconstruction change #4482

@TheBlueMatt

This is the Claude output of the following prompt:

write a design doc for a new approach to resolving HTLCs and payments.

Currently, the resolution of HTLCs (and decisions on when HTLCs can be forwarded) is the responsibility of Channel objects (a part of ChannelManager) until the channel is closed, and then the ChannelMonitor thereafter. This leads to some complexity around race conditions for HTLCs right around channel closure. Additionally, there is lots of complexity reconstructing the state of all HTLCs in the ChannelManager deserialization/loading logic.

Instead, we want to do all resolution in ChannelMonitors (in response to ChannelMonitorUpdates) and pass them back to ChannelManager in the form of MonitorEvents (similar to how HTLCs are resolved after channels are closed). In order to have reliable resolution, we'll need to keep MonitorEvents around in the ChannelMonitor until the ChannelManager has finished processing them - adding a new MonitorEvent resolution path through a new method (rather than via ChannelMonitorUpdates). This should remove the need for the RAA-blocking updates entirely - on restart instead of examining the set of HTLCs in monitors we can simply replay all the pending MonitorEvents.

Further, for claiming inbound (MPP) payments, the ChannelMonitor should push a pending MonitorEvent to replay that claim, but not provide it to the ChannelManager immediately. That way, if we crash and restart the MonitorEvents to claim the MPP parts will be replayed automatically and we can claim any MPP Parts that we didn't manage to claim before crashing. We'll still have to track once all channels part of the MPP payment have the preimage and delete the MonitorEvent, but should be able to remove some other on-load logic.

Finally, decisions on when an HTLC is able to be forwarded to the outbound channel should similarly be done using a MonitorEvent, avoiding more work in ChannelManager HTLC reconstruction.

Please write a thorough design doc detailing how much of the ChannelManager on-load complexity can go away, listing specific hunks of code and fields.

Design: Unified HTLC Resolution via ChannelMonitor

Status: Draft

Date: 2026-03-12 (rev 1)


Problem Statement

Currently, HTLC resolution (and decisions about when HTLCs can be forwarded) is
split between two systems:

  1. Channel objects (part of ChannelManager) handle HTLC resolution while
    the channel is open — tracking commitment state transitions, forwarding
    decisions, and preimage propagation.
  2. ChannelMonitors handle HTLC resolution after the channel is closed —
    watching for on-chain transactions, claiming outputs, and failing HTLCs back.

This split creates two major problems:

Race conditions around channel closure. An HTLC might be in the process of
being resolved in the Channel when the channel closes. The Channel buffers
HTLC actions in monitor_pending_* fields (channel.rs:3128–3131) while a
monitor update is in progress. If the channel is dropped during this window, we
don't know whether the ChannelMonitor has responsibility for those HTLCs. This
is explicitly called out as an open bug in channel.rs:3124–3127:

If a channel is drop'd, we don't know whether the ChannelMonitor is
ultimately responsible for some of the HTLCs here or not — we don't know
whether the update in question completed or not. We currently ignore these
fields entirely when force-closing a channel, but need to handle this somehow
or we run the risk of losing HTLCs!

Enormous complexity in ChannelManager deserialization. On restart, the
ChannelManager must reconstruct the state of all in-flight HTLCs by
cross-referencing Channel state, ChannelMonitor state, in-flight monitor
updates, blocked monitor updates, RAA-blocking actions, and various legacy maps.
This reconstruction logic spans ~1600 lines of from_channel_manager_data()
(channelmanager.rs:18635–20263) and is one of the most complex and error-prone
parts of LDK.

Proposed Solution

Move all HTLC resolution to ChannelMonitors, driven by
ChannelMonitorUpdates, with results communicated back to ChannelManager via
MonitorEvents. The ChannelManager becomes a pure routing/forwarding
engine that tells monitors what to do, and monitors tell the manager what
happened.

Core Principles

  1. ChannelMonitor is the sole authority on HTLC resolution state. Whether a
    channel is open or closed, the monitor decides when an HTLC is resolved and
    communicates this to the ChannelManager.

  2. MonitorEvents are persistent until acknowledged. The ChannelMonitor
    keeps MonitorEvents in its persistent state until the ChannelManager
    explicitly acknowledges them via a new method (not via ChannelMonitorUpdate).

  3. Restart == replay. On restart, the ChannelManager simply replays all
    pending (unacknowledged) MonitorEvents from all monitors. No reconstruction
    logic needed.

  4. Inbound MPP claims use deferred MonitorEvents. When claiming an MPP
    payment, the ChannelMonitor stores a MonitorEvent for the claim but does
    not provide it to the ChannelManager immediately. On restart, these events
    are replayed, allowing crash-safe MPP claiming without special on-load logic.


Current Architecture (What We Have Today)

HTLC Lifecycle in an Open Channel

When a channel is open, an inbound HTLC goes through these states
(channel.rs InboundHTLCState, lines 174–233):

RemoteAnnounced(InboundHTLCResolution)
    │ commitment_signed received
    ▼
AwaitingRemoteRevokeToAnnounce(InboundHTLCResolution)
    │ counterparty revoke_and_ack
    ▼
AwaitingAnnouncedRemoteRevoke(InboundHTLCResolution)
    │ our revoke_and_ack + their commitment_signed
    ▼
Committed { update_add_htlc: InboundUpdateAdd }
    │ HTLC now irrevocably committed; forwarding decision made
    │ fail_htlc/fulfill_htlc
    ▼
LocalRemoved(InboundHTLCRemovalReason)
    │ counterparty revoke_and_ack
    ▼
[removed from tracking]

When the HTLC reaches Committed, the InboundUpdateAdd payload
(channel.rs:337–376) indicates its readiness:

  • WithOnion { update_add_htlc } — onion not yet decoded, added to
    decode_update_add_htlcs for processing
  • Forwarded { ... } — already forwarded to the outbound edge, onion pruned
  • Legacy — pre-0.3 HTLC without onion persistence

The transition from AwaitingAnnouncedRemoteRevoke to Committed happens in
revoke_and_ack() (channel.rs:~8587), where WithOnion HTLCs are pushed to
monitor_pending_update_adds (channel.rs:8696) and eventually decoded via
process_pending_update_add_htlcs() (channelmanager.rs:7195–7535).
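
As a rough illustration of the lifecycle above, the transitions can be sketched as a small state machine. The `HtlcState`, `HtlcInput`, and `step` names are hypothetical, the state payloads (such as InboundHTLCResolution) are omitted, and the step into Committed is collapsed onto a single revoke_and_ack input, so this is a simplification and not LDK's implementation.

```rust
/// Simplified, payload-free mirror of the inbound HTLC states listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HtlcState {
    RemoteAnnounced,
    AwaitingRemoteRevokeToAnnounce,
    AwaitingAnnouncedRemoteRevoke,
    Committed,
    LocalRemoved,
    Removed,
}

/// Hypothetical inputs that drive the transitions in the diagram.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HtlcInput {
    CommitmentSigned,
    RevokeAndAck,
    /// Stands in for a local fail_htlc / fulfill_htlc decision.
    LocalResolve,
}

/// Returns the next state, or `None` if `input` does not apply in `state`.
pub fn step(state: HtlcState, input: HtlcInput) -> Option<HtlcState> {
    use HtlcInput::*;
    use HtlcState::*;
    match (state, input) {
        (RemoteAnnounced, CommitmentSigned) => Some(AwaitingRemoteRevokeToAnnounce),
        (AwaitingRemoteRevokeToAnnounce, RevokeAndAck) => Some(AwaitingAnnouncedRemoteRevoke),
        (AwaitingAnnouncedRemoteRevoke, RevokeAndAck) => Some(Committed),
        (Committed, LocalResolve) => Some(LocalRemoved),
        (LocalRemoved, RevokeAndAck) => Some(Removed),
        _ => None,
    }
}
```

Feeding CommitmentSigned, RevokeAndAck, RevokeAndAck into `step` starting from RemoteAnnounced reaches Committed; any input the diagram does not allow yields `None`.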

Forwarding Decisions

Decoded HTLCs flow through process_pending_htlc_forwards()
(channelmanager.rs:7558–7645):

  1. process_pending_update_add_htlcs() decodes onions from the
    decode_update_add_htlcs map (channelmanager.rs:2807)
  2. Decoded HTLCs are added to the forward_htlcs map
    (channelmanager.rs:2789–2791)
  3. forward_htlcs is drained; for each HTLC:
    • If short_chan_id != 0: process_forward_htlcs() sends it to an outbound
      channel via queue_add_htlc() (channelmanager.rs:7836–8105)
    • If short_chan_id == 0: process_receive_htlcs() handles it as a final
      payment
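
The drain step above can be sketched as follows. `DecodedHtlc` and `split_forwards` are hypothetical names used only for illustration; the real forward_htlcs map carries richer types, and the real code dispatches to process_forward_htlcs() / process_receive_htlcs() rather than returning two lists.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a decoded HTLC awaiting a forwarding decision.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct DecodedHtlc {
    pub htlc_id: u64,
    pub amount_msat: u64,
}

/// Drains a forward_htlcs-style map keyed by short channel ID and splits it
/// into HTLCs to forward (tagged with the outbound SCID) and HTLCs to receive.
pub fn split_forwards(
    forward_htlcs: &mut HashMap<u64, Vec<DecodedHtlc>>,
) -> (Vec<(u64, DecodedHtlc)>, Vec<DecodedHtlc>) {
    let mut to_forward = Vec::new();
    let mut to_receive = Vec::new();
    for (short_chan_id, htlcs) in forward_htlcs.drain() {
        for htlc in htlcs {
            if short_chan_id != 0 {
                // Non-zero SCID: destined for an outbound channel.
                to_forward.push((short_chan_id, htlc));
            } else {
                // SCID 0: we are the final recipient.
                to_receive.push(htlc);
            }
        }
    }
    (to_forward, to_receive)
}
```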

Monitor Update Blocking and the monitor_pending_* Fields

When a ChannelMonitorUpdate is being persisted, the Channel cannot proceed
with certain protocol messages. Pending work is buffered:

  • monitor_pending_forwards: Vec<(PendingHTLCInfo, u64)> (channel.rs:3128)
    — inbound HTLCs ready to forward
  • monitor_pending_failures: Vec<(HTLCSource, PaymentHash, HTLCFailReason)>
    (channel.rs:3129) — inbound HTLCs to fail backwards
  • monitor_pending_finalized_fulfills: Vec<(HTLCSource, Option<AttributionData>)>
    (channel.rs:3130) — fulfilled HTLCs awaiting acknowledgment (persisted, TLV 11)
  • monitor_pending_update_adds: Vec<msgs::UpdateAddHTLC> (channel.rs:3131)
    — inbound update_add messages awaiting onion decode

These are released via monitor_updating_restored() (channel.rs:9100–9234)
which returns a MonitorRestoreUpdates struct (channel.rs:1176–1197) containing:

pub struct MonitorRestoreUpdates {
    pub raa: Option<msgs::RevokeAndACK>,
    pub commitment_update: Option<msgs::CommitmentUpdate>,
    pub commitment_order: RAACommitmentOrder,
    pub accepted_htlcs: Vec<(PendingHTLCInfo, u64)>,       // from monitor_pending_forwards
    pub failed_htlcs: Vec<(HTLCSource, PaymentHash, HTLCFailReason)>,
    pub finalized_claimed_htlcs: Vec<(HTLCSource, Option<AttributionData>)>,
    pub pending_update_adds: Vec<msgs::UpdateAddHTLC>,     // from monitor_pending_update_adds
    pub funding_broadcastable: Option<Transaction>,
    pub channel_ready: Option<msgs::ChannelReady>,
    // ... other fields
}
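
A minimal sketch of the buffer-and-release pattern, assuming HTLCs are reduced to bare IDs. `PendingBuffers` and `RestoreUpdates` are hypothetical simplifications of ChannelContext and MonitorRestoreUpdates. The point of the sketch is that the buffered work lives only in the Channel until it is released, which is why dropping the Channel in this window is dangerous.

```rust
/// Hypothetical, heavily simplified stand-in for the Channel's buffers.
#[derive(Debug, Default)]
pub struct PendingBuffers {
    pub monitor_update_in_progress: bool,
    /// HTLC IDs only; the real field holds (PendingHTLCInfo, u64).
    pub monitor_pending_forwards: Vec<u64>,
    pub monitor_pending_failures: Vec<u64>,
}

/// Simplified analogue of MonitorRestoreUpdates.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct RestoreUpdates {
    pub accepted_htlcs: Vec<u64>,
    pub failed_htlcs: Vec<u64>,
}

impl PendingBuffers {
    /// Releases everything buffered while the monitor update was in
    /// progress, leaving the buffers empty.
    pub fn monitor_updating_restored(&mut self) -> RestoreUpdates {
        self.monitor_update_in_progress = false;
        RestoreUpdates {
            accepted_htlcs: std::mem::take(&mut self.monitor_pending_forwards),
            failed_htlcs: std::mem::take(&mut self.monitor_pending_failures),
        }
    }
}
```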

Preimage Claiming (Forwarded Payments)

When a downstream channel receives a preimage:

  1. ChannelManager::claim_funds_internal() is called
  2. For the upstream (inbound) channel,
    Channel::get_update_fulfill_htlc_and_commit() (channel.rs:7106–7166)
    generates a ChannelMonitorUpdate with a PaymentPreimage step
    (channel.rs:7018–7025)
  3. A RAAMonitorUpdateBlockingAction::ForwardedPaymentInboundClaim
    (channelmanager.rs:1672–1677) is set on the downstream channel, blocking
    its next RAA monitor update
  4. A MonitorUpdateCompletionAction::EmitEventOptionAndFreeOtherChannel
    (channelmanager.rs:1474–1477) pairs Event::PaymentForwarded with
    unblocking the downstream channel via EventUnblockedChannel
    (channelmanager.rs:1414–1420)
  5. When the upstream monitor update completes,
    handle_monitor_update_completion_actions() (channelmanager.rs:10103–10255)
    emits the event and frees the RAA blocker

RAA-Blocking Infrastructure

The RAA-blocking system involves multiple types and fields:

Types:

  • RAAMonitorUpdateBlockingAction (channelmanager.rs:1668–1700): Enum with
    ForwardedPaymentInboundClaim and ClaimedMPPPayment variants
  • MonitorUpdateCompletionAction (channelmanager.rs:1454–1495): Enum with
    PaymentClaimed, EmitEventOptionAndFreeOtherChannel, and
    FreeDuplicateClaimImmediately variants
  • EventCompletionAction::ReleaseRAAChannelMonitorUpdate
    (channelmanager.rs:1557–1562): Deferred RAA release on event processing
  • EventUnblockedChannel (channelmanager.rs:1414–1420): Pointer to channel to
    unblock
  • PendingChannelMonitorUpdate (channel.rs:1472–1478): Blocked update wrapper

Fields (PeerState, channelmanager.rs:1709–1782):

  • in_flight_monitor_updates: BTreeMap<ChannelId, (OutPoint, Vec<ChannelMonitorUpdate>)>
    (line 1740)
  • monitor_update_blocked_actions: BTreeMap<ChannelId, Vec<MonitorUpdateCompletionAction>>
    (line 1760)
  • actions_blocking_raa_monitor_updates: BTreeMap<ChannelId, Vec<RAAMonitorUpdateBlockingAction>>
    (line 1765)
  • closed_channel_monitor_update_ids: BTreeMap<ChannelId, u64> (line 1775)

Fields (ChannelContext, channel.rs):

  • blocked_monitor_updates: Vec<PendingChannelMonitorUpdate> (line 3339)

Functions:

  • raa_monitor_updates_held() (channelmanager.rs:12671–12688): Checks the
    actions_blocking_raa_monitor_updates map AND the pending events queue for
    ReleaseRAAChannelMonitorUpdate actions
  • handle_monitor_update_release() (channelmanager.rs:~14962–15036): Removes
    blockers and unblocks the channel's blocked_monitor_updates queue
  • revoke_and_ack(..., hold_mon_update: bool) (channel.rs:~8359): The
    hold_mon_update parameter conditionally blocks the resulting monitor update

Inbound MPP Claiming

The MPP claim flow is particularly complex:

  1. User calls claim_funds(preimage) (channelmanager.rs:9206)
  2. begin_claiming_payment() moves payment from claimable_payments to
    pending_claiming_payments (channelmanager.rs:~1319–1380)
  3. For each MPP part, claim_mpp_part() (channelmanager.rs:9563+):
    a. Calls Channel::get_update_fulfill_htlc_and_commit() for open channels
    b. Creates ChannelMonitorUpdate with PaymentPreimage step + PaymentClaimDetails
    c. Sets up shared PendingMPPClaim (channelmanager.rs:1609–1612):
    pub(crate) struct PendingMPPClaim {
        channels_without_preimage: Vec<(PublicKey, ChannelId)>,
        channels_with_preimage: Vec<(PublicKey, ChannelId)>,
    }
    d. Creates RAAMonitorUpdateBlockingAction::ClaimedMPPPayment per channel
    e. Creates MonitorUpdateCompletionAction::PaymentClaimed per channel
  4. As each monitor update completes,
    handle_monitor_update_completion_actions() (channelmanager.rs:10147–10155)
    moves entries from channels_without_preimage to channels_with_preimage
  5. When channels_without_preimage is empty: free all RAA blockers, emit
    Event::PaymentClaimed
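
Steps 4 and 5 amount to moving channels between two lists and checking for emptiness. A minimal sketch, assuming channels are identified by a plain u64 rather than (PublicKey, ChannelId); `PendingMppClaim::preimage_persisted` is a hypothetical helper, not an LDK method:

```rust
/// Hypothetical simplification of PendingMPPClaim.
#[derive(Debug, Default)]
pub struct PendingMppClaim {
    channels_without_preimage: Vec<u64>,
    channels_with_preimage: Vec<u64>,
}

impl PendingMppClaim {
    /// Starts with every MPP part still lacking a durable preimage.
    pub fn new(parts: Vec<u64>) -> Self {
        Self { channels_without_preimage: parts, channels_with_preimage: Vec::new() }
    }

    /// Records that `chan`'s monitor update (carrying the preimage) has
    /// completed. Returns true once every part has the preimage, i.e. when
    /// the RAA blockers can be freed and the claim event emitted.
    pub fn preimage_persisted(&mut self, chan: u64) -> bool {
        if let Some(pos) = self.channels_without_preimage.iter().position(|c| *c == chan) {
            let c = self.channels_without_preimage.remove(pos);
            self.channels_with_preimage.push(c);
        }
        self.channels_without_preimage.is_empty()
    }
}
```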

Supporting types:

  • PendingMPPClaimPointer(Arc<Mutex<PendingMPPClaim>>) (line 1650): Shared
    pointer for cross-channel coordination
  • MPPClaimHTLCSource (line 1618–1623): Identifies each MPP part channel
  • PaymentClaimDetails (line 1637–1642): Stored in ChannelMonitor for
    restart claim replay
  • HTLCClaimSource (line 1590–1595): Deserialization-time equivalent of
    MPPClaimHTLCSource

HTLC Resolution After Channel Closure

After closure, the ChannelMonitor takes over:

  • Watches for on-chain HTLC timeouts/claims (channelmonitor.rs:5257–5756)
  • Creates MonitorEvent::HTLCEvent with preimage (line 6134) or without
    (line 5607) as HTLCs resolve on-chain
  • Creates MonitorEvent::CommitmentTxConfirmed (line 5432) when commitment tx
    is detected
  • ChannelManager::process_monitor_events_for_failover()
    (channelmanager.rs:13247–13373) consumes these events to fail/claim upstream

The MonitorEvent enum (channelmonitor.rs:188–227) currently has:

  • HTLCEvent(HTLCUpdate) — HTLC resolved on-chain (claim or timeout)
  • HolderForceClosedWithInfo { reason, outpoint, channel_id } — we force-closed
  • HolderForceClosed(OutPoint) — legacy force-close
  • CommitmentTxConfirmed(()) — commitment tx confirmed on-chain
  • Completed { funding_txo, channel_id, monitor_update_id } — monitor update
    persisted

Events are currently fire-and-forget: get_and_clear_pending_monitor_events()
(channelmonitor.rs:4373–4377) does a mem::swap to drain them.
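
The fire-and-forget behavior can be sketched as below. `Monitor` and `Event` are hypothetical stand-ins; only the mem::swap drain pattern is taken from the description above. Once drained, the monitor retains no record of the events, so a crash before the caller processes them loses them.

```rust
use std::mem;

/// Hypothetical stand-in for MonitorEvent.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Event {
    HtlcResolved(u64),
}

/// Hypothetical stand-in for the relevant part of ChannelMonitor.
#[derive(Debug, Default)]
pub struct Monitor {
    pending_monitor_events: Vec<Event>,
}

impl Monitor {
    pub fn push_event(&mut self, ev: Event) {
        self.pending_monitor_events.push(ev);
    }

    /// Swaps the queue with an empty Vec and returns the old contents.
    pub fn get_and_clear_pending_monitor_events(&mut self) -> Vec<Event> {
        let mut ret = Vec::new();
        mem::swap(&mut ret, &mut self.pending_monitor_events);
        ret
    }
}
```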


The Painful On-Load Reconstruction

On deserialization, from_channel_manager_data() (channelmanager.rs:18635–20263)
must perform a vast reconstruction. Here is every section with exact line ranges:

Step 1: Channel vs. Monitor State Validation (lines 18688–18876)

For each deserialized FundedChannel, compare its commitment transaction
numbers against the corresponding ChannelMonitor:

channel.get_cur_holder_commitment_transaction_number()
    > monitor.get_cur_holder_commitment_number()
|| channel.get_revoked_counterparty_commitment_transaction_number()
    > monitor.get_min_seen_secret()
|| channel.get_cur_counterparty_commitment_transaction_number()
    > monitor.get_cur_counterparty_commitment_number()
|| channel.context.get_latest_monitor_update_id()
    < monitor.get_latest_update_id()

If the channel is behind the monitor: force-close with
ClosureReason::OutdatedChannelManager and fail any orphaned HTLCs not in the
monitor. This queues BackgroundEvent::MonitorUpdateRegeneratedOnStartup with
a ChannelForceClosed step.
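
The comparison above can be read as a single predicate. The sketch below uses a hypothetical `Counters` struct in place of the real Channel and ChannelMonitor getters. Commitment numbers in LDK count downward, which is why a larger number on the channel side means the channel is behind.

```rust
/// Hypothetical snapshot of the counters compared on load.
#[derive(Debug, Clone, Copy)]
pub struct Counters {
    pub holder_commitment_number: u64,
    /// On the monitor side this corresponds to get_min_seen_secret().
    pub revoked_counterparty_commitment_number: u64,
    pub counterparty_commitment_number: u64,
    pub latest_update_id: u64,
}

/// True if the channel's view is older than the monitor's, in which case the
/// channel is force-closed on load.
pub fn channel_is_stale(channel: &Counters, monitor: &Counters) -> bool {
    channel.holder_commitment_number > monitor.holder_commitment_number
        || channel.revoked_counterparty_commitment_number
            > monitor.revoked_counterparty_commitment_number
        || channel.counterparty_commitment_number > monitor.counterparty_commitment_number
        || channel.latest_update_id < monitor.latest_update_id
}
```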

Step 2: Closed Channel Monitor Processing (lines 18878–18935)

For monitors without a corresponding Channel (already closed), track their
latest update IDs in closed_channel_monitor_update_ids and queue force-close
monitor updates for monitors whose state still needs updating.

Step 3: In-Flight Monitor Update Replay (lines 18970–19205)

The handle_in_flight_updates! macro (lines 18982–19048) processes each
in_flight_monitor_updates entry:

  1. Compare each update's update_id against monitor.get_latest_update_id()
  2. If all completed: queue BackgroundEvent::MonitorUpdatesComplete with
    highest_update_id_completed
  3. If some pending: retain only incomplete updates, queue as
    BackgroundEvent::MonitorUpdateRegeneratedOnStartup for replay
  4. Validate that channel's unblocked update ID doesn't exceed monitor's ID

This macro is invoked twice: once for open channels (lines ~19050–19096) and
once for remaining closed-channel updates (lines ~19097–19139).
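
Steps 1 to 3 of the macro reduce to a filter over update IDs. The sketch below is an assumption-laden simplification: updates are represented by their IDs only, and `StartupAction` is a hypothetical stand-in for the two BackgroundEvent variants named above.

```rust
/// Hypothetical stand-in for the background events queued on startup.
#[derive(Debug, PartialEq, Eq)]
pub enum StartupAction {
    /// Every in-flight update was already applied by the monitor.
    UpdatesComplete { highest_update_id_completed: u64 },
    /// These update IDs were not applied and must be replayed.
    Regenerate(Vec<u64>),
}

/// Given the in-flight update IDs and the monitor's latest applied update ID,
/// retains only the incomplete updates and decides what to queue. Returns
/// `None` when nothing was in flight.
pub fn handle_in_flight_updates(
    in_flight: &mut Vec<u64>,
    monitor_latest_update_id: u64,
) -> Option<StartupAction> {
    let highest = in_flight.iter().copied().max()?;
    in_flight.retain(|id| *id > monitor_latest_update_id);
    if in_flight.is_empty() {
        Some(StartupAction::UpdatesComplete { highest_update_id_completed: highest })
    } else {
        Some(StartupAction::Regenerate(in_flight.clone()))
    }
}
```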

Step 4: Reconstruct/Deserialize Decision (lines 19207–19239)

The key branch: should we reconstruct HTLC state from monitors or use
persisted ChannelManager state?

// Non-test: always reconstruct for version >= RECONSTRUCT_HTLCS_FROM_CHANS_VERSION (2)
let reconstruct_manager_from_monitors = _version >= RECONSTRUCT_HTLCS_FROM_CHANS_VERSION;
// Test: random or controlled via env var

Step 5: HTLC Forwarding State Reconstruction (lines 19267–19512)

Two passes over all channel monitors:

First pass (lines 19267–19333): For each monitor with an open channel
(when reconstruct_manager_from_monitors):

  • Call inbound_htlcs_pending_decode() (channel.rs:7439–7448) to get WithOnion
    HTLCs → populate decode_update_add_htlcs
  • Call inbound_forwarded_htlcs() (channel.rs:7452–7507) to get already-
    forwarded HTLCs → populate already_forwarded_htlcs
  • For closed channels: call insert_from_monitor_on_startup() for outbound
    payments, process preimage claims via pending_outbounds.claim_htlc()

Second pass (lines 19334–19512): For each monitor:

  • For open channels with reconstruct_manager_from_monitors: call
    outbound_htlc_forwards() (channel.rs:7512–7533) and prune via
    dedup_decode_update_add_htlcs() and prune_forwarded_htlc()
  • For closed channels: call get_all_current_outbound_htlcs() and
    reconcile_pending_htlcs_with_monitor() for each; also handle
    get_onchain_failed_outbound_htlcs() → failed_htlcs

Step 6: Preimage Claim Replay from Monitors (lines 19514–19591)

For each monitor (open or closed), find outbound HTLCs with preimages:

  • Filter via get_all_current_outbound_htlcs() for HTLCs where
    preimage_opt.is_some()
  • Check that the inbound edge's monitor still exists (not archived)
  • Check claimable_balances().is_empty() to skip fully-resolved monitors
  • Verify counterparty_node_id.is_some() (required since 0.0.124)
  • Push to pending_claims_to_replay for later execution

Step 7: RAA-Blocking Restoration (lines 19695–19770)

Reconstruct actions_blocking_raa_monitor_updates from the persisted
monitor_update_blocked_actions_per_peer:

For each MonitorUpdateCompletionAction::EmitEventOptionAndFreeOtherChannel:

  • Find the blocked channel's peer state
  • Push blocking_action (an RAAMonitorUpdateBlockingAction) into
    actions_blocking_raa_monitor_updates[blocked_channel_id]
  • Handle edge case: pre-0.1 MPP claims where a channel blocked itself

Step 8: HTLC Deduplication (lines 19772–19798)

When reconstruct_manager_from_monitors:

  • Dedup failed_htlcs against decode_update_add_htlcs
  • Dedup claimable_payments against decode_update_add_htlcs
  • Choose between reconstructed maps vs legacy maps (lines 19800–19809)

Step 9: ChannelManager Construction (lines 19864–19926)

The ChannelManager struct is built with the reconstructed state, including
forward_htlcs, decode_update_add_htlcs, claimable_payments, and
pending_background_events.

Step 10: MPP Claim Replay from Monitor Preimages (lines 19928–20088)

For each monitor, call get_stored_preimages() to retrieve
(PaymentHash, (PaymentPreimage, Vec<PaymentClaimDetails>)):

  1. Cross-reference with already_forwarded_htlcs — if an inbound HTLC was
    forwarded to a downstream channel and the downstream has the preimage,
    push it to pending_claims_to_replay (lines 19935–19968)
  2. For each PaymentClaimDetails:
    • Dedup via processed_claims: HashSet<Vec<MPPClaimHTLCSource>>
    • Skip if already in pending_claiming_payments
    • Create fresh PendingMPPClaim with all channels in
      channels_without_preimage
    • Call begin_claiming_payment() + claim_mpp_part() for each part
      (lines 20001–20088)

Step 11: Legacy Preimage-Without-ClaimDetails Path (lines 20090–20196)

For preimages in monitors that have no PaymentClaimDetails (pre-0.3):

  • Remove payment from claimable_payments
  • For each HTLC part:
    • Call claim_htlc_while_disconnected_dropping_mon_update_legacy()
      on the channel (line 20141–20146)
    • Call provide_payment_preimage_unsafe_legacy() directly on the monitor
      (line 20164) — explicitly unsafe, noted as only for upgrade path
  • Push Event::PaymentClaimed manually

Step 12: Failed HTLC and Claim Execution (lines 20200–20257)

  • Call fail_htlc_backwards_internal() for all failed_htlcs
  • Fail any remaining already_forwarded_htlcs that weren't pruned
    (lines 20213–20227) — these are HTLCs the inbound channel thought were
    forwarded but the outbound channel doesn't have, implying they were failed
  • Call claim_funds_internal() for all pending_claims_to_replay
    (lines 20229–20257)

Step 13: Helper Functions (lines 20266–20352)

  • prune_forwarded_htlc() (lines 20266–20281): Remove specific HTLC from
    already_forwarded_htlcs
  • reconcile_pending_htlcs_with_monitor() (lines 20285–20352): Master dedup
    function that removes HTLCs from decode_update_add_htlcs,
    forward_htlcs_legacy, and pending_intercepted_htlcs_legacy when the
    monitor has taken responsibility

Proposed Architecture

New MonitorEvent Variants

Extend MonitorEvent (channelmonitor.rs:188) to cover all HTLC resolution
outcomes, not just post-close on-chain events:

pub enum MonitorEvent {
    // Existing variants (retained)
    HTLCEvent(HTLCUpdate),
    HolderForceClosedWithInfo { .. },
    HolderForceClosed(OutPoint),
    CommitmentTxConfirmed(()),
    Completed { .. },

    // New variants

    /// An HTLC was irrevocably committed to both commitment transactions and
    /// can now be forwarded/received. Generated when the ChannelMonitor
    /// processes a LatestHolderCommitment update containing the HTLC and the
    /// counterparty's revocation for the prior state has been received.
    ///
    /// Replaces the current flow where Channel pushes to
    /// monitor_pending_update_adds → decode_update_add_htlcs → forward_htlcs.
    HTLCAccepted {
        channel_id: ChannelId,
        htlc: msgs::UpdateAddHTLC,
    },

    /// A forwarded HTLC was claimed with a preimage. The ChannelManager should
    /// propagate the preimage to the inbound edge.
    ///
    /// Replaces the current flow where claim_funds_internal() directly drives
    /// the inbound channel + sets up RAA blocking on the outbound channel.
    ForwardedHTLCClaimed {
        source: HTLCSource,
        preimage: PaymentPreimage,
        downstream_value_msat: u64,
    },

    /// An inbound MPP payment part has been durably claimed with a preimage.
    /// This event is generated but NOT immediately surfaced — it is stored in
    /// deferred_restart_events and only replayed on restart to enable
    /// crash-safe MPP claiming without ChannelManager-side tracking.
    ///
    /// Replaces PendingMPPClaim, PendingMPPClaimPointer, and the complex
    /// on-load reconstruction in lines 19928-20088.
    InboundMPPClaimPersisted {
        payment_hash: PaymentHash,
        preimage: PaymentPreimage,
        htlc_source: HTLCPreviousHopData,
        claim_details: PaymentClaimDetails,
    },
}

New MonitorEvent Acknowledgment Path

Add a method on ChannelMonitor (and the chain::Watch trait) to acknowledge
processed events:

/// A unique identifier for a MonitorEvent, used for acknowledgment.
/// Monotonically increasing per-monitor counter.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct MonitorEventId(u64);

impl ChannelMonitor {
    /// Acknowledge that all MonitorEvents up to up_to_id have been processed
    /// by the ChannelManager. The monitor will remove them from its
    /// persistent state.
    ///
    /// This should be called after the ChannelManager has durably processed
    /// the events (i.e., after the ChannelManager has been re-persisted with
    /// the resulting state changes).
    pub fn acknowledge_monitor_events(&self, up_to_id: MonitorEventId) {
        // Body intentionally left open in this design sketch.
        todo!()
    }
}

Each MonitorEvent gets a unique MonitorEventId (monotonic counter per
monitor). Events remain in the monitor's persistent state until acknowledged.
On restart, unacknowledged events are replayed.

This is deliberately not a ChannelMonitorUpdate — acknowledgments flow in
the opposite direction and don't need the same ordering guarantees. However,
acknowledging events does trigger a monitor re-persist (since the monitor's
serialized state changed).
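
One way to picture the persist-until-acknowledged behavior is a queue that hands out events without removing them. The `PersistentEventQueue` type below is hypothetical, and it assumes acknowledgment is cumulative and inclusive of `up_to_id`, which the design above does not pin down.

```rust
use std::collections::VecDeque;

/// Monotonic per-monitor event identifier (field made public for the sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct MonitorEventId(pub u64);

/// Hypothetical event store: events stay queued until acknowledged.
#[derive(Debug)]
pub struct PersistentEventQueue<E> {
    next_id: u64,
    events: VecDeque<(MonitorEventId, E)>,
}

impl<E: Clone> PersistentEventQueue<E> {
    pub fn new() -> Self {
        Self { next_id: 0, events: VecDeque::new() }
    }

    /// Stores an event and returns its newly assigned ID.
    pub fn push(&mut self, event: E) -> MonitorEventId {
        let id = MonitorEventId(self.next_id);
        self.next_id += 1;
        self.events.push_back((id, event));
        id
    }

    /// Returns all unacknowledged events WITHOUT removing them, so that a
    /// restart (or a repeated call) sees them again.
    pub fn pending(&self) -> Vec<(MonitorEventId, E)> {
        self.events.iter().cloned().collect()
    }

    /// Drops every event whose ID is <= `up_to_id` (assumed inclusive).
    /// Returns true if anything was removed, i.e. a re-persist is needed.
    pub fn acknowledge(&mut self, up_to_id: MonitorEventId) -> bool {
        let before = self.events.len();
        while self.events.front().map_or(false, |(id, _)| *id <= up_to_id) {
            self.events.pop_front();
        }
        self.events.len() != before
    }
}
```

Because `pending()` is non-destructive, replay on restart is simply another call to `pending()`. The boolean returned by `acknowledge()` corresponds to the re-persist trigger mentioned above.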

New ChannelMonitorUpdateStep Variants

enum ChannelMonitorUpdateStep {
    // Existing variants retained...

    /// An HTLC has been irrevocably committed. The monitor should generate
    /// an HTLCAccepted MonitorEvent. This step is sent when the Channel
    /// determines the HTLC is in both commitment txns and the prior
    /// counterparty state is revoked.
    ///
    /// Replaces the monitor_pending_update_adds → decode_update_add_htlcs flow.
    HTLCIrrevocablyCommitted {
        update_add_htlc: msgs::UpdateAddHTLC,
    },

    /// The ChannelManager has decided to fulfill an HTLC with a preimage.
    /// For forwarded HTLCs, the monitor should generate a ForwardedHTLCClaimed
    /// event. The source identifies the inbound edge for preimage propagation.
    ///
    /// This extends the existing PaymentPreimage step to carry source info.
    FulfillHTLC {
        htlc_id: u64,
        preimage: PaymentPreimage,
        source: HTLCSource,
    },
}

Note: We may not need a new FailHTLC step. HTLC failures on open channels
still flow through normal commitment transaction negotiation. The monitor only
needs to handle failures post-close (which it already does via HTLCEvent
with payment_preimage: None).

Deferred MonitorEvents for MPP Claims

When the user calls claim_funds(preimage) for an MPP payment:

  1. The ChannelManager sends ChannelMonitorUpdates with
    PaymentPreimage + PaymentClaimDetails steps to each channel's monitor
    (same as today).

  2. Each ChannelMonitor, upon processing the preimage update, stores an
    InboundMPPClaimPersisted event in a new deferred_restart_events list
    (NOT in pending_monitor_events). This event is persisted with the monitor.

  3. On restart, the ChannelManager calls a new get_restart_events() method
    (or the existing get_and_clear_pending_monitor_events() is enhanced).
    Monitors return InboundMPPClaimPersisted events. The ChannelManager
    uses these to identify which MPP parts have been claimed and which haven't,
    then claims any missing parts.

  4. Once all MPP parts across all channels have the preimage durably stored
    (confirmed by all monitors having the InboundMPPClaimPersisted event),
    the ChannelManager acknowledges all the InboundMPPClaimPersisted
    events, removing them from the monitors.

This replaces the current on-load logic that iterates all monitors via
get_stored_preimages() and cross-references with claimable_payments /
pending_claiming_payments state (lines 19928–20088).
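
Steps 3 and 4 can be sketched as a set difference. The function names are hypothetical, and channels are reduced to u64 identifiers. `all_parts` would come from the PaymentClaimDetails carried in the events, and `persisted` from the set of monitors that returned an InboundMPPClaimPersisted event.

```rust
use std::collections::BTreeSet;

/// Returns the MPP parts whose monitors have not yet durably stored the
/// preimage; these are the parts that still need to be claimed on restart.
pub fn parts_missing_preimage(all_parts: &[u64], persisted: &BTreeSet<u64>) -> Vec<u64> {
    all_parts.iter().copied().filter(|c| !persisted.contains(c)).collect()
}

/// The deferred events may be acknowledged (and thus deleted from the
/// monitors) only once no part is missing.
pub fn can_acknowledge(all_parts: &[u64], persisted: &BTreeSet<u64>) -> bool {
    parts_missing_preimage(all_parts, persisted).is_empty()
}
```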

Resolution Flow (Open Channel — HTLC Acceptance)

ChannelManager                    Channel              ChannelMonitor
     |                               |                       |
     |-- receive update_add_htlc --> |                       |
     |                               |-- CMU: LatestHolder ->|
     |                               |                       |
     |<-- commitment_signed ---------|                       |
     |                               |-- CMU: CommitSecret ->|
     |                               |                       |
     |<-- revoke_and_ack ------------|                       |
     |                               |                       |
     |   [Channel confirms HTLC irrevocably committed]       |
     |                               |-- CMU: HTLCIrrev.  -->|
     |                               |   Committed           |
     |                                                       |
     |<------------- MonitorEvent::HTLCAccepted ------------|
     |                                                       |
     |-- [decode onion, forward/receive decision]            |
     |                                                       |

Resolution Flow (Claiming a Forwarded HTLC)

ChannelManager                    ChannelMonitor (downstream)
     |                                  |
     |<-- MonitorEvent::HTLCEvent ------|  (preimage from counterparty claim)
     |    (or during normal operation:  |
     |     preimage arrives via         |
     |     update_fulfill_htlc)         |
     |                                  |
     |-- CMU: FulfillHTLC + source ---->|
     |                                  |
     |   [Monitor stores preimage, generates ForwardedHTLCClaimed]
     |                                  |
     |<-- MonitorEvent::ForwardedHTLC --|
     |    Claimed                       |
     |                                  |
     |-- [send preimage to inbound      |
     |    channel's monitor via CMU]    |
     |                                  |
     |-- [once inbound confirmed:       |
     |    acknowledge event]            |
     |                                  |

Resolution Flow (Restart)

ChannelManager (new)              ChannelMonitor (from disk)
     |                                  |
     |-- get_pending_monitor_events() ->|
     |   + get_restart_events()         |
     |                                  |
     |<-- [all unacknowledged events] --|
     |    (HTLCAccepted, ForwardedHTLCClaimed,
     |     InboundMPPClaimPersisted, etc.)
     |                                  |
     |-- [process each event as if      |
     |    receiving it for first time]  |
     |                                  |
     |-- acknowledge_monitor_events() ->|
     |                                  |

No reconstruction logic needed. The monitor state IS the source of truth.


What Can Be Removed

Fields That Can Be Eliminated

In PeerState (channelmanager.rs:1709–1782)

| Field | Line | Why Removable |
|---|---|---|
| monitor_update_blocked_actions | 1760 | Completion actions move into monitor; ChannelManager no longer queues post-completion work |
| actions_blocking_raa_monitor_updates | 1765 | RAA blocking is eliminated entirely; safety comes from event acknowledgment |
| closed_channel_monitor_update_ids | 1775 | Monitors self-track their update IDs; ChannelManager no longer mirrors this for on-load dedup |

In ChannelContext (channel.rs:3120–3340)

| Field | Lines | Why Removable |
|---|---|---|
| monitor_pending_forwards | 3128 | Forwarding driven by MonitorEvent::HTLCAccepted; no buffering needed |
| monitor_pending_failures | 3129 | Failure propagation driven by MonitorEvent::HTLCEvent; no buffering needed |
| monitor_pending_finalized_fulfills | 3130 | Fulfill tracking moves to monitor's persistent events |
| monitor_pending_update_adds | 3131 | Replaced by MonitorEvent::HTLCAccepted |
| blocked_monitor_updates | 3339 | RAA blocking eliminated; all updates flow through immediately |

This also eliminates the race condition described in channel.rs:3124–3127.

In ChannelManagerData (channelmanager.rs:18013–18041)

| Field | Line | Why Removable |
|---|---|---|
| monitor_update_blocked_actions_per_peer | 18025–18026 | No more blocked actions to persist |
| in_flight_monitor_updates | 18030 | Monitor knows its own state; no need for CM to track |
| forward_htlcs_legacy | 18036 | Legacy map replaced by monitor events |
| pending_intercepted_htlcs_legacy | 18037 | Legacy map replaced by monitor events |
| decode_update_add_htlcs_legacy | 18038 | Legacy map replaced by monitor events |

In ChannelManager (runtime state, channelmanager.rs:2780–2820)

| Field | Line | Why Removable |
|---|---|---|
| decode_update_add_htlcs | 2807 | HTLCs-to-decode communicated via MonitorEvent::HTLCAccepted; onion decode happens inline |

Note: forward_htlcs (line 2789) and pending_intercepted_htlcs (line 2800)
are still needed for the forwarding pipeline. They are populated from monitor
events rather than from channel state.

Enums/Types That Can Be Simplified or Removed

| Type | Location | Why Removable |
|---|---|---|
| RAAMonitorUpdateBlockingAction | channelmanager.rs:1668–1700 | Entire enum: both variants serve RAA blocking which is eliminated |
| MonitorUpdateCompletionAction | channelmanager.rs:1454–1495 | EmitEventOptionAndFreeOtherChannel and FreeDuplicateClaimImmediately removed; PaymentClaimed simplified or moved to monitor |
| EventCompletionAction::ReleaseRAAChannelMonitorUpdate | channelmanager.rs:1557–1562 | No RAA blocking to release |
| EventUnblockedChannel | channelmanager.rs:1414–1420 | Only existed to carry RAA blocker info |
| PostMonitorUpdateChanResume | channelmanager.rs:1521–1538 | Drastically simplified: no more htlc_forwards, decode_update_add_htlcs, failed_htlcs fields needed |
| BackgroundEvent::MonitorUpdateRegeneratedOnStartup | channelmanager.rs:1397–1402 | No in-flight updates to regenerate on load |
| BackgroundEvent::MonitorUpdatesComplete | channelmanager.rs:1406–1410 | Simplified: completion tracking moves to monitor |
| PendingMPPClaim | channelmanager.rs:1609–1612 | Replaced by deferred InboundMPPClaimPersisted events |
| PendingMPPClaimPointer | channelmanager.rs:1650 | Goes with PendingMPPClaim |
| MPPClaimHTLCSource | channelmanager.rs:1618–1623 | Goes with MPP claim tracking |
| HTLCClaimSource | channelmanager.rs:1590–1595 | Only used during on-load reconstruction |
| MonitorRestoreUpdates | channel.rs:1176–1197 | Most fields removable; only raa, commitment_update, and protocol messages retained |
| InboundUpdateAdd::Forwarded | channel.rs:349–363 | Channel no longer needs to track forwarding state for on-load reconstruction |

Functions/Methods That Can Be Removed or Drastically Simplified

On-Load Reconstruction (the big win)

| Function/Section | Lines | Current Purpose | After Change |
| --- | --- | --- | --- |
| handle_in_flight_updates! macro | 18982–19048 | Replay in-flight monitor updates | Remove entirely: monitors know their own state |
| In-flight update handling for open channels | 19050–19096 | Match in-flight updates to open channels | Remove entirely |
| In-flight update handling for closed channels | 19097–19139 | Match in-flight updates to closed channels | Remove entirely |
| HTLC reconstruction from channels | 19274–19301 | Rebuild decode_update_add_htlcs and already_forwarded_htlcs from Channel state | Remove entirely: replaced by MonitorEvent::HTLCAccepted replay |
| Outbound forward dedup | 19334–19362 | Call outbound_htlc_forwards() and dedup_decode_update_add_htlcs() | Remove entirely |
| Outbound HTLC processing for closed channels | 19365–19512 | Cross-reference monitors with pending_outbound_payments, handle preimage claims and failures | Drastically simplify: monitor events carry all needed info; outbound payment tracking may still be needed for PaymentSent events |
| Preimage claim replay | 19514–19591 | Find preimages in monitors, check upstream monitors, queue for replay | Remove entirely: ForwardedHTLCClaimed events are replayed automatically |
| RAA-blocking restoration | 19695–19770 | Reconstruct actions_blocking_raa_monitor_updates from persisted monitor_update_blocked_actions_per_peer | Remove entirely |
| HTLC deduplication | 19772–19798 | Remove already-processed HTLCs from decode queues | Remove entirely: no decode queues to dedup |
| Legacy map selection | 19800–19809 | Choose between reconstructed and legacy maps | Remove entirely |
| dedup_decode_update_add_htlcs() | ~18525–18555 | Prevent double-forwarding by matching on prev_outbound_scid_alias and htlc_id | Remove entirely |
| prune_forwarded_htlc() | 20266–20281 | Remove forwarded HTLCs from tracking | Remove entirely |
| reconcile_pending_htlcs_with_monitor() | 20285–20352 | Master dedup function across decode_update_add_htlcs, forward_htlcs_legacy, pending_intercepted_htlcs_legacy | Remove entirely |
| MPP claim replay from get_stored_preimages() | 19928–20088 | Reconstruct PendingMPPClaim, call begin_claiming_payment() + claim_mpp_part() | Remove entirely: replaced by InboundMPPClaimPersisted replay |
| Legacy preimage path (no PaymentClaimDetails) | 20090–20196 | claim_htlc_while_disconnected_dropping_mon_update_legacy() + provide_payment_preimage_unsafe_legacy() | Remove entirely: no longer needed post-migration |
| Failed HTLC backwards propagation | 20200–20212 | fail_htlc_backwards_internal() for failed_htlcs | Simplified: some of this may still be needed for outbound payments, but forwarded-HTLC failures are handled by monitor events |
| Already-forwarded HTLC failure | 20213–20227 | Fail HTLCs that appear forwarded but are missing from outbound edge | Remove entirely: monitor events make this reconciliation unnecessary |
| Claim replay execution | 20229–20257 | claim_funds_internal() for pending_claims_to_replay | Remove entirely: replayed via MonitorEvents |

Estimated lines removed from from_channel_manager_data(): ~1200–1400 lines.

RAA-Blocking Infrastructure

| Function | Lines | Purpose | After Change |
| --- | --- | --- | --- |
| raa_monitor_updates_held() | 12671–12688 | Check actions_blocking_raa_monitor_updates + pending events for ReleaseRAAChannelMonitorUpdate | Remove entirely |
| test_raa_monitor_updates_held() | 12691–12707 | Test helper | Remove entirely |
| get_and_clear_pending_raa_blockers() | ~14935–14955 | Extract blockers for startup | Remove entirely |
| handle_monitor_update_release() | ~14962–15036 | Remove RAAMonitorUpdateBlockingAction, unblock channel's blocked_monitor_updates via unblock_next_blocked_monitor_update() | Remove entirely |
| handle_monitor_update_completion_actions() | 10103–10255 | Process MonitorUpdateCompletionAction variants: track PendingMPPClaim progress, emit events, free RAA blockers | Drastically simplify: only simple event emission remains |
| handle_post_event_actions() (ReleaseRAA path) | ~15040–15113 | When user handles PaymentForwarded, release the downstream channel's RAA via EventCompletionAction::ReleaseRAAChannelMonitorUpdate | Remove ReleaseRAA path |

Channel Methods (channel.rs)

| Method | Lines | Purpose | After Change |
| --- | --- | --- | --- |
| monitor_updating_paused() | 9079–9094 | Push pending forwards/failures/fulfills to monitor_pending_* fields | Remove entirely: no pending queues |
| monitor_updating_restored() | 9100–9234 | Drain monitor_pending_* fields into MonitorRestoreUpdates | Drastically simplify: only protocol message resends (raa, commitment_update) remain |
| unblock_next_blocked_monitor_update() | ~10755–10763 | Dequeue from blocked_monitor_updates | Remove entirely |
| push_ret_blockable_mon_update() | ~10768–10779 | Conditionally block or return monitor update | Remove entirely: updates always flow through |
| on_startup_drop_completed_blocked_mon_updates_through() | ~10784–10799 | Drop stale blocked updates on startup | Remove entirely |
| get_latest_unblocked_monitor_update_id() | ~4149–4154 | Track boundary of unblocked updates | Remove entirely: no blocking concept |
| inbound_htlcs_pending_decode() | 7439–7448 | Extract WithOnion HTLCs for on-load decode queue rebuild | Remove entirely |
| inbound_forwarded_htlcs() | 7452–7507 | Extract forwarded HTLCs for on-load already_forwarded_htlcs rebuild | Remove entirely |
| has_legacy_inbound_htlcs() | 7428–7435 | Detect pre-0.3 HTLC state (InboundUpdateAdd::Legacy) | Remove entirely (version migration) |
| outbound_htlc_forwards() | 7512–7533 | Extract outbound forwards for on-load dedup | Remove entirely |
| claim_htlc_while_disconnected_dropping_mon_update_legacy() | (channel.rs) | Legacy on-load claim that bypasses normal monitor update flow | Remove entirely |

Why RAA-Blocking Is Eliminated

The current RAA-blocking mechanism exists because:

  1. When a forwarded payment is claimed, the downstream channel's RAA monitor
    update (which, as a side effect of revoking the prior state, removes the
    preimage from one commitment transaction) must not complete before the
    upstream channel's monitor update (which adds the preimage) is durable.
    Otherwise, on restart, the preimage might be lost from the downstream
    monitor while the upstream monitor never received it.

  2. For MPP payments, all channel monitors must have the preimage before any of
    them can have it removed from a commitment transaction via revocation. The
    PendingMPPClaim shared pointer coordinates this across channels.

In the new architecture, this is handled naturally:

  • The ChannelMonitor stores preimages in payment_preimages
    (channelmonitor.rs:1272) durably. The preimage is never "lost" from the
    monitor's state due to a revocation — it lives in a separate map.

  • ForwardedHTLCClaimed events persist until the ChannelManager acknowledges
    them. The ChannelManager only acknowledges after confirming the preimage is
    durable on the inbound edge.

  • For MPP, InboundMPPClaimPersisted events persist in each monitor until all
    parts are confirmed claimed. On restart, any missing parts are re-claimed.

  • ChannelMonitorUpdates flow through immediately — no hold_mon_update
    parameter on revoke_and_ack(), no blocked_monitor_updates queue. The
    safety guarantee comes from the acknowledgment path, not from blocking the
    update path.
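
The first bullet above can be pictured with a toy model. This is only a sketch under stated assumptions: `MonitorSketch`, `commitment_htlcs`, `provide_preimage`, and `revoke_prior_state` are hypothetical names, and only `payment_preimages` comes from the text. It shows why keeping preimages in a map separate from per-commitment data means a revocation cannot discard them.

```rust
use std::collections::HashMap;

/// Hypothetical stand-ins for 32-byte hashes and preimages.
type PaymentHash = [u8; 32];
type PaymentPreimage = [u8; 32];

/// Toy monitor: per-commitment HTLC data and the preimage map are stored
/// separately, so revoking a commitment does not touch stored preimages.
#[derive(Default)]
pub struct MonitorSketch {
    /// Mirrors the `payment_preimages` map named in the text.
    pub payment_preimages: HashMap<PaymentHash, PaymentPreimage>,
    /// HTLCs still present in the latest unrevoked commitment (assumed shape).
    pub commitment_htlcs: Vec<PaymentHash>,
}

impl MonitorSketch {
    /// Record a preimage in the durable map.
    pub fn provide_preimage(&mut self, hash: PaymentHash, preimage: PaymentPreimage) {
        self.payment_preimages.insert(hash, preimage);
    }

    /// Revoking the prior state drops the HTLC from the commitment data only.
    pub fn revoke_prior_state(&mut self, resolved: &PaymentHash) {
        self.commitment_htlcs.retain(|h| h != resolved);
    }
}
```

In this model the order of the two calls does not matter for the preimage map, which is the property the new design relies on.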


Detailed Design: Inbound MPP Claiming

Current Approach (Complex)

  1. User calls claim_funds(preimage) (channelmanager.rs:9206)
  2. begin_claiming_payment() moves payment from claimable_payments to
    pending_claiming_payments (channelmanager.rs:~1319–1380)
  3. For each MPP part, claim_mpp_part() (channelmanager.rs:9563+):
    a. Calls Channel::get_update_fulfill_htlc_and_commit() for open channels
    b. Creates ChannelMonitorUpdate with PaymentPreimage step + PaymentClaimDetails
    c. Sets up shared PendingMPPClaim (channels_without_preimage/channels_with_preimage)
    d. Creates RAAMonitorUpdateBlockingAction::ClaimedMPPPayment per channel
    e. Creates MonitorUpdateCompletionAction::PaymentClaimed per channel
  4. As each monitor update completes,
    handle_monitor_update_completion_actions() (lines 10147–10155) moves
    entries between PendingMPPClaim lists
  5. When all channels have preimage: free all RAA blockers, emit
    Event::PaymentClaimed
  6. On restart: iterate all monitors' get_stored_preimages(), reconstruct
    PendingMPPClaim, dedup via processed_claims, call
    begin_claiming_payment() + claim_mpp_part() for each

New Approach (Simple)

  1. User calls claim_funds(preimage)
  2. ChannelManager sends ChannelMonitorUpdate with PaymentPreimage +
    PaymentClaimDetails to each channel's monitor
  3. Each ChannelMonitor, upon processing the update:
    a. Stores the preimage in payment_preimages
    b. Stores an InboundMPPClaimPersisted event in deferred_restart_events
    c. For open channels: fulfills the HTLC in the commitment transaction
    normally
  4. The ChannelManager tracks confirmed parts via MonitorEvent::Completed
  5. Once all parts confirmed: emit Event::PaymentClaimed, acknowledge all
    InboundMPPClaimPersisted events
  6. On restart: monitors replay InboundMPPClaimPersisted events →
    ChannelManager identifies which parts were claimed → claims missing
    parts → done
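
Steps 4 to 6 of the new approach amount to set bookkeeping, sketched below. `MppClaimTracker` and `PartId` are hypothetical names, and the sketch assumes the full list of parts is known on restart (for example because the replayed event carries it), which the text does not spell out.

```rust
use std::collections::BTreeSet;

/// Hypothetical identifier for one MPP part: (channel id, HTLC id).
pub type PartId = (u64, u64);

/// Tracks which parts of one inbound MPP payment have a durable preimage.
pub struct MppClaimTracker {
    all_parts: BTreeSet<PartId>,
    confirmed: BTreeSet<PartId>,
}

impl MppClaimTracker {
    pub fn new(parts: impl IntoIterator<Item = PartId>) -> Self {
        Self { all_parts: parts.into_iter().collect(), confirmed: BTreeSet::new() }
    }

    /// Step 4: a monitor reported its update as completed. Returns true once
    /// every part is confirmed, which is when step 5 would emit
    /// PaymentClaimed and acknowledge the deferred events.
    pub fn part_confirmed(&mut self, part: PartId) -> bool {
        if self.all_parts.contains(&part) {
            self.confirmed.insert(part);
        }
        self.confirmed.len() == self.all_parts.len()
    }

    /// Step 6: given the parts whose monitors replayed an
    /// InboundMPPClaimPersisted event, list the parts that still need a claim.
    pub fn missing_after_restart(
        all_parts: &BTreeSet<PartId>,
        replayed: &BTreeSet<PartId>,
    ) -> Vec<PartId> {
        all_parts.difference(replayed).copied().collect()
    }
}
```

Unlike the shared PendingMPPClaim pointer, nothing here has to be persisted by the ChannelManager: the inputs are rebuilt from what the monitors replay.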

What Goes Away

  • PendingMPPClaim / PendingMPPClaimPointer (channelmanager.rs:1609–1663)
  • RAAMonitorUpdateBlockingAction::ClaimedMPPPayment variant (line 1685)
  • The completion-tracking in handle_monitor_update_completion_actions()
    (channelmanager.rs:10116–10223)
  • On-load MPP reconstruction (channelmanager.rs:19928–20088, ~160 lines)
  • Legacy preimage path (channelmanager.rs:20090–20196, ~106 lines)
  • MPPClaimHTLCSource, HTLCClaimSource, processed_claims HashSet

Detailed Design: Forwarded HTLC Claiming

Current Approach

  1. Downstream channel receives preimage from counterparty (via
    update_fulfill_htlc or on-chain claim)
  2. ChannelManager::claim_funds_internal() is called
  3. For the upstream (inbound) channel:
    a. Channel::get_update_fulfill_htlc_and_commit() generates a
    ChannelMonitorUpdate on the upstream channel with a PaymentPreimage
    step
    b. A RAAMonitorUpdateBlockingAction::ForwardedPaymentInboundClaim
    (channelmanager.rs:1672–1677) blocks the downstream channel's next RAA
    c. A MonitorUpdateCompletionAction::EmitEventOptionAndFreeOtherChannel
    (channelmanager.rs:1474–1477) is stored on the upstream channel's
    monitor_update_blocked_actions, pairing Event::PaymentForwarded with
    an EventUnblockedChannel that will free the downstream
  4. When the upstream monitor update completes:
    handle_monitor_update_completion_actions() emits PaymentForwarded and
    calls handle_monitor_update_release() to remove the RAA blocker
  5. On restart: reconstruct blockers from
    monitor_update_blocked_actions_per_peer (lines 19695–19770)

New Approach

  1. Downstream channel receives preimage from counterparty
  2. ChannelManager sends a FulfillHTLC ChannelMonitorUpdate to the
    downstream ChannelMonitor, including the HTLCSource identifying the
    upstream edge
  3. Downstream ChannelMonitor generates ForwardedHTLCClaimed event
    (persistent until acknowledged)
  4. ChannelManager receives ForwardedHTLCClaimed, sends preimage to
    upstream channel via ChannelMonitorUpdate with PaymentPreimage step
  5. When upstream monitor confirms preimage storage (via
    MonitorEvent::Completed): ChannelManager acknowledges the
    ForwardedHTLCClaimed event on the downstream monitor
  6. ChannelManager emits Event::PaymentForwarded
  7. On restart: downstream monitor replays unacknowledged
    ForwardedHTLCClaimed → ChannelManager re-sends preimage to upstream →
    safe
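
The ordering in steps 3 to 7 can be sketched as follows. All type and method names here are hypothetical, and persistence is modelled as a synchronous in-memory write, which glosses over the asynchronous MonitorEvent::Completed round trip in step 5.

```rust
use std::collections::{BTreeMap, HashSet};

type PaymentHash = [u8; 32];

/// Downstream monitor side: claim events persist until acknowledged.
#[derive(Default)]
pub struct DownstreamMonitor {
    next_id: u64,
    unacked_claims: BTreeMap<u64, PaymentHash>,
}

impl DownstreamMonitor {
    /// Step 3: applying the fulfill step records a ForwardedHTLCClaimed event.
    pub fn apply_fulfill(&mut self, hash: PaymentHash) -> u64 {
        let id = self.next_id;
        self.next_id += 1;
        self.unacked_claims.insert(id, hash);
        id
    }

    /// Steps 4 and 7: events are returned without being cleared.
    pub fn pending_claims(&self) -> Vec<(u64, PaymentHash)> {
        self.unacked_claims.iter().map(|(id, h)| (*id, *h)).collect()
    }

    /// Step 5: drop the event once the upstream preimage is durable.
    pub fn acknowledge(&mut self, id: u64) {
        self.unacked_claims.remove(&id);
    }
}

/// Upstream monitor side: the set of hashes whose preimage is persisted.
#[derive(Default)]
pub struct UpstreamMonitor {
    durable_preimages: HashSet<PaymentHash>,
}

impl UpstreamMonitor {
    pub fn persist_preimage(&mut self, hash: PaymentHash) {
        self.durable_preimages.insert(hash);
    }

    pub fn has_preimage(&self, hash: &PaymentHash) -> bool {
        self.durable_preimages.contains(hash)
    }
}

/// One manager pass: persist upstream first, acknowledge downstream second.
/// Returns how many claims were handled. Safe to call again after a restart,
/// because re-persisting a preimage is a no-op in this model.
pub fn process_claims(down: &mut DownstreamMonitor, up: &mut UpstreamMonitor) -> usize {
    let mut handled = 0;
    for (id, hash) in down.pending_claims() {
        up.persist_preimage(hash);
        down.acknowledge(id);
        handled += 1;
    }
    handled
}
```

A crash between `persist_preimage` and `acknowledge` leaves the event pending, so the next pass repeats the upstream write rather than losing the claim.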

What Goes Away

  • RAAMonitorUpdateBlockingAction::ForwardedPaymentInboundClaim variant
    (channelmanager.rs:1672–1677)
  • MonitorUpdateCompletionAction::EmitEventOptionAndFreeOtherChannel
    (channelmanager.rs:1474–1477)
  • EventCompletionAction::ReleaseRAAChannelMonitorUpdate
    (channelmanager.rs:1557–1562)
  • EventUnblockedChannel struct (channelmanager.rs:1414–1447)
  • The entire blocked_monitor_updates mechanism in Channel (channel.rs:3339)
  • All hold_mon_update logic in Channel::revoke_and_ack()
    (channel.rs:~8675–8694)
  • The RAA-blocking restoration on load (channelmanager.rs:19695–19770)

Detailed Design: HTLC Forwarding via MonitorEvent

Current Approach

When an HTLC becomes irrevocably committed in the Channel:

  1. revoke_and_ack() (channel.rs:~8587) transitions it to Committed with
    InboundUpdateAdd::WithOnion
  2. The update_add_htlc message is pushed to monitor_pending_update_adds
  3. When monitor update completes, monitor_updating_restored() drains
    monitor_pending_update_adds into MonitorRestoreUpdates::pending_update_adds
  4. ChannelManager puts them into decode_update_add_htlcs map (keyed by
    outbound SCID alias)
  5. process_pending_update_add_htlcs() (channelmanager.rs:7195–7535) decodes
    each onion and routes to forward_htlcs
  6. process_pending_htlc_forwards() (channelmanager.rs:7558–7645) forwards or
    receives

On restart, inbound_htlcs_pending_decode() extracts WithOnion HTLCs to
rebuild the decode_update_add_htlcs map, and complex deduplication prevents
double-forwarding.

New Approach

  1. When revoke_and_ack() confirms an HTLC as irrevocably committed, the
    Channel sends a ChannelMonitorUpdate with HTLCIrrevocablyCommitted
    step containing the update_add_htlc message
  2. The ChannelMonitor processes this step and generates a
    MonitorEvent::HTLCAccepted event (persistent until acknowledged)
  3. ChannelManager receives HTLCAccepted, decodes the onion, and routes
    to forward_htlcs or handles as a final payment
  4. After forwarding/receiving, ChannelManager acknowledges the event
  5. On restart: unacknowledged HTLCAccepted events are replayed → onion is
    decoded again → forwarding happens again → idempotent (the downstream
    channel will reject the duplicate update_add_htlc)
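
Step 5 depends on a replayed forward being recognized as a duplicate. The text does not say how that is detected; the sketch below assumes the outbound side remembers the inbound source of every HTLC it has already forwarded. `OutboundChannelSketch` and `InboundHtlcKey` are hypothetical names.

```rust
use std::collections::HashSet;

/// Hypothetical key for an inbound HTLC: (inbound channel id, HTLC id).
pub type InboundHtlcKey = (u64, u64);

/// Toy outbound channel that forwards each inbound HTLC at most once.
#[derive(Default)]
pub struct OutboundChannelSketch {
    forwarded_sources: HashSet<InboundHtlcKey>,
    /// Number of update_add_htlc messages actually sent.
    pub sent_update_adds: usize,
}

impl OutboundChannelSketch {
    /// Returns true if a new update_add_htlc was sent, false if the forward
    /// was a replay of an HTLC that had already been forwarded.
    pub fn forward(&mut self, source: InboundHtlcKey) -> bool {
        if !self.forwarded_sources.insert(source) {
            return false;
        }
        self.sent_update_adds += 1;
        true
    }
}
```

Under this assumption, replaying an unacknowledged HTLCAccepted event after a restart is harmless: the second `forward` call is a no-op.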

What Goes Away

  • decode_update_add_htlcs map (channelmanager.rs:2807)
  • monitor_pending_update_adds field (channel.rs:3131)
  • monitor_pending_forwards field (channel.rs:3128) — forwarding is driven
    by events, not buffered in the channel
  • InboundUpdateAdd::WithOnion variant — the monitor holds the raw
    update_add_htlc until acknowledged, so the channel doesn't need to
  • InboundUpdateAdd::Forwarded variant — no longer needed for on-load
    reconstruction
  • inbound_htlcs_pending_decode() (channel.rs:7439–7448)
  • inbound_forwarded_htlcs() (channel.rs:7452–7507)
  • outbound_htlc_forwards() (channel.rs:7512–7533)
  • All on-load dedup logic (dedup_decode_update_add_htlcs,
    reconcile_pending_htlcs_with_monitor, prune_forwarded_htlc)
  • The already_forwarded_htlcs temporary map in from_channel_manager_data()
    (lines 19251–19254)

New ChannelMonitor State

The ChannelMonitorImpl (channelmonitor.rs:1199–1400) needs new fields:

pub(crate) struct ChannelMonitorImpl<Signer: EcdsaChannelSigner> {
    // ... existing fields ...

    /// MonitorEvents that have been generated but not yet acknowledged by the
    /// ChannelManager. These survive serialization and are replayed on restart.
    /// Replaces the fire-and-forget `pending_monitor_events` for new event types.
    pending_unacknowledged_events: Vec<(MonitorEventId, MonitorEvent)>,

    /// The next MonitorEventId to assign.
    next_event_id: u64,

    /// Deferred events (e.g., InboundMPPClaimPersisted) that should not be
    /// surfaced to the ChannelManager during normal operation but should be
    /// replayed on restart. These are stored separately so that
    /// get_and_clear_pending_monitor_events() doesn't return them.
    deferred_restart_events: Vec<(MonitorEventId, MonitorEvent)>,
}

The existing pending_monitor_events: Vec<MonitorEvent> field
(channelmonitor.rs:1283) is kept for backwards compatibility with existing
MonitorEvent variants (on-chain events) during the migration period, then
eventually deprecated.
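
A minimal sketch of how the three new fields might work together, assuming `MonitorEventId` is a newtype over `u64` (the later `id.0` accesses suggest this) and using a two-variant stand-in for `MonitorEvent`. `EventQueues` and its methods are hypothetical.

```rust
/// Assumed shape of the event ID: a newtype over a monotonic counter.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct MonitorEventId(pub u64);

/// Illustrative subset of events; the real enum has more variants and fields.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum MonitorEvent {
    HTLCAccepted { htlc_id: u64 },
    InboundMPPClaimPersisted { htlc_id: u64 },
}

/// Stand-in for the new ChannelMonitorImpl fields.
#[derive(Default)]
pub struct EventQueues {
    pending_unacknowledged_events: Vec<(MonitorEventId, MonitorEvent)>,
    deferred_restart_events: Vec<(MonitorEventId, MonitorEvent)>,
    next_event_id: u64,
}

impl EventQueues {
    /// Assigns the next ID and files the event in the queue matching its kind.
    pub fn push(&mut self, event: MonitorEvent) -> MonitorEventId {
        let id = MonitorEventId(self.next_event_id);
        self.next_event_id += 1;
        match event {
            // Deferred events are only surfaced on restart.
            MonitorEvent::InboundMPPClaimPersisted { .. } => {
                self.deferred_restart_events.push((id, event))
            }
            _ => self.pending_unacknowledged_events.push((id, event)),
        }
        id
    }

    /// Events visible to the ChannelManager during normal operation.
    pub fn pending(&self) -> &[(MonitorEventId, MonitorEvent)] {
        &self.pending_unacknowledged_events
    }

    /// Events replayed only on restart.
    pub fn deferred(&self) -> &[(MonitorEventId, MonitorEvent)] {
        &self.deferred_restart_events
    }
}
```

Both queues draw IDs from the same counter here, matching the single `next_event_id` field in the struct above.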

Serialization Changes

pending_unacknowledged_events and deferred_restart_events must be
serialized as new TLV fields in ChannelMonitorImpl. The existing
MonitorEvent serialization (channelmonitor.rs:228–246) supports existing
variants; new variants need new TLV tags in the MonitorEvent enum.
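
To illustrate why additive TLV fields let old monitor data load cleanly, here is a toy reader. It is a simplified stand-in, not LDK's serialization macros: records are `(type: u8, length: u8, value)`, and the type numbers and the `NewMonitorFields` struct are hypothetical.

```rust
use std::convert::TryInto;

/// Parsed view of the hypothetical new monitor fields.
#[derive(Debug, Default, PartialEq)]
pub struct NewMonitorFields {
    pub next_event_id: u64,
    pub pending_event_ids: Vec<u64>,
}

// Hypothetical TLV type numbers, chosen only for this example.
pub const TLV_NEXT_EVENT_ID: u8 = 1;
pub const TLV_PENDING_EVENT_IDS: u8 = 3;

/// Reads a toy TLV stream. Absent records keep their defaults and unknown
/// records are skipped. Returns None on truncated or malformed input.
pub fn read_new_fields(mut bytes: &[u8]) -> Option<NewMonitorFields> {
    let mut out = NewMonitorFields::default();
    while !bytes.is_empty() {
        if bytes.len() < 2 {
            return None;
        }
        let (ty, len) = (bytes[0], bytes[1] as usize);
        let rest = &bytes[2..];
        if rest.len() < len {
            return None;
        }
        let (value, tail) = rest.split_at(len);
        match ty {
            TLV_NEXT_EVENT_ID => {
                out.next_event_id = u64::from_be_bytes(value.try_into().ok()?);
            }
            TLV_PENDING_EVENT_IDS => {
                if len % 8 != 0 {
                    return None;
                }
                out.pending_event_ids = value
                    .chunks_exact(8)
                    .map(|c| u64::from_be_bytes(c.try_into().unwrap()))
                    .collect();
            }
            _ => {} // Unknown record: skip it.
        }
        bytes = tail;
    }
    Some(out)
}
```

An empty stream, which is what data written by older code would look like for these fields, parses to an empty event list and a zero counter.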

get_and_clear_pending_monitor_events() Changes

The current implementation (channelmonitor.rs:4373–4377) does mem::swap to
drain events. In the new design:

fn get_pending_monitor_events(&self) -> Vec<(MonitorEventId, MonitorEvent)> {
    // Return copies of unacknowledged events without clearing.
    self.pending_unacknowledged_events.clone()
}

fn get_restart_events(&self) -> Vec<(MonitorEventId, MonitorEvent)> {
    // Called only on restart. Returns deferred events.
    self.deferred_restart_events.clone()
}

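fn acknowledge_events(&mut self, up_to_id: MonitorEventId) {
    // Drop every event with an ID at or below `up_to_id`, keeping newer ones.
    // Both queues are filtered with the same bound, so an acknowledgment also
    // removes any deferred events at or below `up_to_id`.
    self.pending_unacknowledged_events.retain(|(id, _)| id.0 > up_to_id.0);
    self.deferred_restart_events.retain(|(id, _)| id.0 > up_to_id.0);
}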

The ChannelManager tracks the highest acknowledged MonitorEventId per
monitor (either in its own state or by querying the monitor) to distinguish
"new" from "already-processed" events during normal operation. A simple
approach: after processing new events, immediately acknowledge them (the
monitor will re-persist, and if the ChannelManager crashes before
re-persisting, the events will replay on restart — which is the desired
behavior).
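
One way the ChannelManager could keep the per-monitor high-water mark described above is sketched here. `AckTracker`, `take_new`, and `ack_up_to` are hypothetical names, and event IDs are plain `u64` values to keep the example self-contained.

```rust
use std::collections::HashMap;

pub type ChannelId = [u8; 32];

/// Remembers, per monitor, the highest event ID already processed.
#[derive(Default)]
pub struct AckTracker {
    highest_processed: HashMap<ChannelId, u64>,
}

impl AckTracker {
    /// Filters out events at or below the stored mark, returns the rest, and
    /// advances the mark to the highest ID returned.
    pub fn take_new<E: Clone>(&mut self, chan: ChannelId, events: &[(u64, E)]) -> Vec<(u64, E)> {
        let mark = self.highest_processed.get(&chan).copied();
        let fresh: Vec<(u64, E)> = events
            .iter()
            .filter(|(id, _)| mark.map_or(true, |m| *id > m))
            .cloned()
            .collect();
        if let Some(max) = fresh.iter().map(|(id, _)| *id).max() {
            self.highest_processed.insert(chan, max);
        }
        fresh
    }

    /// The ID to pass to the monitor's acknowledgment call, if any.
    pub fn ack_up_to(&self, chan: &ChannelId) -> Option<u64> {
        self.highest_processed.get(chan).copied()
    }
}
```

Because the monitor returns events without clearing them, the same event can show up in consecutive polls until it is acknowledged; the mark keeps it from being processed twice in the meantime.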


Interaction with Existing Constraints

"MonitorEvents MUST NOT be generated during update processing"

The existing constraint (channelmonitor.rs:~1274–1282) says:

MonitorEvents MUST NOT be generated during update processing, only generated
during chain data processing.

This constraint exists because of a race in ChainMonitor::update_channel
where the in-memory state is updated under a read-lock, but persistence hasn't
completed yet. If events were generated during update processing and consumed
before persistence, a restart would replay the update but the event would be
lost.

In the new design, this constraint is relaxed because:

  • Events are persistent-until-acknowledged
  • Even if an event is generated during update processing and the update isn't
    persisted, on restart the update will be replayed and the event regenerated
  • The acknowledgment path ensures the ChannelManager won't "lose" events

However, we must ensure idempotent event generation — replaying a
ChannelMonitorUpdate must not duplicate events. The monitor should check
whether an event for a given HTLC already exists before generating a new one.
This is straightforward since events carry enough identifying information
(channel_id + htlc_id) for dedup.
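
The dedup check could look like the sketch below, assuming events are keyed by `channel_id` and `htlc_id` as stated. `HtlcEvent` and `PendingEvents` are hypothetical names.

```rust
/// Identifying information carried by an event (assumed minimal shape).
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct HtlcEvent {
    pub channel_id: [u8; 32],
    pub htlc_id: u64,
}

/// Pending events with IDs assigned from a monotonic counter.
#[derive(Default)]
pub struct PendingEvents {
    next_id: u64,
    events: Vec<(u64, HtlcEvent)>,
}

impl PendingEvents {
    /// Adds the event unless one for the same (channel_id, htlc_id) is
    /// already pending. Returns the ID of the new or existing event.
    pub fn generate(&mut self, ev: HtlcEvent) -> u64 {
        if let Some((id, _)) = self.events.iter().find(|(_, e)| *e == ev) {
            return *id;
        }
        let id = self.next_id;
        self.next_id += 1;
        self.events.push((id, ev));
        id
    }

    /// Number of events currently pending.
    pub fn len(&self) -> usize {
        self.events.len()
    }
}
```

This only deduplicates against events that are still pending. Whether a replayed update can regenerate an event that was already acknowledged and removed is a separate question that the sketch does not address.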

Chain Watch Trait

The chain::Watch trait's release_pending_monitor_events() method
(chain/mod.rs:345–347) needs to change:

/// Returns pending MonitorEvents with their IDs for acknowledgment tracking.
fn release_pending_monitor_events(&self)
    -> Vec<(OutPoint, ChannelId, Vec<(MonitorEventId, MonitorEvent)>, PublicKey)>;

/// Acknowledge processed events up to the given ID per monitor.
fn acknowledge_monitor_events(&self,
    channel_id: &ChannelId, up_to_id: MonitorEventId);

Migration Strategy

Backwards Compatibility

  • Old ChannelMonitor state can be read by new code. New fields are additive
    TLVs with defaults (empty vecs, zero counter).
  • Old ChannelManager state can still be loaded. On first load with new code,
    the on-load reconstruction runs one final time (the existing logic is
    retained behind the version check). After the ChannelManager is
    re-persisted, all state is in the new format.
  • Bump SERIALIZATION_VERSION (channelmanager.rs:17246) and
    RECONSTRUCT_HTLCS_FROM_CHANS_VERSION (channelmanager.rs:17258) to gate
    new behavior.

Phased Approach

Phase 1: Persistent MonitorEvents with acknowledgment

  • Add MonitorEventId, pending_unacknowledged_events,
    deferred_restart_events to ChannelMonitorImpl
  • Add acknowledge_monitor_events() to ChannelMonitor and chain::Watch
  • Change get_and_clear_pending_monitor_events() to not clear for new events
  • Existing MonitorEvent variants (HTLCEvent, CommitmentTxConfirmed, etc.)
    continue to use the old fire-and-forget path during this phase

Phase 2: Move HTLC forwarding to monitor-driven events

  • Add HTLCIrrevocablyCommitted step and HTLCAccepted event
  • Channel generates the new step in revoke_and_ack() instead of pushing
    to monitor_pending_update_adds
  • ChannelManager processes HTLCAccepted in the event loop
  • Remove decode_update_add_htlcs map, monitor_pending_update_adds,
    monitor_pending_forwards
  • Remove inbound_htlcs_pending_decode(), inbound_forwarded_htlcs(),
    outbound_htlc_forwards(), all dedup helpers

Phase 3: Move forwarded HTLC claiming to monitor events

  • Add FulfillHTLC step and ForwardedHTLCClaimed event
  • Remove RAA-blocking infrastructure entirely:
    RAAMonitorUpdateBlockingAction, EventCompletionAction::ReleaseRAA,
    blocked_monitor_updates, hold_mon_update parameter, etc.
  • Remove MonitorUpdateCompletionAction::EmitEventOptionAndFreeOtherChannel

Phase 4: Move inbound MPP claiming to monitor-driven events

  • Add InboundMPPClaimPersisted event
  • Modify claim_funds_internal() to rely on deferred monitor events
  • Remove PendingMPPClaim, PendingMPPClaimPointer,
    ClaimedMPPPayment RAA blocker
  • Remove MPPClaimHTLCSource, HTLCClaimSource

Phase 5: Simplify on-load logic

  • Replace all reconstruction logic with simple MonitorEvent replay loop
  • Remove ~1200–1400 lines from from_channel_manager_data()
  • Remove legacy map handling, legacy preimage paths, _legacy suffixed fields
  • The on-load code reduces to:
    for (channel_id, monitor) in channel_monitors {
        for (event_id, event) in monitor.get_pending_monitor_events() {
            process_monitor_event(event);
        }
        for (event_id, event) in monitor.get_restart_events() {
            process_monitor_event(event);
        }
    }

Risks and Open Questions

  1. Monitor persistence size: Storing events until acknowledged increases the
    persistent monitor size. Events are relatively small (one per HTLC), but
    high-volume nodes could accumulate events if the ChannelManager is slow
    to persist. Mitigation: batch acknowledgments; keep events compact; bound
    the maximum number of unacknowledged events.

  2. Idempotent event generation: When a ChannelMonitorUpdate is replayed
    on restart, the monitor must not duplicate events. The implementation must
    check for existing events with the same HTLC identifier before generating
    new ones.

  3. Backwards compatibility on upgrade: First load with new code must bootstrap
    the new event state from the existing reconstruction. The existing
    reconstruction logic runs one final time, then the ChannelManager is
    re-persisted in the new format. This means the existing reconstruction
    code must be maintained (but can be feature-gated) until we're confident
    all users have upgraded.

  4. InboundHTLCState / InboundUpdateAdd simplification: The Channel
    still needs to track HTLCs for commitment transaction negotiation, but
    InboundUpdateAdd::Forwarded (which exists solely for on-load
    reconstruction) can be removed. The Channel can transition directly from
    WithOnion to "onion consumed" without tracking forwarding state.

  5. Timing of HTLCAccepted events: The monitor needs to know when an
    HTLC is irrevocably committed. Today, this is derived from the Channel's
    InboundHTLCState machine. In the new design, the Channel sends an
    explicit HTLCIrrevocablyCommitted step at the right moment. The monitor
    doesn't need to replicate the state machine — it just needs to generate the
    event when it receives the step. This is simpler than having the monitor
    independently track HTLC lifecycle states.

  6. Performance: The current "clear on read" approach for MonitorEvents is
    zero-cost at read time. Persistent events require cloning on read and
    additional serialization. However:

    • Events are small (~100 bytes each)
    • Acknowledgment can batch (one call covers many events)
    • The reduction in on-load complexity is judged to be worth the small
      runtime cost

  7. Trampoline forwards: HTLCSource::TrampolineForward contains multiple
    previous_hop_data entries. The new event system must handle this — a
    single ForwardedHTLCClaimed event for a trampoline forward should
    carry all upstream sources. This is straightforward since the
    HTLCSource enum already handles this.


Summary

| Metric | Current | Proposed |
| --- | --- | --- |
| On-load reconstruction lines | ~1600 (18635–20263) | ~50 (event replay loop) |
| RAA-blocking types/functions | ~15 (types, fields, methods) | 0 |
| ChannelManager-persisted HTLC state | 6+ maps/fields | 0 (all in monitors) |
| Restart correctness argument | Cross-reference Channel, Monitor, in-flight updates, blocked updates, RAA blockers, legacy maps | Replay unacknowledged events |
| blocked_monitor_updates mechanism | Complex FIFO queue with ordering constraints | Not needed |
| monitor_pending_* fields | 4 Vec fields with race condition bug (channel.rs:3124–3127) | 0 (eliminated along with the bug) |
| MPP claim coordination | PendingMPPClaim + PendingMPPClaimPointer + Arc<Mutex> + RAA blockers + completion actions | Deferred monitor events |
| Forward HTLC claiming safety | RAA blocking + event completion actions + EventUnblockedChannel | Persistent events + acknowledgment |
| Channel closure HTLC handoff | Race-prone (known bug) | Race-free: monitor already has all state |

The key insight is that persistent, acknowledged MonitorEvents replace
both the RAA-blocking mechanism (which existed to ensure preimages aren't
lost across monitor updates) and the on-load reconstruction logic (which
existed because MonitorEvents were fire-and-forget). By making events
durable and acknowledgment-driven, we get correctness by construction — the
monitor holds onto events until the ChannelManager has processed them, and
on restart we simply replay.

The channel closure race condition (channel.rs:3124–3127) disappears because
there are no monitor_pending_* fields to lose — the ChannelMonitor
generates events directly from its own state, which is always durable.
