Defer LSPS4 HTLC forwarding until channel usable #6
Merged
LDK fires peer_connected after the TCP+Init handshake but before channel_reestablish completes. During this window, channels exist but are not yet usable (is_usable=false). The previous code forwarded HTLCs unconditionally in peer_connected and htlc_intercepted, causing ~10% of payments to fail on reconnect.

Distinguish two connected-peer cases in htlc_intercepted and peer_connected:

- Channels exist but none usable (reestablish in progress): defer the HTLC for timer-based retry via process_pending_htlcs().
- No channels exist (first payment, JIT open needed): proceed immediately so calculate_htlc_actions_for_peer can emit the OpenChannel event.

process_pending_htlcs (5s timer) only retries the reestablish case (channels exist, waiting to become usable). It must not handle the no-channel case to avoid emitting duplicate OpenChannel events while a JIT open is already in flight.

Remove the re-check race in htlc_intercepted that could fire concurrently with peer_connected. The webhook + peer_connected path is the single owner of the offline-peer reconnect flow.

Blocking inside peer_connected was also considered but rejected: there is no LDK event for "channel usable after reconnect" to wake on, so it would require a spin-wait with arbitrary timeout. A timer-based retry is cleaner and avoids holding the lock.
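
For reviewers, the connected-peer branching described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: `ChannelState`, `HtlcDecision`, and `decide_for_connected_peer` are hypothetical names, and the `ForwardNow` arm (at least one usable channel) is an assumption about the normal path that the commit message does not spell out.

```rust
/// Hypothetical stand-in for the per-channel state the handlers inspect.
/// Only the `is_usable` flag named in the commit message is modelled.
#[derive(Debug, Clone, Copy)]
struct ChannelState {
    is_usable: bool,
}

/// Hypothetical outcome of the check for a peer that is currently connected.
#[derive(Debug, PartialEq, Eq)]
enum HtlcDecision {
    /// Assumed normal path: at least one channel is usable.
    ForwardNow,
    /// Channels exist but none is usable yet (reestablish in progress).
    DeferForRetry,
    /// No channels at all: continue so a JIT channel open can be requested.
    ProceedToJitOpen,
}

fn decide_for_connected_peer(channels: &[ChannelState]) -> HtlcDecision {
    if channels.is_empty() {
        // First payment: nothing to wait for, a channel has to be opened.
        HtlcDecision::ProceedToJitOpen
    } else if channels.iter().any(|c| c.is_usable) {
        HtlcDecision::ForwardNow
    } else {
        // Reestablish window: leave the HTLC queued for the timer.
        HtlcDecision::DeferForRetry
    }
}
```

The point of the sketch is that "no channels" and "channels not yet usable" are separate arms, so a reconnecting peer never takes the JIT-open path by accident.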
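Summary:

| Path | Channels exist, none usable | No channels exist |
|---|---|---|
| `htlc_intercepted` (connected) | Defer for timer-based retry | Proceed immediately; emit `OpenChannel` |
| `htlc_intercepted` (offline) | Left to the webhook + `peer_connected` path | Left to the webhook + `peer_connected` path |
| `peer_connected` | Defer for timer-based retry | Proceed immediately; emit `OpenChannel` |
| `process_pending_htlcs` (timer) | Retry | Skip; a JIT open is already in flight (`channel_ready`) |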
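
The timer's narrower job can be sketched in the same spirit. This is a minimal model under stated assumptions, not the real `process_pending_htlcs`: `PeerId`, `HtlcId`, `retry_pending_htlcs`, and the two maps are hypothetical, and channel state is reduced to one `is_usable` flag per channel.

```rust
use std::collections::BTreeMap;

// Hypothetical identifiers for illustration; the real code would key on the
// peer's node id and carry full HTLC data.
type PeerId = u32;
type HtlcId = u64;

/// One timer tick. `channels` maps each peer to the `is_usable` flag of every
/// channel it has; a missing or empty entry means "no channels".
/// Returns the HTLCs that can be forwarded now and leaves the rest queued.
fn retry_pending_htlcs(
    pending: &mut BTreeMap<PeerId, Vec<HtlcId>>,
    channels: &BTreeMap<PeerId, Vec<bool>>,
) -> Vec<(PeerId, HtlcId)> {
    let mut forwardable = Vec::new();
    for (peer, htlcs) in pending.iter_mut() {
        let flags: &[bool] = match channels.get(peer) {
            Some(v) => v.as_slice(),
            None => &[],
        };
        if flags.is_empty() {
            // No-channel case: not the timer's job. Retrying here could
            // request a second channel open while one is already in flight.
            continue;
        }
        if flags.iter().any(|&usable| usable) {
            // Reestablish finished: release everything queued for this peer.
            forwardable.extend(htlcs.drain(..).map(|h| (*peer, h)));
        }
        // Otherwise reestablish is still in progress; try again next tick.
    }
    // Drop peers whose queues were fully drained.
    pending.retain(|_, htlcs| !htlcs.is_empty());
    forwardable
}
```

Calling this once per tick (5s in this PR) keeps HTLCs for reestablishing peers queued until a channel becomes usable, and never touches HTLCs for peers without channels.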