While we "shipped" async persistence quite a few releases ago, we've always maintained that it is an alpha feature with some known (and probably unknown) bugs. This is a tracking issue for both: things I think we should audit, as well as things I'm confident are broken (or at least noted at some point because I was confident they were broken at the time).
Things that are probably broken
Handle async persist of claims against closed channels
we should update docs on PaymentPathSuccessful/ProbeSuccessful/ProbeFailed/HTLCHandlingFailed to note that they are not robust events and should only be used to update the scorer, or add an RAA-blocker to the event push in outbound_payments.rs
If a payment is claimed and the claim monitor update completes, getting the preimage durably on disk, the PaymentClaimed event is generated. The channel can then go all the way through to a new state without the PaymentClaimed event being handled or the ChannelManager being persisted, allowing the preimage to be removed from disk and the PaymentClaimed event to be missed.
We need to block the final RAA on the PaymentClaimed event being handled!
Check that we re-claim payments from other MPP channels if we restart and only managed to claim on one channel, specifically:
test various scenarios around force-closes on the first channel we claimed on potentially removing any blockers on the second channel, or otherwise ensure we re-claim.
From previous notes: if we hold monitor updates due to an event or similar, route an HTLC through a channel, and the peer force-closes the channel, we'll forget that HTLC and never fail it back (my personal 2023-10-pending-htlcs-lost-on-mon-delay branch may have a test for this)
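The RAA-blocking fix suggested above (hold the channel's final revoke_and_ack until the event that depends on the old state has been handled) can be sketched with a toy model. Nothing here is real LDK API; `ToyChannel` and its methods are hypothetical stand-ins for illustration only.

```rust
use std::collections::HashSet;

/// Toy stand-in for the RAA-blocking concept: the channel refuses to
/// release its final revoke_and_ack while any blocking action is pending.
/// All names here are hypothetical, not real LDK API.
#[derive(Default)]
pub struct ToyChannel {
    // One entry per unhandled event (e.g. a pending PaymentClaimed).
    raa_blockers: HashSet<u64>,
}

impl ToyChannel {
    /// Called when a PaymentClaimed event is queued: block the final RAA so
    /// the counterparty cannot advance the channel state past the claim.
    pub fn block_raa(&mut self, event_id: u64) {
        self.raa_blockers.insert(event_id);
    }

    /// Called once the user has handled the event and the manager persisted.
    pub fn unblock_raa(&mut self, event_id: u64) {
        self.raa_blockers.remove(&event_id);
    }

    /// The final RAA may only go out once nothing depends on the old state.
    pub fn can_send_final_raa(&self) -> bool {
        self.raa_blockers.is_empty()
    }
}
```

The point of the sketch is only the ordering: the RAA release becomes conditional on the event having been durably handled, rather than happening unconditionally.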
Things we should audit carefully
PaymentClaimable purports to guarantee that the payment is claimable. However, if we start persisting the monitor, the user sees the PaymentClaimable, and then we restart without completing monitor or manager persistence, the counterparty could broadcast the previous state and we can't get the payment. (Very) marginally expand test_monitor_update_fail_claim #3126. Similarly, if we do persist the monitor but not the manager, we'll force-close and fail the HTLC back, which is the same end result (this was never true, we'll obviously use the preimage in a monitor to claim on-chain 🤦).
We shouldn't provide the PaymentClaimable event until the last RAA persistence completes. This also delays PaymentClaimable, which is good for privacy (it makes a receive look more like a forward).
When doing async persistence for initial channel open, what happens when...
the initial persist is pending, never completed, and then we restart?
the initial persist is pending: is it possible to get an update before it completes (hitting the InvalidValue return in ChannelManager deserialization)?
What happens when we get an in-flight update from the ChannelManager on startup for a closed channel?
Make sure all ChannelMonitor access in ChannelManager::read is read-only, and make the monitors map non-mutable
Check that PaymentClaimed is properly regenerated on restart if we have a preimage in at least one channel (potentially one of several MPP parts)
What happens if we send a payment and don't persist the ChannelManager at all until the payment has gone all the way through the state machine and been claimed/failed? Presumably on claim the event processing blocks the RAA, but maybe it does for PaymentFailed too?
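The "don't surface an event until its backing persistence completes" pattern raised above can be sketched as a toy queue: events are staged when generated, but only handed to the user once the async persistence they depend on reports completion. All names (`ToyEventQueue`, etc.) are hypothetical, not LDK's actual event machinery.

```rust
/// Toy model of gating event delivery on async persistence completion.
/// Hypothetical types; sketches the "don't surface PaymentClaimable until
/// the last RAA persistence completes" idea from this issue.
pub struct PendingEvent {
    pub name: &'static str,
    pub persist_done: bool,
}

#[derive(Default)]
pub struct ToyEventQueue {
    pending: Vec<PendingEvent>,
}

impl ToyEventQueue {
    /// Stage an event; it is NOT yet visible to the user.
    pub fn push(&mut self, name: &'static str) {
        self.pending.push(PendingEvent { name, persist_done: false });
    }

    /// Async-persistence completion callback for the update backing `name`.
    pub fn persist_completed(&mut self, name: &'static str) {
        for ev in &mut self.pending {
            if ev.name == name {
                ev.persist_done = true;
            }
        }
    }

    /// Only events whose backing persistence is durable reach the user.
    pub fn releasable(&self) -> Vec<&'static str> {
        self.pending
            .iter()
            .filter(|e| e.persist_done)
            .map(|e| e.name)
            .collect()
    }
}
```

Under this shape, a crash before `persist_completed` fires simply means the event is regenerated from durable state on restart rather than having been shown prematurely.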
Check that we re-claim payments from other MPP channels if we restart and only managed to claim on one channel, specifically:
This is, somewhat surprisingly, not currently broken, but it's quite subtle and I kind of want to block related channels here anyway. Right now, when we do a claim, we hold the total_consistency_lock read lock for the duration of the MPP claim (across all channels); this is quite important and I don't see it changing. Further, on startup, if we see one monitor with a preimage we'll replay it on all other monitors as long as the pending payment is still there. Thus there are really only two cases here: either we claim the payment via some ChannelMonitor's preimage, in which case the ChannelManager replays the whole thing on startup (and the channel that did get updated force-closes), or the ChannelManager has completed a persistence after the claim, at which point it has all the other ChannelMonitors queued up and will replay them through the normal async pipeline.
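The startup replay described above (one monitor already learned the preimage, so re-apply it to every other monitor carrying a part of the payment) can be modeled roughly as follows. The data model here is invented for illustration and is far simpler than LDK's; monitors are just sets of known preimages keyed by a channel id.

```rust
use std::collections::{HashMap, HashSet};

/// Toy restart-replay sketch: if any monitor for an MPP payment already
/// holds the preimage (i.e. the claim made it durably to disk somewhere
/// before the restart), re-apply it to every monitor carrying a part of
/// that payment. Hypothetical types, not LDK's actual data model.
pub fn replay_preimage_across_parts(
    monitors: &mut HashMap<u64, HashSet<[u8; 32]>>, // channel id -> known preimages
    part_channels: &[u64],                          // channels carrying parts of this payment
    preimage: [u8; 32],
) {
    // Only replay if at least one monitor already learned the preimage.
    let seen = part_channels
        .iter()
        .any(|c| monitors.get(c).map_or(false, |p| p.contains(&preimage)));
    if seen {
        for c in part_channels {
            monitors.entry(*c).or_default().insert(preimage);
        }
    }
}
```

In the real flow the "pending payment is still there" condition gates this as well; the sketch only shows the fan-out of one channel's preimage to its sibling MPP parts.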
This relies on another important property that is critical to our payment receipt flow working, but which is not documented and which we really shouldn't rely on: the ChannelManager is always persisted between receiving a payment and claiming it. This is a side-effect of the way our lightning-background-processor works, specifically that we always interleave event processing and ChannelManager persistence. (@G8XSU we should persist the ChannelManager::claimable_payments info in ChannelMonitors with the HTLCs so that we can recreate the PaymentClaimed event for users even if the ChannelManager never gets persisted between first receiving the payment HTLC and claiming it.)
If we assume we make that change, the MPP-with-async-persist flow would become unsafe: we could have one channel complete its full claim persist and commitment_signed dance, then restart, and we wouldn't be able to reclaim the payment because the preimage is missing. Thus, I'd like to go ahead and land an RAAMonitorUpdateBlockingAction for MPP claims, but not store it to disk, as we can always regenerate it when we replay the claim.
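The "regenerate the blocker instead of persisting it" idea can be sketched like this: the serialized form carries no blockers at all, and the load path re-installs one blocker per still-pending MPP part while replaying the claim. Types and fields here are hypothetical, not LDK's serialization format.

```rust
/// What we would persist: pending MPP parts, but deliberately no blockers.
/// Hypothetical types sketching the "regenerate, don't serialize" approach.
#[derive(Clone)]
pub struct DiskState {
    pub pending_mpp_parts: Vec<u64>,
}

/// Runtime state, including blockers that only ever live in memory.
pub struct RuntimeState {
    pub pending_mpp_parts: Vec<u64>,
    /// Regenerated on load, never written to disk: losing these across a
    /// restart is harmless because replaying the claim re-creates them.
    pub raa_blockers: Vec<u64>,
}

pub fn load(disk: &DiskState) -> RuntimeState {
    // Replaying the claim re-installs one RAA blocker per in-flight part.
    RuntimeState {
        pending_mpp_parts: disk.pending_mpp_parts.clone(),
        raa_blockers: disk.pending_mpp_parts.clone(),
    }
}
```

The design point is that the blocker is derived state: anything that can be deterministically recomputed from the durable claim data doesn't need its own serialization, which keeps the on-disk format smaller and avoids a compat burden.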
Related: the 2023-11-mon-claim-bug branch, and Block monitor updates to ensure preimages are in each MPP part #3120.
Finally, to actually "release" it, we should: