
Refactor commitment broadcast to always go through OnchainTxHandler #2703

Merged

Conversation

@wpaulino (Contributor) commented Nov 3, 2023

Currently, our holder commitment broadcast only goes through the OnchainTxHandler for anchor outputs channels because we can actually bump the commitment transaction fees with it. For non-anchor outputs channels, we would just broadcast once directly via the ChannelForceClosed monitor update, without going through the OnchainTxHandler.

As we add support for async signing, we need to be tolerant of signing failures. A signing failure of our holder commitment will currently panic, but once the panic is removed, we must be able to retry signing once the signer is available. We can easily achieve this via the existing OnchainTxHandler::rebroadcast_pending_claims, but this requires that we first queue our holder commitment as a claim. This commit ensures we do so everywhere we need to broadcast a holder commitment transaction, regardless of the channel type.

This addresses the prerequisites to #2520 as noted in #2520 (comment).
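To make the intended flow concrete, here is a minimal sketch of the queue-then-retry behavior described above. The types and function names are hypothetical simplifications, not LDK's actual OnchainTxHandler API: the holder commitment is queued as a claim, a signing failure leaves it queued, and a later rebroadcast pass retries it once the signer is available.

// Illustrative sketch only -- types and function names here are hypothetical,
// not LDK's actual API.
struct HolderCommitmentClaim {
    channel_id: [u8; 32],
}

#[derive(Default)]
struct CommitmentClaimQueue {
    pending: Vec<HolderCommitmentClaim>,
}

impl CommitmentClaimQueue {
    /// Instead of broadcasting directly on `ChannelForceClosed`, queue the holder
    /// commitment as a claim so it can be retried later.
    fn queue_holder_commitment(&mut self, channel_id: [u8; 32]) {
        self.pending.push(HolderCommitmentClaim { channel_id });
    }

    /// Retry every queued claim. If the (async) signer is still unavailable,
    /// the claim simply stays queued and is retried on the next call.
    fn rebroadcast_pending_claims<S, B>(&self, sign: S, broadcast: B)
    where
        S: Fn(&HolderCommitmentClaim) -> Result<Vec<u8>, ()>,
        B: Fn(&[u8]),
    {
        for claim in &self.pending {
            if let Ok(signed_tx) = sign(claim) {
                broadcast(&signed_tx);
            }
            // On Err(()), the signer is not ready yet; leave the claim queued.
        }
    }
}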

@codecov-commenter commented Nov 27, 2023

Codecov Report

Attention: 17 lines in your changes are missing coverage. Please review.

Comparison is base (0c67753) 88.66% compared to head (60bb39a) 88.57%.
Report is 15 commits behind head on main.

Files Patch % Lines
lightning/src/ln/functional_tests.rs 87.50% 3 Missing and 2 partials ⚠️
lightning/src/ln/reorg_tests.rs 94.84% 3 Missing and 2 partials ⚠️
lightning/src/chain/channelmonitor.rs 97.82% 1 Missing and 1 partial ⚠️
lightning/src/chain/onchaintx.rs 88.88% 1 Missing and 1 partial ⚠️
lightning/src/ln/monitor_tests.rs 90.00% 1 Missing and 1 partial ⚠️
lightning/src/ln/reload_tests.rs 75.00% 0 Missing and 1 partial ⚠️


Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2703      +/-   ##
==========================================
- Coverage   88.66%   88.57%   -0.10%     
==========================================
  Files         115      115              
  Lines       91168    91399     +231     
  Branches    91168    91399     +231     
==========================================
+ Hits        80838    80955     +117     
- Misses       7908     7977      +69     
- Partials     2422     2467      +45     


@wpaulino force-pushed the retryable-commitment-broadcast branch from 13b4f49 to d9422ca on November 29, 2023 at 18:39
// Now that we've detected a confirmed commitment transaction, attempt to cancel
// pending claims for any commitments that were previously confirmed such that
// we don't continue claiming inputs that no longer exist.
self.cancel_prev_commitment_claims(&logger, &txid);
Collaborator:

I'm honestly pretty skeptical of our test coverage of re-creating claims after a reorg, which makes me pretty skeptical of this change. If we want to delete pending claims, can we instead do it after ANTI_REORG_DELAY? I'm not quite sure I understand the motivation for this commit anyway.

Contributor Author:

I'm honestly pretty skeptical of our test coverage of re-creating claims after a reorg, which makes me pretty skeptical of this change.

We have several tests covering possible reorg scenarios, are you implying we have cases uncovered?

If we want to delete pending claims, can we instead do it after ANTI_REORG_DELAY?

The claims never confirm because their inputs are now reorged out so ANTI_REORG_DELAY doesn't help.

I'm not quite sure I understand the motivation for this commit anyway.

It's mostly a nice-to-have change -- it simplifies certain test assertions and prevents us from continuously trying to claim inputs that will never succeed as they no longer exist.

Collaborator:

We have several tests covering possible reorg scenarios, are you implying we have cases uncovered?

Where? I glanced in reorg_tests and didn't see any that were checking that if we reorg out a commitment tx we broadcast our own (replacement) commitment tx immediately afterwards.

The claims never confirm because their inputs are now reorged out so ANTI_REORG_DELAY doesn't help.

Right, I mean if we see a conflicting commitment tx we remove the conflicts here, but we could also do this after 6 confs on the conflicting commitment tx.

It's mostly a nice-to-have change -- it simplifies certain test assertions and prevents us from continuously trying to claim inputs that will never succeed as they no longer exist.

Hmm, looks like currently only one test fails? I assume this is mostly in reference to a future patchset.

Contributor Author:

Where? I glanced in reorg_tests and didn't see any that were checking that if we reorg out a commitment tx we broadcast our own (replacement) commitment tx immediately afterwards.

We don't have coverage for that specific case, but it all depends on whether we needed to broadcast before the reorg. I wrote a quick test locally and it checks out, so I can push that.

Right, I mean if we see a conflicting commitment tx we remove the conflicts here, but we could also do this after 6 confs on the conflicting commitment tx.

Why wait that long though? We know the previous claims are invalid as soon as the conflict confirms. Note that this is just about removing the claims that come after the commitment, not the commitment itself. We will continue to retry the commitment until one reaches ANTI_REORG_DELAY.

Hmm, looks like currently only one test fails? I assume this is mostly in reference to a future patchset.

It's not so much about the number of tests failing, but rather simplifying assertions throughout the failing test. There is a future patch to follow, but it doesn't really concern reorgs.
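Here is a minimal sketch of the cancellation rule described above, using hypothetical types rather than LDK's actual OnchainTxHandler internals: claims built on top of a commitment are dropped as soon as a conflicting commitment confirms, while commitment-level claims themselves are kept and retried until one of them reaches ANTI_REORG_DELAY.

// Hypothetical sketch only -- the type and field names below are illustrative,
// not LDK's actual structs.
use std::collections::HashMap;

type Txid = [u8; 32];

struct ClaimRequest {
    /// Txid of the commitment transaction whose outputs this claim spends, or
    /// `None` if the claim *is* a commitment transaction itself.
    spends_commitment: Option<Txid>,
}

struct PendingClaims {
    requests: HashMap<u64, ClaimRequest>,
}

impl PendingClaims {
    /// Called once a commitment transaction confirms: drop claims built on any
    /// *other* commitment (their inputs no longer exist on the best chain), but
    /// keep commitment-level claims so they continue to be retried until one of
    /// them reaches ANTI_REORG_DELAY confirmations.
    fn cancel_prev_commitment_claims(&mut self, confirmed_commitment: &Txid) {
        self.requests.retain(|_, claim| match claim.spends_commitment {
            Some(parent) => parent == *confirmed_commitment,
            None => true, // commitment broadcasts themselves stay queued
        });
    }
}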

Collaborator:

Why wait that long though? We know the previous claims are invalid as soon as the conflict confirms. Note that this is just about removing the claims that come after the commitment, not the commitment itself. We will continue to retry the commitment until one reaches ANTI_REORG_DELAY.

Mostly because there's no new test in the first commit, and I know we have some level of missing test coverage here, and I'm not sure we can enumerate all the cases very easily, so I'm just trying to be pretty cautious. Doubly so since we don't hit many reorg cases in prod, so we won't discover these bugs unless it's in tests.

Contributor Author:

I don't really see the risk here. As long as we can guarantee we'll broadcast our own commitment after reorg (new test shows this), there's no chance we'll miss claiming anything from it, as once it confirms, the monitor will pick up the outputs to claim per usual.

Collaborator:

I guess my concern is that we somehow forget to re-add claims for our own transactions, but you're right, your test should be pretty good for that. Can you make the test into a matrix, though, with anchors and use of B broadcasting a revoked transaction rather than a normal one?

Contributor Author:

Sure, done.

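// Fallback: also scan `pending_claim_requests` directly, in case a request is being
// tracked for which no claim has been generated yet (see the discussion below).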
.or_else(|| {
self.pending_claim_requests.iter()
.find(|(_, claim)| claim.outpoints().iter().any(|claim_outpoint| *claim_outpoint == outpoint))
.map(|(claim_id, _)| *claim_id)
Collaborator:

This should be unreachable, right? It looks like no tests hit it.

Contributor Author:

It should be, yes, but I included it here just to be safe in the event we are tracking a pending request in pending_claim_requests that we have yet to generate a claim for.

@TheBlueMatt (Collaborator):

Needs a rebase.

wpaulino and others added 3 commits December 11, 2023 16:44
Once a commitment transaction is broadcast/confirms, we may need to
claim some of the HTLCs in it. These claims are sent as requests to the
`OnchainTxHandler`, which will bump their feerate as they remain
unconfirmed. When said commitment transaction becomes unconfirmed
though, and another commitment confirms instead, i.e., a reorg happens,
the `OnchainTxHandler` doesn't have any insight into whether these
claims are still valid or not, so it continues attempting to claim the
HTLCs from the previous commitment (now unconfirmed) forever, along with
the HTLCs from the newly confirmed commitment.

Currently, our holder commitment broadcast only goes through the
`OnchainTxHandler` for anchor outputs channels because we can actually
bump the commitment transaction fees with it. For non-anchor outputs
channels, we would just broadcast once directly via the
`ChannelForceClosed` monitor update, without going through the
`OnchainTxHandler`.

As we add support for async signing, we need to be tolerant of signing
failures. A signing failure of our holder commitment will currently
panic, but once the panic is removed, we must be able to retry signing
once the signer is available. We can easily achieve this via the
existing `OnchainTxHandler::rebroadcast_pending_claims`, but this
requires that we first queue our holder commitment as a claim. This
commit ensures we do so everywhere we need to broadcast a holder
commitment transaction, regardless of the channel type.

Co-authored-by: Rachel Malonson <rachel@lightspark.com>
@wpaulino force-pushed the retryable-commitment-broadcast branch from 569fd4a to 60bb39a on December 12, 2023 at 00:45
@TheBlueMatt merged commit 0dbf17b into lightningdevkit:main on Dec 13, 2023
13 of 15 checks passed
@wpaulino deleted the retryable-commitment-broadcast branch on December 13, 2023 at 05:37