Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle retrying sign_counterparty_commitment failures #2558

Merged
merged 7 commits into from
Nov 2, 2023

Conversation

waterson
Copy link
Contributor

@waterson waterson commented Sep 6, 2023

Follow on to #2554. Adopts that PR and adds tests.

If sign_counterparty_commitment fails (i.e. because the signer is temporarily disconnected), this really indicates that we should retry the message sending which required the signature later, rather than force-closing the channel (which probably won't even work if the signer is missing).

@codecov-commenter
Copy link

codecov-commenter commented Sep 7, 2023

Codecov Report

Attention: 45 lines in your changes are missing coverage. Please review.

Comparison is base (1f399b0) 88.71% compared to head (1edfdb2) 88.74%.
Report is 7 commits behind head on main.

❗ Current head 1edfdb2 differs from pull request most recent head 014a336. Consider uploading reports for the commit 014a336 to get more accurate results

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2558      +/-   ##
==========================================
+ Coverage   88.71%   88.74%   +0.02%     
==========================================
  Files         112      113       +1     
  Lines       88502    88876     +374     
  Branches    88502    88876     +374     
==========================================
+ Hits        78517    78873     +356     
- Misses       7752     7757       +5     
- Partials     2233     2246      +13     
Files Coverage Δ
lightning/src/ln/functional_tests.rs 97.23% <100.00%> (-0.06%) ⬇️
lightning/src/ln/mod.rs 82.60% <ø> (ø)
lightning/src/util/test_channel_signer.rs 84.65% <75.00%> (-0.53%) ⬇️
lightning/src/ln/functional_test_utils.rs 90.58% <76.00%> (-0.43%) ⬇️
lightning/src/ln/channel.rs 88.66% <94.67%> (+0.26%) ⬆️
lightning/src/ln/async_signer_tests.rs 95.04% <95.04%> (ø)
lightning/src/ln/channelmanager.rs 78.78% <80.00%> (+0.01%) ⬆️

... and 10 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@waterson waterson force-pushed the pr-2554 branch 6 times, most recently from e906663 to ef6738d Compare September 11, 2023 18:08
@TheBlueMatt
Copy link
Collaborator

LG mod one comment - rather than pushing a new commit that makes the code compile against the latest upstream, can you rebase on upstream and interleave the changes into each commit such that each individual commit in the PR builds and passes tests on its own?

@waterson
Copy link
Contributor Author

LG mod one comment - rather than pushing a new commit that makes the code compile against the latest upstream, can you rebase on upstream and interleave the changes into each commit such that each individual commit in the PR builds and passes tests on its own?

Would you be okay with me squashing any of these first?

@TheBlueMatt
Copy link
Collaborator

In general I'd prefer to end up with as many commits as possible (as long as they're all free-standing and pass tests by themselves) rather than one big commit that changes several things.

@waterson
Copy link
Contributor Author

Ok! Changes interleaved... ptal when you get a chance! Thank you!

@waterson
Copy link
Contributor Author

I've been experimenting with this change in our stack and I realize that all is not well.

I just pushed 72dbe94 to fix zero-conf inbound channel acceptance. The issue here is that we need to defer sending the channel_ready until after we've sent the funding_signed. Failure to do so violates the protocol (and comically causes LND to lose its mind).

There is a similar issue that I've observed with handling commitment_signed where we can confuse our partner by sending the RAA too early: I'm working on that but don't have it finished. And there is likely an issue with outbound zero-conf as well.

@wpaulino @TheBlueMatt as we get into some of the other signatures/secrets, do you think it's going to be necessary to recreate parallel machinery like that which exists for paused channel monitors? That's fairly complex in itself, not to mention how the two might have to interact with each other. 😬

@waterson waterson force-pushed the pr-2554 branch 3 times, most recently from 329175b to 6be9aec Compare September 26, 2023 00:42
@TheBlueMatt
Copy link
Collaborator

Personally I'd rather push the last commit onto another PR. This PR is nice and simple, and implements async signing for just commitment signing. We should do a separate PR that adds async signing for releasing the revocation secret, rather than trying to do this all at once.

@TheBlueMatt
Copy link
Collaborator

Also, let's get this rebased so we can land it now that 117 is out 🎉

TheBlueMatt
TheBlueMatt previously approved these changes Oct 11, 2023
lightning/src/ln/channel.rs Outdated Show resolved Hide resolved
lightning/src/ln/channel.rs Outdated Show resolved Hide resolved
lightning/src/ln/channel.rs Show resolved Hide resolved
lightning/src/ln/channel.rs Show resolved Hide resolved
lightning/src/ln/async_signer_tests.rs Outdated Show resolved Hide resolved
lightning/src/ln/async_signer_tests.rs Show resolved Hide resolved
lightning/src/ln/async_signer_tests.rs Outdated Show resolved Hide resolved
lightning/src/ln/async_signer_tests.rs Show resolved Hide resolved
lightning/src/ln/async_signer_tests.rs Outdated Show resolved Hide resolved
lightning/src/ln/channel.rs Show resolved Hide resolved
@TheBlueMatt
Copy link
Collaborator

Oookayyy, with 118 out the door I think its time to land this! Sadly it needs a small rebase now.

Copy link
Contributor

@wpaulino wpaulino left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, can you squash all of the fixup commits?

lightning/src/ln/channel.rs Outdated Show resolved Hide resolved
If sign_counterparty_commitment fails (i.e. because the signer is
temporarily disconnected), this really indicates that we should
retry the message sending later, rather than force-closing the
channel (which probably won't even work if the signer is missing).

Here we add initial handling of sign_counterparty_commitment
failing during normal channel operation, setting a new flag in
`ChannelContext` which indicates we should retry sending the
commitment update later. We don't yet add any ability to do that
retry.
If sign_counterparty_commitment fails (i.e. because the signer is
temporarily disconnected), this really indicates that we should
retry the message sending which required the signature later,
rather than force-closing the channel (which probably won't even
work if the signer is missing).

Here we add initial handling of sign_counterparty_commitment
failing during outbound channel funding, setting a new flag in
`ChannelContext` which indicates we should retry sending the
`funding_created` later. We don't yet add any ability to do that
retry.
If sign_counterparty_commitment fails (i.e. because the signer is
temporarily disconnected), this really indicates that we should
retry the message sending which required the signature later,
rather than force-closing the channel (which probably won't even
work if the signer is missing).

Here we add initial handling of sign_counterparty_commitment
failing during inbound channel funding, setting a flag in
`ChannelContext` which indicates we should retry sending the
`funding_signed` later. We don't yet add any ability to do that
retry.
TheBlueMatt and others added 4 commits November 1, 2023 15:24
If sign_counterparty_commitment fails (i.e. because the signer is
temporarily disconnected), this really indicates that we should
retry the message sending which required the signature later,
rather than force-closing the channel (which probably won't even
work if the signer is missing).

This commit adds initial retrying of failures, specifically
regenerating commitment updates, attempting to re-sign the
`CommitmentSigned` message, and sending it to our peers if we
succed.
If sign_counterparty_commitment fails (i.e. because the signer is
temporarily disconnected), this really indicates that we should
retry the message sending which required the signature later,
rather than force-closing the channel (which probably won't even
work if the signer is missing).

This commit adds retrying of outbound funding_created signing
failures, regenerating the `FundingCreated` message, attempting to
re-sign, and sending it to our peers if we succeed.
If sign_counterparty_commitment fails (i.e. because the signer is
temporarily disconnected), this really indicates that we should
retry the message sending which required the signature later,
rather than force-closing the channel (which probably won't even
work if the signer is missing).

This commit adds retrying of inbound funding_created signing
failures, regenerating the `FundingSigned` message, attempting to
re-sign, and sending it to our peers if we succeed.
Adds a `get_signer` method to the context so that a test can get ahold of the
channel signer. Adds a `set_available` method on the `TestChannelSigner` to
allow a test to enable and disable the signer: when disabled some of the
signer's methods will return `Err` which will typically activate the error
handling case. Adds a `set_channel_signer_available` function on the test
`Node` class to make it easy to enable and disable a specific signer.

Adds a new `async_signer_tests` module:

* Check for asynchronous handling of `funding_created` and `funding_signed`.
* Check that we correctly resume processing after awaiting an asynchronous
  signature for a `commitment_signed` event.
* Verify correct handling during peer disconnect.
* Verify correct handling for inbound zero-conf.
@wpaulino wpaulino removed their request for review November 1, 2023 22:25
@TheBlueMatt TheBlueMatt merged commit 281a0ae into lightningdevkit:main Nov 2, 2023
15 checks passed
TheBlueMatt added a commit to TheBlueMatt/rust-lightning that referenced this pull request Dec 13, 2023
We are intending to release without having completed our async
signing logic, which sadly means we need to cfg-gate it to ensure
we restore the previous state of panicking on signer errors, rather
than putting us in a stuck state with no way to recover.

Here we add a new `async_signing` cfg flag and use it to gate all
the new logic from lightningdevkit#2558 effectively reverting commits
1da2929 through
014a336.
yellowred pushed a commit to yellowred/rust-lightning that referenced this pull request Dec 19, 2023
We are intending to release without having completed our async
signing logic, which sadly means we need to cfg-gate it to ensure
we restore the previous state of panicking on signer errors, rather
than putting us in a stuck state with no way to recover.

Here we add a new `async_signing` cfg flag and use it to gate all
the new logic from lightningdevkit#2558 effectively reverting commits
1da2929 through
014a336.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants