Support async fetching of commitment point during channel reestablish #4197
Conversation
lightning/src/ln/channel.rs (outdated):

```rust
    .ok();
if expected_point.is_none() {
    self.context.signer_pending_stale_state_verification = Some((commitment_number, given_secret));
    return Err(ChannelError::Ignore("Waiting on async signer to verify stale state proof".to_owned()));
```
In practice I think this means we'll often never panic: the peer will reconnect, we'll ignore the message, then they'll send some other message which will cause us to, for example, `ChannelError::close("Got commitment signed message when channel was not in an operational state")`. We'll either have to add logic to ~every message handler to ignore the message if `signer_pending_stale_state_verification` is set, or we can just disconnect them here and let them sit in a reconnect loop until the signer resolves (which I think is fine?).
Good point, ended up disconnecting. Is there any reason for us to close in those cases though? We could just make those `ChannelError::close` calls a `WarnAndDisconnect` instead.
No, those cases could definitely move to a warn-and-disconnect. Historically we've been pretty happy to just close if the peer does something dumb, and in 95% of the cases we've never seen peers do anything so dumb, so we've never really had a motivation to change it. Not crazy to do though.
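A minimal sketch of the warn-and-disconnect idea discussed here, assuming a hypothetical `WarnAndDisconnect` variant (the names and handler shape are illustrative, not LDK's actual API):

```rust
// Hypothetical error type sketch; only `Ignore` and `close` appear in the
// snippets above, `WarnAndDisconnect` is the variant being proposed.
enum ChannelError {
    /// Ignore the offending message entirely.
    Ignore(String),
    /// Send the peer a warning and disconnect, keeping the channel open.
    WarnAndDisconnect(String),
    /// Send the peer an error and force-close the channel.
    Close(String),
}

fn check_stale_state_pending(signer_pending: bool) -> Result<(), ChannelError> {
    if signer_pending {
        // Disconnecting puts the peer in a reconnect loop until the async
        // signer resolves, instead of requiring every message handler to
        // special-case the pending stale-state verification.
        return Err(ChannelError::WarnAndDisconnect(
            "Waiting on async signer to verify stale state proof".to_owned(),
        ));
    }
    Ok(())
}
```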
Force-pushed 7c25f35 to 6b8123d
Force-pushed 6b8123d to fa13381
✅ Added second reviewer: @valentinewallace |
Force-pushed fa13381 to 201478e
CI is quite sad.
Force-pushed 201478e to 82e2371
Had to rebase to account for the changes to …
valentinewallace left a comment:
Nothing blocking!
```rust
if expected_point != Some(PublicKey::from_secret_key(&self.context.secp_ctx, &given_secret)) {
    return Err(ChannelError::close("Peer sent a channel_reestablish indicating we're stale with an invalid commitment secret".to_owned()));
}
Self::panic_on_stale_state(logger);
```
We don't have test coverage for hitting this; it may be pre-existing though.
```rust
if expected_point != PublicKey::from_secret_key(&self.context.secp_ctx, &given_secret) {
    return Err(ChannelError::close("Peer sent a garbage channel_reestablish with secret key not matching the commitment height provided".to_owned()));
}
} else if msg.next_remote_commitment_number + 1 == our_commitment_transaction {
```
It would be nice to rename `our_commitment_transaction` to `our_current_commit_tx_number` or something like that, but it is pre-existing.
```rust
        holder_commitment_next_transaction_number + 3,
        &secp_ctx,
    )
    .expect("Must be able to derive the previous revoked commitment point upon channel restoration"))
```
Just wondering -- are there plans to get rid of this and go fully async with the method? I guess in a release or three?
This is only for the upgrade case; we assume signer liveness prior to switching over to an async signer.
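For illustration, a hedged sketch of the upgrade path being discussed (names are assumptions, not LDK's actual restore logic): channels serialized before the revoked points were cached derive them synchronously on restoration, and failure is treated as unreachable because a synchronous signer is assumed at that point.

```rust
use bitcoin::secp256k1::PublicKey;

// Illustrative only: pick the cached point if the serialized channel has one,
// otherwise fall back to synchronous derivation for the upgrade case.
fn restore_previous_revoked_point(
    cached: Option<PublicKey>,
    derive_sync: impl FnOnce() -> Result<PublicKey, ()>,
) -> PublicKey {
    match cached {
        // Channel written by a version that already caches revoked points.
        Some(point) => point,
        // Upgrade case: the cache is absent, so derive synchronously; the
        // signer is assumed live prior to switching to an async signer.
        None => derive_sync().expect(
            "Must be able to derive the previous revoked commitment point upon channel restoration",
        ),
    }
}
```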
```rust
    return Err(ChannelError::close("Peer sent a channel_reestablish indicating we're stale with an invalid commitment secret".to_owned()));
}
Self::panic_on_stale_state(logger);
} else if msg.next_remote_commitment_number == our_commitment_transaction {
```
Probably a dumb question -- in the spec `next_remote_commitment_number` is described as the next commitment number they expect to receive, but above we seem to be setting `current_transaction_number` to the current commitment number, which is a bit confusing. Just want to double check there's no off-by-one there.
It's actually the `next_revocation_number` in the spec. If you look at `get_channel_reestablish`, you'll find this comment where we set the field:

```rust
// We have to set next_remote_commitment_number to the next revoke_and_ack we expect to
// receive, however we track it by the next commitment number for a remote transaction
// (which is one further, as they always revoke previous commitment transaction, not
// the one we send) so we have to decrement by 1. Note that if
```
Hm, that's confusing. The comment is a bit buried.
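To make the off-by-one concrete, a hedged worked example of the relationship described in that comment (numbers are illustrative and use the spec's upward-counting commitment numbers, not LDK's internal counters):

```rust
// Say the counterparty has revoked commitments 0 and 1, so commitment 2 is
// their current one and commitment 3 is the next one we would sign for them.
let next_counterparty_commitment_number: u64 = 3;

// The next revoke_and_ack we expect revokes their *current* commitment (2),
// so the spec's next_revocation_number lags the commitment number we track
// by one, hence the decrement by 1 in get_channel_reestablish.
let next_revocation_number = next_counterparty_commitment_number - 1;
assert_eq!(next_revocation_number, 2);
```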
Force-pushed 82e2371 to 905f990
valentinewallace left a comment:
One comment, but I'm otherwise good to ACK after CI is fixed.
`HolderCommitmentPoint` currently tracks the current and next point used on counterparty commitments, which are unrevoked. When we reestablish a channel, the counterparty sends us the commitment height, along with the corresponding secret, for the state they believe to be the latest. We compare said secret to the derived point we fetch from the signer to know if the peer is being honest.

Since the protocol does not allow peers (assuming no data loss) to be behind the current state by more than one update, we can cache the two latest revoked commitment points alongside `HolderCommitmentPoint`, such that we no longer need to reach the signer asynchronously when handling `channel_reestablish` messages throughout the happy path. By doing so, we avoid complexity in needing to pause the state machine (which may also result in needing to stash any update messages from the counterparty) while the signer response is pending.

The only remaining case left to handle is when the counterparty presents a `channel_reestablish` with a state later than what we know. This can only result in two terminal cases: either they provided a valid commitment secret proving we are behind and we need to panic, or they lied and we force close the channel. This is the only case we choose to handle asynchronously as it's relatively trivial to handle.
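A hedged sketch of the caching described above; the struct and field names are illustrative, not LDK's actual definitions:

```rust
use bitcoin::secp256k1::PublicKey;

// Sketch: cache the two most recently revoked commitment points alongside the
// current/next points so channel_reestablish handling never needs to reach
// the (possibly async) signer on the happy path.
struct HolderCommitmentPoint {
    current: PublicKey,
    next: PublicKey,
    // The two latest revoked points; None until enough states have advanced.
    previous_revoked: Option<PublicKey>,
    second_previous_revoked: Option<PublicKey>,
}

impl HolderCommitmentPoint {
    // On advancing the commitment state, the point being revoked slides into
    // the cache and the oldest cached point drops out.
    fn advance(&mut self, new_next: PublicKey) {
        self.second_previous_revoked = self.previous_revoked.take();
        self.previous_revoked = Some(self.current);
        self.current = self.next;
        self.next = new_next;
    }
}
```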
Force-pushed 905f990 to 1f7b249
TheBlueMatt left a comment:
Landing, given @valentinewallace indicated she was happy with it.
```rust
/// Similar to [`Self::signer_pending_commitment_update`] but we're waiting to send a
/// [`msgs::ChannelReady`].
signer_pending_channel_ready: bool,
// Upon receiving a [`msgs::ChannelReestablish`] message with a `next_remote_commitment_number`
```
nit: even for internal stuff it's nice to make it a doc comment, because then `cargo doc --document-private-items` will generate docs for it and presumably some people's RLS will see it. Not sure if it actually impacts anyone on the team currently, but I imagine in the future LLMs might care, or maybe better IDEs might.
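For illustration, the change suggested in this nit is just a syntax switch from `//` to `///`; the field name and type below mirror the snippet earlier in the thread and are otherwise assumptions:

```rust
use bitcoin::secp256k1::SecretKey;

struct ChannelContext {
    /// Upon receiving a [`msgs::ChannelReestablish`] claiming a stale state we
    /// could not yet verify, the claimed commitment number and secret are
    /// stashed here until the async signer responds. Unlike a plain `//`
    /// comment, this `///` doc comment shows up in
    /// `cargo doc --document-private-items` output.
    signer_pending_stale_state_verification: Option<(u64, SecretKey)>,
}
```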