-
Notifications
You must be signed in to change notification settings - Fork 2.2k
htlcswitch: fix empty commit sig and no commit sig #2927
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
htlcswitch: fix empty commit sig and no commit sig #2927
Conversation
|
For reference: #1927 |
The spec says not to send a commitment_signed without any changes, but LND does this. To understand why, you have to understand how LND works. I haven't read the code, but I'm pretty sure it works like this: 1. lnd slows down to do garbage collection, because it's in Go. 2. When an alert timer goes off, noticing it's not making process, it sends a twitter message to @Roasbeef. 3. @Roasbeef sshs into the user's machine and binary patches lnd to send a commitment_signed message. 4. Unfortunately he works so fast that various laws of causality are broken, meaning sometimes the commitment_signed is sent before any of thes other things happen. I'm fairly sure that this will stop as @Roasbeef ages, or lnd introduces some kind of causality enforcement fix. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2847230 to
de5cd1d
Compare
de5cd1d to
f8a137c
Compare
8b50fb3 to
ab2078b
Compare
a073f13 to
c230496
Compare
lnwallet/channel.go
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
merge with prior commit? should we also use this in l.debugf(), l.errorf(), etc? atm i think some use chan point and others use short channel id, would be nice if we were consistent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hehe. Ok, I know that I sometimes comment on commit structure in PRs and suggest some of them to be merged. This is particular the case for structs that are introduced in a separate commit but it is hard to review the code without any further context. In this case the prefix logger is an object that doesn't need much context to review (although this is a sliding scale) and therefore I made it a separate commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added commits to also make the link logging consistent. Open question is what we think is the best prefix to use for link and channel? uint64, shortchan, chanpoint?
lnwallet/channel.go
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
code looks duplicated from earlier commit, perhaps make internal helper
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But it isn't duplicated. Here we check if we owe a signature, while in ReceiveNewCommitment, we check whether the remote party owed anything.
c230496 to
6548569
Compare
d55cd10 to
831dfc2
Compare
halseth
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome change, making the channel state machine more deterministic and easy to reason about! LGTM 💯
cfromknecht
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work! Indeed this simplifies state machine and reduces the chance of bugs due to local state, e.g. needsUpdate 🍾
|
PR is approved, but need to do interop testing with CL and Eclair |
831dfc2 to
5080a03
Compare
|
Testing interop between CL 0.7.2 and this branch 5080a030968e783eb3e133ba5a3aa6c868169c95. Setup: connect CL and LND nodes with a channel. Then pay invoices both ways concurrently. Used script: I ran four instances of the script in parallel. No anomalies observed and channel remained functional. |
|
Same test for Eclair 0.3.2-5ad3944, no problems observed. |
|
Finally interop with |
|
Moved PR back into the board's approved column. Ready for a final check |
5080a03 to
8a02365
Compare
|
Rebased |
Roasbeef
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Solid set of changes, no major comments on initial pass. I've let a comment on a older related PR, to see how we can reconcile the two of them. Only material comment is a shift to avoid grabbing the channel state machine's mutex twice when we receive a new commitment.
| // updates in the local commit tx that aren't reflected in the remote commit tx | ||
| // yet. | ||
| func (lc *LightningChannel) OweCommitment(local bool) bool { | ||
| lc.RLock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we should instead obtain the mutex once within ReceiveNewCommitment, then either make this a private method, or update the godoc to note that it expects the mutex to already be held?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is also called without a lock from ChannelLink. I now created a private no-lock version and a public method that wraps it in a lock.
I am wondering whether our channel manipulations are always safe though. Something like this for example:
if lc.OweCommitment(true) {
commitSig, htlcSigs, _, err := lc.SignNextCommitment()
I think the channel isn't modified concurrently, but why it the lock there then? Is it a safety precaution? Or for reading channel state in another subsystem?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The lock was mostly a safety precaution when the struct was first created just to ensure things were goroutine safe, so all other code added afterwards followed that trend. In the past there were also instances of grabbing live state machines from the peer, and using that to populate RPC information.
htlcswitch/link.go
Outdated
| // pending updates we need to commit due to our | ||
| // commitment chains being desynchronized. | ||
| if l.channel.FullySynced() { | ||
| if !l.channel.OweCommitment(true) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see how this is used here now. Updating my prior comment, we can make a private version of this method that assumes the lock is already held, then call that when accepting a new commitment.
| } | ||
|
|
||
| case <-l.cfg.BatchTicker.Ticks(): | ||
| // If the current batch is empty, then we have no work |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, this should also lower idle CPU usage for routing nodes with tons of channels.
7d856d1 to
d0d6b20
Compare
To facilitate the logging, this commit adds a new OweCommitment method. For the logging, we only need to consider the remote perspective. In a later commit, we'll also start using the local perspective to support the decision to send another signature.
Co-authored-by: Johan T. Halseth <johanth@gmail.com>
Previously the channel method FullySynced was used to decide whether to send a new commit sig message. However, it could happen that FullySynced was false, but that we didn't owe a commitment signature. Instead we were waiting on the other party to send us a signature. If that happened, we'd send out an empty commit sig. This commit modifies the condition that triggers a new commit sig and fixes this deviation from the spec.
Instead of tracking local updates in a separate link variable, query this state from the channel itself. This commit also fixes the issue where the commit tx was not updated anymore after a failed first attempt because the revocation window was closed. Also those pending updates will be taken into account when the remote party revokes.
Now that channel exposes the number of pending local updates, it is no longer necessary to track the batch size separately in the link.
Replace logCommitTick as a way to deal with revocation window exhaustion by retrying to update the commit tx when the remote revocation is received. The rationale is that the revocation window always opens up because of a revoke message that is received from the other party. It is therefore not necessary to set a timer for this. The reception of the revoke message is the trigger to send a new commit sig if necessary.
d0d6b20 to
0b5afa6
Compare
|
Test from #3012 added to this PR. It needed a bit of modification, but the idea of the scenario is the same. For review, this is the delta with the previous revision of this pr: |
Roasbeef
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 🎷
Builds on top of #3564.
This PR fixes two known bugs in the commitment tx update sequence:
In addition to that, several simplifications are carried through where values were tracked in the link that can also be obtained from the channel.