-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fail the channel when the revocation of the old channel state fails. #7876
base: master
Are you sure you want to change the base?
Fail the channel when the revocation of the old channel state fails. #7876
Conversation
When the revocation of a channel state fails after receiving a new CommitmentSigned msg we have to fail the channel otherwise we continue with an unclean state.
166213d
to
37e48be
Compare
@@ -1970,6 +1970,23 @@ func (l *channelLink) handleUpstreamMsg(msg lnwire.Message) { | |||
l.channel.RevokeCurrentCommitment() | |||
if err != nil { | |||
l.log.Errorf("unable to revoke commitment: %v", err) | |||
|
|||
// We need to fail the channel in case revoking our |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's also do this for any of the other related channel operations? So process revoke, send sig, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this heads up,
After going through the code it seems to me that we are failing the channel accordingly already for most of the situations?
- After processing(receiving) the revocation msg: https://github.com/lightningnetwork/lnd/blob/master/htlcswitch/link.go#L2037
- After receiving feeupdate msg: https://github.com/lightningnetwork/lnd/blob/master/htlcswitch/link.go#L2107
- Also when receiving a commitment: https://github.com/lightningnetwork/lnd/blob/master/htlcswitch/link.go#L1940-L1961
- When signing a new commitment we fail the channel if an error occurs: https://github.com/lightningnetwork/lnd/blob/master/htlcswitch/link.go#L2206
So I think this was the only gap we had not failing the channel. But maybe you were referring to something different ?
But I will look again to see whether we could still have a case where we do not fail. I will update on this issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am currently thinking about returning a warning in situations where we wanna give the channel time to resolve its issue (not an internal error basically) wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am currently thinking about returning a warning in situations where we wanna give the channel time to resolve its issue (not an internal error basically) wdyt?
Thinking through the warning idea again, I think this might not come in handy if we have several channels per peer, and according to the spec we would also need to close the peer connection, therefore affecting also other channels of the peer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
keeping it with the internal error for now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah what's important here though is that we teat down the connection to ensure we don't retain any unclean state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
n, I think this might not come in handy if we have several channels per peer, and according to the spec we would also need to close the peer connection, therefore affecting also other channels of the peer.
So I ended up going in this direction, as otherwise this could happen:
- DB messes up
- We just send the error/warning
- Peer doesn't try to tear down the link for w/e reason
- They send an add+sig
- We never get this (no link up), eventual force close occurs
Just have it as a single commit on top, we can remove otherwise though.
@ziggie1984 if you're ok with it, I think @positiveblue will take this over and also copy over that sqlite retry stuff as well. |
For sure :), @positiveblue go ahead thanks for picking this up, interested how you will solve the sql-lite issue. |
Fixes the part of #7869 where we continued with an unclean state because our DB was failing with the
SQLITE_BUSY
error. This is mandatory apart from the root cause where the SQL db would not retry db transactions.When we try to revoke the old channel commitment we might advance the memory state of our local commitment (depends where we fail during the revocation) failing later when committing to the db. Therefore we need to make sure we fail the channel so that we do not continue with an unclean state.
Decided to report back the internal error, but optimally we should send a warning maybe instead?
I think is still pretty difficult to test this behaviour unless we do have a mock for the
LightningChannel
? There are some failure tests here: https://github.com/lightningnetwork/lnd/blob/master/htlcswitch/link_test.go#L5321. But because there is no real mock for theLightningChannel
(mocking RevokeCurrentCommitment its really hard to simulate a failure of the local revocation ?Apart from that, I realized when testing the case that one of the log outputs was a bit misleading, fixed this as well:
Steps to Test
Steps for reviewers to follow to test the change.
Pull Request Checklist
Testing
Code Style and Documentation
[skip ci]
in the commit message for small changes.馃摑 Please see our Contribution Guidelines for further guidance.