Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fail the channel when the revocation of the old channel state fails. #7876

Draft
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

ziggie1984
Copy link
Collaborator

@ziggie1984 ziggie1984 commented Aug 8, 2023

Fixes the part of #7869 where we continued with an unclean state because our DB was failing with the SQLITE_BUSY error. This is mandatory apart from the root cause where the SQL db would not retry db transactions.
When we try to revoke the old channel commitment we might advance the memory state of our local commitment (depends where we fail during the revocation) failing later when committing to the db. Therefore we need to make sure we fail the channel so that we do not continue with an unclean state.
Decided to report back the internal error, but optimally we should send a warning maybe instead?

I think is still pretty difficult to test this behaviour unless we do have a mock for the LightningChannel ? There are some failure tests here: https://github.com/lightningnetwork/lnd/blob/master/htlcswitch/link_test.go#L5321. But because there is no real mock for the LightningChannel (mocking RevokeCurrentCommitment its really hard to simulate a failure of the local revocation ?

Apart from that, I realized when testing the case that one of the log outputs was a bit misleading, fixed this as well:

2023-08-08 07:49:48.827 [DBG] LNWL: ChannelPoint(aec33ba32b6f582dd9efac1c9c8cf5b0ef039aa107aa58d8a4dd79ca67bcd83f:0): sync: remote's next commit height is 2, while we believe it is 2, we owe them a commitment

Steps to Test

Steps for reviewers to follow to test the change.

Pull Request Checklist

Testing

  • Your PR passes all CI checks.
  • Tests covering the positive and negative (error paths) are included.
  • Bug fixes contain tests triggering the bug to prevent regressions.

Code Style and Documentation

馃摑 Please see our Contribution Guidelines for further guidance.

When the revocation of a channel state fails after receiving a new
CommitmentSigned msg we have to fail the channel otherwise we
continue with an unclean state.
@ziggie1984 ziggie1984 force-pushed the fail-channel-when-revoc-fails branch from 166213d to 37e48be Compare August 8, 2023 11:27
@@ -1970,6 +1970,23 @@ func (l *channelLink) handleUpstreamMsg(msg lnwire.Message) {
l.channel.RevokeCurrentCommitment()
if err != nil {
l.log.Errorf("unable to revoke commitment: %v", err)

// We need to fail the channel in case revoking our
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's also do this for any of the other related channel operations? So process revoke, send sig, etc.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this heads up,

After going through the code it seems to me that we are failing the channel accordingly already for most of the situations?

So I think this was the only gap we had not failing the channel. But maybe you were referring to something different ?

But I will look again to see whether we could still have a case where we do not fail. I will update on this issue.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am currently thinking about returning a warning in situations where we wanna give the channel time to resolve its issue (not an internal error basically) wdyt?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am currently thinking about returning a warning in situations where we wanna give the channel time to resolve its issue (not an internal error basically) wdyt?

Thinking through the warning idea again, I think this might not come in handy if we have several channels per peer, and according to the spec we would also need to close the peer connection, therefore affecting also other channels of the peer.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

keeping it with the internal error for now

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah what's important here though is that we teat down the connection to ensure we don't retain any unclean state.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

n, I think this might not come in handy if we have several channels per peer, and according to the spec we would also need to close the peer connection, therefore affecting also other channels of the peer.

So I ended up going in this direction, as otherwise this could happen:

  • DB messes up
  • We just send the error/warning
  • Peer doesn't try to tear down the link for w/e reason
  • They send an add+sig
  • We never get this (no link up), eventual force close occurs

Just have it as a single commit on top, we can remove otherwise though.

@saubyk saubyk added P1 MUST be fixed or reviewed force closes labels Aug 17, 2023
@Roasbeef
Copy link
Member

@ziggie1984 if you're ok with it, I think @positiveblue will take this over and also copy over that sqlite retry stuff as well.

@ziggie1984
Copy link
Collaborator Author

For sure :), @positiveblue go ahead thanks for picking this up, interested how you will solve the sql-lite issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
force closes P1 MUST be fixed or reviewed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants