Fail the channel when the revocation of the old channel state fails. #7876

ziggie1984 · 2023-08-08T11:16:53Z

Fixes the part of #7869 where we continued with an unclean state because our DB was failing with the SQLITE_BUSY error. This is mandatory apart from the root cause where the SQL db would not retry db transactions.
When we try to revoke the old channel commitment we might advance the memory state of our local commitment (depends where we fail during the revocation) failing later when committing to the db. Therefore we need to make sure we fail the channel so that we do not continue with an unclean state.
Decided to report back the internal error, but optimally we should send a warning maybe instead?

I think is still pretty difficult to test this behaviour unless we do have a mock for the LightningChannel ? There are some failure tests here: https://github.com/lightningnetwork/lnd/blob/master/htlcswitch/link_test.go#L5321. But because there is no real mock for the LightningChannel (mocking RevokeCurrentCommitment its really hard to simulate a failure of the local revocation ?

Apart from that, I realized when testing the case that one of the log outputs was a bit misleading, fixed this as well:

2023-08-08 07:49:48.827 [DBG] LNWL: ChannelPoint(aec33ba32b6f582dd9efac1c9c8cf5b0ef039aa107aa58d8a4dd79ca67bcd83f:0): sync: remote's next commit height is 2, while we believe it is 2, we owe them a commitment

Steps to Test

Steps for reviewers to follow to test the change.

Pull Request Checklist

Testing

Your PR passes all CI checks.
Tests covering the positive and negative (error paths) are included.
Bug fixes contain tests triggering the bug to prevent regressions.

Code Style and Documentation

The change obeys the Code Documentation and Commenting guidelines, and lines wrap at 80.
Commits follow the Ideal Git Commit Structure.
Any new logging statements use an appropriate subsystem and logging level.
There is a change description in the release notes, or [skip ci] in the commit message for small changes.

📝 Please see our Contribution Guidelines for further guidance.

When the revocation of a channel state fails after receiving a new CommitmentSigned msg we have to fail the channel otherwise we continue with an unclean state.

Roasbeef · 2023-08-11T19:47:13Z

htlcswitch/link.go

@@ -1970,6 +1970,23 @@ func (l *channelLink) handleUpstreamMsg(msg lnwire.Message) {
 			l.channel.RevokeCurrentCommitment()
 		if err != nil {
 			l.log.Errorf("unable to revoke commitment: %v", err)
+
+			// We need to fail the channel in case revoking our


Let's also do this for any of the other related channel operations? So process revoke, send sig, etc.

Thanks for this heads up,

After going through the code it seems to me that we are failing the channel accordingly already for most of the situations?

After processing(receiving) the revocation msg: https://github.com/lightningnetwork/lnd/blob/master/htlcswitch/link.go#L2037

After receiving feeupdate msg: https://github.com/lightningnetwork/lnd/blob/master/htlcswitch/link.go#L2107

Also when receiving a commitment: https://github.com/lightningnetwork/lnd/blob/master/htlcswitch/link.go#L1940-L1961

When signing a new commitment we fail the channel if an error occurs: https://github.com/lightningnetwork/lnd/blob/master/htlcswitch/link.go#L2206

So I think this was the only gap we had not failing the channel. But maybe you were referring to something different ?

But I will look again to see whether we could still have a case where we do not fail. I will update on this issue.

I am currently thinking about returning a warning in situations where we wanna give the channel time to resolve its issue (not an internal error basically) wdyt?

I am currently thinking about returning a warning in situations where we wanna give the channel time to resolve its issue (not an internal error basically) wdyt?

Thinking through the warning idea again, I think this might not come in handy if we have several channels per peer, and according to the spec we would also need to close the peer connection, therefore affecting also other channels of the peer.

keeping it with the internal error for now

Yeah what's important here though is that we teat down the connection to ensure we don't retain any unclean state.

n, I think this might not come in handy if we have several channels per peer, and according to the spec we would also need to close the peer connection, therefore affecting also other channels of the peer.

So I ended up going in this direction, as otherwise this could happen:

DB messes up

We just send the error/warning

Peer doesn't try to tear down the link for w/e reason

They send an add+sig

We never get this (no link up), eventual force close occurs

Just have it as a single commit on top, we can remove otherwise though.

Roasbeef · 2023-08-22T17:51:44Z

@ziggie1984 if you're ok with it, I think @positiveblue will take this over and also copy over that sqlite retry stuff as well.

ziggie1984 · 2023-08-22T17:57:17Z

For sure :), @positiveblue go ahead thanks for picking this up, interested how you will solve the sql-lite issue.

ziggie1984 added 3 commits August 8, 2023 13:02

lnwallet: fix log output msg.

7c1ae47

htlcswitch: fail channel when revoking it fails.

9d8333f

When the revocation of a channel state fails after receiving a new CommitmentSigned msg we have to fail the channel otherwise we continue with an unclean state.

docs: update release-docs.

37e48be

ziggie1984 force-pushed the fail-channel-when-revoc-fails branch from 166213d to 37e48be Compare August 8, 2023 11:27

Roasbeef requested changes Aug 11, 2023

View reviewed changes

saubyk added P1 MUST be fixed or reviewed force closes labels Aug 17, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fail the channel when the revocation of the old channel state fails. #7876

Fail the channel when the revocation of the old channel state fails. #7876

ziggie1984 commented Aug 8, 2023 •

edited

Roasbeef Aug 11, 2023

ziggie1984 Aug 14, 2023 •

edited

ziggie1984 Aug 14, 2023

ziggie1984 Aug 16, 2023

ziggie1984 Aug 16, 2023

Roasbeef Aug 25, 2023

Roasbeef Aug 25, 2023

Roasbeef commented Aug 22, 2023

ziggie1984 commented Aug 22, 2023

Fail the channel when the revocation of the old channel state fails. #7876

Are you sure you want to change the base?

Fail the channel when the revocation of the old channel state fails. #7876

Conversation

ziggie1984 commented Aug 8, 2023 • edited

Steps to Test

Pull Request Checklist

Testing

Code Style and Documentation

Roasbeef Aug 11, 2023

Choose a reason for hiding this comment

ziggie1984 Aug 14, 2023 • edited

Choose a reason for hiding this comment

ziggie1984 Aug 14, 2023

Choose a reason for hiding this comment

ziggie1984 Aug 16, 2023

Choose a reason for hiding this comment

ziggie1984 Aug 16, 2023

Choose a reason for hiding this comment

Roasbeef Aug 25, 2023

Choose a reason for hiding this comment

Roasbeef Aug 25, 2023

Choose a reason for hiding this comment

Roasbeef commented Aug 22, 2023

ziggie1984 commented Aug 22, 2023

ziggie1984 commented Aug 8, 2023 •

edited

ziggie1984 Aug 14, 2023 •

edited