lnwallet+contractcourt: gracefully handle auto force close post data … #7985

Roasbeef · 2023-09-14T00:20:06Z

…loss

In this commit, we update the startup logic to gracefully handle a seemingly rare case: a peer detects local data loss while a set of HTLCs is active. These HTLCs then eventually expire (they may or may not actually "exist"), causing a force close decision. Before this PR, the force close attempt would fail with a fatal error that could impede startup.

To better handle such a scenario, we now catch the error returned when the force close fails because the channel has entered DLP (data loss protection) mode, and instead terminate the state machine at the broadcast state. When a commitment transaction eventually confirms, we'll handle it as normal.

Fixes #7984
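The handling described above can be sketched roughly as follows. This is a simplified, self-contained model and not lnd's actual code: `errDataLoss`, `forceClose`, and `advance` are hypothetical stand-ins, and the lowercase state constants only mirror the state names used later in this thread.

```go
package main

import (
	"errors"
	"fmt"
)

// arbState is a simplified stand-in for the channel arbitrator's state.
type arbState int

const (
	stateDefault arbState = iota
	stateBroadcastCommit
	stateCommitmentBroadcasted
)

// errDataLoss is a hypothetical sentinel error, assumed here to be what a
// force close attempt returns once the channel is in data loss mode.
var errDataLoss = errors.New("cannot force close: local data loss")

// forceClose models the force close attempt. It fails with errDataLoss
// when the channel has detected local data loss.
func forceClose(dataLoss bool) error {
	if dataLoss {
		return errDataLoss
	}
	return nil
}

// advance performs one step out of the broadcast state and returns the
// state the machine should rest in.
func advance(dataLoss bool) (arbState, error) {
	err := forceClose(dataLoss)
	switch {
	case errors.Is(err, errDataLoss):
		// We cannot broadcast our own commitment. Instead of failing
		// startup, stay in the broadcast state and wait for a
		// commitment transaction to confirm on chain.
		return stateBroadcastCommit, nil

	case err != nil:
		// Any other error is still surfaced to the caller.
		return stateBroadcastCommit, err
	}

	return stateCommitmentBroadcasted, nil
}

func main() {
	next, err := advance(true)
	fmt.Println(next == stateBroadcastCommit, err)
}
```

The point of the sketch is only that the data loss error is swallowed at this one step while every other error keeps propagating.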

yyforyongyu

I think we should catch the error here to make sure we can finish startup,

lnd/contractcourt/channel_arbitrator.go

Line 534 in 90effda

case errNoResolutions:

And no advancing the state in channel arbitrator. Once data loss is detected, we'll wait for the remote to force close it. Then the chain arbitrator will detect it and send the event to RemoteUnilateralClosure, which will be picked up and handled by the channel arbitrator to clean up the resolutions.

contractcourt/channel_arbitrator.go

lnwallet/channel.go

Roasbeef · 2023-09-14T16:34:00Z

I think we should catch the error here to make sure we can finish startup,

True, we want to have the channel arb active so it can get the on-chain close signal.

Roasbeef · 2023-09-15T00:35:48Z

And no advancing the state in channel arbitrator.

Ah, so you mean don't step the state at all and instead wait for the on-chain confirmation? Each time a block is mined though, we may end up triggering the go-to-chain decision, so I think we still need to catch this error and just have the state machine terminate as is. I updated the commit to just go back to StateBroadcastCommit, so it terminates there. Otherwise, in StateDefault, we'll do checkLocalChainActions, eventually deciding to force close again once the next block comes around.
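To illustrate the argument, here is a toy simulation of the two resting states. It assumes, as a simplification, that every new block in the default state leads to a go-to-chain decision because the HTLCs have already expired; `countForceCloseAttempts` and the `state` type are hypothetical and do not exist in lnd.

```go
package main

import "fmt"

// state is a simplified stand-in for the arbitrator state names.
type state string

const (
	stateDefault         state = "StateDefault"
	stateBroadcastCommit state = "StateBroadcastCommit"
)

// countForceCloseAttempts simulates the arrival of `blocks` new blocks for a
// channel in data loss mode with expired HTLCs. restingState is the state the
// arbitrator falls back to after a failed force close attempt.
func countForceCloseAttempts(restingState state, blocks int) int {
	cur := stateDefault
	attempts := 0

	for i := 0; i < blocks; i++ {
		if cur != stateDefault {
			// The state machine has terminated outside the default
			// state, so a new block no longer re-runs the
			// go-to-chain check.
			continue
		}

		// In the default state the expired HTLCs trigger another
		// force close attempt, which fails due to data loss.
		attempts++
		cur = restingState
	}

	return attempts
}

func main() {
	for _, s := range []state{stateDefault, stateBroadcastCommit} {
		fmt.Printf("resting in %s: %d attempts over 5 blocks\n",
			s, countForceCloseAttempts(s, 5))
	}
}
```

Under these assumptions, resting in the default state retries on every block, while resting in the broadcast state attempts the force close only once.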

Roasbeef · 2023-09-15T01:05:02Z

Pushed up a new commit:

test case added to ensure we start up
modified to not advance to next state, just stay in StateBroadcastCommit

yyforyongyu

pending linter fix, otherwise LGTM🙏

I think going back to either StateDefault or StateBroadcastCommit works. Once in DLP mode, there isn't much we can do here but listen for a remote force close and a possible breach, and both of these triggers can be handled as long as we don't go beyond StateBroadcastCommit.

A future PR may wanna investigate and modify the ChainArbitrator.Start to never stop when one of the channel arbitrators fails to start,

lnd/contractcourt/chain_arbitrator.go

Lines 707 to 709 in 90effda

    
if err := arbitrator.Start(startState); err != nil {
	stopAndLog()
	return err

I think unless it's related to db failure, the system should start normally so it can operate on other channels.
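One possible shape for that follow-up is sketched below. It is only an illustration of the idea, not a proposal for the real ChainArbitrator: the `arbitrator` type, `errDB`, `errDataLoss`, and `startAll` are all hypothetical names, and how lnd would actually classify a database failure is left open.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinel errors used to classify start failures.
var (
	errDB       = errors.New("db failure")
	errDataLoss = errors.New("channel in data loss mode")
)

// arbitrator is a minimal stand-in for a channel arbitrator whose Start
// result is fixed up front for the purposes of this example.
type arbitrator struct {
	name     string
	startErr error
}

// Start returns the preconfigured start error, if any.
func (a *arbitrator) Start() error { return a.startErr }

// startAll starts every arbitrator. A database failure aborts the whole
// start up, while any other failure is logged and skipped so the remaining
// channels stay operational.
func startAll(arbs []*arbitrator) ([]string, error) {
	var started []string

	for _, a := range arbs {
		err := a.Start()
		switch {
		case err == nil:
			started = append(started, a.name)

		case errors.Is(err, errDB):
			return started, fmt.Errorf("arbitrator %s: %w", a.name, err)

		default:
			fmt.Printf("skipping arbitrator %s: %v\n", a.name, err)
		}
	}

	return started, nil
}

func main() {
	arbs := []*arbitrator{
		{name: "chan-a"},
		{name: "chan-b", startErr: errDataLoss},
		{name: "chan-c"},
	}

	started, err := startAll(arbs)
	fmt.Println(started, err)
}
```

A skipped arbitrator would still need some form of monitoring or retry, which this sketch does not address.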

contractcourt/channel_arbitrator_test.go


Roasbeef mentioned this pull request Sep 14, 2023

[bug]: 0.17.0-rc3 doesn't start - "unable to start server: cannot force close channel with state: ChanStatusBorked|ChanStatusLocalDataLoss" #7984

Closed

saubyk requested review from yyforyongyu and guggero September 14, 2023 00:22

saubyk added this to the v0.17.0 milestone Sep 14, 2023

yyforyongyu added the no-changelog label Sep 14, 2023

yyforyongyu requested review from Crypt-iQ and removed request for guggero September 14, 2023 02:10

yyforyongyu reviewed Sep 14, 2023

Roasbeef force-pushed the graceful-data-loss-fc branch from 0076f8c to ad3241e Compare September 15, 2023 01:04

yyforyongyu approved these changes Sep 15, 2023

Crypt-iQ approved these changes Sep 15, 2023

Roasbeef force-pushed the graceful-data-loss-fc branch from ad3241e to de54a60 Compare September 16, 2023 01:29

Roasbeef merged commit 2d5c0c9 into lightningnetwork:master Sep 16, 2023
23 of 25 checks passed
