Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lnwallet+contractcourt: gracefully handle auto force close post data … #7985

Merged
merged 1 commit into from Sep 16, 2023

Conversation

Roasbeef
Copy link
Member

…loss

In this commit, update the start up logic to gracefully handle a seemingly rare case. In this case, a peer detects local data loss with a set of active HTLCs. These HTLCs then eventually expire (they may or may not actually "exist"), causing a force close decision. Before this PR, this attempt would fail with a fatal error that can impede start up.

To better handle such a scenario, we'll now catch the error when we fail to force close due to entering the DLP and instead terminate the state machine at the broadcast state. When a commitment transaction eventually confirms, we'll play it as normal.

Fixes #7984

Copy link
Collaborator

@yyforyongyu yyforyongyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should catch the error here to make sure we can finish startup,

case errNoResolutions:

And no advancing the state in channel arbitrator. Once data loss is detected, we'll wait for the remote to force close it. Then the chain arbitrator will detect it and send the event to RemoteUnilateralClosure, which will be picked up and handled by the channel arbitrator to clean up the resolutions.

contractcourt/channel_arbitrator.go Outdated Show resolved Hide resolved
lnwallet/channel.go Show resolved Hide resolved
@Roasbeef
Copy link
Member Author

I think we should catch the error here to make sure we can finish startup,

True, we want to see have the channel arb active so it can get the on chain close signal.

@Roasbeef
Copy link
Member Author

And no advancing the state in channel arbitrator.

Ah so you mean don't state step at all and instead wait for the on chain confirmation? Each time a block is mined though, we may end up triggering the go-to-chain decision, so I think we still need to catch this error and just have the state machine terminate as is. I updated the commit to just go back to StateBroadcastCommit, so it terminates there. Otherwise, in StateDefault, we'll do checkLocalChainActions, eventually deciding to force close again once the next block comes around.

@Roasbeef
Copy link
Member Author

Pushed up a new commit:

  • test case added to ensure we start up
  • modified to not advance to next state, just stay in StateBroadcastCommit

Copy link
Collaborator

@yyforyongyu yyforyongyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pending linter fix, otherwise LGTM🙏

I think going back to either StateDefault or StateBroadcastCommit works. Once in DLP mode, there isn't much we can do here but listening to remote force close and possible breach, and these two triggers can both be handled as long as we don't go beyong StateBroadcastCommit.

A future PR may wanna investigate and modify the ChainArbitrator.Start to never stop when one of the channel arbitrators fails to start,

if err := arbitrator.Start(startState); err != nil {
stopAndLog()
return err

I think unless it's related to db failure, the system should start normally so it can operate on other channels.

contractcourt/channel_arbitrator_test.go Show resolved Hide resolved
…loss

In this commit, update the start up logic to gracefully handle a
seemingly rare case. In this case, a peer detects local data loss with a
set of active HTLCs. These HTLCs then eventually expire (they may or may
not actually "exist"), causing a force close decision. Before this PR,
this attempt would fail with a fatal error that can impede start up.

To better handle such a scenario, we'll now catch the error when we fail
to force close due to entering the DLP and instead terminate the state
machine at the broadcast state. When a commitment transaction eventually
confirms, we'll play it as normal.

Fixes lightningnetwork#7984
@Roasbeef Roasbeef merged commit 2d5c0c9 into lightningnetwork:master Sep 16, 2023
23 of 25 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants