
Data loss protection #1364

Merged: 18 commits, merged Aug 1, 2018


@halseth (Contributor) commented Jun 11, 2018

This PR implements the logic required for "data loss protection", as described in BOLT#2: https://github.com/lightningnetwork/lightning-rfc/blob/master/02-peer-protocol.md#message-retransmission

When syncing channel states, we now detect the cases where we have probably lost our local state. In those cases broadcasting our commitment would be unsafe, since it could be considered a breach. Instead, we store in the database the my_current_per_commitment_point the remote sends us. This commitment point can later be used to reclaim our funds if the remote party unilaterally closes the channel using the corresponding state.

If we detect that the remote party probably has lost state, we'll be a good citizen and force close the channel using our latest commitment.
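The two reactions described above can be summarized in a small sketch. It is hypothetical: syncOutcome, action and decide are illustrative names, not lnd's API, and the real logic is spread across the channel sync and link code.

```go
package main

import "fmt"

// syncOutcome is a hypothetical summary of comparing our channel state
// with the remote's channel_reestablish message.
type syncOutcome int

const (
	inSync         syncOutcome = iota
	localDataLoss              // the remote knows newer states than we do
	remoteDataLoss             // the remote is behind our latest state
)

// action is what we do with the channel after the sync.
type action struct {
	broadcastOwnCommit bool // publish our latest commitment (force close)
	storeCommitPoint   bool // persist my_current_per_commitment_point
	note               string
}

// decide maps a sync outcome to the reactions described in the PR.
func decide(o syncOutcome) action {
	switch o {
	case localDataLoss:
		// Our latest known commitment may already be revoked, so
		// broadcasting it could look like a breach. Keep the remote's
		// point so we can sweep our funds if they close unilaterally.
		return action{storeCommitPoint: true, note: "store point, wait for remote close"}
	case remoteDataLoss:
		// Be a good citizen: close using our latest commitment.
		return action{broadcastOwnCommit: true, note: "force close"}
	default:
		return action{note: "continue normal operation"}
	}
}

func main() {
	for _, o := range []syncOutcome{inSync, localDataLoss, remoteDataLoss} {
		fmt.Printf("outcome %d: %+v\n", o, decide(o))
	}
}
```

The key asymmetry is that suspected local data loss never leads to broadcasting our own commitment, while suspected remote data loss does.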

@Roasbeef: what is meant by the following passage?

"In order to ensure we can carry out this process reliably we may need to ensure that the remote party first sends their re-sync message before we do. We can enforce this by requiring the initiator to send their message first."

Fixes #1131

@meshcollider added the "safety" and "recovery" labels Jun 15, 2018
@lcasassa

After rebasing onto master, the Docker image does not compile. Can you take a look, please? 👍

@irekzielinski

Can't wait for this to be merged in: it will allow the BLW wallet to use LND nodes!
Thank you for your effort, guys!

@lcasassa commented Jul 4, 2018

Any updates on this?

@Roasbeef (Member) commented Jul 4, 2018 via email

@Roasbeef (Member) commented Jul 4, 2018 via email

@lcasassa commented Jul 5, 2018

I'm happy to test, but I get an error when compiling the Docker image after rebasing onto master. :/ If I don't rebase, I get the offline channel error.

@Roasbeef (Member) left a comment

Excellent PR! This is one of the last few things we're missing in lnd in terms of additional safety measures to ensure that our users' funds are safu!

The architecture and structure of the PR, along with the set of tests, look pretty good to me on a first pass. Many of my comments are style-related or point out parts of the PR that will likely break after a rebase onto the current master (this PR is currently a month out of date).

lnd_test.go Outdated
@@ -2236,6 +2236,11 @@ func testChannelForceClosure(net *lntest.NetworkHarness, t *harnessTest) {
carolExpectedBalance,
carolBalResp.ConfirmedBalance)
}

This commit should no longer be needed after the rebase.

@@ -588,7 +597,7 @@ func (c *chainWatcher) dispatchRemoteForceClose(commitSpend *chainntnfs.SpendDet
// channel on-chain.
uniClose, err := lnwallet.NewUnilateralCloseSummary(
c.cfg.chanState, c.cfg.signer, c.cfg.pCache, commitSpend,
remoteCommit, isRemotePendingCommit,
remoteCommit, commitPoint,

If we're already passing in the channel state, then why do we need to pass in this point as well?

func NewUnilateralCloseSummary(chanState *channeldb.OpenChannel, signer Signer,
pCache PreimageCache, commitSpend *chainntnfs.SpendDetail,
remoteCommit channeldb.ChannelCommitment,
remotePendingCommit bool) (*UnilateralCloseSummary, error) {
commitPoint *btcec.PublicKey) (*UnilateralCloseSummary, error) {

Similar comment here, we're already passing in the entire state.

lnd_test.go Outdated
// Carol will be the breached party. We set --nolisten to ensure Bob
// won't be able to connect to her and trigger the channel data
// protection logic automatically.
carol, err := net.NewNode("Carol", []string{"--debughtlc",

Style nit w.r.t line wrapping here.

lnd_test.go Outdated
@@ -4981,27 +5007,44 @@ func testRevokedCloseRetributionZeroValueRemoteOutput(net *lntest.NetworkHarness

// Since we'd like to test some multi-hop failure scenarios, we'll
// introduce another node into our test network: Carol.
carol, err := net.NewNode("Carol", []string{"--debughtlc", "--hodl.exit-settle"})
carol, err := net.NewNode("Carol", []string{"--debughtlc",

Style nit w.r.t line wrapping here.

// state, we'll just pass an empty commitment. Note
// that this means we won't be able to recover any HTLC
// funds.
// TODO(halseth): can we try to recover some HTLCs?

Maybe...only if we have partial state, and this isn't actually the result of us restoring w/ seed+static-backups.

// funds.
// TODO(halseth): can we try to recover some HTLCs?
err = c.dispatchRemoteForceClose(
commitSpend, channeldb.ChannelCommitment{},

Ah now I see why the commitPoint and chan state are passed distinctly.

lnd_test.go Outdated
}()

// Dave will be the party losing his state.
dave, err := net.NewNode("Dave", []string{})

Can just pass a nil here as the second param.

lnd_test.go Outdated

// We must let Dave communicate with Carol before they are able to open
// channel, so we connect Dave and Carol,
if err := net.ConnectNodes(ctxb, carol, dave); err != nil {

This'll fail after rebasing to master as carol was started in --nolisten mode.

lnd_test.go Outdated
block = mineBlocks(t, net, 1)[0]
assertTxInBlock(t, block, daveSweep)

// Now Dave should considere the channel fully closed.

considere -> consider?

@Roasbeef added the "P2", "needs review" and "needs testing" labels Jul 11, 2018
@halseth force-pushed the data-loss-protect branch 4 times, most recently from 116e18d to 45d0661, July 12, 2018 10:48
@halseth (Contributor, Author) commented Jul 12, 2018

Rebased and addressed comments. I also extended the cases in which we force close to include the remote giving us invalid data.

I will create a few issues to handle the remaining follow-ups, most notably prohibiting the user from force closing a desynced channel, and adding resending of the chanSync message.

PTAL

@halseth closed this Jul 12, 2018
@halseth deleted the data-loss-protect branch July 12, 2018 13:27
@halseth restored the data-loss-protect branch July 12, 2018 13:35
@halseth reopened this Jul 12, 2018
@Roasbeef added the "first pass review done" and "needs rebase" labels and removed the "needs review" label
@Roasbeef requested a review from wpaulino July 17, 2018 06:20
@halseth removed the "needs rebase" label Jul 17, 2018
c.Lock()
defer c.Unlock()

if err := c.Db.Update(func(tx *bolt.Tx) error {
Contributor

It would be great to use putChanStatus here to reduce code duplication; the signature might be more like:

func (c *OpenChannel) putChanStatus(tx *bolt.Tx, status ChannelStatus) (ChannelStatus, error)

for easier composition

@halseth (Contributor, Author)

I tried this, but it turned out not to reduce duplication, since the caller would then have to open the DB View itself, and the bucket would have to be retrieved twice in MarkDataLoss. The other suggestions were added, PTAL :)

Contributor

gotcha, thanks for trying anyway 😂

In this commit we modify the integration tests slightly, by setting the
parties that get breached during the breach tests to --nolisten. We do
this to ensure that once the data protection logic is in place, the
nodes won't automatically connect, detect the state desync and recover
before we are able to trigger the breach.
@cfromknecht (Contributor) previously approved these changes Jul 31, 2018

LGTM! 💾🔒 just needs a squash before merging

Since the ChanStatus field can be changed from concurrent callers, we
make it unexported and add the method ChanStatus() for safe retrieval.
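A minimal sketch of this accessor pattern, assuming a simplified OpenChannel that holds only a status bit field behind an embedded RWMutex. The flag names and bit values are illustrative, not necessarily lnd's, and the database write that the PR's MarkDataLoss also performs is omitted.

```go
package main

import (
	"fmt"
	"sync"
)

// ChannelStatus is modelled as a bit field; names and values are illustrative.
type ChannelStatus uint8

const (
	ChanStatusDefault       ChannelStatus = 0
	ChanStatusBorked        ChannelStatus = 1 << 0
	ChanStatusLocalDataLoss ChannelStatus = 1 << 1
)

// OpenChannel keeps its status unexported so that every read and write
// goes through the mutex.
type OpenChannel struct {
	sync.RWMutex
	chanStatus ChannelStatus
}

// ChanStatus returns the current status under a read lock.
func (c *OpenChannel) ChanStatus() ChannelStatus {
	c.RLock()
	defer c.RUnlock()
	return c.chanStatus
}

// MarkDataLoss sets the local-data-loss flag under the write lock. A real
// implementation would also persist the flag and the commit point in the
// same critical section.
func (c *OpenChannel) MarkDataLoss() {
	c.Lock()
	defer c.Unlock()
	c.chanStatus |= ChanStatusLocalDataLoss
}

func main() {
	c := &OpenChannel{}
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.MarkDataLoss()
			_ = c.ChanStatus() // concurrent reads are safe
		}()
	}
	wg.Wait()
	fmt.Println(c.ChanStatus()&ChanStatusLocalDataLoss != 0)
}
```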
This commit defines a few new errors that we can potentially encounter
during channel reestablishment:
* ErrInvalidLocalUnrevokedCommitPoint
* ErrCommitSyncLocalDataLoss
* ErrCommitSyncRemoteDataLoss

in addition to the already defined errors
* ErrInvalidLastCommitSecret
* ErrCannotSyncCommitChains
This commit enumerates the various error cases we can encounter when we
compare our local commit chain to the view the remote communicates to us
via msg.RemoteCommitTailHeight.

We now compare this height to our local tail height (note that there's
never a local "tip" at this point), returning the relevant error in case
of an unrecoverable desync, and re-sending a revocation in case we owe one.
This commit enumerates the various error cases we can encounter when we
compare our remote commit chain to the view the remote communicates to us
via msg.NextLocalCommitHeight.

We now compare this height to our remote tail and tip height, returning
the relevant error in case of an unrecoverable desync, and re-sending a
commitment signature (including log updates) in case we owe one.
This commit adds a check for the LocalUnrevokedCommitPoint sent to us by
the remote during channel reestablishment, ensuring it is the same point
as they have previously sent us.
This commit makes the link inspect the error encountered during channel
sync, force closing the channel if we detect a remote data loss.
…oss commitPoint

This commit makes the chainwatcher attempt to dispatch a remote close
when it detects a remote state with a state number higher than our
known remote state. This can mean that we lost some state, and we check
the database for (hopefully) a data loss commit point retrieved during
channel sync with the remote peer. If this commit point is found in the
database we use it to try to recover our funds from the commitment.
This commit adds the integration test testDataLossProtection, that
ensures that when a node loses state, the channel counterparty will
force close the channel, and they both can recover their funds.
@cfromknecht (Contributor) left a comment

Ready to merge 🎉

@Roasbeef merged commit 1e39cfc into lightningnetwork:master Aug 1, 2018
@lcasassa commented Aug 1, 2018

Nice work here!! Thanks!!

@mcduck76 commented Feb 6, 2019

Sorry for being late to the party, but today Bitcoin Lightning Wallet refused to connect, reporting "data loss protection not supported by this peer". I am running the current lnd code base.

Is there an option to enable the feature? From what I can tell, it has been on by default since this merge.

@Roasbeef (Member) commented Feb 6, 2019

@mcduck76 I'd report that to BLW, looks like they aren't interpreting feature bits correctly.

@mcduck76

I can connect from BLW to other LND nodes. I really can't think of what is wrong with my installation; other nodes can connect fine.
Are there any configuration or make options I may have omitted that would leave the feature unsupported?

Labels: needs testing, P2, ready to merge, recovery, safety
Development: successfully merging this pull request may close the issue "lnwallet+contractcourt+htlcswitch: fully implement the data-loss protection feature".
7 participants