Skip to content

sweep: create sweeper#1960

Merged
Roasbeef merged 13 commits intolightningnetwork:masterfrom
joostjager:sweeper
Dec 19, 2018
Merged

sweep: create sweeper#1960
Roasbeef merged 13 commits intolightningnetwork:masterfrom
joostjager:sweeper

Conversation

@joostjager
Copy link
Copy Markdown
Contributor

@joostjager joostjager commented Sep 21, 2018

This PR moves the sweeper logic from utxonursery into a separate sweeper struct.

A follow up PR is #2000, which will let resolvers use the sweeper directly instead of going through nursery. The end goal is to remove nursery completely.

In this PR, there is no db migration to clean up now unused nursery store data. This allows us to keep the downgrade path open for a while.

@Roasbeef Roasbeef added enhancement Improvements to existing features / behaviour channel management The management of the nodes channels contracts P3 might get fixed, nice to have utxo sweeping labels Sep 22, 2018
@joostjager joostjager force-pushed the sweeper branch 9 times, most recently from 5ce7feb to 0e9fbeb Compare September 27, 2018 02:41
@joostjager joostjager force-pushed the sweeper branch 14 times, most recently from 369d13b to 58740fd Compare October 23, 2018 15:02
@halseth
Copy link
Copy Markdown
Contributor

halseth commented Dec 12, 2018

  1. What happens when the publish returns an error? It could be a non-recoverable error that has something to do with one of the inputs (wrong script, sign, ..). Or it could be a double spend because one or more of the inputs have been spent by the remote party or by our own sweep tx (that was published before restart). I think there are not many options then retry the next block?

We can be smarter about handling these errors. If the error is non-recoverable, then publishing in the next block won't really help (also can potentially get us banned by peers if we keep sending them invalid transactions). In that case we should return the error and exit IMO to surface the bug.

For double spends we can detect which input was already spent, remove it from the sweep, and retry. I think this is what is implicitly done already (since an input will be removed from the set after spend is detected), but I think it would help to make it more clear. We do something similar in the breacharbiter.

  1. What happens when the tx never confirms (for example fee too low)? If we keep waiting, it will block sweeping any other input forever. At some point we need to give up and go back to 1. But will we forget about the lost inputs then and loose the money? Or retry them? In that case, we'll get a double spend with our own stuck tx and sweeping will again be blocked forever.

It should be handled, but my suggestion is to reduce the scope of this PR to not include it. This is an existing problem.

It will fail, but after that sweep tx confirms, and allsets doesn't contain those inputs anymore, it will succeed.

Yes, I think this is actually my suggestion. Maybe we are actually talking about the same solution here, the behavior is just not obvious. See comment above.

I don't see how the PR can be reduced in size significantly by simplifying the retry mechanism. I only see about 20 lines dealing with the retry logic. And a form of retry is needed in any case (items above).

I don't think retry logic is strictly needed, and can be added in a follow-up PR.

@joostjager
Copy link
Copy Markdown
Contributor Author

@Roasbeef your comments have been processed.

@joostjager
Copy link
Copy Markdown
Contributor Author

Discussed simplification with @halseth. We could take out the reschedule logic and just try to sweep once every block. Start the batch timer at the beginning of every block and if it expires, try to sweep.

The downside of this is that a transaction that is stuck because of low fees will keep blocking all subsequent sweep txes if the backend doesn't support RBF. (This is something that would be solved in a probabilistic way by the randomized exponential back off.) It is also a regression from nursery, because nursery does skip over stuck txes because they are organized per height.

If we'd decide to go this path, the follow up pr with exponential backoff should probably go in before release.

@Roasbeef your input please

@joostjager
Copy link
Copy Markdown
Contributor Author

Added one commit that handles errors in a strict way, not to camouflage potential bugs.

Copy link
Copy Markdown
Contributor

@halseth halseth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is starting to look pretty good to me :)

Comment thread sweep/txgenerator.go Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be moved to immediately before the loop, I guess. Rationale being that we might encounter an error before ever using the result of this calculation :)

Comment thread sweep/store.go Outdated
Comment thread sweep/sweeper.go Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since this is only used for calculating the dust limit(?), can we instead store the dust limit directly here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to have the dust limit calculation inside txgenerator to logically group the functionality together. generateInputPartitionings takes two fees and uses that as the basis to calculate the partitionings.

Comment thread sweep/sweeper.go Outdated
Comment thread sweep/sweeper.go Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this happens, feels like the whole sweeper should exit.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean with exit? The only thing that is running is the sweeper main event loop and that one is exited.

Comment thread sweep/sweeper.go Outdated
Comment thread sweep/sweeper.go Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have an estimate how many blocks from sending the input to the sweeper until this condition is met, on average? (if it never successfully confirms)

Copy link
Copy Markdown
Contributor Author

@joostjager joostjager Dec 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think ((1 << 10) - 1) / 2 😄 so about 500 blocks.

Comment thread nursery_store.go Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this not critical at this point?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a pre-existing bug that I wanted to mark with a TODO.

Comment thread utxonursery.go Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warn?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is still happy flow, therefore Info.

Comment thread sweep/sweeper.go Outdated
Comment thread sweep/simulate/sim.go Outdated
Comment thread sweep/store.go Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just read the buckets directly? The migration logic would be more precise that way and less risk of accidentally reaching into the wrong bucket, if a future key/bucket in the codebase starts with the prefix.

Comment thread sweep/sweeper.go Outdated
Comment thread sweep/sweeper.go Outdated
@Roasbeef
Copy link
Copy Markdown
Member

Roasbeef commented Dec 17, 2018

Made a commit with some additional logs that should likely be cherry picked in:

c2e4fb2

e8e9b52

Here's the branch itself as I'm modifying commits as I test: https://github.com/Roasbeef/lnd/pull/new/sweeper-debug

@Roasbeef
Copy link
Copy Markdown
Member

With the patch above, I ran on a very old node with an ancient nursery and saw no transactions migrated:

./lnd.log.18:2018-12-16 20:06:20.012 [INF] SWPR: Migrating UTXO nursery finalized TXIDs

@joostjager
Copy link
Copy Markdown
Contributor Author

  • Commits cherry-picked except for wraps at 79 (instead of 80) chars.
  • Sweeper store bucket scan replaced by exact bucket retrieval.
  • Commit added to soften error handling again. Both error handling commits to be squashed when we are really certain that we want it like this.

@joostjager
Copy link
Copy Markdown
Contributor Author

Tested nursery tx migration again. Seems to work on my machine.

Roasbeef
Roasbeef previously approved these changes Dec 18, 2018
Copy link
Copy Markdown
Member

@Roasbeef Roasbeef left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉 LGTM 🎉

Ready to merge on my end once the final two commits squashed.

@joostjager
Copy link
Copy Markdown
Contributor Author

  • Added one more commit: lnwallet: prevent static fee estimator fees from being modified
  • Squashed error handling commits
  • Fixed unit test flake

joostjager and others added 13 commits December 18, 2018 10:50
Modifying the static fees is not thread safe. In this commit the fees
are made immutable.
We need to distinguish an lnd build for the purpose of integration
testing from a regular dev build. This makes it possible to adapt
parameters to let integration tests run faster (for example:
sweeper batch window).
This commit is a preparation for the implementation of remote spend
detection. Remote spends may happen before we broadcast our own sweep
tx. This calls for accurate height hints.
This commit adds a function that takes a set of inputs and splits them
in sensible sets to be used for generating transactions.
This commit adds a store for the sweeper. The sweeper needs minimal
persistent data to be able to recognize its own sweeps.
In this commit, the sweep package is extended from just tx generation to
an active sweeper that collects sweep inputs and autonomously proceeds
to publish the sweep tx after the batch window time interval has passed
without new inputs being added.
Previously, nursery generated and published its own sweep txes. It
stored the sweep tx in nursery_store to prevent a new tx with a new
sweep address from being generated on restart.

In this commit, sweep generation and publication is removed from nursery
and delegated to the sweeper. Also the confirmation notification is
received from the sweeper.
Copy link
Copy Markdown
Contributor

@cfromknecht cfromknecht left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome work @joostjager, excited to finally have this in and help people sweep their stuck outputs! Very happy with the final iteration and the huge reduction in complexity over the nursery, LGTM 🎉

Copy link
Copy Markdown
Member

@Roasbeef Roasbeef left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🔮

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

channel management The management of the nodes channels contracts enhancement Improvements to existing features / behaviour needs testing PR hasn't yet been actively tested on testnet/mainnet P2 should be fixed if one has time utxo sweeping

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants