contractcourt: Persist Resolution Outcomes #4157

Merged

Conversation

Collaborator

@carlaKC carlaKC commented Apr 6, 2020

This PR adds a new bucket to store the outcome of on-chain resolutions. The original thought was simply to keep the entire briefcase around; the beginning of that approach is coded up here.

Reasons for a new bucket:

  • Making a large change to the state machine of a critical component of lnd for the sake of accounting feels like a bad trade-off
  • We do not gain any historical data by doing so, since all old briefcases are deleted
  • The current state that is persisted on disk is insufficient; we do not know the outcome of certain stages (only that they were resolved) and we do not have sweep transactions

This change does not currently persist fees for resolutions. This will be done in a follow up, and the reports are serialized on disk in a manner which allows us to easily add fees at a later stage without a migration. Fees are a non-trivial element to add because we batch sweeps, so we would need to modify our spend notifications to attribute a portion of the fees to a specific output. We also have cases (like the successTx) where fees are not supplied by the wallet, so are not detected as fees by btcwallet.
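
For reference, here is a minimal sketch of the report type this PR persists, reconstructed from the fields quoted in the review threads below; the exact field set and serialization format are assumptions, not the code as merged:

```go
package channeldb

import (
	"github.com/btcsuite/btcd/chaincfg/chainhash"
	"github.com/btcsuite/btcd/wire"
	"github.com/btcsuite/btcutil"
)

// ResolverType identifies the class of contract that was resolved on
// chain (e.g. anchor, incoming htlc, outgoing htlc).
type ResolverType uint8

// ResolverOutcome describes how the contract was resolved.
type ResolverOutcome uint8

// ResolverReport is the per-output record written to the new bucket.
// Each report is stored under its own key, which is what allows fields
// such as fees to be added later without a migration.
type ResolverReport struct {
	// OutPoint is the output that was resolved on chain.
	OutPoint wire.OutPoint

	// Amount is the value of the output being resolved.
	Amount btcutil.Amount

	// ResolverType and ResolverOutcome record which resolver handled
	// the output and how its resolution concluded.
	ResolverType    ResolverType
	ResolverOutcome ResolverOutcome

	// SpendTxID, if known, is the transaction that claimed the
	// outpoint. This may be a sweep transaction, or a first stage
	// success/timeout transaction.
	SpendTxID *chainhash.Hash
}
```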

Fixes #2472

@carlaKC carlaKC requested a review from joostjager April 6, 2020 07:30
@carlaKC carlaKC force-pushed the 2472-newbucketforresolutions branch from 5a31554 to 31f4f30 on April 6, 2020 08:00
Collaborator

@joostjager joostjager left a comment


Discussed the design of this PR offline. One thing that I found important in the discussion is the PendingChannels rpc output. It is likely that there are some gaps in that output at the moment: once resolvers are marked as resolved, their data disappears from the report.

This PR would store that data on disk. That could possibly be used as another data source for PendingChannels, but will there be problems with race conditions?

Fixing PendingChannels looks like a good intermediate step to me before exposing the data of fully closed channels. As a check that the design fits that too.

Another thought is that concerns could be more cleanly separated by keeping all the db code outside of the resolvers. Let them just expose a struct that the channel arbitrator takes and persists.

Finally, thinking about this issue without too much consideration for the existing code: if resolvers were db-stateless (I think they should be, and #3688 brings them very close to that) and we also get rid of the 'morphing' of one resolver into the other, I think it could be good to keep all the resolvers in memory for as long as the channel is still pending close. Do all the reporting to PendingChannels directly from the in-memory resolver state. Then when everything is finished, ChannelArbitrator fetches those in-memory reports and persists them all to disk in the same transaction where the channel is marked fully closed.

In general, I think there is a risk with this issue of building new functionality on top of a not-so-solid foundation and not getting the best value for the effort. Especially when adding new db code, the cost to change it again later can be high.

My proposal would be to do things in this order:

  • Remove nursery
  • Make resolvers stateless (probably already done when nursery is removed)
  • Keep resolvers in memory as long as channel is still pending close (fix PendingChannels report)
  • Atomically persist resolver reports when the channel is fully resolved and report that along with the close summary (sketched below)
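
A hypothetical sketch of that last step (the helper names here are invented for illustration, this is not code from the PR): the arbitrator gathers the in-memory reports and persists them in the same database transaction that marks the channel fully closed, so a restart can never observe one write without the other.

```go
// flushReports is a sketch only: gatherReports, putResolverReport and
// markChannelResolved are hypothetical helpers.
func (c *ChannelArbitrator) flushReports() error {
	// Collect the final reports from the in-memory resolvers.
	reports := c.gatherReports()

	return kvdb.Update(c.db, func(tx kvdb.RwTx) error {
		// Persist every resolver report...
		for _, report := range reports {
			if err := putResolverReport(tx, report); err != nil {
				return err
			}
		}

		// ...and mark the channel fully closed in the same
		// transaction, making the two writes atomic.
		return markChannelResolved(tx, c.chanPoint)
	})
}
```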

channeldb/reports.go
channeldb/reports.go Outdated
channeldb/reports.go
contractcourt/anchor_resolver.go Outdated
contractcourt/briefcase.go Outdated
channeldb/reports.go
Collaborator Author

carlaKC commented Apr 8, 2020

In general, I think there is a risk with this issue of building new functionality on top of a not-so-solid foundation and not getting the best value for the effort.

While I think it is valuable to do things in order, I think we're currently throwing away a lot of data that is really useful for people to have. Would we be able to reproduce it entirely in this new stateless world order? And even if we can, would it be worth reprocessing all these special case channels (where we'd have to dig them out of the historical chan bucket etc)? If not, I'd like to explore how we can keep this data around in the most future-proof way.

Especially when adding new db code, the cost to change it again later can be high.

That's what I like about adding a new bucket (rather than keeping around the old resolvers). I'm pretty confident that the information stored in this bucket is what is needed user side to get good insight into what went down on chain for each of their channels. Even if we do get rid of the morphing resolver (contest resolver -> success/timeout), that stage still needs to be recorded for the user; they need to know whether we went on chain with a successTx, or the remote party timed it out, for example. It may be done in a different way, e.g. a resolver hands off a report to the arbitrator, as you suggest, but the outcome we need on disk remains the same.

Another thought is that concerns could be more separated by keeping all the db code outside of the resolvers. Let them just expose a struct that channel arbitrator takes and persists.

Not sure if this can be done atomically in the current setup (which is a priority). But agreed, in the theoretical stateless case, handoff of a report/outcome of some kind to the channel arbitrator makes sense.

That could possibly be used as another data source for PendingChannels, but will there be problems with race conditions?

The potential race here would be that something is unresolved when we get our current set of resolvers from memory, then resolves but does not write to disk in time for when we get the already resolved set from disk. I don't think this is a huge concern, because PendingChannels is a temporal endpoint, and htlcs are only resolved on block arrival, so the chances of hitting it are slim. Having a record of what's been fully resolved is still a step in the right direction, and this unlikely race would be addressed if/when we transition to stateless by keeping everything in memory until fully resolved and then flushing to disk.
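
To make the race concrete, a sketch of how PendingChannels could combine the two sources; all helper names here are invented for illustration:

```go
// pendingReports is a sketch only. A resolver that completes between
// the two reads below is briefly missing from both sets, which is the
// (unlikely) race described above.
func (c *ChannelArbitrator) pendingReports() (
	[]*channeldb.ResolverReport, error) {

	// Snapshot the resolvers that are still active in memory.
	reports := c.activeResolverReports()

	// Add the already-resolved reports persisted on disk. Anything
	// that resolved after the snapshot but has not yet been written
	// is absent from both sources until the next call.
	resolved, err := fetchResolverReports(c.db, c.chanPoint)
	if err != nil {
		return nil, err
	}

	return append(reports, resolved...), nil
}
```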

Potential options here:

  1. Do nothing, wait for channel arbitrator overhaul
    [+] design reporting system based on future needs rather than guessing now
    [-] might not be able to reconstruct historical resolvers; if we can, still have to write custom code to re-resolve already resolved channels to generate this data

  2. Start writing to new bucket now
    [+] we get the data we need now, in a separate bucket which does not affect existing state machine
    [-] locks us into a db schema that we may not want

  3. Keep resolvers around
    [+] also get to keep the data, although we still don't have all the info we need on disk, so we would need to add fields ad hoc
    [-] large change to the state machine of a critical system

If we're going to be overhauling the channel arbitrator in the next few months, I'd say that it's worth waiting. But if that's not going to happen, I would like to start storing this information.

@Roasbeef Roasbeef added this to the 0.11.0 milestone Apr 14, 2020
@Roasbeef Roasbeef added the accounting, contracts, database (Related to the database/storage of LND) and v0.11 labels Apr 14, 2020
@Roasbeef
Member

Great points @carlaKC, I don't think we cut off future possibilities of removing the nursery or removing the checkpointing from resolvers if we go through with this change. Those changes are IMO nice-to-haves: the existing system does work, but is hauling around a bit of technical debt. The changes proposed in this PR would address several needs on our end, and also those of our major users, w.r.t wanting to have air-tight accounting of on-chain sweeping and fees (also I'm all for filling in any low-hanging gaps along the way). I worry that if we go the "long way around", then we'd risk never landing this core feature, since as we all know, at times things have a tendency to sprawl. There's also the matter of the extra review cycles (particularly the nursery changes) as they touch a pretty critical area w.r.t the safe operation of lnd.

In the end, I think #2 strikes a happy medium: we make a small-er change here which allows us to finally have sane records of past channel history, without cutting off any possible refactors or slight re-designs in this area. Ultimately, I think we'll need to be storing this data anyway.

Member

@Roasbeef Roasbeef left a comment


Completed an initial cursory pass; it also occurs to me that this'll be rather useful for debugging weird conditions related to chain claims. As is right now, we only have logs to go off of, which themselves at times are missing critical information.

contractcourt/briefcase.go Outdated
contractcourt/briefcase.go Outdated
channeldb/reports.go
channeldb/reports.go Outdated
channeldb/reports.go
channeldb/reports.go Outdated
channeldb/reports.go Outdated
contractcourt/channel_arbitrator.go Outdated
contractcourt/htlc_incoming_contest_resolver.go Outdated
@cfromknecht cfromknecht added this to In progress in v0.11.0-beta via automation Apr 21, 2020
@cfromknecht cfromknecht moved this from In progress to Review in progress in v0.11.0-beta Apr 21, 2020
@carlaKC carlaKC force-pushed the 2472-newbucketforresolutions branch 5 times, most recently from f5dae08 to 690f9c6 on May 24, 2020 15:42
@carlaKC carlaKC force-pushed the 2472-newbucketforresolutions branch 2 times, most recently from 01166e6 to 9d7c576 on May 26, 2020 10:51
@carlaKC carlaKC marked this pull request as ready for review May 26, 2020 12:56
@carlaKC carlaKC requested a review from cfromknecht as a code owner May 26, 2020 12:56
Member

@Roasbeef Roasbeef left a comment


Still need to take a pass through the tests, but the diff is looking pretty good now IMO. The main comment I have is whether we actually need that new method to scan for transactions in a height range, given that the sign desc will always contain the input value information we need. I might be missing something here though.

channeldb/reports.go
channeldb/reports.go Outdated
server.go Outdated
contractcourt/htlc_incoming_contest_resolver.go Outdated
contractcourt/htlc_success_resolver_test.go
contractcourt/htlc_outgoing_contest_resolver.go Outdated
channeldb/reports.go Outdated
channeldb/reports.go Outdated
// Checkpoint the resolver with a closure that will write the outcome
// of the resolver and its sweep transaction to disk.
return nil, c.Checkpoint(c, func(tx kvdb.RwTx) error {
return c.PutResolverReport(tx, report)
Collaborator


All of the resolvers now end with this PutResolverReport call. I'd try to move that to the caller of Resolve so that it isn't duplicated and always guaranteed to be executed. If you unify resolver report and contract report, the caller can request the report through the existing report() method.

Collaborator Author


We want the PutReport call to be made in the same tx as our final Checkpoint (otherwise we could checkpoint, restart, and then neither resume the completed resolver nor have the report). I'm unsure whether it makes sense to move the final Checkpoint out of each resolver? The same could have been said of Checkpoint itself before we started writing reports, but moving it out leaves Resolve incomplete imo.

Collaborator


I would prefer moving the CheckPoint out of the resolver rather than complicating the resolver with the closure.
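
To illustrate the proposed shape (a sketch with invented signatures, not code from the PR): the resolver only produces its report, and the caller of Resolve does the persisting in one place.

```go
// resolveContract is a sketch only: checkpointResolver is a
// hypothetical helper that persists resolver state and report in a
// single transaction.
func (c *ChannelArbitrator) resolveContract(res ContractResolver) error {
	next, err := res.Resolve()
	if err != nil {
		return err
	}

	// The resolver hands its report to the caller via the existing
	// report() method, so no resolver needs its own
	// PutResolverReport closure.
	return c.checkpointResolver(next, res.report())
}
```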

contractcourt/channel_arbitrator.go
contractcourt/briefcase.go Outdated
channeldb/reports.go
@carlaKC carlaKC force-pushed the 2472-newbucketforresolutions branch 3 times, most recently from 6cdc1f3 to f5a1e79 on June 4, 2020 18:13
@carlaKC carlaKC force-pushed the 2472-newbucketforresolutions branch from 3c6d0ab to 2d777aa on July 3, 2020 16:25
@carlaKC carlaKC requested a review from joostjager July 5, 2020 08:45
Collaborator

@joostjager joostjager left a comment


Main comment is still the ResolverType / ResolverOutcome structure. I know we've seen many iterations already, but as this is part of the database and exposed over rpc, imo we need to be sure it is the optimal way to represent the resolution results.

// claimed the outpoint. This may be a sweep transaction, or a first
// stage success/timeout transaction.
SpendTxID *chainhash.Hash
}
Collaborator


Still interested in an answer to this.

channeldb/reports.go Outdated
channeldb/reports.go Outdated

// ResolverOutcomeTimeout indicates that a contract was timed out on
// chain.
ResolverOutcomeTimeout ResolverOutcome = 4
Collaborator


That it is a timeout is already encoded in the resolver type. Isn't this lost?

// ResolverOutcomeFirstStage indicates that a htlc had to be claimed
// over two stages, with this outcome representing the confirmation
// of our success/timeout tx.
ResolverOutcomeFirstStage ResolverOutcome = 5
Collaborator


So there will be two ResolverTypeIncomingHtlc entries in the resolversBucket. One with outcome ResolverOutcomeFirstStage and another one with ResolverOutcomeClaimed?
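
If so, the two entries for a two-stage incoming htlc claim would look roughly like this (variable values are illustrative):

```go
// First stage: our presigned success tx confirms, spending the htlc
// output on the commitment transaction.
firstStage := &channeldb.ResolverReport{
	OutPoint:        htlcOutPoint,
	Amount:          htlcAmt,
	ResolverType:    channeldb.ResolverTypeIncomingHtlc,
	ResolverOutcome: channeldb.ResolverOutcomeFirstStage,
	SpendTxID:       &successTxid,
}

// Second stage: the sweep of the success tx output once its CSV delay
// has passed.
secondStage := &channeldb.ResolverReport{
	OutPoint:        successTxOutPoint,
	Amount:          htlcAmt,
	ResolverType:    channeldb.ResolverTypeIncomingHtlc,
	ResolverOutcome: channeldb.ResolverOutcomeClaimed,
	SpendTxID:       &sweepTxid,
}
```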

select {
case preimage := <-preimageSubscription.WitnessUpdates:
case preimage := <-witnessUpdates:
Collaborator


I don't see the success resolution outcome being written in this resolver. Shouldn't the happy path also conclude with a write to the new bucket?

Collaborator Author


The success resolver handles the write, since the claim has not confirmed yet at this stage.


// If we have a success tx, we append a report to represent our first
// stage claim.
if h.htlcResolution.SignedSuccessTx != nil {
Collaborator


When exactly is the outcome of the first stage written to the bucket? I would expect that to happen after the presigned tx confirms.


// Once our success tx has confirmed, we add a resolution for
// our success tx first stage transaction.
successTx := h.htlcResolution.SignedTimeoutTx
Collaborator


Confusing to call the timeout tx successTx?

OutPoint: h.htlcResolution.ClaimOutpoint,
Amount: amt,
ResolverType: channeldb.ResolverTypeOutgoingHtlc,
ResolverOutcome: channeldb.ResolverOutcomeTimeout,
Collaborator


Still struggling with the definition of those outcomes. Inside the context of an outgoing htlc, we can only get it or not get it.

c.reportLock.Unlock()

c.resolved = true
return nil, nil
return nil, c.PutResolverReport(nil, report)
Collaborator


I still don't fully understand why PutResolverReport needs to be called directly. I'd think that always calling CheckPoint in resolvers makes it clearer what is going on. It checkpoints the full state (internal + reports) of a resolver and that is the only function that is used.

I know there is access via the arb config to other functions, but maybe that isn't a good thing anyway.

stages. This outcome represents the broadcast of a timeout or success
transaction for this two stage htlc claim.
*/
FIRST_STAGE = 4;
Collaborator

@joostjager joostjager Jul 6, 2020


Rename to CLAIMED_FIRST_STAGE to make it clear that this is a claim result?

Contributor


isn't this a precursor to CLAIMED and TIMEOUT tho?


// Anchor was swept by someone else. This is possible after the
// 16 block csv lock.
case sweep.ErrRemoteSpend:
c.log.Warnf("our anchor spent by someone else")
outcome = channeldb.ResolverOutcomeAbandoned
Collaborator


Unclaimed?

v0.11.0-beta automation moved this from Review in progress to Reviewer approved Jul 6, 2020
Collaborator

@joostjager joostjager left a comment


Discussed some of the outstanding comments offline. Still not totally happy with the PR, but approving anyway so as not to hold up progress any longer. That makes for the required two approvals, but I think it would still be good if @cfromknecht takes a final look at just the .proto changes.

Contributor

@cfromknecht cfromknecht left a comment


a few small nits

lnrpc/rpc.proto Outdated

lnrpc/rpc.proto
lnrpc/rpc.proto Outdated
@carlaKC carlaKC force-pushed the 2472-newbucketforresolutions branch from 2d777aa to b888fa9 on July 7, 2020 10:14
@carlaKC carlaKC requested a review from cfromknecht July 7, 2020 15:14
Contributor

@cfromknecht cfromknecht left a comment


LGTM 🙌

@cfromknecht
Contributor

hold off on merging until #971 is in

carlaKC added 11 commits July 7, 2020 19:49
Add a new top-level bucket which holds closed channels, nested by chain
hash, containing additional information about channel closes. We add
resolver resolutions under their own key so that we can extend the
bucket with additional information if required.
To allow us to write the outcome of our resolver to disk, we add
optional resolver reports to the CheckPoint function. Variadic params
are used because some checkpoints may have no reports (when the resolver
is not yet complete) and some may have two (in the case of a two stage
resolution).
Our current set of reports contains much of the information we will
need to persist contract resolutions. We add a function to create
resolver reports from our existing set of resolutions.
Incoming htlcs that are timed out or failed (invalid htlc or invoice
condition not met) save a single on-chain resolution, because we don't
need to take any actions on them ourselves (we don't need to worry
about 2 stage claims since this is the success path for our peer).
Checkpoint our htlc claims with on chain resolutions, including our
first stage success tx where required.
When a remote peer claims one of our outgoing htlcs on chain, we do
not care whether they claimed with multiple stages. We simply store
the claim outcome then forget the resolver.
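
A rough sketch of the layout the first commit message describes; the bucket key names and serialization helpers here are assumptions:

```go
// putReport writes a report under the outpoint it resolved, nested by
// chain hash and channel outpoint. closedChannelsBucket,
// serializeOutPoint and serializeReport are hypothetical names.
func putReport(tx kvdb.RwTx, chainHash, chanPoint []byte,
	report *channeldb.ResolverReport) error {

	closed, err := tx.CreateTopLevelBucket(closedChannelsBucket)
	if err != nil {
		return err
	}
	chain, err := closed.CreateBucketIfNotExists(chainHash)
	if err != nil {
		return err
	}
	channel, err := chain.CreateBucketIfNotExists(chanPoint)
	if err != nil {
		return err
	}

	// Resolutions live under their own key so the channel bucket can
	// be extended with additional information later.
	resolvers, err := channel.CreateBucketIfNotExists(resolversBucket)
	if err != nil {
		return err
	}

	return resolvers.Put(
		serializeOutPoint(report.OutPoint),
		serializeReport(report),
	)
}
```
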
@cfromknecht
Contributor

@carlaKC needs rebase

@carlaKC carlaKC force-pushed the 2472-newbucketforresolutions branch from b888fa9 to 177c314 on July 7, 2020 17:50
@cfromknecht cfromknecht merged commit 6044649 into lightningnetwork:master Jul 7, 2020
v0.11.0-beta automation moved this from Reviewer approved to Done Jul 7, 2020
Labels
accounting, contracts, database (Related to the database/storage of LND)
Projects
No open projects
v0.11.0-beta: Done
Development

Successfully merging this pull request may close these issues.

[Feature Request] Attach sweep transaction ids to closedchannels list
4 participants