contractcourt: Persist Resolution Outcomes #4157

Merged

Conversation

Collaborator

@carlaKC carlaKC commented Apr 6, 2020

This PR adds a new bucket to store the outcome of on-chain resolutions. The original thought was simply to keep the entire briefcase around; the beginning of that approach is coded up here.

Reasons for a new bucket:

  • Making a large change to the state machine of a critical component of lnd for the sake of accounting feels like a bad trade-off
  • We do not gain any historical data by doing so, since all old briefcases are deleted
  • The current state that is persisted on disk is insufficient; we do not know the outcome of certain stages (only that they were resolved) and we do not have sweep transactions

This change does not currently persist fees for resolutions. This will be done in a follow up, and the reports are serialized on disk in a manner which allows us to easily add fees at a later stage without a migration. Fees are a non-trivial element to add because we batch sweeps, so we would need to modify our spend notifications to attribute a portion of the fees to a specific output. We also have cases (like the successTx) where fees are not supplied by the wallet, so are not detected as fees by btcwallet.
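
For reference, here is a minimal sketch of the report type this PR persists, reconstructed from the fields quoted in the review threads below; the exact field set and serialization format are assumptions, not the code as merged:

```go
package channeldb

import (
	"github.com/btcsuite/btcd/chaincfg/chainhash"
	"github.com/btcsuite/btcd/wire"
	"github.com/btcsuite/btcutil"
)

// ResolverType identifies the class of contract that was resolved on
// chain (e.g. anchor, incoming htlc, outgoing htlc).
type ResolverType uint8

// ResolverOutcome describes how the contract was resolved.
type ResolverOutcome uint8

// ResolverReport is the per-output record written to the new bucket.
// Each report is stored under its own key, which is what allows fields
// such as fees to be added later without a migration.
type ResolverReport struct {
	// OutPoint is the output that was resolved on chain.
	OutPoint wire.OutPoint

	// Amount is the value of the output being resolved.
	Amount btcutil.Amount

	// ResolverType and ResolverOutcome record which resolver handled
	// the output and how its resolution concluded.
	ResolverType    ResolverType
	ResolverOutcome ResolverOutcome

	// SpendTxID, if known, is the transaction that claimed the
	// outpoint. This may be a sweep transaction, or a first stage
	// success/timeout transaction.
	SpendTxID *chainhash.Hash
}
```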

Fixes #2472

@carlaKC carlaKC requested a review from joostjager April 6, 2020 07:30
@carlaKC carlaKC force-pushed the 2472-newbucketforresolutions branch from 5a31554 to 31f4f30 on April 6, 2020 08:00
Collaborator

@joostjager joostjager left a comment


Discussed the design of this PR offline. One thing that I found important in the discussion is the PendingChannels rpc output. It is likely that there are some gaps in that output at the moment: once resolvers are marked as resolved, their data disappears from the report.

This PR would store that data on disk. That could possibly be used as another data source for PendingChannels, but will there be problems with race conditions?

Fixing PendingChannels looks like a good intermediate step to me before exposing the data of fully closed channels. As a check that the design fits that too.

Another thought is that concerns could be more cleanly separated by keeping all the db code outside of the resolvers. Let them just expose a struct that the channel arbitrator takes and persists.

Finally, thinking about this issue without too much consideration for the existing code: if resolvers were db-stateless (I think they should be, and #3688 brings them very close to that) and we also get rid of the 'morphing' of one resolver into the other, I think it could be good to keep all the resolvers in memory for as long as the channel is still pending close. Do all the reporting to PendingChannels directly from the in-memory resolver state. Then when everything is finished, ChannelArbitrator fetches those in-memory reports and persists them all to disk in the same transaction where the channel is marked fully closed.

In general, I think there is a risk with this issue of building new functionality on top of a not-so-solid foundation and not getting the best value for the effort. Especially when adding new db code, the cost to change it again later can be high.

My proposal would be to do things in this order:

  • Remove nursery
  • Make resolvers stateless (probably already done when nursery is removed)
  • Keep resolvers in memory as long as channel is still pending close (fix PendingChannels report)
  • Atomically persist resolver reports when the channel is fully resolved and report that along with the close summary (sketched below)
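
A hypothetical sketch of that last step (the helper names here are invented for illustration, this is not code from the PR): the arbitrator gathers the in-memory reports and persists them in the same database transaction that marks the channel fully closed, so a restart can never observe one write without the other.

```go
// flushReports is a sketch only: gatherReports, putResolverReport and
// markChannelResolved are hypothetical helpers.
func (c *ChannelArbitrator) flushReports() error {
	// Collect the final reports from the in-memory resolvers.
	reports := c.gatherReports()

	return kvdb.Update(c.db, func(tx kvdb.RwTx) error {
		// Persist every resolver report...
		for _, report := range reports {
			if err := putResolverReport(tx, report); err != nil {
				return err
			}
		}

		// ...and mark the channel fully closed in the same
		// transaction, making the two writes atomic.
		return markChannelResolved(tx, c.chanPoint)
	})
}
```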

channeldb/reports.go
channeldb/reports.go Outdated
channeldb/reports.go
contractcourt/anchor_resolver.go Outdated
contractcourt/briefcase.go Outdated
channeldb/reports.go
Collaborator Author

carlaKC commented Apr 8, 2020

In general, I think there is a risk with this issue of building new functionality on top of a not-so-solid foundation and not getting the best value for the effort.

While I think it is valuable to do things in order, I think we're currently throwing away a lot of data that is really useful for people to have. Would we be able to reproduce it entirely in this new stateless world order? And even if we can, would it be worth reprocessing all these special case channels (where we'd have to dig them out of the historical chan bucket etc)? If not, I'd like to explore how we can keep this data around in the most future-proof way.

Especially when adding new db code, the cost to change it again later can be high.

That's what I like about adding a new bucket (rather than keeping around the old resolvers). I'm pretty confident that the information stored in this bucket is what is needed user side to get good insight into what went down on chain for each of their channels. Even if we do get rid of the morphing resolver (contest resolver -> success/timeout), that stage still needs to be recorded for the user; they need to know whether we went on chain with a successTx, or the remote party timed it out, for example. It may be done in a different way, e.g. a resolver hands off a report to the arbitrator, as you suggest, but the outcome we need on disk remains the same.

Another thought is that concerns could be more separated by keeping all the db code outside of the resolvers. Let them just expose a struct that channel arbitrator takes and persists.

Not sure if this can be done atomically in the current setup (which is a priority). But agreed, in the theoretical stateless case, handoff of a report/outcome of some kind to the channel arbitrator makes sense.

That could possibly be used as another data source for PendingChannels, but will there be problems with race conditions?

The potential race here would be that something is unresolved when we get our current set of resolvers from memory, then resolves but does not write to disk in time for when we get the already resolved set from disk. I don't think this is a huge concern, because PendingChannels is a temporal endpoint, and htlcs are only resolved on block arrival, so the chances of hitting it are slim. Having a record of what's been fully resolved is still a step in the right direction, and this unlikely race would be addressed if/when we transition to stateless by keeping everything in memory until fully resolved and then flushing to disk.
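
To make the race concrete, a sketch of how PendingChannels could combine the two sources; all helper names here are invented for illustration:

```go
// pendingReports is a sketch only. A resolver that completes between
// the two reads below is briefly missing from both sets, which is the
// (unlikely) race described above.
func (c *ChannelArbitrator) pendingReports() (
	[]*channeldb.ResolverReport, error) {

	// Snapshot the resolvers that are still active in memory.
	reports := c.activeResolverReports()

	// Add the already-resolved reports persisted on disk. Anything
	// that resolved after the snapshot but has not yet been written
	// is absent from both sources until the next call.
	resolved, err := fetchResolverReports(c.db, c.chanPoint)
	if err != nil {
		return nil, err
	}

	return append(reports, resolved...), nil
}
```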

Potential options here:

  1. Do nothing, wait for channel arbitrator overhaul
    [+] design reporting system based on future needs rather than guessing now
    [-] might not be able to reconstruct historical resolvers; if we can, still have to write custom code to re-resolve already resolved channels to generate this data

  2. Start writing to new bucket now
    [+] we get the data we need now, in a separate bucket which does not affect existing state machine
    [-] locks us into a db schema that we may not want

  3. Keep resolvers around
    [+] also get to keep the data, although we still don't have all the info we need on disk, so we would need to add fields ad hoc
    [-] large change to the state machine of a critical system

If we're going to be overhauling the channel arbitrator in the next few months, I'd say that it's worth waiting. But if that's not going to happen, I would like to start storing this information.

@Roasbeef Roasbeef added this to the 0.11.0 milestone Apr 14, 2020
@Roasbeef Roasbeef added the accounting, contracts, database (Related to the database/storage of LND) and v0.11 labels Apr 14, 2020
@Roasbeef
Member

Great points @carlaKC, I don't think we cut off future possibilities of removing the nursery or removing the checkpointing from resolvers if we go through with this change. Those changes are IMO nice-to-haves: the existing system does work, but is hauling around a bit of technical debt. The changes proposed in this PR would address several needs on our end, and also those of our major users, w.r.t wanting to have air-tight accounting of on-chain sweeping and fees (also I'm all for filling in any low-hanging gaps along the way). I worry that if we go the "long way around", then we'd risk never landing this core feature, since as we all know, at times things have a tendency to sprawl. There's also the matter of the extra review cycles (particularly the nursery changes) as they touch a pretty critical area w.r.t the safe operation of lnd.

In the end, I think #2 strikes a happy medium: we make a small-er change here which allows us to finally have sane records of past channel history, without cutting off any possible refactors or slight re-designs in this area. Ultimately, I think we'll need to be storing this data anyway.

Member

@Roasbeef Roasbeef left a comment


Completed an initial cursory pass; it also occurs to me that this'll be rather useful for debugging weird conditions related to chain claims. As is right now, we only have logs to go off of, which themselves at times are missing critical information.

contractcourt/briefcase.go Outdated
contractcourt/briefcase.go Outdated
channeldb/reports.go
channeldb/reports.go Outdated
channeldb/reports.go
channeldb/reports.go Outdated
channeldb/reports.go Outdated
contractcourt/channel_arbitrator.go Outdated
contractcourt/htlc_incoming_contest_resolver.go Outdated
@cfromknecht cfromknecht added this to In progress in v0.11.0-beta via automation Apr 21, 2020
@cfromknecht cfromknecht moved this from In progress to Review in progress in v0.11.0-beta Apr 21, 2020
@carlaKC carlaKC force-pushed the 2472-newbucketforresolutions branch 5 times, most recently from f5dae08 to 690f9c6 on May 24, 2020 15:42
@carlaKC carlaKC force-pushed the 2472-newbucketforresolutions branch 2 times, most recently from 01166e6 to 9d7c576 on May 26, 2020 10:51
@carlaKC carlaKC marked this pull request as ready for review May 26, 2020 12:56
@carlaKC carlaKC requested a review from cfromknecht as a code owner May 26, 2020 12:56
Member

@Roasbeef Roasbeef left a comment


Still need to take a pass through the tests, but the diff is looking pretty good now IMO. The main comment I have is whether we actually need that new method to scan for transactions in a height range, given that the sign desc will always contain the input value information we need. I might be missing something here though.

channeldb/reports.go
channeldb/reports.go Outdated
server.go Outdated
contractcourt/htlc_incoming_contest_resolver.go Outdated
contractcourt/htlc_success_resolver_test.go
contractcourt/htlc_outgoing_contest_resolver.go Outdated
channeldb/reports.go Outdated
channeldb/reports.go Outdated
// Checkpoint the resolver with a closure that will write the outcome
// of the resolver and its sweep transaction to disk.
return nil, c.Checkpoint(c, func(tx kvdb.RwTx) error {
return c.PutResolverReport(tx, report)
Collaborator


All of the resolvers now end with this PutResolverReport call. I'd try to move that to the caller of Resolve so that it isn't duplicated and always guaranteed to be executed. If you unify resolver report and contract report, the caller can request the report through the existing report() method.

Collaborator Author


We want the PutReport call to be made in the same tx as our final Checkpoint (otherwise we could checkpoint, restart, and then neither resume the completed resolver nor have the report). I'm unsure whether it makes sense to move the final Checkpoint out of each resolver? The same could have been said of Checkpoint itself before we started writing reports, but moving it out leaves Resolve incomplete imo.

Collaborator


I would prefer moving the CheckPoint out of the resolver rather than complicating the resolver with the closure.
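
To illustrate the proposed shape (a sketch with invented signatures, not code from the PR): the resolver only produces its report, and the caller of Resolve does the persisting in one place.

```go
// resolveContract is a sketch only: checkpointResolver is a
// hypothetical helper that persists resolver state and report in a
// single transaction.
func (c *ChannelArbitrator) resolveContract(res ContractResolver) error {
	next, err := res.Resolve()
	if err != nil {
		return err
	}

	// The resolver hands its report to the caller via the existing
	// report() method, so no resolver needs its own
	// PutResolverReport closure.
	return c.checkpointResolver(next, res.report())
}
```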

contractcourt/channel_arbitrator.go
contractcourt/briefcase.go Outdated
channeldb/reports.go
@carlaKC carlaKC force-pushed the 2472-newbucketforresolutions branch 3 times, most recently from 6cdc1f3 to f5a1e79 on June 4, 2020 18:13
@carlaKC carlaKC force-pushed the 2472-newbucketforresolutions branch from 3c6d0ab to 2d777aa on July 3, 2020 16:25
@carlaKC carlaKC requested a review from joostjager July 5, 2020 08:45
Collaborator

@joostjager joostjager left a comment


Main comment is still the ResolverType / ResolverOutcome structure. I know we've seen many iterations already, but as this is part of the database and exposed over rpc, imo we need to be sure it is the optimal way to represent the resolution results.

// claimed the outpoint. This may be a sweep transaction, or a first
// stage success/timeout transaction.
SpendTxID *chainhash.Hash
}
Collaborator


Still interested in an answer to this.

channeldb/reports.go Outdated
channeldb/reports.go Outdated

// ResolverOutcomeTimeout indicates that a contract was timed out on
// chain.
ResolverOutcomeTimeout ResolverOutcome = 4
Collaborator


That it is a timeout is already encoded in the resolver type. Isn't this lost?

// ResolverOutcomeFirstStage indicates that a htlc had to be claimed
// over two stages, with this outcome representing the confirmation
// of our success/timeout tx.
ResolverOutcomeFirstStage ResolverOutcome = 5
Collaborator


So there will be two ResolverTypeIncomingHtlc entries in the resolversBucket. One with outcome ResolverOutcomeFirstStage and another one with ResolverOutcomeClaimed?
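
If so, the two entries for a two-stage incoming htlc claim would look roughly like this (variable values are illustrative):

```go
// First stage: our presigned success tx confirms, spending the htlc
// output on the commitment transaction.
firstStage := &channeldb.ResolverReport{
	OutPoint:        htlcOutPoint,
	Amount:          htlcAmt,
	ResolverType:    channeldb.ResolverTypeIncomingHtlc,
	ResolverOutcome: channeldb.ResolverOutcomeFirstStage,
	SpendTxID:       &successTxid,
}

// Second stage: the sweep of the success tx output once its CSV delay
// has passed.
secondStage := &channeldb.ResolverReport{
	OutPoint:        successTxOutPoint,
	Amount:          htlcAmt,
	ResolverType:    channeldb.ResolverTypeIncomingHtlc,
	ResolverOutcome: channeldb.ResolverOutcomeClaimed,
	SpendTxID:       &sweepTxid,
}
```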

select {
case preimage := <-preimageSubscription.WitnessUpdates:
case preimage := <-witnessUpdates:
Collaborator


I don't see the success resolution outcome being written in this resolver. Shouldn't the happy path also conclude with a write to the new bucket?

Collaborator Author


The success resolver handles the write, since the claim has not confirmed yet at this stage.


// If we have a success tx, we append a report to represent our first
// stage claim.
if h.htlcResolution.SignedSuccessTx != nil {
Collaborator


When exactly is the outcome of the first stage written to the bucket? I would expect that to happen after the presigned tx confirms.


// Once our success tx has confirmed, we add a resolution for
// our success tx first stage transaction.
successTx := h.htlcResolution.SignedTimeoutTx
Collaborator


Confusing to call the timeout tx successTx?

OutPoint: h.htlcResolution.ClaimOutpoint,
Amount: amt,
ResolverType: channeldb.ResolverTypeOutgoingHtlc,
ResolverOutcome: channeldb.ResolverOutcomeTimeout,
Collaborator


Still struggling with the definition of those outcomes. Inside the context of an outgoing htlc, we can only get it or not get it.

c.reportLock.Unlock()

c.resolved = true
return nil, nil
return nil, c.PutResolverReport(nil, report)
Collaborator


I still don't fully understand why PutResolverReport needs to be called directly. I'd think that always calling CheckPoint in resolvers makes it clearer what is going on. It checkpoints the full state (internal + reports) of a resolver and that is the only function that is used.

I know there is access via the arb config to other functions, but maybe that isn't a good thing anyway.

stages. This outcome represents the broadcast of a timeout or success
transaction for this two stage htlc claim.
*/
FIRST_STAGE = 4;
Collaborator

@joostjager joostjager Jul 6, 2020


Rename to CLAIMED_FIRST_STAGE to make it clear that this is a claim result?

Contributor


isn't this a precursor to CLAIMED and TIMEOUT tho?


// Anchor was swept by someone else. This is possible after the
// 16 block csv lock.
case sweep.ErrRemoteSpend:
c.log.Warnf("our anchor spent by someone else")
outcome = channeldb.ResolverOutcomeAbandoned
Collaborator


Unclaimed?

v0.11.0-beta automation moved this from Review in progress to Reviewer approved Jul 6, 2020
Collaborator

@joostjager joostjager left a comment


Discussed some of the outstanding comments offline. Still not totally happy with the PR, but approving anyway so as not to hold up progress any longer. That makes for the required two approvals, but I think it would still be good if @cfromknecht takes a final look at just the .proto changes.

Contributor

@cfromknecht cfromknecht left a comment


a few small nits

lnrpc/rpc.proto Outdated

lnrpc/rpc.proto
lnrpc/rpc.proto Outdated
@carlaKC carlaKC force-pushed the 2472-newbucketforresolutions branch from 2d777aa to b888fa9 on July 7, 2020 10:14
@carlaKC carlaKC requested a review from cfromknecht July 7, 2020 15:14
Contributor

@cfromknecht cfromknecht left a comment


LGTM 🙌

@cfromknecht
Contributor

hold off on merging until #971 is in

carlaKC added 11 commits July 7, 2020 19:49
Add a new top-level bucket which holds closed channels, nested by chain
hash, containing additional information about channel closes. We add
resolver resolutions under their own key so that we can extend the
bucket with additional information if required.
To allow us to write the outcome of our resolver to disk, we add
optional resolver reports to the CheckPoint function. Variadic params
are used because some checkpoints may have no reports (when the resolver
is not yet complete) and some may have two (in the case of a two stage
resolution).
Our current set of reports contains much of the information we will
need to persist contract resolutions. We add a function to create
resolver reports from our existing set of resolutions.
Incoming htlcs that are timed out or failed (invalid htlc or invoice
condition not met) save a single on-chain resolution, because we don't
need to take any actions on them ourselves (we don't need to worry
about 2 stage claims since this is the success path for our peer).
Checkpoint our htlc claims with on chain resolutions, including our
first stage success tx where required.
When a remote peer claims one of our outgoing htlcs on chain, we do
not care whether they claimed with multiple stages. We simply store
the claim outcome then forget the resolver.
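
A rough sketch of the layout the first commit message describes; the bucket key names and serialization helpers here are assumptions:

```go
// putReport writes a report under the outpoint it resolved, nested by
// chain hash and channel outpoint. closedChannelsBucket,
// serializeOutPoint and serializeReport are hypothetical names.
func putReport(tx kvdb.RwTx, chainHash, chanPoint []byte,
	report *channeldb.ResolverReport) error {

	closed, err := tx.CreateTopLevelBucket(closedChannelsBucket)
	if err != nil {
		return err
	}
	chain, err := closed.CreateBucketIfNotExists(chainHash)
	if err != nil {
		return err
	}
	channel, err := chain.CreateBucketIfNotExists(chanPoint)
	if err != nil {
		return err
	}

	// Resolutions live under their own key so the channel bucket can
	// be extended with additional information later.
	resolvers, err := channel.CreateBucketIfNotExists(resolversBucket)
	if err != nil {
		return err
	}

	return resolvers.Put(
		serializeOutPoint(report.OutPoint),
		serializeReport(report),
	)
}
```
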
@cfromknecht
Contributor

@carlaKC needs rebase

@carlaKC carlaKC force-pushed the 2472-newbucketforresolutions branch from b888fa9 to 177c314 on July 7, 2020 17:50
@cfromknecht cfromknecht merged commit 6044649 into lightningnetwork:master Jul 7, 2020
v0.11.0-beta automation moved this from Reviewer approved to Done Jul 7, 2020
Labels
accounting, contracts, database (Related to the database/storage of LND)
Projects
No open projects
v0.11.0-beta: Done
Development

Successfully merging this pull request may close these issues.

[Feature Request] Attach sweep transaction ids to closedchannels list
4 participants