chainntnfs: handle historical confs and spends asynchronously #1628

Merged
merged 6 commits into lightningnetwork:master from wpaulino:async-rescans on Aug 1, 2018

Conversation

4 participants
@wpaulino
Collaborator

wpaulino commented Jul 26, 2018

In this PR, we modify our bitcoind and btcd chain notifiers to handle historical confirmations and spends asynchronously. This was motivated by users experiencing slow startups due to blocking during long rescans.

Fixes #1569.
Fixes #1616.
Fixes #1648.

@Roasbeef

Member

Roasbeef commented Jul 26, 2018

Looks to have PR-specific failures on the itests right now.

@wpaulino wpaulino force-pushed the wpaulino:async-rescans branch from 738a27e to 1d27e5f Jul 26, 2018

}
}
return fmt.Errorf("spending transaction not found within block range "+

@halseth

halseth Jul 26, 2018

Collaborator

Doesn't seem like this should be printed as an error? Can potentially make this a concrete error type we can filter out.

@wpaulino

wpaulino Jul 27, 2018

Collaborator

Fixed.
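
As an aside, a minimal sketch of the concrete-error-type idea (the name ErrSpendTxNotFound and the errors.Is/%w idiom are illustrative assumptions, not the PR's actual fix):

package chainntnfs

import (
    "errors"
    "fmt"
)

// ErrSpendTxNotFound is a hypothetical sentinel for a rescan that completes
// without locating the spending transaction.
var ErrSpendTxNotFound = errors.New(
    "spending transaction not found within block range",
)

func scanForSpend(startHeight, endHeight uint32) error {
    // ... rescan logic elided ...
    return fmt.Errorf("rescan of [%d, %d]: %w",
        startHeight, endHeight, ErrSpendTxNotFound)
}

Callers can then filter the benign case instead of matching error strings:

if err := scanForSpend(100, 200); errors.Is(err, ErrSpendTxNotFound) {
    // Expected when the output is simply unspent; log at debug level.
}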

@@ -867,69 +867,14 @@ func testSpendBeforeNtfnRegistration(miner *rpctest.Harness,
// concrete implementations.
//
// To do so, we first create a new output to our test target address.
txid, err := getTestTxId(miner)

@halseth

halseth Jul 26, 2018

Collaborator

👍

msg.heightHint)
// Look up whether the transaction is already
// included in the active chain.
confDetails, err := n.historicalConfDetails(

@halseth

halseth Jul 26, 2018

Collaborator

why not async?

@wpaulino

wpaulino Jul 27, 2018

Collaborator

Fixed.

// the confirmation details must be provided with the UpdateConfDetails method,
// otherwise we will wait for the transaction to confirm even though it already
// has.
func (tcn *TxConfNotifier) Register(ntfn *ConfNtfn) error {

@halseth

halseth Jul 26, 2018

Collaborator

Commit message says "registrations for unconfirmed transactions only", but it seems like this will also be called for already confirmed transactions?

@wpaulino

wpaulino Jul 27, 2018

Collaborator

Clarified things a bit, let me know what you think.

@@ -245,24 +245,37 @@ out:
_, currentHeight, err := b.chainConn.GetBestBlock()

@halseth

halseth Jul 26, 2018

Collaborator

seems like this could be moved inside the goroutine

// indicating that the transaction has already been
// confirmed.
select {
case ntfn.Event.Updates <- 0:

@halseth

halseth Jul 26, 2018

Collaborator

Can we risk the channel having a buffer one element too small in this case? (since it is created with numConfs capacity)

@wpaulino

wpaulino Jul 27, 2018

Collaborator

It's safe to do so, as we don't send an update indicating the required number of confirmations. Now that I think about it though, it should do that instead of sending an update indicating there aren't any confirmations left, as that's already handled by the Confirmed channel. Will address it in a follow-up PR.
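
To make the buffer question concrete, a small sketch (simplified types, not the exact lnd shape) of why a numConfs-capacity channel suffices when at most one update per remaining confirmation is sent, with a non-blocking send as a belt-and-braces guard:

package chainntnfs

// confirmationEvent is a pared-down stand-in for the real event type.
type confirmationEvent struct {
    Updates   chan uint32   // one buffered slot per required confirmation
    Confirmed chan struct{} // final notification lives on its own channel
}

func newConfirmationEvent(numConfs uint32) *confirmationEvent {
    return &confirmationEvent{
        Updates:   make(chan uint32, numConfs),
        Confirmed: make(chan struct{}, 1),
    }
}

// sendUpdate never blocks block processing: if the buffer were ever one
// element too small, the update is dropped rather than stalling the notifier.
func (e *confirmationEvent) sendUpdate(numConfsLeft uint32) {
    select {
    case e.Updates <- numConfsLeft:
    default:
    }
}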

if details.BlockHeight <= tcn.currentHeight {
numConfsLeft := confHeight - tcn.currentHeight
select {
case ntfn.Event.Updates <- numConfsLeft:

@halseth

halseth Jul 26, 2018

Collaborator

Seems like there might be an issue here if multiple callers register for confirmation of the same txid. Each time UpdateConfDetails is called as a result of this, all clients will be sent updates? This will lead to each client receiving the updates multiple times, and the channel might even get full.

From what I can gather, we shouldn't need to iterate the ntfn clients in this method, but pass in the one client that requested the historical dispatch, and only notify that one.

@wpaulino

wpaulino Jul 27, 2018

Collaborator

Yeah, I spent quite some time debugging this. Ended up adding an id so that we can uniquely track each notification. This is something that will be needed anyway once we support canceling confirmation notifications.
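
A sketch of the ID scheme described above (field and map names are approximations of the PR's structures): each registration draws a unique ID from an atomic counter, so multiple clients watching the same txid can be updated, and later cancelled, independently:

package chainntnfs

import (
    "sync"
    "sync/atomic"

    "github.com/btcsuite/btcd/chaincfg/chainhash"
)

// ConfNtfn is abridged to the field relevant here.
type ConfNtfn struct {
    ID uint64
}

type TxConfNotifier struct {
    confClientCounter uint64 // To be used atomically.

    sync.Mutex
    // txid -> confID -> notification, letting UpdateConfDetails target
    // only the client whose historical dispatch just completed.
    confNotifications map[chainhash.Hash]map[uint64]*ConfNtfn
}

func (tcn *TxConfNotifier) register(txid chainhash.Hash, ntfn *ConfNtfn) {
    ntfn.ID = atomic.AddUint64(&tcn.confClientCounter, 1)

    tcn.Lock()
    defer tcn.Unlock()
    if tcn.confNotifications[txid] == nil {
        tcn.confNotifications[txid] = make(map[uint64]*ConfNtfn)
    }
    tcn.confNotifications[txid][ntfn.ID] = ntfn
}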

@wpaulino wpaulino force-pushed the wpaulino:async-rescans branch from 1d27e5f to cfab092 Jul 27, 2018

@@ -800,6 +801,7 @@ func (b *BtcdNotifier) RegisterConfirmationsNtfn(txid *chainhash.Hash,
ntfn := &confirmationNotification{
ConfNtfn: chainntnfs.ConfNtfn{
ID: atomic.AddUint64(&b.confClientCounter, 1),

@halseth

halseth Jul 27, 2018

Collaborator

confID for consistency

@wpaulino

wpaulino Jul 30, 2018

Collaborator

Fixed.

return nil
}
// UpdateConfDetails attempts to mark an existing unconfirmed transaction as

@halseth

halseth Jul 27, 2018

Collaborator

I can't square this godoc with what the method is actually doing.

@wpaulino

wpaulino Jul 30, 2018

Collaborator

Forgot to update it. Fixed.

// included in the active chain. We'll do this
// in a goroutine to prevent blocking
// potentially long rescans.
go func() {

@halseth

halseth Jul 27, 2018

Collaborator

maybe introduce a waitgroup to wait for during shutdowns?

@wpaulino

wpaulino Jul 30, 2018

Collaborator

Not sure if it's really needed here 🤔

@Roasbeef

Roasbeef Jul 30, 2018

Member

Yeah I don't think it's needed here. Otherwise a very distant rescan can cause the shutdown process to be held up.

@cfromknecht

cfromknecht Jul 31, 2018

Collaborator

we could thread a quit chan down to the rescan if we want. thinking it might be a good idea actually, to prevent the rescan from causing an accidental panic after the chain conns are deinitialized

@halseth

halseth Jul 31, 2018

Collaborator

I think it's a nice pattern to follow, as test failures also tend to be harder to trace if there are stray goroutines remaining.

@wpaulino

wpaulino Aug 1, 2018

Collaborator

Fixed.
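
The pattern the thread converges on, sketched with assumed names (the real notifier wires this through its existing quit/WaitGroup fields): the rescan goroutine polls a quit channel so a distant rescan aborts promptly on shutdown, and the WaitGroup guarantees no stray goroutine outlives the chain connections:

package chainntnfs

import "sync"

type notifier struct {
    wg   sync.WaitGroup
    quit chan struct{}
}

func (n *notifier) dispatchHistoricalRescan(start, end uint32) {
    n.wg.Add(1)
    go func() {
        defer n.wg.Done()
        for height := start; height <= end; height++ {
            select {
            case <-n.quit:
                return // Shutdown requested: abandon the rescan.
            default:
            }
            // ... fetch and filter the block at height (elided) ...
        }
    }()
}

func (n *notifier) Stop() {
    close(n.quit)
    // Returns quickly: the goroutine bails out at its next quit check,
    // so shutdown is not held up by a long-running rescan.
    n.wg.Wait()
}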

}
if confDetails != nil {
err := b.txConfNotifier.UpdateConfDetails(
*msg.TxID, msg.ID,

@halseth

halseth Jul 27, 2018

Collaborator

just pass the msg.ConfNtfn itself.

@wpaulino

wpaulino Jul 30, 2018

Collaborator

I think it's better to keep it as is to make sure the notification has been registered with the notifier first.

@halseth

halseth Jul 31, 2018

Collaborator

If the Register call is moved to right before it, that would make those calls more tightly coupled, and I think even the whole ConfID could be removed?

@wpaulino

wpaulino Aug 1, 2018

Collaborator

Sure, but then it's expected for the caller to know this, rather than just providing an interface that ensures the notification has been registered first. Having an ID will also come in handy when adding support for cancelling existing notifications (there's an existing TODO for this).

rescanUpdate := []neutrino.UpdateOption{
neutrino.AddTxIDs(*msg.TxID),
neutrino.Rewind(currentHeight),
}
if err := n.chainView.Update(rescanUpdate...); err != nil {
chainntnfs.Log.Errorf("unable to update rescan: %v", err)
err = n.chainView.Update(rescanUpdate...)

@halseth

halseth Jul 27, 2018

Collaborator

just a reminder to make sure we cannot potentially miss a confirmation if a block comes in while this goroutine is running.

@Roasbeef

Roasbeef Jul 30, 2018

Member

What do you mean by this @halseth ?

@halseth

halseth Jul 31, 2018

Collaborator

We are passing in currentHeight, which might be outdated at this point (because of potential long-running historicalConfDetails). I think this should be okay, but just meant that we should check that it indeed is.

@cfromknecht

cfromknecht Jul 31, 2018

Collaborator

Good point, it's likely we'd redo all of the manual dispatch work done before. Since currentHeight is behind a mutex, it should be safe to read again?

@wpaulino

wpaulino Aug 1, 2018

Collaborator

currentHeight is a copy so it should remain unchanged even if the actual height changes.
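
Annotating the earlier rescanUpdate fragment to spell this out (illustrative only): currentHeight is a local variable that nothing writes to after the goroutine starts, so the closure always sees the snapshot taken at registration time:

go func() {
    // currentHeight was read once, before the historical scan began;
    // no other goroutine mutates this local, so the Rewind target is
    // stable even if the chain tip advances during the rescan.
    rescanUpdate := []neutrino.UpdateOption{
        neutrino.AddTxIDs(*msg.TxID),
        neutrino.Rewind(currentHeight),
    }
    if err := n.chainView.Update(rescanUpdate...); err != nil {
        chainntnfs.Log.Errorf("unable to update rescan: %v", err)
    }
}()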

// included in the active chain. We'll do this
// in a goroutine to prevent blocking
// potentially long rescans.
go func() {

@halseth

halseth Jul 27, 2018

Collaborator

I think there might be a potential race here:

Imagine tx A is confirmed in block 101, and our currentHeight is 104. We want a notification on 5 confirmations.

[100] <- [101: A] <- [102] <- [103] <- [104]

We start a rescan from heightHint = 100 to 104. While we do this, block 105 comes in. In ConnectTip we check ntfn.details == nil and skip it. The rescan finishes, ntfn.details is set, and the notification for block 105 is never sent.

@halseth

halseth Jul 27, 2018

Collaborator

I'm still thinking about how to best solve this without essentially ending back up at our sync rescan behaviour. One potential approach would be to always do a rescan from heightHint to bestHeight without registering the notification client first (that way new blocks can come in without altering our client), and then keep doing this until you know for sure that you have scanned all the way to the current best height (can be enforced with a mutex). Then you register the client (within the same scope of the mutex), making sure no block can come in in the meantime.

@halseth

halseth Jul 27, 2018

Collaborator

I was probably wrong, looks like the tcn.Lock acquired in UpdateConfDetails makes sure that no new block gets processed before we are done with the notifying, so we should be good 👍
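
Continuing the sketch types from earlier (TxConfNotifier embedding sync.Mutex; signatures simplified), the serialization halseth describes looks roughly like this: both paths contend on the same lock, so block 105 in the example cannot be processed between the rescan finishing and the catch-up dispatch:

// TxConfirmation mirrors the detail struct handed back by a rescan.
type TxConfirmation struct {
    BlockHeight uint32
}

func (tcn *TxConfNotifier) UpdateConfDetails(txid chainhash.Hash,
    confID uint64, details *TxConfirmation) error {

    tcn.Lock()
    defer tcn.Unlock()
    // Record details and dispatch updates for heights already processed;
    // no ConnectTip can interleave while the lock is held.
    return nil
}

func (tcn *TxConfNotifier) ConnectTip(blockHeight uint32) error {
    tcn.Lock()
    defer tcn.Unlock()
    // Either this runs before UpdateConfDetails (details still nil, the
    // ntfn is skipped here and caught up there), or after it (details
    // set, dispatched normally). The shared lock closes the window.
    return nil
}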

TxID: txid,
NumConfirmations: numConfs,
Event: chainntnfs.NewConfirmationEvent(numConfs),
},
heightHint: heightHint,
}
if err := n.txConfNotifier.Register(&ntfn.ConfNtfn); err != nil {

@halseth

halseth Jul 27, 2018

Collaborator

Can we just move this call to where we are handling the confirmationsNotification, before we start the historical dispatch?

@wpaulino

wpaulino Jul 30, 2018

Collaborator

We can, but I figured it made more sense having it here like we do with NotifySpent in RegisterSpendNtfn.

@halseth

halseth Jul 31, 2018

Collaborator

For readability I think it would make sense to move it there, as they always need to be called in that order, right after one another, right?

@wpaulino

wpaulino Aug 1, 2018

Collaborator

Register must now be called first. Since it's possible that it fails, we should return the error to the caller rather than just logging it, as otherwise the caller will be expecting to receive the confirmation details for a failed registration. At the moment, it's unlikely for it to return an error other than when shutting down, but it could prevent an obscure bug later down the road.
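
The resulting call order, sketched (shape only; notificationRegistry stands in for the notifier's actual hand-off mechanism): registration happens synchronously and its error surfaces to the caller, and only a successful registration triggers a historical dispatch:

// Register synchronously so a failure (e.g. ErrTxConfNotifierExiting)
// is returned instead of leaving the caller waiting forever.
if err := n.txConfNotifier.Register(&ntfn.ConfNtfn); err != nil {
    return nil, err
}

// Hand off to the main loop, which kicks off the historical rescan.
n.notificationRegistry <- ntfn
return ntfn.Event, nil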

if confDetails != nil {
err := n.txConfNotifier.UpdateConfDetails(
*msg.TxID, msg.ID,

@halseth

halseth Jul 27, 2018

Collaborator

Not sure if we actually need the new ID if we can just pass the ConfNtfn in here instead.

@wpaulino

wpaulino Jul 30, 2018

Collaborator

Same comment as above w.r.t. ensuring the notification has been registered first.

@wpaulino wpaulino force-pushed the wpaulino:async-rescans branch from cfab092 to 648f810 Jul 30, 2018

return nil
}
// The notifier has yet to reach the height at which the transaction was

@Roasbeef

Roasbeef Jul 30, 2018

Member

This is a bit confusing: how would we ever know in advance if the conf intent is already satisfied? Seems like it should read DispatchBlockHeight here instead of just a block height.

@Roasbeef

Roasbeef Jul 30, 2018

Member

If it's the block height that the transaction was initially confirmed at, why can we determine this independently of the notifier itself advancing? Do we have a test to exercise this case?

@wpaulino

wpaulino Jul 30, 2018

Collaborator

The reason behind the check is the lag between the TxConfNotifier and the backend. For example, let's say that the notifier starts at height 100 and the transaction confirms at 102. The backend moves up to 102, but the TxConfNotifier is still processing height 101. In the event that UpdateConfDetails is called before ConnectTip(..., 102, ...), the notifier isn't yet aware of this height, so we defer handling the notification dispatch until then.

Not sure what you mean by DispatchBlockHeight. Are you referring to renaming the method UpdateConfDetails to that?
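
In code form, the lag check being discussed (a fragment of UpdateConfDetails, simplified):

if details.BlockHeight > tcn.currentHeight {
    // The backend found the conf at a height the TxConfNotifier has not
    // processed yet (found at 102 while we're still handling 101). Stash
    // the details; ConnectTip dispatches once it reaches that height.
    ntfn.details = details
    return nil
}
// Otherwise the confirmation height is already behind us: compute the
// remaining confirmations and dispatch immediately.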

@@ -237,7 +241,7 @@ out:
b.spendNotifications[op] = make(map[uint64]*spendNotification)
}
b.spendNotifications[op][msg.spendID] = msg
b.chainConn.NotifySpent([]*wire.OutPoint{&op})

@Roasbeef

Roasbeef Jul 30, 2018

Member

Looks like this was a duplicate call even? We already do this at the notification registration site.

@Roasbeef

Tested this on my testnet nodes with 50+ channels. Before this commit, the startup time was measured in minutes (time from startup to responding to getinfo). After this commit, the startup time is measured in seconds. The one lingering issue, which we've discussed offline, is that the Start() method is blocking, which can cause contention with the main server mutex. Once that final issue is patched, restarts will be much, much speedier.

LGTM 🦑

@cfromknecht

Nice job @wpaulino, looks pretty complete to me!

started int32 // To be used atomically.
stopped int32 // To be used atomically.
confClientCounter uint64 // To be used atomically.

@cfromknecht

cfromknecht Jul 31, 2018

Collaborator

recommend putting 64-bit atomic vars before 32-bit ones, as this pattern always produces valid alignments on 32-bit machines

@wpaulino

wpaulino Aug 1, 2018

Collaborator

Fixed.
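
The layout rule being applied, sketched on the fields above: Go guarantees 8-byte alignment only for the first word of an allocated struct on 32-bit platforms, and sync/atomic's 64-bit operations panic on misaligned addresses, so the uint64 counters lead:

type BtcdNotifier struct {
    // 64-bit atomics first: the first word of an allocated struct is
    // 8-byte aligned even on 386/arm, keeping atomic.AddUint64 safe.
    confClientCounter uint64 // To be used atomically.

    started int32 // To be used atomically.
    stopped int32 // To be used atomically.

    // ... remaining fields elided ...
}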

started int32 // To be used atomically.
stopped int32 // To be used atomically.
confClientCounter uint64 // To be used aotmically.

@cfromknecht

cfromknecht Jul 31, 2018

Collaborator

same comment here about moving 64-bit vals before 32's, s/aotmically/atomically

@wpaulino

wpaulino Aug 1, 2018

Collaborator

Fixed.

// The notifier has yet to reach the height at which the transaction was
// included in a block, so we should defer until handling it then within
// ConnectTip.
if details.BlockHeight > tcn.currentHeight {

@cfromknecht

cfromknecht Jul 31, 2018

Collaborator

this line will panic if details is nil, which is possible as written. Might want to add a test for this

@wpaulino

wpaulino Aug 1, 2018

Collaborator

Fixed.
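
The guard in question, sketched: a historical rescan that finds nothing legitimately reports details == nil, so UpdateConfDetails must treat that as "still unconfirmed" before the height comparison runs:

if details == nil {
    // Not yet mined: leave the notification pending and let ConnectTip
    // fill in the details when the transaction confirms.
    return nil
}
if details.BlockHeight > tcn.currentHeight {
    ntfn.details = details
    return nil
}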

var (
// ErrTxConfNotifierExiting is an error returned when attempting to
// interact with the TxConfNotifier but it been shut down.
ErrTxConfNotifierExiting = errors.New("TxConfNotifier is exiting")

@cfromknecht

cfromknecht Jul 31, 2018

Collaborator

👍

return nil
}
// UpdateConfDetails attempts to update the confirmation details for the an

@halseth

halseth Jul 31, 2018

Collaborator

nit: the

@wpaulino

wpaulino Aug 1, 2018

Collaborator

Fixed.

@halseth

This comment has been minimized.

Collaborator

halseth commented Jul 31, 2018

Mostly LGTM now (after addressing Conner's comments). I still think the Register call should be moved though, as that entails a smaller structural change from the existing code and aids readability IMO. It also makes it possible to simplify a bit by removing the ConfIDs.

wpaulino added some commits Jul 27, 2018

chainntnfs/txconfnotifier: make confirmation notification registration async

In this commit, we modify our TxConfNotifier struct to allow handling
notification registrations asynchronously. The Register method has been
refactored into two: Register and UpdateConfDetails. In the case that a
transaction we registered for notifications on has already confirmed,
we'll need to determine its confirmation details on our own. Once done,
this can be provided within UpdateConfDetails.

This change paves the way for our different chain notifiers to handle potentially long rescans asynchronously, preventing them from blocking the caller.
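
Putting the two halves together, the registration flow now looks roughly like this (names approximated from the diff hunks above, not the verbatim code):

// Phase 1: synchronous registration with the TxConfNotifier.
if err := b.txConfNotifier.Register(&ntfn.ConfNtfn); err != nil {
    return nil, err
}

// Phase 2: the potentially long historical lookup runs off the hot path
// and reports its result back through UpdateConfDetails.
go func() {
    confDetails, err := b.historicalConfDetails(
        msg.TxID, msg.heightHint, uint32(currentHeight),
    )
    if err != nil {
        chainntnfs.Log.Error(err)
        return
    }
    // confDetails stays nil for a genuinely unconfirmed transaction; the
    // notifier then waits for it to appear in a future block.
    err = b.txConfNotifier.UpdateConfDetails(*msg.TxID, msg.ID, confDetails)
    if err != nil {
        chainntnfs.Log.Error(err)
    }
}()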

@wpaulino wpaulino force-pushed the wpaulino:async-rescans branch from 648f810 to 8f5f3bc Aug 1, 2018

@Roasbeef

LGTM 💣

@Roasbeef Roasbeef merged commit ad7f87e into lightningnetwork:master Aug 1, 2018

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
coverage/coveralls Coverage increased (+0.1%) to 54.648%

@wpaulino wpaulino deleted the wpaulino:async-rescans branch Aug 2, 2018
