Skip to content

Conversation

samvrlewis
Copy link
Contributor

@samvrlewis samvrlewis commented Nov 9, 2017

Came across this as I was working on #27.

As described in the commit messages, the race condition happens when a notification is being registered to the notifier at the same time as a block containing that notification is generated. This causes both the attemptHistoricalDispatch and the checkConfirmationTrigger functions to add the same confirmation intent to the confirmation heap. It's probably a fairly rare case that's more likely to come up in tests than real use, but I think the ramifications of it occurring are fairly bad.

When the tx eventually does reach the required number of confirmations, this causes the confirmation to be written to the Confirmed channel twice and, because the channel is buffered with a size of 1, this can cause the notifier to block until the channel is read from at least once.

Running the new test (testLazyNtfnConsumer) on my PC without the fix to btcd.go applied, the test hangs forever about half of the time. With the fix it doesn't hang. (As a side note, this is probably my go noobiness speaking but why does it hang forever? The test looks like it runs in its own goroutine to the notifier's dispatch loop and the time.After(20 * time.Second) should time it out, shouldn't it?).

I also noticed that, in this scenario of the transaction being included in a block that the notifier has yet to process, the current height that's passed in to the historical dispatcher will be incorrect. This has the follow on effect of making the target confirmation height wrong. My third commit addresses this.

As always, yell at me if I've messed anything up. 😄

This test adds a test for a consumer that registers for a transaction
confirmation but takes some time to check if that confirmation has
occured.

The test reveals a race condition that can cause btcdnotify to add a
confirmation entry to its internal heap twice. If the notification
consumer is not prompt in reading from the confirmation channel, this
can cause the notifier to block indefinitely.
This race condition can occur if a transaction is included in a block
right when a notification is being added to the notifier for it AND when
the confirmation requires > 1 confirmations. In this case, the
confirmation gets added to the confirmation heap twice.
@samvrlewis samvrlewis force-pushed the btcd_chainntf_double_heap_add branch from a419738 to addee0e Compare November 9, 2017 11:11
Copy link
Contributor

@jimpo jimpo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice find -- looks like a good fix to me.

However, reading over this file, it looks like reorgs are not handled safely. That should probably be fixed separately. For example, a tx can get added to the confHeap then reorged out of the chain and it will still dispatch a confirmation notification at the original confirmation height.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we skip this and just do confHeight := block.Height + msg.numConfirmations - 1? Then the method wouldn't even need currentHeight as a parameter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, I like way better too. Have updated it now.

@cfromknecht
Copy link
Contributor

Great job @samvrlewis, this is a big win for reliability! I've tested the PR locally, your fix and tests are solid 🚀

Roasbeef
Roasbeef previously approved these changes Nov 11, 2017
Copy link
Member

@Roasbeef Roasbeef left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice fix!

LGTM 🌟

One you squash commit e4fd319 and 12ac489 into a single commit. I'll get this merged 👍

In the historical dispatch of btcdnotify, the dispatcher checks if a
transaction has been included in a block. If this check happens before
the notifier has processed the update, it's possible that the
currentHeight of the notifier and the currentHeight of the chain might
be out of sync which causes an off by one error when calculating a
target height for the transaction confirmation. This change uses the
height of the block the transaction was found in, rather than the
currentHeight that's known by the notifier to eliminate this.
@samvrlewis
Copy link
Contributor Author

Squashed!

@Roasbeef Roasbeef merged commit 1f95b66 into lightningnetwork:master Nov 14, 2017
@samvrlewis samvrlewis deleted the btcd_chainntf_double_heap_add branch November 20, 2017 06:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants