-
Notifications
You must be signed in to change notification settings - Fork 2.2k
chainntnfs: Fix btcdnotify race condition causing double heap add #402
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chainntnfs: Fix btcdnotify race condition causing double heap add #402
Conversation
This test adds a test for a consumer that registers for a transaction confirmation but takes some time to check if that confirmation has occured. The test reveals a race condition that can cause btcdnotify to add a confirmation entry to its internal heap twice. If the notification consumer is not prompt in reading from the confirmation channel, this can cause the notifier to block indefinitely.
This race condition can occur if a transaction is included in a block right when a notification is being added to the notifier for it AND when the confirmation requires > 1 confirmations. In this case, the confirmation gets added to the confirmation heap twice.
a419738
to
addee0e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice find -- looks like a good fix to me.
However, reading over this file, it looks like reorgs are not handled safely. That should probably be fixed separately. For example, a tx can get added to the confHeap then reorged out of the chain and it will still dispatch a confirmation notification at the original confirmation height.
chainntnfs/btcdnotify/btcd.go
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we skip this and just do confHeight := block.Height + msg.numConfirmations - 1
? Then the method wouldn't even need currentHeight
as a parameter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, I like way better too. Have updated it now.
Great job @samvrlewis, this is a big win for reliability! I've tested the PR locally, your fix and tests are solid 🚀 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the historical dispatch of btcdnotify, the dispatcher checks if a transaction has been included in a block. If this check happens before the notifier has processed the update, it's possible that the currentHeight of the notifier and the currentHeight of the chain might be out of sync which causes an off by one error when calculating a target height for the transaction confirmation. This change uses the height of the block the transaction was found in, rather than the currentHeight that's known by the notifier to eliminate this.
12ac489
to
232eeaf
Compare
Squashed! |
Came across this as I was working on #27.
As described in the commit messages, the race condition happens when a notification is being registered to the notifier at the same time as a block containing that notification is generated. This causes both the
attemptHistoricalDispatch
and thecheckConfirmationTrigger
functions to add the same confirmation intent to the confirmation heap. It's probably a fairly rare case that's more likely to come up in tests than real use, but I think the ramifications of it occurring are fairly bad.When the tx eventually does reach the required number of confirmations, this causes the confirmation to be written to the
Confirmed
channel twice and, because the channel is buffered with a size of 1, this can cause the notifier to block until the channel is read from at least once.Running the new test (
testLazyNtfnConsumer
) on my PC without the fix tobtcd.go
applied, the test hangs forever about half of the time. With the fix it doesn't hang. (As a side note, this is probably my go noobiness speaking but why does it hang forever? The test looks like it runs in its own goroutine to the notifier's dispatch loop and thetime.After(20 * time.Second)
should time it out, shouldn't it?).I also noticed that, in this scenario of the transaction being included in a block that the notifier has yet to process, the current height that's passed in to the historical dispatcher will be incorrect. This has the follow on effect of making the target confirmation height wrong. My third commit addresses this.
As always, yell at me if I've messed anything up. 😄