Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce fundingmanager chan send deadlock scenarios #1301

Merged
merged 10 commits into from Jun 2, 2018

Conversation

halseth
Copy link
Contributor

@halseth halseth commented May 30, 2018

This PR consists of several commits meant to simplify and reduce the cases where we can end up deadlocking on sending errors on the error channels within fundingmanager.

The biggest change is that we now determine whether to send on the resCtx.err channel inside failFundingFlow.

After this change we will send on the resCtx.err channel only after a call to cancelReservationCtx, making sure it won't be attempted again (which would cause a deadlock).

We also increase the buffer of the updateChan to 2, so we can safely send on it during the funding workflow without blocking.

Fixes #1254.

f.failFundingFlow(fmsg.peerAddress.IdentityKey,
pendingChanID, err)
fmsg.msg.ChanID, err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍


// If this was the last active reservation for this peer, delete the
// peer's entry altogether.
if len(nodeReservations) == 0 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

fndgLog.Warnf("unable to delete reservation: %v", err)
return
}
resCtx.err <- err
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't seem that this actually solves the issue this PR was meant to. It should be safe to call handleErrorMsg in a completely async fashion.

Copy link
Contributor Author

@halseth halseth Jun 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should solve it, as we now cancel the reservation before we send the error on the channel, while before we did it after.

Good point about the async calling though, I will attempt to write a test that exercises that and make sure all necessary mutexes are in place.

UPDATE: I just checked, and handleErrorMsg is only called from the reservationCoordinator, which means it should only be called from one thread at a time. However, this was also the case before this change. We could select on the resCtx.err channel to make sure we don't deadlock, but I'm reluctant to deploy such a fix that is only really mitigating a different bug. This change greatly simplifies the ways the error could be sent, so if it happens again it should at least be much easier to track down.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushed a commit that makes sure cancelReservationCtx holds the mutex all the way through. Now a call to cancelReservationCtx should be always return the resCtx or an error, and since we now only send on the error channel after it is cancelled (exception is in CancelPeerReservations, but that's under the mutex) we should be sage.

@halseth halseth force-pushed the fundingmanager-double-error branch from 784d435 to 3d96462 Compare June 1, 2018 06:55
This commit changes cancelReservationCtx to gold the resMtx from start
to finish. Earlier it would lock at different times only when accessing
the maps, meaning that other goroutines (I'm looking at you
PeerTerminationWatcher) could come in and grab the context in between
locks, possibly leading to a race.
Copy link
Member

@Roasbeef Roasbeef left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🐉

@Roasbeef Roasbeef merged commit 7e4abb4 into lightningnetwork:master Jun 2, 2018
@halseth halseth deleted the fundingmanager-double-error branch July 12, 2018 13:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants