Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix close deadlock #21

Merged
merged 3 commits into from
Dec 3, 2015
Merged

Fix close deadlock #21

merged 3 commits into from
Dec 3, 2015

Conversation

whyrusleeping
Copy link
Collaborator

There was a race condition, where upon closing a conn and all its child
streams, a (potential?) issue in the underlying transport library caused
the 'Close' call to hang, leading to deadlocks in the peerstream code.

This changes the lock to only be held during the closed'ness check, and
should protect go-peerstream users against issues (misunderstandings?)
in the underlying transport libs.

There was a race condition, where upon closing a conn and all its child
streams, a (potential?) issue in the underlying transport library caused
the 'Close' call to hang, leading to deadlocks in the peerstream code.

This changes the lock to only be held during the closed'ness check, and
should protect go-peerstream users against issues (misunderstandings?)
in the underlying transport libs.
@whyrusleeping
Copy link
Collaborator Author

I tried writing a test for this, but the race was very difficult to reproduce (reproduced once in 700 5 second runs). The ipfs dht tests do a better job of reproducing the issue (1/20 or so).

@whyrusleeping whyrusleeping mentioned this pull request Dec 1, 2015
14 tasks
if c.closed {
c.closeLock.Unlock()
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this wont work because then a second caller could exit way before the Conn is actually closed. maybe make them wait?

c.closeLock.Lock()
if c.closed {
  c.closeLock.Unlock()
  <-c.closeChan
  return c.closeErr
}
c.closed = true
c.closeLock.Unlock()
defer func() {
  c.closeChan<- struct{}{}
}()

or use https://github.com/jbenet/goprocess and avoid the close lock madness alltogether, which also gives us nice errors. (goproc doesnt have the UX bugs, it's goproc/ratelimit that has caused the problems in the last few months)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

making them wait is the whole problem i'm trying to avoid, lol. The code in your example is functionally equivalent to the code causing the issue we have now.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue is that most callers of Close don't really care to wait until it completes, they just perform some routine cleanup as part of some other method. Asking them all to wait until its 'done' isnt what we want (either that or we need a BeginClose method, or some way of signifying we dont care to wait)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then those callers should say go s.Close() -- the semantics of Close() are that it may be stuck waiting for it to fully close, and if something shouldnt be waiting, then it should really not be waiting

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thats also something we don't want though, we would end up with tons of random goroutines hanging around for an indeterminate period of time.

Are you opposed to a method that calls close, and exits early if its being closed already like this one? maybe StartClosing or something?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, call it c.GoClose() and have it spin a goroutine if it does need to
close (i.e. do honor the "go"). in practice it wont happen

On Wed, Dec 2, 2015 at 12:49 PM Jeromy Johnson notifications@github.com
wrote:

In conn.go
#21 (comment):

if c.closed {
  •   c.closeLock.Unlock()
    

thats also something we don't want though, we would end up with tons of
random goroutines hanging around for an indeterminate period of time.

Are you opposed to a method that calls close, and exits early if its being
closed already like this one? maybe StartClosing or something?


Reply to this email directly or view it on GitHub
https://github.com/jbenet/go-peerstream/pull/21/files#r46473117.

@jbenet
Copy link
Owner

jbenet commented Dec 2, 2015

@whyrusleeping good find

@whyrusleeping
Copy link
Collaborator Author

alright, that should address the problem, without any changes needed in go-ipfs.

jbenet added a commit that referenced this pull request Dec 3, 2015
@jbenet jbenet merged commit f3ab207 into master Dec 3, 2015
@jbenet
Copy link
Owner

jbenet commented Dec 3, 2015

ok thanks

so many locks!

@ghost ghost mentioned this pull request Dec 8, 2015
@jbenet jbenet deleted the fix/race-lock branch December 8, 2015 02:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants