potential race condition in writing while connection drops #45

Closed
whyrusleeping opened this issue Nov 30, 2016 · 5 comments

@whyrusleeping
Contributor

I've been having some issues with yamux writes hanging lately. See goroutine 14008165 in: https://ipfs.io/ipfs/QmQn8YDTHWJzF7GVKUKanBzoxbzd5jQpwsgM2PfFVTKuGR

I think it's also correlated with the internet connection of the machine it's running on dropping during a write. Here's what I suspect is happening:

  • A write on a stream goes through, and consumes the entire sendWindow (resulting in s.sendWindow being zero)
  • Another write comes in, and is delegated to the WAIT code (where we see the write being stuck in the linked stack trace)
  • At this point, the connection drops, resulting in us not receiving the window update
  • Now, since the writer is waiting on either a timeout (which by default is not set) or a notification on the sendNotifyCh, we have to hope that notification arrives.

I'm not sure if the above is exactly what's happening, but I'm quite confident that if we somehow ended up in the write wait loop after the stream has been closed, it's possible that the sendNotifyCh signal got missed and we will block forever. To address that possibility, I think we should close the notify channels when the streams get closed, so that they are always ready to receive on.
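For illustration, here's a minimal, self-contained sketch of the wait pattern described above (not the actual yamux source; the stream type, waitForWindow, and its fields are simplified stand-ins). It shows how, with no write deadline set, a missed notification can park the writer forever:

```go
package main

import (
	"errors"
	"time"
)

var ErrTimeout = errors.New("i/o deadline reached")

// stream is a hypothetical, stripped-down stand-in for a yamux stream,
// used only to illustrate the wait pattern described above.
type stream struct {
	sendWindow   uint32
	sendNotifyCh chan struct{} // signalled when a window update arrives
}

// waitForWindow sketches the blocking step: with no write deadline the
// timeout channel stays nil, so if the connection drops before a window
// update (or any other notification) is delivered, the select blocks forever.
// Closing sendNotifyCh when the stream closes, as proposed above, would make
// the first case fire immediately instead.
func (s *stream) waitForWindow(deadline time.Time) error {
	var timeout <-chan time.Time
	if !deadline.IsZero() {
		timeout = time.After(time.Until(deadline))
	}
	select {
	case <-s.sendNotifyCh:
		return nil // notified; re-check s.sendWindow and retry the write
	case <-timeout:
		return ErrTimeout
		// note: nothing here wakes us up if the session dies
	}
}

func main() {
	s := &stream{sendNotifyCh: make(chan struct{}, 1)}
	// Happy path: a window update arrives, so the wait returns.
	s.sendNotifyCh <- struct{}{}
	_ = s.waitForWindow(time.Time{})
	// Had the connection dropped before that notification was sent, the call
	// above would block forever, matching the stuck goroutine in the trace.
}
```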

cc @slackpad

@whyrusleeping
Contributor Author

Alternatively, we can just add the stream's session shutdown channel to the wait loop select.
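In terms of the hypothetical waitForWindow sketch from the first comment, that alternative would look roughly like this (shutdownCh is an assumed name for whatever channel the session closes on teardown):

```go
// Rough sketch only: add the session's shutdown channel (assumed name
// shutdownCh) to the wait-loop select so a dead session always wakes up
// blocked writers instead of leaving them parked on sendNotifyCh.
func (s *stream) waitForWindow(deadline time.Time, shutdownCh <-chan struct{}) error {
	var timeout <-chan time.Time
	if !deadline.IsZero() {
		timeout = time.After(time.Until(deadline))
	}
	select {
	case <-s.sendNotifyCh:
		return nil // window update received; retry the write
	case <-shutdownCh:
		return errors.New("session shutdown")
	case <-timeout:
		return ErrTimeout
	}
}
```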

@whyrusleeping
Contributor Author

I'm more convinced of this being an issue with the internet connection of a node dropping. When I see this failure, multiple different yamux connections get hung at the same time, which indicates to me some external trigger for the issue.

@r0l1
Contributor

r0l1 commented Sep 19, 2018

Any updates on this? Why is this not merged?

@whyrusleeping
Contributor Author

@r0l1 It's unclear whether it solves any problem.

@evanphx
Contributor

evanphx commented Jul 28, 2022

Given the age of this issue, I'm going to go ahead and close it. #46 attempts to handle a situation which is already handled, namely that a session that dies force-closes all its streams, so there shouldn't be a hang from that.

evanphx closed this as completed on Jul 28, 2022