-
Notifications
You must be signed in to change notification settings - Fork 150
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix reconnect() calls close() when simultaneously 'disconnecting' in another thread #1082
Fix reconnect() calls close() when simultaneously 'disconnecting' in another thread #1082
Conversation
…another thread In the unlikely but possible scenario where the reader/writer threads would call into `handleCommunicationIssue` twice, `disconnecting` could be set by the other thread just before going into `reconnect()`. This would cause these lines to exit the reconnecting loop: ```java if (isDisconnectingOrClosed() || this.isClosing()) { keepGoing = false; } ``` Which would then call into `close()` here: ```java if (!isConnected()) { this.close(); return; } ``` This PR fixes that by not checking the `disconnected` flag, instead checking `isClosed() || isClosing()` as it was before this commit: c4d50b2#diff-bbcf614ee7ee4ccd15b22d1f95fa40e4e642098386b6d1d14127dab5968d89feR204 Signed-off-by: Maurice van Veen <github@mauricevanveen.com>
WARNING: This code is purely meant to illustrate the reproduction steps, and not meant to be directly merged.
Code containing a (hacky) test that reproduces the issue can be found here: 1958b8a |
Would be happy to see how we could get a test for this case (if possible). |
As discussed privately, we probably instead want a way to check if we're already in the if (isDisconnectingOrClosed()) {
waitForDisconnectOrClose(this.options.getConnectionTimeout());
return;
} |
…tead Signed-off-by: Maurice van Veen <github@mauricevanveen.com>
Have implemented the above by checking if we're already in Still not quite sure if it's possible to write a (proper) test for this given that it's a race condition and EDIT: Using |
…oseSocket() calls Signed-off-by: Maurice van Veen <github@mauricevanveen.com>
return; | ||
boolean wasConnected; | ||
|
||
statusLock.lock(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need both the closeSocketLock and the statusLock, or can we just rely on the closeSocketLock?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's needed yes.
Both removing the statusLock
, solely relying on closeSocketLock
fails the tests. As well as wrapping the statusLock
around it fully, instead of using the closeSocketLock
.
@MauriceVanVeen What if we make change it from private to package scope? As long as it's not public... |
Signed-off-by: Maurice van Veen <github@mauricevanveen.com>
Signed-off-by: Maurice van Veen <github@mauricevanveen.com>
Turned out it was already package scoped, not sure why I thought it was private.. Anyway, added a test just now which reproduces the issue with the fix commented out. The commit after uncomments the fix. Not sure why coveralls has fell in coverage now? Not really happy with how spammy it is, but don't know if there's another way to get the race condition tested. |
closeSocketThread.start(); | ||
|
||
// Ensure the closeSocket thread runs first | ||
Thread.sleep(100); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't we want these to run as close together as possible? Why didn't you just do two instances of the closeSocketThread code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why didn't you just do two instances of the closeSocketThread code?
Running closeSocket(..)
twice would not reproduce (or at least not reliably in a test).
This part in closeSocket(..)
makes sure that while disconnecting=true
we don't run closeSocket(..)
twice:
if (isDisconnectingOrClosed()) {
waitForDisconnectOrClose(this.options.getConnectionTimeout());
return;
}
But, if we've reset disconnecting=false
and afterward call reconnect()
, this would allow another thread to go in and set disconnecting=true
. If at the same time the reconnect()
logic checks isDisconnectingOrClosed()
it would close the connection instead of staying in the reconnect loop.
It's hard to get these conditions to align, so that's why I've resorted to calling handleCommunicationIssue
continuously, which eventually calls closeSocket(...)
from different threads.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did simplify the test though, so have removed the closeSocketThread
and this sleep
. Only keeping the continuous handleCommunicationIssue
calls.
Signed-off-by: Maurice van Veen <github@mauricevanveen.com>
…slight throttling Signed-off-by: Maurice van Veen <github@mauricevanveen.com>
Coveralls seems to be happy now, after applying 55aa19e. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
In the unlikely but possible scenario where the reader/writer threads would call into
handleCommunicationIssue
twice,disconnecting
could be set by the other thread just before going intoreconnect()
.This would cause these lines to exit the reconnecting loop:
Which would then call into
close()
here:This PR fixes that by not checking the
disconnected
flag, instead checkingisClosed() || isClosing()
as it was before this commit.Related to issue reported on Slack: https://natsio.slack.com/archives/C1KU01HD2/p1708455358962679