Fix issues with reconnection #225
Merged
We've noticed that a small percentage of clients using jnats 2.4.3 (roughly 1-2%) were not able to reconnect when the NATS server restarted. They either seemed to hang or were stuck in a bad loop where a `Socket closed` exception was thrown every two seconds, followed by a disconnected message.

After looking into the issue, we found that the clients that seemed to hang were stuck at https://github.com/yuanmwang-wf/java-nats/blob/95a229a38eab802491593808f98501053deb05aa/src/main/java/io/nats/client/impl/NatsConnection.java#L1061, which blocks the reconnect thread. I've added an executor service that enforces the connection timeout for `readInitialInfo`, `checkVersionRequirements` and `upgradeToSecureIfNeeded`. If these operations do not finish before the connection timeout, an exception is thrown, which closes the socket and allows the reconnect thread to try again.

Another issue we found seems to be related to how an `IOException` is saved to `exceptionDuringConnectChange` in `NatsConnection`. If an `IOException` is thrown during `closeSocket` while reconnecting, that exception is saved to `exceptionDuringConnectChange` through `handleCommunicationIssue` in `NatsConnectionReader`. The exception is never cleared, so the next reconnect attempt fails even if the socket is created correctly and the TLS upgrade succeeds, because `exceptionDuringConnectChange` is not null: https://github.com/yuanmwang-wf/java-nats/blob/95a229a38eab802491593808f98501053deb05aa/src/main/java/io/nats/client/impl/NatsConnection.java#L362. This can leave the reconnect thread stuck in a loop if the same exception is thrown again in `closeSocketImpl()`. I've added a line to reset `exceptionDuringConnectChange` to null.

I've been testing the reconnection story in jnats pretty extensively recently, and with these changes all connections were able to reconnect in my test cases. Please let me know if you think there's a better approach. @sasbury
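
For readers who want to see the shape of the timeout enforcement described above, here is a minimal sketch. It is not the code from this PR: the class `ConnectTimeoutSketch` and the method `callWithTimeout` are hypothetical names, and the `handshake` callable stands in for the blocking `readInitialInfo`, `checkVersionRequirements` and `upgradeToSecureIfNeeded` steps. It only uses standard `java.util.concurrent` calls.

```java
import java.io.IOException;
import java.time.Duration;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical illustration only; not the implementation in NatsConnection.
class ConnectTimeoutSketch {

    // Runs a blocking handshake step on a worker thread and waits at most
    // `timeout` for it. Any failure surfaces as an IOException so the caller
    // can close the socket and let the reconnect loop try again.
    static <T> T callWithTimeout(Callable<T> handshake, Duration timeout) throws IOException {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<T> future = executor.submit(handshake);
        try {
            return future.get(timeout.toNanos(), TimeUnit.NANOSECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the worker thread
            throw new IOException("handshake did not finish within " + timeout, e);
        } catch (ExecutionException e) {
            throw new IOException("handshake failed", e.getCause());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            future.cancel(true);
            throw new IOException("interrupted while waiting for handshake", e);
        } finally {
            executor.shutdownNow(); // do not leave worker threads behind
        }
    }
}
```

Note that interrupting a thread does not unblock a read on a plain blocking socket, so in this sketch the caller is still expected to close the socket after the `IOException`, which matches the behaviour described above.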