TimeoutException during new connection creation stops processing #262
Comments
When auto-recovery is disabled. References #262
@shakuzen I pushed a snapshot of 4.0.3, can you give it a try? Thanks!
@shakuzen I released 4.0.3.RC1 on our Bintray Milestone repo, can you give it a try? Note there's also 4.1.1.RC1 which contains the bug fix as well. We'll release both GA as-is in the next few days.
@acogoluegnes yesterday I gave the snapshots a run through the automated tests for one of our producer applications and did some ad hoc testing. Everything worked fine. Unfortunately, I was never able to reproduce the exact failure described in this issue. I've only ever had it happen in our production environment and my efforts to simulate it locally have been futile. If you have any ideas how the behavior could be forcefully reproduced, please let me know. I tried blocking outbound traffic on the AMQP port for one RabbitMQ instance hoping it would result in a |
@shakuzen no worries, thank you for the testing. There definitely are problems that are very hard to isolate and the only real way to confirm that they've been addressed is after not seeing them in production for a while.
@shakuzen I think you could use qdisc to simulate the necessary conditions to reproduce using a delay just under the connection timeout, so the TCP connection is established but the handshake times out:
|
@noahhaon thank you,
@michaelklishin you bet. FYI, the alternative for *BSD and OS X is dummynet: http://info.iet.unipi.it/~luigi/dummynet/ Some more details on using netem: https://wiki.linuxfoundation.org/networking/netem Cheers
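Another way to approximate "the TCP connection is established but the handshake times out" without traffic shaping is a local listener that accepts connections and never answers. The following is only a rough sketch: the class and method names are made up for illustration, and whether the client surfaces this exact condition as a TimeoutException depends on the client version and its handshake timeout setting, which is an assumption here.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any RabbitMQ library.
class SilentServerSketch {

    /** Starts a listener on an ephemeral loopback port that accepts connections but never replies. */
    static ServerSocket startSilentServer() throws IOException {
        ServerSocket server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
        Thread acceptor = new Thread(() -> {
            List<Socket> held = new ArrayList<>();
            try {
                while (true) {
                    held.add(server.accept()); // keep the socket open, write nothing back
                }
            } catch (IOException e) {
                // The listener was closed: release the sockets we were holding.
                for (Socket s : held) {
                    try { s.close(); } catch (IOException ignored) { }
                }
            }
        });
        acceptor.setDaemon(true);
        acceptor.start();
        return server;
    }

    /** Returns true if the TCP connect succeeds but no reply arrives within the timeout. */
    static boolean handshakeTimesOut(int port, int timeoutMillis) throws IOException {
        try (Socket client = new Socket()) {
            client.connect(new InetSocketAddress(InetAddress.getLoopbackAddress(), port), timeoutMillis);
            client.setSoTimeout(timeoutMillis);
            // AMQP 0-9-1 protocol header: "AMQP" followed by bytes 0, 0, 9, 1.
            client.getOutputStream().write(new byte[] {'A', 'M', 'Q', 'P', 0, 0, 9, 1});
            client.getOutputStream().flush();
            try {
                client.getInputStream().read(); // blocks until data, end of stream, or timeout
                return false;                   // the peer answered or closed the connection
            } catch (SocketTimeoutException e) {
                return true;                    // connected, but the peer stayed silent
            }
        }
    }

    public static void main(String[] args) throws IOException {
        ServerSocket server = startSilentServer();
        System.out.println("silent listener on port " + server.getLocalPort());
        System.out.println("handshake timed out: " + handshakeTimesOut(server.getLocalPort(), 500));
        server.close();
    }
}
```

To exercise the scenario from this issue, the silent listener's address would go first in the address list handed to the client, followed by a healthy node.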
@shakuzen You can try the RC on your production environment and tell us if the bug fix works. It's not risky, the only change is the bug fix. |
As asked on the mailing list, there appears to be some unexpected behavior in the logic for creating a new connection.
/cc @michaelklishin
Scenario
GIVEN: multiple addresses available (a cluster of RabbitMQ nodes); first address (node) is unresponsive, second address (node) is healthy
WHEN: creating a new connection
THEN: a connection is successfully returned
Expected behavior
Connecting to the first node fails (after a timeout, i.e. handshake timeout), but connecting to the second node is attempted and succeeds. A connection to the second node should be returned.
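As a minimal sketch of this expected behavior (not the client's actual code): the loop below tries each address in order and treats a TimeoutException like an IOException, moving on to the next address. The Connector interface, the class name, and the address strings are hypothetical stand-ins so that the example runs without a broker.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeoutException;

// Illustrative only; names here are assumptions, not the library's API.
class AddressFallbackSketch {

    /** Hypothetical stand-in for a single per-address connection attempt. */
    interface Connector<C> {
        C connect(String address) throws IOException, TimeoutException;
    }

    /** Tries each address in order and returns the first connection that succeeds. */
    static <C> C connectToFirstAvailable(List<String> addresses, Connector<C> connector)
            throws IOException, TimeoutException {
        Exception lastException = null;
        for (String address : addresses) {
            try {
                return connector.connect(address);
            } catch (IOException | TimeoutException e) {
                // Remember the failure and fall through to the next address.
                lastException = e;
            }
        }
        // Every address failed: rethrow the last failure with its original type.
        if (lastException instanceof IOException) {
            throw (IOException) lastException;
        }
        if (lastException instanceof TimeoutException) {
            throw (TimeoutException) lastException;
        }
        throw new IOException("failed to connect: no addresses given");
    }

    public static void main(String[] args) throws Exception {
        String result = connectToFirstAvailable(
                Arrays.asList("node1:5672", "node2:5672"),
                address -> {
                    // Simulate an unresponsive first node and a healthy second node.
                    if (address.startsWith("node1")) {
                        throw new TimeoutException("handshake timed out");
                    }
                    return "connected to " + address;
                });
        System.out.println(result); // prints "connected to node2:5672"
    }
}
```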
Actual behavior
Connecting to the first node fails and a TimeoutException is thrown; no attempt is made to connect to any other node, as the TimeoutException is uncaught. Note that IOException, on the other hand, is caught and would lead to the next address, if any, being tried.
Solution
Would simply catching TimeoutException and treating it the same as IOException (see specific code portions below) be the right thing to do here?
Supplements
Stacktrace
Relevant portion of a stacktrace from this scenario:
Code
The code where this behavior manifests is in com.rabbitmq.client.ConnectionFactory#newConnection(java.util.concurrent.ExecutorService, com.rabbitmq.client.AddressResolver, java.lang.String) (lines 896 - 917):
And for AutorecoveringConnections the same issue would happen (though I haven't actually tried since auto-recovery is disabled in current versions of Spring AMQP). From com.rabbitmq.client.impl.recovery.RecoveryAwareAMQConnectionFactory:
(not directly related to this issue but I thought the difference of shuffling the address list in the RecoveryAwareAMQConnectionFactory but not in the regular ConnectionFactory was interesting)