Retry on NoRouteToHostException #26

Closed
rfrovarp opened this issue Dec 20, 2013 · 19 comments

Comments

@rfrovarp

In our environment, it is entirely possible for systems to be trying to connect to our RabbitMQ server, while the server itself is down. Trying a new connection to a system that is powered off results in a NoRouteToHostException. In our case it would certainly make sense for that to be a retryable error.

@jhalterman
Owner

That's a new one for me. Do you happen to have a stacktrace? I'm curious where it gets thrown.

@rfrovarp
Author

Sorry, should have included that to begin with:

java.net.NoRouteToHostException: No route to host
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at sun.security.ssl.SSLSocketImpl.connect(SSLSocketImpl.java:618)
at com.rabbitmq.client.ConnectionFactory.createFrameHandler(ConnectionFactory.java:445)
at com.rabbitmq.client.ConnectionFactory.newConnection(ConnectionFactory.java:504)
at net.jodah.lyra.internal.ConnectionHandler$3.call(ConnectionHandler.java:214)
at net.jodah.lyra.internal.ConnectionHandler$3.call(ConnectionHandler.java:205)
at net.jodah.lyra.internal.RetryableResource.callWithRetries(RetryableResource.java:46)
at net.jodah.lyra.internal.ConnectionHandler.createConnection(ConnectionHandler.java:205)
at net.jodah.lyra.internal.ConnectionHandler.createConnection(ConnectionHandler.java:179)
at net.jodah.lyra.internal.ConnectionHandler.<init>(ConnectionHandler.java:61)
at net.jodah.lyra.Connections.create(Connections.java:65)

Right after that is my code calling Connections.create(options, retryConfig), followed by the threads and web framework I'm using.

@rfrovarp
Author

If the connection is made by hostname and the hostname resolves in DNS, the NoRouteToHostException is thrown (assuming the host isn't up). If an IP address is specified and that host isn't available, the same stack trace results.

If a hostname is provided that doesn't resolve in DNS, a DNS-related exception is thrown, and that isn't retryable, which makes sense to me.
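
To illustrate the distinction drawn above, here is a minimal sketch of how a caller might tell the failure modes apart by walking the cause chain. The class name `ConnectFailureClassifier` and the `Kind` enum are hypothetical names invented for this example; they are not part of Lyra or the RabbitMQ client. Only the `java.net` exception types are real.

```java
import java.net.ConnectException;
import java.net.NoRouteToHostException;
import java.net.UnknownHostException;

// Hypothetical helper, not part of Lyra: classifies a connection failure
// by inspecting the exception and each of its causes.
final class ConnectFailureClassifier {
    enum Kind { DNS_FAILURE, NO_ROUTE, CONNECT_FAILED, OTHER }

    static Kind classify(Throwable failure) {
        for (Throwable t = failure; t != null; t = t.getCause()) {
            // Hostname did not resolve in DNS.
            if (t instanceof UnknownHostException) return Kind.DNS_FAILURE;
            // Name resolved (or an IP was given) but the host is unreachable.
            if (t instanceof NoRouteToHostException) return Kind.NO_ROUTE;
            // Host reachable, but the connection attempt failed (e.g. refused).
            if (t instanceof ConnectException) return Kind.CONNECT_FAILED;
        }
        return Kind.OTHER;
    }
}
```

Under this sketch, the trace above would classify as `NO_ROUTE`, while an unresolvable hostname would classify as `DNS_FAILURE`. Whether either should be retried is a policy decision left to the caller.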

@jhalterman
Owner

I tried reproducing this on my machine (running OSX) and the only way I could get a NoRouteToHostException was by adding a route (route add ...) that rejects packets. I would think this is more of a fatal condition - something that retrying wouldn't help. I'm surprised you're hitting this just by having a machine down. Maybe it's OS specific? Any idea?

@rfrovarp
Author

Turns out it appears to be OS specific. I tried replicating on Windows and was not able to. I haven't had a chance to try on RedHat, but I'm getting the error on Ubuntu 13.10. So it appears that this is a Linux problem.

@jhalterman
Owner

If it's just an Ubuntu thing (perhaps it's a specific JRE version?) I'd lean towards not making this exception retryable. For any failure that isn't likely to be resolved while retrying, we should just throw right away.

Another option I've thought of is to make the set of retryable failures configurable. Still, having that be OS specific is not ideal.
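
As a rough sketch of what a configurable set of retryable failures could look like, independent of how Lyra implements it: the class `ConfigurableRetry` and its methods `retryOn`, `isRetryable`, and `call` are hypothetical names made up for this example, and the default of retrying only `ConnectException` is an assumption chosen for illustration.

```java
import java.net.ConnectException;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.Callable;

// Hypothetical sketch, not Lyra's API: a retry helper whose set of
// retryable exception types can be extended by the caller.
final class ConfigurableRetry {
    private final Set<Class<? extends Exception>> retryable = new HashSet<>();
    private final int maxAttempts;
    private final long delayMillis;

    ConfigurableRetry(int maxAttempts, long delayMillis) {
        this.maxAttempts = maxAttempts;
        this.delayMillis = delayMillis;
        // Assumed default for this example: only ConnectException is retried.
        retryable.add(ConnectException.class);
    }

    // Lets a deployment opt in to extra types, e.g. NoRouteToHostException.
    ConfigurableRetry retryOn(Class<? extends Exception> type) {
        retryable.add(type);
        return this;
    }

    boolean isRetryable(Exception e) {
        for (Class<? extends Exception> type : retryable) {
            if (type.isInstance(e)) return true;
        }
        return false;
    }

    // Runs the task, retrying with a fixed delay while the failure is
    // retryable and attempts remain; otherwise rethrows immediately.
    <T> T call(Callable<T> task) throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                return task.call();
            } catch (Exception e) {
                if (attempt >= maxAttempts || !isRetryable(e)) throw e;
                Thread.sleep(delayMillis);
            }
        }
    }
}
```

A deployment that sees NoRouteToHostException while a server is down could then opt in with `new ConfigurableRetry(5, 1000).retryOn(NoRouteToHostException.class)`, while other environments keep the fail-fast default.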

@jhalterman
Owner

Closing for now unless this exception proves to be something broader or something that should be recoverable.

@sergeyleyko

Very sad, because it is often reproducible when servers running RabbitMQ have just been restarted on Amazon EC2.

@michaelklishin
Collaborator

@jhalterman while NoRouteToHostException is clearly an infrastructure-level issue, I wonder if it's worth making it retryable, possibly as an option. It is indeed correct that on AWS, when instances are restarted or an autoscaling group adds new ones, routing can be temporarily interrupted. AWS is an 800 pound gorilla in the infra room, so it may be worth making an exception for it.

@sergeyleyko what do you use for the hostname parameter, private IPs or private DNS? Public DNS?

@sergeyleyko

@michaelklishin we just use a custom DNS name assigned to the EC2 instance (not the public DNS name generated by AWS).
@jhalterman by the way, I cannot find a way to add this "No route to host" exception to the retryableExceptions list in the config.

@michaelklishin
Collaborator

@sergeyleyko so your DNS is not managed by AWS in any way?

@sergeyleyko

@michaelklishin Yes, it is managed by a 3rd-party service. It just points to the EC2 instance.

@michaelklishin
Collaborator

@sergeyleyko then this is not AWS-specific. Still, I think it's reasonable to expect NoRouteToHostException to be retryable.

@sergeyleyko

@michaelklishin you are right. By the way, to add an exception, just use config.getRecoverableExceptions().add(newExceptionToRecover)

@jhalterman
Owner

@sergeyleyko If NoRouteToHost actually is recoverable, I'm fine with adding it to the default list. Curious: how long does it usually take to recover after a restart and a NoRouteToHostException? How long before the DNS issues are resolved?

@numbat

numbat commented Feb 3, 2016

@jhalterman I was wondering if NoRouteToHostException is going to be added to the default list (or is this something that we can edit?). We're getting this issue on a deployment where the RabbitMQ instance goes down briefly each night, and right now we have to do a corresponding restart of our services using Lyra on Ubuntu instances.

@jhalterman
Owner

@numbat I'm super tied up on other projects at the moment, but two options:

jhalterman reopened this Feb 3, 2016

@numbat

numbat commented Feb 3, 2016

Cheers, will do, thanks!

jhalterman added a commit that referenced this issue Feb 3, 2016
Adding NoRouteToHost to permanent list of recurring exceptions #26

@michaelklishin
Collaborator

Was addressed in 99ca312.


5 participants