Client may hang during initial connection #65

Closed
jhalterman opened this issue May 13, 2015 · 4 comments
@jhalterman commented May 13, 2015

An issue was reported to Lyra that appears to have more to do with rabbitmq-java-client. When RabbitMQ connections are proxied through an AWS Elastic Load Balancer (ELB), the ELB might accept a TCP connection but never respond to the initial handshake, which leaves the client hanging forever. The ELB may even close the connection, but I believe the BlockingCell is still left waiting forever. Here's a call stack from a mocked-up test that reproduces this scenario:

Thread [lyra-recovery-1] (Suspended)    
    waiting for: BlockingValueOrException<V,E>  (id=28) 
    Object.wait(long) line: not available [native method]   
    BlockingValueOrException<V,E>(Object).wait() line: 503  
    BlockingValueOrException<V,E>(BlockingCell<T>).get() line: 50   
    BlockingValueOrException<V,E>(BlockingCell<T>).uninterruptibleGet() line: 89    
    BlockingValueOrException<V,E>.uninterruptibleGetValue() line: 33    
    AMQChannel$SimpleBlockingRpcContinuation(AMQChannel$BlockingRpcContinuation<T>).getReply() line: 348    
    AMQConnection.start() line: 294 
    ConnectionFactory.newConnection(ExecutorService, Address[]) line: 603   
    ConnectionHandler$3.call() line: 243    
    ConnectionHandler$3.call() line: 236    
    ConnectionHandler(RetryableResource).callWithRetries(Callable<T>, RecurringPolicy<?>, RecurringStats, Set<Class<Exception>>, boolean, boolean) line: 51 
    ConnectionHandler.createConnection(RecurringPolicy<?>, Set<Class<Exception>>, boolean) line: 236    
    ConnectionHandler.recoverConnection() line: 271 
    ConnectionHandler.access$100(ConnectionHandler) line: 41    
    ConnectionHandler$ConnectionShutdownListener$1.run() line: 95   
    ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) line: 1110  
    ThreadPoolExecutor$Worker.run() line: 603   
    Thread.run() line: 722  

My first idea is that everything that happens inside AMQConnection.start() should be covered by the connection timeout setting, and/or an eventual connection closure should unblock the BlockingCell.
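
To illustrate the idea of bounding the wait, here is a minimal sketch of a one-shot cell whose `get` gives up after a deadline. `TimedCell` and its methods are hypothetical names chosen for this example; this is not the client's `BlockingCell` and not necessarily how the library addresses the problem.

```java
import java.util.concurrent.TimeoutException;

/** Hypothetical one-shot cell for illustration; not the client's BlockingCell. */
final class TimedCell<T> {
    private boolean filled;
    private T value;

    /** Stores the value once and wakes every waiting thread. */
    public synchronized void set(T newValue) {
        if (filled) {
            throw new IllegalStateException("cell already filled");
        }
        value = newValue;
        filled = true;
        notifyAll();
    }

    /** Waits up to timeoutMillis for a value, then fails with TimeoutException. */
    public synchronized T get(long timeoutMillis)
            throws InterruptedException, TimeoutException {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        // Loop to tolerate spurious wakeups; re-check the deadline each time.
        while (!filled) {
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                throw new TimeoutException("no reply within " + timeoutMillis + " ms");
            }
            // Round up so wait(0), which blocks indefinitely, is never called.
            wait(remainingNanos / 1_000_000L + 1);
        }
        return value;
    }
}
```

With a cell like this, a caller waiting for the handshake reply would see a `TimeoutException` instead of blocking indefinitely when the peer stays silent.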

@jhalterman changed the title from "Connection timeout does not cover initial handshake" to "Client hangs during initial handshake" on May 13, 2015
@jhalterman changed the title from "Client hangs during initial handshake" to "Client may hang during initial connection" on May 13, 2015

@michaelklishin (Member) commented May 13, 2015

There is a socket read timeout, but your observations may be correct. Thank you.
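
For readers unfamiliar with socket read timeouts, the sketch below shows the general mechanism using only `java.net`. It is an illustration with hypothetical names (`SilentPeerDemo`, `readTimesOut`), not the client's code: a local listener stands in for a proxy that accepts TCP connections but never sends anything. Note that in the stack trace above the blocked thread is waiting on the BlockingCell rather than in a socket read, so a read timeout by itself may not release that thread.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

/** Hypothetical demo: a peer that accepts TCP connections but never replies. */
final class SilentPeerDemo {
    /** Returns true if the read gave up with a timeout instead of hanging. */
    static boolean readTimesOut(int readTimeoutMillis) throws IOException {
        // Port 0 asks the OS for any free port. The OS completes the TCP
        // handshake for the listener, and the listener never writes a byte.
        try (ServerSocket silentPeer =
                     new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client =
                     new Socket(silentPeer.getInetAddress(), silentPeer.getLocalPort())) {
            client.setSoTimeout(readTimeoutMillis); // bound every blocking read
            try {
                client.getInputStream().read(); // would block without SO_TIMEOUT
                return false;
            } catch (SocketTimeoutException expected) {
                return true;
            }
        }
    }
}
```

Calling `SilentPeerDemo.readTimesOut(200)` should return after roughly 200 ms with a timeout rather than blocking.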

@jhalterman (Author) commented May 13, 2015

Updated the OP with a stack trace taken against version 3.5.2.

@michaelklishin (Member) commented May 13, 2015

@jhalterman can you try #66?

@jhalterman (Author) commented May 13, 2015

@michaelklishin #66 works well for me and looks to resolve the issue!
