Client may hang during initial connection #65

Closed
jhalterman opened this Issue May 13, 2015 · 4 comments

@jhalterman

An issue was reported to Lyra that seems to have more to do with rabbitmq-java-client. When RabbitMQ connections are proxied through an AWS Elastic Load Balancer (ELB), the ELB might accept a TCP connection but never respond to the initial handshake, which leaves the client hanging forever. The ELB may even close the connection, but I believe the BlockingCell is still left waiting forever. Here's a call stack from a mocked-up test that reproduces this scenario:

Thread [lyra-recovery-1] (Suspended)    
    waiting for: BlockingValueOrException<V,E>  (id=28) 
    Object.wait(long) line: not available [native method]   
    BlockingValueOrException<V,E>(Object).wait() line: 503  
    BlockingValueOrException<V,E>(BlockingCell<T>).get() line: 50   
    BlockingValueOrException<V,E>(BlockingCell<T>).uninterruptibleGet() line: 89    
    BlockingValueOrException<V,E>.uninterruptibleGetValue() line: 33    
    AMQChannel$SimpleBlockingRpcContinuation(AMQChannel$BlockingRpcContinuation<T>).getReply() line: 348    
    AMQConnection.start() line: 294 
    ConnectionFactory.newConnection(ExecutorService, Address[]) line: 603   
    ConnectionHandler$3.call() line: 243    
    ConnectionHandler$3.call() line: 236    
    ConnectionHandler(RetryableResource).callWithRetries(Callable<T>, RecurringPolicy<?>, RecurringStats, Set<Class<Exception>>, boolean, boolean) line: 51 
    ConnectionHandler.createConnection(RecurringPolicy<?>, Set<Class<Exception>>, boolean) line: 236    
    ConnectionHandler.recoverConnection() line: 271 
    ConnectionHandler.access$100(ConnectionHandler) line: 41    
    ConnectionHandler$ConnectionShutdownListener$1.run() line: 95   
    ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) line: 1110  
    ThreadPoolExecutor$Worker.run() line: 603   
    Thread.run() line: 722  
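
For anyone wanting to reproduce this without an ELB, the scenario can be approximated with a server that accepts TCP connections and then never writes a byte. The following is a minimal sketch using only JDK sockets; the class name `SilentServer` and the helper `readTimesOut` are hypothetical and are not taken from the Lyra or rabbitmq-java-client test suites. The client side sends the AMQP 0-9-1 protocol header and then waits for a reply that never arrives, so the read is bounded only by the socket read timeout.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical test helper: accepts TCP connections but never replies. */
public class SilentServer implements AutoCloseable {
    private final ServerSocket server;
    private final List<Socket> accepted = new CopyOnWriteArrayList<>();

    public SilentServer() throws IOException {
        server = new ServerSocket(0); // bind to an ephemeral port
        Thread acceptor = new Thread(() -> {
            try {
                while (true) {
                    // Hold the socket open, but never read from or write to it.
                    accepted.add(server.accept());
                }
            } catch (IOException e) {
                // The server socket was closed; stop accepting.
            }
        });
        acceptor.setDaemon(true);
        acceptor.start();
    }

    public int port() {
        return server.getLocalPort();
    }

    @Override
    public void close() throws IOException {
        server.close();
        for (Socket s : accepted) {
            s.close();
        }
    }

    /** Returns true if waiting for the server's first byte times out. */
    public static boolean readTimesOut(int port, int timeoutMs) throws IOException {
        try (Socket client = new Socket("127.0.0.1", port)) {
            client.setSoTimeout(timeoutMs);
            // AMQP 0-9-1 protocol header: "AMQP" followed by 0, 0, 9, 1.
            client.getOutputStream().write(new byte[] {'A', 'M', 'Q', 'P', 0, 0, 9, 1});
            client.getInputStream().read(); // blocks: the server stays silent
            return false;
        } catch (SocketTimeoutException e) {
            return true;
        }
    }
}
```

Pointing a real client at `port()` instead of the raw socket should, under the assumption that the mock behaves like the ELB described above, produce a hang similar to the one in the call stack.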

My first thought is that everything inside AMQConnection.start() should be covered by the connection timeout setting, and/or an eventual connection closure should unblock the BlockingCell.
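
To illustrate the first idea, here is a rough sketch of a one-shot cell whose `get` is bounded by a timeout instead of waiting indefinitely. `TimedCell` is a hypothetical stand-in written for this comment; it is not the library's BlockingCell and makes no claim about how a fix should actually be implemented in the client.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical one-shot cell: get() gives up after a deadline. */
public class TimedCell<T> {
    private boolean filled;
    private T value;

    /** Stores the value once and wakes up any waiting readers. */
    public synchronized void set(T newValue) {
        if (filled) {
            throw new IllegalStateException("cell already set");
        }
        value = newValue;
        filled = true;
        notifyAll();
    }

    /** Waits up to timeoutMs for a value, then throws TimeoutException. */
    public synchronized T get(long timeoutMs)
            throws InterruptedException, TimeoutException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (!filled) {
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                throw new TimeoutException("no value within " + timeoutMs + " ms");
            }
            // Loop guards against spurious wakeups; the wait is re-armed
            // with whatever time is left before the deadline.
            TimeUnit.NANOSECONDS.timedWait(this, remainingNanos);
        }
        return value;
    }
}
```

With something like this, a caller waiting on the handshake reply would get a TimeoutException once the deadline passes rather than blocking forever when the peer stays silent.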

@jhalterman jhalterman changed the title from Connection timeout does not cover initial handshake to Client hangs during initial handshake May 13, 2015

@jhalterman jhalterman changed the title from Client hangs during initial handshake to Client may hang during initial connection May 13, 2015

@michaelklishin

Member

michaelklishin commented May 13, 2015

There is a socket read timeout but your observations may be correct, thank you.

@jhalterman

jhalterman May 13, 2015

Updated the OP with a stack trace taken against version 3.5.2.

@michaelklishin

Member

michaelklishin commented May 13, 2015

@jhalterman can you try #66?

@jhalterman

jhalterman May 13, 2015

@michaelklishin #66 works well for me and looks to resolve the issue!

