Zuul/Hystrix: HTTP connection leak in case of timeout #327
We ran into scenarios where Zuul leaks HTTP connections (i.e. they are not returned to the pool) after execution. After a while, the pool's maximum limit is hit and Zuul can no longer serve any requests.
In short, it turns out that Zuul leaks HTTP connections if a service takes longer than the Hystrix execution timeout but responds before the ReadTimeout occurs.
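The timing window can be written down as a small predicate. This is only a sketch of the condition stated above: the class and method names are made up, the timeout values in `main` are hypothetical and are not the configuration used in this report, and treating the boundaries as strict inequalities is an assumption.

```java
final class LeakWindow {
    // True when the service's response time falls in the window described above:
    // after the Hystrix execution timeout has fired, but before Ribbon's ReadTimeout.
    static boolean inLeakWindow(long serviceMs, long hystrixTimeoutMs, long readTimeoutMs) {
        return serviceMs > hystrixTimeoutMs && serviceMs < readTimeoutMs;
    }

    public static void main(String[] args) {
        // Hypothetical timeouts, chosen only for illustration.
        long hystrixTimeoutMs = 1000;
        long readTimeoutMs = 5000;
        for (long serviceMs : new long[] {500, 1500, 6000}) {
            System.out.println(serviceMs + "ms in leak window: "
                    + inLeakWindow(serviceMs, hystrixTimeoutMs, readTimeoutMs));
        }
    }
}
```

Note that when the Hystrix timeout is configured higher than the ReadTimeout, the window is empty and the predicate is never true.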
We could reproduce the case using a Zuul instance left at its default settings, except for the HTTP pool size, which is set to 1 to reproduce the problem quickly:
Our test service exposes a simple REST API with a single argument to tell it how much time it should wait before returning a result. The service is proxied by Zuul such that a request for
Invoke the service with a delay of 500ms. This delay (simulated execution time) is lower than all Zuul timeouts (Hystrix and Ribbon ReadTimeout).
Invoke the service with a delay of 1500ms.
From the logs:
Strangely enough, the log says 'Connection can be kept alive indefinitely', so something still seems to handle the connection, yet it is never returned to the pool.
Look at the
Invoke the service with a delay of 6000ms
Change the configuration to have the Hystrix timeout higher than the Ribbon ReadTimeout:
A request with a delay of 5500ms (readTimeout < service exec time < hystrixTimeout)
A request with a delay of 6500ms (readTimeout < hystrixTimeout < service exec time)
It turns out that Zuul leaks HTTP connections if a service takes longer than the Hystrix execution timeout but less than the ReadTimeout. To summarise:
Changing the Hystrix isolation from SEMAPHORE to THREAD doesn't solve the problem (
The problem can be reproduced with the above scenarios. However, I have no idea of where to look in the code to address the issue.
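For readers without a Zuul setup at hand, the symptom can be imitated with plain JDK classes. This is a toy model, not Zuul, Ribbon or Hystrix code: `Pool`, `proxy` and the millisecond values are all hypothetical, the semaphore stands in for a connection pool of size 1, and `Future.get` with a timeout stands in for the Hystrix execution timeout. It only illustrates how a lease that is released solely on the success path gets lost when the caller gives up first.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class LeakSketch {
    /** Stand-in for an HTTP connection pool: one permit per connection. */
    static final class Pool {
        private final Semaphore permits;
        Pool(int size) { permits = new Semaphore(size); }
        boolean lease() { return permits.tryAcquire(); }
        void release() { permits.release(); }
        int available() { return permits.availablePermits(); }
    }

    /** Simulates one proxied request; commandTimeoutMs plays the role of the Hystrix timeout. */
    static String proxy(Pool pool, ExecutorService io, long serviceMs, long commandTimeoutMs) {
        if (!pool.lease()) {
            return "POOL_EXHAUSTED";
        }
        Future<String> response = io.submit(() -> {
            Thread.sleep(serviceMs); // simulated backend execution time
            return "OK";
        });
        try {
            String body = response.get(commandTimeoutMs, TimeUnit.MILLISECONDS);
            pool.release(); // the lease is only returned on this path
            return body;
        } catch (TimeoutException e) {
            return "TIMEOUT"; // the caller gives up and nobody returns the lease
        } catch (InterruptedException | ExecutionException e) {
            pool.release();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Pool pool = new Pool(1);
        ExecutorService io = Executors.newCachedThreadPool();
        System.out.println(proxy(pool, io, 50, 200));  // faster than the timeout
        System.out.println(proxy(pool, io, 300, 100)); // slower than the timeout
        System.out.println(proxy(pool, io, 50, 200));  // pool has no lease left
        io.shutdownNow();
    }
}
```

After the second call the pool stays empty even once the simulated backend has answered, which mirrors the behaviour reported above.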
This issue looks similar to:
To be honest, I could not even find where the HTTP connection is supposed to be closed and returned to the pool. The reactive code style of Hystrix is so hard to debug.. :-(
Under normal circumstances, the
However, because the
To fix the issue temporarily, I made changes to the
This is more of a hack than a real fix for the issue, but it works and clearly illustrates the problem. A cleaner solution requires a better understanding of Hystrix and/or rxjava...
I forgot to mention that the proposed solution works for both the SEMAPHORE and THREAD execution strategies, and whether or not the executing thread is interrupted in case of timeout.
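To make the general idea concrete outside of Zuul: the sketch below is not the change described above, and it does not use Hystrix or rxjava. It is a generic JDK-only illustration, with hypothetical names (`Pool`, `proxy`) and made-up timings, of tying the release of a pooled resource to the completion of the underlying call instead of to the caller that may already have timed out.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class ReleaseOnCompletionSketch {
    /** Stand-in for an HTTP connection pool: one permit per connection. */
    static final class Pool {
        private final Semaphore permits;
        Pool(int size) { permits = new Semaphore(size); }
        boolean lease() { return permits.tryAcquire(); }
        void release() { permits.release(); }
        int available() { return permits.availablePermits(); }
    }

    /** Simulates one proxied request; commandTimeoutMs plays the role of the caller's timeout. */
    static String proxy(Pool pool, ExecutorService io, long serviceMs, long commandTimeoutMs) {
        if (!pool.lease()) {
            return "POOL_EXHAUSTED";
        }
        CompletableFuture<String> response = CompletableFuture.supplyAsync(() -> {
            try {
                Thread.sleep(serviceMs); // simulated backend execution time
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            return "OK";
        }, io);
        // The release runs when the backend call finishes, whether or not
        // the caller below is still waiting for it.
        CompletableFuture<String> released =
                response.whenComplete((body, error) -> pool.release());
        try {
            return released.get(commandTimeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return "TIMEOUT"; // the lease is returned later by whenComplete
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Pool pool = new Pool(1);
        ExecutorService io = Executors.newCachedThreadPool();
        System.out.println(proxy(pool, io, 300, 100)); // caller times out first
        Thread.sleep(500);                             // let the simulated backend finish
        System.out.println("available leases: " + pool.available());
        io.shutdown();
    }
}
```

In this toy version the release does not depend on which thread runs the call or on whether that thread is interrupted, only on the call reaching a terminal state.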
What do you think?