Client fail-fast (hazelcast.client.max.concurrent.invocations) not working for async operations causing OOM #8568
The client has a configurable property, hazelcast.client.max.concurrent.invocations, that limits the number of outstanding client requests. We have observed that in some cases this limit is not enforced for async calls, so the number of outstanding requests can keep growing and eventually cause an OOM.
When examining the issue we see a large number of client messages in the heap, both request and response messages. This is most probably caused by slow callback execution. Here is a test case that reproduces the issue:
Observation: The client tracks the number of outstanding invocations using the correlation id. The count is incremented when a request is registered to be sent and decremented when the response for that request is received from the TCP channel. The decrement happens before the future is notified (at ResponseThread.handleClientMessage), so the limit does not cover the notification itself or anything that runs after it. Because the counter has already been decremented, the client keeps sending new requests, and the number of responses being processed at the invocation.notify stage can exceed the configured overload limit. This is especially true for async calls that attach andThen logic.
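The observation above can be illustrated with a minimal, self-contained sketch. It is not Hazelcast code: the class `EarlyReleaseSketch`, the limit of 2, and the latch that simulates a slow callback are all assumptions made for illustration. The counter is decremented as soon as a "response" arrives, before the callback runs, so the limit check never fires even though unfinished callbacks pile up.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the client; none of these names come from Hazelcast.
class EarlyReleaseSketch {
    static final int MAX_CONCURRENT = 2; // stands in for the configured limit

    // Returns the peak number of responses whose callbacks had not finished yet.
    static int run(int totalRequests) {
        AtomicInteger outstanding = new AtomicInteger();      // what the limit checks
        AtomicInteger pendingCallbacks = new AtomicInteger(); // what actually holds memory
        AtomicInteger peakPending = new AtomicInteger();
        CountDownLatch slowCallbackGate = new CountDownLatch(1); // simulates slow callbacks
        ExecutorService callbackExecutor = Executors.newSingleThreadExecutor();

        for (int i = 0; i < totalRequests; i++) {
            // Register the request; fail fast if the limit is exceeded.
            if (outstanding.incrementAndGet() > MAX_CONCURRENT) {
                outstanding.decrementAndGet();
                throw new IllegalStateException("overload");
            }
            // The response arrives: the counter is decremented BEFORE the callback runs.
            outstanding.decrementAndGet();
            peakPending.accumulateAndGet(pendingCallbacks.incrementAndGet(), Math::max);
            callbackExecutor.execute(() -> {
                awaitQuietly(slowCallbackGate);
                pendingCallbacks.decrementAndGet();
            });
        }

        slowCallbackGate.countDown(); // let the queued callbacks drain
        callbackExecutor.shutdown();
        try {
            callbackExecutor.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return peakPending.get();
    }

    static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("peak pending callbacks: " + run(100));
    }
}
```

In this sketch the overload check never triggers, yet every one of the submitted requests ends up queued behind the slow callback at the same time, which is the kind of growth described in the report.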
CallIdSequence is now completed, and so accepts the next invocation, only after the callbacks running on internal executors have completed. A second change was made to achieve back pressure safely: if the response is already available when andThen is called with an internal callback, the internal callback runs on the calling thread instead of on the executor. Since blocking calls are already not permitted on internal threads, this achieves natural back pressure. fixes hazelcast#9665 fixes hazelcast#8568
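The first change can be sketched in the same simplified style. Again, this is an illustration under stated assumptions rather than the actual patch: `ReleaseAfterCallbackSketch`, the `Semaphore` used as the slot counter, and the latch simulating a slow callback are all hypothetical. The slot is released only after the callback has finished, so a caller that submits faster than callbacks complete is rejected at the limit.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the fixed behaviour; not the actual Hazelcast patch.
class ReleaseAfterCallbackSketch {
    static final int MAX_CONCURRENT = 2; // stands in for the configured limit

    // Returns { peak pending callbacks, number of rejected requests }.
    static int[] run(int totalRequests) {
        Semaphore slots = new Semaphore(MAX_CONCURRENT);
        AtomicInteger pendingCallbacks = new AtomicInteger();
        AtomicInteger peakPending = new AtomicInteger();
        CountDownLatch slowCallbackGate = new CountDownLatch(1); // simulates slow callbacks
        ExecutorService callbackExecutor = Executors.newSingleThreadExecutor();
        int rejected = 0;

        for (int i = 0; i < totalRequests; i++) {
            // Fail fast: no free slot means the request is not accepted.
            if (!slots.tryAcquire()) {
                rejected++;
                continue;
            }
            peakPending.accumulateAndGet(pendingCallbacks.incrementAndGet(), Math::max);
            callbackExecutor.execute(() -> {
                try {
                    slowCallbackGate.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    pendingCallbacks.decrementAndGet();
                    slots.release(); // the slot is freed only AFTER the callback finished
                }
            });
        }

        slowCallbackGate.countDown(); // let the accepted callbacks drain
        callbackExecutor.shutdown();
        try {
            callbackExecutor.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new int[] { peakPending.get(), rejected };
    }

    public static void main(String[] args) {
        int[] result = run(100);
        System.out.println("peak pending: " + result[0] + ", rejected: " + result[1]);
    }
}
```

With the callbacks held up for the whole submission loop, the number of pending callbacks never exceeds the limit and the remaining requests are rejected instead of accumulating in memory.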