Bytebuf leakage when terminating the stream with a timeout #119
Maybe this could be related to #118?
The sample application also throws this exception, which feels a bit worrying:
@smaldini I reproduced this directly in reactor-netty in order to be able to play around with the code more easily. I noticed two patterns of leaks, and one was displaying the hint from the block at

So I changed the hint to capture the buffer's toString, and here's what I got:
The other pattern is a buffer with much more usage, especially when
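For context on the "hint" mentioned above: Netty lets you attach an arbitrary hint to a tracked buffer via ByteBuf.touch(Object), and the hint's toString() is printed in the access records of a leak report when the detection level is ADVANCED or PARANOID. A minimal sketch of that kind of hint (the helper name is illustrative, not the actual reactor-netty code):

```java
import io.netty.buffer.ByteBuf;

public class LeakHints {

    // Illustrative helper: record the buffer's current state as a leak-detector hint,
    // so a later "LEAK" report shows what the buffer contained at this point.
    // Hints are only recorded when leak detection is ADVANCED or PARANOID.
    static ByteBuf hintWithContents(ByteBuf buffer, String where) {
        return buffer.touch(where + " -> " + buffer.toString());
    }
}
```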
Currently we need to do daily restarts for an application in production, or it will crash with an OutOfDirectMemoryError saying no more direct memory can be allocated (it is using close to 2 GB of direct memory at that point). Is there anything we can do in our code right now to work around this issue?
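Not a workaround, but while waiting for a fix the leak reports can be made more precise. A hedged sketch of the relevant Netty knobs (the equivalent system properties are -Dio.netty.leakDetection.level=paranoid and -Dio.netty.maxDirectMemory=&lt;bytes&gt;):

```java
import io.netty.util.ResourceLeakDetector;
import io.netty.util.internal.PlatformDependent;

public class LeakDiagnostics {
    public static void main(String[] args) {
        // Track every buffer so each unreleased ByteBuf logs a "LEAK" entry with its
        // access records; this is expensive and meant for diagnosis, not production.
        ResourceLeakDetector.setLevel(ResourceLeakDetector.Level.PARANOID);

        // The ceiling that OutOfDirectMemoryError is measured against; lowering it
        // (via -Dio.netty.maxDirectMemory) makes a slow leak surface much sooner.
        System.out.println("Netty direct memory limit: "
                + PlatformDependent.maxDirectMemory() + " bytes");
    }
}
```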
As requested by @simonbasle on Gitter, some information about our environment: we ran and verified the leak on OSX. Our production environment runs on RHEL 7 and Tomcat 8.
@utwyko Is it possible for you to test against the latest available milestones:
@violetagg I just updated the example project to the versions you mentioned, and I still see the leaks. It would also be interesting if @simonbasle could share how he reproduced it directly in reactor-netty, as he mentioned above. As for our production application, it's not feasible to upgrade it to Spring Boot 2 at the moment, since we depend on libraries that are not yet Spring Boot 2 compatible and we do not want to run a pre-release version.
I can still reproduce this leak after updating the dependencies:
I just tried the sample application with a snapshot build of the
@breun Yes, we know and we are working on an extended fix.
Ok great, just thought I'd check.
@breun Hi, will it be possible for you to test the new patch? Thanks in advance, Violeta
@violetagg First of all, thank you for your efforts. I just tried your branch. I've put the entire log of running the test in the sample application here:
@utwyko From the log I see you are still using reactor-core 3.1.0.M3.
@utwyko I tested the sample application that you provided with reactor-core 3.1.0.BUILD-SNAPSHOT and the patch provided by PR#152, and I'm not able to see leaks any more.
I've run the sample application with a snapshot build of the

However, lowering the timeout from 80 to 30 ms (https://github.com/utwyko/reactornetty-leak-detection-test-project/blob/master/src/main/java/com/bol/reactornetty/leakdetection/leakdetection/JsonHttpClient.java#L26) still shows leaks, and for some reason request processing also stalls frequently, which makes the test (250 requests) run for a couple of minutes instead of a couple of seconds.
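For readers without the sample project handy, the shape being exercised is roughly: request a JSON body with reactor-netty, aggregate it, and cancel via a Reactor timeout before aggregation finishes. A hedged sketch of that shape using the current reactor-netty HttpClient API (the URL and timeout are illustrative, not the sample project's exact code):

```java
import java.time.Duration;

import reactor.core.publisher.Mono;
import reactor.netty.http.client.HttpClient;

public class TimeoutShape {
    public static void main(String[] args) {
        Mono<String> body = HttpClient.create()
                .get()
                .uri("http://localhost:8080/large.json") // illustrative endpoint
                .responseContent()                       // ByteBufFlux of body chunks
                .aggregate()                             // collects the chunks into one buffer
                .asString()
                // A timeout shorter than the response time cancels the subscription
                // while the client may still be holding aggregated ByteBufs.
                .timeout(Duration.ofMillis(30));

        body.onErrorReturn("")                           // ignore the TimeoutException for the demo
            .block();
    }
}
```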
My team has decided that if we don't have a fix for this issue by September 1, we'll start migrating our production applications to another non-blocking HTTP client, because we don't want to keep running with this memory leak and automated restarts in production. We would like to be able to continue using reactor-netty, so is there anything my colleagues and I can do to help fix this issue? We are not familiar with the reactor-netty codebase, but we are willing to invest some time into getting this fixed, so if you have some pointers for us, we could perhaps help.
Why is this issue closed? I'm still seeing
What can I provide to assist in debugging this issue? In Kotlin I have a list of Flux, and it seems that the following block never progresses past the call to subscribe(() -> {}):
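The Kotlin block itself is not shown above, but one general observation: subscribe(...) returns immediately and never blocks, so "not progressing past subscribe" usually means the surrounding code is waiting on something the subscription never delivers. If the intent is to drain a list of Flux and then continue, they have to be composed and awaited explicitly. A hedged Java sketch of that pattern (names and data are illustrative):

```java
import java.util.Arrays;
import java.util.List;

import reactor.core.publisher.Flux;

public class DrainAll {
    public static void main(String[] args) {
        // Stand-ins for the real per-resource streams.
        List<Flux<String>> downloads = Arrays.asList(
                Flux.just("a", "b"),
                Flux.just("c"));

        // merge() subscribes to every source; then() completes only when all are done,
        // and block() waits for that completion instead of relying on subscribe().
        Flux.merge(downloads)
            .then()
            .block();
    }
}
```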
The above block was a bit naive and out of context, so I'm refining it into a sample app that attempts to reproduce the leak. So far, the leak occurs intermittently, after repeatedly streaming (as GET requests) the same set of resources through the ExchangeFunctions / ReactorClientHttpConnector construct. Eventually my console output produces ...
... and at this point, if I make additional requests to the REST interface that is managing these downloads, the server hangs and the memory starts to grow uncontrollably until I run out of direct memory. I will share the sample app shortly.
https://github.com/dancingfrog/reactor-client-sample

After building and running this small server app, a single call to http://localhost:8080/downThemLarge is usually enough to elicit the dreaded:

The files are not actually downloaded to disk, just held in memory (which is intentional, but previously exhausted InputStreams and the objects that ReactorClientHttpConnector creates seem to hang around indefinitely).
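For readers who want to see the "ExchangeFunctions / ReactorClientHttpConnector construct" in isolation, here is a hedged sketch of how that wiring typically looks in recent Spring WebFlux versions (the URI and class names are illustrative; the linked sample app is the authoritative reproduction):

```java
import java.net.URI;

import org.springframework.http.HttpMethod;
import org.springframework.http.client.reactive.ReactorClientHttpConnector;
import org.springframework.web.reactive.function.client.ClientRequest;
import org.springframework.web.reactive.function.client.ExchangeFunction;
import org.springframework.web.reactive.function.client.ExchangeFunctions;
import reactor.core.publisher.Mono;

public class ExchangeFunctionWiring {
    public static void main(String[] args) {
        // Spring's reactive client machinery wired directly on top of reactor-netty.
        ExchangeFunction exchange =
                ExchangeFunctions.create(new ReactorClientHttpConnector());

        Mono<String> body = exchange
                .exchange(ClientRequest
                        .create(HttpMethod.GET, URI.create("http://localhost:8080/large-resource"))
                        .build())
                .flatMap(response -> response.bodyToMono(String.class));

        body.block();
    }
}
```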
Hi @dancingfrog, can you add the code below to the
Then tell us whether you see memory leaks.

For completeness: I added
Note that you can do the same without downcasting, i.e.:
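The exact snippet under discussion isn't visible here, but for completeness: in Spring WebFlux the usual way to release a buffer without casting to the Netty-backed implementation is DataBufferUtils.release(...), which handles the reference counting of pooled buffers. A hedged sketch, assuming that is the kind of cleanup being referred to (the method name is illustrative):

```java
import org.springframework.core.io.buffer.DataBuffer;
import org.springframework.core.io.buffer.DataBufferUtils;
import org.springframework.web.reactive.function.client.ClientResponse;
import reactor.core.publisher.Mono;

public class ReleaseWithoutDowncast {

    // Consume a response body as raw buffers and release each one explicitly,
    // without referring to NettyDataBuffer or ByteBuf anywhere.
    static Mono<Long> countBytes(ClientResponse response) {
        return response.bodyToFlux(DataBuffer.class)
                .map(buffer -> {
                    long readable = buffer.readableByteCount();
                    DataBufferUtils.release(buffer); // releases the underlying pooled buffer
                    return readable;
                })
                .reduce(0L, Long::sum);
    }
}
```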
Thanks, @violetagg. So, with that Java property set, before adding the finally block, the call stack for the leak exception looks like:
I just applied your update to commit 3f86ee8. Now that I'm calling
@dancingfrog This means the server closes the connection. If you see other issues, please report them as separate issues. If you have questions, please use our Gitter channel. I'm closing this issue and restoring the original milestone. Regards,
We are currently running into critical memory issues using reactor-netty in our production environment. Our guess is that the issue lies in the stream being terminated before the allocated ByteBufs are released when timeouts are used.
Our application uses reactor-netty to query various JSON web APIs, and those calls are wrapped in a Hystrix command to manage timeouts and add a circuit breaker. Whenever a Hystrix timeout occurs, the stream on the HTTP client is terminated. However, if reactor-netty is still aggregating the response and thus has ByteBufs allocated, those ByteBufs are (according to the Netty leak detector) not cleaned up.
We've created a sample application demonstrating the issue. The sample application uses the standard Reactor timeout instead of Hystrix, but produces the same errors. Running the tests should show the leaks.
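To make the described setup concrete, here is a hedged sketch of wrapping a reactor-netty call in a HystrixObservableCommand so that Hystrix provides the timeout and circuit breaker. It assumes hystrix-core plus the rxjava-reactive-streams bridge on the classpath and uses the current reactor-netty API; it illustrates the pattern described above, not the actual application code:

```java
import com.netflix.hystrix.HystrixCommandGroupKey;
import com.netflix.hystrix.HystrixObservableCommand;
import reactor.core.publisher.Mono;
import reactor.netty.http.client.HttpClient;
import rx.Observable;
import rx.RxReactiveStreams;

// Sketch of a Hystrix-wrapped reactor-netty call: when the Hystrix timeout fires,
// Hystrix unsubscribes, which cancels the underlying HTTP stream mid-aggregation.
public class JsonApiCommand extends HystrixObservableCommand<String> {

    private final String url;

    public JsonApiCommand(String url) {
        super(HystrixCommandGroupKey.Factory.asKey("json-api"));
        this.url = url;
    }

    @Override
    protected Observable<String> construct() {
        Mono<String> body = HttpClient.create()
                .get()
                .uri(url)
                .responseContent()
                .aggregate()   // buffers the whole response; if cancelled here, the ByteBufs must still be released
                .asString();

        // Bridge Reactor -> RxJava 1 so Hystrix can subscribe and apply its timeout.
        return RxReactiveStreams.toObservable(body);
    }
}
```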