LEAK: ByteBuf.release() was not called before it's garbage-collected #422
That's pretty interesting. Is this a particular service, using Reactor directly or via something else (Spring?)
🤦♂️ oops, forgot to mention that. It's a service using Spring Boot 2.0.4 with WebFlux.
@madgnome did you run with
@madgnome Are you using
@violetagg haven't tried yet. @dave-fl yes, we're using
We began doing a more end-to-end style test and it failed after 11 hours. So although we got a longer test run (by removing elastic), things still failed. Here are some of the recent leak logs. This is on the latest snapshot and Netty 4.1.29.
@dave-fl I cannot see Reactor Netty classes in these stack traces.
I will try, but it might blow up, as we are running this with production-style load, and if we do that we might not be able to simulate the timeouts being observed. Note that the Cloud Gateway code is using Mono.timeout on the HTTP client; not sure if something odd could be going on here.
@dave-fl Then instead of all Reactor Netty logging, enable just this location
@dave-fl I need the stack traces.
In an effort to simplify this, I'm reproducing with a single user. I can trigger it pretty easily; there is no particular unique action that causes this to happen consistently. It's just random, e.g. if I do the same action 100 times, it will happen a few times. When it does take place there are no errors, i.e. all requests are coming back correctly from the source systems. This may or may not be relevant, but the logs are littered with the following (even though no errors are taking place anywhere, i.e. all requests are being properly sent and returned by the source and destination systems).
Hopefully the stack below is useful; if you want to add more logging I can test with that as well.
Sorry, last one. I really hope these are useful. This is basically the end-to-end of a single click to a leak. I've redacted things where I had to.
I just ran
@dave-fl yep, I think we have an issue. I'm on it.
Indirectly related, but it might be useful to add some sort of deterministic leak detection to the test cases as an enhancement. That way, assertions can be used to check buffer counts rather than counting on the garbage collector having run. Perhaps well-known or even better solutions already exist, but it would definitely make visibility into the buffer count easier, at least for me.
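One way to make the leak check deterministic, sketched here with plain JDK types rather than any real Netty API (`CountingAllocator` is a hypothetical name), is to route allocations through a counter and assert that it returns to zero at the end of each test:

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stdlib-only sketch of deterministic leak accounting: every
// allocation bumps a counter and every release decrements it, so a test can
// assert the count is back to zero instead of waiting for a GC-time report.
// CountingAllocator is illustrative; it is not part of Netty.
final class CountingAllocator {
    private final AtomicLong outstanding = new AtomicLong();

    byte[] allocate(int size) {
        outstanding.incrementAndGet();   // track the new buffer
        return new byte[size];
    }

    void release(byte[] buf) {
        outstanding.decrementAndGet();   // buffer returned to the allocator
    }

    long outstanding() {
        return outstanding.get();
    }
}
```

With Netty itself, a similar effect could be approximated by asserting on the pooled allocator's `metric()` view, though the exact counters available vary by version.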
@dave-fl Do you specify
@violetagg Yes, we set all our timeouts.
@violetagg I am not sure that timeouts are the full issue here; e.g. if any sort of IOException happens mid-flight, the same end result could occur. We are seeing this in some of our calls. I think it would be worth creating a test case that simulates an abrupt termination of a connection, although whatever has to be done to fix the timeout issue might end up fixing this as well.
@dave-fl Do you think you can try that?
@violetagg I can try that. I see that you have moved to Netty's native timeouts. Can you please comment on the possibility of a leak when connections are aborted? I am mentioning this because I still saw leaks in paranoid mode even when no timeouts were taking place.
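For readers following along: "paranoid mode" refers to Netty's built-in leak detector, which is enabled with a JVM system property. The property name below is the Netty 4.1 form; older releases used `-Dio.netty.leakDetectionLevel` instead (`app.jar` is a placeholder for your service):

```shell
# Run the service with Netty's most aggressive leak detection.
# 'paranoid' samples every allocation, so expect a throughput hit.
java -Dio.netty.leakDetection.level=paranoid -jar app.jar
```

At this level, every leaked `ByteBuf` should produce a `LEAK:` report, and the recorded access points help locate the missing `release()` call.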
@dave-fl With the Reactor Netty 0.7.x implementation, the timeouts (provided by Flux/Mono) cannot be handled correctly, and because of that the Spring Cloud Gateway implementation will use Netty's ReadTimeoutHandler.
Are you able to provide a reproducible scenario? Otherwise I'll try to simulate something on my side and do a code review for possible memory leaks in such situations.
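The failure mode being discussed, where an outer timeout abandons a request whose buffer has already been read off the wire, can be modeled with a small stdlib-only sketch. The `allocate`/`release` pair below is a hypothetical stand-in for `ByteBuf`, not Reactor Netty's API:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;

// Stdlib-only model of the timeout failure mode: a buffer is allocated for a
// consumer that a timeout then cancels. Unless the cancellation path also
// releases the buffer, the allocation is orphaned, which is what the GC-time
// LEAK report later complains about. Illustrative only.
class TimeoutLeakSketch {
    static final AtomicInteger outstanding = new AtomicInteger();

    static String allocate() {              // stand-in for ByteBuf allocation
        outstanding.incrementAndGet();
        return "buffer";
    }

    static void release(String buf) {       // stand-in for ByteBuf.release()
        outstanding.decrementAndGet();
    }

    /** Simulate one request whose consumer is cancelled by a timeout. */
    static void request(boolean releaseOnCancel) {
        String buf = allocate();            // data already read off the wire
        CompletableFuture<String> consumer = new CompletableFuture<>();
        consumer.cancel(true);              // outer Mono.timeout-style abort
        if (consumer.isCancelled() && releaseOnCancel) {
            release(buf);                   // cancellation hook frees it
        }
    }
}
```

Calling `request(false)` leaves the counter one higher (the leaked buffer); `request(true)` models a cancellation path that cleans up after itself, which is the behavior a correct timeout handler needs to provide one way or another.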
@violetagg I wish I could; unfortunately I don't have one, as this is happening in an end-to-end test. I believe there are some proxies out there that can simulate various network conditions. It might be worth running a gateway-style test where the client and the connection from the gateway to the back-end server route through one of these proxies to simulate the occasional connection/network error. I think I saw that 0.8.x is now at RC1; are these cases already handled, or on the TODO list before GA? @rstoyanchev Thanks for mentioning those. We've already cherry-picked SPR-17025.
They are handled.
+1 Having the same issue popping up at random in a production deployment. We have a Spring Boot 2 service using WebFlux and the reactive WebClient. Can we expect a fix any time soon, guys?
@antoniy So far we have identified an issue when
Running the demo test project linked previously a few times, built against the suggested snapshot versions, I haven't seen any buffer leak reports so far, but I do still sometimes see an IllegalReferenceCountException:
@violetagg Any comments on these latest logs and new errors?
Will upgrade to Spring Boot 2.1.0 today and see if we still see the errors.
@madgnome Just a quick note: if you're using any of request body predicate, request body transformer, or response body transformer, please turn those off for now before testing. There are expected fixes for those in the next Spring Cloud Gateway release.
We're not using Spring Cloud Gateway. Unfortunately, the issue seems worse in Spring Boot 2.1. We use
@madgnome, do you have anything reproducible that we can use to investigate? If not, can you describe the code in high-level terms (e.g. Spring MVC / Boot app, using the WebClient, etc.) and provide some relevant code snippets? Note also that SPR-17473 was just addressed, but without any sample code I can't be sure whether you use onStatus hooks or not.
Thanks @rstoyanchev, we do use
One issue identified: sending data with request method GET.
Making sure we consume the responseBody on
I will see if I can try the latest snapshot to get the #512 fix.
Deployed the latest snapshot and still seeing the errors unfortunately =( I've run the service with DEBUG logging for
And here's another one with all
We are testing again. The latest build continues to leak. Spring 2.1.0 Release, <spring-cloud.version>Greenwich.M3</spring-cloud.version>
For the latest version, it seems like using this option (on Spring Cloud Gateway):
is possibly what is causing a lot of the memory leak errors. What I wonder, though, is whether, if that is corrected on the Spring Cloud Gateway side, the leaks will still occur when legitimate timeouts are taking place.
@violetagg Good news! I was able to reproduce the issue locally, and I think it was fixed by #554. I can confirm no more leaks in production when using v0.8.4. For people facing a related leak issue and trying to reproduce locally, here are the things that helped us reproduce it:
Same problem using Spring Boot 2.1.2.RELEASE, which uses reactor-netty 0.8.4.RELEASE.
I also see the same issue in Spring Boot 2.1.2.RELEASE.
Same issue in Spring Boot 2.1.2.RELEASE. Basic code is feeding bytes into a parser:

```kotlin
return body
    .reduce(ParserContainer()) { c, buf ->
        try {
            // parse each fragment
            c.next(buf.asByteBuffer())
        } finally {
            // memory leak detector says it needs to be released
            DataBufferUtils.release(buf)
        }
    }
    // extract the response
    .map { it.response() ?: EmptyResponse() }
```

Exception:
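The IllegalReferenceCountException reported earlier in this thread generally means a buffer was released more times than it was retained. Here is a minimal stdlib-only model of that contract (`RefCounted` below is a hypothetical stand-in, not `io.netty.buffer.ByteBuf`):

```java
// Hypothetical model of ByteBuf's reference-counting contract. A buffer
// starts with one reference; retain() adds one, release() drops one, and
// touching a buffer whose count has hit zero is an error, analogous to
// Netty's IllegalReferenceCountException.
final class RefCounted {
    private int refCnt = 1;   // the allocating caller holds one reference

    synchronized void retain() {
        if (refCnt == 0) throw new IllegalStateException("refCnt: 0");
        refCnt++;
    }

    /** Returns true when the last reference was dropped (deallocation). */
    synchronized boolean release() {
        if (refCnt == 0) throw new IllegalStateException("refCnt: 0");
        return --refCnt == 0;
    }

    synchronized int refCnt() { return refCnt; }
}
```

In code like the Kotlin snippet above, this kind of exception could occur if something downstream also releases `buf` after `DataBufferUtils.release(buf)` has already dropped the last reference.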
I'd try with Boot 2.1.3.RELEASE, as the Reactor team has worked hard on these issues.
@spencergibb Will give it a try.
Unfortunately I have a similar error in Spring Boot 2.1.3.RELEASE, occurring in production. It may be related to the other side closing the connection, as we saw this error immediately beforehand: We're not doing anything fancy, very basic high-level code.
and that code being called in this way:
Error:
@Bas83 Try with the latest:
@violetagg Have there been additional fixes in those versions, or is this just to verify whether the issue still exists in the current version? I would love to reproduce it on those versions, but since the error is in production it's going to take some time.
@violetagg I have the same issue in production using the elastic Scheduler (unbounded) and the following versions of the frameworks:
Any help or explanation is appreciated. Here is the stack trace:
@saad14092 Please open a new issue. It would be really helpful if you could provide more info about the use case and/or a test example.
Expected behavior
No leak
Actual behavior
LEAK: ByteBuf.release() was not called before it's garbage-collected
Steps to reproduce
Unfortunately I have no idea how to reproduce it as this error happens randomly in our service in production.
Reactor Netty version
0.7.8
JVM version (e.g. java -version)
Java 8u181