Memory leak when enabling wiretap (and other tcpClient bootstrap options) #988

Closed

checketts opened this issue Feb 4, 2020 · 22 comments

Labels: type/bug A general bug

Comments

@checketts commented Feb 4, 2020

Expected Behavior

Adding wiretap (or other tcpClient bootstrap options) should not create excessive Micrometer meters.

Actual Behavior

When enabling HttpClient.create().wiretap(true), extra meters are created like so:

reactor_netty_connection_provider_http_active_connections{id="1461210734",remote_address="localhost:443",} 0.0
reactor_netty_connection_provider_http_active_connections{id="1140564265",remote_address="localhost:443",} 0.0

This is because the id is generated from the hashCode of the Bootstrap, and the Bootstrap is a lambda.

Steps to Reproduce

Create an HttpClient with metrics(true).wiretap(true) enabled, make several calls, and check the meter registry.
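
A minimal reproduction sketch (assuming reactor-netty's built-in metrics publish to Micrometer's global registry; the URL, call count, and error handling are only illustrative):

import io.micrometer.core.instrument.Metrics
import reactor.core.publisher.Mono
import reactor.netty.http.client.HttpClient

fun main() {
    val client = HttpClient.create()
        .metrics(true)   // publish reactor.netty.* meters to Micrometer
        .wiretap(true)   // wiretap mutates the Bootstrap, which feeds the pool key

    // Make several calls...
    repeat(5) {
        client.get()
            .uri("https://localhost:443/")
            .response()
            .onErrorResume { Mono.empty() }
            .block()
    }

    // ...then check the meter registry: with the bug, a new id tag shows up
    // per Bootstrap hashCode instead of one stable id per pool.
    Metrics.globalRegistry.meters
        .filter { it.id.name.startsWith("reactor.netty.connection.provider") }
        .forEach { println(it.id) }
}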

Possible Solution

Don't use a hashCode for the id, or leave the id tag off by default.

@checketts checketts added status/need-triage A new issue that still need to be evaluated as a whole type/bug A general bug labels Feb 4, 2020
@violetagg (Member)

@checketts What's the Reactor Netty version?

@violetagg violetagg added for/user-attention This issue needs user attention (feedback, rework, etc...) and removed status/need-triage A new issue that still need to be evaluated as a whole labels Feb 4, 2020
@checketts (Author)

reactor-netty version 0.9.4.RELEASE

@violetagg (Member)

@checketts In the logs, how many times do you see Creating a new ... client pool?
Can you show how you create the client?

@violetagg violetagg added this to the 0.8.16.RELEASE milestone Feb 5, 2020
@violetagg violetagg reopened this Feb 5, 2020
@violetagg (Member)

@checketts I found an issue when wiretap(String, LogLevel) is used - the connection pool key is not generated correctly. Can you test 0.9.5.BUILD-SNAPSHOT?

@checketts (Author)

I just tested with 0.9.5.BUILD-SNAPSHOT and wiretap(true) works, but the custom logger approach is still generating lots of metrics/ids:

HttpClient.create()
    .metrics(true)
    .tcpConfiguration { tc ->
        tc.bootstrap { b ->
            BootstrapHandlers.updateLogSupport(b, CustomLogger(CustomLogger::class.java))
        }
    }

I'm making the calls via the CloudFoundry client, so I'm not using the HttpClient directly:

DefaultConnectionContext.builder() // CF client connection config
    .apiHost(cf.apiHost)
    .port(cf.apiPort)
    .secure(!cf.skipSslValidation)
    .skipSslValidation(cf.skipSslValidation)
    .httpClient(HttpClient.create().metrics(true)
        .tcpConfiguration { tc ->
            tc.bootstrap { b ->
                BootstrapHandlers.updateLogSupport(b, CustomLogger(CustomLogger::class.java))
            }
        })

@violetagg (Member)

@checketts

This is the problem:

.tcpConfiguration { tc ->
    tc.bootstrap { b ->
        BootstrapHandlers.updateLogSupport(b, CustomLogger(CustomLogger::class.java))
    }
}

Either extract CustomLogger(CustomLogger::class.java) into a single reused instance, or implement equals/hashCode on it.

This is because we keep track of the channel configuration in the connection pool, i.e. all channels in one connection pool have exactly the same configuration. With the code above you create a new CustomLogger on every invocation, so the configuration (and the pool key) keeps changing.

Take a look at the fix that I applied for wiretap(String, LogLevel): 643c672
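
For illustration, the first option might look roughly like this (a sketch only; CustomLogger is your class, assumed to extend Netty's LoggingHandler, and the import paths are from memory):

import reactor.netty.channel.BootstrapHandlers
import reactor.netty.http.client.HttpClient

// Create the handler once and reuse the same instance for every bootstrap
// mutation, so the Bootstrap configuration (and therefore the pool key) stays stable.
val wiretapLogger = CustomLogger(CustomLogger::class.java)

val client = HttpClient.create()
    .metrics(true)
    .tcpConfiguration { tc ->
        tc.bootstrap { b -> BootstrapHandlers.updateLogSupport(b, wiretapLogger) }
    }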

@checketts (Author)

OK, adding a hashCode implementation fixed the custom logger. Thanks!

Was there any other testing or validation you would like me to do?
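
For reference, the equals/hashCode approach ends up looking roughly like this (a sketch; the real CustomLogger also overrides the log formatting, which is omitted here):

import io.netty.handler.logging.LoggingHandler

class CustomLogger(clazz: Class<*>) : LoggingHandler(clazz) {

    // Treat every CustomLogger as interchangeable so the connection pool key,
    // which is derived from the Bootstrap configuration, stays stable.
    override fun equals(other: Any?): Boolean = other is CustomLogger

    override fun hashCode(): Int = CustomLogger::class.java.hashCode()

    // The text-only log formatting lives here in the real class (omitted).
}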

@violetagg (Member)

@checketts Why do you use a custom handler and not one of the wiretap methods?

@checketts (Author)

The wiretap outputs a bunch of hex along with the text, so my custom one outputs just the text. I only saw one wiretap option (wiretap(boolean)).

@violetagg (Member)

Is your handler doing something like netty/netty#9915? (This will be in Netty 4.1.46.)

There are several wiretap variants when you use tcpConfiguration.
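
For context, a sketch of what the linked Netty change could allow once 4.1.46 is out (the LoggingHandler constructor taking a ByteBufFormat is assumed from netty/netty#9915; ByteBufFormat.SIMPLE is meant to skip the hex dump):

import io.netty.handler.logging.ByteBufFormat
import io.netty.handler.logging.LogLevel
import io.netty.handler.logging.LoggingHandler
import reactor.netty.channel.BootstrapHandlers
import reactor.netty.http.client.HttpClient

// One shared handler instance, so the pool-key problem above does not reappear.
// ByteBufFormat.SIMPLE logs the channel events without the payload hex dump.
val simpleWiretap = LoggingHandler(LogLevel.DEBUG, ByteBufFormat.SIMPLE)

val client = HttpClient.create()
    .metrics(true)
    .tcpConfiguration { tc ->
        tc.bootstrap { b -> BootstrapHandlers.updateLogSupport(b, simpleWiretap) }
    }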

@checketts (Author)

Yes! That is pretty much what I need. (Though truly my wiretap efforts are a short-term need to debug some prod issues.)

@violetagg (Member)

OK, I'll open a feature request to support this new functionality.
I'll close this issue, as I committed a fix for wiretap(String, LogLevel).

@violetagg (Member)

This is the feature request: #989

@skmedishetty

@violetagg @twoseat I upgraded to reactor-netty 0.9.5 and cloudfoundry-java-client 4.4.0. Now I don't see the memory issue, but I see the error below:
org.cloudfoundry.reactor.util.Operator$ResponseReceiver$InvalidTokenException
It runs for a couple of hours and then I see this issue.

Do you have an idea why it fails after a couple of hours?

@checketts (Author)

This is because of the latest change in cloudfoundry-java-client 4.4.0 (cloudfoundry/cf-java-client#1029)

Before, it would retry an unlimited number of times to refresh the token when it expired. Now it only retries 5 times (though this is configurable), so you'll probably want to add your own retry/onErrorResume to your code.
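
For illustration, a minimal Reactor sketch of that kind of fallback (fetchAppDetails is a hypothetical stand-in for whatever Mono the cf-java-client call returns; Mono.error just simulates the failure):

import reactor.core.publisher.Mono
import java.time.Duration

// Hypothetical stand-in for a cf-java-client call that fails, e.g. on an expired token.
fun fetchAppDetails(): Mono<String> =
    Mono.error(IllegalStateException("token expired"))

fun main() {
    fetchAppDetails()
        .retry(3)                 // re-subscribe a few extra times on failure
        .onErrorResume { e ->     // then fall back instead of failing the pipeline
            Mono.just("unavailable: ${e.message}")
        }
        .block(Duration.ofSeconds(30))
        ?.let(::println)
}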

We could move this to stackoverflow if you have more questions.

@skmedishetty commented Feb 27, 2020

@checketts I changed the timeout to 20 seconds, and even after that it times out.
Here is what is happening: after the application runs for a couple of hours, getting the token and the application details both take a long time and eventually always time out. We are at the point where even the retry is not working.

Why do things become slow when the system runs for a long time? Was this tested with the new jars?
How can we make the token never expire?

@checketts (Author)

> after application runs for couple of hours the time to get token and applications details both taking long time eventually always timeout..

I just upgraded yesterday and am also seeing this behavior. I'm seeing my heap getting filled up, so the slowness is likely due to another leak. I'll try to do a dump today and see if I can get some analysis.

I had been running fine with the SNAPSHOT version after this was fixed, so it must be something new.

> how can we make token never expire?

That is a CloudFoundry config (I haven't operated my own CF, so I don't know much beyond that). It is an OAuth token, so it should be set to expire eventually. I guess a given installation could make them last incredibly long, but that would be a poor (and incomplete) solution to this.

@checketts (Author)

> I just upgraded yesterday and am also seeing this behavior.

Never mind. When I "upgraded" I had a typo and had actually downgraded, so at this point reactor-netty 0.9.5 and cloudfoundry-java-client 4.4.0 are both working great.

@skmedishetty

@checketts Can you please share the config where you are initializing the client? Also, what Spring Boot version are you working with?

@violetagg (Member)

@checketts @skmedishetty Please move the discussion to the appropriate channels. I don't think this issue is the right place.

@checketts (Author)

@violetagg agreed. Sorry for the noise.

@skmedishetty This belongs on Stack Overflow as a question, or as an issue on cloudfoundry-java-client if you think it is a regression.

@skmedishetty commented Feb 28, 2020

@violetagg Apologies, I will ask on Stack Overflow.
