Memory leak when using Jetty ALPN SSL provider #3080

Closed
JackOfMostTrades opened this issue Jun 8, 2017 · 7 comments · Fixed by netty/netty#7746

@JackOfMostTrades
Contributor

Please answer these questions before submitting your issue.

What version of gRPC are you using?

1.4.0

What JVM are you using (java -version)?

openjdk version "1.8.0_131"
OpenJDK Runtime Environment (build 1.8.0_131-8u131-b11-1-b11)
OpenJDK 64-Bit Server VM (build 25.131-b11, mixed mode)

What did you do?

We have observed a memory leak when using the Jetty ALPN SSL provider. It occurs when a name resolver returns multiple addresses and one of them fails to connect (in practice this was caused by faulty firewall rules, but for testing it can be reproduced simply by using a bad port number). I believe the managed channel keeps trying to open a subchannel, but the callbacks in the ALPN.objects map aren't getting cleared.

I've created a minimal reproducer for this here: https://github.com/JackOfMostTrades/memory-leak-reproducer

When left running overnight, the size of the map grew to over 2300 objects (since there's only ever one actual connection in this test, that's pretty clearly an issue).

This may be a bug in the underlying Netty channel rather than in gRPC's managed channel, i.e. Netty may not be cleaning up its ALPN callback when this type of error occurs, but I didn't dig deep enough into the issue to tell.

@jhspaybar
Contributor

@ejona86 Wanted to make sure you saw this one. Happy to help if you can point us in the right direction as well.

@carl-mastrangelo
Contributor

@JackOfMostTrades Are you sure that jar works with 131? Looking at https://github.com/jetty-project/jetty-alpn-agent/blob/master/src/main/java/org/mortbay/jetty/alpn/agent/Premain.java#L35 it looks like it only works up to 121

Any reason you aren't using the alpn-agent instead of alpn-boot?

@JackOfMostTrades
Contributor Author

It seems to work fine for me with 131, but regardless I get the same behavior with Oracle JDK 121:
java version "1.8.0_121"
Java(TM) SE Runtime Environment (build 1.8.0_121-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.121-b13, mixed mode)

No, there's no particular reason for using alpn-boot as opposed to the agent. I just tried substituting the agent into the reproducer and I get the same behavior.

@ejona86
Member

ejona86 commented Jun 8, 2017

@carl-mastrangelo, it works with 131. I think the 121 is the first compatible version, not the last.

> 1.4.0

Drat. 😛

It looks like Netty adds the reference in the SSLEngine constructor. Maybe there's some case where the SslHandler doesn't close the engine? gRPC's code looks pretty minimal, but the SslHandler also seems pretty solid.
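To make the suspected failure mode concrete, here is a toy Java model of it. It is only a sketch, not Netty's or Jetty's code: `AlpnLeakModel`, `Engine`, `REGISTRY` and `connectAttempt` are hypothetical names. `REGISTRY` stands in for the static `ALPN.objects` map, each engine registers itself in its constructor, and its entry is removed only when the engine is closed. The model assumes, as the eventual Netty fix later in this thread suggests, that the engine was closed only on the path where the channel became active.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy model of the leak pattern; all names are hypothetical.
final class AlpnLeakModel {
    // Stand-in for the process-wide ALPN.objects map: engine -> callback.
    static final Map<Object, Runnable> REGISTRY = new ConcurrentHashMap<>();

    static final class Engine {
        Engine() {
            // Registration happens eagerly, before any TCP connection exists.
            REGISTRY.put(this, () -> { });
        }

        void close() {
            // The only place the registry entry is removed.
            REGISTRY.remove(this);
        }
    }

    // Simulates one connection attempt with the assumed pre-fix behavior:
    // the engine is closed only if the channel became active.
    static void connectAttempt(boolean tcpConnectSucceeds) {
        Engine engine = new Engine();
        if (tcpConnectSucceeds) {
            // ... use the connection, then shut it down normally ...
            engine.close();
        }
        // On a failed connect nothing closes the engine, so its entry stays.
    }

    public static void main(String[] args) {
        // One healthy address, plus repeated retries against a bad port.
        connectAttempt(true);
        for (int i = 0; i < 1000; i++) {
            connectAttempt(false);
        }
        System.out.println("registry size: " + REGISTRY.size()); // prints "registry size: 1000"
    }
}
```

The retry count of 1000 is arbitrary; the point is that the registry grows by one entry per failed attempt while only one connection ever succeeds, which matches the shape of the overnight observation in the report.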

We can also maybe try older gRPC versions and see if it is a semi-recent regression.

@ejona86 ejona86 added this to the Next milestone Jun 8, 2017
@JackOfMostTrades
Contributor Author

I just went back through older versions, and the behavior first appears in 0.15.0 (0.14.1 does not seem to have the issue). FWIW that's also when the listener API for the name resolver changed from List<ResolvedServerInfo> to List<List<ResolvedServerInfo>>.

@zhangkun83 zhangkun83 added bug and removed usability labels Sep 19, 2017
carl-mastrangelo added a commit to carl-mastrangelo/netty that referenced this issue Feb 22, 2018
Motivation:
When using the JdkSslEngine, the ALPN class is used to keep a reference
to the engine. In the event that the TCP connection fails, the
SSLEngine is not removed from the map, creating a memory leak.

Modification:
Always close the SSLEngine regardless of whether the channel became
active. Also, record that the SSLEngine was closed in all places.

Result:
Fixes: grpc/grpc-java#3080
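The change described in that commit message can be sketched with the same kind of toy model. The names are hypothetical again and this is not the actual Netty patch: the engine is closed on a cleanup path that runs whether or not the connection succeeded, and a `closed` flag makes repeated `close()` calls harmless, loosely mirroring the idea of recording the close everywhere it happens.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;

// Toy model of the fix; all names are hypothetical, not Netty's real classes.
final class AlpnFixModel {
    // Stand-in for the process-wide ALPN.objects map.
    static final Map<Object, Runnable> REGISTRY = new ConcurrentHashMap<>();

    static final class Engine {
        // Records that cleanup already ran, so close() is idempotent.
        private final AtomicBoolean closed = new AtomicBoolean();

        Engine() {
            REGISTRY.put(this, () -> { });
        }

        void close() {
            if (!closed.compareAndSet(false, true)) {
                return; // already closed by another cleanup path
            }
            REGISTRY.remove(this);
        }

        boolean isClosed() {
            return closed.get();
        }
    }

    // Simulates one attempt; returns the engine so callers can inspect it.
    static Engine connectAttempt(boolean tcpConnectSucceeds) {
        Engine engine = new Engine();
        try {
            if (tcpConnectSucceeds) {
                // ... use the connection; normal shutdown closes the engine ...
                engine.close();
            }
        } finally {
            // Runs whether or not the channel ever became active.
            engine.close();
        }
        return engine;
    }

    public static void main(String[] args) {
        connectAttempt(true);
        for (int i = 0; i < 1000; i++) {
            connectAttempt(false);
        }
        System.out.println("registry size: " + REGISTRY.size()); // prints "registry size: 0"
    }
}
```

Making `close()` idempotent is what allows it to be called unconditionally from a cleanup path without worrying about whether the normal shutdown already ran.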
@carl-mastrangelo
Contributor

@JackOfMostTrades I sent a PR to fix netty for this, sorry for sitting on it for so long. I really appreciate the reproducer!

@JackOfMostTrades
Contributor Author

Awesome, thanks for digging in and figuring out where the issue was!

normanmaurer pushed a commit to netty/netty that referenced this issue Mar 4, 2018
@ejona86 ejona86 modified the milestones: Next, 1.13 Nov 8, 2018
@lock lock bot locked as resolved and limited conversation to collaborators Feb 6, 2019
6 participants