Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SSL Connection Leak #3038

Closed
wufei2016 opened this issue Oct 30, 2018 · 17 comments
Closed

SSL Connection Leak #3038

wufei2016 opened this issue Oct 30, 2018 · 17 comments
Assignees
Labels
Bug For general bugs on Jetty side Performance

Comments

@wufei2016
Copy link

We recently upgraded to 9.4.9.v20180320, we observed that there are about 200000 unclosed jetty connections for running 12 hours and the cpu usage become higher and higher during the test.
After a few days investigation, I think this should be a regression for "HTTP Close #1832 #2338". It works without any issue if we switch to 9.4.8.v20180619

@joakime
Copy link
Contributor

joakime commented Oct 30, 2018

Jetty 9.4.12.v20180830 is the latest codebase and has hundreds of bug fixes over 9.4.9.
Please retest with 9.4.12.

Jetty 9.4.8 is vulnerable to multiple CVEs, we do not recommend running that version.
https://www.eclipse.org/lists/jetty-announce/msg00123.html

@wufei2016
Copy link
Author

We tested 9.4.12, but unfortunately, it still has the same issue

@sbordet
Copy link
Contributor

sbordet commented Oct 31, 2018

@wufei2016 we have had other reports of this kind, and all of them turned out to not be a Jetty problem.
The problem with your report is that you provide no evidence of what you say, so we have nothing to work on.

If you can reproduce the issue, please attach a reproducible project for us to replicate.

Attach also a JVM thread dump and a server dump, so that we can analyze the situation.

@joakime
Copy link
Contributor

joakime commented Nov 6, 2018

@wufei2016 any update? do you have a reproduction case? or a server dump that we can analyze?

@wufei2016
Copy link
Author

I can provide the jetty log file if it helps. The dump file is too big. it is about 20G.

@sbordet
Copy link
Contributor

sbordet commented Nov 8, 2018

@wufei2016 I doubt the server dump file is 20G. We're not asking for the heap dump, but the server dump which is a text file easily compressible.

@joakime
Copy link
Contributor

joakime commented Nov 9, 2018

@wufei2016 please follow the link that @sbordet provided on how to collect a Jetty server dump.
The Jetty server dump is not a heapdump, it's a textual representation of the state of the server at the point in time when you trigger it.
This textual representation will be quite manageable in size. (often under 50kb)

@wufei2016
Copy link
Author

Here is the Jetty server dump:

serverDump.zip

@sbordet
Copy link
Contributor

sbordet commented Nov 13, 2018

The server dump shows 1046 connections that you have configured with a quite long idle timeout, 5 minutes.

998 of them are in TLS NEED_WRAP state, which is strange; 1031 have only processed just 1 request.

It is as if after the first request they are trying to close the connection and write the TLS close_notify message but they cannot.

I see you are using FIPS SslContextFactory. Are you sure you don't have any exception in the server logs? See also #2010 and #1547.

What exact JVM version and vendor are you using?

@wufei2016
Copy link
Author

FIPSSslContextFactory is a wrapper which overrides getKeyManagers to workaround FIPS mode issue.
In our test, we didn't enable FIPS mode. We didn't see any exception in the server log.

Here is JVM info:
openjdk version "1.8.0_172"
OpenJDK Runtime Environment (Zulu 8.30.0.1-win64) (build 1.8.0_172-b01)
OpenJDK 64-Bit Server VM (Zulu 8.30.0.1-win64) (build 25.172-b01, mixed mode)

We also tested it on oracle jdk 1.8.0_181 and oracle jdk 1.8.0_151, we got same result.

@sbordet
Copy link
Contributor

sbordet commented Nov 14, 2018

@wufei2016 what we see from the server dump is strange, but I have no idea how that situation could possibly be generated.
So we need either a reproduction test case, or DEBUG logs with a network trace taken e.g. via Wireshark.

@wufei2016
Copy link
Author

Here is the debug log and network trace:
debugLogAndNetworkTrace.zip

@sbordet
Copy link
Contributor

sbordet commented Nov 16, 2018

Thanks, we have been able to reproduce the issue.
We are discussing the fix and we will update the issue.

sbordet added a commit that referenced this issue Nov 16, 2018
First pass at fixing the issue by forcing input shutdown on the raw
EndPoint when SslConnection reads the close_notify TLS message.

Signed-off-by: Simone Bordet <simone.bordet@gmail.com>
gregw added a commit that referenced this issue Nov 16, 2018
Try to read -1 normally on TLS close before forcing shutdown input
When HTTP/1 is completed, shutdown output if the input is shutdown

Signed-off-by: Greg Wilkins <gregw@webtide.com>
@joakime joakime added Bug For general bugs on Jetty side Performance labels Nov 16, 2018
@gregw
Copy link
Contributor

gregw commented Nov 17, 2018

There is definitely something strange going on. The current fixes in the branch do make it better, but it is still not perfect. Thus I don't think we've fully understood the hole in selection/closing after such request handling. More analysis needed but confident we've got a good reproduction now so we'll have a good fix next week.

gregw added a commit that referenced this issue Nov 18, 2018
Fixed SSL spin caused when fill had NEED_WRAP, but a flush/wrap
produced 0 bytes and stayed in NEED_WRAP

Removed check of isInputShutdown prior to filling that allowed EOF to
overtake data already read.

Reverted prior fix for SSL leak

Alternate fix for leak by shutting down output in HttpConnection if
filled -1 and the HttpChannelState was no longer processing
current request.

Signed-off-by: Greg Wilkins <gregw@webtide.com>
gregw added a commit that referenced this issue Nov 19, 2018
Extra test for chunked input
Cleaned up after some review

Signed-off-by: Greg Wilkins <gregw@webtide.com>
gregw added a commit that referenced this issue Nov 19, 2018
turn off debug

Signed-off-by: Greg Wilkins <gregw@webtide.com>
gregw added a commit that referenced this issue Nov 19, 2018
Change to HttpParser made content and contentComplete events non
separable. Updated HttpClient to cope.

Signed-off-by: Greg Wilkins <gregw@webtide.com>
gregw added a commit that referenced this issue Nov 19, 2018
Ensure content state reset before resuming

Signed-off-by: Greg Wilkins <gregw@webtide.com>
@gregw
Copy link
Contributor

gregw commented Nov 20, 2018

The problem was that if there was unconsumed input AND an EOF after a persistent request, then the consumeAll method would consume both the content and the EOF before recycling the request. The EOF was then seen in END state rather than in start state, so an earlyEOF event was not generated. This was compounded by end-chunk and TLS close handshakes looking like unconsumed input if the servlet had not read the -1.
The fix is to better handle a -1 read between requests.

gregw added a commit that referenced this issue Nov 20, 2018
Issue #3038 - SSL connection leak.

Fixed SSL spin caused when fill had NEED_WRAP, but a flush/wrap
produced 0 bytes and stayed in NEED_WRAP

Removed check of isInputShutdown prior to filling that allowed EOF to
overtake data already read.

Fix for leak by shutting down output in HttpConnection if
filled -1 and the HttpChannelState was no longer processing
current request.

Signed-off-by: Simone Bordet <simone.bordet@gmail.com>
Signed-off-by: Greg Wilkins <gregw@webtide.com>
@gregw gregw reopened this Nov 21, 2018
gregw added a commit that referenced this issue Nov 21, 2018
Don't call  handleContentMessage after content call if the content call
returns true.

This is a slight bending of the parser contract to work around the current
client interpretation that a true return will prevent other events from being
delivered.

Signed-off-by: Greg Wilkins <gregw@webtide.com>
@gregw
Copy link
Contributor

gregw commented Nov 21, 2018

The previous fix broke the proxy, so 041e8fd is an alternative fix that makes the MessageComplete event separable from the last Content event.

@gregw gregw closed this as completed Nov 21, 2018
@joakime joakime changed the title connection leak on 9.4.9.v20180320 SSL Connection Leak Feb 15, 2019
@sunng87
Copy link
Contributor

sunng87 commented Mar 22, 2019

I can reproduce this issue on 9.4.15 with TLS backed by conscrypt. Even if idleTimeout was set to a reasonable value, SslConnection were still retained in memory forever. Also with Conscrypt, we got off-heap memory leak because of this connection leak too.

To reproduce, start an https server and create some load with tools like wrk. You will see the process memory footprint goes straightly with connections you created. By dumping heap, there will as many SslConnection instance as the connections you created.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug For general bugs on Jetty side Performance
Projects
None yet
Development

No branches or pull requests

6 participants