SSL Connection Leak #3038

wufei2016 · 2018-10-30T01:22:07Z

We recently upgraded to 9.4.9.v20180320, we observed that there are about 200000 unclosed jetty connections for running 12 hours and the cpu usage become higher and higher during the test.
After a few days investigation, I think this should be a regression for "HTTP Close #1832 #2338". It works without any issue if we switch to 9.4.8.v20180619

joakime · 2018-10-30T10:29:09Z

Jetty 9.4.12.v20180830 is the latest codebase and has hundreds of bug fixes over 9.4.9.
Please retest with 9.4.12.

Jetty 9.4.8 is vulnerable to multiple CVEs, we do not recommend running that version.
https://www.eclipse.org/lists/jetty-announce/msg00123.html

wufei2016 · 2018-10-31T06:34:05Z

We tested 9.4.12, but unfortunately, it still has the same issue

sbordet · 2018-10-31T09:30:37Z

@wufei2016 we have had other reports of this kind, and all of them turned out to not be a Jetty problem.
The problem with your report is that you provide no evidence of what you say, so we have nothing to work on.

If you can reproduce the issue, please attach a reproducible project for us to replicate.

Attach also a JVM thread dump and a server dump, so that we can analyze the situation.

joakime · 2018-11-06T19:44:35Z

@wufei2016 any update? do you have a reproduction case? or a server dump that we can analyze?

wufei2016 · 2018-11-08T02:55:59Z

I can provide the jetty log file if it helps. The dump file is too big. it is about 20G.

sbordet · 2018-11-08T09:56:07Z

@wufei2016 I doubt the server dump file is 20G. We're not asking for the heap dump, but the server dump which is a text file easily compressible.

joakime · 2018-11-09T13:42:05Z

@wufei2016 please follow the link that @sbordet provided on how to collect a Jetty server dump.
The Jetty server dump is not a heapdump, it's a textual representation of the state of the server at the point in time when you trigger it.
This textual representation will be quite manageable in size. (often under 50kb)

wufei2016 · 2018-11-13T09:47:52Z

Here is the Jetty server dump:

serverDump.zip

sbordet · 2018-11-13T16:59:52Z

The server dump shows 1046 connections that you have configured with a quite long idle timeout, 5 minutes.

998 of them are in TLS NEED_WRAP state, which is strange; 1031 have only processed just 1 request.

It is as if after the first request they are trying to close the connection and write the TLS close_notify message but they cannot.

I see you are using FIPS SslContextFactory. Are you sure you don't have any exception in the server logs? See also #2010 and #1547.

What exact JVM version and vendor are you using?

wufei2016 · 2018-11-14T02:28:00Z

FIPSSslContextFactory is a wrapper which overrides getKeyManagers to workaround FIPS mode issue.
In our test, we didn't enable FIPS mode. We didn't see any exception in the server log.

Here is JVM info:
openjdk version "1.8.0_172"
OpenJDK Runtime Environment (Zulu 8.30.0.1-win64) (build 1.8.0_172-b01)
OpenJDK 64-Bit Server VM (Zulu 8.30.0.1-win64) (build 25.172-b01, mixed mode)

We also tested it on oracle jdk 1.8.0_181 and oracle jdk 1.8.0_151, we got same result.

sbordet · 2018-11-14T17:45:36Z

@wufei2016 what we see from the server dump is strange, but I have no idea how that situation could possibly be generated.
So we need either a reproduction test case, or DEBUG logs with a network trace taken e.g. via Wireshark.

wufei2016 · 2018-11-15T09:11:28Z

Here is the debug log and network trace:
debugLogAndNetworkTrace.zip

sbordet · 2018-11-16T11:30:57Z

Thanks, we have been able to reproduce the issue.
We are discussing the fix and we will update the issue.

First pass at fixing the issue by forcing input shutdown on the raw EndPoint when SslConnection reads the close_notify TLS message. Signed-off-by: Simone Bordet <simone.bordet@gmail.com>

Try to read -1 normally on TLS close before forcing shutdown input When HTTP/1 is completed, shutdown output if the input is shutdown Signed-off-by: Greg Wilkins <gregw@webtide.com>

gregw · 2018-11-17T08:25:22Z

There is definitely something strange going on. The current fixes in the branch do make it better, but it is still not perfect. Thus I don't think we've fully understood the hole in selection/closing after such request handling. More analysis needed but confident we've got a good reproduction now so we'll have a good fix next week.

Fixed SSL spin caused when fill had NEED_WRAP, but a flush/wrap produced 0 bytes and stayed in NEED_WRAP Removed check of isInputShutdown prior to filling that allowed EOF to overtake data already read. Reverted prior fix for SSL leak Alternate fix for leak by shutting down output in HttpConnection if filled -1 and the HttpChannelState was no longer processing current request. Signed-off-by: Greg Wilkins <gregw@webtide.com>

Extra test for chunked input Cleaned up after some review Signed-off-by: Greg Wilkins <gregw@webtide.com>

turn off debug Signed-off-by: Greg Wilkins <gregw@webtide.com>

Change to HttpParser made content and contentComplete events non separable. Updated HttpClient to cope. Signed-off-by: Greg Wilkins <gregw@webtide.com>

Ensure content state reset before resuming Signed-off-by: Greg Wilkins <gregw@webtide.com>

gregw · 2018-11-20T08:33:39Z

The problem was that if there was unconsumed input AND an EOF after a persistent request, then the consumeAll method would consume both the content and the EOF before recycling the request. The EOF was then seen in END state rather than in start state, so an earlyEOF event was not generated. This was compounded by end-chunk and TLS close handshakes looking like unconsumed input if the servlet had not read the -1.
The fix is to better handle a -1 read between requests.

Issue #3038 - SSL connection leak. Fixed SSL spin caused when fill had NEED_WRAP, but a flush/wrap produced 0 bytes and stayed in NEED_WRAP Removed check of isInputShutdown prior to filling that allowed EOF to overtake data already read. Fix for leak by shutting down output in HttpConnection if filled -1 and the HttpChannelState was no longer processing current request. Signed-off-by: Simone Bordet <simone.bordet@gmail.com> Signed-off-by: Greg Wilkins <gregw@webtide.com>

Don't call handleContentMessage after content call if the content call returns true. This is a slight bending of the parser contract to work around the current client interpretation that a true return will prevent other events from being delivered. Signed-off-by: Greg Wilkins <gregw@webtide.com>

gregw · 2018-11-21T16:30:45Z

The previous fix broke the proxy, so 041e8fd is an alternative fix that makes the MessageComplete event separable from the last Content event.

sunng87 · 2019-03-22T03:41:57Z

I can reproduce this issue on 9.4.15 with TLS backed by conscrypt. Even if idleTimeout was set to a reasonable value, SslConnection were still retained in memory forever. Also with Conscrypt, we got off-heap memory leak because of this connection leak too.

To reproduce, start an https server and create some load with tools like wrk. You will see the process memory footprint goes straightly with connections you created. By dumping heap, there will as many SslConnection instance as the connections you created.

janbartel added the More Info Required label Nov 1, 2018

gregw removed the More Info Required label Nov 16, 2018

sbordet added a commit that referenced this issue Nov 16, 2018

Issue #3038 - SSL connection leak.

a16ade2

First pass at fixing the issue by forcing input shutdown on the raw EndPoint when SslConnection reads the close_notify TLS message. Signed-off-by: Simone Bordet <simone.bordet@gmail.com>

gregw added a commit that referenced this issue Nov 16, 2018

Issue #3038 SSL Connection Leak

804fb66

Try to read -1 normally on TLS close before forcing shutdown input When HTTP/1 is completed, shutdown output if the input is shutdown Signed-off-by: Greg Wilkins <gregw@webtide.com>

joakime added Bug For general bugs on Jetty side Performance labels Nov 16, 2018

joakime assigned gregw and sbordet Nov 16, 2018

gregw mentioned this issue Nov 18, 2018

Jetty 9.4.x 3038 ssl connection leak #3121

Merged

gregw added a commit that referenced this issue Nov 19, 2018

Issue #3038 SSL Connection Leak

b20d934

Extra test for chunked input Cleaned up after some review Signed-off-by: Greg Wilkins <gregw@webtide.com>

gregw added a commit that referenced this issue Nov 19, 2018

Issue #3038 SSL Connection Leak

e72b2e8

turn off debug Signed-off-by: Greg Wilkins <gregw@webtide.com>

gregw added a commit that referenced this issue Nov 19, 2018

Issue #3038 SSL Connection Leak

5b99a39

Change to HttpParser made content and contentComplete events non separable. Updated HttpClient to cope. Signed-off-by: Greg Wilkins <gregw@webtide.com>

gregw added a commit that referenced this issue Nov 19, 2018

Issue #3038 SSL Connection Leak

3cde24c

Ensure content state reset before resuming Signed-off-by: Greg Wilkins <gregw@webtide.com>

gregw closed this as completed in #3121 Nov 20, 2018

gregw reopened this Nov 21, 2018

gregw closed this as completed Nov 21, 2018

gregw mentioned this issue Nov 21, 2018

HttpParser review #3135

Open

gregw mentioned this issue Dec 17, 2018

Connection getting closed by jetty before passing the whole response #2867

Closed

joakime changed the title ~~connection leak on 9.4.9.v20180320~~ SSL Connection Leak Feb 15, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SSL Connection Leak #3038

SSL Connection Leak #3038

wufei2016 commented Oct 30, 2018

joakime commented Oct 30, 2018

wufei2016 commented Oct 31, 2018

sbordet commented Oct 31, 2018

joakime commented Nov 6, 2018

wufei2016 commented Nov 8, 2018

sbordet commented Nov 8, 2018

joakime commented Nov 9, 2018 •

edited

wufei2016 commented Nov 13, 2018

sbordet commented Nov 13, 2018

wufei2016 commented Nov 14, 2018

sbordet commented Nov 14, 2018

wufei2016 commented Nov 15, 2018

sbordet commented Nov 16, 2018

gregw commented Nov 17, 2018

gregw commented Nov 20, 2018

gregw commented Nov 21, 2018

sunng87 commented Mar 22, 2019

SSL Connection Leak #3038

SSL Connection Leak #3038

Comments

wufei2016 commented Oct 30, 2018

joakime commented Oct 30, 2018

wufei2016 commented Oct 31, 2018

sbordet commented Oct 31, 2018

joakime commented Nov 6, 2018

wufei2016 commented Nov 8, 2018

sbordet commented Nov 8, 2018

joakime commented Nov 9, 2018 • edited

wufei2016 commented Nov 13, 2018

sbordet commented Nov 13, 2018

wufei2016 commented Nov 14, 2018

sbordet commented Nov 14, 2018

wufei2016 commented Nov 15, 2018

sbordet commented Nov 16, 2018

gregw commented Nov 17, 2018

gregw commented Nov 20, 2018

gregw commented Nov 21, 2018

sunng87 commented Mar 22, 2019

joakime commented Nov 9, 2018 •

edited