
Incomplete uploads with HTTP/2 client & server #4948

Closed
mklinger opened this issue Jun 8, 2020 · 22 comments
Labels
Stale For auto-closed stale issues and pull requests

Comments

@mklinger

mklinger commented Jun 8, 2020

Jetty version
9.4.29.v20200521

Java version
OpenJDK 11.0.7

OS type/version
Ubuntu 18.04.4

Description

After upgrading to 9.4.29 in a project that uses the Jetty HTTP/2 client and server, I experienced problems with request bodies of around 30-40 KB. It seems that the last chunk (the last HTTP/2 frame?) is sometimes not transferred to the server.

In my case, this results in invalid JSON received on the server side. The size of the body read from the servlet input stream is smaller than the value given in the Content-Length request header. I made sure that my code around the client invocation produces correct JSON with the correct size. I'm using a byte-array for my JSON payload, so I can rule out I/O problems.
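For reference, the request is built roughly like this (a simplified sketch, not the actual application code; the URI, payload, and trust-all TLS setup are placeholders):

```java
import java.nio.charset.StandardCharsets;

import org.eclipse.jetty.client.HttpClient;
import org.eclipse.jetty.client.api.ContentResponse;
import org.eclipse.jetty.client.util.BytesContentProvider;
import org.eclipse.jetty.http.HttpMethod;
import org.eclipse.jetty.http2.client.HTTP2Client;
import org.eclipse.jetty.http2.client.http.HttpClientTransportOverHTTP2;
import org.eclipse.jetty.util.ssl.SslContextFactory;

public class UploadSketch
{
    public static void main(String[] args) throws Exception
    {
        // HTTP/2 transport over TLS, as in the failing setup (trust-all only for this sketch).
        HttpClient client = new HttpClient(
            new HttpClientTransportOverHTTP2(new HTTP2Client()),
            new SslContextFactory.Client(true));
        client.start();

        byte[] json = "{\"example\":true}".getBytes(StandardCharsets.UTF_8); // placeholder payload

        // Content-Length is derived from the byte array, so declared and actual sizes match.
        ContentResponse response = client.newRequest("https://localhost:8443/upload") // placeholder URI
            .method(HttpMethod.PUT)
            .content(new BytesContentProvider("application/json", json))
            .send();

        System.out.println(response.getStatus());
        client.stop();
    }
}
```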

I had a hard time nailing down the source of the problem, as I can only reproduce it in a fairly big integration test of my application. This test runs 3 application processes that all communicate with each other concurrently and asynchronously. Within the test, a few hundred requests are made and usually one of them fails.

I tried to create a smaller test case but could not reproduce the problem there. I am even unsure whether the problem is caused on the client or the server side, as all traffic is encrypted and the debug log is not very clear about the number of bytes sent over the wire.

Downgrading Jetty client and server to 9.4.28 made the problem disappear.

After having a look at the HTTP/2 related changes between 9.4.28 and 9.4.29, I tried to set the recycleHttpChannels flag to false on both the server and the client side, without any effect (my tests were still failing).

Could this be related to #4855 (comment)?

I would like to kindly ask you to review the HTTP/2 related changes between 9.4.28 and 9.4.29. If I can provide any further information please let me know.

Thank you.

@sbordet
Contributor

sbordet commented Jun 8, 2020

@mklinger it's unlikely that your problems are due to #4855, since those failures happen only when the client deliberately sends the server malformed frames to verify its behavior against the HTTP/2 specification.
Legitimate clients don't send such frames, so you should not be affected by #4855.

The two major changes between .28 and .29 are stream creation and channel recycling.
Given that you seem to have a problem with the last chunk, stream creation should not be an issue.
And you have already tested that channel recycling has no effect.

Do you have an intermediary between client and server?
Are you using Content-Encoding: gzip?

Can you please give us a Jetty dump (see https://www.eclipse.org/jetty/documentation/current/jetty-dump-tool.html) of both client and server when this happens?

Any chance that you can run the test with DEBUG logs enabled?

@mklinger
Author

mklinger commented Jun 8, 2020

@sbordet thank you for the quick response!

Do you have an intermediary between client and server?

Client and server are communicating directly without any (Reverse-)Proxy or similar. TLS termination happens within Jetty using Conscrypt 2.4.0. My integration tests run in local Docker containers using a Docker network.
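For reference, the Conscrypt wiring on the Jetty side looks roughly like this (a sketch; keystore path and password are placeholders, and in my project this part is actually done by the framework):

```java
import java.security.Security;

import org.conscrypt.OpenSSLProvider;
import org.eclipse.jetty.util.ssl.SslContextFactory;

public final class ConscryptTls
{
    // Builds an SslContextFactory backed by the Conscrypt JSSE provider.
    public static SslContextFactory.Server create()
    {
        Security.insertProviderAt(new OpenSSLProvider(), 1);
        SslContextFactory.Server ssl = new SslContextFactory.Server();
        ssl.setProvider("Conscrypt");
        ssl.setKeyStorePath("/etc/ssl/keystore.p12"); // placeholder path
        ssl.setKeyStorePassword("changeit");          // placeholder password
        return ssl;
    }
}
```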

Are you using Content-Encoding: gzip?

Gzip is enabled for server responses, but currently not for client requests. The problem only appears for requests, not responses.

Can you please give us a server dump (see https://www.eclipse.org/jetty/documentation/current/jetty-dump-tool.html) on both client and server when this happens?

I will do that, but it will take some time for me to trigger the dump programmatically at the right moment.
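(For the programmatic dump I would use something along these lines; a sketch, with an arbitrary output path:)

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.eclipse.jetty.server.Server;

public final class DumpHelper
{
    // Writes the Jetty component tree (Server extends ContainerLifeCycle) to a file.
    public static void dumpServer(Server server, String file) throws Exception
    {
        Files.write(Paths.get(file), server.dump().getBytes(StandardCharsets.UTF_8));
    }
}
```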

Any chance that you can run the test with DEBUG logs enabled?

Yes, I already did that. I will try to filter the log for the problematic request/response cycle (as otherwise the log gets huge) and attach it to this ticket.

@joakime
Contributor

joakime commented Jun 8, 2020

Are you using Content-Encoding: gzip?

Gzip is enabled for server responses, but currently not for client requests. The problem only appears for requests, not responses.

If you enabled this via the Jetty GzipHandler on the server side, then both automatic decompression of gzipped client requests and automatic gzip compression of server responses are enabled.
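For illustration, a typical server-side wiring with GzipHandler looks like this (a sketch; the wrapped application handler is a placeholder):

```java
import org.eclipse.jetty.server.Handler;
import org.eclipse.jetty.server.Server;
import org.eclipse.jetty.server.handler.gzip.GzipHandler;

public final class GzipWiring
{
    // Wraps the application handler with GzipHandler: responses are gzip-compressed,
    // and gzipped request bodies are inflated (inflateBufferSize controls request inflation).
    public static void wire(Server server, Handler application)
    {
        GzipHandler gzip = new GzipHandler();
        gzip.setInflateBufferSize(8192);
        gzip.setHandler(application);
        server.setHandler(gzip);
    }
}
```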

When you said "but currently not for client requests" did you mean that you didn't configure Jetty Client for compressing its requests?

@mklinger
Author

mklinger commented Jun 8, 2020

@joakime I was wrong. Gzip compression is not enabled on either side; I never see any Content-Encoding headers. However, I can see Accept-Encoding: gzip request headers from the client side. Does this make sense?

@mklinger
Author

mklinger commented Jun 8, 2020

I ran the tests with DEBUG logs enabled and here are the condensed log lines for server and client for the request that fails:

client.log
server.log

The request body should have 47682 bytes, but only 16375 bytes end up in the JSON parser.

If this doesn't help, I can also provide the full log file (which is > 100 MB)

@sbordet
Contributor

sbordet commented Jun 8, 2020

@mklinger this looks like buffer corruption at the TLS level: we pass 4 buffers of 9, 16384, 9 and 16384 bytes.
SslConnection consumes 16384 bytes on a first wrap, then 18 on a second wrap, and then it thinks it's done, so something has cleared the 4th buffer.

Can you try without Conscrypt? I would like to verify whether it's a Conscrypt bug.

@mklinger
Author

mklinger commented Jun 8, 2020

@sbordet Wow, thanks for the information. My first attempt at getting it running without Conscrypt did not succeed. I will try again tomorrow morning and get back to you. Thanks!

@sbordet
Contributor

sbordet commented Jun 8, 2020

@mklinger also are you serving mixed HTTP/1.1 and HTTP/2 requests?
I ask because we recently fixed #4828, which can poison the global ByteBufferPool and affect HTTP/2 as well.

Are you able to try the latest code in the `jetty-9.4.x` branch? We can help you build Jetty from that branch if you need help.

@mklinger
Author

@sbordet Unfortunately, using a build from the jetty-9.4.x branch didn't help.

@sbordet
Contributor

sbordet commented Jun 10, 2020

@mklinger:

  1. did you try without Conscrypt?
  2. do you have mixed HTTP/1.1 and HTTP/2 requests?
  3. would you be able to change the client code and use LeakTrackingByteBufferPool?

@mklinger
Author

@sbordet

  1. did you try without Conscrypt?

I tried. The application is a Spring Boot application, and I found out that current Spring Boot versions force the use of Conscrypt for the Jetty HTTP/2 setup. I browsed the code but did not find a way to change that in a reasonable amount of time.

  2. do you have mixed HTTP/1.1 and HTTP/2 requests?

The application instances under test only make HTTP/2 requests. When double-checking, I found that the test itself uses two Jetty clients, one doing HTTP/2 and another doing HTTP/1.1. These clients are used for requests against the application instances. The test code runs in a separate process. The actual test failure happens for requests between application instances.

  3. would you be able to change the client code and use LeakTrackingByteBufferPool?

I did that and got the following output when the test environment was shut down. It looks to me like this output is produced by the test process; the application instances don't produce such output. (I did this using version 9.4.29.)

[] WARN org.eclipse.jetty.io.LeakTrackingByteBufferPool - ByteBuffer java.nio.DirectByteBuffer[pos=0 lim=0 cap=18432] leaked at:
java.lang.Throwable
	at org.eclipse.jetty.util.LeakDetector$LeakInfo.<init>(LeakDetector.java:174)
	at org.eclipse.jetty.util.LeakDetector$LeakInfo.<init>(LeakDetector.java:163)
	at org.eclipse.jetty.util.LeakDetector.acquired(LeakDetector.java:79)
	at org.eclipse.jetty.io.LeakTrackingByteBufferPool.acquire(LeakTrackingByteBufferPool.java:67)
	at org.eclipse.jetty.io.ssl.SslConnection.acquireEncryptedInput(SslConnection.java:337)
	at org.eclipse.jetty.io.ssl.SslConnection.access$800(SslConnection.java:80)
	at org.eclipse.jetty.io.ssl.SslConnection$DecryptedEndPoint.fill(SslConnection.java:662)
	at org.eclipse.jetty.http2.HTTP2Connection.fill(HTTP2Connection.java:134)
	at org.eclipse.jetty.http2.HTTP2Connection.access$500(HTTP2Connection.java:44)
	at org.eclipse.jetty.http2.HTTP2Connection$HTTP2Producer.produce(HTTP2Connection.java:266)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.produceTask(EatWhatYouKill.java:360)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:184)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.produce(EatWhatYouKill.java:135)
	at org.eclipse.jetty.http2.HTTP2Connection.produce(HTTP2Connection.java:170)
	at org.eclipse.jetty.http2.HTTP2Connection.onFillable(HTTP2Connection.java:125)
	at org.eclipse.jetty.http2.HTTP2Connection$FillableCallback.succeeded(HTTP2Connection.java:348)
	at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
	at org.eclipse.jetty.io.ssl.SslConnection$DecryptedEndPoint.onFillable(SslConnection.java:543)
	at org.eclipse.jetty.io.ssl.SslConnection.onFillable(SslConnection.java:398)
	at org.eclipse.jetty.io.ssl.SslConnection$2.succeeded(SslConnection.java:161)
	at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
	at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:129)
	at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:375)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:806)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:938)
	at java.base/java.lang.Thread.run(Thread.java:834)
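(For reference, I wired the leak-tracking pool into the client roughly like this; a sketch, not the exact test code:)

```java
import org.eclipse.jetty.client.HttpClient;
import org.eclipse.jetty.http2.client.HTTP2Client;
import org.eclipse.jetty.http2.client.http.HttpClientTransportOverHTTP2;
import org.eclipse.jetty.io.LeakTrackingByteBufferPool;
import org.eclipse.jetty.io.MappedByteBufferPool;
import org.eclipse.jetty.util.ssl.SslContextFactory;

public final class LeakTrackingClient
{
    // Builds an HTTP/2 HttpClient whose buffer pool logs leaked buffers
    // (with the acquisition stack trace) when they are garbage-collected unreleased.
    public static HttpClient create() throws Exception
    {
        HttpClient client = new HttpClient(
            new HttpClientTransportOverHTTP2(new HTTP2Client()),
            new SslContextFactory.Client(true)); // trust-all only for this sketch
        client.setByteBufferPool(new LeakTrackingByteBufferPool(new MappedByteBufferPool()));
        client.start();
        return client;
    }
}
```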

@sbordet
Contributor

sbordet commented Jun 29, 2020

@mklinger we have done further work on the buffer corruption issue in #4967.
Can you please try the changes in branch jetty-9.4.x-4967-buffer_corruption_for_http2_failures?

@mklinger
Author

@sbordet: Unfortunately, the problem still persists using a SNAPSHOT build from branch jetty-9.4.x-4967-buffer_corruption_for_http2_failures.

@sbordet
Contributor

sbordet commented Jun 30, 2020

@mklinger can you please attach the DEBUG logs for the run with branch jetty-9.4.x-4967-buffer_corruption_for_http2_failures?

Also, did you try without Conscrypt?

@mklinger
Author

mklinger commented Jul 1, 2020

@sbordet here is the debug log. I tried to filter the log for the interesting parts. I hope I got it right.

client.log
server.log

@mklinger
Author

mklinger commented Jul 1, 2020

@sbordet:

Also, did you try without Conscrypt?

I wasn't able to use anything other than Conscrypt on the server side, but I managed to use the SunJSSE provider on the client side. The result was the same: the test failed with the same error.

I will dig into Spring Boot's Jetty setup code again to find a way to get the server running without Conscrypt, but I'm not very optimistic about that.

@joakime
Contributor

joakime commented Jul 1, 2020

I added an issue at spring-boot as a result of the discussion on this issue.

spring-projects/spring-boot#22188

@sbordet
Contributor

sbordet commented Jul 2, 2020

@mklinger the client logs look OK: the client writes 69078 bytes of PUT content to the server.

On the server, we see a first read of 16375 bytes that is added to the HttpInput:

[qetcher-127.0.0.1:40001] 2020-07-01 13:51:03.140 DEBUG 1 --- [tp285074186-370] org.eclipse.jetty.server.HttpInput       : HttpInput@42e61dc4[c=0,q=1,[0]=Content@153d6fdc{HeapByteBuffer@436364c[p=0,l=16375,c=16375,r=16375]={<<<{"version":257,"clusterLi...10-8c2b-be1bb6db4bd4",">>>}},s=STREAM] addContent Content@153d6fdc{HeapByteBuffer@436364c[p=0,l=16375,c=16375,r=16375]={<<<{"version":257,"clusterLi...10-8c2b-be1bb6db4bd4",">>>}}

The HttpInput is in state s=STREAM.

Then the application is called; it goes through a number of filters and then lands in Spring's DispatcherServlet, where the application reads the request content.
However, the first attempt to read (1 byte, probably via int ServletInputStream.read()) already shows the HttpInput in state s=EOF.
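For illustration, a 1-byte read like that would typically come from code along these lines (hypothetical, not necessarily what the application does):

```java
import java.io.IOException;

import javax.servlet.ServletInputStream;
import javax.servlet.http.HttpServletRequest;

class BodyPeek
{
    // Hypothetical filter/controller code that reads the body one byte at a time.
    static int peekFirstByte(HttpServletRequest request) throws IOException
    {
        ServletInputStream in = request.getInputStream();
        return in.read(); // returns -1 immediately if HttpInput is already in state EOF
    }
}
```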

This is incorrect, but we cannot find any trace in your logs of when the state was moved to EOF, and it should be there.
Did you filter out lines that could show that?

Also, client and server times are a little off (the server seems to be a few ms behind). Can you please confirm that?

The good news is that we no longer see the buffer corruption we were seeing before; these last logs seem to show a different problem.

Can you please check what application code is doing the read of just 1 byte? Is it some filter, or is it your application?

I'm baffled!

@sbordet
Contributor

sbordet commented May 5, 2021

@mklinger what's the status on this? I guess we can close it?

@DenisDudinski

DenisDudinski commented Nov 23, 2021

Hi @mklinger, @sbordet! It seems that we are experiencing a somewhat similar problem with the Jetty embedded in the Hadoop HDFS NameNode (Jetty version 9.4.20.v20190813, as reported by Maven). The upload of a file silently fails on the server side, and the debug log shows that the last filled ByteBuffer is somehow not applied to the HttpInput, but no errors or warnings are produced. The client behaves in one of two ways: it either reports that everything is OK although the file is effectively not uploaded, or it shows an error like 'Unexpected end of file from server'. This happens only sometimes, and it is impossible to predict when. I've attached two logs, one with debug info for a successful upload and another (unfortunately only as a JPG image) that shows the completion of a failed upload. Hope it helps.
Comment on the failure screenshot: the expected upload size was 42772337 bytes. At EOF, c=42764763. A HeapByteBuffer with 7574 bytes is present but is not consumed.

Upload-fsimage-failure
snippet-fsimage-upload-success.txt

@sbordet
Contributor

sbordet commented Nov 27, 2021

@DenisDudinski the image you attached shows that HTTP/1.1 is in use. This issue is about HTTP/2, so please open a new issue.

@github-actions

This issue has been automatically marked as stale because it has been a
full year without activity. It will be closed if no further activity occurs.
Thank you for your contributions.

github-actions bot added the Stale label on Nov 28, 2022
sbordet closed this as completed on Nov 28, 2022