
Possible buffer corruption in HTTP/2 session failures #4967

Closed
sbordet opened this issue Jun 13, 2020 · 2 comments · Fixed by #5001

Comments

@sbordet
Contributor

sbordet commented Jun 13, 2020

Jetty version
9.4.30

Description
Run the CometD benchmark with HTTP/2, all defaults but large message content (for example 32768 bytes).

The run will fail because of TCP congestion and HTTP/2 flow control stalls that eventually make the client time out and cancel all requests.

This is kind of expected with HTTP/2, since a single 32 KiB message is (with the default configuration) fanned out 10 times, likely to clients on the same TCP connection (there are very few TCP connections, since each one supports 128 concurrent streams).
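
For context on those numbers, here is a minimal sketch of the server-side knob that controls stream multiplexing, assuming the Jetty 9.4 API (the h2c connector wiring is illustrative, not the benchmark's actual setup):

```java
import org.eclipse.jetty.http2.server.HTTP2CServerConnectionFactory;
import org.eclipse.jetty.server.HttpConfiguration;
import org.eclipse.jetty.server.Server;
import org.eclipse.jetty.server.ServerConnector;

public class H2StreamsConfig {
    public static void main(String[] args) throws Exception {
        Server server = new Server();
        HttpConfiguration config = new HttpConfiguration();
        // Each TCP connection multiplexes up to maxConcurrentStreams streams,
        // which is why the benchmark ends up with so few TCP connections.
        HTTP2CServerConnectionFactory h2c = new HTTP2CServerConnectionFactory(config);
        h2c.setMaxConcurrentStreams(128); // the default referenced above
        ServerConnector connector = new ServerConnector(server, h2c);
        connector.setPort(8080);
        server.addConnector(connector);
        server.start();
    }
}
```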

Eventually, we see this in DEBUG logs for the same thread (and the issue is reproducible):

2020-06-13 17:20:05,561 qtp1153907750-19 [DEBUG][io.ChannelEndPoint] filled 16384 HeapByteBuffer@2c0be95a[p=0,l=16384,c=16384,r=16384]={<<<xxxxxxxxxxxxxxxxxxxxxxxxx...xxxxxxxxxxxxxxxxxxxxxxx>>>}
2020-06-13 17:20:05,561 qtp1153907750-19 [DEBUG][http2.HTTP2Connection] Filled 16384 bytes in NetworkBuffer@3f245c88{HeapByteBuffer@2c0be95a[p=0,l=16384,c=16384,r=16384]={<<<xxxxxxxxxxxxxxxxxxxxxxxxx...xxxxxxxxxxxxxxxxxxxxxxx>>>},r=1}
2020-06-13 17:20:05,561 qtp1153907750-19 [DEBUG][http2.HTTP2Session] Received DataFrame@6bf386cf#1481{length:11317,end=false}
2020-06-13 17:20:05,561 qtp1153907750-19 [DEBUG][http2.FlowControlStrategy] Data received, 11317 bytes, updated session recv window 8773936 -> 8762619 for HTTP2ClientSession@70a5444f{l:/127.0.0.1:49226 <-> r:localhost/127.0.0.1:8080,sendWindow=773712,recvWindow=8762619,streams=116,NOT_CLOSED,null}
2020-06-13 17:20:05,562 qtp1153907750-19 [DEBUG][http2.FlowControlStrategy] Data received, 11317 bytes, updated stream recv window 8235493 -> 8224176 for HTTP2Stream@15c02618#1481@70a5444f{sendWindow=524172,recvWindow=8224176,reset=false/false,LOCALLY_CLOSED,age=17049,attachment=HttpChannelOverHTTP2@43dfa60f(exchange=HttpExchange@46d11ab6 req=TERMINATED/null@null res=PENDING/null@null)[send=HttpSenderOverHTTP2@5a3d1b86(req=QUEUED,snd=COMPLETED,failure=null),recv=HttpReceiverOverHTTP2@1477aedd(rsp=CONTENT,failure=null)]}
2020-06-13 17:20:05,562 qtp1153907750-19 [DEBUG][http2.HTTP2Stream] Update close for HTTP2Stream@15c02618#1481@70a5444f{sendWindow=524172,recvWindow=8224176,reset=false/false,LOCALLY_CLOSED,age=17049,attachment=HttpChannelOverHTTP2@43dfa60f(exchange=HttpExchange@46d11ab6 req=TERMINATED/null@null res=PENDING/null@null)[send=HttpSenderOverHTTP2@5a3d1b86(req=QUEUED,snd=COMPLETED,failure=null),recv=HttpReceiverOverHTTP2@1477aedd(rsp=CONTENT,failure=null)]} update=false event=RECEIVED
2020-06-13 17:20:05,562 qtp1153907750-19 [DEBUG][client.HttpReceiver] Queueing content DataInfo@e01790e[DataFrame@6bf386cf#1481{length:11317,end=false}]
2020-06-13 17:20:05,562 qtp1153907750-19 [DEBUG][client.HttpReceiver] Resuming(false) processing(true) of content
2020-06-13 17:20:05,562 qtp1153907750-19 [DEBUG][client.HttpReceiver] Processing content DataInfo@e01790e[DataFrame@6bf386cf#1481{length:11317,end=false}]
2020-06-13 17:20:05,562 qtp1153907750-19 [DEBUG][client.HttpReceiver] Response content HttpResponse[HTTP/2.0 200 null]@42a3a53b
2020-06-13 17:20:05,562 qtp1153907750-19 [DEBUG][http2.FlowControlStrategy] Data consumed, 11317 bytes, session recv window level 8014597/8388608 for HTTP2ClientSession@70a5444f{l:/127.0.0.1:49226 <-> r:localhost/127.0.0.1:8080,sendWindow=773712,recvWindow=8762619,streams=116,NOT_CLOSED,null}
2020-06-13 17:20:05,562 qtp1153907750-19 [DEBUG][http2.FlowControlStrategy] Data consumed, 11317 bytes, stream recv window level 164432/4194304 for HTTP2Stream@15c02618#1481@70a5444f{sendWindow=524172,recvWindow=8224176,reset=false/false,LOCALLY_CLOSED,age=17050,attachment=HttpChannelOverHTTP2@43dfa60f(exchange=HttpExchange@46d11ab6 req=TERMINATED/null@null res=PENDING/null@null)[send=HttpSenderOverHTTP2@5a3d1b86(req=QUEUED,snd=COMPLETED,failure=null),recv=HttpReceiverOverHTTP2@1477aedd(rsp=TRANSIENT,failure=null)]}
2020-06-13 17:20:05,562 qtp1153907750-19 [DEBUG][client.HttpReceiver] Response demand=1/1, resume=false
2020-06-13 17:20:05,562 qtp1153907750-19 [DEBUG][client.HttpReceiver] Response content HttpResponse[HTTP/2.0 200 null]@42a3a53b, hasDemand=true
2020-06-13 17:20:05,562 qtp1153907750-19 [DEBUG][client.HttpReceiver] Processing content null
2020-06-13 17:20:05,562 qtp1153907750-19 [DEBUG][parser.Parser] Parsed DATA frame body from java.nio.HeapByteBuffer[pos=11317 lim=16384 cap=16384]
2020-06-13 17:20:05,562 qtp1153907750-19 [DEBUG][parser.Parser] Parsed [DATA|1|0|1549] frame header from java.nio.HeapByteBuffer[pos=11326 lim=16384 cap=16384]
2020-06-13 17:20:05,563 qtp1153907750-19 [DEBUG][http2.HTTP2Session] Received DataFrame@1350db3b#1549{length:1,end=false}
2020-06-13 17:20:05,563 qtp1153907750-19 [DEBUG][http2.FlowControlStrategy] Data received, 1 bytes, updated session recv window 8762619 -> 8762618 for HTTP2ClientSession@70a5444f{l:/127.0.0.1:49226 <-> r:localhost/127.0.0.1:8080,sendWindow=773712,recvWindow=8762618,streams=116,NOT_CLOSED,null}
2020-06-13 17:20:05,563 qtp1153907750-19 [DEBUG][http2.HTTP2Session] Stream #1549 not found
2020-06-13 17:20:05,563 qtp1153907750-19 [DEBUG][http2.FlowControlStrategy] Data consumed, 1 bytes, session recv window level 8014598/8388608 for HTTP2ClientSession@70a5444f{l:/127.0.0.1:49226 <-> r:localhost/127.0.0.1:8080,sendWindow=773712,recvWindow=8762618,streams=116,NOT_CLOSED,null}
2020-06-13 17:20:05,563 qtp1153907750-19 [DEBUG][http2.HTTP2Session] Sending ResetFrame@6462e260#1549{stream_closed_error}
2020-06-13 17:20:05,563 qtp1153907750-19 [DEBUG][http2.HTTP2Flusher] Appended ResetFrame@6462e260#1549{stream_closed_error}, entries=1
2020-06-13 17:20:05,563 qtp1153907750-19 [DEBUG][parser.Parser] Parsed DATA frame body from java.nio.HeapByteBuffer[pos=11327 lim=16384 cap=16384]
2020-06-13 17:20:05,563 qtp1153907750-19 [DEBUG][parser.Parser] Parsed [DATA|1|0|1535] frame header from java.nio.HeapByteBuffer[pos=11336 lim=16384 cap=16384]
2020-06-13 17:20:05,563 qtp1153907750-19 [DEBUG][http2.HTTP2Session] Received DataFrame@612e3eeb#1535{length:1,end=false}
2020-06-13 17:20:05,563 qtp1153907750-19 [DEBUG][http2.FlowControlStrategy] Data received, 1 bytes, updated session recv window 8762618 -> 8762617 for HTTP2ClientSession@70a5444f{l:/127.0.0.1:49226 <-> r:localhost/127.0.0.1:8080,sendWindow=773712,recvWindow=8762617,streams=116,NOT_CLOSED,null}
2020-06-13 17:20:05,563 qtp1153907750-19 [DEBUG][http2.HTTP2Session] Stream #1535 not found
2020-06-13 17:20:05,563 qtp1153907750-19 [DEBUG][http2.FlowControlStrategy] Data consumed, 1 bytes, session recv window level 8014599/8388608 for HTTP2ClientSession@70a5444f{l:/127.0.0.1:49226 <-> r:localhost/127.0.0.1:8080,sendWindow=773712,recvWindow=8762617,streams=116,NOT_CLOSED,null}
2020-06-13 17:20:05,563 qtp1153907750-19 [DEBUG][http2.HTTP2Session] Sending ResetFrame@2f6933bc#1535{stream_closed_error}
2020-06-13 17:20:05,563 qtp1153907750-19 [DEBUG][http2.HTTP2Flusher] Appended ResetFrame@2f6933bc#1535{stream_closed_error}, entries=2
2020-06-13 17:20:05,564 qtp1153907750-19 [DEBUG][parser.Parser] Parsed DATA frame body from java.nio.HeapByteBuffer[pos=11337 lim=16384 cap=16384]
2020-06-13 17:20:05,564 qtp1153907750-19 [DEBUG][parser.Parser] Parsed [DATA|256|0|397612] frame header from java.nio.HeapByteBuffer[pos=11346 lim=16384 cap=16384]
2020-06-13 17:20:05,564 qtp1153907750-19 [DEBUG][http2.HTTP2Session] Received DataFrame@3a75ac69#397612{length:256,end=false}
2020-06-13 17:20:05,564 qtp1153907750-19 [DEBUG][http2.FlowControlStrategy] Data received, 256 bytes, updated session recv window 8762617 -> 8762361 for HTTP2ClientSession@70a5444f{l:/127.0.0.1:49226 <-> r:localhost/127.0.0.1:8080,sendWindow=773712,recvWindow=8762361,streams=116,NOT_CLOSED,null}
2020-06-13 17:20:05,564 qtp1153907750-19 [DEBUG][http2.HTTP2Session] Stream #397612 not found

The buffer contains multiple frames, and I have seen multiple times that a frame with length=1 is parsed just before the faulty frame Parsed [DATA|256|0|397612].

Note that this seems to be a rare case: normal CometD benchmark runs never fail, and it appears related to the fact that we are failing many streams on one connection (due to timeouts).

Since stream #397612 is bogus, this leads to the failure of the whole connection.
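
For reference, a standalone sketch of the 9-byte HTTP/2 frame header decoding (RFC 7540 §4.1) that produces the [type|length|flags|streamId] lines above; this is not Jetty's actual Parser, it only shows how bytes overwritten in a recycled buffer decode to a nonsense stream id:

```java
import java.nio.ByteBuffer;

// Standalone sketch of HTTP/2 frame header decoding per RFC 7540 section 4.1
// (24-bit length, 8-bit type, 8-bit flags, 1 reserved bit + 31-bit stream id).
// Not Jetty's Parser: it only illustrates that if a pooled buffer is recycled
// and overwritten while still being parsed, these 9 bytes decode to garbage,
// e.g. a stream id of 397612 on a connection whose streams are below #2000.
public class FrameHeaderSketch {
    public static void decodeHeader(ByteBuffer buffer) {
        int length = ((buffer.get() & 0xFF) << 16)
                   | ((buffer.get() & 0xFF) << 8)
                   |  (buffer.get() & 0xFF);          // 24-bit payload length
        int type = buffer.get() & 0xFF;               // 0x0 == DATA
        int flags = buffer.get() & 0xFF;
        int streamId = buffer.getInt() & 0x7FFF_FFFF; // mask the reserved bit
        System.out.printf("[%s|%d|%d|%d]%n",
                type == 0 ? "DATA" : "type=0x" + Integer.toHexString(type),
                length, flags, streamId);
    }

    public static void main(String[] args) {
        // A well-formed header: length=256, type=DATA, flags=0, streamId=1481.
        ByteBuffer header = ByteBuffer.allocate(9)
                .put((byte)0).put((byte)1).put((byte)0) // length 256
                .put((byte)0)                           // type DATA
                .put((byte)0)                           // flags
                .putInt(1481);                          // stream id
        header.flip();
        decodeHeader(header); // prints [DATA|256|0|1481]
    }
}
```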

@sbordet
Contributor Author

sbordet commented Jun 15, 2020

The issue can be reproduced with buffer pooling disabled on the client, but cannot be reproduced with buffer pooling disabled on the server.

It must therefore be a server issue.
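
For reproducing, pooling can be disabled by plugging in a pass-through pool; a minimal sketch, assuming the Jetty 9.4 ByteBufferPool interface with acquire() and release() as its abstract methods:

```java
import java.nio.ByteBuffer;
import org.eclipse.jetty.io.ByteBufferPool;

// Diagnostic pass-through pool: acquire() always allocates a fresh buffer and
// release() drops it, so no buffer is ever handed to another stream. With this
// installed on the server the corruption disappears, which is what implicates
// server-side buffer recycling.
public class NoPoolingByteBufferPool implements ByteBufferPool {
    @Override
    public ByteBuffer acquire(int size, boolean direct) {
        return direct ? ByteBuffer.allocateDirect(size) : ByteBuffer.allocate(size);
    }

    @Override
    public void release(ByteBuffer buffer) {
        // Never recycle: the buffer simply becomes garbage-collectable.
    }
}
```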

@sbordet
Contributor Author

sbordet commented Jun 16, 2020

This may be another, slightly different, occurrence of #4855.

The solution for #4855 assumes that frame generation is synchronous at the time of send(), but if the flusher already has a thread flushing, send() just enqueues the frame and returns immediately.
In that case, the flusher thread and the failing thread act concurrently during frame generation, producing corrupted frames.
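
A condensed sketch of that race, with hypothetical names (this is not Jetty's actual HTTP2Flusher, just the shape of the bug):

```java
import java.nio.ByteBuffer;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical condensation of the race (names are illustrative, not Jetty's
// real classes). send() only enqueues when a flusher thread is already active,
// so frame generation happens later on that thread, while the failure path
// can recycle the very buffer being generated/written.
class FlusherRaceSketch {
    interface Pool { void release(ByteBuffer buffer); }

    private final Queue<ByteBuffer> entries = new ConcurrentLinkedQueue<>();
    private final AtomicBoolean flushing = new AtomicBoolean();
    private final Pool pool;

    FlusherRaceSketch(Pool pool) { this.pool = pool; }

    void send(ByteBuffer frame) {
        entries.offer(frame);
        // If another thread is already flushing, return immediately:
        // the frame will be generated and written later, concurrently.
        if (flushing.compareAndSet(false, true))
            flush();
    }

    private void flush() {
        ByteBuffer buffer;
        while ((buffer = entries.peek()) != null) {
            write(buffer);  // frame bytes leave from this pooled buffer
            entries.poll();
        }
        flushing.set(false);
    }

    void onFailure(Throwable cause) {
        // The #4855 fix assumed flush() could not be running here; when it
        // is, releasing a buffer that write() still holds lets another
        // stream re-acquire and overwrite it, producing corrupted frames.
        ByteBuffer buffer;
        while ((buffer = entries.poll()) != null)
            pool.release(buffer);
    }

    private void write(ByteBuffer buffer) {
        // Stand-in for the socket write of the generated frame bytes.
    }
}
```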

sbordet added a commit that referenced this issue Jun 26, 2020
Improved logging.

Signed-off-by: Simone Bordet <simone.bordet@gmail.com>
sbordet added a commit that referenced this issue Jun 26, 2020
Partially reverted the changes introduced in #4855, because they
only worked when sends were synchronous.

Introduced ByteBufferPool.remove(ByteBuffer) to fix the issue.
Now when a concurrent failure happens while frames are being
generated or sent, the buffer is discarded instead of being
recycled, therefore resolving the buffer corruption.

Signed-off-by: Simone Bordet <simone.bordet@gmail.com>
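
The shape of the fix, sketched against the ByteBufferPool.remove(ByteBuffer) method this commit introduces (the surrounding failure handler is illustrative, not the actual diff):

```java
import java.nio.ByteBuffer;
import org.eclipse.jetty.io.ByteBufferPool;

// Illustrative failure handler showing the new discard semantics: when a
// concurrent failure arrives while a buffer may still be in use by the
// flusher, remove() drops it from the pool instead of release() recycling it.
class DiscardOnFailure {
    private final ByteBufferPool pool;

    DiscardOnFailure(ByteBufferPool pool) { this.pool = pool; }

    void failed(ByteBuffer buffer, Throwable cause) {
        // Before the fix: pool.release(buffer) could hand a buffer still
        // being written back to another stream, corrupting its frames.
        pool.remove(buffer); // discard: the buffer never re-enters the pool
    }
}
```
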
sbordet linked a pull request Jun 26, 2020 that will close this issue
sbordet closed this as completed in ae43b70 Jul 3, 2020
sbordet added a commit that referenced this issue Jul 7, 2020
Fixed test failure after merge.

Made sure the lastStreamId is updated when resetting a new
stream since it has been seen by the server and handled.

This avoids a new stream arriving during shutdown being
interpreted as a connection failure, which would fail all
pending streams - we want them to complete.

Signed-off-by: Simone Bordet <simone.bordet@gmail.com>
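
A hedged sketch of the lastStreamId bookkeeping this commit describes (names are illustrative, not the actual diff):

```java
// Hypothetical sketch of the lastStreamId bookkeeping described above.
// Recording the reset stream's id means a GOAWAY sent during shutdown
// advertises a lastStreamId that covers it, so the peer does not treat
// the reset as a connection failure and pending streams can complete.
class ShutdownBookkeepingSketch {
    private int lastStreamId;

    void onStreamDuringShutdown(int streamId) {
        // The stream has been seen and handled (reset), so count it.
        lastStreamId = Math.max(lastStreamId, streamId);
        sendReset(streamId);
    }

    private void sendReset(int streamId) {
        // Stand-in for sending RST_STREAM to the peer.
    }
}
```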