Ensure FlowControlled data frames will be correctly removed from the … #8726
normanmaurer requested review from trustin, Scottmitch, ejona86, rkapsi and carl-mastrangelo (Jan 17, 2019)
normanmaurer added this to the 4.1.33.Final milestone (Jan 17, 2019)
normanmaurer self-assigned this (Jan 17, 2019)
normanmaurer added the defect label (Jan 17, 2019)
normanmaurer referenced this pull request (Jan 17, 2019) in: OOME in SslHandlerCoalescingBufferQueue w/ H2 #8707 [Closed]
@normanmaurer LGTM. I went through a few memory dumps I had saved (September 5, 2018 being oldest) and the condition
rkapsi closed this (Jan 17, 2019)
rkapsi reopened this (Jan 17, 2019)
rkapsi approved these changes (Jan 17, 2019)
Scottmitch reviewed (Jan 17, 2019)
```java
// which will signal back that the whole frame was consumed.
//
// See https://github.com/netty/netty/issues/8707.
padding = dataSize = 0;
```
Scottmitch (Member), Jan 17, 2019
This has potentially interesting implications for returning bytes to flow control. Consider adding a unit test that involves the real flow controller and checks that the flow-control window is properly tracked.
Also consider writing a test that demonstrates the original issue (infinite loop) to verify the scenario doesn't recur. The interactions between the encoder, the flow controller, and pipeline events may lead to interesting combinations, and it is not obvious whether this change resolves the original issue. `Http2ConnectionRoundtripTest#writeOfEmptyReleasedBufferQueuedInFlowControllerShouldFail` hits the error condition; I wonder if we can enhance this test or do something similar to exercise the desired condition.
normanmaurer (Author, Member), Jan 17, 2019
@Scottmitch I will check and see if I can add more tests. That said, I think it should work as expected, as we only reset after `write` is called and we have already failed it.
normanmaurer (Author, Member), Jan 17, 2019
@rkapsi @Scottmitch I was able to add a unit test that would result in the endless loop before the fix:
rkapsi reviewed (Jan 17, 2019)
```java
// There's no need to write any data frames because there are only empty data frames in the
// queue and it is not end of stream yet. Just complete their promises by getting the buffer
// corresponding to 0 bytes and writing it to the channel (to preserve notification order).
ChannelPromise writePromise = ctx.newPromise().addListener(this);
```
rkapsi (Member), Jan 17, 2019
Wonder if we want to move the `addListener()` after the `write(...)` as well. Right now it's skewing the stack should the `remove(...)` complete the promise.
rkapsi (Member), Jan 17, 2019
Something like that...
```java
ChannelPromise writePromise = ctx.newPromise();
ctx.write(queue.remove(0, writePromise), writePromise)
        .addListener(this);
```
Otherwise, if `queue.remove(0, writePromise)` were to complete the promise, it would potentially notify the listener first, followed by the call to `ctx.write(...)`. Stack traces will look strange. Not sure there would be other side effects due to the order in which things execute, but I think you want the listener to be notified after the write call returns.
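The ordering concern can be reproduced with a small model. This sketch assumes, as the comment describes, that `queue.remove(0, writePromise)` may complete the promise synchronously. It uses `java.util.concurrent.CompletableFuture` as a stand-in for `ChannelPromise`, and `remove`/`write` here are hypothetical stand-ins, not Netty methods. A `CompletableFuture` dependent registered with `thenRun` runs on the completing thread, or immediately if the future is already done, which only roughly approximates how Netty notifies listeners.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

/**
 * Hypothetical model of listener ordering. CompletableFuture stands in for
 * ChannelPromise; remove() and write() are simplified stand-ins, not Netty APIs.
 */
public class ListenerOrder {

    /** Stand-in for queue.remove(0, promise): completes the promise synchronously. */
    static String remove(CompletableFuture<Void> promise, List<String> events) {
        events.add("remove");
        promise.complete(null); // runs any already-registered dependents right here
        return "EMPTY_BUFFER";
    }

    /** Stand-in for ctx.write(msg, promise): records the call and returns the promise. */
    static CompletableFuture<Void> write(String msg, CompletableFuture<Void> promise,
                                         List<String> events) {
        events.add("write");
        return promise;
    }

    /** Original order: the listener is attached before remove() completes the promise. */
    static List<String> listenerFirst() {
        List<String> events = new ArrayList<>();
        CompletableFuture<Void> promise = new CompletableFuture<>();
        promise.thenRun(() -> events.add("listener"));
        write(remove(promise, events), promise, events);
        return events;
    }

    /** Suggested order: the listener is attached to the future returned by write(). */
    static List<String> listenerAfterWrite() {
        List<String> events = new ArrayList<>();
        CompletableFuture<Void> promise = new CompletableFuture<>();
        write(remove(promise, events), promise, events)
                .thenRun(() -> events.add("listener")); // already done: runs immediately
        return events;
    }

    public static void main(String[] args) {
        System.out.println(listenerFirst());      // [remove, listener, write]
        System.out.println(listenerAfterWrite()); // [remove, write, listener]
    }
}
```

Attaching the listener to the future returned by the write call guarantees, in this model, that the write call has been entered before the listener runs, which is what keeps the stack order intuitive.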
normanmaurer (Author, Member), Jan 17, 2019
Got it. I would like to investigate this as a follow-up; I'm trying to keep the changes as minimal as possible for now.
@netty-bot test this please
normanmaurer commented (Jan 17, 2019)
…flow-controller when a write error happens.
Motivation:
When a write error happens while writing flow-controlled data frames, we fail to detect it correctly in the write loop. This may result in an infinite loop, as we never detect that the frame should be removed from the queue.
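The failure mode can be illustrated with a toy model. This is a sketch, not Netty's flow controller: `ToyFrame`, `drain` and the iteration cap are hypothetical. It assumes that the write loop removes a frame only once its remaining size reaches zero, so a frame whose write fails without consuming any bytes is revisited forever. Resetting the frame's size on error, in the spirit of the diff's `padding = dataSize = 0`, lets the loop drop it. The cap turns the would-be infinite loop into a `-1` result.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical toy model of a flow-controller write loop (not Netty's API). */
public class WriteLoopModel {

    /** Hypothetical stand-in for a flow-controlled DATA frame. */
    static final class ToyFrame {
        int dataSize;
        int padding;
        final boolean failOnWrite;

        ToyFrame(int dataSize, int padding, boolean failOnWrite) {
            this.dataSize = dataSize;
            this.padding = padding;
            this.failOnWrite = failOnWrite;
        }

        int size() {
            return dataSize + padding;
        }

        /** Consumes up to allowedBytes; a failing frame consumes nothing. */
        void write(int allowedBytes, boolean resetOnError) {
            if (failOnWrite) {
                if (resetOnError) {
                    // Mirrors the fix: signal that the whole frame was consumed.
                    padding = dataSize = 0;
                }
                return;
            }
            int fromData = Math.min(allowedBytes, dataSize);
            dataSize -= fromData;
            padding -= Math.min(allowedBytes - fromData, padding);
        }
    }

    /**
     * Writes at most chunk bytes per iteration and removes a frame once its size is 0.
     * Returns the iteration count, or -1 if maxIterations is exceeded (no progress).
     */
    static int drain(Deque<ToyFrame> queue, int chunk, boolean resetOnError, int maxIterations) {
        int iterations = 0;
        while (!queue.isEmpty()) {
            if (++iterations > maxIterations) {
                return -1; // without the cap this loop would spin forever
            }
            ToyFrame frame = queue.peek();
            frame.write(chunk, resetOnError);
            if (frame.size() == 0) {
                queue.poll(); // frame fully consumed: remove it from the queue
            }
        }
        return iterations;
    }

    static Deque<ToyFrame> sampleQueue() {
        Deque<ToyFrame> queue = new ArrayDeque<>();
        queue.add(new ToyFrame(10, 0, false));
        queue.add(new ToyFrame(10, 2, true)); // this frame's write fails
        queue.add(new ToyFrame(5, 0, false));
        return queue;
    }

    public static void main(String[] args) {
        System.out.println("without reset: " + drain(sampleQueue(), 4, false, 1000)); // -1
        System.out.println("with reset:    " + drain(sampleQueue(), 4, true, 1000));  // 6
    }
}
```

A bounded loop like this is also one way a test can assert on a would-be endless loop without hanging the test run.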
Modifications:
Result:
Fixes #8707.