New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure FlowControlled data frames will be correctly removed from the … #8726

Merged
merged 3 commits into from Jan 19, 2019

Conversation

@normanmaurer
Copy link
Member

normanmaurer commented Jan 17, 2019

…flow-controller when a write error happens.

Motivation:

When a write error happens during writing of flowcontrolled data frames we miss to correctly detect this in the write loop which may result in an infinite loop as we will never detect that the frame should be removed from the queue.

Modifications:

  • When we fail a flowcontrolled data frame we ensure that the next frame.write(...) call will signal back that the whole frame was handled and so can be removed.
  • Add unit test.

Result:

Fixes #8707.

@rkapsi

This comment has been minimized.

Copy link
Member

rkapsi commented Jan 17, 2019

@normanmaurer LGTM. I went through a few memory dumps I had saved (September 5, 2018 being oldest) and the condition queue.isEmpty() is true in all cases. That should do the trick. Best Netty release ever!

@rkapsi rkapsi closed this Jan 17, 2019

@rkapsi rkapsi reopened this Jan 17, 2019

@rkapsi

rkapsi approved these changes Jan 17, 2019

Show resolved Hide resolved ...ain/java/io/netty/handler/codec/http2/DefaultHttp2ConnectionEncoder.java Outdated
// which will signal back that the whole frame was consumed.
//
// See https://github.com/netty/netty/issues/8707.
padding = dataSize = 0;

This comment has been minimized.

@Scottmitch

Scottmitch Jan 17, 2019

Member

this has potentially interesting implications on returning bytes to flow control. consider adding a unit test that involves the real flow control and that the flow control window is properly tracked.

Also consider writing a test that would demonstrate the original issue (infinite loop) to verify the scenario doesn't re-occur. The interactions between the encoder, flow controller, and pipeline events may lead to interesting combinations which is not obvious if this will resolve the original issue. Http2ConnectionRounttripTest#writeOfEmptyReleasedBufferQueuedInFlowControllerShouldFail hits the error condition, I wonder if we can enhance this test or do something similar to exercise the desired condition.

This comment has been minimized.

@normanmaurer

normanmaurer Jan 17, 2019

Author Member

@Scottmitch will check and see if I can add more tests. That said I think it should work as expected as we only reset after write is called and we failed it before.

This comment has been minimized.

@normanmaurer

normanmaurer Jan 17, 2019

Author Member

@rkapsi @Scottmitch I was able to add a unit test that would result in the endless loop before the fix:

3d2cff4

// There's no need to write any data frames because there are only empty data frames in the
// queue and it is not end of stream yet. Just complete their promises by getting the buffer
// corresponding to 0 bytes and writing it to the channel (to preserve notification order).
ChannelPromise writePromise = ctx.newPromise().addListener(this);

This comment has been minimized.

@rkapsi

rkapsi Jan 17, 2019

Member

Wonder if we want to move the addListener() after the write(...) as well. Right now it's skewing the stack should the remove(...) complete the promise.

This comment has been minimized.

@normanmaurer

normanmaurer Jan 17, 2019

Author Member

Not sure I understand... can you show me some code ?

This comment has been minimized.

@rkapsi

rkapsi Jan 17, 2019

Member

Something like that...

ChannelPromise writePromise = ctx.newPromise();
ctx.write(queue.remove(0, writePromise), writePromise)
    .addListener(this);

Otherwise if queue.remove(0, writePromise) was to complete the promise it'd potentially notify the listener first, followed by calling ctx.write(...). Stack (traces) will look strange. Not sure there would be other side effects due to the order in which things execute but I think you want the listener to get notified after the write call returns.

This comment has been minimized.

@normanmaurer

normanmaurer Jan 17, 2019

Author Member

Got it... I would like to investigate as a followup here. Trying to keep changes as minimal as possible atm

@normanmaurer

This comment has been minimized.

Copy link
Member Author

normanmaurer commented Jan 18, 2019

@netty-bot test this please

1 similar comment
@normanmaurer

This comment has been minimized.

Copy link
Member Author

normanmaurer commented Jan 18, 2019

@netty-bot test this please

normanmaurer added some commits Jan 17, 2019

Ensure FlowControlled data frames will be correctly removed from the …
…flow-controller when a write error happens.

Motivation:

When a write error happens during writing of flowcontrolled data frames we miss to correctly detect this in the write loop which may result in an infinite loop as we will never detect that the frame should be removed from the queue.

Modifications:

- When we fail a flowcontrolled data frame we ensure that the next frame.write(...) call will signal back that the whole frame was handled and so can be removed.
- Add unit test.

Result:

Fixes #8707.

@normanmaurer normanmaurer force-pushed the http2_flow_controller_write_error branch from 3d2cff4 to 5458a2f Jan 18, 2019

@normanmaurer normanmaurer merged commit dae5d9d into 4.1 Jan 19, 2019

4 checks passed

pull request validation (centos6-java11) Build finished.
Details
pull request validation (centos6-java12) Build finished.
Details
pull request validation (centos6-java8) Build finished.
Details
pull request validation (centos6-java9) Build finished.
Details

@normanmaurer normanmaurer deleted the http2_flow_controller_write_error branch Jan 19, 2019

normanmaurer added a commit that referenced this pull request Jan 19, 2019

Ensure FlowControlled data frames will be correctly removed from the … (
#8726)

Motivation:

When a write error happens during writing of flowcontrolled data frames we miss to correctly detect this in the write loop which may result in an infinite loop as we will never detect that the frame should be removed from the queue.

Modifications:

- When we fail a flowcontrolled data frame we ensure that the next frame.write(...) call will signal back that the whole frame was handled and so can be removed.
- Add unit test.

Result:

Fixes #8707.
@carl-mastrangelo
Copy link
Member

carl-mastrangelo left a comment

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment