-
Notifications
You must be signed in to change notification settings - Fork 3.9k
netty: http2 server transport graceful shutdown sends 2 GOAWAYs #4227
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| decoder().flowController().initialWindowSize(connection().connectionStream()))); | ||
| } | ||
| } else if (data == MAX_CONNECTION_AGE_PING) { | ||
| checkNotNull(maxAgeShutdownRunner, "maxAgeShutdownRunner"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should not throw a normal runtime exception in response to something received (we could throw a Http2Exception though). Logging a warning would be fine though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will never throw a RuntimeException unless there's a bug in the code. If there's a bug in the code, it will throw NPE anyway even if checkNotNull is not there. The checkNotNull here is just an assert without -ea flag. checkNotNull and assert can help find a bug, but logging a warning may not be that helpful. There are checkNotNull even in the constructors of Http2Exception.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then make it an assert or manually throw new AssertionError(). checkNotNull is best used when verifying your input, and telling the caller they made a mistake.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
| @CheckForNull | ||
| private GracefulShutdownRunner maxAgeShutdownRunner; | ||
| @CheckForNull | ||
| private GracefulShutdownRunner maxIdleShutdownRunner; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't use two separate flows for the two different causes for shutdown. It doesn't matter why we are shutting down. If we're already shutting down and we want to shutdown again, then do nothing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
|
|
||
| private final class GracefulShutdownRunner { | ||
|
|
||
| ChannelHandlerContext ctx; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see no reason to save ctx here. Pass it in explicitly to each method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
| private final class GracefulShutdownRunner { | ||
|
|
||
| ChannelHandlerContext ctx; | ||
| long payload; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use the same id for all cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
| } | ||
| } | ||
|
|
||
| private final class GracefulShutdownRunner { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this a "runner"? It seems like run is just being used as a generic "do" method. Instead, give it a better name, like start (because it won't actually finish when it returns) or sendGoAway.
Even more though, I think the run() shouldn't be on this class. Instead, make a method directly in NettyServerHandler like gracefulShutdown or startGracefulShutdown. It would issue the initial GOAWAY and save state (this class) for later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed Runner in name. Renamed run() by start().
| Http2Error.NO_ERROR.code(), | ||
| ByteBufUtil.writeAscii(ctx.alloc(), goAwayMessage), | ||
| ctx.newPromise()); | ||
| ctx.flush(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This flush is unnecessary. Another is following it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
| TimeUnit.NANOSECONDS); | ||
|
|
||
| encoder().writePing(ctx, false /* isAck */, payload, ctx.newPromise()); | ||
| ctx.flush(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer the caller do the flush. The only time explicit flushes are necessary here is for writes due to timers. For writes caused by reads, we do flush in onReadComplete. For writes coming from the application, it ends up flushing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
|
|
||
| long gracefulShutdownPingTimeout = GRACEFUL_SHUTDOWN_PING_TIMEOUT_NANOS; | ||
| if (graceTimeInNanos != null) { | ||
| deadline = ticker.read() + graceTimeInNanos; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Count grace time starting at secondGoAwayAndClose(), since we'll still be accepting new requests until that point.
With that change, it also seems the ticker would no longer be necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
| Http2Error.NO_ERROR.code(), | ||
| ByteBufUtil.writeAscii(ctx.alloc(), goAwayMessage), | ||
| ctx.newPromise()); | ||
| ctx.flush(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto remove. You only need the flush in the gracefulShutdownPingTimeout handling (not for the ping receiving).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
| long savedGracefulShutdownTimeMillis = gracefulShutdownTimeoutMillis(); | ||
| long gracefulShutdownTimeoutMillis = savedGracefulShutdownTimeMillis; | ||
| if (graceTimeInNanos != null) { | ||
| gracefulShutdownTimeoutMillis = TimeUnit.NANOSECONDS.toMillis(deadline - ticker.read()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this consulting deadline? I thought we agreed that graceTimeInNanos would be relative to the second GOAWAY.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was a partial fix. Removing ticker is in the upcoming commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
|
@ejona86 Thanks for the review. PTAL. |
| decoder().flowController().initialWindowSize(connection().connectionStream()))); | ||
| } | ||
| } else if (data == GRACEFUL_SHUTDOWN_PING) { | ||
| if (gracefulShutdown == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I remember why I put that comment. This is doing a check based on what is received. In your earlier response you said:
It will never throw a RuntimeException unless there's a bug in the code.
But the bug could be in the remote code, not in this code. That's why I had suggested a warning.
resolves #3442