netty: http2 server transport graceful shutdown sends 2 GOAWAYs #4227

dapengzhang0 · 2018-03-15T21:43:22Z

resolves #3442

ejona86 · 2018-03-21T15:54:20Z

netty/src/main/java/io/grpc/netty/NettyServerHandler.java

              decoder().flowController().initialWindowSize(connection().connectionStream())));
        }
+      } else if (data == MAX_CONNECTION_AGE_PING) {
+        checkNotNull(maxAgeShutdownRunner, "maxAgeShutdownRunner");


We should not throw a normal runtime exception in response to something received (we could throw an Http2Exception, though). Logging a warning would be fine.

It will never throw a RuntimeException unless there's a bug in the code. If there's a bug in the code, it will throw an NPE anyway even if checkNotNull is not there. The checkNotNull here is just an assert that works without the -ea flag. checkNotNull and assert can help find a bug, but logging a warning may not be that helpful. There are checkNotNull calls even in the constructors of Http2Exception.

Then make it an assert or manually throw new AssertionError(). checkNotNull is best used when verifying your input, and telling the caller they made a mistake.

ejona86 · 2018-03-21T15:55:16Z

netty/src/main/java/io/grpc/netty/NettyServerHandler.java

+  @CheckForNull
+  private GracefulShutdownRunner maxAgeShutdownRunner;
+  @CheckForNull
+  private GracefulShutdownRunner maxIdleShutdownRunner;


Don't use two separate flows for the two different causes for shutdown. It doesn't matter why we are shutting down. If we're already shutting down and we want to shut down again, then do nothing.
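
The single-flow idea can be sketched without Netty. This is a minimal illustration, not the code in this PR: `ShutdownTracker`, its fields, and the reason strings are hypothetical names, and the counter merely stands in for writing a GOAWAY frame.

```java
/** Hypothetical sketch: one shutdown flow regardless of cause. */
final class ShutdownTracker {
  private String goAwayMessage; // non-null once a shutdown has started
  int goAwaysSent;              // exposed only so the sketch can be checked

  /** Starts a graceful shutdown; returns false if one is already in progress. */
  boolean startGracefulShutdown(String reason) {
    if (goAwayMessage != null) {
      return false; // already shutting down: the cause of the second request is irrelevant
    }
    goAwayMessage = reason;
    goAwaysSent++; // stands in for writing the first GOAWAY
    return true;
  }

  String goAwayMessage() {
    return goAwayMessage;
  }
}
```

With one piece of state, a max-age trigger followed by a max-idle trigger produces a single shutdown, and the second call is a no-op.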

ejona86 · 2018-03-21T16:00:03Z

netty/src/main/java/io/grpc/netty/NettyServerHandler.java


+  private final class GracefulShutdownRunner {
+
+    ChannelHandlerContext ctx;


I see no reason to save ctx here. Pass it in explicitly to each method.

ejona86 · 2018-03-21T16:00:52Z

netty/src/main/java/io/grpc/netty/NettyServerHandler.java

+  private final class GracefulShutdownRunner {
+
+    ChannelHandlerContext ctx;
+    long payload;


Use the same id for all cases.

ejona86 · 2018-03-21T16:05:57Z

netty/src/main/java/io/grpc/netty/NettyServerHandler.java

    }
  }

+  private final class GracefulShutdownRunner {


Why is this a "runner"? It seems like run is just being used as a generic "do" method. Instead, give it a better name, like start (because it won't actually finish when it returns) or sendGoAway.

Even more though, I think the run() shouldn't be on this class. Instead, make a method directly in NettyServerHandler like gracefulShutdown or startGracefulShutdown. It would issue the initial GOAWAY and save state (this class) for later.

Removed Runner from the name. Renamed run() to start().
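
A rough sketch of the shape suggested above: a `startGracefulShutdown` method on the handler that issues the first GOAWAY and saves state for later. Everything here is an assumption for illustration (`TwoPhaseShutdown`, the frame strings, `onStreamCreated`); frames are recorded in a list rather than written to a real connection. The first GOAWAY advertising the maximum stream id follows the graceful shutdown recommendation in RFC 7540, section 6.8.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a two-phase graceful shutdown; not the gRPC implementation. */
final class TwoPhaseShutdown {
  /** Records frames instead of writing to a real connection. */
  final List<String> frames = new ArrayList<>();
  private GracefulShutdown gracefulShutdown; // saved state; null until started
  private int lastStreamId;

  /** State kept between the first and the second GOAWAY. */
  private static final class GracefulShutdown {
    final String goAwayMessage;
    boolean secondGoAwaySent;

    GracefulShutdown(String goAwayMessage) {
      this.goAwayMessage = goAwayMessage;
    }
  }

  void onStreamCreated(int streamId) {
    lastStreamId = Math.max(lastStreamId, streamId);
  }

  /** Issues the first GOAWAY and a ping, then returns before shutdown has finished. */
  void startGracefulShutdown(String goAwayMessage) {
    if (gracefulShutdown != null) {
      return; // already shutting down
    }
    gracefulShutdown = new GracefulShutdown(goAwayMessage);
    // The first GOAWAY still allows streams that are already in flight.
    frames.add("GOAWAY lastStreamId=" + Integer.MAX_VALUE + " " + goAwayMessage);
    frames.add("PING");
  }

  /** Runs on ping ack or ping timeout, whichever comes first. */
  void secondGoAwayAndClose() {
    if (gracefulShutdown == null || gracefulShutdown.secondGoAwaySent) {
      return;
    }
    gracefulShutdown.secondGoAwaySent = true;
    frames.add("GOAWAY lastStreamId=" + lastStreamId + " " + gracefulShutdown.goAwayMessage);
  }
}
```

The method is named `start...` because it returns before the shutdown completes; the second phase is driven later by the ping ack or by a timer.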

ejona86 · 2018-03-21T22:30:35Z

netty/src/main/java/io/grpc/netty/NettyServerHandler.java

+          Http2Error.NO_ERROR.code(),
+          ByteBufUtil.writeAscii(ctx.alloc(), goAwayMessage),
+          ctx.newPromise());
+      ctx.flush();


This flush is unnecessary. Another is following it.

ejona86 · 2018-03-21T22:32:55Z

netty/src/main/java/io/grpc/netty/NettyServerHandler.java

+          TimeUnit.NANOSECONDS);
+
+      encoder().writePing(ctx, false /* isAck */, payload, ctx.newPromise());
+      ctx.flush();


I'd prefer the caller do the flush. The only time explicit flushes are necessary here is for writes due to timers. For writes caused by reads, we do flush in onReadComplete. For writes coming from the application, it ends up flushing.
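
The flush rule described here can be modelled with a toy buffered channel. This is only a sketch under stated assumptions: `BufferedChannel` and its methods are hypothetical and do not mirror Netty's API; they just show why a timer-triggered write needs its own flush while a read-triggered write does not.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: writes are buffered until someone flushes. */
final class BufferedChannel {
  private final List<String> pending = new ArrayList<>();
  final List<String> onWire = new ArrayList<>();

  void write(String frame) {
    pending.add(frame);
  }

  void flush() {
    onWire.addAll(pending);
    pending.clear();
  }

  /** Read path: the handler only writes; it does not flush. */
  void onRead(String frame) {
    write("ACK " + frame);
  }

  /** Read path: one flush covers every write made while handling the read. */
  void onReadComplete() {
    flush();
  }

  /** Timer path: nothing else will flush, so the timer task must do it. */
  void onTimerFired() {
    write("GOAWAY");
    flush();
  }
}
```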

ejona86 · 2018-03-21T22:35:43Z

netty/src/main/java/io/grpc/netty/NettyServerHandler.java

+
+      long gracefulShutdownPingTimeout = GRACEFUL_SHUTDOWN_PING_TIMEOUT_NANOS;
+      if (graceTimeInNanos != null) {
+        deadline = ticker.read() + graceTimeInNanos;


Count grace time starting at secondGoAwayAndClose(), since we'll still be accepting new requests until that point.

With that change, it also seems the ticker would no longer be necessary.
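
To make the point about the ticker concrete: if the grace time is counted from the second GOAWAY, it is a plain duration and no clock needs to be read. A minimal sketch, assuming a nullable `graceTimeInNanos` as in the diff; the class name, method name, and default parameter are hypothetical.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: the grace period is measured from the second GOAWAY. */
final class GraceTimeout {
  /**
   * Returns the close timeout to apply at the moment the second GOAWAY is sent.
   * No clock is consulted: the grace time is a duration, not a deadline.
   */
  static long timeoutMillisAtSecondGoAway(Long graceTimeInNanos, long defaultTimeoutMillis) {
    if (graceTimeInNanos == null) {
      return defaultTimeoutMillis; // no explicit grace time configured
    }
    return TimeUnit.NANOSECONDS.toMillis(graceTimeInNanos);
  }
}
```

By contrast, a deadline fixed at the first GOAWAY has to be converted back to a remaining duration later, which is what required `ticker.read()`.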

ejona86 · 2018-03-21T22:36:38Z

netty/src/main/java/io/grpc/netty/NettyServerHandler.java

+          Http2Error.NO_ERROR.code(),
+          ByteBufUtil.writeAscii(ctx.alloc(), goAwayMessage),
+          ctx.newPromise());
+      ctx.flush();


Ditto, remove. You only need the flush in the gracefulShutdownPingTimeout handling (not when receiving the ping).

ejona86 · 2018-03-26T23:56:50Z

netty/src/main/java/io/grpc/netty/NettyServerHandler.java

+      long savedGracefulShutdownTimeMillis = gracefulShutdownTimeoutMillis();
+      long gracefulShutdownTimeoutMillis = savedGracefulShutdownTimeMillis;
+      if (graceTimeInNanos != null) {
+        gracefulShutdownTimeoutMillis = TimeUnit.NANOSECONDS.toMillis(deadline - ticker.read());


Why is this consulting deadline? I thought we agreed that graceTimeInNanos would be relative to the second GOAWAY.

This was a partial fix. Removing ticker is in the upcoming commit.

dapengzhang0 · 2018-03-27T00:28:28Z

@ejona86 Thanks for the review. PTAL.

ejona86 · 2018-03-28T21:13:19Z

netty/src/main/java/io/grpc/netty/NettyServerHandler.java

              decoder().flowController().initialWindowSize(connection().connectionStream())));
        }
+      } else if (data == GRACEFUL_SHUTDOWN_PING) {
+        if (gracefulShutdown == null) {


Oh, I remember why I put that comment. This is doing a check based on what is received. In your earlier response you said:

It will never throw a RuntimeException unless there's a bug in the code.

But the bug could be in the remote code, not in this code. That's why I had suggested a warning.
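
A sketch of the warning approach, using only `java.util.logging`. The names mirror the diff where possible (`GRACEFUL_SHUTDOWN_PING`, `gracefulShutdown`), but the payload value, the class, and the `warnings` counter are assumptions made for illustration.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical sketch: tolerate an unexpected ping ack from the peer. */
final class PingAckHandler {
  private static final Logger logger = Logger.getLogger(PingAckHandler.class.getName());
  // Arbitrary payload chosen for this sketch only.
  static final long GRACEFUL_SHUTDOWN_PING = 0x1234L;

  private Runnable gracefulShutdown; // null unless a shutdown is in progress
  int warnings;                      // exposed only so the sketch can be checked

  void startGracefulShutdown(Runnable secondPhase) {
    gracefulShutdown = secondPhase;
  }

  /** Called for every ping ack received from the remote peer. */
  void onPingAckRead(long data) {
    if (data != GRACEFUL_SHUTDOWN_PING) {
      return; // some other ping; not handled in this sketch
    }
    if (gracefulShutdown == null) {
      // The peer acked a ping we never sent. That may be a remote bug,
      // so warn and carry on instead of throwing.
      warnings++;
      logger.log(Level.WARNING, "Received GRACEFUL_SHUTDOWN_PING ack but no shutdown is in progress");
      return;
    }
    gracefulShutdown.run();
  }
}
```

The null check here depends on data received from the peer, so a failure does not prove a local bug; that is the difference from an internal invariant, where an assertion would be appropriate.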

netty: http2 server transport graceful shutdown

c387c62

dapengzhang0 force-pushed the graceful branch from f123222 to c387c62 Compare March 16, 2018 20:45

dapengzhang0 requested a review from ejona86 March 16, 2018 21:24

dapengzhang0 assigned ejona86 Mar 19, 2018

ejona86 reviewed Mar 21, 2018

partially fix comments

915b638

ejona86 reviewed Mar 26, 2018

dapengzhang0 added 2 commits March 26, 2018 17:21

remove ticker

4102267

assertion

0f9e258

ejona86 reviewed Mar 28, 2018

log warning instead of throw

6117951

ejona86 approved these changes Mar 28, 2018

dapengzhang0 added the kokoro:force-run label Mar 28, 2018

kokoro-team removed the kokoro:force-run label Mar 28, 2018

dapengzhang0 merged commit bdecdae into grpc:master Mar 28, 2018

dapengzhang0 deleted the graceful branch May 2, 2018 17:14

lock bot locked as resolved and limited conversation to collaborators Jan 18, 2019


3 participants