Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unexpected exception: io.grpc.StatusRuntimeException: UNKNOWN: channel closed #7376

Closed
ahjohannessen opened this issue Aug 28, 2020 · 10 comments · Fixed by #7476
Closed

Unexpected exception: io.grpc.StatusRuntimeException: UNKNOWN: channel closed #7376

ahjohannessen opened this issue Aug 28, 2020 · 10 comments · Fixed by #7476
Assignees
Labels
Milestone

Comments

@ahjohannessen
Copy link

ahjohannessen commented Aug 28, 2020

What version of gRPC-Java are you using?

  • grpc-java: 1.30.1
  • netty-tcnative-boringssl-static: 2.0.30.Final

What is your environment?

  • OS: Mac OS X
  • JDK: jdk1.8.0_261 (Oracle)

What did you expect to see?

  • io.grpc.StatusRuntimeException: UNAVAILABLE: channel closed or similar.

What did you see instead?

io.grpc.StatusRuntimeException: UNKNOWN: channel closed

Steps to reproduce the bug

Not sure how to reproduce it, but if I kill the server I get the following.

Channel Pipeline: [SslHandler#0, ProtocolNegotiators$ClientTlsHandler#0, WriteBufferingAndExceptionHandler#0, DefaultChannelPipeline$TailContext#0]
	at io.grpc.Status.asRuntimeException(Status.java:533)
	... <user stack redacted>
Caused by: java.nio.channels.ClosedChannelException
	at io.grpc.netty.Utils.statusFromThrowable(Utils.java:266)
	at io.grpc.netty.WriteBufferingAndExceptionHandler.exceptionCaught(WriteBufferingAndExceptionHandler.java:88)
	at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:302)
	at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:281)
	at io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:273)
	at io.grpc.netty.ProtocolNegotiators$ClientTlsHandler.userEventTriggered0(ProtocolNegotiators.java:375)
	at io.grpc.netty.ProtocolNegotiators$ProtocolNegotiationHandler.userEventTriggered(ProtocolNegotiators.java:741)
	at io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:346)
	at io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:332)
	at io.netty.channel.AbstractChannelHandlerContext.fireUserEventTriggered(AbstractChannelHandlerContext.java:324)
	at io.netty.handler.ssl.SslUtils.handleHandshakeFailure(SslUtils.java:347)
	at io.netty.handler.ssl.SslHandler.setHandshakeFailure(SslHandler.java:1805)
	at io.netty.handler.ssl.SslHandler.channelInactive(SslHandler.java:1079)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1405)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248)
	at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:901)
	at io.netty.channel.AbstractChannel$AbstractUnsafe$8.run(AbstractChannel.java:818)
	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:497)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	... 1 more
Caused by: java.nio.channels.ClosedChannelException
	at io.netty.handler.ssl.SslHandler.channelInactive(SslHandler.java:1076)
	... 15 more

This happens before the exception:

DEBUG i.g.n.NettyClientHandler - [id: 0xd9b53be2, L:/127.0.0.1:59629 - R:/127.0.0.1:2113] INBOUND PING: ack=true bytes=1234
DEBUG i.g.n.NettyClientHandler - [id: 0xd9b53be2, L:/127.0.0.1:59629 - R:/127.0.0.1:2113] OUTBOUND HEADERS: streamId=51 headers=GrpcHttp2OutboundHeaders[:authority: 127.0.0.1:2113, :path: /event_store.client.streams.Streams/Read, :method: POST, :scheme: https, content-type: application/grpc, te: trailers, user-agent: grpc-java-netty/1.30.1, authorization: Basic YWRtaW46Y2hhbmdlaXQ=, connection-name: sec, requires-leader: false, grpc-accept-encoding: gzip] streamDependency=0 weight=16 exclusive=false padding=0 endStream=false
DEBUG i.g.n.NettyClientHandler - [id: 0xd9b53be2, L:/127.0.0.1:59629 - R:/127.0.0.1:2113] OUTBOUND DATA: streamId=51 padding=0 endStream=true length=19 bytes=000000000e0a0c12020a00281e42004a020a00
DEBUG i.g.n.NettyClientHandler - [id: 0xd9b53be2, L:/127.0.0.1:59629 - R:/127.0.0.1:2113] INBOUND HEADERS: streamId=51 headers=GrpcHttp2ResponseHeaders[:status: 200, date: Fri, 28 Aug 2020 10:40:53 GMT, content-type: application/grpc, server: Kestrel, grpc-encoding: identity] padding=0 endStream=false
DEBUG i.g.n.NettyClientHandler - [id: 0xd9b53be2, L:/127.0.0.1:59629 - R:/127.0.0.1:2113] INBOUND DATA: streamId=51 padding=0 endStream=false length=203 bytes=00000000c60ac3010abd010a170a15089a81e9ecb5a995fe6b10a5a2f092aa84d6c8ab01120f1a0d242424757365722d61646d696e20e70128e70132110a0474...
DEBUG i.g.n.NettyClientHandler - [id: 0xd9b53be2, L:/127.0.0.1:59629 - R:/127.0.0.1:2113] OUTBOUND PING: ack=false bytes=1234
DEBUG i.g.n.NettyClientHandler - [id: 0xd9b53be2, L:/127.0.0.1:59629 - R:/127.0.0.1:2113] INBOUND DATA: streamId=51 padding=0 endStream=false length=202 bytes=00000000c50ac2010abc010a180a1608d58bb1c885b7e7cecf0110d6c8b0bde2f1bc9a8801120d1a0b242424757365722d6f7073208f03288f0332110a047479...
DEBUG i.g.n.NettyClientHandler - [id: 0xd9b53be2, L:/127.0.0.1:59629 - R:/127.0.0.1:2113] INBOUND DATA: streamId=51 padding=0 endStream=false length=221 bytes=00000000d80ad5010acf010a180a1608a882a1cdfb95d0e6c00110beaa98efa59eb1ffaf01120e1a0c24242473636176656e67657320b50428b50432110a0474...
DEBUG i.g.n.NettyClientHandler - [id: 0xd9b53be2, L:/127.0.0.1:59629 - R:/127.0.0.1:2113] INBOUND DATA: streamId=51 padding=0 endStream=false length=331 bytes=00000001460ac3020abd020a170a1508ef8595a6e2eddead7810a5c0d38bc7f4f7a09d01120d1a0b24757365722d61646d696e20ee0528ee0532140a04747970...
DEBUG i.g.n.NettyClientHandler - [id: 0xd9b53be2, L:/127.0.0.1:59629 - R:/127.0.0.1:2113] INBOUND DATA: streamId=51 padding=0 endStream=false length=322 bytes=000000013d0aba020ab4020a180a1608c68af1ebbde0d8bed1011086ae8fa7cbbf9db48101120b1a0924757365722d6f707320950828950832140a0474797065...
DEBUG i.g.n.NettyClientHandler - [id: 0xd9b53be2, L:/127.0.0.1:59629 - R:/127.0.0.1:2113] INBOUND DATA: streamId=51 padding=0 endStream=false length=149 bytes=00000000900a8d010a87010a170a1508ac82bde9f1b187ff0810d7b3c1f0b4d4cbcc9d0112081a0624757365727320b20a28b20a320d0a047479706512052455...
DEBUG i.g.n.NettyClientHandler - [id: 0xd9b53be2, L:/127.0.0.1:59629 - R:/127.0.0.1:2113] INBOUND DATA: streamId=51 padding=0 endStream=false length=150 bytes=00000000910a8e010a88010a180a16089188f1cbd1b2b6f8ee0110d9e4a2f08ecdfbec9c0112081a062475736572731801209c0b289c0b320d0a047479706512...
DEBUG i.g.n.NettyClientHandler - [id: 0xd9b53be2, L:/127.0.0.1:59629 - R:/127.0.0.1:2113] INBOUND HEADERS: streamId=51 headers=GrpcHttp2ResponseHeaders[grpc-status: 0] padding=0 endStream=true
DEBUG i.g.n.NettyClientHandler - [id: 0xd9b53be2, L:/127.0.0.1:59629 - R:/127.0.0.1:2113] INBOUND PING: ack=true bytes=1234
DEBUG i.g.n.NettyClientHandler - [id: 0xd9b53be2, L:/127.0.0.1:59629 - R:/127.0.0.1:2113] OUTBOUND HEADERS: streamId=53 headers=GrpcHttp2OutboundHeaders[:authority: 127.0.0.1:2113, :path: /event_store.client.gossip.Gossip/Read, :method: POST, :scheme: https, content-type: application/grpc, te: trailers, user-agent: grpc-java-netty/1.30.1, authorization: Basic YWRtaW46Y2hhbmdlaXQ=, connection-name: sec, requires-leader: false, grpc-accept-encoding: gzip] streamDependency=0 weight=16 exclusive=false padding=0 endStream=false
DEBUG i.g.n.NettyClientHandler - [id: 0xd9b53be2, L:/127.0.0.1:59629 - R:/127.0.0.1:2113] OUTBOUND DATA: streamId=53 padding=0 endStream=true length=5 bytes=0000000000
DEBUG i.g.n.NettyClientHandler - [id: 0xd9b53be2, L:/127.0.0.1:59629 - R:/127.0.0.1:2113] INBOUND HEADERS: streamId=53 headers=GrpcHttp2ResponseHeaders[:status: 200, date: Fri, 28 Aug 2020 10:40:53 GMT, content-type: application/grpc, server: Kestrel, grpc-encoding: identity] padding=0 endStream=false
DEBUG i.g.n.NettyClientHandler - [id: 0xd9b53be2, L:/127.0.0.1:59629 - R:/127.0.0.1:2113] INBOUND DATA: streamId=53 padding=0 endStream=false length=61 bytes=00000000380a360a170a15089d9bc5e4849ec5973710d0cb97d3aeb7dfb99e0110b4acb8ff8ae9b21c180820012a0e0a093132372e302e302e3110c110
DEBUG i.g.n.NettyClientHandler - [id: 0xd9b53be2, L:/127.0.0.1:59629 - R:/127.0.0.1:2113] OUTBOUND PING: ack=false bytes=1234
DEBUG i.g.n.NettyClientHandler - [id: 0xd9b53be2, L:/127.0.0.1:59629 - R:/127.0.0.1:2113] INBOUND HEADERS: streamId=53 headers=GrpcHttp2ResponseHeaders[grpc-status: 0] padding=0 endStream=true
DEBUG i.g.n.NettyClientHandler - [id: 0xd9b53be2, L:/127.0.0.1:59629 - R:/127.0.0.1:2113] INBOUND PING: ack=true bytes=1234
DEBUG i.g.n.NettyClientHandler - [id: 0xd9b53be2, L:/127.0.0.1:59629 - R:/127.0.0.1:2113] INBOUND GO_AWAY: lastStreamId=53 errorCode=0 length=0 bytes=
@ahjohannessen
Copy link
Author

ahjohannessen commented Aug 28, 2020

If I turn off TLS and use .usePlaintext() I do not get that ClosedChannelException, but rather something that I would expect UNAVAILABLE: Network closed for unknown reason or UNAVAILABLE: io exception.

On jdk1.8.0_261:

DEBUG i.g.n.NettyClientHandler - [id: 0x45359a81, L:/127.0.0.1:60334 - R:/127.0.0.1:2113] INBOUND PING: ack=true bytes=1234
DEBUG i.g.n.NettyClientHandler - [id: 0x45359a81, L:/127.0.0.1:60334 - R:/127.0.0.1:2113] OUTBOUND HEADERS: streamId=21 headers=GrpcHttp2OutboundHeaders[:authority: 127.0.0.1:2113, :path: /event_store.client.gossip.Gossip/Read, :method: POST, :scheme: http, content-type: application/grpc, te: trailers, user-agent: grpc-java-netty/1.30.1, authorization: Basic YWRtaW46Y2hhbmdlaXQ=, connection-name: sec, requires-leader: false, grpc-accept-encoding: gzip] streamDependency=0 weight=16 exclusive=false padding=0 endStream=false
DEBUG i.g.n.NettyClientHandler - [id: 0x45359a81, L:/127.0.0.1:60334 - R:/127.0.0.1:2113] OUTBOUND DATA: streamId=21 padding=0 endStream=true length=5 bytes=0000000000
DEBUG i.g.n.NettyClientHandler - [id: 0x45359a81, L:/127.0.0.1:60334 - R:/127.0.0.1:2113] INBOUND HEADERS: streamId=21 headers=GrpcHttp2ResponseHeaders[:status: 200, date: Fri, 28 Aug 2020 11:52:44 GMT, content-type: application/grpc, server: Kestrel, grpc-encoding: identity] padding=0 endStream=false
DEBUG i.g.n.NettyClientHandler - [id: 0x45359a81, L:/127.0.0.1:60334 - R:/127.0.0.1:2113] INBOUND DATA: streamId=21 padding=0 endStream=false length=61 bytes=00000000380a360a170a1508c299f9d59baaa5e00b10c289aba6dfeef5bcb70110c3a0ebdfadeab21c180820012a0e0a093132372e302e302e3110c110
DEBUG i.g.n.NettyClientHandler - [id: 0x45359a81, L:/127.0.0.1:60334 - R:/127.0.0.1:2113] OUTBOUND PING: ack=false bytes=1234
DEBUG i.g.n.NettyClientHandler - [id: 0x45359a81, L:/127.0.0.1:60334 - R:/127.0.0.1:2113] INBOUND HEADERS: streamId=21 headers=GrpcHttp2ResponseHeaders[grpc-status: 0] padding=0 endStream=true
DEBUG i.g.n.NettyClientHandler - [id: 0x45359a81, L:/127.0.0.1:60334 - R:/127.0.0.1:2113] INBOUND PING: ack=true bytes=1234
DEBUG i.g.n.NettyClientHandler - [id: 0x45359a81, L:/127.0.0.1:60334 - R:/127.0.0.1:2113] INBOUND GO_AWAY: lastStreamId=21 errorCode=0 length=0 bytes=
DEBUG i.g.n.NettyClientHandler - [id: 0x43b5d62a, L:/127.0.0.1:60337 - R:/127.0.0.1:2113] OUTBOUND SETTINGS: ack=false settings={ENABLE_PUSH=0, MAX_CONCURRENT_STREAMS=0, INITIAL_WINDOW_SIZE=1048576, MAX_HEADER_LIST_SIZE=8192}
DEBUG i.g.n.NettyClientHandler - [id: 0x43b5d62a, L:/127.0.0.1:60337 - R:/127.0.0.1:2113] OUTBOUND WINDOW_UPDATE: streamId=0 windowSizeIncrement=983041
*** Killing the server ***
WARN  Demo - Gossip: Failed read, attempt 1 of 100, retrying in 200 milliseconds. Details: UNAVAILABLE: io exception // io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: /127.0.0.1:2113

When using jdk11, it first starts with

io.grpc.StatusRuntimeException: UNAVAILABLE: Network closed for unknown reason

then follows with

io.grpc.StatusRuntimeException: UNAVAILABLE: io exception

when status.getCause is not null.

@ahjohannessen
Copy link
Author

@ejona86 - please let me know if there is anything more I can do.

@ejona86
Copy link
Member

ejona86 commented Aug 28, 2020

io.grpc.StatusRuntimeException: UNKNOWN: channel closed

There should be a stacktrace along with that, for the ClosedChannelException. gRPC goes out of its way to collect a bit more information in hopes of figuring out what happened.

Channel Pipeline: [SslHandler#0, ProtocolNegotiators$ClientTlsHandler#0, WriteBufferingAndExceptionHandler#0, DefaultChannelPipeline$TailContext#0]

That is the end of the message. There should have been additional details before it. Was this an UNKNOWN status code as well? Looking at the code, it seems likely it was.

When we see "Channel Pipeline" in the failure, we know we didn't get a working connection. In this specific case, there was a failure in the middle of a TLS handshake. If this is the problem you are experiencing for the UNKNOWN, we can definitely look into this.

This happens before the exception:

Those frames just show an actual working connection.

DEBUG i.g.n.NettyClientHandler - [id: 0xd9b53be2, L:/127.0.0.1:59629 - R:/127.0.0.1:2113] INBOUND GO_AWAY: lastStreamId=53 errorCode=0 length=0 bytes=

So the connection received a GO_AWAY. Depending on details, if you see an error after this it could be that the replacement connection had trouble.


With a hacked server, I've reproduced an UNKNOWN failure caused by ClosedChannelException when the TLS handshake is incomplete (this matches the backtrace posted).

diff --git a/netty/src/main/java/io/grpc/netty/ProtocolNegotiators.java b/netty/src/main/java/io/grpc/netty/ProtocolNegotiators.java
index 27b2c74be..432fcf79b 100644
--- a/netty/src/main/java/io/grpc/netty/ProtocolNegotiators.java
+++ b/netty/src/main/java/io/grpc/netty/ProtocolNegotiators.java
@@ -170,12 +170,18 @@ final class ProtocolNegotiators {
     }
 
     @Override
-    public void handlerAdded(ChannelHandlerContext ctx) throws Exception {
+    public void handlerAdded(final ChannelHandlerContext ctx) throws Exception {
       super.handlerAdded(ctx);
       SSLEngine sslEngine = sslContext.newEngine(ctx.alloc());
       ctx.pipeline().addBefore(ctx.name(), /* name= */ null, this.executor != null
           ? new SslHandler(sslEngine, false, this.executor)
           : new SslHandler(sslEngine, false));
+      ctx.flush();
+      ctx.executor().execute(new Runnable() {
+        @Override public void run() {
+          ctx.close();
+        }
+      });
     }
 
     @Override
$ ./gradlew :grpc-interop-testing:installDist -PskipCodegen=true
$ ./run-test-server.sh &
$ ./run-test-client.sh
Running test empty_unary
Exception in thread "main" io.grpc.StatusRuntimeException: UNKNOWN: channel closed
Channel Pipeline: [SslHandler#0, ProtocolNegotiators$ClientTlsHandler#0, WriteBufferingAndExceptionHandler#0, DefaultChannelPipeline$TailContext#0]
	at io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:262)
	at io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:243)
	at io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:156)
	at io.grpc.testing.integration.TestServiceGrpc$TestServiceBlockingStub.emptyCall(TestServiceGrpc.java:627)
	at io.grpc.testing.integration.AbstractInteropTest.emptyUnary(AbstractInteropTest.java:379)
	at io.grpc.testing.integration.TestServiceClient.runTest(TestServiceClient.java:227)
	at io.grpc.testing.integration.TestServiceClient.run(TestServiceClient.java:215)
	at io.grpc.testing.integration.TestServiceClient.main(TestServiceClient.java:73)
Caused by: java.nio.channels.ClosedChannelException
	at io.grpc.netty.Utils.statusFromThrowable(Utils.java:266)
	at io.grpc.netty.WriteBufferingAndExceptionHandler.exceptionCaught(WriteBufferingAndExceptionHandler.java:88)
	at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:302)
	at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:281)
	at io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:273)
	at io.grpc.netty.ProtocolNegotiators$ClientTlsHandler.userEventTriggered0(ProtocolNegotiators.java:381)
	at io.grpc.netty.ProtocolNegotiators$ProtocolNegotiationHandler.userEventTriggered(ProtocolNegotiators.java:747)
	at io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:346)
	at io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:332)
	at io.netty.channel.AbstractChannelHandlerContext.fireUserEventTriggered(AbstractChannelHandlerContext.java:324)
	at io.netty.handler.ssl.SslUtils.handleHandshakeFailure(SslUtils.java:347)
	at io.netty.handler.ssl.SslHandler.setHandshakeFailure(SslHandler.java:1850)
	at io.netty.handler.ssl.SslHandler.channelInactive(SslHandler.java:1108)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1405)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248)
	at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:901)
	at io.netty.channel.AbstractChannel$AbstractUnsafe$8.run(AbstractChannel.java:818)
	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:497)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.nio.channels.ClosedChannelException
	at io.netty.handler.ssl.SslHandler.channelInactive(SslHandler.java:1105)
	... 15 more

That backtrace is strange because AbstractChannel:818 is part of deregister(). It'll need some more poking.

@ejona86 ejona86 added this to the Next milestone Aug 28, 2020
@ejona86 ejona86 added the bug label Aug 28, 2020
@ahjohannessen
Copy link
Author

Additional context:

INFO  Demo - gossip.read: ClusterInfo:
 ✔ Follower 127.0.0.1:2114 2020-08-28T17:22:40Z d6a777df-1a8a-459f-8990-ea221d4262c5
 ✔ Follower 127.0.0.1:2115 2020-08-28T17:22:40Z a020a0f6-42df-42e0-8f54-4ab4d6f31d27
 ✔ Leader   127.0.0.1:2116 2020-08-28T17:22:40Z b9929481-d581-4ab9-9dd7-b1ca247ddc9d
DEBUG i.g.n.NettyClientHandler - [id: 0x417b47c4, L:/127.0.0.1:64935 - R:/127.0.0.1:2116] INBOUND PING: ack=true bytes=1234
2020-08-28 18:22:41.950+0100 debug [NettyClientHandler] Window: 1048576
DEBUG i.g.n.NettyClientHandler - [id: 0x913dbab0, L:/127.0.0.1:64933 - R:/127.0.0.1:2116] OUTBOUND HEADERS: streamId=5 headers=GrpcHttp2OutboundHeaders[:authority: es.sec.local, :path: /event_store.client.gossip.Gossip/Read, :method: POST, :scheme: https, content-type: application/grpc, te: trailers, user-agent: grpc-java-netty/1.30.1, authorization: Basic YWRtaW46Y2hhbmdlaXQ=, connection-name: sec, requires-leader: false, grpc-accept-encoding: gzip] streamDependency=0 weight=16 exclusive=false padding=0 endStream=false
DEBUG i.g.n.NettyClientHandler - [id: 0x913dbab0, L:/127.0.0.1:64933 - R:/127.0.0.1:2116] OUTBOUND DATA: streamId=5 padding=0 endStream=true length=5 bytes=0000000000
DEBUG i.g.n.NettyClientHandler - [id: 0x913dbab0, L:/127.0.0.1:64933 - R:/127.0.0.1:2116] INBOUND HEADERS: streamId=5 headers=GrpcHttp2ResponseHeaders[:status: 200, date: Fri, 28 Aug 2020 17:22:41 GMT, content-type: application/grpc, server: Kestrel, grpc-encoding: identity] padding=0 endStream=false
DEBUG i.g.n.NettyClientHandler - [id: 0x913dbab0, L:/127.0.0.1:64933 - R:/127.0.0.1:2116] INBOUND DATA: streamId=5 padding=0 endStream=false length=176 bytes=00000000ab0a370a180a1608b99585ac9d90a5c9b901109db9f7a3a2b9eceb9d0110dc91a18090f0b21c180820012a0e0a093132372e302e302e3110c4100a37...
DEBUG i.g.n.NettyClientHandler - [id: 0x913dbab0, L:/127.0.0.1:64933 - R:/127.0.0.1:2116] OUTBOUND PING: ack=false bytes=1234
DEBUG i.g.n.NettyClientHandler - [id: 0x913dbab0, L:/127.0.0.1:64933 - R:/127.0.0.1:2116] INBOUND HEADERS: streamId=5 headers=GrpcHttp2ResponseHeaders[grpc-status: 0] padding=0 endStream=true
DEBUG i.g.n.NettyClientHandler - [id: 0x913dbab0, L:/127.0.0.1:64933 - R:/127.0.0.1:2116] INBOUND PING: ack=true bytes=1234
2020-08-28 18:22:41.994+0100 debug [NettyClientHandler] Window: 1048576
2020-08-28 18:22:42.078+0100 debug [NettyClientHandler] Network channel is closed
2020-08-28 18:22:42.078+0100 debug [NettyClientHandler] Network channel is closed
2020-08-28 18:22:42.080+0100 debug [Utils] Using custom allocator: forceHeapBuffer=false, defaultPreferDirect=true
2020-08-28 18:22:42.080+0100 debug [Utils] Using custom allocator: forceHeapBuffer=false, defaultPreferDirect=true
2020-08-28 18:22:42.102+0100 debug [WriteBufferingAndExceptionHandler] Ignoring duplicate failure
io.grpc.StatusRuntimeException: UNAVAILABLE: Connection closed while performing protocol negotiation for [SslHandler#0, ProtocolNegotiators$ClientTlsHandler#0, WriteBufferingAndExceptionHandler#0, DefaultChannelPipeline$TailContext#0]
	at io.grpc.Status.asRuntimeException(Status.java:524)
	at io.grpc.netty.WriteBufferingAndExceptionHandler.channelInactive(WriteBufferingAndExceptionHandler.java:81)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241)
	at io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:386)
	at io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:351)
	at io.netty.handler.ssl.SslHandler.channelInactive(SslHandler.java:1084)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1405)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248)
	at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:901)
	at io.netty.channel.AbstractChannel$AbstractUnsafe$8.run(AbstractChannel.java:818)
	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:497)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
2020-08-28 18:22:42.102+0100 debug [WriteBufferingAndExceptionHandler] Ignoring duplicate failure
io.grpc.StatusRuntimeException: UNAVAILABLE: Connection closed while performing protocol negotiation for [SslHandler#0, ProtocolNegotiators$ClientTlsHandler#0, WriteBufferingAndExceptionHandler#0, DefaultChannelPipeline$TailContext#0]
	at io.grpc.Status.asRuntimeException(Status.java:524)
	at io.grpc.netty.WriteBufferingAndExceptionHandler.channelInactive(WriteBufferingAndExceptionHandler.java:81)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241)
	at io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:386)
	at io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:351)
	at io.netty.handler.ssl.SslHandler.channelInactive(SslHandler.java:1084)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1405)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248)
	at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:901)
	at io.netty.channel.AbstractChannel$AbstractUnsafe$8.run(AbstractChannel.java:818)
	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:497)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
WARN  Demo - Gossip: Failed read, attempt 1 of 100, retrying in 200 milliseconds. Details: Channel closed.
DEBUG i.g.n.NettyClientHandler - [id: 0x72a60c85, L:/127.0.0.1:64932 - R:/127.0.0.1:2115] OUTBOUND HEADERS: streamId=7 headers=GrpcHttp2OutboundHeaders[:authority: es.sec.local, :path: /event_store.client.gossip.Gossip/Read, :method: POST, :scheme: https, content-type: application/grpc, te: trailers, user-agent: grpc-java-netty/1.30.1, authorization: Basic YWRtaW46Y2hhbmdlaXQ=, connection-name: sec, requires-leader: false, grpc-accept-encoding: gzip] streamDependency=0 weight=16 exclusive=false padding=0 endStream=false
DEBUG i.g.n.NettyClientHandler - [id: 0x72a60c85, L:/127.0.0.1:64932 - R:/127.0.0.1:2115] OUTBOUND DATA: streamId=7 padding=0 endStream=true length=5 bytes=0000000000
DEBUG i.g.n.NettyClientHandler - [id: 0x72a60c85, L:/127.0.0.1:64932 - R:/127.0.0.1:2115] INBOUND HEADERS: streamId=7 headers=GrpcHttp2ResponseHeaders[:status: 200, date: Fri, 28 Aug 2020 17:22:42 GMT, content-type: application/grpc, server: Kestrel, grpc-encoding: identity] padding=0 endStream=false
DEBUG i.g.n.NettyClientHandler - [id: 0x72a60c85, L:/127.0.0.1:64932 - R:/127.0.0.1:2115] INBOUND DATA: streamId=7 padding=0 endStream=false length=174 bytes=00000000a90a350a180a1608b99585ac9d90a5c9b901109db9f7a3a2b9eceb9d01108988d58690f0b21c18082a0e0a093132372e302e302e3110c4100a370a18...
DEBUG i.g.n.NettyClientHandler - [id: 0x72a60c85, L:/127.0.0.1:64932 - R:/127.0.0.1:2115] OUTBOUND PING: ack=false bytes=1234
DEBUG i.g.n.NettyClientHandler - [id: 0x72a60c85, L:/127.0.0.1:64932 - R:/127.0.0.1:2115] INBOUND HEADERS: streamId=7 headers=GrpcHttp2ResponseHeaders[grpc-status: 0] padding=0 endStream=true
DEBUG i.g.n.NettyClientHandler - [id: 0x72a60c85, L:/127.0.0.1:64932 - R:/127.0.0.1:2115] INBOUND PING: ack=true bytes=1234
2020-08-28 18:22:42.200+0100 debug [NettyClientHandler] Window: 1048576
WARN  Demo - Gossip: Failed read, attempt 2 of 100, retrying in 200 milliseconds. Details: Channel closed.
2020-08-28 18:22:42.360+0100 debug [Utils] Using custom allocator: forceHeapBuffer=false, defaultPreferDirect=true
2020-08-28 18:22:42.362+0100 debug [Utils] Using custom allocator: forceHeapBuffer=false, defaultPreferDirect=true
DEBUG i.n.h.s.SslHandler - [id: 0x8c9993c1, L:/127.0.0.1:64940 - R:/127.0.0.1:2114] HANDSHAKEN: TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
DEBUG i.g.n.NettyClientHandler - [id: 0x8c9993c1, L:/127.0.0.1:64940 - R:/127.0.0.1:2114] OUTBOUND SETTINGS: ack=false settings={ENABLE_PUSH=0, MAX_CONCURRENT_STREAMS=0, INITIAL_WINDOW_SIZE=1048576, MAX_HEADER_LIST_SIZE=8192}
DEBUG i.g.n.NettyClientHandler - [id: 0x8c9993c1, L:/127.0.0.1:64940 - R:/127.0.0.1:2114] OUTBOUND WINDOW_UPDATE: streamId=0 windowSizeIncrement=983041
DEBUG i.g.n.NettyClientHandler - [id: 0x644e50db, L:/127.0.0.1:64934 - R:/127.0.0.1:2114] OUTBOUND HEADERS: streamId=5 headers=GrpcHttp2OutboundHeaders[:authority: es.sec.local, :path: /event_store.client.gossip.Gossip/Read, :method: POST, :scheme: https, content-type: application/grpc, te: trailers, user-agent: grpc-java-netty/1.30.1, authorization: Basic YWRtaW46Y2hhbmdlaXQ=, connection-name: sec, requires-leader: false, grpc-accept-encoding: gzip] streamDependency=0 weight=16 exclusive=false padding=0 endStream=false
DEBUG i.g.n.NettyClientHandler - [id: 0x644e50db, L:/127.0.0.1:64934 - R:/127.0.0.1:2114] OUTBOUND DATA: streamId=5 padding=0 endStream=true length=5 bytes=0000000000
DEBUG i.g.n.NettyClientHandler - [id: 0x8c9993c1, L:/127.0.0.1:64940 - R:/127.0.0.1:2114] INBOUND SETTINGS: ack=false settings={MAX_CONCURRENT_STREAMS=100, INITIAL_WINDOW_SIZE=98304, MAX_HEADER_LIST_SIZE=32768}
DEBUG i.g.n.NettyClientHandler - [id: 0x8c9993c1, L:/127.0.0.1:64940 - R:/127.0.0.1:2114] OUTBOUND SETTINGS: ack=true
DEBUG i.g.n.NettyClientHandler - [id: 0x8c9993c1, L:/127.0.0.1:64940 - R:/127.0.0.1:2114] INBOUND WINDOW_UPDATE: streamId=0 windowSizeIncrement=65537
DEBUG i.g.n.NettyClientHandler - [id: 0x8c9993c1, L:/127.0.0.1:64940 - R:/127.0.0.1:2114] INBOUND SETTINGS: ack=true
DEBUG i.n.h.s.SslHandler - [id: 0xfca6ab10, L:/127.0.0.1:64941 - R:/127.0.0.1:2115] HANDSHAKEN: TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
DEBUG i.g.n.NettyClientHandler - [id: 0xfca6ab10, L:/127.0.0.1:64941 - R:/127.0.0.1:2115] OUTBOUND SETTINGS: ack=false settings={ENABLE_PUSH=0, MAX_CONCURRENT_STREAMS=0, INITIAL_WINDOW_SIZE=1048576, MAX_HEADER_LIST_SIZE=8192}
DEBUG i.g.n.NettyClientHandler - [id: 0xfca6ab10, L:/127.0.0.1:64941 - R:/127.0.0.1:2115] OUTBOUND WINDOW_UPDATE: streamId=0 windowSizeIncrement=983041
DEBUG i.g.n.NettyClientHandler - [id: 0x644e50db, L:/127.0.0.1:64934 - R:/127.0.0.1:2114] INBOUND HEADERS: streamId=5 headers=GrpcHttp2ResponseHeaders[:status: 200, date: Fri, 28 Aug 2020 17:22:41 GMT, content-type: application/grpc, server: Kestrel, grpc-encoding: identity] padding=0 endStream=false
DEBUG i.g.n.NettyClientHandler - [id: 0x644e50db, L:/127.0.0.1:64934 - R:/127.0.0.1:2114] INBOUND DATA: streamId=5 padding=0 endStream=false length=174 bytes=00000000a90a350a180a1608b99585ac9d90a5c9b901109db9f7a3a2b9eceb9d0110c487ee8690f0b21c18082a0e0a093132372e302e302e3110c4100a370a18...
DEBUG i.g.n.NettyClientHandler - [id: 0x644e50db, L:/127.0.0.1:64934 - R:/127.0.0.1:2114] OUTBOUND PING: ack=false bytes=1234
DEBUG i.g.n.NettyClientHandler - [id: 0x644e50db, L:/127.0.0.1:64934 - R:/127.0.0.1:2114] INBOUND HEADERS: streamId=5 headers=GrpcHttp2ResponseHeaders[grpc-status: 0] padding=0 endStream=true
DEBUG i.g.n.NettyClientHandler - [id: 0xfca6ab10, L:/127.0.0.1:64941 - R:/127.0.0.1:2115] INBOUND SETTINGS: ack=false settings={MAX_CONCURRENT_STREAMS=100, INITIAL_WINDOW_SIZE=98304, MAX_HEADER_LIST_SIZE=32768}
DEBUG i.g.n.NettyClientHandler - [id: 0xfca6ab10, L:/127.0.0.1:64941 - R:/127.0.0.1:2115] OUTBOUND SETTINGS: ack=true
DEBUG i.g.n.NettyClientHandler - [id: 0xfca6ab10, L:/127.0.0.1:64941 - R:/127.0.0.1:2115] INBOUND WINDOW_UPDATE: streamId=0 windowSizeIncrement=65537
DEBUG i.g.n.NettyClientHandler - [id: 0xfca6ab10, L:/127.0.0.1:64941 - R:/127.0.0.1:2115] INBOUND SETTINGS: ack=true
DEBUG i.g.n.NettyClientHandler - [id: 0x644e50db, L:/127.0.0.1:64934 - R:/127.0.0.1:2114] INBOUND PING: ack=true bytes=1234
2020-08-28 18:22:42.409+0100 debug [NettyClientHandler] Window: 1048576
DEBUG i.g.n.NettyClientHandler - [id: 0x8c9993c1, L:/127.0.0.1:64940 - R:/127.0.0.1:2114] OUTBOUND HEADERS: streamId=3 headers=GrpcHttp2OutboundHeaders[:authority: es.sec.local, :path: /event_store.client.gossip.Gossip/Read, :method: POST, :scheme: https, content-type: application/grpc, te: trailers, user-agent: grpc-java-netty/1.30.1, authorization: Basic YWRtaW46Y2hhbmdlaXQ=, connection-name: sec, requires-leader: true, grpc-accept-encoding: gzip] streamDependency=0 weight=16 exclusive=false padding=0 endStream=false
DEBUG i.g.n.NettyClientHandler - [id: 0x8c9993c1, L:/127.0.0.1:64940 - R:/127.0.0.1:2114] OUTBOUND DATA: streamId=3 padding=0 endStream=true length=5 bytes=0000000000
DEBUG i.g.n.NettyClientHandler - [id: 0x8c9993c1, L:/127.0.0.1:64940 - R:/127.0.0.1:2114] INBOUND HEADERS: streamId=3 headers=GrpcHttp2ResponseHeaders[:status: 200, date: Fri, 28 Aug 2020 17:22:41 GMT, content-type: application/grpc, server: Kestrel, grpc-encoding: identity] padding=0 endStream=false
DEBUG i.g.n.NettyClientHandler - [id: 0x8c9993c1, L:/127.0.0.1:64940 - R:/127.0.0.1:2114] INBOUND DATA: streamId=3 padding=0 endStream=false length=174 bytes=00000000a90a350a180a1608b99585ac9d90a5c9b901109db9f7a3a2b9eceb9d0110c487ee8690f0b21c18082a0e0a093132372e302e302e3110c4100a370a18...
DEBUG i.g.n.NettyClientHandler - [id: 0x8c9993c1, L:/127.0.0.1:64940 - R:/127.0.0.1:2114] OUTBOUND PING: ack=false bytes=1234
DEBUG i.g.n.NettyClientHandler - [id: 0x8c9993c1, L:/127.0.0.1:64940 - R:/127.0.0.1:2114] INBOUND HEADERS: streamId=3 headers=GrpcHttp2ResponseHeaders[grpc-status: 0] padding=0 endStream=true
INFO  Demo - gossip.read: ClusterInfo:
 ✔ Follower 127.0.0.1:2114 2020-08-28T17:22:41Z d6a777df-1a8a-459f-8990-ea221d4262c5
 ✔ Follower 127.0.0.1:2115 2020-08-28T17:22:41Z a020a0f6-42df-42e0-8f54-4ab4d6f31d27
 ✕ Leader   127.0.0.1:2116 2020-08-28T17:22:42Z b9929481-d581-4ab9-9dd7-b1ca247ddc9d

@ahjohannessen
Copy link
Author

That is the end of the message. There should have been additional details before it. Was this an UNKNOWN status code as well? Looking at the code, it seems likely it was.

Yep.

When we see "Channel Pipeline" in the failure, we know we didn't get a working connection. In this specific case, there was a failure in the middle of a TLS handshake. If this is the problem you are experiencing for the UNKNOWN, we can definitely look into this.

I only experience the UNKNOWN and channel closed when TLS is involved.

So the connection received a GO_AWAY. Depending on details, if you see an error after this it could be that the replacement connection had trouble.

When I stop the server, it sends go away, but I have also tried with kill. It runs in docker, so stop and kill in this case.

@ahjohannessen
Copy link
Author

@ejona86 I have tried 1.32.1 and tcnative-boring 2.30.1, no luck. I can trigger that error relatively easy by killing the server randomly. I also tried to see if using Conscrypt helped, jdk 11 / 15 etc. Is there something I could to do in order to isolate the issue? Some race perhaps, as you mention deregister?

@ahjohannessen
Copy link
Author

@ejona86 I have tried to dig a bit and added some println's in io.grpc.netty.Utils and io.grpc.netty.ClientTransportLifecycleManager.

I kill a server that my client talks to and see the following. Note that Utils: t.printStackTrace(); is only printed once (from statusFromThrowable):

DEBUG i.g.n.NettyClientHandler - [id: 0x06a794c0, L:/127.0.0.1:62370 - R:/127.0.0.1:2113] OUTBOUND DATA: streamId=9 padding=0 endStream=true length=5 bytes=0000000000  
notifyShutdown invoked
notifyGracefulShutdown invoked
notifyShutdown: shutdownStatus: Status{code=UNAVAILABLE, description=Network closed for unknown reason, cause=null}
notifyShutdown: shutdownThrowable: UNAVAILABLE: Network closed for unknown reason
notifyTerminated invoked
notifyShutdown invoked
notifyGracefulShutdown invoked
WARN  gossip_suite - Gossip > read failed, attempt 1 of 100, retrying in 250.0ms
notifyTerminated invoked
notifyShutdown invoked
notifyGracefulShutdown invoked
notifyShutdown: shutdownStatus: Status{code=UNAVAILABLE, description=io exception, cause=io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: /127.0.0.1:2113
Caused by: java.net.ConnectException: Connection refused
        at java.base/sun.nio.ch.Net.pollConnect(Native Method)
        at java.base/sun.nio.ch.Net.pollConnectNow(Net.java:660)
  | => sat java.base/sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:875)
        at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:330)
        at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334)
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:702)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.base/java.lang.Thread.run(Thread.java:832)
}
notifyShutdown: shutdownThrowable: UNAVAILABLE: io exception
WARN  gossip_suite - Gossip > read failed, attempt 2 of 100, retrying in 375.0ms
WARN  gossip_suite - Gossip > read failed, attempt 3 of 100, retrying in 562.5ms
notifyTerminated invoked
notifyShutdown invoked
notifyGracefulShutdown invoked
notifyShutdown: shutdownStatus: Status{code=UNAVAILABLE, description=io exception, cause=io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: /127.0.0.1:2113
Caused by: java.net.ConnectException: Connection refused
        at java.base/sun.nio.ch.Net.pollConnect(Native Method)
        at java.base/sun.nio.ch.Net.pollConnectNow(Net.java:660)
  | => sat java.base/sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:875)
        at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:330)
        at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334)
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:702)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.base/java.lang.Thread.run(Thread.java:832)
}
notifyShutdown: shutdownThrowable: UNAVAILABLE: io exception
WARN  gossip_suite - Gossip > read failed, attempt 4 of 100, retrying in 843.8ms
WARN  gossip_suite - Gossip > read failed, attempt 5 of 100, retrying in 1.266s
notifyTerminated invoked
notifyShutdown invoked
notifyGracefulShutdown invoked
notifyShutdown: shutdownStatus: Status{code=UNAVAILABLE, description=io exception, cause=io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: /127.0.0.1:2113
Caused by: java.net.ConnectException: Connection refused
        at java.base/sun.nio.ch.Net.pollConnect(Native Method)
        at java.base/sun.nio.ch.Net.pollConnectNow(Net.java:660)
  | => sat java.base/sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:875)
        at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:330)
        at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334)
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:702)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.base/java.lang.Thread.run(Thread.java:832)
}
notifyShutdown: shutdownThrowable: UNAVAILABLE: io exception
WARN  gossip_suite - Gossip > read failed, attempt 6 of 100, retrying in 1.898s
notifyTerminated invoked
notifyShutdown invoked
notifyGracefulShutdown invoked
notifyShutdown: shutdownStatus: Status{code=UNAVAILABLE, description=io exception, cause=io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: /127.0.0.1:2113
Caused by: java.net.ConnectException: Connection refused
        at java.base/sun.nio.ch.Net.pollConnect(Native Method)
        at java.base/sun.nio.ch.Net.pollConnectNow(Net.java:660)
  | => sat java.base/sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:875)
        at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:330)
        at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334)
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:702)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.base/java.lang.Thread.run(Thread.java:832)
}
notifyShutdown: shutdownThrowable: UNAVAILABLE: io exception
WARN  gossip_suite - Gossip > read failed, attempt 7 of 100, retrying in 2.848s
WARN  gossip_suite - Gossip > read failed, attempt 8 of 100, retrying in 4.271s
Utils: t.printStackTrace();
java.nio.channels.ClosedChannelException
        at io.netty.handler.ssl.SslHandler.channelInactive(SslHandler.java:1076)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241)
        at io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1405)
  | => sat io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248)
        at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:901)
        at io.netty.channel.AbstractChannel$AbstractUnsafe$8.run(AbstractChannel.java:818)
        at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
        at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:497)
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.base/java.lang.Thread.run(Thread.java:832)
notifyTerminated invoked
notifyShutdown invoked
notifyGracefulShutdown invoked
notifyShutdown: shutdownStatus: Status{code=UNKNOWN, description=channel closed
Channel Pipeline: [SslHandler#0, ProtocolNegotiators$ClientTlsHandler#0, WriteBufferingAndExceptionHandler#0, DefaultChannelPipeline$TailContext#0], cause=java.nio.channels.ClosedChannelException
        at io.grpc.netty.Utils.statusFromThrowable(Utils.java:266)
        at io.grpc.netty.WriteBufferingAndExceptionHandler.exceptionCaught(WriteBufferingAndExceptionHandler.java:88)
        at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:302)
        at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:281)
        at io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:273)
        at io.grpc.netty.ProtocolNegotiators$ClientTlsHandler.userEventTriggered0(ProtocolNegotiators.java:375)
        at io.grpc.netty.ProtocolNegotiators$ProtocolNegotiationHandler.userEventTriggered(ProtocolNegotiators.java:741)
        at io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:346)
        at io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:332)
        at io.netty.channel.AbstractChannelHandlerContext.fireUserEventTriggered(AbstractChannelHandlerContext.java:324)
        at io.netty.handler.ssl.SslUtils.handleHandshakeFailure(SslUtils.java:347)
        at io.netty.handler.ssl.SslHandler.setHandshakeFailure(SslHandler.java:1805)
        at io.netty.handler.ssl.SslHandler.channelInactive(SslHandler.java:1079)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241)
        at io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1405)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248)
        at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:901)
        at io.netty.channel.AbstractChannel$AbstractUnsafe$8.run(AbstractChannel.java:818)
        at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
        at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:497)
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.base/java.lang.Thread.run(Thread.java:832)
Caused by: java.nio.channels.ClosedChannelException
        at io.netty.handler.ssl.SslHandler.channelInactive(SslHandler.java:1076)
        ... 15 more
}
notifyShutdown: shutdownThrowable: UNKNOWN: channel closed
Channel Pipeline: [SslHandler#0, ProtocolNegotiators$ClientTlsHandler#0, WriteBufferingAndExceptionHandler#0, DefaultChannelPipeline$TailContext#0]
WARN  gossip_suite - Gossip > read failed, attempt 9 of 100, retrying in 5.000s

If I use waitForReady channel options the error does not surface. I think it is unfortunate that UNAVAILABLE: io exception is turned into an UNKNOWN: channel closed, because I wish to retry on UNAVAILABLE.

ejona86 added a commit to ejona86/grpc-java that referenced this issue Oct 1, 2020
Normally the first exception/event experienced is the cause and is
followed by a stampede of ClosedChannelExceptions. In this case,
SslHandler is manufacturing a ClosedChannelException of its own and
propagating it _before_ the trigger event. This might be considered a
bug, but changing SslHandler's behavior would be very risky and almost
certainly break someone's code.

Fixes grpc#7376
@ejona86 ejona86 self-assigned this Oct 1, 2020
@ejona86 ejona86 modified the milestones: Next, 1.33 Oct 1, 2020
@ejona86
Copy link
Member

ejona86 commented Oct 1, 2020

@ahjohannessen, thanks for that. Taking another look at the stack trace and the other errors gave me an "aha" to realize what was happening. The original stack trace actually included enough information, but it didn't click.

@ahjohannessen
Copy link
Author

@ejona86 Awesome 👍 :)

ejona86 added a commit that referenced this issue Oct 1, 2020
Normally the first exception/event experienced is the cause and is
followed by a stampede of ClosedChannelExceptions. In this case,
SslHandler is manufacturing a ClosedChannelException of its own and
propagating it _before_ the trigger event. This might be considered a
bug, but changing SslHandler's behavior would be very risky and almost
certainly break someone's code.

Fixes #7376
@ahjohannessen
Copy link
Author

@ejona86 Seems like that did it:

WARN  gossip_suite - Gossip > read failed, attempt 12 of 100, retrying in 5.000s
UNAVAILABLE: Connection closed while performing TLS negotiation
io.grpc.StatusRuntimeException: UNAVAILABLE: Connection closed while performing TLS negotiation
Channel Pipeline: [SslHandler#0, ProtocolNegotiators$ClientTlsHandler#0, WriteBufferingAndExceptionHandler#0, DefaultChannelPipeline$TailContext#0]

ejona86 added a commit to ejona86/grpc-java that referenced this issue Oct 1, 2020
Normally the first exception/event experienced is the cause and is
followed by a stampede of ClosedChannelExceptions. In this case,
SslHandler is manufacturing a ClosedChannelException of its own and
propagating it _before_ the trigger event. This might be considered a
bug, but changing SslHandler's behavior would be very risky and almost
certainly break someone's code.

Fixes grpc#7376
ejona86 added a commit that referenced this issue Oct 1, 2020
Normally the first exception/event experienced is the cause and is
followed by a stampede of ClosedChannelExceptions. In this case,
SslHandler is manufacturing a ClosedChannelException of its own and
propagating it _before_ the trigger event. This might be considered a
bug, but changing SslHandler's behavior would be very risky and almost
certainly break someone's code.

Fixes #7376
dfawley pushed a commit to dfawley/grpc-java that referenced this issue Jan 15, 2021
Normally the first exception/event experienced is the cause and is
followed by a stampede of ClosedChannelExceptions. In this case,
SslHandler is manufacturing a ClosedChannelException of its own and
propagating it _before_ the trigger event. This might be considered a
bug, but changing SslHandler's behavior would be very risky and almost
certainly break someone's code.

Fixes grpc#7376
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jun 3, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants