Kill stream client caused server quit silently #193

Closed

zoutaiqi opened this issue May 11, 2015 · 11 comments

@zoutaiqi

@iamqizhao,
The client sends a request message to the server every second and the server streams the responses back; both sides use streaming mode. To reproduce this issue, I commented out the client-side code that receives the server's responses, let it run for several seconds, and then killed the client. The server quit silently at almost the same time, and no core dump file was found on the server side. I did this because I want to know how the server handles the scenario where the client crashes (no deadline or timeout is set in the context on the client side, and CloseSend() will never be called because the client crashed): will the server-side goroutine exit, or just block? Thanks.

@iamqizhao
Contributor

This should not happen. I had a quick glance at the code:

The loop at https://github.com/grpc/grpc-go/blob/master/server.go#L193 will exit only via either https://github.com/grpc/grpc-go/blob/master/server.go#L196 or https://github.com/grpc/grpc-go/blob/master/server.go#L210, but I cannot see how your example would trigger either of them.

It would help to have a reproducible example I can use to debug this.


@zoutaiqi
Author

Sorry, it turned out to be caused by my shell script; everything works fine when the server is started manually. Thank you for the quick reply. I have another question:
If the client crashes before a streaming RPC finishes, will the server detect that and clean up the related resources (the goroutine)? I noticed that the RPC goroutine on the server side stays blocked at stream.Recv after the client crashes. Is that expected? Thanks!

@iamqizhao
Contributor


It depends. If the client's kernel gets a chance to send a TCP RST to the server, the connection on the server side is torn down as well and you should not see any streams blocking on stream.Recv. If the client's kernel does not get that chance (e.g., a network cable is unplugged, a power outage, etc.), the server has no idea what happened on the client side until some TCP user timeout is triggered, so in that case you may see stream.Recv stay blocked for some minutes. The latter case should be addressed by a health-checking mechanism, which is on my radar but pending implementation.
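
For concreteness, a typical bidirectional handler on the server side looks roughly like the sketch below (Echo, Chat, and the pb package are placeholder names, not your actual service). In the RST case the blocked stream.Recv returns an error, the handler returns, and the goroutine is cleaned up; in the no-RST case that Recv call is exactly where the goroutine sits until the OS-level timeout fires:

```go
package main

import (
	"io"

	pb "example.com/echo" // placeholder for your generated stubs
)

type echoServer struct{}

func (s *echoServer) Chat(stream pb.Echo_ChatServer) error {
	for {
		req, err := stream.Recv()
		if err == io.EOF {
			// The client called CloseSend: the stream ended normally.
			return nil
		}
		if err != nil {
			// Transport torn down (e.g. RST after a client crash):
			// returning here lets the handler goroutine exit.
			return err
		}
		// Echo the message back; Send also fails once the transport is gone.
		if err := stream.Send(&pb.EchoResponse{Msg: req.Msg}); err != nil {
			return err
		}
	}
}
```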



@zoutaiqi
Author

Got it, thank you very much. I look forward to the health-checking mechanism being added; until then, I will set a timeout or deadline in the context on the client side to handle this case.
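
Concretely, something like the sketch below is what I plan to do (the Echo service, its Chat method, and the pb package are placeholders for my real proto, and dial options vary across grpc-go versions): the deadline travels with the RPC, so the server side also gives up on the stream once it expires instead of blocking indefinitely.

```go
package main

import (
	"context"
	"log"
	"time"

	"google.golang.org/grpc"

	pb "example.com/echo" // placeholder for the generated stubs
)

func main() {
	// Dial options differ between grpc-go versions; WithInsecure is the
	// plaintext option in current releases.
	conn, err := grpc.Dial("server:50051", grpc.WithInsecure())
	if err != nil {
		log.Fatalf("dial: %v", err)
	}
	defer conn.Close()

	// Bound the lifetime of the whole streaming RPC with a deadline.
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()

	stream, err := pb.NewEchoClient(conn).Chat(ctx)
	if err != nil {
		log.Fatalf("open stream: %v", err)
	}
	for {
		if err := stream.Send(&pb.EchoRequest{Msg: "ping"}); err != nil {
			log.Printf("send: %v", err) // e.g. DeadlineExceeded after 30s
			return
		}
		if _, err := stream.Recv(); err != nil {
			log.Printf("recv: %v", err)
			return
		}
		time.Sleep(time.Second)
	}
}
```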

@zoutaiqi
Author

Hi, @iamqizhao, I hit the same problem again after fixing the script issue, and I got very little output when the processes quit. I don't know whether there is anything useful in it for your debugging.

Environment:
I have a 7-node cluster in the test bed; each node exchanges three simple messages with every other node as a streaming client and server. They ran normally for about 3 days, then one node quit for an unknown reason (maybe the same issue) and the other nodes kept trying to reconnect to it. After I restarted that node and tried to rejoin the cluster, almost all of the nodes quit with the following output and no core dump:
.....
goroutine 621 [select, 9 minutes]:
google.golang.org/grpc/transport.(*http2Server).controller(0xc20827e380)
/home/elc/go/src/google.golang.org/grpc/transport/http2_server.go:594 +0x551
created by google.golang.org/grpc/transport.newHTTP2Server
/home/elc/go/src/google.golang.org/grpc/transport/http2_server.go:132 +0x88a

goroutine 784 [IO wait]:
net.(*pollDesc).Wait(0xc2082a2450, 0x72, 0x0, 0x0)
/usr/local/go/src/net/fd_poll_runtime.go:84 +0x47
net.(*pollDesc).WaitRead(0xc2082a2450, 0x0, 0x0)
/usr/local/go/src/net/fd_poll_runtime.go:89 +0x43
net.(*netFD).Read(0xc2082a23f0, 0xc2083a1cd4, 0x9, 0x9, 0x0, 0x7fd8d2f03bf0, 0xc208364100)
/usr/local/go/src/net/fd_unix.go:242 +0x40f
net.(*conn).Read(0xc2082a6020, 0xc2083a1cd4, 0x9, 0x9, 0x0, 0x0, 0x0)
/usr/local/go/src/net/net.go:121 +0xdc
io.ReadAtLeast(0x7fd8d2f06730, 0xc2082a6020, 0xc2083a1cd4, 0x9, 0x9, 0x9, 0x0, 0x0, 0x0)
/usr/local/go/src/io/io.go:298 +0xf1
io.ReadFull(0x7fd8d2f06730, 0xc2082a6020, 0xc2083a1cd4, 0x9, 0x9, 0x1000100000147, 0x0, 0x0)
/usr/local/go/src/io/io.go:316 +0x6d
github.com/bradfitz/http2.readFrameHeader(0xc2083a1cd4, 0x9, 0x9, 0x7fd8d2f06730, 0xc2082a6020, 0x0, 0x0, 0x0, 0x0)
/home/elc/go/src/github.com/bradfitz/http2/frame.go:228 +0xa2
github.com/bradfitz/http2.(*Framer).ReadFrame(0xc2083a1cb0, 0x0, 0x0, 0x0, 0x0)
/home/elc/go/src/github.com/bradfitz/http2/frame.go:373 +0xf2
google.golang.org/grpc/transport.(*framer).readFrame(0xc2082aa210, 0x0, 0x0, 0x0, 0x0)
/home/elc/go/src/google.golang.org/grpc/transport/http_util.go:436 +0x50
google.golang.org/grpc/transport.(*http2Server).HandleStreams(0xc20827e100, 0xc2082b9380)
/home/elc/go/src/google.golang.org/grpc/transport/http2_server.go:244 +0x691
google.golang.org/grpc.func·014()
/home/elc/go/src/google.golang.org/grpc/server.go:208 +0xc5
created by google.golang.org/grpc.(*Server).Serve
/home/elc/go/src/google.golang.org/grpc/server.go:212 +0x5f5

goroutine 783045 [select, 9 minutes]:
google.golang.org/grpc/transport.(*http2Server).controller(0xc20827e100)
/home/elc/go/src/google.golang.org/grpc/transport/http2_server.go:594 +0x551
created by google.golang.org/grpc/transport.newHTTP2Server
/home/elc/go/src/google.golang.org/grpc/transport/http2_server.go:132 +0x88a

@iamqizhao
Contributor

These logs are not useful. Can you add some logging at the exit points of this loop:
https://github.com/grpc/grpc-go/blob/master/server.go#L193
and rerun your setup? Then, if it happens again, we will know where grpc exits. If it happens without any of those log lines appearing, it means your program, not grpc, is terminating the process.


@zoutaiqi
Author

Thanks, I will try it.


@zoutaiqi
Author

Hi, @iamqizhao, this is the last output from some of the nodes that panicked almost at the same time; I hope it is helpful for debugging. Thanks.

2015/05/22 06:56:30 transport: http2Client.notifyError got notified that the client transport was broken EOF.
2015/05/22 06:56:30 transport: http2Client.notifyError got notified that the client transport was broken EOF.
2015/05/22 06:56:30 grpc: Server.processStreamingRPC failed to write status: connection error: desc = "transport is closing"
2015/05/22 06:56:30 transport: http2Client.notifyError got notified that the client transport was broken EOF.
panic: close of closed channel

@zoutaiqi
Author

Sorry, @iamqizhao, I found a bug in my program that could close a channel twice; that should be the root cause, so I will close this issue. Thanks for your help.
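
For reference, this panic message is exactly what Go produces when a channel is closed twice. A minimal illustration of the mistake and the guard I am adding (not my actual code, just the pattern):

```go
package main

import (
	"fmt"
	"sync"
)

// doneBroadcaster closes a shutdown channel to signal listeners. Calling
// close(done) from two code paths panics with "close of closed channel",
// which matches the output above; sync.Once makes the close idempotent.
type doneBroadcaster struct {
	done chan struct{}
	once sync.Once
}

func (b *doneBroadcaster) Shutdown() {
	b.once.Do(func() { close(b.done) })
}

func main() {
	b := &doneBroadcaster{done: make(chan struct{})}
	b.Shutdown()
	b.Shutdown() // safe: without sync.Once this second close would panic
	fmt.Println("closed exactly once")
}
```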

@derekperkins

@iamqizhao @zoutaiqi I'm getting the same "transport: http2Client.notifyError got notified that the client transport was broken EOF." message, while not triggering an error myself. I'm running on App Engine Managed VMs using BigQuery and Bigtable.

@iamqizhao
Contributor

I need more info to move forward. Can you elaborate on what you were doing, what you expected to see, and what you observed?


@lock bot locked as resolved and limited conversation to collaborators on Sep 26, 2018