Likely invalid handling of the goaway frames #2176
@pfreixes Hi, long time no see. This does sound strange. The repro steps are nice. The lowest-friction way to let others help debug this would be to provide a minimal repro example.
Reproducing needs to be done by triggering the `keepalive_time`. Here is the Nginx configuration that I've used:

```
worker_processes 1;
error_log /dev/stdout debug;
master_process off;
daemon off;

events {
    worker_connections 1024;
}

http {
    include mime.types;
    default_type application/octet-stream;
    access_log /dev/stdout;
    sendfile on;

    upstream backend {
        server 127.0.0.1:8081;
    }

    server {
        listen 8082 http2;
        server_name localhost;
        keepalive_time 120s;

        location / {
            grpc_pass grpc://backend;
            grpc_socket_keepalive on;
            grpc_read_timeout 60s;
        }
    }

    include servers/*;
}
```

Once you have Nginx and the gRPC backend server running, it's a matter of starting to hit the gRPC port provided by Nginx with a gRPC client. You will notice the behaviour during the `goaway` handling [1].

[1] https://github.com/grpc/grpc-node/blob/master/packages/grpc-js/src/subchannel.ts#L525
I have discussed this with my team. The client is correct to transition to IDLE when receiving a GOAWAY, whether or not there are any ongoing requests. But it is not correct to stop the keepalive pings after it goes IDLE if the connection is still in use. Unfortunately, fixing that will require a bit of a rework of the low-level connection handling systems, so there will likely not be a fix very soon.
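The distinction drawn here can be illustrated with a toy model. Everything below (`ToySubchannel`, `activeCalls`, `keepaliveRunning`) is an illustrative assumption, not grpc-js internals: transitioning to IDLE on GOAWAY is fine, but unconditionally stopping the keepalive pings leaves in-flight calls unprotected.

```typescript
// Toy model of the GOAWAY handling discussed above -- NOT grpc-js code.
type State = "READY" | "IDLE";

class ToySubchannel {
  state: State = "READY";
  activeCalls = 0;
  keepaliveRunning = true;

  // Buggy handling: going IDLE unconditionally stops the keepalive pings,
  // even though in-flight calls still depend on them to detect dead TCP
  // connections.
  handleGoawayBuggy(): void {
    this.state = "IDLE";
    this.keepaliveRunning = false;
  }

  // Sketch of the behaviour the maintainer describes as correct: still
  // report IDLE, but keep the keepalive timer running while the old
  // connection carries active calls.
  handleGoawayFixed(): void {
    this.state = "IDLE";
    if (this.activeCalls === 0) {
      this.keepaliveRunning = false;
    }
  }
}
```

With one active call, `handleGoawayBuggy` leaves the call without keepalive protection, while `handleGoawayFixed` keeps the pings running until the call drains.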
I'm not totally sure about this one; the semantics of IDLE specifically say that there are no ongoing RPCs, from here [1].
It says specifically: "Additionally, channels that receive a GOAWAY when there are no active or pending RPCs should also switch to IDLE", and in this case there are active RPCs.

[1] https://github.com/grpc/grpc/blob/master/doc/connectivity-semantics-and-api.md
Yep, I realised that fixing this will require some sort of refactoring. For now we will use a watchdog in our application to cancel RPCs that might be stuck in that state.
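The watchdog mentioned here could be sketched roughly as below. This is a hypothetical illustration, not the poster's actual code: it tracks the last time each RPC saw traffic and cancels any call that has been silent too long. `check()` takes the current time explicitly so the policy is easy to test; a real application would drive it from `setInterval` and pass in objects wrapping the gRPC call's `cancel()`.

```typescript
// Hypothetical watchdog for RPCs that may be stuck after a goaway.
interface WatchedCall {
  lastActivityMs: number;
  cancel(): void; // e.g. delegates to the gRPC call's cancel()
}

class RpcWatchdog {
  private calls = new Map<string, WatchedCall>();

  constructor(private readonly timeoutMs: number) {}

  track(id: string, call: WatchedCall): void {
    this.calls.set(id, call);
  }

  // Record traffic on a call (e.g. a received message or heartbeat).
  touch(id: string, nowMs: number): void {
    const call = this.calls.get(id);
    if (call) call.lastActivityMs = nowMs;
  }

  // Cancel every call that has been silent longer than timeoutMs;
  // returns the ids that were cancelled.
  check(nowMs: number): string[] {
    const cancelled: string[] = [];
    for (const [id, call] of this.calls) {
      if (nowMs - call.lastActivityMs > this.timeoutMs) {
        call.cancel();
        this.calls.delete(id);
        cancelled.push(id);
      }
    }
    return cancelled;
  }
}
```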
The Connectivity Semantics doc is not authoritative. It is out of date and is only used as a rough guideline for implementation. All current implementations switch to IDLE when receiving a GOAWAY, whether or not there are active RPCs.
I have published a change that should fix this. Please try it out in version 1.8.3. |
Problem description

We have sporadically suffered issues with streams that hang forever. We think the issue could come from the way `goaway` frames are managed today by `grpc-node`, which seems to differ from other implementations such as the Golang one [1].

In our scenario the streams are opened forever and we expect them to recover. We use heartbeats as a defensive mechanism against TCP connections that are no longer usable - for any reason - and were not closed explicitly by the server.

From our understanding, the current code cannot guarantee the usage of heartbeats in some scenarios, at least specifically during a `goaway` event [2], which immediately stops [3] the handlers that make sure the session is proactively closed if no heartbeats are seen for some time. This is, in our opinion, a problem: during that window of time, TCP connections that are no longer usable become unprotected by the heartbeats, since the heartbeats won't eventually kick in to close the connection on the client side.

Further implication of this

There is another implication of this strategy that goes beyond the bug itself. When a `goaway` frame is consumed, the sub-channel connection is marked as `IDLE`, and this state is also propagated through `getConnectivityState`, reporting to the caller - the user of the API - that the status is `IDLE`. From our understanding [4], `IDLE` means that there are no outstanding RPCs, which is not true in the case of a `goaway` frame, where there is still at least one outstanding RPC.

[1] https://github.com/grpc/grpc-go/blob/master/internal/transport/http2_client.go#L1199
[2] https://github.com/grpc/grpc-node/blob/master/packages/grpc-js/src/subchannel.ts#L525
[3] https://github.com/grpc/grpc-node/blob/master/packages/grpc-js/src/subchannel.ts#L718
[4] https://github.com/grpc/grpc/blob/master/doc/connectivity-semantics-and-api.md
Reproduction steps

We reproduced this by placing Nginx in front of a simple gRPC server, using the `keepalive_time` and `grpc_read_timeout` directives to reproduce the following:

- When `keepalive_time` - let's say configured to 120s - is reached, the next RPC receives the `goaway` frame.
- Once the `goaway` finishes, Nginx proactively closes the TCP connection.
- `grpc_read_timeout` is configured to something like `60s` to proactively stop streams that did not have any traffic at all.

With Nginx configured and a gRPC stream server behind it, start a gRPC stream client and run the same stream one after the other, restarting it whenever it is closed - which will happen because of the `grpc_read_timeout`. Eventually, when the `keepalive_time` kicks in, the heartbeats will stop working. If by that time you add a rule - iptables, pf, etc. - for dropping any response from the server to the client, the client won't notice and the stream will hang forever.