
GracefulStop hangs forever if the client doesn't close the underlying tcp connection. #5930

Closed
aureliar8 opened this issue Jan 12, 2023 · 1 comment · Fixed by #5968

@aureliar8

What version of gRPC are you using?

Go client & server: google.golang.org/grpc v1.51.0
Java client:

<grpc.version>1.51.1</grpc.version>
<protobuf.version>3.21.7</protobuf.version>
<protoc.version>3.21.7</protoc.version>

What version of Go are you using (go version)?

go1.19.1

What operating system (Linux, Windows, …) and version?

Ubuntu 22.10 (Linux 5.14)

What did you do?

First of all, I'm not sure whether this is something I misunderstood, a documentation issue, or a bug.

When using a basic Java client and the default Go server configuration, if I call server.GracefulStop() during an RPC, the server hangs and never stops. This is not the case if I use a Go client, or if GracefulStop is called while there is no active RPC.

I made a small repository to reproduce the case: https://github.com/aureliar8/grpc-restart
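
For reference, here is a minimal sketch of the server side of the reproduction (the service registration is a hypothetical placeholder for any generated service with a long-running handler; the client is the Java one from the linked repo):

package main

import (
    "log"
    "net"
    "time"

    "google.golang.org/grpc"
)

func main() {
    lis, err := net.Listen("tcp", ":50051")
    if err != nil {
        log.Fatalf("failed to listen: %v", err)
    }
    s := grpc.NewServer()
    // pb.RegisterEchoServer(s, &slowEchoServer{}) // hypothetical service whose handler sleeps for a while

    go func() {
        if err := s.Serve(lis); err != nil {
            log.Fatalf("failed to serve: %v", err)
        }
    }()

    // Let the (Java) client start a long-running RPC, then try to stop gracefully.
    time.Sleep(5 * time.Second)
    log.Println("calling GracefulStop")
    s.GracefulStop() // with a Go client this returns once the RPC finishes;
    // with the Java client it never returns, because the TCP connection stays open
    log.Println("server stopped")
}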

What did you expect to see?

I expect the server to shut down after the last in-flight RPC finishes.

What did you see instead?

The last in-flight RPC finishes but the server doesn't stop.

Additional details

The documentation for server.GracefulStop() reads:

// GracefulStop stops the gRPC server gracefully. It stops the server from
// accepting new connections and RPCs and blocks until all the pending RPCs are
// finished.

However, it actually waits until all (TCP) connections are closed:

grpc-go/server.go, lines 1861 to 1863 in ce56cef:

for len(s.conns) != 0 {
    s.cv.Wait()
}

I suppose that TCP connections are meant to be closed after GracefulStop is initiated and once in-flight RPCs finish.

This is what the Go gRPC client implementation does. But the Java client doesn't (at least not with the default configuration). So the Go server gets stuck because there is still one connection open, and it never returns from GracefulStop().

What's weird is that there is a mechanism to close the TCP connection from the Go server if there are no active RPCs, but it is only checked when the second GOAWAY frame is sent (so quite early in the GracefulStop timeline):

if len(t.activeStreams) == 0 {
    retErr = errors.New("second GOAWAY written and no active streams left to process")
}

(Specifically, returning an error from the outgoingGoAwayHandler will make loopy return and close the underlying connection.)

The questions are:

  • Who's responsible for closing the underlying connection after GracefulStop has been requested and the last RPC finishes?
  • If it's the client, why is there specific code that checks that the number of in-flight RPCs is zero when sending the second GOAWAY frame? (This would also mean that the bug is in the Java gRPC client implementation.)
  • If it's the server, there is no mechanism to do so if there are still in-flight RPCs after the second GOAWAY frame is sent.
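
As a side note, a possible mitigation while this is unresolved (a sketch, not something from the repro) is to bound GracefulStop with a deadline and fall back to the hard Stop if clients never close their connections; both GracefulStop and Stop are existing grpc-go Server methods:

package shutdown // hypothetical package name

import (
    "time"

    "google.golang.org/grpc"
)

// stopWithTimeout is a hypothetical helper: it waits for GracefulStop to
// finish, but falls back to Stop after the given timeout so a client that
// keeps its TCP connection open cannot hang shutdown forever.
func stopWithTimeout(s *grpc.Server, timeout time.Duration) {
    done := make(chan struct{})
    go func() {
        s.GracefulStop() // blocks until pending RPCs finish and all connections close
        close(done)
    }()
    select {
    case <-done:
        // graceful shutdown completed
    case <-time.After(timeout):
        s.Stop() // force-close the remaining connections
    }
}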
@aureliar8 aureliar8 changed the title GracefulStop hangs forever if the client doesn't the underlying tcp connection. GracefulStop hangs forever if the client doesn't the close underlying tcp connection. Jan 12, 2023
@dfawley dfawley self-assigned this Jan 24, 2023
@dfawley dfawley added the P1 label Jan 24, 2023
@dfawley
Member

dfawley commented Jan 24, 2023

I suppose that TCP connections are meant to be closed after GracefulStop is initiated and once in-flight RPCs finish.

Correct!

This is what the Go gRPC client implementation does. But the Java client doesn't (at least not with the default configuration).

Somewhat coincidentally, we recently noticed that our server does not close the connection when it's draining and the final RPC completes. I didn't realize the Java client would not do this; that makes fixing our server behavior a higher priority.

Who's responsible for closing the underlying connection after GracefulStop has been requested and the last RPC finishes?

I believe both sides should do this. I'll check with the Java team and see what they say.

I think we can fix it by simply removing the client-side check here:

if l.side == clientSide && l.draining && len(l.estdStreams) == 0 {
    return errors.New("finished processing active streams while in draining mode")
}

dfawley added a commit that referenced this issue Jan 26, 2023