Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

loopyWriter continuing to run after GracefulStop causing race with t.Log #6485

Closed
fho opened this issue Jul 28, 2023 · 6 comments · Fixed by #6489
Closed

loopyWriter continuing to run after GracefulStop causing race with t.Log #6485

fho opened this issue Jul 28, 2023 · 6 comments · Fixed by #6489

Comments

@fho
Copy link
Contributor

fho commented Jul 28, 2023

Hello,

we are starting a grpc-server in a Go testserver.
On test finish we call Server.GracefulStop() via t.Cleanup().

We forward all log messages from the grpc packages to the test logger, by setting grpclog.SetLoggerV2 to a t.Log wrapper.

When we run the tests with the -race parameter, it happens from now and then that a race is detected because loopyWriter logs a message to t.Log() after the test finish.
As I understand Server.GracefulStop() does not wait until the loopyWriter go-routine has terminated.

The message that is logged after test termination from loopyWriter happens here:

l.logger.Infof("loopyWriter exiting with error: %v", err)
) maybe there are also other log calls that can happen after GracefulStop.

Is there a way to wait in a testcase until loopyWriter terminated that I'm missing?
If not, shouldn't it be possible to do a termination that ensures no go-routines are running afterwards?
This would also e.g. work well together with https://github.com/uber-go/goleak, that checks if any go-routines are leaked in a testcase.

grpc-go version: 1.57

@fho fho changed the title loopyWriter continuing to run after GracefulStop causing race with t.Log loopyWriter continuing to run after GracefulStop causing race with t.Log Jul 28, 2023
@dfawley
Copy link
Member

dfawley commented Jul 28, 2023

We forward all log messages from the grpc packages to the test logger, by setting grpclog.SetLoggerV2 to a t.Log wrapper.

We do the same thing in our tests. 😄

It looks like GracefulStop isn't waiting for all the resources to go away - it's just waiting for the in-flight streams to end.

Stop is really what you should be using instead for this purpose. GracefulStop makes sure pending streams have a chance to end gracefully, but in a test, you shouldn't have streams running when you think your test is done. If you do actually need the behavior of GracefulStop, you can always do both: t.Cleanup(func() { s.GracefulStop(); s.Stop() }) But as a cleanup, you really only want the hard-shutdown behavior, because if a test has hung, Stop will quickly terminate things, whereas GracefulStop will just hang forever.

GracefulStop should probably be updated to ensure the transports are fully shut down as well, but this would be a minor issue that we probably wouldn't be able to prioritize for some time.

@fho
Copy link
Contributor Author

fho commented Jul 28, 2023

@dfawley thanks a lot for your help.

Yes, s.Stop() is also sufficient in the testcases, all clients are shutdown before anyways.

Why should using s.Stop() help? It also does not wait for the loopyWriter go-routine.
I reproduced this by using goleak.VerifyNone() and by adding a time.Sleep() to the defer func() of loopyWriter.run().
goleak complains that the loopyWriter is still running, when goleaks runs after s.Stop() returned.
(I could provide a testcase to reproduce it next week, if wanted.)

Does t.Stop() maybe prevent that an error happens that the loopyWriter go-routine will log?

@dfawley
Copy link
Member

dfawley commented Jul 28, 2023

I reproduced this by using goleak.VerifyNone() and by adding a time.Sleep() to the defer func() of loopyWriter.run().

Interesting.. I was thinking st.Close() (called from here:

grpc-go/server.go

Line 1862 in b8d36ca

st.Close(errors.New("Server.Stop called"))
) would block until loopy exited because of
t.controlBuf.finish()
, but it looks like it doesn't. Maybe we can add something in there that waits for loopy to exit and that would take care of both paths?

@dfawley
Copy link
Member

dfawley commented Jul 28, 2023

Also our tests may not run into this because we also always use:

https://pkg.go.dev/google.golang.org/grpc@v1.57.0/internal/leakcheck

@fho
Copy link
Contributor Author

fho commented Jul 31, 2023

t.controlBuf.finish()

, but it looks like it doesn't. Maybe we can add something in there that waits for loopy to exit and that would take care of both paths?

I'll give it a try and create a PR if it works out.

Update: PR is out: #6489

@fho
Copy link
Contributor Author

fho commented Jul 31, 2023

Also our tests may not run into this because we also always use:

https://pkg.go.dev/google.golang.org/grpc@v1.57.0/internal/leakcheck

Yes, leakcheck does not detect it because it only complains if a go-routine is still running after 10s after the test terminated.
Condition like that the LoopyWriter routine terminates shortly after Stop() returned it can not detect.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants