goroutine leaks #528

Closed
bradfitz opened this issue Feb 5, 2016 · 14 comments · Fixed by #535

@bradfitz
Contributor

bradfitz commented Feb 5, 2016

stream.NewClientStream seems to leak a goroutine.

I have some unsent local changes to the end2end tests which check for goroutine leaks on each test.

I see:

--- FAIL: TestExceedMaxStreamsLimit (7.02s)
        end2end_test.go:365: Running test in TCP environment...
        end2end_test.go:1388: Leaked goroutine: goroutine 2738 [select]:
                google.golang.org/grpc.NewClientStream.func1(0x7f9b8946d248, 0xc82035e690, 0xc8200da2a0, 0xc820013400)
                        /home/bradfitz/src/google.golang.org/grpc/stream.go:148 +0x257
                created by google.golang.org/grpc.NewClientStream
                        /home/bradfitz/src/google.golang.org/grpc/stream.go:156 +0xa99
--- FAIL: TestCompressOK (6.06s)
        end2end_test.go:365: Running test in TCP environment...
        end2end_test.go:1388: Leaked goroutine: goroutine 2750 [select]:
                google.golang.org/grpc.NewClientStream.func1(0x7f9b8946d248, 0xc82035e870, 0xc82010c9a0, 0xc820013860)
                        /home/bradfitz/src/google.golang.org/grpc/stream.go:148 +0x257
                created by google.golang.org/grpc.NewClientStream
                        /home/bradfitz/src/google.golang.org/grpc/stream.go:156 +0xa99
FAIL
FAIL    google.golang.org/grpc/test     33.222s

That leak is from stream.go:

// NewClientStream creates a new Stream for the client side. This is called
// by generated code.
func NewClientStream(ctx context.Context, desc *StreamDesc, cc *ClientConn, method string, opts ...CallOption) (ClientStream, error) {
....
....
....
        // Listen on ctx.Done() to detect cancellation when there is no pending
        // I/O operations on this stream.
        go func() {
                select {
                case <-t.Error():
                        // Incur transport error, simply exit.
                case <-s.Context().Done():
                        err := s.Context().Err()
                        cs.finish(err)
                        cs.closeTransportStream(transport.ContextErr(err))
                }
        }()
        return cs, nil
}

I started looking for this when running the race detector with a high go test -race -count=NNN value, because the race detector hit its trackable-goroutine limit and crashed, which made me suspicious that something was leaking.

I'll send changes for adding the goroutine leak checker.
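
A minimal sketch of such a per-test checker (helper names here are assumptions, not necessarily what the actual CL implements) could look like this: snapshot all goroutine stacks with runtime.Stack, skip the test harness's own goroutines, and retry briefly so goroutines that are already shutting down get a chance to exit. Each test would then call "defer leakCheck(t)".

package grpc_test

import (
	"runtime"
	"strings"
	"testing"
	"time"
)

// interestingGoroutines returns the stacks of all goroutines except this one
// and the standard testing housekeeping goroutines.
func interestingGoroutines() []string {
	buf := make([]byte, 2<<20)
	buf = buf[:runtime.Stack(buf, true)]
	var gs []string
	for _, g := range strings.Split(string(buf), "\n\n") {
		if g == "" ||
			strings.Contains(g, "interestingGoroutines") ||
			strings.Contains(g, "testing.RunTests") ||
			strings.Contains(g, "testing.tRunner") {
			continue
		}
		gs = append(gs, g)
	}
	return gs
}

// leakCheck reports goroutines that are still running shortly after a test
// finishes; the retry loop gives goroutines that are already winding down a
// chance to exit before they are reported.
func leakCheck(t *testing.T) {
	deadline := time.Now().Add(5 * time.Second)
	var leaked []string
	for time.Now().Before(deadline) {
		if leaked = interestingGoroutines(); len(leaked) == 0 {
			return
		}
		time.Sleep(50 * time.Millisecond)
	}
	for _, g := range leaked {
		t.Errorf("Leaked goroutine: %v", g)
	}
}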

@bradfitz
Contributor Author

bradfitz commented Feb 5, 2016

I think this is only a problem with the tests. For example, this fixes TestCompressOK:

@@ -1283,10 +1308,12 @@ func testCompressOK(t *testing.T, e env) {
                t.Fatalf("TestService/UnaryCall(_, _) = _, %v, want _, <nil>", err)
        }
        // Streaming RPC
-       stream, err := tc.FullDuplexCall(context.Background())
+       ctx, cancel := context.WithCancel(context.Background())
+       stream, err := tc.FullDuplexCall(ctx)
        if err != nil {
                t.Fatalf("%v.FullDuplexCall(_) = _, %v, want <nil>", tc, err)
        }
+       defer cancel()
        respParam := []*testpb.ResponseParameters{
                {
                        Size: proto.Int32(31415),
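
With the cancellable context in place, the deferred cancel fires when the test returns; that unblocks the <-s.Context().Done() case in the goroutine NewClientStream starts, so the stream is finished and the goroutine exits instead of leaking.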

@iamqizhao
Contributor

Yes, it is a test problem. These two tests leave their streams open when they exit. I am going to fix them soon. Thanks for reporting.

@iamqizhao
Contributor

BTW, a goroutine leak checker would be very helpful. Previously, I used debug/pprof/block to check manually. Thanks.
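
A minimal sketch of that manual approach, assuming the server exposes pprof on a debug port (localhost:6060 here is just an example):

package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/* handlers
	"runtime"
)

func main() {
	// Record blocking events so /debug/pprof/block has data to show.
	runtime.SetBlockProfileRate(1)

	// http://localhost:6060/debug/pprof/goroutine?debug=2 dumps every
	// goroutine's stack, which can then be inspected by hand for leaks.
	log.Fatal(http.ListenAndServe("localhost:6060", nil))
}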

@bradfitz
Contributor Author

bradfitz commented Feb 5, 2016

I have fixes for these locally too. I can include them in my CL if you want, or send separately.

@iamqizhao
Contributor

My PR is also ready. Running tests now to make sure it is okay. I think making it a separate PR would be better. Feel free to send your fix out if you want. Otherwise, I can send mine too. :)

@bradfitz
Contributor Author

bradfitz commented Feb 5, 2016

I'll send mine, since it has the leak checker in the same CL. I'll send it first, before my bigger changes.

@iamqizhao
Contributor

sgtm


@bradfitz
Contributor Author

bradfitz commented Feb 6, 2016

I noticed more failures on the Travis run once I enabled this:

--- FAIL: TestHealthCheckOnFailure-4 (5.31s)
    end2end_test.go:356: Running test in tcp-clear environment...
    end2end_test.go:356: Running test in tcp-tls environment...
    end2end_test.go:356: Running test in unix-clear environment...
    end2end_test.go:356: Running test in unix-tls environment...
    <autogenerated>:31: Leaked goroutine: goroutine 10361 [select]:
        google.golang.org/grpc/transport.(*http2Server).controller(0xc2080e6000)
            /home/travis/gopath/src/google.golang.org/grpc/transport/http2_server.go:626 +0x84f
        created by google.golang.org/grpc/transport.newHTTP2Server
            /home/travis/gopath/src/google.golang.org/grpc/transport/http2_server.go:134 +0xb66

@iamqizhao, this one doesn't look as easy to fix. I'm going to disable the leak checker for now and get it submitted, and then you can fix this bug.
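
For context, a schematic sketch of the shape of this kind of leak (not the actual grpc-go transport code; all names below are illustrative assumptions): a per-connection controller goroutine drains a control channel but is never signalled to stop when the transport shuts down.

package sketch

// server stands in for the real http2Server.
type server struct {
	controlBuf chan interface{} // queued control frames: settings, window updates, ...
	shutdown   chan struct{}    // should be closed when the transport is torn down
}

// controller drains controlBuf for the lifetime of the connection. If nothing
// ever closes shutdown (or the select lacks such a case), this goroutine
// blocks in the select forever after the connection goes away, which is
// exactly what the leak checker reports above.
func (s *server) controller() {
	for {
		select {
		case item := <-s.controlBuf:
			_ = item // write the control frame to the wire
		case <-s.shutdown:
			return
		}
	}
}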

bradfitz added a commit to bradfitz/grpc-go that referenced this issue Feb 6, 2016
Some leaks remain, so this is disabled for now.

Updates grpc#528
bradfitz added a commit to bradfitz/grpc-go that referenced this issue Feb 8, 2016
Some leaks remain, so this is disabled for now.

Updates grpc#528
@iamqizhao
Contributor

I am looking.

@bradfitz
Contributor Author

bradfitz commented Feb 8, 2016

Should we re-open this bug? I didn't mean to close it in the commit message but somehow managed to.

@mwitkow
Contributor

mwitkow commented Feb 8, 2016

@bradfitz thanks a bunch for finding these! We spent a while trying to track down where we were leaking goroutines in our prod servers. Rolling restarts got tiring after a while :)

Will test these out ASAP :)

@bradfitz
Contributor Author

bradfitz commented Feb 9, 2016

@mwitkow, I think the ones I fixed only affect tests, not prod. I don't think you'll see a difference yet. @iamqizhao is still looking into the transport.(*http2Server).controller leak.

@bradfitz
Contributor Author

bradfitz commented Feb 9, 2016

@mwitkow, but please comment here with details of which goroutines of yours you see leaking.

@iamqizhao
Contributor

#535 will fix this. I've been waiting on Travis for about 2 hours already; it is overwhelmed by a lot of pull requests from grpc C++...
