Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmux leaks goroutines #55

Closed
runcom opened this issue Oct 27, 2017 · 19 comments
Closed

cmux leaks goroutines #55

runcom opened this issue Oct 27, 2017 · 19 comments

Comments

@runcom
Copy link

runcom commented Oct 27, 2017

Hey there, we're using cmux to serve HTTP and gRPC together but our application is receiving a very large amount of requests (both gRPC and HTTP). We see tons of goroutines leaking around like:

   813 goroutine 754 [IO wait, 215 minutes]:
   814 internal/poll.runtime_pollWait(0x7eff86b282b0, 0x72, 0x0)
   815         /usr/lib/golang/src/runtime/netpoll.go:173 +0x57
   816 internal/poll.(*pollDesc).wait(0xc420329718, 0x72, 0xffffffffffffff00, 0x341d980, 0x340d       888)
   817         /usr/lib/golang/src/internal/poll/fd_poll_runtime.go:85 +0xae
   818 internal/poll.(*pollDesc).waitRead(0xc420329718, 0xc42172a000, 0x1000, 0x1000)
   819         /usr/lib/golang/src/internal/poll/fd_poll_runtime.go:90 +0x3d
   820 internal/poll.(*FD).Read(0xc420329700, 0xc42172a000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
   821         /usr/lib/golang/src/internal/poll/fd_unix.go:125 +0x18a
   822 net.(*netFD).Read(0xc420329700, 0xc42172a000, 0x1000, 0x1000, 0x13ce, 0xc420956000, 0x13       ce)
   823         /usr/lib/golang/src/net/fd_unix.go:202 +0x52
   824 net.(*conn).Read(0xc4207b2158, 0xc42172a000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
   825         /usr/lib/golang/src/net/net.go:176 +0x6d
   826 github.com/kubernetes-incubator/cri-o/vendor/github.com/soheilhy/cmux.(*bufferedReader).       Read(0xc4207be380, 0xc42172a000, 0x1000, 0x1000, 0xc42098f8c8, 0x74db11, 0xc4202e9688)
   827         /root/go/src/github.com/kubernetes-incubator/cri-o/_output/src/github.com/kubern       etes-incubator/cri-o/vendor/github.com/soheilhy/cmux/buffer.go:53 +0x144
   828 github.com/kubernetes-incubator/cri-o/vendor/github.com/soheilhy/cmux.(*MuxConn).Read(0x       c4207be370, 0xc42172a000, 0x1000, 0x1000, 0x0, 0x13ce, 0x1c6e)
   829         /root/go/src/github.com/kubernetes-incubator/cri-o/_output/src/github.com/kubern       etes-incubator/cri-o/vendor/github.com/soheilhy/cmux/cmux.go:259 +0x4f
   830 net/http.(*connReader).Read(0xc4202e9680, 0xc42172a000, 0x1000, 0x1000, 0xc420460380, 0x       1cee4a0, 0xc420460380)
   831         /usr/lib/golang/src/net/http/server.go:753 +0x105
   832 bufio.(*Reader).fill(0xc4200599e0)
   833         /usr/lib/golang/src/bufio/bufio.go:97 +0x11a
   834 bufio.(*Reader).ReadSlice(0xc4200599e0, 0x46860a, 0x48d7dd, 0xc42098fa20, 0xfffffffe1104       c0ba, 0xc42098fa60, 0x48da77)
   835         /usr/lib/golang/src/bufio/bufio.go:338 +0x2c
   836 bufio.(*Reader).ReadLine(0xc4200599e0, 0x100, 0xf8, 0x2157540, 0x468676, 0x22002098faa8,        0xf8)
   837         /usr/lib/golang/src/bufio/bufio.go:367 +0x34
   838 net/textproto.(*Reader).readLineSlice(0xc4202e96b0, 0xc42098fb20, 0xc42098fb20, 0x41b6d8       , 0x100, 0x2157540)
   839         /usr/lib/golang/src/net/textproto/reader.go:55 +0x70
   840 net/textproto.(*Reader).ReadLine(0xc4202e96b0, 0xc420750800, 0x0, 0x0, 0xc42098fba8)
   841         /usr/lib/golang/src/net/textproto/reader.go:36 +0x2b
   842 net/http.readRequest(0xc4200599e0, 0x0, 0xc420750800, 0x0, 0x0)
   843         /usr/lib/golang/src/net/http/request.go:925 +0x99
   844 net/http.(*conn).readRequest(0xc42096c280, 0x3435900, 0xc4207e4880, 0x0, 0x0, 0x0)
   845         /usr/lib/golang/src/net/http/server.go:933 +0x17c
   846 net/http.(*conn).serve(0xc42096c280, 0x3435900, 0xc4207e4880)
   847         /usr/lib/golang/src/net/http/server.go:1739 +0x50e
   848 created by net/http.(*Server).Serve
   849         /usr/lib/golang/src/net/http/server.go:2720 +0x288

Do you guys know what's happening?

@runcom
Copy link
Author

runcom commented Oct 28, 2017

We can reproduce this fairly easily also, I'm looking into this but if you have something in mind and need some testing, I can give access to a vm which can reproduce the leak.

@soheilhy
Copy link
Owner

Thanks for the report. Do you know which cmux commit (git sha1) is vendored in your project?

AFAICT, you're using cmux on a unix socket and the only way to get around this is to add a read timeout to cmux. I'll figure something out for that.

@soheilhy
Copy link
Owner

Please try setting a reasonable read timeout:
https://github.com/soheilhy/cmux/blob/master/cmux.go#L99
that should probably fix your issue.

@runcom
Copy link
Author

runcom commented Oct 28, 2017

Thanks! I'll try that and report back!

@runcom
Copy link
Author

runcom commented Oct 28, 2017

BTW we're using your latest release, which timeout do you think is reasonable though?

@soheilhy
Copy link
Owner

It really depends on the characteristics of that unix socket. The one above is blocked on read for 215 minutes. I'd say something about 1-2 minutes should be enough to give up on the unix socket.

@runcom
Copy link
Author

runcom commented Oct 29, 2017

It really depends on the characteristics of that unix socket. The one above is blocked on read for 215 minutes. I'd say something about 1-2 minutes should be enough to give up on the unix socket.

thanks I'll try with a 1 minute timeout

@runcom
Copy link
Author

runcom commented Oct 29, 2017

unfortunately even setting a read timeout isn't helping:

# TYPE go_goroutines gauge
go_goroutines 1624

@soheilhy
Copy link
Owner

All of these go routines are stuck in IO wait? can you send me the stack trace of a couple of them? I'm not sure how to reproduce.

@runcom
Copy link
Author

runcom commented Oct 29, 2017

All of these go routines are stuck in IO wait? can you send me the stack trace of a couple of them? I'm not sure how to reproduce.

I can send you a stack trace but they look exactly the same as the one I pasted in the first comment

@soheilhy
Copy link
Owner

do you still see "215 minutes" of poll time? if yes, there might be a bug in how we set read timeout in cmux.

@runcom
Copy link
Author

runcom commented Oct 29, 2017

do you still see "215 minutes" of poll time? if yes, there might be a bug in how we set read timeout in cmux.

I'll provide more goroutine stacks tomorrow but I'm pretty sure the goroutine counts wasn't lowering after the read timeout (in my case I set 1 minute). So maybe read timeout isn't fully respected.

@soheilhy
Copy link
Owner

Oh wait. I just took a deeper look and it seems like that the go-routine isn't leaked by cmux and in fact cmux has already multiplexed the connection to the actual listener. As you can see in the stack trace, the "src/net/textproto/reader.go" is trying to read the proto payload from the connection. The reason that you see cmux in the stack trace is that cmux wraps the connection to pass the sniffed payload first to Read calls. In this case, the sniffed payload was read and the proto library is still trying to read from the connection.

This is exactly why setting the read timeout on cmux isn't effective. The read connection timeout is only for when cmux is reading from the connection.

IMO, this is not a cmux issue but rather an issue on your listener. You probably want to set read timeout when your listener accepts the connection.

Here is the annotated stacktrace:

   813 goroutine 754 [IO wait, 215 minutes]:
   814 internal/poll.runtime_pollWait(0x7eff86b282b0, 0x72, 0x0)
   815         /usr/lib/golang/src/runtime/netpoll.go:173 +0x57
   816 internal/poll.(*pollDesc).wait(0xc420329718, 0x72, 0xffffffffffffff00, 0x341d980, 0x340d       888)
   817         /usr/lib/golang/src/internal/poll/fd_poll_runtime.go:85 +0xae
   818 internal/poll.(*pollDesc).waitRead(0xc420329718, 0xc42172a000, 0x1000, 0x1000)
   819         /usr/lib/golang/src/internal/poll/fd_poll_runtime.go:90 +0x3d
 
// And you're stuck here: 

  820 internal/poll.(*FD).Read(0xc420329700, 0xc42172a000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
   821         /usr/lib/golang/src/internal/poll/fd_unix.go:125 +0x18a
   822 net.(*netFD).Read(0xc420329700, 0xc42172a000, 0x1000, 0x1000, 0x13ce, 0xc420956000, 0x13       ce)
   823         /usr/lib/golang/src/net/fd_unix.go:202 +0x52

// CMux is simply reading the connection as a wrapper.  

   824 net.(*conn).Read(0xc4207b2158, 0xc42172a000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
   825         /usr/lib/golang/src/net/net.go:176 +0x6d
   826 github.com/kubernetes-incubator/cri-o/vendor/github.com/soheilhy/cmux.(*bufferedReader).       Read(0xc4207be380, 0xc42172a000, 0x1000, 0x1000, 0xc42098f8c8, 0x74db11, 0xc4202e9688)
   827         /root/go/src/github.com/kubernetes-incubator/cri-o/_output/src/github.com/kubern       etes-incubator/cri-o/vendor/github.com/soheilhy/cmux/buffer.go:53 +0x144
   828 github.com/kubernetes-incubator/cri-o/vendor/github.com/soheilhy/cmux.(*MuxConn).Read(0x       c4207be370, 0xc42172a000, 0x1000, 0x1000, 0x0, 0x13ce, 0x1c6e)
   829         /root/go/src/github.com/kubernetes-incubator/cri-o/_output/src/github.com/kubern       etes-incubator/cri-o/vendor/github.com/soheilhy/cmux/cmux.go:259 +0x4f
   830 net/http.(*connReader).Read(0xc4202e9680, 0xc42172a000, 0x1000, 0x1000, 0xc420460380, 0x       1cee4a0, 0xc420460380)

// Bufio is filling the buffer:

   831         /usr/lib/golang/src/net/http/server.go:753 +0x105
   832 bufio.(*Reader).fill(0xc4200599e0)
   833         /usr/lib/golang/src/bufio/bufio.go:97 +0x11a
   834 bufio.(*Reader).ReadSlice(0xc4200599e0, 0x46860a, 0x48d7dd, 0xc42098fa20, 0xfffffffe1104       c0ba, 0xc42098fa60, 0x48da77)
   835         /usr/lib/golang/src/bufio/bufio.go:338 +0x2c
   836 bufio.(*Reader).ReadLine(0xc4200599e0, 0x100, 0xf8, 0x2157540, 0x468676, 0x22002098faa8,        0xf8)

// Proto library is trying to read the line:

   837         /usr/lib/golang/src/bufio/bufio.go:367 +0x34
   838 net/textproto.(*Reader).readLineSlice(0xc4202e96b0, 0xc42098fb20, 0xc42098fb20, 0x41b6d8       , 0x100, 0x2157540)
   839         /usr/lib/golang/src/net/textproto/reader.go:55 +0x70
   840 net/textproto.(*Reader).ReadLine(0xc4202e96b0, 0xc420750800, 0x0, 0x0, 0xc42098fba8)
   841         /usr/lib/golang/src/net/textproto/reader.go:36 +0x2b
   842 net/http.readRequest(0xc4200599e0, 0x0, 0xc420750800, 0x0, 0x0)
   843         /usr/lib/golang/src/net/http/request.go:925 +0x99
   844 net/http.(*conn).readRequest(0xc42096c280, 0x3435900, 0xc4207e4880, 0x0, 0x0, 0x0)
   845         /usr/lib/golang/src/net/http/server.go:933 +0x17c

// The actual HTTP server is serving the connection:

   846 net/http.(*conn).serve(0xc42096c280, 0x3435900, 0xc4207e4880)
   847         /usr/lib/golang/src/net/http/server.go:1739 +0x50e
   848 created by net/http.(*Server).Serve
   849         /usr/lib/golang/src/net/http/server.go:2720 +0x288

@runcom
Copy link
Author

runcom commented Oct 29, 2017

So the listener is a normal golang unix listener, should I set the read timeout there then?

@soheilhy
Copy link
Owner

Yes, exactly. On the connection returned from listener.Accept(), you need to set https://golang.org/pkg/net/#UnixConn.SetReadDeadline

@runcom
Copy link
Author

runcom commented Oct 29, 2017

Yes, exactly. On the connection returned from listener.Accept(), you need to set https://golang.org/pkg/net/#UnixConn.SetReadDeadline

wait, how do we set that? Our low is basically:

listener := net.Listen("unix", socketPath)
m := cmux.New(listener)
[...] m.Match( [...]
m.Serve()

@runcom
Copy link
Author

runcom commented Oct 29, 2017

Yes, exactly. On the connection returned from listener.Accept(), you need to set https://golang.org/pkg/net/#UnixConn.SetReadDeadline
wait, how do we set that? Our low is basically:

listener := net.Listen("unix", socketPath)
m := cmux.New(listener)
[...] m.Match( [...]
m.Serve()

actually, we can set ReadTimeout on the http.Server

@runcom
Copy link
Author

runcom commented Oct 29, 2017

alright, that seems to be draining goroutines, I set it to 5 * time.Second and it seems to work :) Thanks!

@runcom runcom closed this as completed Oct 29, 2017
@soheilhy
Copy link
Owner

Great! :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants