
io: Copy is easy to misuse and leak goroutines blocked on reads #58628

Open
cpuguy83 opened this issue Feb 21, 2023 · 17 comments
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Comments

@cpuguy83

What version of Go are you using (go version)?

All

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

N/A

What did you do?

Because of the way io.Copy works (and really any other solution to the same need), it is very easy to wedge a goroutine, and probably an fd along with it.

Example:

pr1, _ := io.Pipe() // write end discarded: pr1 will never receive data
_, pw2 := io.Pipe() // read end discarded
go io.Copy(pw2, pr1)

pw2.Close()

In this case, the goroutine will leak once pw2 is closed because there is no way to cancel that copy without also closing pr1.
In some cases this makes sense, especially if you control both pipes.

In docker/moby this is a common case: a client connected over a unix or TCP socket is attached to a container's stdio streams.
If the client disconnects, there is no mechanism (aside from platform-specific code like epoll) to get notified that the copy operation needs to be cancelled.

I suggest an interface be added, potentially to io, to receive close notifications from the runtime.
Example:

type CloseNotifier interface {
    CloseNotify() <-chan struct{}
}

The actual interface could take some other shape (maybe something that returns a context that gets cancelled?).
I'm not married to the above interface (I know there's a similarly named one in net/http).

*os.File and net.Conn implementations could implement this interface to help with the above scenario, and since the go runtime is already doing a bunch of fd notification magic it should generally be easily supported in the stdlib across platforms.

@ianlancetaylor ianlancetaylor changed the title io.Copy is easy to mis-use and leak goroutines blocked on reads runtime: io.Copy is easy to mis-use and leak goroutines blocked on reads Feb 21, 2023
@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Feb 21, 2023
@ianlancetaylor
Member

I don't understand what would be reading from that channel. How does it help with the io.Copy, which is presumably sitting in a call to Read?

@Jorropo
Member

Jorropo commented Feb 22, 2023

If the client disconnects there is no mechanism (aside from platform specific code like epoll) to get notified that the copy operation needs to be cancelled.

.Read returns io.EOF (or another error if the connection was killed or a deadline is reached) on all OSes; you don't need to implement anything using epoll, the runtime does this for you already.
See here this very simple example:

package main

import (
	"fmt"
	"io"
	"net"
	"os"
)

func main() {
	conn, err := net.Dial("tcp", "localhost:11111")
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	_, err = io.Copy(os.Stdout, conn)
	fmt.Println("copy exited:", err)
}

If I listen using nc -lvp 11111, run go run a.go and finally CTRL + C netcat, I see:

copy exited: <nil>

I think you have some other bug and this is not related to io.Copy (timeouts?).

*os.File and net.Conn implementations could implement this interface to help with the above scenario, and since the go runtime is already doing a bunch of fd notification magic it should generally be easily supported in the stdlib across platforms.

They already do, by returning an error on .Read when the resource is closed while you were reading from it.

@cpuguy83
Author

@ianlancetaylor

I don't understand what would be reading from that channel.

Another goroutine that can close the read side.

@Jorropo The issue here is that the read side is never closed, only the write side.

@Jorropo
Member

Jorropo commented Feb 22, 2023

Another goroutine that can close the read side.
The issue here is that the read side is never closed, only the write side.

So the issue here is that dst (let's say the container) closes the write end (it's stdin but not stdout and stderr), while src (let's say the remote client) is deadlocked (it has no more data to send and waits for the write end to be closed)?
And since you are blocked on the remote read, you never attempt to write to stdin and thus never notice it is closed for writing (which, if you did, you would propagate to the remote read).

Is the paragraph above correct?

(it's stdin but not stdout and stderr)

This is important for the setup here, because if stdout and or stderr are closed, your container -> client goroutines could be tasked to close the client -> container connection (with some extra logic).

@ianlancetaylor
Member

If I understand correctly, you are suggesting that an os.File should implement a CloseNotify method that it can use to notify people when it is closed. If so, that seems like something that can be implemented as a wrapper around os.File.

@cpuguy83
Author

@ianlancetaylor It could indeed be, but since the runtime already has epoll (or equivalent on other platforms) wired up it seemed like a good case to generalize.

@ianlancetaylor
Member

Closing a descriptor is not the same as closing an os.File. It's the other way around: closing an os.File closes the descriptor.

@rittneje

It isn't very clear from your description which of the following situations you are in.

  1. You have a file/socket in your process that you are writing to via io.Copy in some goroutine. Elsewhere in your application, you decide to close that file/socket. However, since io.Copy is waiting to read data from some other file/socket, the goroutine does not exit.
  2. You have a pair of sockets/fds. One fd is used for writing (in your process), while the other is used for reading (in some other process). The other process decides to close its fd (or it exits or gets killed or whatever). Again, since io.Copy is waiting to read data from some other file/socket, the goroutine does not exit.

@thanm thanm added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Feb 22, 2023
@cpuguy83
Author

@rittneje In the specific case with moby, we have a long running process (ie a container) which we allow API users to attach to the stdio streams of.
So we have goroutines that effectively look like io.Copy(apiStream, processStdout).
When apiStream is disconnected the goroutine is wedged unless there is new data to read from processStdout.

Basically there is no facility built into Go to know when an fd has closed other than by trying to read/write on it (perhaps some other socket op would do as well).
My solution to our specific problem right now is to drop down to syscall.RawConn, set up epoll to notify me when the fd closes, wait for it in a goroutine, then run a cancel function. I'm not familiar with the Windows side of how to do this, but we'll need to do the equivalent on Windows.
I guess the other option is to say "this old API is broken" and add a new API that tracks streams and has an explicit call to signal a close, rather than just closing the fd.

@bcmills
Contributor

bcmills commented Feb 22, 2023

If the client disconnects there is no mechanism (aside from platform specific code like epoll) to get notified that the copy operation needs to be cancelled.

Both net.TCPConn and net.UnixConn are bidirectional: they support not only reading but also writing. I wonder: for your use cases, are you also making use of the read side of those connections?

In particular, if you don't expect the other side to write anything more to the socket, you could call Read on it, which would block until the other side disconnects. Then you run your io.Copy, and if the Read call on the write socket unblocks you can close or cancel whatever is producing the input to the Copy operation as well.

That is, I would expect a realistic example to look more like:

var (
	dst io.ReadWriteCloser = …
	src io.ReadCloser      = …
)

errc := make(chan error, 1)
go func() {
	_, err := io.Copy(io.Discard, dst)
	// The connection to dst is broken.
	// Close src to interrupt the io.Copy from it.
	src.Close()
	errc <- err
}()

_, srcErr := io.Copy(dst, src)
// Close dst to interrupt the io.Copy from it.
dst.Close()
dstErr := <-errc

@cpuguy83
Author

@bcmills We do use the read side. Your example also doesn't work if/when the client does a close-write on the socket, which wouldn't be completely unheard of if the client has nothing to send and is only expecting to receive.

@rittneje

@cpuguy83 My question was really more getting to the fact that your proposed CloseNotify() method is kind of ambiguous. Does the channel signal that:

  1. This file descriptor was closed?
  2. The other end of this file descriptor was closed or shut down (i.e., "broken pipe")?
  3. Both?

Also, I assume that currently the runtime poller only polls if you are currently trying to read/write with the fd. But for CloseNotify() to function it would have to eternally poll every fd (or at least, every fd where that method had ever been called). Also you would specifically be waiting for the fd NOT to be available for writing, which AFAIK is not something that epoll offers.

@rittneje

I will also point out that io.Copy simply is not a safe operation in production applications in general.

For remote TCP connections in particular, you need some sort of application-level ping between processes in order to properly deal with half-closed connections.

On a somewhat related note, io.Copy does not provide any straightforward way of applying timeouts/deadlines to each read or write operation. So for example, if you are copying to a pipe, and the process on the other side of the pipe stops reading, you will essentially deadlock.

Certainly there are use cases for io.Copy that are willing to accept the aforementioned caveats. But then the fact that io.Copy won't notice the destination is no longer writable until it has something to write is just another caveat.

@mknyszek mknyszek changed the title runtime: io.Copy is easy to mis-use and leak goroutines blocked on reads io: Copy is easy to misuse and leak goroutines blocked on reads Feb 24, 2023
@mknyszek mknyszek removed the compiler/runtime Issues related to the Go compiler and/or runtime. label Feb 24, 2023
@mknyszek mknyszek added this to the Backlog milestone Feb 24, 2023
@cpuguy83
Author

@rittneje On Linux I would expect this to be waiting for EPOLLHUP, however half-closed (EPOLLRDHUP) is indeed another scenario that should definitely be considered when designing the interface.

I don't think it's particularly useful to say io.Copy is not safe.
It is for the most part pretty straightforward and even has performance optimizations (with support for splice, sendfile, and copy_file_range).

@rittneje

On Linux I would expect this to be waiting for EPOLLHUP, however half-closed (EPOLLRDHUP) is indeed another scenario that should definitely be considered when designing the interface.

@cpuguy83 Again, AFAIK, you cannot poll an fd until EPOLLHUP. Rather, you can only poll until the fd is available for writing (or reading). If you go to poll and the fd is already shut down then you'll get EPOLLHUP, but you cannot tell it to block until that happens. Thus what you are proposing would require what amounts to a busy loop.

@cpuguy83
Author

@rittneje You are right that EPOLLHUP does not seem to work on TCP sockets, though it does with unix sockets.
In any case, EPOLLRDHUP seems to do fine there.
https://gist.github.com/cpuguy83/377f1a8eedf32159094845dc1ae77f95

@xvertile

xvertile commented Sep 22, 2024

I have also encountered this issue of io.Copy blocking forever in a TCP server setup. What I have done is set a deadline on the connection:

timeoutDuration := 1 * time.Second
proxyConn.Conn.SetDeadline(time.Now().Add(timeoutDuration))

It's not perfect, and I really wish io.Copy had some lower-level mechanism to prevent it from blocking forever.

9 participants