Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

os: StartProcess clearing O_NONBLOCK can lead to goroutine hang on Linux #43894

Open
lmb opened this issue Jan 25, 2021 · 4 comments
Open

os: StartProcess clearing O_NONBLOCK can lead to goroutine hang on Linux #43894

lmb opened this issue Jan 25, 2021 · 4 comments

Comments

@lmb
Copy link
Contributor

@lmb lmb commented Jan 25, 2021

What version of Go are you using (go version)?

go version go1.15.7 linux/amd64

Does this issue reproduce with the latest release?

Yes

What did you do?

func TestHang(t *testing.T) {
	c, err := net.Dial("udp", "127.0.0.1:0")
	if err != nil {
		t.Fatal(err)
	}
	defer c.Close()

	f, err := c.(*net.UDPConn).File()
	if err != nil {
		t.Fatal(err)
	}
	defer f.Close()

	cat, err := os.StartProcess("/bin/cat", []string{"/bin/cat"}, &os.ProcAttr{
		Files: []*os.File{os.Stdin, os.Stdout, os.Stderr, f},
	})
	if err != nil {
		t.Fatal(err)
	}
	defer cat.Wait()
	defer cat.Kill()

	go func() {
		time.Sleep(time.Second)
		t.Log("Closing")
		c.Close()
	}()

	buf := make([]byte, 1)
	// This read never returns.
	t.Log(c.Read(buf))
}

What did you expect to see?

The read on c returns after the goroutine calls Close(). The same problem exists for other syscalls like accept.

What did you see instead?

The read on c blocks indefinitely. The root cause is that os.StartProcess calls f.Fd() which in turn calls poll.FD.SetBlocking. That method does the following:

  1. Clear O_NONBLOCK from the open file descriptor flags
  2. Set fd.isBlocking to true

The problem in the example I've given is that O_NONBLOCK is shared by both f and c, but only f.isBlocking is ever set to true. The call to c.Close() sees that c.isBlocking == 0 and therefore blocks while trying to acquire runtime_Semacquire(&fd.csema) in poll.FD.Close. The same interaction is possible when using exec.Cmd.ExtraFiles since that just ends up calling os.StartProcess.

I think there are two related problems here:

  1. Clearing O_NONBLOCK from an fd can hang the go runtime.
  2. StartProcess clears O_NONBLOCK.

This behaviour is the root cause of a bug I fixed my library: cloudflare/tableflip@cae714b I was aware that the runtime clears O_NONBLOCK when calling File.Fd() while writing the library, and tried my best to avoid the problem and still got stung by this.

@bcmills
Copy link
Member

@bcmills bcmills commented Jan 25, 2021

CC @ianlancetaylor; see also #29277

@gopherbot
Copy link

@gopherbot gopherbot commented Jan 25, 2021

Change https://golang.org/cl/286352 mentions this issue: internal/poll: query status of nonblocking before closing

@ianlancetaylor ianlancetaylor added this to the Go1.17 milestone Jan 26, 2021
@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Jan 26, 2021

This is basically a dup of #29277. We pretty much have to set the file to blocking mode when we pass the descriptor to the child process. Many processes will fail if they are given a non-blocking standard input.

@lmb
Copy link
Contributor Author

@lmb lmb commented Jan 26, 2021

Yeah, I agree. The only additional thing here is that the os.StartProcess behaviour is not documented anywhere, one has to go look at the source as far as I can tell.

I understand the reasoning for stdin, stdout, stderr, but should the same apply to other files as well? It seems like touching the file status flags is best avoided as much as possible.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
5 participants