Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

os: StartProcess clearing O_NONBLOCK can lead to goroutine hang on Linux #43894

Closed
lmb opened this issue Jan 25, 2021 · 7 comments
Closed

os: StartProcess clearing O_NONBLOCK can lead to goroutine hang on Linux #43894

lmb opened this issue Jan 25, 2021 · 7 comments
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done. OS-Linux
Milestone

Comments

@lmb
Copy link
Contributor

lmb commented Jan 25, 2021

What version of Go are you using (go version)?

go version go1.15.7 linux/amd64

Does this issue reproduce with the latest release?

Yes

What did you do?

func TestHang(t *testing.T) {
	c, err := net.Dial("udp", "127.0.0.1:0")
	if err != nil {
		t.Fatal(err)
	}
	defer c.Close()

	f, err := c.(*net.UDPConn).File()
	if err != nil {
		t.Fatal(err)
	}
	defer f.Close()

	cat, err := os.StartProcess("/bin/cat", []string{"/bin/cat"}, &os.ProcAttr{
		Files: []*os.File{os.Stdin, os.Stdout, os.Stderr, f},
	})
	if err != nil {
		t.Fatal(err)
	}
	defer cat.Wait()
	defer cat.Kill()

	go func() {
		time.Sleep(time.Second)
		t.Log("Closing")
		c.Close()
	}()

	buf := make([]byte, 1)
	// This read never returns.
	t.Log(c.Read(buf))
}

What did you expect to see?

The read on c returns after the goroutine calls Close(). The same problem exists for other syscalls like accept.

What did you see instead?

The read on c blocks indefinitely. The root cause is that os.StartProcess calls f.Fd() which in turn calls poll.FD.SetBlocking. That method does the following:

  1. Clear O_NONBLOCK from the open file descriptor flags
  2. Set fd.isBlocking to true

The problem in the example I've given is that O_NONBLOCK is shared by both f and c, but only f.isBlocking is ever set to true. The call to c.Close() sees that c.isBlocking == 0 and therefore blocks while trying to acquire runtime_Semacquire(&fd.csema) in poll.FD.Close. The same interaction is possible when using exec.Cmd.ExtraFiles since that just ends up calling os.StartProcess.

I think there are two related problems here:

  1. Clearing O_NONBLOCK from an fd can hang the go runtime.
  2. StartProcess clears O_NONBLOCK.

This behaviour is the root cause of a bug I fixed my library: cloudflare/tableflip@cae714b I was aware that the runtime clears O_NONBLOCK when calling File.Fd() while writing the library, and tried my best to avoid the problem and still got stung by this.

@seankhliao seankhliao added NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. OS-Linux labels Jan 25, 2021
@bcmills
Copy link
Contributor

bcmills commented Jan 25, 2021

CC @ianlancetaylor; see also #29277

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/286352 mentions this issue: internal/poll: query status of nonblocking before closing

@ianlancetaylor ianlancetaylor added this to the Go1.17 milestone Jan 26, 2021
@ianlancetaylor
Copy link
Member

This is basically a dup of #29277. We pretty much have to set the file to blocking mode when we pass the descriptor to the child process. Many processes will fail if they are given a non-blocking standard input.

@lmb
Copy link
Contributor Author

lmb commented Jan 26, 2021

Yeah, I agree. The only additional thing here is that the os.StartProcess behaviour is not documented anywhere, one has to go look at the source as far as I can tell.

I understand the reasoning for stdin, stdout, stderr, but should the same apply to other files as well? It seems like touching the file status flags is best avoided as much as possible.

@dmitshur
Copy link
Contributor

Should this be closed in favor of #29277, or is it helpful to track this issue separately?

I'll move this to Backlog as it seems this won't be fixed in 1.17.

@dmitshur dmitshur modified the milestones: Go1.17, Backlog May 21, 2021
@ianlancetaylor
Copy link
Member

I think the only thing to do here beyond #29277 is a doc fix. Sending a CL.

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/321852 mentions this issue: os: document that StartProcess puts files into blocking mode

@dmitshur dmitshur added NeedsFix The path to resolution is known, but the work has not been done. and removed NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels May 21, 2021
@dmitshur dmitshur modified the milestones: Backlog, Go1.17 May 21, 2021
@golang golang locked and limited conversation to collaborators May 21, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done. OS-Linux
Projects
None yet
Development

No branches or pull requests

6 participants