os/exec: syscall.forkExec hang when spawning multiple processes concurrently on darwin #61080

@JacobOaks

What version of Go are you using (go version)?

$ go version
go version go1.21rc2 darwin/arm64

Does this issue reproduce with the latest release?

Yes, it reproduces since this change.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=''
GOARCH='arm64'
GOBIN=''
GOCACHE='/Users/joaks/Library/Caches/go-build'
GOENV='/Users/joaks/Library/Application Support/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='arm64'
GOHOSTOS='darwin'
GOINSECURE=''
GOMODCACHE='/Users/joaks/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='darwin'
GOPATH='/Users/joaks/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/Users/joaks/go/src/github.com/golang/go'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/Users/joaks/go/src/github.com/golang/go/pkg/tool/darwin_arm64'
GOVCS=''
GOVERSION='go1.21rc2'
GCCGO='gccgo'
AR='ar'
CC='clang'
CXX='clang++'
CGO_ENABLED='1'
GOMOD='/dev/null'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -ffile-prefix-map=/var/folders/xj/2wbc4_xn293gkz7_6rzxsz5w0000gn/T/go-build3971151895=/tmp/go-build -gno-record-gcc-switches -fno-common'

What did you do?

At Uber we use a code generator that spawns many concurrent child processes, which communicate via stdin and stdout. While doing internal testing with Go 1.21rc2, we noticed the code generator hanging. A minimal runnable repro can be found in this repository: https://github.com/JacobOaks/Go1.21rc2-syscall.forkExec-hanging-repro.

Essentially, we concurrently start a number of external processes, each with stdin and stdout pipes. The code looks something like this (see the link above for the full repro):

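package main

import (
	"fmt"
	"io"
	"os/exec"
	"sync"
)

// spawn creates n clients for the given binary and starts them concurrently.
func spawn(binaryPath string, n int) []*client {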
	clients := make([]*client, n)
	for i := 0; i < n; i++ {
		clients[i] = newClient(binaryPath)
	}

	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		client := clients[i]
		go func() {
			// Always release the WaitGroup, even if start panics.
			defer wg.Done()
			if err := client.start(); err != nil {
				panic(err)
			}
		}()
	}
	wg.Wait()
	return clients
}

type client struct {
	cmd *exec.Cmd

	stdout io.ReadCloser
	stdin  io.WriteCloser
}

func newClient(binary string) *client {
	return &client{
		cmd: exec.Command(binary),
	}
}

func (c *client) start() error {
	var err error
	c.stdout, err = c.cmd.StdoutPipe()
	if err != nil {
		return fmt.Errorf("create stdout pipe: %w", err)
	}
	c.stdin, err = c.cmd.StdinPipe()
	if err != nil {
		return fmt.Errorf("create stdin pipe: %w", err)
	}
	if err = c.cmd.Start(); err != nil {
		return fmt.Errorf("run cmd: %w", err)
	}
	return nil
}

Attaching Delve to the hanging process shows that the hang occurs in cmd.Start, where syscall.forkExec appears to block in syscall.readlen:

(dlv) grs
  Goroutine 1 - User: /Users/joaks/go/src/github.com/golang/go/src/runtime/sema.go:62 sync.runtime_Semacquire (0x1026bf57c) [semacquire]
  Goroutine 2 - User: /Users/joaks/go/src/github.com/golang/go/src/runtime/proc.go:399 runtime.gopark (0x102695198) [force gc (idle)]
  Goroutine 3 - User: /Users/joaks/go/src/github.com/golang/go/src/runtime/proc.go:399 runtime.gopark (0x102695198) [GC sweep wait]
  Goroutine 4 - User: /Users/joaks/go/src/github.com/golang/go/src/runtime/proc.go:399 runtime.gopark (0x102695198) [GC scavenge wait]
  Goroutine 5 - User: /Users/joaks/go/src/github.com/golang/go/src/runtime/proc.go:399 runtime.gopark (0x102695198) [finalizer wait]
  Goroutine 12 - User: /Users/joaks/go/src/github.com/golang/go/src/runtime/sys_darwin.go:24 syscall.syscall (0x1026bfaf8) (thread 18311682) [timer goroutine (idle)]
[6 goroutines]
(dlv) gr 12
Switched from 0 to 12 (thread 18311682)
(dlv) stack
 0  0x000000018f884acc in ???
    at ?:-1
 1  0x00000001026c0b58 in runtime.systemstack_switch
    at /Users/joaks/go/src/github.com/golang/go/src/runtime/asm_arm64.s:200
 2  0x00000001026b19dc in runtime.libcCall
    at /Users/joaks/go/src/github.com/golang/go/src/runtime/sys_libc.go:49
 3  0x00000001026bfaf8 in syscall.syscall
    at /Users/joaks/go/src/github.com/golang/go/src/runtime/sys_darwin.go:24
 4  0x00000001026daa5c in syscall.readlen
    at /Users/joaks/go/src/github.com/golang/go/src/syscall/syscall_darwin.go:242
 5  0x00000001026d9c30 in syscall.forkExec
    at /Users/joaks/go/src/github.com/golang/go/src/syscall/exec_unix.go:217
 6  0x00000001026e9628 in syscall.StartProcess
    at /Users/joaks/go/src/github.com/golang/go/src/syscall/exec_unix.go:334
 7  0x00000001026e9628 in os.startProcess
    at /Users/joaks/go/src/github.com/golang/go/src/os/exec_posix.go:54
 8  0x00000001026e9340 in os.StartProcess
    at /Users/joaks/go/src/github.com/golang/go/src/os/exec.go:111
 9  0x00000001026fc534 in os/exec.(*Cmd).Start
    at /Users/joaks/go/src/github.com/golang/go/src/os/exec/exec.go:693
10  0x00000001026ff368 in main.(*client).start
    at ./server/main.go:105
11  0x00000001026fefa8 in main.spawn.func1
    at ./server/main.go:46
12  0x00000001026c3024 in runtime.goexit
    at /Users/joaks/go/src/github.com/golang/go/src/runtime/asm_arm64.s:1197

This behavior is flaky and, in our investigation, only appears with Go 1.21rc2 on darwin-arm64.

git bisect indicated this change to be the culprit.

What did you expect to see?

I would expect the program in the linked repro not to hang, as is the case with Go 1.20.

What did you see instead?

It occasionally hangs; see above.
