Skip to content

os/exec: .Wait(): non-deterministic behavior (race?) #68308

@flynx

Description

@flynx

Go version

go1.21.11 linux/amd64

Output of go env in your module/workspace:

GO111MODULE=''
GOARCH='amd64'
GOBIN=''
GOCACHE='/home/f_lynx/.cache/go-build'
GOENV='/home/f_lynx/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMODCACHE='/home/f_lynx/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/home/f_lynx/go'
GOPRIVATE=''
GOPROXY='direct'
GOROOT='/usr/lib/golang'
GOSUMDB='off'
GOTMPDIR=''
GOTOOLCHAIN='local'
GOTOOLDIR='/usr/lib/golang/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='go1.21.11'
GCCGO='gccgo'
GOAMD64='v1'
AR='ar'
CC='gcc'
CXX='g++'
CGO_ENABLED='1'
GOMOD='/home/f_lynx/work/filemanager/go.mod'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build1310955679=/tmp/go-build -gno-record-gcc-switches'

What did you do?

Run an external command and handle it's output:

  • cmd := exec.Command("ls"),
  • Setup handling of stdout of said command,
  • cmd.Start(),
  • cmd.Wait().

The demo program:

  • Get the number of files in current dir (should be more than a few),
  • Run ls -a a number of times, each time comparing the resulting number of lines with the result from os.ReadDir("."),
  • If the numbers match print "." if not report it.

Complete code (https://go.dev/play/p/DmkCA_pYJiy):

package main

import (
	"bufio"
	"fmt"
	"os"
	"os/exec"
)

var RunCount = 1000

func main() {
	files, _ := os.ReadDir(".")
	// +2 accounts for "." and ".." in the output of ls...
	c := len(files) + 2
	s := 0
	report := func(n int) {
		if n != c {
			fmt.Printf("\nListed: %v of %v (no errors)\n", n, c)
			s++
		} else {
			fmt.Print(".")
		}
	}
	for i := 0; i < RunCount; i++ {
		n := 0
		done := make(chan bool)
		c := exec.Command("ls", "-a")
		out, _ := c.StdoutPipe()
		go func() {
			scanner := bufio.NewScanner(out)
			for scanner.Scan() {
				scanner.Text()
				n++
			}
			// handle output...
			report(n)
			close(done)
		}()
		// start...
		if err := c.Start(); err != nil {
			fmt.Println("!!! START:", err)
		}
		// XXX this breaks the run script some of the time...
		if err := c.Wait(); err != nil {
			fmt.Println("!!! WAIT:", err)
		}
		<-done
	}
	fmt.Println("")
	if s > 0 {
		fmt.Printf("Skipped part of the output %v times of %v\n", s, RunCount)
	}
}

What did you see happen?

Some iterations/runs of produce the expected results, i.e. the number of files returned by os.ReadDir(".") match the number returned by exec.Command("ls", "-a"), but some of the time the later produces a number less than the expected number, mostly but not always 0, without producing any errors (see below for exception).

Example output:

$ go run experiments/exec_wait_race.go 
...................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
Listed: 0 of 27 (no errors)
.....................................................................................................................................
Listed: 0 of 27 (no errors)
..............................................................................................................................
Skipped part of the output 2 times of 1000

Only on go1.22.1 android/arm sometimes the code segfaults (different stacks/locations), example output (truncated):

...
Listed: 0 of 23 (no errors)
.........................
Listed: 0 of 23 (no errors)
...
Listed: 0 of 23 (no errors)
..................
Listed: 0 of 23 (no errors)

Listed: 0 of 23 (no errors)
................panic during panic
SIGSEGV: segmentation violation
PC=0xb2590290 m=0 sigcode=2fatal: morestack on gsignal
signal: segmentation fault

Note that executing cmd.Wait() after the scanner.Scan() loop makes the issue less prevalent, and I could no longer reproduce the issue after inserting a time.Sleep(..) before .Wait().

What did you expect to see?

As stated in the documentation (https://pkg.go.dev/os/exec#Cmd.Wait):

Wait waits for the command to exit and waits for any copying to stdin or copying from stdout or stderr to complete.

i.e. the executed command should complete and it's output be fully captured before wait is done.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions