New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

os/exec: Wait behavior changed from Go 1.9 to Go 1.10 on Windows #25835

Open
rsc opened this Issue Jun 11, 2018 · 25 comments

Comments

Projects
None yet
5 participants
@rsc
Contributor

rsc commented Jun 11, 2018

Copying from #23019 (comment)

Here is the reproduction as a main function with correct markdown :-)
THX @ysmolsky!

Result does not change: works with go 1.9, breaks with go 1.10.

package main

import (
	"bufio"
	"context"
	"fmt"
	"os"
	"os/exec"
	"strings"
	"syscall"
	"time"
)

func fatal(format string, args ...interface{}) {
	fmt.Println(fmt.Sprintf(format, args...))
	os.Exit(1)
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()

	cmd := exec.CommandContext(ctx, "sh", "-c", "echo hello world && sleep 5")
	stdoutPipe, err := cmd.StdoutPipe()
	if err != nil {
		fatal("no pipe: %v", err)
	}
	start := time.Now()

	if err = cmd.Start(); err != nil {
		fatal("start failed: %v", err)
	}

	go func() {
		var stdout string
		buf := bufio.NewReader(stdoutPipe)
		for {
			line, err := buf.ReadString('\n')
			if len(line) > 0 {
				stdout = stdout + line + "\n"
			}
			if err != nil {
				if !strings.HasPrefix(stdout, "hello world") {
					fatal("wrong output: %q", stdout)
				}
				return
			}
		}
	}()

	err = cmd.Wait()
	d := time.Since(start)

	if err != nil {
		exiterr := err.(*exec.ExitError)
		status := exiterr.Sys().(syscall.WaitStatus)
		if status.ExitStatus() == 0 {
			fatal("wrong exit status: %v", status.ExitStatus())
		}
	}

	if d.Seconds() >= 3 {
		fatal("Cancelation took too long: %v", d)
	}
	fmt.Println("Success!")
}

@rsc rsc added this to the Go1.11 milestone Jun 11, 2018

@rsc

This comment has been minimized.

Contributor

rsc commented Jun 11, 2018

@dsymonds

This comment has been minimized.

Member

dsymonds commented Jun 12, 2018

Ah, I have also tripped over this recently! I have been observing the output of subprocesses writing to stderr (by the parent's os.Stderr being assigned to cmd.Stderr) happening after Wait has returned. For me it is logging, so not a big deal, but it was certainly confusing and I hadn't had time to track it down.

@dsymonds

This comment has been minimized.

Member

dsymonds commented Jun 12, 2018

Sorry, my mistake. My situation also has cmd.Stderr connected to a pipe like the reproduction above.

@ianlancetaylor

This comment has been minimized.

Contributor

ianlancetaylor commented Jun 13, 2018

@kgadams I can't recreate the problem. On my GNU/Linux system, the example program shown above behaves the same with Go 1.9.6, 1.10.3, and tip. In all cases it prints Success!.

What system are you running on? What exact version of Go are you using? Thanks.

@dsymonds

This comment has been minimized.

Member

dsymonds commented Jun 13, 2018

I've got what I believe is a simpler reproduction:

package main

import (
        "context"
        "fmt"
        "log"
        "os/exec"
        "time"
)

func main() {
        ctx, cancel := context.WithTimeout(context.Background(), time.Second)
        defer cancel()

        cmd := exec.CommandContext(ctx, "sh", "-c", "echo hello world && sleep 5")

        start := time.Now()
        output, err := cmd.CombinedOutput()
        d := time.Since(start)
        fmt.Printf("output: %q\n", output)
        fmt.Printf("err: %v\n", err)
        fmt.Printf("d: %v\n", d)

        if err == nil {
                log.Fatalf("command didn't fail after %v", d)
        }

        if d.Seconds() >= 3 {
                log.Fatalf("Cancelation took too long: %v", d)
        }
        fmt.Println("Success!")
}

This fails (with the "Cancelation took too long" message) on go version devel +e5f0c1f6c9 Thu Jun 7 07:00:46 2018 +0000 darwin/amd64. However, it also fails on go version go1.7.4 linux/amd64, so maybe it's been buggy all that time.

@dsymonds

This comment has been minimized.

Member

dsymonds commented Jun 13, 2018

The possible connection to the case reported up the top is this: if I change from cmd.CombinedOutput() to cmd.Run(), I get a signal: killed error after 1s (the context cancelation time). So it seems to be an issue with capturing the output.

@ianlancetaylor

This comment has been minimized.

Contributor

ianlancetaylor commented Jun 13, 2018

I think we are seeing shell dependent behavior. What is sh on your PATH?

@dsymonds

This comment has been minimized.

Member

dsymonds commented Jun 13, 2018

sh is /bin/sh, which is GNU bash, version 3.2.57(1)-release. But does it matter? cmd.Wait() or cmd.CombinedOutput() shouldn't last much longer than the context provided to exec.CommandContext. In my real world case, I am executing a Mach-O binary, not sh.

@kgadams

This comment has been minimized.

kgadams commented Jun 13, 2018

I tested all go/os combinations I could get my hands on, and I found that the version of @dsymonds fails consistently on all of them.
The original version of the test program however succeeds on

  • go version go1.9.7 windows/amd64
  • go version go1.9.3 windows/amd64
  • go version go1.10.3 linux/amd64
  • go version go1.9.7 linux/amd64
  • go version go1.10.3 linux/ppc64le
  • go version go1.9.7 linux/ppc64le

It fails on:

  • go version go1.10.3 windows/amd64
  • go version go1.10.1 windows/amd64

So it looks like a windows only problem to me. And somehow there is a difference between using CombinedOutput() and StdoutPipe()

FWIW: I run it with $GOROOT/bin/go run cmd.go under a git environment, so the sh used on Windows is
GNU bash, version 4.4.12(2)-release (x86_64-pc-msys)

@ianlancetaylor

This comment has been minimized.

Contributor

ianlancetaylor commented Jun 13, 2018

There is definitely a significant difference between CombinedOutput and StdoutPipe. The former creates a goroutine to pull all the data from the pipe, and waits for that goroutine to complete, which means that it waits for the descriptor to close. That is #23019. StdoutPipe does not create a goroutine.

@ianlancetaylor

This comment has been minimized.

Contributor

ianlancetaylor commented Jun 13, 2018

@kgadams Thanks for explaining that your failure is only on Windows. That makes a huge difference.

@ianlancetaylor

This comment has been minimized.

Contributor

ianlancetaylor commented Jun 13, 2018

The reason @dsymonds 's program works for me is indeed a shell feature. With the shell I am using (bash 4.4.12(1)-release on GNU/Linux) the sleep 5 is executed directly using execve, rather than calling fork first. If I change the program to run echo hello world && sleep 5 && echo slept then I see the delay. This is simply #23019, and is presumably unrelated to the original program's failure on WIndows.

@ianlancetaylor

This comment has been minimized.

Contributor

ianlancetaylor commented Jun 13, 2018

There is more about exec on Windows with child processes in #17245. I don't know why it would have changed between 1.9 and 1.10, though.

@ianlancetaylor ianlancetaylor changed the title from os/exec: Wait behavior changed from Go 1.9 to Go 1.10 to os/exec: Wait behavior changed from Go 1.9 to Go 1.10 on Windows Jun 13, 2018

@dsymonds

This comment has been minimized.

Member

dsymonds commented Jun 13, 2018

@ianlancetaylor: Thanks, but #23019 doesn't seem to have much to say about the provided context not being respected. In my real case, I'm running a real binary with a 5s context timeout, and it runs for several minutes. That's running on Linux, not Windows. Is there a bug open that would be relevant to that? I couldn't find one with a bit of searching.

@ianlancetaylor

This comment has been minimized.

Contributor

ianlancetaylor commented Jun 13, 2018

@dsymonds The problem you are seeing--CombinedOutput does not return even though the context has been canceled--is exactly what #23019 is about. It's because the Wait does not complete until all the child processes have exited.

@ianlancetaylor ianlancetaylor modified the milestones: Go1.11, Go1.12 Jun 27, 2018

@crvv

This comment has been minimized.

Contributor

crvv commented Aug 22, 2018

The behavior changing comes from 382d492

cmd.Wait() is waiting for closing the pipe reader at

c.closeDescriptors(c.closeAfterWait)

In Go1.10, (*os.File).Close is blocked by (*os.File).Read. This won't happen in Go1.9
This only happens on Windows because of #19098

(*os.File).Read is waiting for grandchildren, which is similar to #23019

https://golang.org/pkg/os/exec/#Cmd.StdoutPipe
The document says "it is incorrect to call Wait before all reads from the pipe have completed"
So maybe this is not a bug of Go stdlib.

@kgadams

This comment has been minimized.

kgadams commented Aug 22, 2018

Does anybody know of a way to work around this problem on Windows?
I very much would like to have the same behavior on Windows with Go 1.10 then we have on Linux with Go 1.9 and Go 1.10 and on Windows with Go 1.9.
In my specific scenario I have no control about the grandchildren but I want to be able to cancel the shell reliably and quickly, even if some output from grandchildren is lost.

@ianlancetaylor

This comment has been minimized.

Contributor

ianlancetaylor commented Aug 22, 2018

@kgadams This problem only arises when pointing the Stdout or Stderr field to something that isn't a *os.File, so you can always work around it by setting them to the writer returned by os.Pipe.

@kgadams

This comment has been minimized.

kgadams commented Aug 22, 2018

@ianlancetaylor: First I thought it works, but in the real code it does not.
Also: Cmd.StdoutPipe() does just what you suggested as a workaround: it creates an os.Pipe() and assigns it to cmd.Stdout. Why would it make a difference if I do it manually?

@ianlancetaylor

This comment has been minimized.

Contributor

ianlancetaylor commented Aug 22, 2018

Sure, use StdoutPipe. That is even better if it works for you.

If this doesn't solve the problem then I don't understand what the problem is.

@kgadams

This comment has been minimized.

kgadams commented Aug 22, 2018

See the opening post in this thread. It uses StdoutPipe().
cmd.Wait() does not return after canceling the Context until the grandchild is dead on Windows/go1.10.
It returns immediately on Unix and with go1.9 on Windows.

@ianlancetaylor

This comment has been minimized.

Contributor

ianlancetaylor commented Aug 22, 2018

Ah, OK, StdoutPipe does wait to close the reading end of the pipe before returning from Wait. So I was wrong: using StdoutPipe is not equivalent to using your own os.Pipe. But I still don't see why os.Pipe doesn't work, as long as you coordinate closing the pipe with reading from the pipe.

@crvv

This comment has been minimized.

Contributor

crvv commented Aug 23, 2018

The problem is that Go doesn't use the runtime poller to operate pipe on Windows.
Pipes are treated as regular files.
For a pipe, (*os.File).Read() can block indefinitely, and it won't be interrupted by (*os.File).Close()

So the problem itself is how to

coordinate closing the pipe with reading from the pipe

How about make os/exec put the pipe into runtime poller, so that the Read() will be interrupted by Close()?

@ianlancetaylor

This comment has been minimized.

Contributor

ianlancetaylor commented Aug 23, 2018

Can we just fix the os package to add pipes to the poller when you call os.Pipe? Is there any reason that wouldn't work?

@crvv

This comment has been minimized.

Contributor

crvv commented Aug 23, 2018

That is #19098
(*os.File).Fd() won't work as expected.

StdoutPipe() also returns an *os.File. Changing os/exec just make that file's Fd() not work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment