Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

os/exec: cmd.Start got a deadlock when set cmd.Stdin #47061

Open
Ace-Tang opened this issue Jul 6, 2021 · 13 comments
Open

os/exec: cmd.Start got a deadlock when set cmd.Stdin #47061

Ace-Tang opened this issue Jul 6, 2021 · 13 comments

Comments

@Ace-Tang
Copy link

@Ace-Tang Ace-Tang commented Jul 6, 2021

What version of Go are you using (go version)?

$ go version
go version go1.15.11 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE="off"
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/ace/.cache/go-build"
GOENV="/home/ace/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE="gitlab.alibaba-inc.com"
GOMODCACHE="/home/ace/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/ace/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build205131986=/tmp/go-build -gno-record-gcc-switches"

What did you do?

There is a deadlock in cmd.Start, this is only happen when command set cmd.Stdin

the calling stack: cmd.Start -> cmd set stdin, add stdin io copy in a goroutine after os.StartProcess successfully -> begin os.StartProcess -> finally in forkExec() -> fail in readlen (donnot know why) -> deal with the fail, begin to call wait4, but child process has run successfully, child wait goroutine to copy io, it was hang, and wait4 never success

image

use the follow code to reproduce

parent

package main                                                                                                                                                                                  
                                                                                                                                                                                              
import (                                                                                                                                                                                      
        "bufio"                                                                                                                                                                               
        "bytes"                                                                                                                                                                               
        "encoding/json"                                                                                                                                                                       
        "fmt"                                                                                                                                                                                 
        "io"                                                                                                                                                                                  
        "os"                                                                                                                                                                                  
        "os/exec"                                                                                                                                                                             
        "time"                                                                                                                                                                                
)                                                                                                                                                                                             
                                                                                                                                                                                              
func forwardLog(fd io.Reader) {                                                                                                                                                               
        reader := bufio.NewReader(fd)                                                                                                                                                         
        for {                                                                                                                                                                                 
                line, err := reader.ReadBytes('\n')                                                                                                                                           
                if err == io.EOF {                                                                                                                                                            
                        fmt.Println("end of read child log")                                                                                                                                  
                        return                                                                                                                                                                
                                                                                                                                                                                              
                }                                                                                                                                                                             
                                                                                                                                                                                              
                if err != nil {                                                                                                                                                               
                        fmt.Printf("failed to read child log: %s\n", err)
                        return
                }

                fmt.Printf("child log: %s\n", line)

        }
}

func main() {
        r, w, err := os.Pipe()
        go forwardLog(r)
        data := "hello"
        raw, err := json.Marshal(data)
        if err != nil {
                panic(err)
        }

        cmd := exec.Command("/home/ace/go/src/test/cmd-start-hang/read")
        cmd.ExtraFiles = append(cmd.ExtraFiles, w)
        cmd.Stdin = bytes.NewReader(raw)

        if err := cmd.Start(); err != nil {
                panic(err)

        }

        fmt.Printf("proc %+v\n", cmd.Process)

        if err := cmd.Wait(); err != nil {
                panic(err)
        }

        time.Sleep(10 * time.Second)

}

child

package main

import (
	"fmt"
	"io"
	"io/ioutil"
	"os"
	"time"
)

func getData(r io.Reader) ([]byte, error) {
	raw, err := ioutil.ReadAll(r)
	return raw, err
}

func main() {
	log := os.NewFile(uintptr(3), "")

	fmt.Fprintf(log, "child run\n")
	raw, err := getData(os.Stdin)
	if err != nil {
		fmt.Fprintf(log, "%s", err)
	}

	fmt.Fprintf(log, "read: %s\n", raw)

	time.Sleep(10 * time.Second)
	os.Exit(0)
}

Since I donnot know why readline fail, so I modify source code, create a error after readlen, make it fails

image

What did you expect to see?

cmd.Start got a error return

What did you see instead?

Both parent and child process were hang forever

@seankhliao
Copy link
Contributor

@seankhliao seankhliao commented Jul 6, 2021

I can't reproduce this, this exits normally on my system

Loading

@Ace-Tang
Copy link
Author

@Ace-Tang Ace-Tang commented Jul 6, 2021

Hi, @seankhliao do you modify golang source code in /usr/local/go/src/syscall/exec_unix.go like the picture. Child process build with the original source code, parent process build with the modified source code.

parent process hang on wait4

image

child process wait on reading pipe

image

Loading

@mknyszek mknyszek added this to the Backlog milestone Jul 7, 2021
@mknyszek
Copy link
Contributor

@mknyszek mknyszek commented Jul 7, 2021

Loading

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Jul 10, 2021

Please describe code as ordinary text or as links to the Go source code. Please do not use images. Images are hard to read. Thanks.

Loading

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Jul 10, 2021

I can't recreate this problem either. I see this output:

proc &{Pid:1084700 handle:0 isdone:0 sigMu:{w:{state:0 sema:0} writerSem:0 readerSem:0 readerCount:0 readerWait:0}}
child log: child run

child log: read: "hello"

There is a ten second delay after the final "child log" output.

Loading

@Ace-Tang
Copy link
Author

@Ace-Tang Ace-Tang commented Jul 12, 2021

  1. build child code with origin golang code
package main

import (
	"fmt"
	"io"
	"io/ioutil"
	"os"
	"time"
)

func getData(r io.Reader) ([]byte, error) {
	raw, err := ioutil.ReadAll(r)
	return raw, err
}

func main() {
	log := os.NewFile(uintptr(3), "")

	fmt.Fprintf(log, "child run\n")
	raw, err := getData(os.Stdin)
	if err != nil {
		fmt.Fprintf(log, "%s", err)
	}

	fmt.Fprintf(log, "read: %s\n", raw)

	time.Sleep(10 * time.Second)
	os.Exit(0)
}
  1. modify source code in https://github.com/golang/go/blob/master/src/syscall/exec_unix.go#L219-L227, create a error after L227 , as
	// Read child error status from pipe.
	Close(p[1])
	for {
		n, err = readlen(p[0], (*byte)(unsafe.Pointer(&err1)), int(unsafe.Sizeof(err1)))
		if err != EINTR {
			break
		}
	}
	Close(p[0])
	err = EINVAL

and then build parent code with the changed source code

package main                                                                                                                                                                                  
                                                                                                                                                                                              
import (                                                                                                                                                                                      
        "bufio"                                                                                                                                                                               
        "bytes"                                                                                                                                                                               
        "encoding/json"                                                                                                                                                                       
        "fmt"                                                                                                                                                                                 
        "io"                                                                                                                                                                                  
        "os"                                                                                                                                                                                  
        "os/exec"                                                                                                                                                                             
        "time"                                                                                                                                                                                
)                                                                                                                                                                                             
                                                                                                                                                                                              
func forwardLog(fd io.Reader) {                                                                                                                                                               
        reader := bufio.NewReader(fd)                                                                                                                                                         
        for {                                                                                                                                                                                 
                line, err := reader.ReadBytes('\n')                                                                                                                                           
                if err == io.EOF {                                                                                                                                                            
                        fmt.Println("end of read child log")                                                                                                                                  
                        return                                                                                                                                                                
                                                                                                                                                                                              
                }                                                                                                                                                                             
                                                                                                                                                                                              
                if err != nil {                                                                                                                                                               
                        fmt.Printf("failed to read child log: %s\n", err)
                        return
                }

                fmt.Printf("child log: %s\n", line)

        }
}

func main() {
        r, w, err := os.Pipe()
        go forwardLog(r)
        data := "hello"
        raw, err := json.Marshal(data)
        if err != nil {
                panic(err)
        }

        cmd := exec.Command("/home/ace/go/src/test/cmd-start-hang/read")
        cmd.ExtraFiles = append(cmd.ExtraFiles, w)
        cmd.Stdin = bytes.NewReader(raw)

        if err := cmd.Start(); err != nil {
                panic(err)

        }

        fmt.Printf("proc %+v\n", cmd.Process)

        if err := cmd.Wait(); err != nil {
                panic(err)
        }

        time.Sleep(10 * time.Second)

}
  1. Finally, the calling stack: cmd.Start -> cmd set stdin, add stdin io copy in a goroutine after os.StartProcess successfully -> begin os.StartProcess -> finally in forkExec() -> fail in readlen (donnot know why) -> deal with the fail, begin to call wait4, but child process has run successfully, child wait goroutine to copy io, it was hang, and wait4 never success

Could you please try these steps again, @ianlancetaylor @seankhliao , thanks

Loading

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Jul 13, 2021

I don't understand. Are you saying that this problem only happens if you change the standard library? Doesn't that mean that there is a bug with the way that you are changing the standard library? I haven't tried to fully understand what you are doing, but it sounds like you are changing the code so that the child runs but the parent thinks that it did not run. Why do you expect that to work?

Loading

@Ace-Tang
Copy link
Author

@Ace-Tang Ace-Tang commented Jul 13, 2021

It is my fault not describe the timing at this case happen, it was happened online much times. I read the source, found that when the code goes into the error handling logic, it will hang forever. Since I can't reproduce in offline test environment, to verify my guess, I change the source code to reproduce the error case, and it show the same error as online.

So I ask here to see if some one can help me. Since it really happens online, and the stack dump online is just like the picture I post, I think it is a deadlock in source code, through I not found the root case about why child process run, but the code goes into the error handling logic.

Do you need more information, @ianlancetaylor

Loading

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Jul 13, 2021

The fact that you get the same stack trace doesn't mean that you are seeing the same problem. The parent and child cooperate to report status. By using different code for the child and parent, you are reaching a case where the child succeeds but the parent think that is has failed. The parent then waits for the child process to complete. If the child process had in fact failed, that would be fine. But in your case the child process has not failed, so the parent waits forever.

I don't know what the real problem is, but as far as I can tell your example is showing a different problem that happens to have the same stack trace. We need to fix the real problem, not the problem that you are describing.

Loading

@Ace-Tang
Copy link
Author

@Ace-Tang Ace-Tang commented Jul 15, 2021

Yes, you are correct, I am not describe this problem clearly, I just found this maybe got a error after child process create successfully, as child need read stdin to finish, so both child and parent process hang. But I can't found any other information, any idea for debug this, @ianlancetaylor ?

Loading

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Jul 15, 2021

It's hard to know what to suggest without seeing the real code. Typically approaches would be for both the parent and the child to print out before and after they make an I/O call, so that you can see where the hang occurs. Or sometimes it helps to run the program under strace -f to see that last communication that happens before the hang.

Loading

@Ace-Tang
Copy link
Author

@Ace-Tang Ace-Tang commented Jul 16, 2021

The big problem is unable to reproduce offline, online environment I can only see the hang case. from the stack, I think the most probable error is readline(https://github.com/golang/go/blob/master/src/syscall/exec_unix.go#L222), but dlv can not read the err value.

Loading

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Jul 18, 2021

Why do you think that the problem is in readlen?

What is the real problem? The stack traces above show that the parent process is hanging in Wait while the child process is hanging in Read. That doesn't look like a problem in readlen. It looks like a deadlock between the parent and the child, the kind of thing that would happen if the parent never closed the pipe being used to send data to the child. I can't help but notice that in your sample code the parent never closes w, though I know that that is not your real code.

Loading

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
4 participants