Go 1.13.5: running a subprocess sometimes hangs #39249

Open
yutongprogram opened this issue May 26, 2020 · 5 comments


yutongprogram commented May 26, 2020

What version of Go are you using (go version)?

1.13.5

What did you do?

cmd := exec.CommandContext(ctx, "bash", "-c", "xxx")
cmd.Run() sometimes hangs.

What did you expect to see?


gdb on the parent process shows all threads in futex at /usr/local/src/runtime/sys_linux_amd64.s:536 (screenshot).

strace of the parent process's threads (screenshot).

davecheney (Contributor) commented May 26, 2020

Can you please

  1. Provide a short, runnable code sample that exhibits the problem.
  2. Confirm that this issue is still present with the latest version of Go, 1.14.
yutongprogram (Author) commented May 26, 2020

I can't reproduce the bug reliably. I have checked syslog and the hardware and found nothing useful.
Maybe someone else has run into this problem.
Any ideas?

It's a daemon process that runs users' commands.
Demo code:

package main

import (
	"context"
	"fmt"
	"log"
	"os"
	"os/exec"
	"os/signal"
	"strings"
	"syscall"
)

func main() {
	ctx, cancel := context.WithCancel(context.TODO())

	cmdStr := strings.Join(os.Args[1:], " ") // separate the arguments with spaces

	cmd := exec.CommandContext(ctx, "bash", "-c", cmdStr)

	exitCh := make(chan int)
	go runJob(cmd, cancel, exitCh)

	var exitCode = 0
	select {
	case exitCode = <-exitCh:
		log.Println("Exit Code: ", exitCode)
	}

	// If pid is negative, but not -1, sig shall be sent to all processes (excluding an unspecified set of system processes)
	// whose process group ID is equal to the absolute value of pid, and for which the process has permission to send a signal.
	err := syscall.Kill(-cmd.Process.Pid, syscall.SIGKILL)
	if err != nil {
		fmt.Println("Warning sending SIGKILL to customer processes group, err: ", err.Error())
	}

	os.Exit(exitCode)
}

func runJob(cmd *exec.Cmd, cancel context.CancelFunc, exitCh chan<- int) {
	defer func() {
		if r := recover(); r != nil {
			// Recovered from panic. This is just a safe net.
			log.Println("Panic when running job! ", r)
		}
		cancel()
	}()

	cmd.SysProcAttr = &syscall.SysProcAttr{Setpgid: true}
	cmd.Stderr = os.Stderr
	cmd.Stdout = os.Stdout

	go signalDeamon(cmd)

	if err := cmd.Run(); err != nil {
		log.Println("Run cmd error:", err.Error())
	}

	success := cmd.ProcessState != nil && cmd.ProcessState.Success()
	defer func(ok bool) {
		var exitCode = 0
		w := cmd.ProcessState.Sys().(syscall.WaitStatus)
		if w.Exited() {
			exitCode = w.ExitStatus()
		} else if w.Signaled() {
			exitCode = int(w.Signal())
		} else {
			exitCode = 1
		}

		exitCh <- exitCode
	}(success)

}

func signalDeamon(cmd *exec.Cmd) {
	c := make(chan os.Signal, 1)
	signal.Notify(c, syscall.SIGHUP, syscall.SIGINT, syscall.SIGTERM)
	s := <-c

	_ = cmd.Process.Signal(s)
}

davecheney (Contributor) commented May 26, 2020

Thank you for providing a sample. There are a few changes that could probably cut this reproducer down to the core issue.

Can you do away with the signalDeamon code? If you don't want the parent process to act on those signals, you can do something like this:

c := make(chan os.Signal, 1)
signal.Notify(c, syscall.SIGHUP, syscall.SIGINT, syscall.SIGTERM)
go func() { <-c }() // receive and discard the signal instead of forwarding it

The child will still respond to those signals and w.Signaled() will report true.

davecheney (Contributor) commented May 26, 2020

Also, consider returning the exit code directly from runJob.

	go runJob(cmd, cancel, exitCh)

	var exitCode = 0
	select {
	case exitCode = <-exitCh:
		log.Println("Exit Code: ", exitCode)
	}

Is equivalent to

exitCode := runJob(cmd, cancel)

This holds because the main goroutine cannot move past the select block until something is sent on exitCh.

yutongprogram (Author) commented May 26, 2020

I'll try, thanks. If anyone has an idea, please @yutongprogram. Thank you!
