Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/go: using go test, goroutine stack dump sometimes not emitted on SIGQUIT #27189

Open
mark-rushakoff opened this Issue Aug 24, 2018 · 6 comments

Comments

Projects
None yet
4 participants
@mark-rushakoff
Copy link
Contributor

mark-rushakoff commented Aug 24, 2018

What version of Go are you using (go version)?

Tried go version go1.10.3 darwin/amd64, go version go1.11rc1 darwin/amd64, and go version devel +6e76aeba0b Thu Aug 23 20:27:47 2018 +0000 darwin/amd64.

What did you do?

I had a test that appeared to be stuck in a blocking operation, and I pressed ^\ to send SIGQUIT, but I didn't receive any output. Then when I ran the test again using go test -v, sending SIGQUIT produced the goroutine stacks I expected to see.

Here's a sample test I'm using in the following output:

package main

import (
	"testing"
	"time"
)

func TestSleep(t *testing.T) {
	time.Sleep(time.Minute)
}

Running just go test . doesn't produce output on SIGQUIT, but I do see output with go test -v ..

$ /usr/local/Cellar/go/1.10.3/bin/go test .
^\
$ /usr/local/Cellar/go/1.10.3/bin/go test -v .
=== RUN   TestSleep
^\SIGQUIT: quit
PC=0x105508b m=0 sigcode=0

goroutine 0 [idle]:
...
FAIL	_/tmp/b	2.084s

$ gotip test .
^\
$  gotip test -v .
=== RUN   TestSleep
^\SIGQUIT: quit
PC=0x7fff706a6a1e m=0 sigcode=0

goroutine 0 [idle]:
...
FAIL	_/tmp/b	1.859s

If I change the file to include a test that calls t.Log, then I seem to be able to get a stack dump with tip, but not with 1.10 or 1.11rc1:

$ cat sleep_test.go
package main

import (
	"testing"
	"time"
)

func TestPass(t *testing.T) {
	t.Logf("ok")
}

func TestSleep(t *testing.T) {
	time.Sleep(time.Minute)
}

$ /usr/local/Cellar/go/1.10.3/bin/go test  .
^\
$ go test  .
^\
$ gotip test  .
^\SIGQUIT: quit
PC=0x7fff706a6a1e m=0 sigcode=0

goroutine 0 [idle]:
...
FAIL	_/tmp/b	6.311s

If I comment out the t.Logf call, but keep TestPass, then gotip no longer produces a stack dump on SIGQUIT.

@mark-rushakoff

This comment has been minimized.

Copy link
Contributor Author

mark-rushakoff commented Aug 24, 2018

If I use go test -c to compile a test binary, than I seem to always get a stack dump on SIGQUIT, for 1.10, 1.11rc1, and tip.

@ianlancetaylor ianlancetaylor added this to the Go1.12 milestone Aug 24, 2018

@mark-rushakoff

This comment has been minimized.

Copy link
Contributor Author

mark-rushakoff commented Sep 4, 2018

I don't fully understand what's going on here, but it seems related to the call to base.StartSigHandlers in cmd/go/internal/test/test.go:

base.StartSigHandlers()

which seems to swallow SIGQUIT:

// Interrupted is closed when the go command receives an interrupt signal.
var Interrupted = make(chan struct{})
// processSignals setups signal handler.
func processSignals() {
sig := make(chan os.Signal)
signal.Notify(sig, signalsToIgnore...)
go func() {
<-sig
close(Interrupted)
}()
}
var onceProcessSignals sync.Once
// StartSigHandlers starts the signal handlers.
func StartSigHandlers() {
onceProcessSignals.Do(processSignals)
}

var signalsToIgnore = []os.Signal{os.Interrupt, syscall.SIGQUIT}

@davecheney

This comment has been minimized.

Copy link
Contributor

davecheney commented Sep 5, 2018

@bcmills

This comment has been minimized.

Copy link
Member

bcmills commented Nov 15, 2018

If I change the file to include a test that calls t.Log, then I seem to be able to get a stack dump with tip, but not with 1.10 or 1.11rc1

Possibly related to #24929?

@bcmills bcmills modified the milestones: Go1.12, Go1.13 Nov 15, 2018

@bcmills bcmills added the GoCommand label Nov 15, 2018

@mark-rushakoff

This comment has been minimized.

Copy link
Contributor Author

mark-rushakoff commented Apr 17, 2019

I played around with this a little more last night. My current repro case is a little simpler than the top post. Here's my test file:

package a_test

import (
	"testing"
	"time"
)

func TestSleep(t *testing.T) {
	time.Sleep(20 * time.Second)
}

And then in bash I run this one-liner rather than manually sending SIGQUIT: go test . & sleep 2 && kill -QUIT % && wait. Note the single & to run go test in the background before sleeping for a second and then sending SIGQUIT to the go test process.

With either go version go1.12.3 darwin/amd64 or go version devel +5781df421e Tue Apr 16 20:27:15 2019 +0000 darwin/amd64, I get varying behavior where sometimes I get the full stack trace:

bash-3.2$ gotip test . & sleep 2 && kill -QUIT % && wait
[1] 45042
SIGQUIT: quit
PC=0x7fff5dc44a1e m=0 sigcode=0

goroutine 0 [idle]:
runtime.pthread_cond_wait(0x1241188, 0x1241148, 0x0)
<snip>
gs     0x0
FAIL	a/test	1.614s
[1]+  Exit 1                  /Users/mr/gotip/src/github.com/golang/go/bin/go test .

And other times it just silently quits:

gotip test . & sleep 2 && kill -QUIT % && wait
[1] 44939
[1]+  Exit 1                  /Users/mr/gotip/src/github.com/golang/go/bin/go test .

There doesn't seem to be any obvious reason why it sometimes prints a stack trace and sometimes doesn't. For a test that has minimal dependencies, sleep 2 seems to be plenty of time to actually be running the test executable, rather than during the background build.

The silent failure is particularly annoying when running a test suite that normally finishes quickly, seeing 10 seconds later that unexpectedly it hasn't finished and CPUs are idle, and then manually sending SIGQUIT only to have it silently exit without dumping the stack.

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

ianlancetaylor commented Apr 17, 2019

When you press ^\ or kill -QUIT the job SIGQUIT is sent to all processes, in this case both the cmd/go process and the test binary. The go tool installs a SIGQUIT handler that causes it to simply quit with an exit status of 1. Running go test builds the test binary and then runs it as a child, handling all its output so that the output can be cached. So I would guess that what is happening is that sometimes the subjob handling the test output runs first, and the output is sent to the terminal. Or, sometimes the SIGQUIT is seen first, and cmd/go exits without letting the subjob complete.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.