Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

testing: when using a custom TestMain, m.Run does not return if one of the tests it runs panics #37206

Open
orlangure opened this issue Feb 13, 2020 · 22 comments
Milestone

Comments

@orlangure
Copy link

@orlangure orlangure commented Feb 13, 2020

What version of Go are you using (go version)?

$ go version
go version go1.13.7 darwin/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/yury/Library/Caches/go-build"
GOENV="/Users/yury/Library/Application Support/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/yury/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/darwin_amd64"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/Users/yury/go/src/github.com/orlangure/myproject/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/p1/rjgq2gp55pj58ckbsn94yfz80000gn/T/go-build564127360=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

I wrote a function, a test for it, and TestMain to perform setup and teardown for testing this function:

main.go

package main

import "fmt"

func main() {
	fmt.Println("vim-go")
}

func p() {
	panic("foo")
}

p_test.go

package main

import "testing"

func TestP(t *testing.T) {
	p()
}

main_test.go

package main

import (
	"fmt"
	"os"
	"testing"
)

func TestMain(m *testing.M) {
	os.Exit(testMain(m))
}

func testMain(m *testing.M) int {
	setup()
	defer teardown()

	return m.Run()
}

func setup() {
	fmt.Println("setting up")
}

func teardown() {
	fmt.Println("tearing down")
}

What did you expect to see?

I expected to see "setting up" and "tearing down" at some point, and a panic "foo" message with a stack trace.

What did you see instead?

Only setup and panic output appeared:

setting up
--- FAIL: TestP (0.00s)
panic: foo [recovered]
        panic: foo

goroutine 19 [running]:
testing.tRunner.func1(0xc0000b6100)
        /usr/local/go/src/testing/testing.go:874 +0x3a3
panic(0x1111220, 0x116b4d0)
        /usr/local/go/src/runtime/panic.go:679 +0x1b2
akeyless.io/akeyless-main-repo/go/src/t.p(...)
...
FAIL

The program probably called os.Exit at some point while running the tests, or panicked in a separate go routine with no way for me to recover. I would expect both setup and teardown functions to be called at some point.

@dmitshur

This comment has been minimized.

Copy link
Member

@dmitshur dmitshur commented Feb 15, 2020

I've verified m.Run indeed doesn't return if one of the tests it runs causes a panic by doing:

func testMain(m *testing.M) int {
        defer func() {
                if e := recover(); e != nil {
                        fmt.Println("there was a panic")
                }
        }()
        setup()
        defer teardown()
        return m.Run()
}

The string "there was a panic" doesn't get printed.

It does get printed if there is a panic before m.Run():

func testMain(m *testing.M) int {
	defer func() {
		if e := recover(); e != nil {
			fmt.Println("there was a panic")
		}
	}()
	setup()
	defer teardown()

	panic("about to m.Run")
	return m.Run()
}

The code in testing that orchestrates everything and handles all the edge case is quite complex and subtle. I'm not sure if it's feasible to make m.Run return an exit code in thus case. If it is possible, then I'm not sure if it's desirable to do so. It may be a good idea to document m.Run that it doesn't return if a test panics, but maybe we don't want to commit to that being a part of the testing API that we can't change.

This needs further investigation.

/cc @mpvl @josharian per owners.

@dmitshur dmitshur added this to the Backlog milestone Feb 15, 2020
@dmitshur dmitshur changed the title os.Exit is called at some point while running TestMain when a test panics testing: when using a custom TestMain, m.Run does not return if one of the tests it runs panics Feb 15, 2020
@bcmills

This comment has been minimized.

Copy link
Member

@bcmills bcmills commented Feb 18, 2020

#34129 is somewhat relevant, although the implementation for that issue probably will not address this one.

(CC @changkun @abuchanan-nr)

@gopherbot

This comment has been minimized.

Copy link

@gopherbot gopherbot commented Feb 19, 2020

Change https://golang.org/cl/219977 mentions this issue: testing: allow m.Run return if a test panics

@changkun

This comment has been minimized.

Copy link
Contributor

@changkun changkun commented Feb 19, 2020

@bcmills Thanks for CC me. I basically agree with @dmitshur . The Go team must decide either document the behavior or allow testing.M.Run to return properly if there is a test panic.

For the following code snippet:

package main

import (
	"fmt"
	"testing"
)

func TestMain(m *testing.M) {
	setup()
	defer teardown()
	m.Run()
}
func TestP(t *testing.T) {
	panic("foo")
}
func setup() {
	fmt.Println("setup()")
}
func teardown() {
	fmt.Println("teardown()")
}

outputs:

setup()
--- FAIL: TestP (0.00s)
panic: foo [recovered]
        panic: foo

goroutine 19 [running]:
testing.tRunner.func1(0xc0000b8100)
        /usr/local/Cellar/go/1.13.8/libexec/src/testing/testing.go:874 +0x3a3
panic(0x11111a0, 0x116b390)
        /usr/local/Cellar/go/1.13.8/libexec/src/runtime/panic.go:679 +0x1b2
_/Users/changkun/Desktop/testing.TestP(0xc0000b8100)
        /Users/changkun/Desktop/testing/main_test.go:14 +0x39
testing.tRunner(0xc0000b8100, 0x114f9b8)
        /usr/local/Cellar/go/1.13.8/libexec/src/testing/testing.go:909 +0xc9
created by testing.(*T).Run
        /usr/local/Cellar/go/1.13.8/libexec/src/testing/testing.go:960 +0x350
exit status 2
FAIL    _/Users/changkun/Desktop/testing        0.295s

If we decide to allow testing.M.Run to return, then it will execute teardown():

setup()
--- FAIL: TestP (0.00s)
panic: foo [recovered]
        panic: foo

goroutine 18 [running]:
testing.tRunner.func1.1(0x1110c20, 0x11690c0)
        /Users/changkun/dev/go-gerrit/src/testing/testing.go:941 +0x355
testing.tRunner.func1(0xc0000ce120)
        /Users/changkun/dev/go-gerrit/src/testing/testing.go:945 +0x427
panic(0x1110c20, 0x11690c0)
        /Users/changkun/dev/go-gerrit/src/runtime/panic.go:967 +0x15d
_/Users/changkun/Desktop/testing.TestP(0xc0000ce120)
        /Users/changkun/Desktop/testing/main_test.go:14 +0x39
testing.tRunner(0xc0000ce120, 0x114aaa8)
        /Users/changkun/dev/go-gerrit/src/testing/testing.go:997 +0xdc
created by testing.(*T).Run
        /Users/changkun/dev/go-gerrit/src/testing/testing.go:1048 +0x357
teardown()
exit status 2
FAIL    _/Users/changkun/Desktop/testing        0.163s
@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Feb 24, 2020

We can never make this perfect. If some code in a test calls

    go func() { panic("die die die") }()

then m.Run is not going to return no matter what we do.

The only question here is how we should handle a test that panics in the goroutine used to run the test. Should we try to handle that specific case? Or should we just treat it like a panic in a different goroutine, which is in effect what we do now?

I don't have a strong opinion about that. But the testing package is already fairly baroque. Is it really worth complicating it further to handle one specific case when we can't handle other similar cases?

@changkun

This comment has been minimized.

Copy link
Contributor

@changkun changkun commented Feb 24, 2020

@ianlancetaylor

No, we can't do anything about it yet.

I didn't write any test regarding user goroutine panics and left this case to the CL's reviewers, and I think this particular case can simply be ignored because this type of panic happens in a user goroutine. The testing package has done everything it could achieve (panics happens directly in tests and subtests).

If we interested in handling this particular case, more investigation could be done with subsequent CLs.

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Feb 25, 2020

I suppose I'm asking for opinions.

Is it worth complicating the testing package to return from m.Run if a panic occurs directly in the test, given that we can't do that if a test start a goroutine that panics?

@dmitshur

This comment has been minimized.

Copy link
Member

@dmitshur dmitshur commented Feb 25, 2020

Thank you for explaining the current situation and trade-offs in #37206 (comment), @ianlancetaylor.

My opinion is that we should hold off on complicating the testing package until a time when all panics can be handled (since doing it for only some goroutines isn't very worthwhile), and document the rationale for m.Run not returning in the case of a panic so we can look it up in the future. If it's viable to document it publicly without locking in the current behavior, that would be better, but otherwise it can be documented internally in the testing package.

@changkun

This comment has been minimized.

Copy link
Contributor

@changkun changkun commented Feb 25, 2020

I was answering an opinion.

This particular case can simply be ignored because this type of panic happens in a user goroutine.
If we interested in handling this particular panic case, more investigation could be done in subsequent CLs.

@bcmills

This comment has been minimized.

Copy link
Member

@bcmills bcmills commented Feb 25, 2020

If I get CL 134395 cleaned up and merged, it will be fairly straightforward for users to use an errgroup to structure their code such that all goroutines that may panic propagate that panic back to the main goroutine.

Moreover, I suspect that the vast majority of tests do not explicitly spawn goroutines, let alone goroutines that may panic. It seems reasonable for users to expect that their best-effort cleanup using defer will actually be invoked most of the time.

@bcmills

This comment has been minimized.

Copy link
Member

@bcmills bcmills commented Feb 25, 2020

At least part of the complexity of the implementation seems to arise because the runtime does not provide a mechanism for a program to re-raise an existing panic without altering its stack trace. (I've discussed that deficiency with at least @aclements before, but I don't see an issue filed for it.)

If we provided a general mechanism to re-panic without altering the stack trace, then I think the testing package would not need nearly so much complexity on top of that.

(CC @danscales)

@changkun

This comment has been minimized.

Copy link
Contributor

@changkun changkun commented Feb 25, 2020

If we provided a general mechanism to re-panic without altering the stack trace, then I think the testing package would not need nearly so much complexity on top of that.

This might be already off-topic: Was there any existed discussion on registering global handler of panics? It seems (server-side) Go users nowadays avoid to use panic/recover pair, just because they do not want their service down by accident, and there is no way to capture a panic outside a goroutine. The runtime also does not have this privilege.

@bcmills

This comment has been minimized.

Copy link
Member

@bcmills bcmills commented Feb 25, 2020

I don't think a global panic handler would be an appropriate solution, here or in servers. An uncaught panic may indicate than an important program invariant was violated, and trying to resume execution may just replace a panic with a clear backtrace with a hard-to-diagnose deadlock.

@changkun

This comment has been minimized.

Copy link
Contributor

@changkun changkun commented Feb 25, 2020

My opinion is ... until a time when all panics can be handled ...

I don't see any way of doing it, handling all panics seems to be equivalent to the status in GC and mutators. That is to say, a user goroutine may panic intentionally, the runtime should not do anything with it by default. This basically matches what I said in the beginning: "The testing package has done everything it could achieve with the CL. If we interested in handling this particular panic case, more investigation could be done in subsequent CLs."

cc @dmitshur

@changkun

This comment has been minimized.

Copy link
Contributor

@changkun changkun commented Feb 25, 2020

... may indicate than an important program invariant was violated, and trying to resume execution may just replace a panic with a clear backtrace with a hard-to-diagnose deadlock.

Good point. Agree. I just randomly threw some virgin ideas, since I didn't experience much use cases regarding capturing a panic outside a goroutine except this, not suggesting anything :)

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Feb 26, 2020

I personally am not convinced by "let's do this now and figure out how to do more later." Sometimes that is the right approach, but sometimes it's important to at least understand how we could do more later.

Right now people are confused because when they panic in a test deferred functions in TestMain are not run. If we fix that, people will be confused because when they panic in a goroutine started by a test deferred functions in TestMain are not run. Although the current state is not good, at least it's consistent.

@bcmills

This comment has been minimized.

Copy link
Member

@bcmills bcmills commented Feb 26, 2020

I don't think we do need to “figure out how to do more later”. We should structure our own code to propagate panics from non-main goroutines back to the main goroutine, and encourage users to do the same.

Especially for non-Parallel tests, the mapping of test functions to goroutines should be an implementation detail internal to the testing package. It should not matter whether the Test function runs on the same goroutine as TestMain or an entirely different goroutine, but today it does. We should fix that.

@changkun

This comment has been minimized.

Copy link
Contributor

@changkun changkun commented Feb 26, 2020

I am also agreed with you about the part " ... 'let's do this now and figure out how to do more later.' ..." is not a good decision in this case.

However, the current situation is I don't see any way of doing it better than the proposed CL because we cannot differentiate an accidental panic and an intentional panic in a spawned goroutine from the runtime unless people give a clear indication to the TestMain or equivalent. This knob makes the usage and behavior even more complex than the proposed CL.

If we carefully think about panics propagate out from a test goroutine, things become more subtle. Go users were taught to fail a test by t.Error / t.Fatal family. An intentional panic simply does not happen in a test. In this case, panic detected in a test only happens in an accidental scenario, which exactly fits what we would like to fix.

I personally argue this is the best action we can offer and there are no further thoughts from my side.

cc @ianlancetaylor

@gopherbot

This comment has been minimized.

Copy link

@gopherbot gopherbot commented Feb 28, 2020

Change https://golang.org/cl/221321 mentions this issue: regexp: convert test into *_test package

@changkun

This comment has been minimized.

Copy link
Contributor

@changkun changkun commented Mar 2, 2020

Maybe we could turn this into a proposal that can be reviewed in the proposal review meeting minutes?

cc @rsc @andybons

@progressnerd

This comment has been minimized.

Copy link

@progressnerd progressnerd commented Mar 27, 2020

It appears that the proposed solution would propagate the panic to TestMain. An alternative would be that a panic within a test instead triggers an immediate failure of that test but allows the other tests to continue, such that the panic is never propagated to TestMain but arguably never needs to be. On the downside, this may encourage test writers to deliberately call panic to fail -- but I'm not sure we're at risk of that becoming common practice. On the plus side, I think this alternative might be:

  • A bit easier to implement, less complexity in the test runner
  • More in line with the "keep going" philosophy of test failures
  • More user-friendly to not have the entire test suite rollover due to a panic in one test case
@bcmills

This comment has been minimized.

Copy link
Member

@bcmills bcmills commented Mar 27, 2020

@progressnerd, an unexpected panic potentially leaves the program in an arbitrarily corrupted state. Swamping the panic in noise from subsequent test failures — or worse, deadlocks — would make the panic harder to diagnose, in addition to delaying the output of the panic, for at best marginal additional information from the test.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
7 participants
You can’t perform that action at this time.