Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

testing: completed parallel subtests leak goroutines until earlier subtests finish #45127

bcmills opened this issue Mar 19, 2021 · 2 comments
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.


Copy link

bcmills commented Mar 19, 2021

Looking at the goroutine dump in, there is a huge amount of useless noise from blocked goroutines.

Many of the goroutines are blocked on t.Parallel(), for which I may file a separate issue. However, many of them are blocked — for many minutes at a time! — waiting to notify the parent that the subtest is complete.

~279 copies in the linked test output:

goroutine 312 [chan send, 8 minutes]:
testing.tRunner.func1.1(0xc0008edd7f, 0x0, 0x0, 0xc000655980, 0x1)
	/var/folders/kh/5zzynz152r94t18yzstnrwx80000gn/T/workdir-host-darwin-10_15/go/src/testing/testing.go:1166 +0x90
	/var/folders/kh/5zzynz152r94t18yzstnrwx80000gn/T/workdir-host-darwin-10_15/go/src/testing/testing.go:1227 +0x62b
testing.tRunner(0xc000655980, 0xc0006249c0)
	/var/folders/kh/5zzynz152r94t18yzstnrwx80000gn/T/workdir-host-darwin-10_15/go/src/testing/testing.go:1240 +0x256
created by testing.(*T).Run
	/var/folders/kh/5zzynz152r94t18yzstnrwx80000gn/T/workdir-host-darwin-10_15/go/src/testing/testing.go:1281 +0x78c

~123 copies:

goroutine 412 [chan send, 8 minutes]:
testing.tRunner.func1.1(0xc0009b779f, 0x0, 0x0, 0xc00073f980, 0x1)
	/var/folders/kh/5zzynz152r94t18yzstnrwx80000gn/T/workdir-host-darwin-10_15/go/src/testing/testing.go:1166 +0x90
	/var/folders/kh/5zzynz152r94t18yzstnrwx80000gn/T/workdir-host-darwin-10_15/go/src/testing/testing.go:1227 +0x62b
	/var/folders/kh/5zzynz152r94t18yzstnrwx80000gn/T/workdir-host-darwin-10_15/go/src/runtime/panic.go:633 +0x1da
	/var/folders/kh/5zzynz152r94t18yzstnrwx80000gn/T/workdir-host-darwin-10_15/go/src/testing/testing.go:843 +0x65
testing.(*common).Skip(0xc00073f980, 0x0, 0x0, 0x0)
	/var/folders/kh/5zzynz152r94t18yzstnrwx80000gn/T/workdir-host-darwin-10_15/go/src/testing/testing.go:823 +0x8a
cmd/go_test.(*testScript).cmdSkip(0xc000317cc0, 0x0, 0x0, 0x0, 0x0, 0x0)
	/var/folders/kh/5zzynz152r94t18yzstnrwx80000gn/T/workdir-host-darwin-10_15/go/src/cmd/go/script_test.go:849 +0x21e
	/var/folders/kh/5zzynz152r94t18yzstnrwx80000gn/T/workdir-host-darwin-10_15/go/src/cmd/go/script_test.go:393 +0xb22
	/var/folders/kh/5zzynz152r94t18yzstnrwx80000gn/T/workdir-host-darwin-10_15/go/src/cmd/go/script_test.go:93 +0x3fc
testing.tRunner(0xc00073f980, 0xc00018cb40)
	/var/folders/kh/5zzynz152r94t18yzstnrwx80000gn/T/workdir-host-darwin-10_15/go/src/testing/testing.go:1236 +0x22e
created by testing.(*T).Run
	/var/folders/kh/5zzynz152r94t18yzstnrwx80000gn/T/workdir-host-darwin-10_15/go/src/testing/testing.go:1281 +0x78c

The blocked channel send for both of those traces is here:


Lines 1163 to 1166 in 9289c12

// Only report that the test is complete if it doesn't panic,
// as otherwise the test binary can exit before the panic is
// reported to the user. See issue 41479.
t.signal <- signal

The purpose of the channel seems to be to keep the parent test blocked until the subtest either finishes or calls t.Parallel:

t.signal <- true // Release calling test.

If the subtest calls t.Parallel, the subtest is added to the sub field for the parent test:


Lines 1045 to 1046 in 3979fb9

// Add to the list of tests to be released by the parent.
t.parent.sub = append(t.parent.sub, t)

The parent later waits for the parallel subtests to complete:


Lines 1158 to 1161 in 3979fb9

// Wait for subtests to complete.
for _, sub := range t.sub {

It is not at all obvious to me why the t.signal channel is unbuffered. The sender does not appear to care whether the parent has actually received the signal, and does not appear to do any additional work after the send that would require synchronizing on it.

For tests with a lot of parallel subtests, such as cmd/go.TestScript, the amount of noise introduced by this (temporary) leak can be quite significant.

@bcmills bcmills added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Mar 19, 2021
@bcmills bcmills added this to the Backlog milestone Mar 19, 2021
Copy link
Member Author

bcmills commented Mar 19, 2021 illustrates the pollution from these goroutines. The leaky subtests are reported as passing pretty much immediately, but nonetheless appear in the goroutine dump when the slow subtest finally panics.

Copy link

gopherbot commented Mar 19, 2021

Change mentions this issue: testing: allow parallel-subtest goroutines to exit when the subtest is complete

@bcmills bcmills self-assigned this Mar 19, 2021
@bcmills bcmills modified the milestones: Backlog, Go1.17 Mar 19, 2021
@golang golang locked and limited conversation to collaborators Mar 19, 2022
@rsc rsc unassigned bcmills Jun 23, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
None yet

No branches or pull requests

2 participants