Skip to content

testing: completed parallel subtests leak goroutines until earlier subtests finish #45127

@bcmills

Description

@bcmills

Looking at the goroutine dump in https://build.golang.org/log/a70cdaaff0171cc2515eac00f3e909829c7309c2, there is a huge amount of useless noise from blocked goroutines.

Many of the goroutines are blocked on t.Parallel(), for which I may file a separate issue. However, many of them are blocked — for many minutes at a time! — waiting to notify the parent that the subtest is complete.

~279 copies in the linked test output:

goroutine 312 [chan send, 8 minutes]:
testing.tRunner.func1.1(0xc0008edd7f, 0x0, 0x0, 0xc000655980, 0x1)
	/var/folders/kh/5zzynz152r94t18yzstnrwx80000gn/T/workdir-host-darwin-10_15/go/src/testing/testing.go:1166 +0x90
testing.tRunner.func1(0xc000655980)
	/var/folders/kh/5zzynz152r94t18yzstnrwx80000gn/T/workdir-host-darwin-10_15/go/src/testing/testing.go:1227 +0x62b
testing.tRunner(0xc000655980, 0xc0006249c0)
	/var/folders/kh/5zzynz152r94t18yzstnrwx80000gn/T/workdir-host-darwin-10_15/go/src/testing/testing.go:1240 +0x256
created by testing.(*T).Run
	/var/folders/kh/5zzynz152r94t18yzstnrwx80000gn/T/workdir-host-darwin-10_15/go/src/testing/testing.go:1281 +0x78c

~123 copies:

goroutine 412 [chan send, 8 minutes]:
testing.tRunner.func1.1(0xc0009b779f, 0x0, 0x0, 0xc00073f980, 0x1)
	/var/folders/kh/5zzynz152r94t18yzstnrwx80000gn/T/workdir-host-darwin-10_15/go/src/testing/testing.go:1166 +0x90
testing.tRunner.func1(0xc00073f980)
	/var/folders/kh/5zzynz152r94t18yzstnrwx80000gn/T/workdir-host-darwin-10_15/go/src/testing/testing.go:1227 +0x62b
runtime.Goexit()
	/var/folders/kh/5zzynz152r94t18yzstnrwx80000gn/T/workdir-host-darwin-10_15/go/src/runtime/panic.go:633 +0x1da
testing.(*common).SkipNow(0xc00073f980)
	/var/folders/kh/5zzynz152r94t18yzstnrwx80000gn/T/workdir-host-darwin-10_15/go/src/testing/testing.go:843 +0x65
testing.(*common).Skip(0xc00073f980, 0x0, 0x0, 0x0)
	/var/folders/kh/5zzynz152r94t18yzstnrwx80000gn/T/workdir-host-darwin-10_15/go/src/testing/testing.go:823 +0x8a
cmd/go_test.(*testScript).cmdSkip(0xc000317cc0, 0x0, 0x0, 0x0, 0x0, 0x0)
	/var/folders/kh/5zzynz152r94t18yzstnrwx80000gn/T/workdir-host-darwin-10_15/go/src/cmd/go/script_test.go:849 +0x21e
cmd/go_test.(*testScript).run(0xc000317cc0)
	/var/folders/kh/5zzynz152r94t18yzstnrwx80000gn/T/workdir-host-darwin-10_15/go/src/cmd/go/script_test.go:393 +0xb22
cmd/go_test.TestScript.func1(0xc00073f980)
	/var/folders/kh/5zzynz152r94t18yzstnrwx80000gn/T/workdir-host-darwin-10_15/go/src/cmd/go/script_test.go:93 +0x3fc
testing.tRunner(0xc00073f980, 0xc00018cb40)
	/var/folders/kh/5zzynz152r94t18yzstnrwx80000gn/T/workdir-host-darwin-10_15/go/src/testing/testing.go:1236 +0x22e
created by testing.(*T).Run
	/var/folders/kh/5zzynz152r94t18yzstnrwx80000gn/T/workdir-host-darwin-10_15/go/src/testing/testing.go:1281 +0x78c

The blocked channel send for both of those traces is here:

go/src/testing/testing.go

Lines 1163 to 1166 in 9289c12

// Only report that the test is complete if it doesn't panic,
// as otherwise the test binary can exit before the panic is
// reported to the user. See issue 41479.
t.signal <- signal

The purpose of the channel seems to be to keep the parent test blocked until the subtest either finishes or calls t.Parallel:

t.signal <- true // Release calling test.

If the subtest calls t.Parallel, the subtest is added to the sub field for the parent test:

go/src/testing/testing.go

Lines 1045 to 1046 in 3979fb9

// Add to the list of tests to be released by the parent.
t.parent.sub = append(t.parent.sub, t)

The parent later waits for the parallel subtests to complete:

go/src/testing/testing.go

Lines 1158 to 1161 in 3979fb9

// Wait for subtests to complete.
for _, sub := range t.sub {
<-sub.signal
}


It is not at all obvious to me why the t.signal channel is unbuffered. The sender does not appear to care whether the parent has actually received the signal, and does not appear to do any additional work after the send that would require synchronizing on it.

For tests with a lot of parallel subtests, such as cmd/go.TestScript, the amount of noise introduced by this (temporary) leak can be quite significant.

Metadata

Metadata

Assignees

No one assigned

    Labels

    FrozenDueToAgeNeedsInvestigationSomeone must examine and confirm this is a valid issue and not a duplicate of an existing one.

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions