-
Notifications
You must be signed in to change notification settings - Fork 17.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
runtime: race condition raised with parallel tests, panic(nil) and -race #64649
Comments
$ go test -race -run 'Test(One|Two)' "-gcflags=all=-trimpath=$(go env GOPATH)" -count=1 -parallel=1
PASS
ok github.com/dolmen-go/rendezvous 1.271s
$ go test -race -run 'Test(One|Two)' "-gcflags=all=-trimpath=$(go env GOPATH)" -count=1 -parallel=2
==================
WARNING: DATA RACE
Read at 0x00c000076500 by goroutine 14:
internal/godebug.(*Setting).IncNonDefault()
/opt/homebrew/Cellar/go/1.21.5/libexec/src/internal/godebug/godebug.go:102 +0x58
internal/godebug.(*Setting).IncNonDefault-fm()
<autogenerated>:1 +0x34
runtime.(*godebugInc).IncNonDefault()
/opt/homebrew/Cellar/go/1.21.5/libexec/src/runtime/runtime.go:111 +0xd0
github.com/dolmen-go/rendezvous.WaitAll.func1()
src/github.com/dolmen-go/rendezvous/rendezvous.go:56 +0xec
github.com/dolmen-go/rendezvous.WaitAll.func2()
src/github.com/dolmen-go/rendezvous/rendezvous.go:57 +0x44
Previous write at 0x00c000076500 by goroutine 12:
internal/godebug.New()
/opt/homebrew/Cellar/go/1.21.5/libexec/src/internal/godebug/godebug.go:74 +0x40
internal/godebug.newIncNonDefault()
/opt/homebrew/Cellar/go/1.21.5/libexec/src/internal/godebug/godebug.go:203 +0x28
runtime.(*godebugInc).IncNonDefault()
/opt/homebrew/Cellar/go/1.21.5/libexec/src/runtime/runtime.go:108 +0x84
github.com/dolmen-go/rendezvous.WaitAll.func1()
src/github.com/dolmen-go/rendezvous/rendezvous.go:56 +0xec
github.com/dolmen-go/rendezvous.WaitAll.func2()
src/github.com/dolmen-go/rendezvous/rendezvous.go:57 +0x44
Goroutine 14 (running) created at:
github.com/dolmen-go/rendezvous.WaitAll()
src/github.com/dolmen-go/rendezvous/rendezvous.go:47 +0x210
github.com/dolmen-go/rendezvous_test.TestTwo()
src/github.com/dolmen-go/rendezvous/rendezvous_test.go:118 +0xe4
testing.tRunner()
/opt/homebrew/Cellar/go/1.21.5/libexec/src/testing/testing.go:1595 +0x1b0
testing.(*T).Run.func1()
/opt/homebrew/Cellar/go/1.21.5/libexec/src/testing/testing.go:1648 +0x40
Goroutine 12 (running) created at:
github.com/dolmen-go/rendezvous.WaitAll()
src/github.com/dolmen-go/rendezvous/rendezvous.go:47 +0x210
github.com/dolmen-go/rendezvous_test.TestOne()
src/github.com/dolmen-go/rendezvous/rendezvous_test.go:104 +0x180
testing.tRunner()
/opt/homebrew/Cellar/go/1.21.5/libexec/src/testing/testing.go:1595 +0x1b0
testing.(*T).Run.func1()
/opt/homebrew/Cellar/go/1.21.5/libexec/src/testing/testing.go:1648 +0x40
==================
--- FAIL: TestOne (0.00s)
testing.go:1465: race detected during execution of test
--- FAIL: TestTwo (0.00s)
testing.go:1465: race detected during execution of test
FAIL
exit status 1
FAIL github.com/dolmen-go/rendezvous 0.314s |
cc @golang/runtime |
Read is here: https://cs.opensource.google/go/go/+/master:src/internal/godebug/godebug.go;l=102;drc=39effbc105f5c54117a6011af3c48e3c8f14eca9 Write is here: https://cs.opensource.google/go/go/+/master:src/internal/godebug/godebug.go;l=74;drc=39effbc105f5c54117a6011af3c48e3c8f14eca9 The write is the initialization of the struct. It must happen before the read. Indeed, I believe the atomic store at https://cs.opensource.google/go/go/+/release-branch.go1.21:src/runtime/runtime.go;l=109;drc=1a9893a969e0a73dc4f1e48ed40ccaf29ec238a6 is publishing the value. But this is a runtime/internal/atomic value, which doesn't include race annotations. So I think we need to manually annotate that path. |
@prattmic I was looking at this the other day and I also think we should instrument it manually. So something like this would suffice? I'm wondering how to come up with a test for this 😅
|
Has anyone analyzed why this happens with parallel testing? What Go debug events metric is incremented in that case? Is there a workaround besides:
All those solutions are unfortunately about avoiding parallel testing. |
As package rendezvous_test
import (
"fmt"
"os"
"runtime/metrics"
"strings"
"testing"
)
func TestMain(m *testing.M) {
// call flag.Parse() here if TestMain uses flags
exitCode := m.Run()
descs := metrics.All()
var godebugMetrics []metrics.Sample
for i := range descs {
if strings.HasPrefix(descs[i].Name, "/godebug/") && strings.HasSuffix(descs[i].Name, ":events") {
godebugMetrics = append(godebugMetrics, metrics.Sample{Name: descs[i].Name})
}
}
metrics.Read(godebugMetrics)
for i := range godebugMetrics {
if godebugMetrics[i].Value.Uint64() > 0 {
fmt.Printf("%s: %v\n", godebugMetrics[i].Name, godebugMetrics[i].Value.Uint64())
}
}
os.Exit(exitCode)
} The result: $ go test -run TestOne
PASS
/godebug/non-default-behavior/panicnil:events: 1
ok github.com/dolmen-go/rendezvous 0.151s I have in fact a test that runs |
To investigate golang/go#64649 (race detector reports a race condition in package runtime related to the increment of a runtime metric), I'm adding a TestMain that dumps runtime/metrics at the end of the tests. Result: $ go test -run TestOne PASS /godebug/non-default-behavior/panicnil:events: 1 ok github.com/dolmen-go/rendezvous 0.151s
My final workaround is to drop the |
I'm having no luck with a small reproducer, this is on Darwin amd64, 1.21.5, 1.21.3, also tip. Here's my current attempt, I'm about to try this on darwin/arm64. To get an event count I have to specify 1.20 in go.mod, or
|
Here is my small reproducer: package main_test
import "testing"
func Test(t *testing.T) {
test := func(t *testing.T) {
t.Parallel()
defer func() {
recover()
}()
panic(nil)
}
t.Run("One", test)
t.Run("Two", test)
} $ GODEBUG=panicnil=1 go test -run 'Test$' -race -parallel=2 |
@dr2chase Your Also I have fixed a few issues on my metrics dump. See https://github.com/dolmen-go/rendezvous/blob/eab2b2b76f826e0daca0703147018d02a3641cbd/main_test.go#L15 |
Change https://go.dev/cl/549796 mentions this issue: |
@gopherbot please consider this for backport to 1.21. It broke someone's integration tests, and since |
Backport issue(s) opened: #64756 (for 1.20), #64757 (for 1.21). Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://go.dev/wiki/MinorReleases. |
Change https://go.dev/cl/551856 mentions this issue: |
CL 549796 adds race annotations to godebugInc. It uses racerelease to model a CompareAndSwap. However, a CompareAndSwap is essentially a load and a store. Modeling it as just racerelease makes it not synchronized with other racerelease, i.e. other CAS. For the following execution thread A B load, got nil load, got nil set *inc set *inc racerelease CAS success racerelease CAS fail load raceacquire use *inc (from A) On thread B, the raceacquire synchronizes with the previous racerelease, which is not synchronized with racerelease on thread A, so it doesn't know that the use of *inc on thread B is after the set on thread A, and will report a race. Change it to use racereleasemerge, which synchronizes with previous racerelease and racereleasemerge. So in the case above it knows thread B's CAS is after thread A's. Also remove stale comment that was more relevant when the code used atomic store, where CL 549796 changed to CAS. Updates #64649. Change-Id: I17671090a19c0699fcb4e6481e2abd98ef2e5542 Reviewed-on: https://go-review.googlesource.com/c/go/+/551856 Reviewed-by: Mauri de Souza Meneguzzo <mauri870@gmail.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com>
The fix has been released with Go 1.21.6 (see #64757). |
Also use CompareAndSwap to make the code actually less racy. Added a test which will be meaningful when run under the race detector (tested it -race with broken fix in runtime, it failed). Fixes golang#64649 Change-Id: I5972e08901d1adc8ba74858edad7eba91be1b0ce Reviewed-on: https://go-review.googlesource.com/c/go/+/549796 Run-TryBot: David Chase <drchase@google.com> Reviewed-by: Mauri de Souza Meneguzzo <mauri870@gmail.com> Reviewed-by: Cherry Mui <cherryyz@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>
CL 549796 adds race annotations to godebugInc. It uses racerelease to model a CompareAndSwap. However, a CompareAndSwap is essentially a load and a store. Modeling it as just racerelease makes it not synchronized with other racerelease, i.e. other CAS. For the following execution thread A B load, got nil load, got nil set *inc set *inc racerelease CAS success racerelease CAS fail load raceacquire use *inc (from A) On thread B, the raceacquire synchronizes with the previous racerelease, which is not synchronized with racerelease on thread A, so it doesn't know that the use of *inc on thread B is after the set on thread A, and will report a race. Change it to use racereleasemerge, which synchronizes with previous racerelease and racereleasemerge. So in the case above it knows thread B's CAS is after thread A's. Also remove stale comment that was more relevant when the code used atomic store, where CL 549796 changed to CAS. Updates golang#64649. Change-Id: I17671090a19c0699fcb4e6481e2abd98ef2e5542 Reviewed-on: https://go-review.googlesource.com/c/go/+/551856 Reviewed-by: Mauri de Souza Meneguzzo <mauri870@gmail.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com>
Go version
go version go1.21.5 darwin/arm64
What operating system and processor architecture are you using (
go env
)?What did you do?
I have a testsuite for my github.com/dolmen-go/rendezvous package.
Some characteristics that might be relevant:
rendezvous
package launches goroutines in its tests.rendezvous
testsuite uses parallel testing at top testing level:This issue is also visible on GitHub Actions: https://github.com/dolmen-go/rendezvous/actions/runs/7147035848/job/19465833264#step:4:32
What did you expect to see?
Test success like with previous Go versions.
What did you see instead?
Race condition in
runtime
:Link to runtime source: https://github.com/golang/go/blob/go1.21.5/src/runtime/runtime.go#L97
A comment says:
If this is related, I think this is "big deal" as running
go test -race
in CI is quite common. (Callingt.Parallel()
from top level tests is not so common, this is unfortunate, and this issue blocks from pushing for it)The text was updated successfully, but these errors were encountered: