Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: TestRuntimeLockMetricsAndProfile failures #64253

Open
gopherbot opened this issue Nov 17, 2023 · 25 comments
Open

runtime: TestRuntimeLockMetricsAndProfile failures #64253

gopherbot opened this issue Nov 17, 2023 · 25 comments
Assignees
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. release-blocker
Milestone

Comments

@gopherbot
Copy link

#!watchflakes
default <- pkg == "runtime" && test == "TestRuntimeLockMetricsAndProfile"

Issue created automatically to collect these failures.

Example (log):

--- FAIL: TestRuntimeLockMetricsAndProfile (0.03s)
    --- FAIL: TestRuntimeLockMetricsAndProfile/runtime.lock (0.02s)
        --- FAIL: TestRuntimeLockMetricsAndProfile/runtime.lock/sample-1 (0.01s)
            metrics_test.go:1048: lock contention growth in runtime/pprof's view  (0.006333s)
            metrics_test.go:1049: lock contention growth in runtime/metrics' view (0.006331s)
            metrics_test.go:1077: stack [runtime.unlock runtime_test.TestRuntimeLockMetricsAndProfile.func5.1 runtime_test.(*contentionWorker).run] has samples totaling n=99 value=6332772
            metrics_test.go:1156: mutex profile reported contention count different from the known true count (99 != 100)

watchflakes

@gopherbot gopherbot added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Nov 17, 2023
@gopherbot
Copy link
Author

Found new dashboard test flakes for:

#!watchflakes
default <- pkg == "runtime" && test == "TestRuntimeLockMetricsAndProfile"
2023-11-17 23:04 linux-ppc64-sid-power10 go@0b31a46f runtime.TestRuntimeLockMetricsAndProfile (log)
--- FAIL: TestRuntimeLockMetricsAndProfile (0.03s)
    --- FAIL: TestRuntimeLockMetricsAndProfile/runtime.lock (0.02s)
        --- FAIL: TestRuntimeLockMetricsAndProfile/runtime.lock/sample-1 (0.01s)
            metrics_test.go:1048: lock contention growth in runtime/pprof's view  (0.006333s)
            metrics_test.go:1049: lock contention growth in runtime/metrics' view (0.006331s)
            metrics_test.go:1077: stack [runtime.unlock runtime_test.TestRuntimeLockMetricsAndProfile.func5.1 runtime_test.(*contentionWorker).run] has samples totaling n=99 value=6332772
            metrics_test.go:1156: mutex profile reported contention count different from the known true count (99 != 100)
2023-11-17 23:04 windows-386-2016 go@0b31a46f runtime.TestRuntimeLockMetricsAndProfile (log)
--- FAIL: TestRuntimeLockMetricsAndProfile (0.21s)
    --- FAIL: TestRuntimeLockMetricsAndProfile/runtime.semrelease (0.01s)
        metrics_test.go:1041: no increase in /sync/mutex/wait/total:seconds metric
        metrics_test.go:1048: lock contention growth in runtime/pprof's view  (0.000335s)
        metrics_test.go:1049: lock contention growth in runtime/metrics' view (0.000000s)
        metrics_test.go:1077: stack [runtime.unlock runtime.semrelease1 runtime_test.TestRuntimeLockMetricsAndProfile.func6.1 runtime_test.(*contentionWorker).run] has samples totaling n=491 value=173075

watchflakes

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Nov 17, 2023
@gopherbot
Copy link
Author

Found new dashboard test flakes for:

#!watchflakes
default <- pkg == "runtime" && test == "TestRuntimeLockMetricsAndProfile"
2023-11-17 23:16 windows-amd64-longtest go@f664031b runtime.TestRuntimeLockMetricsAndProfile (log)
--- FAIL: TestRuntimeLockMetricsAndProfile (0.22s)
    --- FAIL: TestRuntimeLockMetricsAndProfile/runtime.semrelease (0.02s)
        metrics_test.go:1041: no increase in /sync/mutex/wait/total:seconds metric
        metrics_test.go:1048: lock contention growth in runtime/pprof's view  (0.002332s)
        metrics_test.go:1049: lock contention growth in runtime/metrics' view (0.000000s)
        metrics_test.go:1077: stack [runtime.unlock runtime.semrelease1 runtime_test.TestRuntimeLockMetricsAndProfile.func6.1 runtime_test.(*contentionWorker).run] has samples totaling n=3062 value=988506

watchflakes

@rhysh rhysh self-assigned this Nov 18, 2023
@gopherbot
Copy link
Author

Found new dashboard test flakes for:

#!watchflakes
default <- pkg == "runtime" && test == "TestRuntimeLockMetricsAndProfile"
2023-11-17 23:16 linux-ppc64-sid-power10 go@f664031b runtime.TestRuntimeLockMetricsAndProfile (log)
--- FAIL: TestRuntimeLockMetricsAndProfile (0.03s)
    --- FAIL: TestRuntimeLockMetricsAndProfile/runtime.lock (0.02s)
        --- FAIL: TestRuntimeLockMetricsAndProfile/runtime.lock/sample-1 (0.01s)
            metrics_test.go:1048: lock contention growth in runtime/pprof's view  (0.006328s)
            metrics_test.go:1049: lock contention growth in runtime/metrics' view (0.006326s)
            metrics_test.go:1077: stack [runtime.unlock runtime_test.TestRuntimeLockMetricsAndProfile.func5.1 runtime_test.(*contentionWorker).run] has samples totaling n=99 value=6328117
            metrics_test.go:1156: mutex profile reported contention count different from the known true count (99 != 100)
2023-11-17 23:34 linux-ppc64-sid-power10 go@631a6c2a runtime.TestRuntimeLockMetricsAndProfile (log)
--- FAIL: TestRuntimeLockMetricsAndProfile (0.02s)
    --- FAIL: TestRuntimeLockMetricsAndProfile/runtime.lock (0.01s)
        --- FAIL: TestRuntimeLockMetricsAndProfile/runtime.lock/sample-1 (0.01s)
            metrics_test.go:1048: lock contention growth in runtime/pprof's view  (0.005366s)
            metrics_test.go:1049: lock contention growth in runtime/metrics' view (0.005365s)
            metrics_test.go:1077: stack [runtime.unlock runtime_test.TestRuntimeLockMetricsAndProfile.func5.1 runtime_test.(*contentionWorker).run] has samples totaling n=86 value=5366454
            metrics_test.go:1156: mutex profile reported contention count different from the known true count (86 != 100)

watchflakes

@gopherbot
Copy link
Author

Found new dashboard test flakes for:

#!watchflakes
default <- pkg == "runtime" && test == "TestRuntimeLockMetricsAndProfile"
2023-11-17 23:15 windows-arm64-11 go@f67b2d8f runtime.TestRuntimeLockMetricsAndProfile (log)
--- FAIL: TestRuntimeLockMetricsAndProfile (0.16s)
    --- FAIL: TestRuntimeLockMetricsAndProfile/runtime.semrelease (0.01s)
        metrics_test.go:1041: no increase in /sync/mutex/wait/total:seconds metric
        metrics_test.go:1048: lock contention growth in runtime/pprof's view  (0.003580s)
        metrics_test.go:1049: lock contention growth in runtime/metrics' view (0.000000s)
        metrics_test.go:1077: stack [runtime.unlock runtime.semrelease1 runtime_test.TestRuntimeLockMetricsAndProfile.func6.1 runtime_test.(*contentionWorker).run] has samples totaling n=3101 value=2156101
2023-11-17 23:16 windows-arm64-11 go@f664031b runtime.TestRuntimeLockMetricsAndProfile (log)
--- FAIL: TestRuntimeLockMetricsAndProfile (0.13s)
    --- FAIL: TestRuntimeLockMetricsAndProfile/runtime.semrelease (0.02s)
        metrics_test.go:1041: no increase in /sync/mutex/wait/total:seconds metric
        metrics_test.go:1048: lock contention growth in runtime/pprof's view  (0.003674s)
        metrics_test.go:1049: lock contention growth in runtime/metrics' view (0.000000s)
        metrics_test.go:1077: stack [runtime.unlock runtime.semrelease1 runtime_test.TestRuntimeLockMetricsAndProfile.func6.1 runtime_test.(*contentionWorker).run] has samples totaling n=3308 value=2331713
2023-11-17 23:34 windows-arm64-11 go@631a6c2a runtime.TestRuntimeLockMetricsAndProfile (log)
--- FAIL: TestRuntimeLockMetricsAndProfile (0.21s)
    --- FAIL: TestRuntimeLockMetricsAndProfile/runtime.semrelease (0.07s)
        metrics_test.go:1041: no increase in /sync/mutex/wait/total:seconds metric
        metrics_test.go:1048: lock contention growth in runtime/pprof's view  (0.005268s)
        metrics_test.go:1049: lock contention growth in runtime/metrics' view (0.000000s)
        metrics_test.go:1077: stack [runtime.unlock runtime.semrelease1 runtime_test.TestRuntimeLockMetricsAndProfile.func6.1 runtime_test.(*contentionWorker).run] has samples totaling n=5951 value=3491388

watchflakes

@gopherbot
Copy link
Author

Found new dashboard test flakes for:

#!watchflakes
default <- pkg == "runtime" && test == "TestRuntimeLockMetricsAndProfile"
2023-11-19 02:15 windows-amd64-2016 go@aa9dd500 runtime.TestRuntimeLockMetricsAndProfile (log)
--- FAIL: TestRuntimeLockMetricsAndProfile (0.22s)
    --- FAIL: TestRuntimeLockMetricsAndProfile/runtime.semrelease (0.02s)
        metrics_test.go:1041: no increase in /sync/mutex/wait/total:seconds metric
        metrics_test.go:1048: lock contention growth in runtime/pprof's view  (0.001346s)
        metrics_test.go:1049: lock contention growth in runtime/metrics' view (0.000000s)
        metrics_test.go:1077: stack [runtime.unlock runtime.semrelease1 runtime_test.TestRuntimeLockMetricsAndProfile.func6.1 runtime_test.(*contentionWorker).run] has samples totaling n=3436 value=1006621

watchflakes

@gopherbot
Copy link
Author

Found new dashboard test flakes for:

#!watchflakes
default <- pkg == "runtime" && test == "TestRuntimeLockMetricsAndProfile"
2023-11-19 07:31 linux-ppc64-sid-power10 go@d67ac938 runtime.TestRuntimeLockMetricsAndProfile (log)
--- FAIL: TestRuntimeLockMetricsAndProfile (0.02s)
    --- FAIL: TestRuntimeLockMetricsAndProfile/runtime.lock (0.02s)
        --- FAIL: TestRuntimeLockMetricsAndProfile/runtime.lock/sample-1 (0.01s)
            metrics_test.go:1048: lock contention growth in runtime/pprof's view  (0.006294s)
            metrics_test.go:1049: lock contention growth in runtime/metrics' view (0.006293s)
            metrics_test.go:1077: stack [runtime.unlock runtime_test.TestRuntimeLockMetricsAndProfile.func5.1 runtime_test.(*contentionWorker).run] has samples totaling n=99 value=6294492
            metrics_test.go:1156: mutex profile reported contention count different from the known true count (99 != 100)
2023-11-19 07:31 windows-arm64-11 go@d67ac938 runtime.TestRuntimeLockMetricsAndProfile (log)
--- FAIL: TestRuntimeLockMetricsAndProfile (0.16s)
    --- FAIL: TestRuntimeLockMetricsAndProfile/runtime.semrelease (0.05s)
        metrics_test.go:1041: no increase in /sync/mutex/wait/total:seconds metric
        metrics_test.go:1048: lock contention growth in runtime/pprof's view  (0.009316s)
        metrics_test.go:1049: lock contention growth in runtime/metrics' view (0.000000s)
        metrics_test.go:1077: stack [runtime.unlock runtime.semrelease1 runtime_test.TestRuntimeLockMetricsAndProfile.func6.1 runtime_test.(*contentionWorker).run] has samples totaling n=8900 value=6224493
2023-11-19 15:24 linux-ppc64-sid-power10 go@1c15291f runtime.TestRuntimeLockMetricsAndProfile (log)
--- FAIL: TestRuntimeLockMetricsAndProfile (0.02s)
    --- FAIL: TestRuntimeLockMetricsAndProfile/runtime.lock (0.01s)
        --- FAIL: TestRuntimeLockMetricsAndProfile/runtime.lock/sample-1 (0.01s)
            metrics_test.go:1048: lock contention growth in runtime/pprof's view  (0.006170s)
            metrics_test.go:1049: lock contention growth in runtime/metrics' view (0.006168s)
            metrics_test.go:1077: stack [runtime.unlock runtime_test.TestRuntimeLockMetricsAndProfile.func5.1 runtime_test.(*contentionWorker).run] has samples totaling n=99 value=6169552
            metrics_test.go:1156: mutex profile reported contention count different from the known true count (99 != 100)
2023-11-19 15:24 windows-arm64-11 go@1c15291f runtime.TestRuntimeLockMetricsAndProfile (log)
--- FAIL: TestRuntimeLockMetricsAndProfile (0.16s)
    --- FAIL: TestRuntimeLockMetricsAndProfile/runtime.semrelease (0.05s)
        metrics_test.go:1041: no increase in /sync/mutex/wait/total:seconds metric
        metrics_test.go:1048: lock contention growth in runtime/pprof's view  (0.006365s)
        metrics_test.go:1049: lock contention growth in runtime/metrics' view (0.000000s)
        metrics_test.go:1077: stack [runtime.unlock runtime.semrelease1 runtime_test.TestRuntimeLockMetricsAndProfile.func6.1 runtime_test.(*contentionWorker).run] has samples totaling n=4658 value=3628090
2023-11-19 17:06 linux-ppc64-sid-power10 go@2551fffd runtime.TestRuntimeLockMetricsAndProfile (log)
--- FAIL: TestRuntimeLockMetricsAndProfile (0.01s)
    --- FAIL: TestRuntimeLockMetricsAndProfile/runtime.lock (0.01s)
        --- FAIL: TestRuntimeLockMetricsAndProfile/runtime.lock/sample-1 (0.01s)
            metrics_test.go:1048: lock contention growth in runtime/pprof's view  (0.000038s)
            metrics_test.go:1049: lock contention growth in runtime/metrics' view (0.000038s)
            metrics_test.go:1077: stack [runtime.unlock runtime_test.TestRuntimeLockMetricsAndProfile.func5.1 runtime_test.(*contentionWorker).run] has samples totaling n=2 value=37499
            metrics_test.go:1153: runtime/metrics reported less than the known minimum contention duration (0.000038s < 0.001000s)
            metrics_test.go:1156: mutex profile reported contention count different from the known true count (2 != 100)

watchflakes

@gopherbot
Copy link
Author

Found new dashboard test flakes for:

#!watchflakes
default <- pkg == "runtime" && test == "TestRuntimeLockMetricsAndProfile"
2023-11-19 21:11 windows-arm64-11 go@06145fe0 runtime.TestRuntimeLockMetricsAndProfile (log)
--- FAIL: TestRuntimeLockMetricsAndProfile (0.15s)
    --- FAIL: TestRuntimeLockMetricsAndProfile/runtime.semrelease (0.01s)
        metrics_test.go:1041: no increase in /sync/mutex/wait/total:seconds metric
        metrics_test.go:1048: lock contention growth in runtime/pprof's view  (0.002936s)
        metrics_test.go:1049: lock contention growth in runtime/metrics' view (0.000000s)
        metrics_test.go:1077: stack [runtime.unlock runtime.semrelease1 runtime_test.TestRuntimeLockMetricsAndProfile.func6.1 runtime_test.(*contentionWorker).run] has samples totaling n=2560 value=1676526
2023-11-19 22:05 windows-arm64-11 go@63828938 runtime.TestRuntimeLockMetricsAndProfile (log)
--- FAIL: TestRuntimeLockMetricsAndProfile (0.23s)
    --- FAIL: TestRuntimeLockMetricsAndProfile/runtime.semrelease (0.08s)
        metrics_test.go:1041: no increase in /sync/mutex/wait/total:seconds metric
        metrics_test.go:1048: lock contention growth in runtime/pprof's view  (0.007995s)
        metrics_test.go:1049: lock contention growth in runtime/metrics' view (0.000000s)
        metrics_test.go:1077: stack [runtime.unlock runtime.semrelease1 runtime_test.TestRuntimeLockMetricsAndProfile.func6.1 runtime_test.(*contentionWorker).run] has samples totaling n=5920 value=5630416

watchflakes

@gopherbot
Copy link
Author

Found new dashboard test flakes for:

#!watchflakes
default <- pkg == "runtime" && test == "TestRuntimeLockMetricsAndProfile"
2023-11-21 21:30 linux-ppc64le-buildlet go@41f58b22 runtime.TestRuntimeLockMetricsAndProfile (log)
--- FAIL: TestRuntimeLockMetricsAndProfile (0.10s)
    metrics_test.go:967: NumCPU 4
    metrics_test.go:968: GOMAXPROCS 4
    --- FAIL: TestRuntimeLockMetricsAndProfile/runtime.lock (0.08s)
        --- FAIL: TestRuntimeLockMetricsAndProfile/runtime.lock/sample-2 (0.05s)
            metrics_test.go:1056: lock contention growth in runtime/pprof's view  (0.020478s)
            metrics_test.go:1057: lock contention growth in runtime/metrics' view (0.081851s)
            metrics_test.go:1085: stack [runtime.unlock runtime_test.TestRuntimeLockMetricsAndProfile.func5.1 runtime_test.(*contentionWorker).run] has samples totaling n=106 value=20478295
            metrics_test.go:1199: views differ by more than 3.750000x

watchflakes

@gopherbot
Copy link
Author

Found new dashboard test flakes for:

#!watchflakes
default <- pkg == "runtime" && test == "TestRuntimeLockMetricsAndProfile"
2023-11-21 21:27 linux-386-longtest go@48a6362d runtime.TestRuntimeLockMetricsAndProfile (log)
--- FAIL: TestRuntimeLockMetricsAndProfile (0.17s)
    metrics_test.go:967: NumCPU 16
    metrics_test.go:968: GOMAXPROCS 16
    --- FAIL: TestRuntimeLockMetricsAndProfile/runtime.semrelease (0.05s)
        metrics_test.go:1056: lock contention growth in runtime/pprof's view  (0.000048s)
        metrics_test.go:1057: lock contention growth in runtime/metrics' view (0.000049s)
        metrics_test.go:1085: stack [runtime.unlock runtime.semrelease1 runtime_test.TestRuntimeLockMetricsAndProfile.func6.1 runtime_test.(*contentionWorker).run] has samples totaling n=0 value=0
        metrics_test.go:1087: profile:
            PeriodType: contentions count
            Period: 1
...
            7: 0x8048000/0x841a000/0x0 /workdir/tmp/go-build1114361604/b001/runtime.test  [FN]
            8: 0xf7d10000/0xf7e64000/0x19000 /lib32/libc-2.31.so 1a7ffd46bcc790ed7017919b4ca6fc812c8c615c 
            9: 0xf7ee4000/0xf7ef5000/0x5000 /lib32/libpthread-2.31.so 505f3eecd0c7c22229602b598ee187d5b5a94bbb 
            10: 0xf7f04000/0xf7f11000/0x3000 /lib32/libresolv-2.31.so d8095c858bdd3a4d2a534300f09990a6a0644ea4 
            11: 0xf7f28000/0xf7f2a000/0x0 [vdso]  
            12: 0xf7f2b000/0xf7f48000/0x1000 /lib32/ld-2.31.so dd55267740b58ae626dc11439ff85e923974f7d5 
        metrics_test.go:1089: have stack [runtime.unlock runtime_test.TestRuntimeLockMetricsAndProfile.func5.1 runtime_test.(*contentionWorker).run]
        metrics_test.go:1089: have stack [runtime.unlock runtime.semacquire1 runtime.semacquire runtime_test.TestRuntimeLockMetricsAndProfile.func6.1 runtime_test.(*contentionWorker).run]
        metrics_test.go:1089: have stack [runtime.unlock runtime_test.TestRuntimeLockMetricsAndProfile.func5.1 runtime_test.(*contentionWorker).run]
        metrics_test.go:1091: want stack [runtime.unlock runtime.semrelease1 runtime_test.TestRuntimeLockMetricsAndProfile.func6.1 runtime_test.(*contentionWorker).run]

watchflakes

@gopherbot
Copy link
Author

Change https://go.dev/cl/544375 mentions this issue: runtime: test for contention in both semaphore paths

@rhysh
Copy link
Contributor

rhysh commented Nov 21, 2023

Most of the failures are from before https://go.dev/cl/544195. Following that, I've seen four failures (via fetchlogs/greplogs).

https://build.golang.org/log/c733e91b4693774eb42501bbe93a8ee071ef312e addressed by https://go.dev/cl/544375
https://build.golang.org/log/79b92d5304bd79fbdfac90c953d53b172f2d8250 addressed by https://go.dev/cl/544375
https://build.golang.org/log/2e5e4f5629a179365a02d2bb9548ed5b94597c54 during a slow runtime test which I expect adds a lot of non-contention work, making it harder to encounter the desired contention ... logging to confirm in https://go.dev/cl/544375
https://build.golang.org/log/6ee5577c3e0a08531b8d7f009631323b4aab7c23 on linux-ppc64le, showing pprof with the amount of contention I expect but a metrics view with much more weight. One of the earlier linux-390x failures shows a similar behavior (though also failed due to the big-endian bug I fixed in CL 544195).

@gopherbot
Copy link
Author

Found new dashboard test flakes for:

#!watchflakes
default <- pkg == "runtime" && test == "TestRuntimeLockMetricsAndProfile"
2023-11-21 21:29 linux-ppc64-sid-buildlet go@90ba4452 runtime.TestRuntimeLockMetricsAndProfile (log)
--- FAIL: TestRuntimeLockMetricsAndProfile (0.22s)
    metrics_test.go:967: NumCPU 4
    metrics_test.go:968: GOMAXPROCS 4
    --- FAIL: TestRuntimeLockMetricsAndProfile/runtime.lock (0.19s)
        --- FAIL: TestRuntimeLockMetricsAndProfile/runtime.lock/sample-2 (0.15s)
            metrics_test.go:1056: lock contention growth in runtime/pprof's view  (0.264071s)
            metrics_test.go:1057: lock contention growth in runtime/metrics' view (0.067915s)
            metrics_test.go:1085: stack [runtime.unlock runtime_test.TestRuntimeLockMetricsAndProfile.func5.1 runtime_test.(*contentionWorker).run] has samples totaling n=134 value=264071372
            metrics_test.go:1199: views differ by more than 3.750000x

watchflakes

@gopherbot
Copy link
Author

Found new dashboard test flakes for:

#!watchflakes
default <- pkg == "runtime" && test == "TestRuntimeLockMetricsAndProfile"
2023-11-21 21:29 aix-ppc64 go@71052169 runtime.TestRuntimeLockMetricsAndProfile (log)
--- FAIL: TestRuntimeLockMetricsAndProfile (0.75s)
    metrics_test.go:967: NumCPU 12
    metrics_test.go:968: GOMAXPROCS 12
    --- FAIL: TestRuntimeLockMetricsAndProfile/runtime.lock (0.31s)
        --- FAIL: TestRuntimeLockMetricsAndProfile/runtime.lock/sample-1 (0.03s)
            metrics_test.go:1056: lock contention growth in runtime/pprof's view  (0.022040s)
            metrics_test.go:1057: lock contention growth in runtime/metrics' view (0.056988s)
            metrics_test.go:1085: stack [runtime.unlock runtime_test.TestRuntimeLockMetricsAndProfile.func5.1 runtime_test.(*contentionWorker).run] has samples totaling n=100 value=22039857
            metrics_test.go:1169: views differ by more than 1.500000x

watchflakes

@gopherbot
Copy link
Author

Found new dashboard test flakes for:

#!watchflakes
default <- pkg == "runtime" && test == "TestRuntimeLockMetricsAndProfile"
2023-11-22 18:12 linux-arm64-longtest go@67600298 runtime.TestRuntimeLockMetricsAndProfile (log)
--- FAIL: TestRuntimeLockMetricsAndProfile (0.04s)
    metrics_test.go:967: NumCPU 16
    metrics_test.go:968: GOMAXPROCS 4
    --- FAIL: TestRuntimeLockMetricsAndProfile/runtime.semrelease (0.00s)
        metrics_test.go:1056: lock contention growth in runtime/pprof's view  (0.000130s)
        metrics_test.go:1057: lock contention growth in runtime/metrics' view (0.000001s)
        metrics_test.go:1085: stack [runtime.unlock runtime.semrelease1 runtime_test.TestRuntimeLockMetricsAndProfile.func6.1 runtime_test.(*contentionWorker).run] has samples totaling n=0 value=0
        metrics_test.go:1087: profile:
            PeriodType: contentions count
            Period: 1
...
        metrics_test.go:1089: have stack [runtime.unlock runtime.findRunnable runtime.schedule runtime.park_m runtime.mcall]
        metrics_test.go:1089: have stack [runtime._LostContendedLock]
        metrics_test.go:1089: have stack [runtime.unlock runtime.semacquire1 runtime.semacquire runtime_test.TestRuntimeLockMetricsAndProfile.func6.1 runtime_test.(*contentionWorker).run]
        metrics_test.go:1089: have stack [runtime.unlock runtime.(*sweepLocked).sweep.(*mheap).freeSpan.func3 runtime.(*mheap).freeSpan runtime.(*sweepLocked).sweep runtime.(*mcentral).uncacheSpan runtime.(*mcache).releaseAll runtime.(*mcache).prepareForSweep runtime.gcMarkTermination.func4 runtime.forEachPInternal runtime.gcMarkTermination.forEachP.func6 runtime.systemstack runtime.forEachP runtime.gcMarkTermination runtime.gcMarkDone runtime.gcBgMarkWorker]
        metrics_test.go:1089: have stack [runtime.unlock runtime.GOMAXPROCS runtime_test.TestRuntimeLockMetricsAndProfile.func6 testing.tRunner]
        metrics_test.go:1089: have stack [runtime.unlock runtime.semrelease1 sync.runtime_Semrelease sync.(*WaitGroup).Add sync.(*WaitGroup).Done runtime_test.TestRuntimeLockMetricsAndProfile.func4.1.1.1 runtime_test.(*contentionWorker).run]
        metrics_test.go:1089: have stack [runtime.unlock runtime.(*sweepLocked).sweep.(*mheap).freeSpan.func3 runtime.(*mheap).freeSpan runtime.(*sweepLocked).sweep runtime.(*mcentral).uncacheSpan runtime.(*mcache).releaseAll runtime.(*mcache).prepareForSweep runtime.acquirep runtime.stopm runtime.gcstopm runtime.findRunnable runtime.schedule runtime.park_m runtime.mcall]
        metrics_test.go:1089: have stack [runtime.unlock runtime_test.TestRuntimeLockMetricsAndProfile.func5.1 runtime_test.(*contentionWorker).run]
        metrics_test.go:1089: have stack [runtime.unlock runtime.GOMAXPROCS runtime_test.TestRuntimeLockMetricsAndProfile.func6 testing.tRunner]
        metrics_test.go:1091: want stack [runtime.unlock runtime.semrelease1 runtime_test.TestRuntimeLockMetricsAndProfile.func6.1 runtime_test.(*contentionWorker).run]

watchflakes

@rhysh
Copy link
Contributor

rhysh commented Nov 22, 2023

That's two more ppc64 failures from a clock mismatch (not sure what's up with ppc64's clocks), and one more of the type that https://go.dev/cl/544375 addresses.

@mknyszek mknyszek added this to the Go1.22 milestone Nov 22, 2023
@gopherbot
Copy link
Author

Found new dashboard test flakes for:

#!watchflakes
default <- pkg == "runtime" && test == "TestRuntimeLockMetricsAndProfile"
2023-11-27 17:23 solaris-amd64-oraclerel go@e158cb21 runtime.TestRuntimeLockMetricsAndProfile (log)
--- FAIL: TestRuntimeLockMetricsAndProfile (1.15s)
    metrics_test.go:967: NumCPU 21
    metrics_test.go:968: GOMAXPROCS 4
    --- FAIL: TestRuntimeLockMetricsAndProfile/runtime.lock (0.59s)
        --- FAIL: TestRuntimeLockMetricsAndProfile/runtime.lock/sample-2 (0.47s)
            metrics_test.go:1056: lock contention growth in runtime/pprof's view  (0.139584s)
            metrics_test.go:1057: lock contention growth in runtime/metrics' view (0.797815s)
            metrics_test.go:1085: stack [runtime.unlock runtime_test.TestRuntimeLockMetricsAndProfile.func5.1 runtime_test.(*contentionWorker).run] has samples totaling n=92 value=139583621
            metrics_test.go:1199: views differ by more than 3.750000x

watchflakes

gopherbot pushed a commit that referenced this issue Nov 28, 2023
Most contention on the runtime locks inside semaphores is observed in
runtime.semrelease1, but it can also appear in runtime.semacquire1. When
examining contention profiles in TestRuntimeLockMetricsAndProfile, allow
call stacks that include either.

For #64253

Change-Id: Id4f16af5e9a28615ab5032a3197e8df90f7e382f
Reviewed-on: https://go-review.googlesource.com/c/go/+/544375
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Auto-Submit: Rhys Hiltner <rhys@justin.tv>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
@prattmic
Copy link
Member

Note that there are some additional timeouts at #55308 (comment).

I'm not sure how to update the watchflakes rule to match these timeouts.

@rhysh
Copy link
Contributor

rhysh commented Nov 29, 2023

From that issue comment, 2023-11-19 22:05 darwin-amd64-longtest go@63828938 runtime and 2023-11-20 02:02 darwin-amd64-longtest go@237715cf runtime are timeouts during the maymorestack=mayMoreStackMove re-run of the runtime tests. Those predate the roll-forward of https://go.dev/cl/544195 , which includes a related update to the semrelease test to address those exact failures. (I'd found them via fetchlogs/greplogs.)

Previously, the test involved three threads and required one of them to hold the semaphore's lock, one to contend for it, and the third to notice that contention and mark the test as complete. My interpretation of the mayMoreStackMove failures is that they added a lot of non-contention work, which made it hard for the test's three threads to each be in those exact right spots at a single instant.

In https://go.dev/cl/544195 , I relaxed that requirement: now the test can finish early if one of the threads notices the other two have created the necessary contention, otherwise it'll give up after 10,000,000 iterations. (My darwin/arm64 system typically needed 500,000–4,000,000 iterations during the mayMoreStackPreempt re-test.) In either case, the only verification it does of the result is that there's "more than zero" contention and that the call stack appears as expected.

I continue to struggle with the balance between wanting thorough tests for this part of the runtime (involving concurrency and clocks), and the need for 0% flakes. Maybe the performance dashboard sets an example there, including an opportunity for more output than ok runtime 690.399s? I'd appreciate any advice you can share—thanks.

@gopherbot
Copy link
Author

Found new dashboard test flakes for:

#!watchflakes
default <- pkg == "runtime" && test == "TestRuntimeLockMetricsAndProfile"
2023-11-29 19:13 freebsd-arm64-dmgk go@636c6e35 runtime.TestRuntimeLockMetricsAndProfile (log)
--- FAIL: TestRuntimeLockMetricsAndProfile (0.03s)
    metrics_test.go:967: NumCPU 2
    metrics_test.go:968: GOMAXPROCS 4
    --- FAIL: TestRuntimeLockMetricsAndProfile/runtime.semrelease (0.00s)
        metrics_test.go:1056: lock contention growth in runtime/pprof's view  (0.000013s)
        metrics_test.go:1057: lock contention growth in runtime/metrics' view (0.000013s)
        metrics_test.go:1095: stack [runtime.unlock runtime.semrelease1 runtime_test.TestRuntimeLockMetricsAndProfile.func6.1 runtime_test.(*contentionWorker).run] has samples totaling n=0 value=0
        metrics_test.go:1095: stack [runtime.unlock runtime.semacquire1 runtime.semacquire runtime_test.TestRuntimeLockMetricsAndProfile.func6.1 runtime_test.(*contentionWorker).run] has samples totaling n=0 value=0
        metrics_test.go:1098: profile:
            PeriodType: contentions count
...
            2: 0x0/0x0/0x0   [FN]
        metrics_test.go:1100: have stack [runtime.unlock runtime_test.TestRuntimeLockMetricsAndProfile.func5.1 runtime_test.(*contentionWorker).run]
        metrics_test.go:1100: have stack [runtime.unlock runtime.parkunlock_c runtime.park_m runtime.mcall]
        metrics_test.go:1100: have stack [runtime.unlock runtime.stopm runtime.findRunnable runtime.schedule runtime.goexit0 runtime.mcall]
        metrics_test.go:1100: have stack [runtime.unlock runtime.semrelease1 sync.runtime_Semrelease sync.(*WaitGroup).Add sync.(*WaitGroup).Done runtime_test.TestRuntimeLockMetricsAndProfile.func4.1.1.1 runtime_test.(*contentionWorker).run]
        metrics_test.go:1100: have stack [runtime.unlock runtime_test.TestRuntimeLockMetricsAndProfile.func5.1 runtime_test.(*contentionWorker).run]
        metrics_test.go:1100: have stack [runtime.unlock runtime.stopm runtime.findRunnable runtime.schedule runtime.goexit0 runtime.mcall]
        metrics_test.go:1103: want stack [runtime.unlock runtime.semrelease1 runtime_test.TestRuntimeLockMetricsAndProfile.func6.1 runtime_test.(*contentionWorker).run]
        metrics_test.go:1103: want stack [runtime.unlock runtime.semacquire1 runtime.semacquire runtime_test.TestRuntimeLockMetricsAndProfile.func6.1 runtime_test.(*contentionWorker).run]
        metrics_test.go:1271: finished test early (9999534 tries remaining)

watchflakes

@mknyszek
Copy link
Contributor

mknyszek commented Dec 1, 2023

This failure appeared on a first-class port on https://go.dev/cl/546635. @bcmills was that CL based on a commit before https://go.dev/cl/544375 landed by any chance?

EDIT: Yes, it was. Here's the failure: https://ci.chromium.org/ui/p/golang/builders/try-workers/gotip-linux-arm64-test_only/b8762881407384401921/overview

@rhysh
Copy link
Contributor

rhysh commented Dec 1, 2023

From https://ci.chromium.org/ui/p/golang/builders/try-workers/gotip-linux-arm64-test_only/b8762881407384401921/overview, I get https://logs.chromium.org/logs/golang/buildbucket/cr-buildbucket/8762881407384401921/+/u/step/11/log/2, which shows two "metrics_test.go:1103: want stack" lines, indicating that it includes https://go.dev/cl/544375.

CL 546635 at PS 4 is https://go.googlesource.com/go/+/30e6fc629529abf8da4528f4fdbb5a78363624fb, with parent https://go.googlesource.com/go/+/2e6387cbec924dbd01007421d7442125037c66b2 .

I don't see a line in the failing build log matching https://go.googlesource.com/go/+/2e6387cbec924dbd01007421d7442125037c66b2/src/runtime/metrics_test.go#1271 , which means the test ran for 10,000,000 iterations without the test code noticing contention on the semaphore lock. The mutex profile result shows that the runtime itself didn't encounter contention either. (In #55160, the contention was present but the test had been unable to notice it and so would run forever.)

The "TestRuntimeLockMetricsAndProfile/runtime.lock" test verifies that runtime-internal lock contention is able to be reported with the correct count, magnitude, and call stack. The role of the "TestRuntimeLockMetricsAndProfile/runtime.semrelease" test is to check that the call stack ends at a particular depth, mostly so we can notice when that changes (so we can update the skip parameter, for example). It's proven tricker to test, since the lock itself isn't under the test's control. Do you have advice on how to test semacquire/semrelease, or is the best option to t.Skip that for now?

@gopherbot
Copy link
Author

Found new dashboard test flakes for:

#!watchflakes
default <- pkg == "runtime" && test == "TestRuntimeLockMetricsAndProfile"
2023-12-01 21:50 linux-ppc64-sid-buildlet go@3220bbe1 runtime.TestRuntimeLockMetricsAndProfile (log)
--- FAIL: TestRuntimeLockMetricsAndProfile (1.08s)
    metrics_test.go:967: NumCPU 4
    metrics_test.go:968: GOMAXPROCS 4
    --- FAIL: TestRuntimeLockMetricsAndProfile/runtime.lock (1.04s)
        --- FAIL: TestRuntimeLockMetricsAndProfile/runtime.lock/sample-2 (0.11s)
            metrics_test.go:1056: lock contention growth in runtime/pprof's view  (0.126391s)
            metrics_test.go:1057: lock contention growth in runtime/metrics' view (0.024345s)
            metrics_test.go:1095: stack [runtime.unlock runtime_test.TestRuntimeLockMetricsAndProfile.func5.1 runtime_test.(*contentionWorker).run] has samples totaling n=114 value=126390697
            metrics_test.go:1212: views differ by more than 3.750000x

watchflakes

@gopherbot
Copy link
Author

Found new dashboard test flakes for:

#!watchflakes
default <- pkg == "runtime" && test == "TestRuntimeLockMetricsAndProfile"
2023-12-06 17:29 linux-386-buster go@c80bd631 runtime.TestRuntimeLockMetricsAndProfile (log)
--- FAIL: TestRuntimeLockMetricsAndProfile (0.05s)
    metrics_test.go:967: NumCPU 16
    metrics_test.go:968: GOMAXPROCS 16
    --- FAIL: TestRuntimeLockMetricsAndProfile/runtime.semrelease (0.00s)
        metrics_test.go:1056: lock contention growth in runtime/pprof's view  (0.000127s)
        metrics_test.go:1057: lock contention growth in runtime/metrics' view (0.000007s)
        metrics_test.go:1095: stack [runtime.unlock runtime.semrelease1 runtime_test.TestRuntimeLockMetricsAndProfile.func6.1 runtime_test.(*contentionWorker).run] has samples totaling n=0 value=0
        metrics_test.go:1095: stack [runtime.unlock runtime.semacquire1 runtime.semacquire runtime_test.TestRuntimeLockMetricsAndProfile.func6.1 runtime_test.(*contentionWorker).run] has samples totaling n=0 value=0
        metrics_test.go:1098: profile:
            PeriodType: contentions count
...
        metrics_test.go:1100: have stack [runtime.unlock runtime.(*sweepLocked).sweep.(*mheap).freeSpan.func3 runtime.(*mheap).freeSpan runtime.(*sweepLocked).sweep runtime.(*mcentral).uncacheSpan runtime.(*mcache).releaseAll runtime.(*mcache).prepareForSweep runtime.gcMarkTermination.func4 runtime.forEachPInternal runtime.gcMarkTermination.forEachP.func6 runtime.systemstack runtime.forEachP runtime.gcMarkTermination runtime.gcMarkDone runtime.gcBgMarkWorker]
        metrics_test.go:1100: have stack [runtime.unlock runtime.(*sweepLocked).sweep.(*mheap).freeSpan.func3 runtime.(*mheap).freeSpan runtime.(*sweepLocked).sweep runtime.(*mcentral).uncacheSpan runtime.(*mcache).releaseAll runtime.(*mcache).prepareForSweep runtime.acquirep runtime.stopm runtime.gcstopm runtime.findRunnable runtime.schedule runtime.goschedImpl runtime.gopreempt_m runtime.newstack runtime.morestack]
        metrics_test.go:1100: have stack [runtime.unlock runtime.wakep runtime.goschedImpl runtime.gopreempt_m runtime.newstack runtime.morestack]
        metrics_test.go:1100: have stack [runtime.unlock runtime.findRunnable runtime.schedule runtime.goexit0 runtime.mcall]
        metrics_test.go:1100: have stack [runtime.unlock runtime.(*spanSet).push runtime.(*sweepLocked).sweep runtime.(*mcentral).uncacheSpan runtime.(*mcache).releaseAll runtime.(*mcache).prepareForSweep runtime.acquirep runtime.stopm runtime.gcstopm runtime.findRunnable runtime.schedule runtime.goschedImpl runtime.gopreempt_m runtime.newstack runtime.morestack]
        metrics_test.go:1100: have stack [runtime.unlock runtime_test.TestRuntimeLockMetricsAndProfile.func5.1 runtime_test.(*contentionWorker).run]
        metrics_test.go:1100: have stack [runtime.unlock runtime.findRunnable runtime.schedule runtime.goexit0 runtime.mcall]
        metrics_test.go:1103: want stack [runtime.unlock runtime.semrelease1 runtime_test.TestRuntimeLockMetricsAndProfile.func6.1 runtime_test.(*contentionWorker).run]
        metrics_test.go:1103: want stack [runtime.unlock runtime.semacquire1 runtime.semacquire runtime_test.TestRuntimeLockMetricsAndProfile.func6.1 runtime_test.(*contentionWorker).run]
        metrics_test.go:1271: finished test early (9998114 tries remaining)

watchflakes

@gopherbot
Copy link
Author

Found new dashboard test flakes for:

#!watchflakes
default <- pkg == "runtime" && test == "TestRuntimeLockMetricsAndProfile"
2023-12-08 20:35 android-386-emu go@9869a0ce runtime.TestRuntimeLockMetricsAndProfile (log)
--- FAIL: TestRuntimeLockMetricsAndProfile (0.18s)
    metrics_test.go:967: NumCPU 2
    metrics_test.go:968: GOMAXPROCS 4
    --- FAIL: TestRuntimeLockMetricsAndProfile/runtime.lock (0.17s)
        --- FAIL: TestRuntimeLockMetricsAndProfile/runtime.lock/sample-1 (0.09s)
            metrics_test.go:1056: lock contention growth in runtime/pprof's view  (0.077049s)
            metrics_test.go:1057: lock contention growth in runtime/metrics' view (0.120200s)
            metrics_test.go:1095: stack [runtime.unlock runtime_test.TestRuntimeLockMetricsAndProfile.func5.1 runtime_test.(*contentionWorker).run] has samples totaling n=100 value=77049011
            metrics_test.go:1182: views differ by more than 1.500000x

watchflakes

@gopherbot
Copy link
Author

Found new dashboard test flakes for:

#!watchflakes
default <- pkg == "runtime" && test == "TestRuntimeLockMetricsAndProfile"
2023-12-08 03:28 aix-ppc64 go@78b42a53 runtime.TestRuntimeLockMetricsAndProfile (log)
--- FAIL: TestRuntimeLockMetricsAndProfile (1.10s)
    metrics_test.go:967: NumCPU 12
    metrics_test.go:968: GOMAXPROCS 2
    --- FAIL: TestRuntimeLockMetricsAndProfile/runtime.lock (1.10s)
        --- FAIL: TestRuntimeLockMetricsAndProfile/runtime.lock/sample-1 (0.43s)
            metrics_test.go:1056: lock contention growth in runtime/pprof's view  (0.373353s)
            metrics_test.go:1057: lock contention growth in runtime/metrics' view (0.778912s)
            metrics_test.go:1095: stack [runtime.unlock runtime_test.TestRuntimeLockMetricsAndProfile.func5.1 runtime_test.(*contentionWorker).run] has samples totaling n=100 value=373353395
            metrics_test.go:1182: views differ by more than 1.500000x
2023-12-08 18:34 aix-ppc64 go@6cdf2cca runtime.TestRuntimeLockMetricsAndProfile (log)
--- FAIL: TestRuntimeLockMetricsAndProfile (1.25s)
    metrics_test.go:967: NumCPU 12
    metrics_test.go:968: GOMAXPROCS 4
    --- FAIL: TestRuntimeLockMetricsAndProfile/runtime.lock (1.03s)
        --- FAIL: TestRuntimeLockMetricsAndProfile/runtime.lock/sample-2 (0.61s)
            metrics_test.go:1056: lock contention growth in runtime/pprof's view  (1.082800s)
            metrics_test.go:1057: lock contention growth in runtime/metrics' view (0.228814s)
            metrics_test.go:1095: stack [runtime.unlock runtime_test.TestRuntimeLockMetricsAndProfile.func5.1 runtime_test.(*contentionWorker).run] has samples totaling n=122 value=1082799576
            metrics_test.go:1212: views differ by more than 3.750000x

watchflakes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. release-blocker
Projects
Status: In Progress
Status: No status
Status: No status
Development

No branches or pull requests

4 participants