Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: mayMoreStackPreempt failures #55160

Closed
rsc opened this issue Sep 20, 2022 · 40 comments
Closed

runtime: mayMoreStackPreempt failures #55160

rsc opened this issue Sep 20, 2022 · 40 comments
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. early-in-cycle A change that should be done early in the 3 month dev cycle. NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@rsc
Copy link
Contributor

rsc commented Sep 20, 2022

#!watchflakes
post <- section == "maymorestack=mayMoreStackPreempt"

Intermittent failures in the new mayMoreStackPreempt. This bug will replace #54778 and #55073.

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Sep 20, 2022
@rsc rsc added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Sep 20, 2022
@gopherbot
Copy link

gopherbot commented Sep 20, 2022

Found new matching flaky dashboard failures.

2022-09-02 19:08 linux-386-longtest go@dbf442b1 runtime.TestVDSO (log)
--- FAIL: TestVDSO (0.12s)
    testenv.go:468: [/workdir/tmp/go-build505441303/testprog.exe SignalInVDSO] exit status: signal: segmentation fault (core dumped)
    crash_test.go:144: output:


        wanted:
        success
2022-09-02 19:08 linux-386-longtest go@b91e3737 runtime.TestVDSO (log)
--- FAIL: TestVDSO (0.07s)
    testenv.go:468: [/workdir/tmp/go-build4244626990/testprog.exe SignalInVDSO] exit status: signal: segmentation fault (core dumped)
    crash_test.go:144: output:


        wanted:
        success
2022-09-02 19:09 linux-386-longtest go@55ca6a20 runtime.TestVDSO (log)
--- FAIL: TestVDSO (0.47s)
    testenv.go:468: [/workdir/tmp/go-build771899026/testprog.exe SignalInVDSO] exit status: signal: segmentation fault (core dumped)
    crash_test.go:144: output:


        wanted:
        success
2022-09-02 19:22 linux-386-longtest go@0fda8b19 runtime.TestVDSO (log)
--- FAIL: TestVDSO (0.94s)
    testenv.go:468: [/workdir/tmp/go-build428612025/testprog.exe SignalInVDSO] exit status: signal: segmentation fault (core dumped)
    crash_test.go:144: output:


        wanted:
        success
2022-09-02 19:22 linux-386-longtest go@0fda8b19 runtime.TestCgoPprofCallback (log)
--- FAIL: TestCgoPprofCallback (0.76s)
    testenv.go:468: [/workdir/tmp/go-build428612025/testprogcgo.exe CgoPprofCallback] exit status: signal: segmentation fault (core dumped)
    crash_cgo_test.go:228: expected "OK\n" got 
2022-09-03 15:45 linux-386-longtest go@f798dc68 runtime.TestVDSO (log)
--- FAIL: TestVDSO (0.90s)
    testenv.go:468: [/workdir/tmp/go-build1281399444/testprog.exe SignalInVDSO] exit status: signal: segmentation fault (core dumped)
    crash_test.go:144: output:


        wanted:
        success
2022-09-03 18:21 linux-386-longtest go@a0f05823 runtime.TestVDSO (log)
--- FAIL: TestVDSO (0.46s)
    testenv.go:468: [/workdir/tmp/go-build2488815781/testprog.exe SignalInVDSO] exit status: signal: segmentation fault (core dumped)
    crash_test.go:144: output:


        wanted:
        success
2022-09-03 18:21 linux-386-longtest go@a0f05823 runtime.TestCgoPprofCallback (log)
--- FAIL: TestCgoPprofCallback (0.50s)
    testenv.go:468: [/workdir/tmp/go-build2488815781/testprogcgo.exe CgoPprofCallback] exit status: signal: segmentation fault (core dumped)
    crash_cgo_test.go:228: expected "OK\n" got 
2022-09-04 04:17 linux-386-longtest go@535fe2b2 runtime.TestVDSO (log)
--- FAIL: TestVDSO (0.50s)
    testenv.go:468: [/workdir/tmp/go-build3332541852/testprog.exe SignalInVDSO] exit status: signal: segmentation fault (core dumped)
    crash_test.go:144: output:


        wanted:
        success
2022-09-05 07:14 linux-386-longtest go@3fbcf05d runtime.TestVDSO (log)
--- FAIL: TestVDSO (0.14s)
    testenv.go:468: [/workdir/tmp/go-build2637521369/testprog.exe SignalInVDSO] exit status: signal: segmentation fault (core dumped)
    crash_test.go:144: output:


        wanted:
        success
2022-09-05 07:17 linux-386-longtest go@4e7e7ae1 runtime.TestVDSO (log)
--- FAIL: TestVDSO (0.21s)
    testenv.go:468: [/workdir/tmp/go-build436015263/testprog.exe SignalInVDSO] exit status: signal: segmentation fault (core dumped)
    crash_test.go:144: output:


        wanted:
        success
2022-09-05 08:08 linux-386-longtest go@4ad55cd9 runtime.TestVDSO (log)
--- FAIL: TestVDSO (0.91s)
    testenv.go:468: [/workdir/tmp/go-build596332148/testprog.exe SignalInVDSO] exit status: signal: segmentation fault (core dumped)
    crash_test.go:144: output:


        wanted:
        success
2022-09-05 08:08 linux-386-longtest go@af7f4176 runtime.TestVDSO (log)
--- FAIL: TestVDSO (0.57s)
    testenv.go:468: [/workdir/tmp/go-build2390128426/testprog.exe SignalInVDSO] exit status: signal: segmentation fault (core dumped)
    crash_test.go:144: output:


        wanted:
        success
2022-09-05 08:28 linux-386-longtest go@bd5595d7 runtime.TestVDSO (log)
--- FAIL: TestVDSO (0.36s)
    testenv.go:468: [/workdir/tmp/go-build202959485/testprog.exe SignalInVDSO] exit status: signal: segmentation fault (core dumped)
    crash_test.go:144: output:


        wanted:
        success
2022-09-05 08:28 linux-386-longtest go@bd5595d7 runtime.TestCgoPprofCallback (log)
--- FAIL: TestCgoPprofCallback (1.14s)
    testenv.go:468: [/workdir/tmp/go-build202959485/testprogcgo.exe CgoPprofCallback] exit status: signal: segmentation fault (core dumped)
    crash_cgo_test.go:228: expected "OK\n" got 
2022-09-05 21:39 linux-386-longtest go@4c1ca42a runtime.TestVDSO (log)
--- FAIL: TestVDSO (0.43s)
    testenv.go:468: [/workdir/tmp/go-build2185756284/testprog.exe SignalInVDSO] exit status: signal: segmentation fault (core dumped)
    crash_test.go:144: output:


        wanted:
        success
2022-09-06 14:44 linux-386-longtest go@a60a3dc5 runtime.TestCgoPprofCallback (log)
--- FAIL: TestCgoPprofCallback (0.37s)
    testenv.go:468: [/workdir/tmp/go-build1980997514/testprogcgo.exe CgoPprofCallback] exit status: signal: segmentation fault (core dumped)
    crash_cgo_test.go:228: expected "OK\n" got 
2022-09-06 15:44 linux-386-longtest go@32f68b5a runtime.TestVDSO (log)
--- FAIL: TestVDSO (0.11s)
    testenv.go:468: [/workdir/tmp/go-build1672150549/testprog.exe SignalInVDSO] exit status: signal: segmentation fault (core dumped)
    crash_test.go:144: output:


        wanted:
        success
2022-09-06 15:44 linux-386-longtest go@32f68b5a runtime.TestCgoPprofCallback (log)
--- FAIL: TestCgoPprofCallback (0.40s)
    testenv.go:468: [/workdir/tmp/go-build1672150549/testprogcgo.exe CgoPprofCallback] exit status: signal: segmentation fault (core dumped)
    crash_cgo_test.go:228: expected "OK\n" got 
2022-09-06 15:48 linux-386-longtest go@07b19bf5 runtime.TestVDSO (log)
--- FAIL: TestVDSO (0.15s)
    testenv.go:468: [/workdir/tmp/go-build2315874385/testprog.exe SignalInVDSO] exit status: signal: segmentation fault (core dumped)
    crash_test.go:144: output:


        wanted:
        success
2022-09-02 19:08 linux-amd64-longtest go@dbf442b1 runtime.TestCgoPprofCallback (log)
--- FAIL: TestCgoPprofCallback (0.87s)
    testenv.go:468: [/workdir/tmp/go-build2891867005/testprogcgo.exe CgoPprofCallback] exit status: signal: segmentation fault (core dumped)
    crash_cgo_test.go:228: expected "OK\n" got 
2022-09-02 19:08 linux-amd64-longtest go@dbf442b1 runtime.TestCgoCCodeSIGPROF (log)
--- FAIL: TestCgoCCodeSIGPROF (0.32s)
    testenv.go:468: [/workdir/tmp/go-build2891867005/testprogcgo.exe CgoCCodeSIGPROF] exit status: signal: segmentation fault (core dumped)
    crash_cgo_test.go:215: expected "OK\n" got 
2022-09-02 19:08 linux-amd64-longtest go@b91e3737 runtime.TestCgoPprofCallback (log)
--- FAIL: TestCgoPprofCallback (3.24s)
    testenv.go:468: [/workdir/tmp/go-build687456613/testprogcgo.exe CgoPprofCallback] exit status: signal: segmentation fault (core dumped)
    crash_cgo_test.go:228: expected "OK\n" got 
2022-09-03 15:45 linux-amd64-longtest go@f798dc68 runtime.TestCgoPprofCallback (log)
--- FAIL: TestCgoPprofCallback (2.81s)
    testenv.go:468: [/workdir/tmp/go-build4078786360/testprogcgo.exe CgoPprofCallback] exit status: signal: segmentation fault (core dumped)
    crash_cgo_test.go:228: expected "OK\n" got 
2022-09-03 18:21 linux-amd64-longtest go@a0f05823 runtime.TestCgoPprofCallback (log)
--- FAIL: TestCgoPprofCallback (0.97s)
    testenv.go:468: [/workdir/tmp/go-build956381195/testprogcgo.exe CgoPprofCallback] exit status: signal: segmentation fault (core dumped)
    crash_cgo_test.go:228: expected "OK\n" got 
2022-09-04 04:17 linux-amd64-longtest go@535fe2b2 runtime.TestCgoPprofCallback (log)
--- FAIL: TestCgoPprofCallback (3.27s)
    testenv.go:468: [/workdir/tmp/go-build2525853216/testprogcgo.exe CgoPprofCallback] exit status: signal: segmentation fault (core dumped)
    crash_cgo_test.go:228: expected "OK\n" got 
2022-09-05 08:12 linux-amd64-longtest go@67e65424 runtime.TestCgoPprofCallback (log)
--- FAIL: TestCgoPprofCallback (0.42s)
    testenv.go:468: [/workdir/tmp/go-build3155816942/testprogcgo.exe CgoPprofCallback] exit status: signal: segmentation fault (core dumped)
    crash_cgo_test.go:228: expected "OK\n" got 
2022-09-05 08:12 linux-amd64-longtest go@67e65424 runtime.TestCgoCCodeSIGPROF (log)
--- FAIL: TestCgoCCodeSIGPROF (0.18s)
    testenv.go:468: [/workdir/tmp/go-build3155816942/testprogcgo.exe CgoCCodeSIGPROF] exit status: signal: segmentation fault (core dumped)
    crash_cgo_test.go:215: expected "OK\n" got 
2022-09-05 21:39 linux-amd64-longtest go@4c1ca42a runtime.TestCgoPprofCallback (log)
--- FAIL: TestCgoPprofCallback (0.67s)
    testenv.go:468: [/workdir/tmp/go-build1237047364/testprogcgo.exe CgoPprofCallback] exit status: signal: segmentation fault (core dumped)
    crash_cgo_test.go:228: expected "OK\n" got 
2022-09-06 11:14 linux-amd64-longtest go@1c504843 runtime.TestCgoPprofCallback (log)
--- FAIL: TestCgoPprofCallback (2.60s)
    testenv.go:468: [/workdir/tmp/go-build4096451727/testprogcgo.exe CgoPprofCallback] exit status: signal: segmentation fault (core dumped)
    crash_cgo_test.go:228: expected "OK\n" got 

watchflakes

@rsc
Copy link
Contributor Author

rsc commented Sep 20, 2022

These were coming in multiple times per day and have now stopped for almost two weeks. Closing.

@rsc rsc closed this as completed Sep 20, 2022
@gopherbot
Copy link

gopherbot commented Sep 20, 2022

Found new matching flaky dashboard failures for:

    #!watchflakes
    post <- section == "maymorestack=mayMoreStackPreempt"
2022-08-23 03:09 windows-amd64-longtest go@0a52d806 (log)
XXXBANNERXXX:Test execution environment.
# GOARCH: amd64
# CPU: Intel(R) Xeon(R) CPU @ 2.20GHz
# GOOS: windows
# OS Version: 10.0.14393
fatal error: advapi32.dll not found
runtime: panic before malloc heap initialized
go: error obtaining buildID for go tool compile: exit status 0xc0000005

go tool dist: FAILED: go list -f={{if .Stale}}	STALE {{.ImportPath}}: {{.StaleReason}}{{end}} std: exit status 1

watchflakes

@gopherbot gopherbot reopened this Sep 20, 2022
@rsc rsc closed this as completed Sep 20, 2022
@rsc rsc changed the title runtime: flaky failures with mayMoreStackPreempt runtime: mayMoreStackPreempt failures Sep 20, 2022
@gopherbot
Copy link

Found new dashboard test flakes for:

    #!watchflakes
    post <- section == "maymorestack=mayMoreStackPreempt"
2022-09-06 15:48 linux-amd64-longtest go@07b19bf5 runtime.TestCgoPprofCallback (log)
--- FAIL: TestCgoPprofCallback (0.58s)
    testenv.go:468: [/workdir/tmp/go-build3380638690/testprogcgo.exe CgoPprofCallback] exit status: signal: segmentation fault (core dumped)
    crash_cgo_test.go:228: expected "OK\n" got 

watchflakes

@gopherbot gopherbot reopened this Nov 4, 2022
@gopherbot
Copy link

Found new dashboard test flakes for:

    #!watchflakes
    post <- section == "maymorestack=mayMoreStackPreempt"
2022-10-31 21:00 linux-386-longtest go@ec0b5402 runtime.TestCgoTracebackContextPreemption (log)
--- FAIL: TestCgoTracebackContextPreemption (60.06s)
    testenv.go:454: context deadline exceeded: terminating command: /workdir/tmp/go-build82099082/testprogcgo.exe TracebackContextPreemption
    crash_test.go:59: /workdir/tmp/go-build82099082/testprogcgo.exe TracebackContextPreemption: exit status 2
    crash_cgo_test.go:284: expected "OK\n" got SIGQUIT: quit
        PC=0x80b7eff m=0 sigcode=0

        eax    0x0
        ebx    0x8296d10
        ecx    0x80
        edx    0x0
        edi    0x0
        esi    0x0
        ebp    0x0
        esp    0xffb85348
        eip    0x80b7eff
        eflags 0x286
        cs     0x23
        fs     0x0
        gs     0x63
2022-11-02 18:19 linux-386-longtest go@07a70bca runtime.TestCgoTracebackContextPreemption (log)
--- FAIL: TestCgoTracebackContextPreemption (60.06s)
    exec.go:152: context deadline exceeded: terminating command: /workdir/tmp/go-build3697616094/testprogcgo.exe TracebackContextPreemption
    crash_test.go:59: /workdir/tmp/go-build3697616094/testprogcgo.exe TracebackContextPreemption: exit status 2
    crash_cgo_test.go:284: expected "OK\n" got SIGQUIT: quit
        PC=0x80b7eff m=0 sigcode=0

        eax    0xfffffff5
        ebx    0x8296d10
        ecx    0x80
        edx    0x0
        edi    0x0
        esi    0x0
        ebp    0x0
        esp    0xffb80458
        eip    0x80b7eff
        eflags 0x286
        cs     0x23
        fs     0x0
        gs     0x63
2022-11-03 15:17 linux-386-longtest go@1bfb51f8 runtime.TestCgoTracebackContextPreemption (log)
--- FAIL: TestCgoTracebackContextPreemption (60.06s)
    exec.go:152: context deadline exceeded: terminating command: /workdir/tmp/go-build368093199/testprogcgo.exe TracebackContextPreemption
    crash_test.go:59: /workdir/tmp/go-build368093199/testprogcgo.exe TracebackContextPreemption: exit status 2
    crash_cgo_test.go:284: expected "OK\n" got SIGQUIT: quit
        PC=0x80b7eff m=0 sigcode=0

        eax    0x0
        ebx    0x8296d10
        ecx    0x80
        edx    0x0
        edi    0x0
        esi    0x0
        ebp    0x0
        esp    0xffbb95b4
        eip    0x80b7eff
        eflags 0x286
        cs     0x23
        fs     0x0
        gs     0x63
2022-11-03 15:30 linux-386-longtest go@e81263c7 runtime.TestCgoTracebackContextPreemption (log)
--- FAIL: TestCgoTracebackContextPreemption (60.06s)
    exec.go:152: context deadline exceeded: terminating command: /workdir/tmp/go-build1088517119/testprogcgo.exe TracebackContextPreemption
    crash_test.go:59: /workdir/tmp/go-build1088517119/testprogcgo.exe TracebackContextPreemption: exit status 2
    crash_cgo_test.go:284: expected "OK\n" got SIGQUIT: quit
        PC=0x80b7eff m=0 sigcode=0

        eax    0x1
        ebx    0x9c80310
        ecx    0x81
        edx    0x1
        edi    0x0
        esi    0x0
        ebp    0x0
        esp    0xffdec718
        eip    0x80b7eff
        eflags 0x292
        cs     0x23
        fs     0x0
        gs     0x63
2022-11-03 17:01 linux-386-longtest go@667c53e1 runtime.TestCgoTracebackContextPreemption (log)
--- FAIL: TestCgoTracebackContextPreemption (60.06s)
    exec.go:152: context deadline exceeded: terminating command: /workdir/tmp/go-build1853768809/testprogcgo.exe TracebackContextPreemption
    crash_test.go:59: /workdir/tmp/go-build1853768809/testprogcgo.exe TracebackContextPreemption: exit status 2
    crash_cgo_test.go:284: expected "OK\n" got SIGQUIT: quit
        PC=0x80b7eff m=0 sigcode=0

        eax    0x0
        ebx    0x8296d10
        ecx    0x80
        edx    0x0
        edi    0x0
        esi    0x0
        ebp    0x0
        esp    0xffcd9648
        eip    0x80b7eff
        eflags 0x286
        cs     0x23
        fs     0x0
        gs     0x63
2022-11-03 17:40 linux-386-longtest go@7abc8a2e runtime.TestCgoTracebackContextPreemption (log)
--- FAIL: TestCgoTracebackContextPreemption (60.06s)
    exec.go:152: context deadline exceeded: terminating command: /workdir/tmp/go-build3148955378/testprogcgo.exe TracebackContextPreemption
    crash_test.go:59: /workdir/tmp/go-build3148955378/testprogcgo.exe TracebackContextPreemption: exit status 2
    crash_cgo_test.go:284: expected "OK\n" got SIGQUIT: quit
        PC=0x80b7eff m=0 sigcode=0

        eax    0x0
        ebx    0x8296d10
        ecx    0x80
        edx    0x0
        edi    0x0
        esi    0x0
        ebp    0x0
        esp    0xffa0a878
        eip    0x80b7eff
        eflags 0x286
        cs     0x23
        fs     0x0
        gs     0x63
2022-11-03 18:33 linux-386-longtest go@44cabb80 runtime.TestCgoTracebackContextPreemption (log)
--- FAIL: TestCgoTracebackContextPreemption (60.06s)
    exec.go:152: context deadline exceeded: terminating command: /workdir/tmp/go-build2865322188/testprogcgo.exe TracebackContextPreemption
    crash_test.go:59: /workdir/tmp/go-build2865322188/testprogcgo.exe TracebackContextPreemption: exit status 2
    crash_cgo_test.go:284: expected "OK\n" got SIGQUIT: quit
        PC=0x80b7efd m=0 sigcode=0

        eax    0xf0
        ebx    0x8296d10
        ecx    0x80
        edx    0x0
        edi    0x0
        esi    0x0
        ebp    0x0
        esp    0xfff0a0e8
        eip    0x80b7efd
        eflags 0x286
        cs     0x23
        fs     0x0
        gs     0x63
2022-11-03 19:34 linux-386-longtest go@3511c822 runtime.TestCgoTracebackContextPreemption (log)
--- FAIL: TestCgoTracebackContextPreemption (60.06s)
    exec.go:152: context deadline exceeded: terminating command: /workdir/tmp/go-build1678369203/testprogcgo.exe TracebackContextPreemption
    crash_test.go:59: /workdir/tmp/go-build1678369203/testprogcgo.exe TracebackContextPreemption: exit status 2
    crash_cgo_test.go:284: expected "OK\n" got SIGQUIT: quit
        PC=0x80b7eff m=0 sigcode=0

        eax    0x0
        ebx    0x8296d10
        ecx    0x80
        edx    0x0
        edi    0x0
        esi    0x0
        ebp    0x0
        esp    0xffac9090
        eip    0x80b7eff
        eflags 0x286
        cs     0x23
        fs     0x0
        gs     0x63
2022-11-03 19:59 linux-386-longtest go@d031e9e0 runtime.TestCgoTracebackContextPreemption (log)
--- FAIL: TestCgoTracebackContextPreemption (60.06s)
    exec.go:152: context deadline exceeded: terminating command: /workdir/tmp/go-build3827993796/testprogcgo.exe TracebackContextPreemption
    crash_test.go:59: /workdir/tmp/go-build3827993796/testprogcgo.exe TracebackContextPreemption: exit status 2
    crash_cgo_test.go:284: expected "OK\n" got SIGQUIT: quit
        PC=0x80b7d6f m=0 sigcode=0

        eax    0x1
        ebx    0x9846e50
        ecx    0x81
        edx    0x1
        edi    0x0
        esi    0x0
        ebp    0x0
        esp    0xff9e7c18
        eip    0x80b7d6f
        eflags 0x292
        cs     0x23
        fs     0x0
        gs     0x63

watchflakes

@mknyszek mknyszek added this to the Go1.20 milestone Nov 9, 2022
@mknyszek mknyszek self-assigned this Nov 9, 2022
@mknyszek mknyszek removed their assignment Nov 23, 2022
@cherrymui
Copy link
Member

No more failure for a while. Probably fixed by CL https://golang.org/cl/447495 , which is submitted at Nov. 4, so the time seems to match.

@gopherbot gopherbot reopened this Feb 2, 2023
@gopherbot
Copy link

Found new dashboard test flakes for:

    #!watchflakes
    post <- section == "maymorestack=mayMoreStackPreempt"
2023-01-25 16:38 linux-arm64-longtest go@1d3088ef runtime.TestCrashDumpsAllThreads (log)
panic: test timed out after 50m0s
running tests:
	TestCrashDumpsAllThreads (49m39s)

runtime.gopark(0x4362a0?, 0xffff64c08d00?, 0x2?, 0x1b?, 0x5?)
	/tmp/workdir/go/src/runtime/proc.go:381 +0x124 fp=0x4000bbfcb0 sp=0x4000bbfc90 pc=0x53e84
runtime.netpollblock(0xffff64c08ce8?, 0x72?, 0x0?)
	/tmp/workdir/go/src/runtime/netpoll.go:527 +0x190 fp=0x4000bbfcf0 sp=0x4000bbfcb0 pc=0x4b9b0
internal/poll.runtime_pollWait(0xffff64c08ce8, 0x72)
	/tmp/workdir/go/src/runtime/netpoll.go:306 +0xd0 fp=0x4000bbfd20 sp=0x4000bbfcf0 pc=0x8da60
internal/poll.(*pollDesc).wait(0x4a1470?, 0x6a05c0?, 0x1)
	/tmp/workdir/go/src/internal/poll/fd_poll_runtime.go:84 +0x28 fp=0x4000bbfd50 sp=0x4000bbfd20 pc=0xc0108
internal/poll.(*pollDesc).waitRead(...)
	/tmp/workdir/go/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0x4000981f80, {0x4000bbfea7, 0x1, 0x1})
	/tmp/workdir/go/src/internal/poll/fd_unix.go:167 +0x200 fp=0x4000bbfdf0 sp=0x4000bbfd50 pc=0xc0df0
os.(*File).read(...)
	/tmp/workdir/go/src/os/file_posix.go:31
os.(*File).Read(0x40026ae180, {0x4000bbfea7?, 0x4000bbff30?, 0x1?})
	/tmp/workdir/go/src/os/file.go:118 +0x58 fp=0x4000bbfe50 sp=0x4000bbfdf0 pc=0xc55c8
runtime_test.TestCrashDumpsAllThreads(0x4000e0fba0)
	/tmp/workdir/go/src/runtime/crash_unix_test.go:117 +0x4ac fp=0x4000bbff60 sp=0x4000bbfe50 pc=0x2c235c
testing.tRunner(0x4000e0fba0, 0x437308)
2023-02-01 21:30 linux-amd64-longtest go@cda461bb runtime.TestCrashDumpsAllThreads (log)
panic: test timed out after 50m0s
running tests:
	TestCrashDumpsAllThreads (49m41s)

runtime.gopark(0x83f3d8?, 0x7f22a44dab20?, 0x2?, 0x1b?, 0x5?)
	/workdir/go/src/runtime/proc.go:381 +0x110 fp=0xc000a17cf8 sp=0xc000a17cd8 pc=0x440f30
runtime.netpollblock(0x7f22a44dab08?, 0x72?, 0x0?)
	/workdir/go/src/runtime/netpoll.go:527 +0x117 fp=0xc000a17d30 sp=0xc000a17cf8 pc=0x4391d7
internal/poll.runtime_pollWait(0x7f22a44dab08, 0x72)
	/workdir/go/src/runtime/netpoll.go:306 +0xa5 fp=0xc000a17d50 sp=0xc000a17d30 pc=0x4778a5
internal/poll.(*pollDesc).wait(0x8a7c60?, 0xaa75c8?, 0x1)
	/workdir/go/src/internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc000a17d78 sp=0xc000a17d50 pc=0x4a8c67
internal/poll.(*pollDesc).waitRead(...)
	/workdir/go/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc0007b96e0, {0xc000a17eaf, 0x1, 0x1})
	/workdir/go/src/internal/poll/fd_unix.go:167 +0x27a fp=0xc000a17e10 sp=0xc000a17d78 pc=0x4a997a
os.(*File).read(...)
	/workdir/go/src/os/file_posix.go:31
os.(*File).Read(0xc00238e190, {0xc000a17eaf?, 0xc000985f38?, 0x1?})
	/workdir/go/src/os/file.go:118 +0x4c fp=0xc000a17e68 sp=0xc000a17e10 pc=0x4ae5cc
runtime_test.TestCrashDumpsAllThreads(0xc000b17a00)
	/workdir/go/src/runtime/crash_unix_test.go:117 +0x579 fp=0xc000a17f70 sp=0xc000a17e68 pc=0x6d0ef9
testing.tRunner(0xc000b17a00, 0x8404c8)

watchflakes

@dmitshur dmitshur modified the milestones: Go1.20, Backlog Feb 2, 2023
@prattmic
Copy link
Member

cc @aclements

These timeouts blocked release branch CLs as flakes; we should get them fixed or skipped.

@aclements
Copy link
Member

aclements commented Feb 15, 2023

This reproduces pretty quickly on my linux/amd64 laptop with:

cd runtime
go test -c
GOFLAGS="-gcflags=runtime=-d=maymorestack=runtime.mayMoreStackPreempt -gcflags=runtime/testdata/...=" stress2 ./runtime.test -test.run=CrashDumpsAllThreads -test.timeout=30m

Of course, my usual first step would be to send it a SIGQUIT and see what's happening, but that's exactly what fails in this test. 😛 I haven't debugged yet. gdb may hold answers.

For reference, I initially tried running the testprog directly with the below script before I realized there are various pipes and signals involved in running this test. We might still need to reproduce those, but I'm hoping we can debug this through the test driver.

cd runtime/testdata/testprog
GOFLAGS="-gcflags=runtime=-d=maymorestack=runtime.mayMoreStackPreempt -gcflags=runtime/testdata/...=" go build
# Expects to write to FD 3 and then get a SIGQUIT
GOTRACEBACK=crash GOGC=off GODEBUG=asyncpreemptoff=1 ./testprog CrashDumpsAllThreads

@aclements
Copy link
Member

All of the Ms are just parked in mPark in normal places in the scheduler, except one thread that's in the infinite loop in crashDumpsAllThreadsLoop. /proc/PID/status and /proc/PID/task/*/status says there are no signals pending, blocked, or ignored by the process or any of its threads. It's a little suspicious that the blocked mask is 0, and in another Go process I have running it's fffffffc3bfa3a00.

@aclements
Copy link
Member

Ah hah, it's getting stuck before the SIGQUIT is involved. It's actually getting stuck in this loop, before the subprocess even indicates to the parent that it's ready for the SIGQUIT.

@aclements
Copy link
Member

Amusingly, this means you can send the subprocess a SIGQUIT. I've pasted the interesting part of the traceback below. All other goroutines and threads are uninteresting (this is running with GOTRACEBACK=crash so we see all threads). The test sets GOMAXPROCS to 5 and starts 4 infinite loops with async preemption disabled, and this looks to me like we're failing to steal goroutine 1 onto another P.

SIGQUIT: quit
PC=0x4b615d m=0 sigcode=0

goroutine 33 [running]:
main.crashDumpsAllThreadsLoop(0x0?, 0x0?)
	/home/austin/go.dev/src/runtime/testdata/testprog/crashdump.go:56 +0x1d fp=0xc0001127c0 sp=0xc0001127a8 pc=0x4b615d
main.CrashDumpsAllThreads.func1()
	/home/austin/go.dev/src/runtime/testdata/testprog/crashdump.go:29 +0x25 fp=0xc0001127e0 sp=0xc0001127c0 pc=0x4b6105
runtime.goexit()
	/home/austin/go.dev/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc0001127e8 sp=0xc0001127e0 pc=0x467441
created by main.CrashDumpsAllThreads
	/home/austin/go.dev/src/runtime/testdata/testprog/crashdump.go:29 +0x70

goroutine 1 [runnable]:
runtime.makechan(0x4ca180?, 0x0?)
	/home/austin/go.dev/src/runtime/chan.go:72 +0x1df fp=0xc0000bce58 sp=0xc0000bce50 pc=0x4050bf
main.CrashDumpsAllThreads()
	/home/austin/go.dev/src/runtime/testdata/testprog/crashdump.go:28 +0xe9 fp=0xc0000bcee8 sp=0xc0000bce58 pc=0x4b5f49
main.main()
	/home/austin/go.dev/src/runtime/testdata/testprog/main.go:34 +0x133 fp=0xc0000bcf80 sp=0xc0000bcee8 pc=0x4baeb3
runtime.main()
	/home/austin/go.dev/src/runtime/proc.go:250 +0x207 fp=0xc0000bcfe0 sp=0xc0000bcf80 pc=0x439487
runtime.goexit()
	/home/austin/go.dev/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc0000bcfe8 sp=0xc0000bcfe0 pc=0x467441

@aclements
Copy link
Member

Faster repro. Apply the following diff

--- a/src/runtime/testdata/testprog/crashdump.go
+++ b/src/runtime/testdata/testprog/crashdump.go
@@ -29,6 +29,8 @@ func CrashDumpsAllThreads() {
                <-c
        }
 
+       return
+
        // Tell our parent that all the goroutines are executing.
        if _, err := os.NewFile(3, "pipe").WriteString("x"); err != nil {
                fmt.Fprintf(os.Stderr, "write to pipe failed: %v\n", err)

Then

cd runtime/testdata/testprog
GOFLAGS="-gcflags=runtime=-d=maymorestack=runtime.mayMoreStackPreempt -gcflags=runtime/testdata/...=" go build
GOTRACEBACK=crash GOGC=off GODEBUG=asyncpreemptoff=1 stress2 ./testprog CrashDumpsAllThreads

@aclements
Copy link
Member

Here's a version that works without a patch:

cd runtime/testdata/testprog
GOFLAGS="-gcflags=runtime=-d=maymorestack=runtime.mayMoreStackPreempt -gcflags=runtime/testdata/...=" go build
GOTRACEBACK=crash GOGC=off GODEBUG=asyncpreemptoff=1 stress2 -timeouts-fail -max-fails=1 -timeout=10s -pass="write to pipe" -max-runs=50000 ./testprog CrashDumpsAllThreads

@aclements
Copy link
Member

This isn't new, either. I reproduced in Go 1.18, which was the first release to have mayMoreStack.

@gopherbot
Copy link

Found new dashboard test flakes for:

    #!watchflakes
    post <- section == "maymorestack=mayMoreStackPreempt"
2023-02-23 06:07 linux-arm64-longtest go@71c02bed runtime.TestCrashDumpsAllThreads (log)
panic: test timed out after 50m0s
running tests:
	TestCrashDumpsAllThreads (49m40s)

runtime.gopark(0x819328?, 0xffff943a0f20?, 0x2?, 0x1b?, 0x5?)
	/tmp/workdir/go/src/runtime/proc.go:381 +0x120 fp=0x40002a0cb0 sp=0x40002a0c90 pc=0x442f50
runtime.netpollblock(0xffff943a0f08?, 0x72?, 0x0?)
	/tmp/workdir/go/src/runtime/netpoll.go:527 +0x190 fp=0x40002a0cf0 sp=0x40002a0cb0 pc=0x43a960
internal/poll.runtime_pollWait(0xffff943a0f08, 0x72)
	/tmp/workdir/go/src/runtime/netpoll.go:306 +0xd0 fp=0x40002a0d20 sp=0x40002a0cf0 pc=0x47c900
internal/poll.(*pollDesc).wait(0x8830a0?, 0xa997b8?, 0x1)
	/tmp/workdir/go/src/internal/poll/fd_poll_runtime.go:84 +0x28 fp=0x40002a0d50 sp=0x40002a0d20 pc=0x4af8a8
internal/poll.(*pollDesc).waitRead(...)
	/tmp/workdir/go/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0x40007ab320, {0x40002a0ea7, 0x1, 0x1})
	/tmp/workdir/go/src/internal/poll/fd_unix.go:167 +0x200 fp=0x40002a0df0 sp=0x40002a0d50 pc=0x4b0590
os.(*File).read(...)
	/tmp/workdir/go/src/os/file_posix.go:31
os.(*File).Read(0x400141e238, {0x40002a0ea7?, 0x40002a0f30?, 0x1?})
	/tmp/workdir/go/src/os/file.go:118 +0x58 fp=0x40002a0e50 sp=0x40002a0df0 pc=0x4b4f28
runtime_test.TestCrashDumpsAllThreads(0x400016e340)
	/tmp/workdir/go/src/runtime/crash_unix_test.go:117 +0x4ac fp=0x40002a0f60 sp=0x40002a0e50 pc=0x6afbbc
testing.tRunner(0x400016e340, 0x81a410)

watchflakes

@prattmic
Copy link
Member

prattmic commented Jun 8, 2023

There is actually a second, similar issue: gosched (runtime.Gosched as well as morestack and async preemption) don't call wakep. They are submitting runnable work to the scheduler, so they must follow the same protocols for synchronizing with spinning Ms as any other source of work.

@gopherbot
Copy link

Change https://go.dev/cl/501976 mentions this issue: runtime: call wakep in gosched

@gopherbot
Copy link

Change https://go.dev/cl/501975 mentions this issue: runtime: check global runq during "delicate dance"

@gopherbot
Copy link

Found new dashboard test flakes for:

    #!watchflakes
    post <- section == "maymorestack=mayMoreStackPreempt"
2023-06-16 17:08 linux-arm64-longtest go@2b0ff4b6 runtime.TestCrashDumpsAllThreads (log)
--- FAIL: TestCrashDumpsAllThreads (2681.12s)
    exec.go:190: test timed out while running command: /tmp/workdir/tmp/go-build2653187426/testprog.exe CrashDumpsAllThreads
    crash_unix_test.go:127: reading from pipe: EOF
    exec.go:207: command was started, but test did not wait for it to complete: /tmp/workdir/tmp/go-build2653187426/testprog.exe CrashDumpsAllThreads

watchflakes

@mknyszek
Copy link
Contributor

mknyszek commented Jun 30, 2023

Although this bug was causing failures, I think @prattmic decided to wait until Go 1.22 to land the fix? The changes aren't that complex, but the modified parts of the codebase are subtle. The primary failure modes are mitigated (I think? Why did we see a failure on June 16th if the test was skipped with mayMoreStack on June 6th? Oh. Is that a release branch?). We can also choose to backport it later after the fix gets a chance to soak.

@mknyszek mknyszek modified the milestones: Backlog, Go1.22 Jun 30, 2023
@prattmic
Copy link
Member

Correct, the sweet results from my CLs are:

                     │ baseline.results │          cl501975.results          │          cl501976.results          │
                     │      sec/op      │   sec/op     vs base               │   sec/op     vs base               │
BiogoIgor                    13.89 ± 0%    13.90 ± 1%       ~ (p=0.523 n=30)    13.92 ± 0%       ~ (p=0.192 n=30)
BiogoKrishna                 11.99 ± 1%    11.95 ± 0%       ~ (p=0.197 n=30)    11.92 ± 1%       ~ (p=0.328 n=30)
BleveIndexBatch100           4.744 ± 1%    4.744 ± 1%       ~ (p=0.775 n=30)    4.755 ± 1%       ~ (p=0.592 n=30)
EtcdPut                     32.99m ± 1%   32.76m ± 1%       ~ (p=0.552 n=30)   33.07m ± 1%       ~ (p=0.843 n=30)
EtcdSTM                     146.7m ± 0%   147.1m ± 0%       ~ (p=0.382 n=30)   147.1m ± 0%       ~ (p=0.254 n=30)
GoBuildKubelet               51.63 ± 0%    51.72 ± 0%  +0.18% (p=0.036 n=30)    51.86 ± 0%  +0.45% (p=0.000 n=30)
GoBuildKubeletLink           7.316 ± 1%    7.317 ± 0%       ~ (p=0.959 n=30)    7.317 ± 0%       ~ (p=0.971 n=30)
GoBuildIstioctl              44.36 ± 1%    44.57 ± 0%  +0.46% (p=0.013 n=30)    44.60 ± 0%  +0.54% (p=0.007 n=30)
GoBuildIstioctlLink          7.789 ± 0%    7.798 ± 0%       ~ (p=0.286 n=30)    7.799 ± 0%       ~ (p=0.562 n=30)
GoBuildFrontend              16.04 ± 0%    16.08 ± 0%  +0.26% (p=0.021 n=30)    16.10 ± 0%  +0.41% (p=0.001 n=30)
GoBuildFrontendLink          1.282 ± 1%    1.272 ± 1%       ~ (p=0.177 n=30)    1.274 ± 1%       ~ (p=0.358 n=30)
GopherLuaKNucleotide         21.87 ± 1%    21.80 ± 1%       ~ (p=0.485 n=30)    21.49 ± 1%  -1.77% (p=0.000 n=30)
MarkdownRenderXHTML         259.5m ± 2%   259.4m ± 1%       ~ (p=0.552 n=30)   263.4m ± 3%       ~ (p=0.843 n=30)
Tile38QueryLoad             511.6µ ± 0%   511.9µ ± 0%       ~ (p=0.562 n=30)   514.5µ ± 0%  +0.55% (p=0.000 n=30)
geomean                      2.087         2.086       -0.06%                   2.090       +0.11%

The effect here is pretty small, but has the potential to be bigger for edge-case applications. I'd be more comfortable giving this time to soak during 1.22.

This can cause complete stuckness when GOMAXPROCS=1 and GODEBUG=asyncpreemptoff=1. Avoiding either of those should avoid hangs (though the scheduler may still underutilize GOMAXPROCS) [1].

[1] Technically if async preemption is disabled, then it takes GOMAXPROCS goroutines spinning in an infinite loop to cause stuckness. So GOMAXPROCS=2 could still hang if there are 2 spinning goroutines.

@prattmic
Copy link
Member

https://go.dev/cl/501229 should be skipping this test entirely. I agree it is odd that we got another failure on June 16 (it isn't a release branch). Maybe that GCFLAGS check isn't correct?

@prattmic prattmic added NeedsFix The path to resolution is known, but the work has not been done. early-in-cycle A change that should be done early in the 3 month dev cycle. and removed NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Jun 30, 2023
@gopherbot
Copy link

Change https://go.dev/cl/507359 mentions this issue: runtime: check GOFLAGS not GCFLAGS

gopherbot pushed a commit that referenced this issue Jun 30, 2023
GCFLAGS doesn't have any defined meaning. cmd/dist enables
mayMoreStackPreempt with GOFLAGS.

For #55160.

Change-Id: I7ac71e4a1a983a56bd228ab5d24294db5cc595f7
Reviewed-on: https://go-review.googlesource.com/c/go/+/507359
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
bradfitz pushed a commit to tailscale/go that referenced this issue Jul 15, 2023
GCFLAGS doesn't have any defined meaning. cmd/dist enables
mayMoreStackPreempt with GOFLAGS.

For golang#55160.

Change-Id: I7ac71e4a1a983a56bd228ab5d24294db5cc595f7
Reviewed-on: https://go-review.googlesource.com/c/go/+/507359
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
@gopherbot
Copy link

This issue is currently labeled as early-in-cycle for Go 1.22.
That time is now, so a friendly reminder to look at it again.

gopherbot pushed a commit that referenced this issue Jul 20, 2023
When a thread transitions to spinning to non-spinning it must recheck
all sources of work because other threads may submit new work but skip
wakep because they see a spinning thread.

However, since the beginning of time (CL 7314062) we do not check the
global run queue, only the local per-P run queues.

The global run queue is checked just above the spinning checks while
dropping the P. I am unsure what the purpose of this check is. It
appears to simply be opportunistic since sched.lock is already held
there in order to drop the P. It is not sufficient to synchronize with
threads adding work because it occurs before decrementing
sched.nmspinning, which is what threads us to decide to wake a thread.

Resolve this by adding an explicit global run queue check alongside the
local per-P run queue checks.

Almost nothing happens between dropped sched.lock after dropping the P
and relocking sched.lock: just clearing mp.spinning and decrementing
sched.nmspinning. Thus it may be better to just hold sched.lock for this
entire period, but this is a larger change that I would prefer to avoid
in the freeze and backports.

For #55160.

Change-Id: Ifd88b5a4c561c063cedcfcfe1dd8ae04202d9666
Reviewed-on: https://go-review.googlesource.com/c/go/+/501975
Run-TryBot: Michael Pratt <mpratt@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
@gopherbot gopherbot reopened this Nov 19, 2023
@gopherbot
Copy link

Found new dashboard test flakes for:

    #!watchflakes
    post <- section == "maymorestack=mayMoreStackPreempt"
2023-11-19 02:15 linux-arm64-longtest go@aa9dd500 runtime.TestRuntimeLockMetricsAndProfile (log)
--- FAIL: TestRuntimeLockMetricsAndProfile (111.03s)
    --- FAIL: TestRuntimeLockMetricsAndProfile/runtime.lock (0.05s)
        --- FAIL: TestRuntimeLockMetricsAndProfile/runtime.lock/sample-1 (0.02s)
            metrics_test.go:1048: lock contention growth in runtime/pprof's view  (0.007161s)
            metrics_test.go:1049: lock contention growth in runtime/metrics' view (0.007391s)
            metrics_test.go:1077: stack [runtime.unlock runtime_test.TestRuntimeLockMetricsAndProfile.func5.1 runtime_test.(*contentionWorker).run] has samples totaling n=99 value=6615799
            metrics_test.go:1156: mutex profile reported contention count different from the known true count (99 != 100)

watchflakes

@gopherbot
Copy link

Found new dashboard test flakes for:

    #!watchflakes
    post <- section == "maymorestack=mayMoreStackPreempt"
2023-11-21 22:06 linux-amd64-longtest go@5a6f1b35 runtime.TestRuntimeLockMetricsAndProfile (log)
--- FAIL: TestRuntimeLockMetricsAndProfile (0.07s)
    metrics_test.go:967: NumCPU 16
    metrics_test.go:968: GOMAXPROCS 16
    --- FAIL: TestRuntimeLockMetricsAndProfile/runtime.semrelease (0.02s)
        metrics_test.go:1056: lock contention growth in runtime/pprof's view  (0.000402s)
        metrics_test.go:1057: lock contention growth in runtime/metrics' view (0.000147s)
        metrics_test.go:1085: stack [runtime.unlock runtime.semrelease1 runtime_test.TestRuntimeLockMetricsAndProfile.func6.1 runtime_test.(*contentionWorker).run] has samples totaling n=0 value=0
        metrics_test.go:1087: profile:
            PeriodType: contentions count
            Period: 1
...
        metrics_test.go:1089: have stack [runtime.unlock runtime.findRunnable runtime.schedule runtime.goschedImpl runtime.gopreempt_m runtime.newstack runtime.morestack]
        metrics_test.go:1089: have stack [runtime.unlock runtime.wakep runtime.goschedImpl runtime.gopreempt_m runtime.newstack runtime.morestack]
        metrics_test.go:1089: have stack [runtime.unlock runtime.findRunnable runtime.schedule runtime.goschedImpl runtime.gopreempt_m runtime.newstack runtime.morestack]
        metrics_test.go:1089: have stack [runtime.unlock runtime.wakep runtime.resetspinning runtime.schedule runtime.goschedImpl runtime.gopreempt_m runtime.newstack runtime.morestack]
        metrics_test.go:1089: have stack [runtime.unlock runtime.wakep runtime.ready runtime.readyWithTime.goready.func1 runtime.systemstack runtime.goready runtime.readyWithTime runtime.semrelease1 sync.runtime_Semrelease sync.(*WaitGroup).Add sync.(*WaitGroup).Done runtime_test.TestRuntimeLockMetricsAndProfile.func4.1.1.2 runtime_test.(*contentionWorker).run]
        metrics_test.go:1089: have stack [runtime.unlock runtime.findRunnable runtime.schedule runtime.goexit0 runtime.mcall]
        metrics_test.go:1089: have stack [runtime.unlock runtime.findRunnable runtime.schedule runtime.goschedImpl runtime.gopreempt_m runtime.newstack runtime.morestack]
        metrics_test.go:1089: have stack [runtime.unlock runtime.findRunnable runtime.schedule runtime.goexit0 runtime.mcall]
        metrics_test.go:1089: have stack [runtime._LostContendedLock]
        metrics_test.go:1091: want stack [runtime.unlock runtime.semrelease1 runtime_test.TestRuntimeLockMetricsAndProfile.func6.1 runtime_test.(*contentionWorker).run]

watchflakes

@mknyszek mknyszek assigned mknyszek and unassigned prattmic Nov 29, 2023
@mknyszek
Copy link
Contributor

Both failures are #64253.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. early-in-cycle A change that should be done early in the 3 month dev cycle. NeedsFix The path to resolution is known, but the work has not been done.
Projects
Status: Done
Development

No branches or pull requests

9 participants