Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: TestSegv failures with schedule: holding locks #65416

Open
gopherbot opened this issue Jan 31, 2024 · 15 comments
Open

runtime: TestSegv failures with schedule: holding locks #65416

gopherbot opened this issue Jan 31, 2024 · 15 comments
Assignees
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@gopherbot
Copy link

gopherbot commented Jan 31, 2024

#!watchflakes
post <- pkg == "runtime" && test ~ `TestSegv` && `schedule: holding locks`

Issue created automatically to collect these failures.

Example (log):

--- FAIL: TestSegv (0.00s)
    --- FAIL: TestSegv/SegvInCgo (0.01s)
        crash_cgo_test.go:656: /tmp/workdir/tmp/go-build2656727872/testprogcgo.exe SegvInCgo: exit status 2
        crash_cgo_test.go:657: fatal error: schedule: holding locks
            SIGSEGV: segmentation violation
            PC=0x3e4d5f m=4 sigcode=65537 addr=0x0

            goroutine 1 gp=0xc0000061c0 m=4 mp=0xc000088008 [syscall]:
            syscall.Syscall(0x25, 0x2af3, 0xb, 0x0)
            	/tmp/workdir/go/src/syscall/asm_unix_amd64.s:19 +0x9 fp=0xc000065e20 sp=0xc000065e10 pc=0x3e4d49
...
            r13    0x1
            r14    0xc0000061c0
            r15    0x3
            rip    0x3e4d5f
            rflags 0x206
            cs     0x43
            fs     0x13
            gs     0x1b

        crash_cgo_test.go:688: unexpectedly saw "fatal error: " in output

watchflakes

@gopherbot gopherbot added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Jan 31, 2024
@gopherbot
Copy link
Author

Found new dashboard test flakes for:

#!watchflakes
default <- pkg == "runtime" && test == "TestSegv"
2024-01-31 22:40 freebsd-amd64-12_3 go@e069587c runtime.TestSegv (log)
--- FAIL: TestSegv (0.00s)
    --- FAIL: TestSegv/SegvInCgo (0.01s)
        crash_cgo_test.go:656: /tmp/workdir/tmp/go-build2656727872/testprogcgo.exe SegvInCgo: exit status 2
        crash_cgo_test.go:657: fatal error: schedule: holding locks
            SIGSEGV: segmentation violation
            PC=0x3e4d5f m=4 sigcode=65537 addr=0x0

            goroutine 1 gp=0xc0000061c0 m=4 mp=0xc000088008 [syscall]:
            syscall.Syscall(0x25, 0x2af3, 0xb, 0x0)
            	/tmp/workdir/go/src/syscall/asm_unix_amd64.s:19 +0x9 fp=0xc000065e20 sp=0xc000065e10 pc=0x3e4d49
...
            r13    0x1
            r14    0xc0000061c0
            r15    0x3
            rip    0x3e4d5f
            rflags 0x206
            cs     0x43
            fs     0x13
            gs     0x1b

        crash_cgo_test.go:688: unexpectedly saw "fatal error: " in output

watchflakes

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jan 31, 2024
@gopherbot
Copy link
Author

Found new dashboard test flakes for:

#!watchflakes
default <- pkg == "runtime" && test == "TestSegv"
2024-01-31 22:40 linux-ppc64le-power9osu go@e069587c runtime.TestSegv (log)
--- FAIL: TestSegv (0.00s)
    --- FAIL: TestSegv/TgkillSegvInCgo (0.03s)
        crash_cgo_test.go:656: /workdir/tmp/go-build3922838053/testprogcgo.exe TgkillSegvInCgo: exit status 2
        crash_cgo_test.go:657: fatal error: schedule: holding locks
            SIGSEGV: segmentation violation
            PC=0x100071b4 m=3 sigcode=4294967290 addr=0x7fae74000010

            goroutine 1 gp=0xc0000021c0 m=3 mp=0xc000043008 [running]:
            runtime/internal/syscall.Syscall6()
            	/workdir/go/src/runtime/internal/syscall/asm_linux_ppc64x.s:18 +0x24 fp=0xc000051d40 sp=0xc000051d40 pc=0x100071b4
...
            r22  0xc0000513a0	r23  0x1
            r24  0x0	r25  0x7fae9347f8e0
            r26  0x7fae9347ea70	r27  0x7fffc86765c0
            r28  0x7fffc867630f	r29  0x7fae9347f170
            r30  0xc0000021c0	r31  0x100a8a28
            pc   0x100071b4	ctr  0x0
            link 0x1000717c	xer  0x0
            ccr  0x24424880	trap 0xc00

        crash_cgo_test.go:688: unexpectedly saw "fatal error: " in output

watchflakes

@kolyshkin
Copy link
Contributor

Seeing this in 10+ different builds. The oldest one failing seems to be https://ci.chromium.org/ui/b/8757370110283072257

@kolyshkin
Copy link
Contributor

Possible cause is https://go-review.googlesource.com/c/go/+/546135

@kolyshkin
Copy link
Contributor

Cc @cherrymui

@gopherbot
Copy link
Author

Found new dashboard test flakes for:

#!watchflakes
default <- pkg == "runtime" && test == "TestSegv"
2024-01-31 20:51 freebsd-arm64-dmgk go@5c0d0929 runtime.TestSegv (log)
--- FAIL: TestSegv (0.00s)
    --- FAIL: TestSegv/SegvInCgo (0.05s)
        crash_cgo_test.go:656: /tmp/workdir-host-freebsd-arm64-dmgk/tmp/go-build2585557007/testprogcgo.exe SegvInCgo: exit status 2
        crash_cgo_test.go:657: fatal error: schedule: holding locks
            SIGSEGV: segmentation violation
            PC=0x495ea4 m=5 sigcode=65537 addr=0x0

            goroutine 1 gp=0x40000021c0 m=5 mp=0x4000080008 [syscall]:
            syscall.Syscall(0x25, 0xee41, 0xb, 0x0)
            	/tmp/workdir-host-freebsd-arm64-dmgk/go/src/syscall/asm_freebsd_arm64.s:15 +0x10 fp=0x400004ce00 sp=0x400004cdf0 pc=0x495e90
...
            r26     0x0
            r27     0x1
            r28     0x40000021c0
            r29     0x400004cde8
            lr      0x495e90
            sp      0x400004cdf0
            pc      0x495ea4
            fault   0x0

        crash_cgo_test.go:688: unexpectedly saw "fatal error: " in output

watchflakes

@gopherbot
Copy link
Author

Found new dashboard test flakes for:

#!watchflakes
default <- pkg == "runtime" && test == "TestSegv"
2024-01-31 22:40 linux-mips64le-rtrk go@e069587c runtime.TestSegv (log)
--- FAIL: TestSegv (0.00s)
    --- FAIL: TestSegv/TgkillSegvInCgo (0.06s)
        crash_cgo_test.go:656: /tmp/gobuilder-mips64le/tmp/go-build3499950964/testprogcgo.exe TgkillSegvInCgo: exit status 2
        crash_cgo_test.go:657: fatal error: schedule: holding locks
            SIGSEGV: segmentation violation
            PC=0x12000675c m=3 sigcode=4294967290 addr=0x2014

            goroutine 1 gp=0xc0000021c0 m=3 mp=0xc000041008 [running]:
            runtime/internal/syscall.Syscall6(0x1469, 0x2014, 0x2019, 0xb, 0x0, 0x0, 0x0)
            	/tmp/gobuilder-mips64le/go/src/runtime/internal/syscall/asm_linux_mips64x.s:20 +0x24 fp=0xc00004fd90 sp=0xc00004fd90 pc=0x12000675c
...
            r20  0x7bd09b91bb9b8a2a	r21  0x17624176b18554d8
            r22  0x0	r23  0x120360000
            r24  0x8	r25  0xfffc8cd310
            r26  0x0	r27  0x0
            r28  0x100000000	r29  0xc00004fd90
            r30  0xc0000021c0	r31  0x12000670c
            pc   0x12000675c	link 0x12000670c
            lo   0x120	hi   0x0

        crash_cgo_test.go:688: unexpectedly saw "fatal error: " in output

watchflakes

@bcmills
Copy link
Member

bcmills commented Feb 1, 2024

@bcmills
Copy link
Member

bcmills commented Feb 1, 2024

More spurious LUCI failures:
http://ci.chromium.org/b/8757282631689012161/test-results?sortby=&groupby=

@golang/runtime, is there something we can revert quickly to get the TryBots back into a stable state?

@bcmills bcmills added the Soon This needs to be done soon. (regressions, serious bugs, outages) label Feb 1, 2024
@gopherbot
Copy link
Author

Found new dashboard test flakes for:

#!watchflakes
default <- pkg == "runtime" && test == "TestSegv"
2024-01-31 22:40 linux-ppc64le-power10osu go@e069587c runtime.TestSegv (log)
--- FAIL: TestSegv (0.00s)
    --- FAIL: TestSegv/TgkillSegvInCgo (0.02s)
        crash_cgo_test.go:656: /workdir/tmp/go-build2592650088/testprogcgo.exe TgkillSegvInCgo: exit status 2
        crash_cgo_test.go:657: fatal error: schedule: holding locks
            SIGSEGV: segmentation violation
            PC=0x100049b4 m=4 sigcode=4294967290 addr=0x7cc9693df530

            goroutine 1 gp=0xc0000021c0 m=4 mp=0xc000043808 [running]:
            runtime/internal/syscall.Syscall6()
            	/workdir/go/src/runtime/internal/syscall/asm_linux_ppc64x.s:18 +0x24 fp=0xc00004cd40 sp=0xc00004cd40 pc=0x100049b4
...
            r22  0xc00004c3a0	r23  0x7cc96acce190
            r24  0x7d1d582bd	r25  0xa1fb63d07
            r26  0x6f8171604	r27  0x7d066e1d
            r28  0xc579191f	r29  0x173b55df
            r30  0xc0000021c0	r31  0x100a82b8
            pc   0x100049b4	ctr  0x0
            link 0x1000497c	xer  0x0
            ccr  0x24424488	trap 0xc00

        crash_cgo_test.go:688: unexpectedly saw "fatal error: " in output
2024-01-31 23:21 linux-ppc64le-power10osu go@2f6a25f4 runtime.TestSegv (log)
--- FAIL: TestSegv (0.00s)
    --- FAIL: TestSegv/SegvInCgo (0.03s)
        crash_cgo_test.go:656: /workdir/tmp/go-build315738768/testprogcgo.exe SegvInCgo: exit status 2
        crash_cgo_test.go:657: fatal error: schedule: holding locks
            SIGSEGV: segmentation violation
            PC=0x100821b8 m=7 sigcode=0 addr=0x7e8678000010

            goroutine 0 gp=0xc0001021c0 m=7 mp=0xc000100008 [idle]:
            runtime.rtsigprocmask(0x2, 0x7e86819fe2e0, 0x0, 0x8)
            	/workdir/go/src/runtime/sys_linux_ppc64x.s:404 +0x18 fp=0x7e86819fe2a0 sp=0x7e86819fe2a0 pc=0x100821b8
...
            r22  0x7e86811feb38	r23  0x80e8e0
            r24  0x4dd50db0	r25  0x10145ad0
            r26  0x7e86832edd40	r27  0x0
            r28  0x7e86819ff4bc	r29  0x0
            r30  0xc0001021c0	r31  0x1005e948
            pc   0x100821b8	ctr  0x0
            link 0x1005ebac	xer  0x0
            ccr  0x22424422	trap 0xc00

        crash_cgo_test.go:688: unexpectedly saw "fatal error: " in output

watchflakes

@bcmills bcmills changed the title runtime: TestSegv failures runtime: TestSegv failures with schedule: holding locks Feb 1, 2024
@prattmic
Copy link
Member

prattmic commented Feb 1, 2024

This definitely looks like a recent regression. I'm taking a look for a potential cause.

@prattmic prattmic self-assigned this Feb 1, 2024
@prattmic
Copy link
Member

prattmic commented Feb 1, 2024

This is reproducible with go test -run=TestSegv/TgkillSegvInCgo -count=10000 runtime (usually far less than 10k, but I wanted to be sure for the bisect).

Bisect blames https://go.dev/cl/546135.

I think the problem might be that we allow async preemption with locks held ((!canPreemptM(mp) && !noResume)), but asyncPreempt calls into the scheduler.

cc @cherrymui

@gopherbot
Copy link
Author

Change https://go.dev/cl/560455 mentions this issue: Revert "runtime: preempt more aggressively when panicking"

gopherbot pushed a commit that referenced this issue Feb 1, 2024
This reverts CL 546135.

Reason for revert: Causes occasional throw during panic

For #65416.

Change-Id: I78c15637da18f85ede785363b777aa7d1dead3c3
Reviewed-on: https://go-review.googlesource.com/c/go/+/560455
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
@gopherbot
Copy link
Author

Found new dashboard test flakes for:

#!watchflakes
post <- pkg == "runtime" && test ~ `TestSegv` && `schedule: holding locks`
2024-01-31 20:26 gotip-linux-amd64-boringcrypto go@a5fb6560 runtime.TestSegv/TgkillSegvInCgo (log)
=== RUN   TestSegv/TgkillSegvInCgo
=== PAUSE TestSegv/TgkillSegvInCgo
=== CONT  TestSegv/TgkillSegvInCgo
    crash_cgo_test.go:656: /home/swarming/.swarming/w/ir/x/t/go-build2174216024/testprogcgo.exe TgkillSegvInCgo: exit status 2
    crash_cgo_test.go:657: fatal error: schedule: holding locks
        SIGSEGV: segmentation violation
        PC=0x404f4e m=4 sigcode=18446744073709551610 addr=0x3e800007fb5
        
        goroutine 1 gp=0xc0000061c0 m=4 mp=0xc00006b808 [running]:
        runtime/internal/syscall.Syscall6()
...
        r14    0xc0000061c0
        r15    0x1
        rip    0x404f4e
        rflags 0x212
        cs     0x33
        fs     0x0
        gs     0x0
        
    crash_cgo_test.go:688: unexpectedly saw "fatal error: " in output
--- FAIL: TestSegv/TgkillSegvInCgo (0.05s)
2024-01-31 20:51 gotip-linux-amd64 go@5c0d0929 runtime.TestSegv/TgkillSegvInCgo (log)
=== RUN   TestSegv/TgkillSegvInCgo
=== PAUSE TestSegv/TgkillSegvInCgo
=== CONT  TestSegv/TgkillSegvInCgo
    crash_cgo_test.go:656: /home/swarming/.swarming/w/ir/x/t/go-build3106397352/testprogcgo.exe TgkillSegvInCgo: exit status 2
    crash_cgo_test.go:657: fatal error: schedule: holding locks
        SIGSEGV: segmentation violation
        PC=0x404f4e m=4 sigcode=18446744073709551610 addr=0xe31a
        
        goroutine 1 gp=0xc0000061c0 m=4 mp=0xc00006b808 [running]:
        runtime/internal/syscall.Syscall6()
...
        r14    0xc0000061c0
        r15    0xd
        rip    0x404f4e
        rflags 0x212
        cs     0x33
        fs     0x0
        gs     0x0
        
    crash_cgo_test.go:688: unexpectedly saw "fatal error: " in output
--- FAIL: TestSegv/TgkillSegvInCgo (0.08s)
2024-01-31 20:51 gotip-linux-arm64 go@5c0d0929 runtime.TestSegv/TgkillSegvInCgo (log)
=== RUN   TestSegv/TgkillSegvInCgo
=== PAUSE TestSegv/TgkillSegvInCgo
=== CONT  TestSegv/TgkillSegvInCgo
    crash_cgo_test.go:656: /home/swarming/.swarming/w/ir/x/t/go-build2401246298/testprogcgo.exe TgkillSegvInCgo: exit status 2
    crash_cgo_test.go:657: fatal error: schedule: holding locks
        SIGSEGV: segmentation violation
        PC=0x403ec0 m=5 sigcode=18446744073709551610 addr=0x0
        
        goroutine 1 gp=0x40000021c0 m=5 mp=0x4000088008 [running]:
        runtime/internal/syscall.Syscall6(0x83, 0x6b391, 0x6b3a8, 0xb, 0x0, 0x0, 0x0)
...
        r27     0x703000
        r28     0x40000021c0
        r29     0x4000062d68
        lr      0x403e7c
        sp      0x4000062d70
        pc      0x403ec0
        fault   0x0
        
    crash_cgo_test.go:688: unexpectedly saw "fatal error: " in output
--- FAIL: TestSegv/TgkillSegvInCgo (0.04s)
2024-01-31 22:40 gotip-linux-amd64-longtest-race go@e069587c runtime.TestSegv/TgkillSegvInCgo (log)
=== RUN   TestSegv/TgkillSegvInCgo
=== PAUSE TestSegv/TgkillSegvInCgo
=== CONT  TestSegv/TgkillSegvInCgo
    crash_cgo_test.go:656: /home/swarming/.swarming/w/ir/x/t/go-build4182855096/testprogcgo.exe TgkillSegvInCgo: exit status 2
    crash_cgo_test.go:657: fatal error: schedule: holding locks
        SIGSEGV: segmentation violation
        PC=0x404f4e m=3 sigcode=18446744073709551610 addr=0x3e800037666
        
        goroutine 1 gp=0xc0000061c0 m=3 mp=0xc00006b008 [running]:
        runtime/internal/syscall.Syscall6()
...
        r14    0xc0000061c0
        r15    0x3
        rip    0x404f4e
        rflags 0x212
        cs     0x33
        fs     0x0
        gs     0x0
        
    crash_cgo_test.go:688: unexpectedly saw "fatal error: " in output
--- FAIL: TestSegv/TgkillSegvInCgo (0.02s)
2024-01-31 22:40 gotip-linux-amd64-staticlockranking go@e069587c runtime.TestSegv/TgkillSegvInCgo (log)
=== RUN   TestSegv/TgkillSegvInCgo
=== PAUSE TestSegv/TgkillSegvInCgo
=== CONT  TestSegv/TgkillSegvInCgo
    crash_cgo_test.go:656: /home/swarming/.swarming/w/ir/x/t/go-build2594685541/testprogcgo.exe TgkillSegvInCgo: exit status 2
    crash_cgo_test.go:657: fatal error: schedule: holding locks
        SIGSEGV: segmentation violation
        PC=0x404f4e m=5 sigcode=18446744073709551610 addr=0x3e80000537e
        
        goroutine 1 gp=0xc0000061c0 m=5 mp=0xc000100008 [running]:
        runtime/internal/syscall.Syscall6()
...
        r14    0xc0000061c0
        r15    0x9
        rip    0x404f4e
        rflags 0x212
        cs     0x33
        fs     0x0
        gs     0x0
        
    crash_cgo_test.go:688: unexpectedly saw "fatal error: " in output
--- FAIL: TestSegv/TgkillSegvInCgo (0.04s)

watchflakes

@cherrymui
Copy link
Member

Culprit CL reverted. Remove "Soon" label.

@cherrymui cherrymui removed the Soon This needs to be done soon. (regressions, serious bugs, outages) label Feb 6, 2024
@mknyszek mknyszek added this to the Go1.23 milestone Feb 7, 2024
@mknyszek mknyszek assigned cherrymui and unassigned prattmic Feb 7, 2024
ezz-no pushed a commit to ezz-no/go-ezzno that referenced this issue Feb 18, 2024
This reverts CL 546135.

Reason for revert: Causes occasional throw during panic

For golang#65416.

Change-Id: I78c15637da18f85ede785363b777aa7d1dead3c3
Reviewed-on: https://go-review.googlesource.com/c/go/+/560455
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
Status: No status
Development

No branches or pull requests

6 participants