
runtime: mayMoreStackPreempt tests flaky after stack frame CLs #54885

Closed
prattmic opened this issue Sep 6, 2022 · 6 comments
Labels: compiler/runtime, NeedsInvestigation, release-blocker

prattmic (Member) commented Sep 6, 2022

greplogs --dashboard -md -l -e 'SignalInVDSO'

2022-09-05T21:39:28-4c1ca42/linux-386-longtest
2022-09-05T08:28:34-bd5595d/linux-386-longtest
2022-09-05T08:08:24-af7f417/linux-386-longtest
2022-09-05T08:08:18-4ad55cd/linux-386-longtest
2022-09-05T08:07:47-357b922/linux-386-longtest
2022-09-05T07:17:56-4e7e7ae/linux-386-longtest
2022-09-05T07:14:08-3fbcf05/linux-386-longtest
2022-09-04T04:17:04-535fe2b/linux-386-longtest
2022-09-03T18:21:45-a0f0582/linux-386-longtest
2022-09-03T15:45:36-f798dc6/linux-386-longtest
2022-09-02T19:22:26-0fda8b1/linux-386-longtest
2022-09-02T19:09:03-55ca6a2/linux-386-longtest
2022-09-02T19:08:56-b91e373/linux-386-longtest
2022-09-02T19:08:53-dbf442b/linux-386-longtest

##### maymorestack=mayMoreStackPreempt
--- FAIL: TestVDSO (0.43s)
    testenv.go:468: [/workdir/tmp/go-build2185756284/testprog.exe SignalInVDSO] exit status: signal: segmentation fault (core dumped)
    crash_test.go:144: output:
        
        
        wanted:
        success
FAIL
FAIL	runtime	95.198s
FAIL

The first failure occurred at https://go.dev/cl/424514. Because the failure is flaky, that CL is not necessarily the one at fault, but one of the stack frame CLs is likely to blame.

prattmic added the NeedsInvestigation and release-blocker labels Sep 6, 2022
prattmic added this to the Go1.20 milestone Sep 6, 2022
gopherbot added the compiler/runtime label Sep 6, 2022
aclements (Member) commented Sep 6, 2022

aclements changed the title from "runtime: TestVDSO flaky with mayMoreStackPreempt on linux-386 after stack frame CLs" to "runtime: mayMoreStackPreempt tests flaky after stack frame CLs" Sep 6, 2022
aclements (Member) commented Sep 6, 2022

Quick reproducer:

cd runtime/testdata/testprogcgo
go build -gcflags=runtime=-d=maymorestack=runtime.mayMoreStackPreempt
./testprogcgo CgoPprofCallback

This fails ~25% on linux/amd64 at tip.

(Note that, compared with what dist passes, -asmflags isn't necessary, and we only need to apply the flag to the runtime package itself, not to the several other packages that dist also applies it to.)

aclements (Member) commented Sep 6, 2022

Bisecting points at 511cd9b (CL 424257), which is a bit surprising because I was pretty sure that couldn't affect any existing logic. I'll dig into it.

aclements (Member) commented Sep 6, 2022

I found the cause:

  1. We force a preemption on g1. (This could happen in any test, but mayMoreStackPreempt makes it way more likely, which is great, because it means that test mode is doing what it's supposed to!)
  2. The preemption goes into the scheduler, which picks another G, g2, to run and enters execute.
  3. execute sets mp.curg = g2, but hasn't yet set g2.m = mp.
  4. A profiling signal comes in and starts a stack trace with _TraceJumpStack.
  5. Traceback sees the morestack from the preemption, and sees that curg != nil, so it follows curg into g2. This is a little weird because g2 hasn't really started running yet, but should be harmless.
  6. On the next frame, gentraceback tries to check gp.m.g0, but g2.m is nil because g2 is not quite running yet, so it segfaults. Since we're in a signal, the process goes down hard.
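The race in the steps above can be modeled in a few lines of Go. This is a hypothetical, heavily simplified sketch: the m and g types keep only the fields named in the steps (mp.curg, g.m, m.g0) and are not the runtime's real definitions, and naiveJump is an invented stand-in for the traceback's stack jump. A recovered nil-pointer panic plays the role of the segfault, which in the real runtime happens inside a signal handler and takes the process down.

```go
package main

import "fmt"

// Hypothetical stand-ins for the runtime's g and m; only the fields
// mentioned in the steps above are modeled.
type g struct {
	m *m // M this G is running on; nil until execute sets it
}

type m struct {
	g0   *g // scheduling stack
	curg *g // user G currently assigned to this M
}

// naiveJump models steps 5 and 6: follow mp.curg whenever it is non-nil,
// then read gp.m.g0 as the traceback does on the next frame. If gp.m is
// nil, the field access panics; the deferred recover turns that into a
// false result instead of a crash.
func naiveJump(mp *m) (ok bool) {
	defer func() {
		if recover() != nil {
			ok = false
		}
	}()
	gp := mp.curg
	if gp == nil {
		return true // nothing to jump to
	}
	return gp.m.g0 != nil // nil dereference when gp.m == nil
}

func main() {
	mp := &m{}
	mp.g0 = &g{m: mp}
	g2 := &g{}

	// Step 3: execute has set mp.curg = g2 but not yet g2.m = mp.
	mp.curg = g2
	fmt.Println("inside window:", naiveJump(mp)) // inside window: false

	// Once execute sets g2.m, the same jump is safe.
	g2.m = mp
	fmt.Println("after execute:", naiveJump(mp)) // after execute: true
}
```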

gopherbot commented Sep 6, 2022

Change https://go.dev/cl/428656 mentions this issue: runtime: in traceback, only jump stack if M doesn't change
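The CL title suggests the shape of the fix: follow curg only when it is bound to the M doing the traceback. The sketch below is an assumption based on that title alone, not the CL's actual diff; canJumpStack and the trimmed m and g types are hypothetical.

```go
package main

import "fmt"

// Hypothetical simplified stand-ins; not the runtime's real definitions.
type g struct {
	m *m
}

type m struct {
	g0   *g
	curg *g
}

// canJumpStack reports whether a traceback running on mp's g0 stack may
// switch to mp.curg. Following the rule in the CL title, the jump is
// allowed only if curg is non-nil and curg's M is mp itself, so a G that
// execute has stored in mp.curg but not yet bound (g.m == nil) is never
// followed.
func canJumpStack(mp *m) bool {
	gp := mp.curg
	return gp != nil && gp.m == mp
}

func main() {
	mp := &m{}
	mp.g0 = &g{m: mp}
	g2 := &g{}

	mp.curg = g2
	fmt.Println(canJumpStack(mp)) // false: g2.m not yet set

	g2.m = mp
	fmt.Println(canJumpStack(mp)) // true
}
```

With a check like this, a profiling signal that lands in the window between mp.curg = g2 and g2.m = mp makes the traceback stay on the current stack instead of dereferencing a nil M.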

mknyszek (Contributor) commented Sep 6, 2022

There are a number of related failures from this week's triage, but AFAICT they were all from more than 6 hours ago.
