Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: STW GC stops working on arm64/mips64le #38404

Closed
mengzhuo opened this issue Apr 13, 2020 · 9 comments
Closed

runtime: STW GC stops working on arm64/mips64le #38404

mengzhuo opened this issue Apr 13, 2020 · 9 comments
Assignees
Milestone

Comments

@mengzhuo
Copy link
Contributor

@mengzhuo mengzhuo commented Apr 13, 2020

What version of Go are you using (go version)?

$ go version
2545323c633136a28aa57b000b81e95780cbac13

Does this issue reproduce with the latest release?

no. Go1.13 doing fine.

What operating system and processor architecture are you using (go env)?

linux/arm64
linux/mips64le

What did you do?

 
import (
        "fmt"
)
 
func main() {
        for i := 0; i < 10000; i++ {
                n := 150
                r := deferHeapAndStack(n)
                if r != n*2 {
                        fmt.Println(i, r)
                }
        }
}
 
// deferHeapAndStack(n) computes 2*n
func deferHeapAndStack(n int) (r int) {
        if n == 0 {
                return 0
        }
 
        if n%2 == 0 {
                // heap-allocated defers
                for i := 0; i < 2; i++ {
                        defer func() {
                                r++
                        }()
                }
        } else {
                // stack-allocated defers
                defer func() {
                        r++
                }()
                defer func() {
                        r++
                }()
        }
 
        r = deferHeapAndStack(n - 1)
        escapeMe(new([1024]byte)) // force some GCs
        return
}
 
// Pass a value to escapeMe to force it to escape.
var escapeMe = func(x interface{}) {}

What did you see instead?

GODEBUG='gctrace=1,gcstoptheworld=1' ./defer


runtime: checkdead: find g 1 in status 1
fatal error: checkdead: runnable g

runtime stack:
runtime.throw(0xd0caf, 0x15)
        /root/godev/src/runtime/panic.go:1116 +0x54
runtime.checkdead()
        /root/godev/src/runtime/proc.go:4489 +0x43c
runtime.mput(...)
        /root/godev/src/runtime/proc.go:4906
runtime.stopm()
        /root/godev/src/runtime/proc.go:1879 +0x9c
runtime.findrunnable(0x4000022800, 0x0)
        /root/godev/src/runtime/proc.go:2413 +0xa2c
runtime.schedule()
        /root/godev/src/runtime/proc.go:2613 +0x31c
runtime.preemptPark(0x4000000180)
        /root/godev/src/runtime/proc.go:2851 +0xb4
runtime.newstack()
        /root/godev/src/runtime/stack.go:1026 +0x1e0
runtime.morestack()
        /root/godev/src/runtime/asm_arm64.s:308 +0x70

goroutine 1 [runnable]:
main.deferHeapAndStack(0x281, 0xd5d90)
        /root/defer.go:41 +0xd8
main.deferHeapAndStack(0x282, 0x40004979b0)
        /root/defer.go:40 +0xbc
main.deferHeapAndStack(0x283, 0xd5d90)
        /root/defer.go:40 +0xbc
main.deferHeapAndStack(0x284, 0x40004979a0)

Might relate to #34736

@odeke-em
Copy link
Member

@odeke-em odeke-em commented Apr 13, 2020

@shawn-xdji
Copy link
Contributor

@shawn-xdji shawn-xdji commented Apr 14, 2020

Hi @mengzhuo , is it a random issue?

I tried on several arm64 machines with both 2545323 and the latest commit 5a706d1 but failed to reproduce the crash.
But I did observe a (random) deadlock if running your case by 'go run'.

@mengzhuo
Copy link
Contributor Author

@mengzhuo mengzhuo commented Apr 14, 2020

@shawn-xdj I can reproduce on my linux/arm64 box at 201cb04

Did you run with GODEBUG='gcstoptheworld=1' ?

@shawn-xdji
Copy link
Contributor

@shawn-xdji shawn-xdji commented Apr 14, 2020

Edit: I could reproduce the same crash on both x86 and arm64 machines, with a few commits checked in this month.

===============================================================
@mengzhuo Yes, I followed your steps, 'go build <test.go>', 'GODEBUG='gcstoptheworld=1' '.
Just verified your case on all available machines, running 100 times each, arm64 didn't witness any problem, but the same crash was hit on one of x86 machines.

@mknyszek
Copy link
Contributor

@mknyszek mknyszek commented Apr 15, 2020

I can reproduce, and with GOTRACEBACK=crash it looks like this is coming from the following chain of events (at least on linux/amd64):

  1. A G allocates, and at the end of allocation realizes it needs to start a GC.
  2. It acquires worldsema, stops the world, does everything it needs to do, disables user goroutine scheduling due to GODEBUG=gcstoptheworld=1, and starts the world again.
  3. It goes to release worldsema and notices a preemption request. It then goes into the scheduler becomes runnable, but user goroutine scheduling is disabled so it stops its M, checkdead is called, all Ms are idle, and there's a runnable G left over -> assumes deadlock.

It likely relates to the fact that we release worldsema now instead of holding onto it across the mark phase. It's possible that the problem here is just that there's a preemption point before the worldsema release now where there wasn't before. Maybe the trick is this comment:

// In STW mode, we could block the instant systemstack
// returns, so don't do anything important here. Make sure we
// block rather than returning to user code.

That systemstack block referred to there is the one containing startTheWorldWithSema. Perhaps the semrelease should happen on the system stack? It's a little tricky because there's a goyield inside semrelease, but luckily that codepath will never get hit because handoff=false. That's a little fragile, so we could instead defensively increment acquirem to enforce that there are no preemption points between the startTheWorld and Gosched.

I'll try that and see if I can still reproduce the problem.

@mknyszek mknyszek self-assigned this Apr 15, 2020
@gopherbot
Copy link

@gopherbot gopherbot commented Apr 15, 2020

Change https://golang.org/cl/228337 mentions this issue: runtime: prevent preemption while releasing worldsema in gcStart

@mknyszek
Copy link
Contributor

@mknyszek mknyszek commented Apr 15, 2020

Yeah I think I have a fix. Hasn't crashed yet. @mengzhuo can you verify?

@mknyszek mknyszek added this to the Go1.15 milestone Apr 15, 2020
@mknyszek
Copy link
Contributor

@mknyszek mknyszek commented Apr 15, 2020

Also, this is related to #19812, noting it for bookkeeping purposes.

@gopherbot gopherbot closed this in 03ba6b0 Apr 16, 2020
@mengzhuo
Copy link
Contributor Author

@mengzhuo mengzhuo commented Apr 16, 2020

Yeah I think I have a fix. Hasn't crashed yet. @mengzhuo can you verify?

Thanks your CL, 03ba6b0 fixed the issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
5 participants
You can’t perform that action at this time.