cmd/compile: 'fatal: morestack on g0' on FreeBSD amd64 with PGO #62489

Closed
elindsey opened this issue Sep 7, 2023 · 9 comments
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.

@elindsey

elindsey commented Sep 7, 2023

What version of Go are you using (go version)?

$ go version
1.21.1, cross compiling on a Linux host to CGO_ENABLED=0 GOOS=freebsd GOARCH=amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

FreeBSD, amd64

What did you do?

Unfortunately I can't share the original binary and I'm having trouble getting it to reproduce in a smaller program. I apologize, I know it's not ideal and might be intractable without one. I'm still working on a smaller repro.

We have a server process that runs on FreeBSD/amd64 and does not use cgo. It was recently upgraded to Go 1.21.1, and I enabled PGO to see what would happen (gathering a profile from the existing Go 1.20.7 production deployment).

The produced binary crashes immediately on startup. Go 1.21.1 without PGO works fine on FreeBSD/amd64, Go 1.20.7 with PGO and the same profile works fine on FreeBSD/amd64, and Go 1.21.1 with PGO works fine on both Linux/amd64 and macOS/arm64.

Backtrace follows:

(gdb) run
Starting program: /var/svm/f 
[New LWP 363928 of process 68756]
[New LWP 363929 of process 68756]
[New LWP 363930 of process 68756]
[New LWP 363931 of process 68756]
fatal: morestack on g0

Thread 4 received signal SIGTRAP, Trace/breakpoint trapwarning: could not convert 'si_code' from the host encoding (ISO-8859-1) to UTF-32.
This normally should not happen, please file a bug report.
.
Breakpoint.
[Switching to LWP 363930 of process 68756]
0x0000000000478186 in ?? ()
(gdb) bt
#0  0x0000000000478186 in ?? ()
#1  0x0000000000476585 in runtime.morestack () at .goroot/1.21.1/src/runtime/asm_amd64.s:560
#2  0x000000000043a490 in runtime.netpoll (delay=, ~r0=...)
    at .goroot/1.21.1/src/runtime/netpoll_kqueue.go:121
#3  0x000000000044633f in runtime.findRunnable (gp=, inheritTime=, 
    tryWakeP=) at .goroot/1.21.1/src/runtime/proc.go:3191
#4  0x0000000000448c56 in runtime.schedule () at .goroot/1.21.1/src/runtime/proc.go:3582
#5  runtime.park_m (gp=0xc000006ea0) at .goroot/1.21.1/src/runtime/proc.go:3745
#6  0x000000000047648e in runtime.mcall () at .goroot/1.21.1/src/runtime/asm_amd64.s:458
#7  0x0000000000000000 in ?? ()
(gdb) thread apply all bt

Thread 5 (LWP 363931 of process 68756):
#0  runtime.sys_umtx_op () at .goroot/1.21.1/src/runtime/sys_freebsd_amd64.s:57
#1  0x000000000043a8b3 in runtime.futexsleep1 (addr=, val=0x0, ns=) at .goroot/1.21.1/src/runtime/os_freebsd.go:174
#2  0x000000000040c0fe in runtime.notesleep.futexsleep.func1 () at .goroot/1.21.1/src/runtime/os_freebsd.go:162
#3  0x000000000040c067 in runtime.futexsleep (ns=0xffffffffffffffff, addr=, val=) at .goroot/1.21.1/src/runtime/os_freebsd.go:161
#4  runtime.notesleep (n=0xc000100150) at .goroot/1.21.1/src/runtime/lock_futex.go:160
#5  0x0000000000444bca in runtime.mPark () at .goroot/1.21.1/src/runtime/proc.go:1632
#6  runtime.stopm () at .goroot/1.21.1/src/runtime/proc.go:2536
#7  0x000000000044667e in runtime.findRunnable (gp=, inheritTime=, tryWakeP=) at .goroot/1.21.1/src/runtime/proc.go:3229
#8  0x0000000000448c56 in runtime.schedule () at .goroot/1.21.1/src/runtime/proc.go:3582
#9  runtime.park_m (gp=0xc0001024e0) at .goroot/1.21.1/src/runtime/proc.go:3745
#10 0x000000000047648e in runtime.mcall () at .goroot/1.21.1/src/runtime/asm_amd64.s:458
#11 0x0000000000000000 in ?? ()

Thread 4 (LWP 363930 of process 68756):
#0  0x0000000000478186 in ?? ()
#1  0x0000000000476585 in runtime.morestack () at .goroot/1.21.1/src/runtime/asm_amd64.s:560
#2  0x000000000043a490 in runtime.netpoll (delay=, ~r0=...) at .goroot/1.21.1/src/runtime/netpoll_kqueue.go:121
#3  0x000000000044633f in runtime.findRunnable (gp=, inheritTime=, tryWakeP=) at .goroot/1.21.1/src/runtime/proc.go:3191
#4  0x0000000000448c56 in runtime.schedule () at .goroot/1.21.1/src/runtime/proc.go:3582
#5  runtime.park_m (gp=0xc000006ea0) at .goroot/1.21.1/src/runtime/proc.go:3745
#6  0x000000000047648e in runtime.mcall () at .goroot/1.21.1/src/runtime/asm_amd64.s:458
#7  0x0000000000000000 in ?? ()

Thread 3 (LWP 363929 of process 68756):
#0  runtime.sys_umtx_op () at .goroot/1.21.1/src/runtime/sys_freebsd_amd64.s:57
#1  0x000000000043a8b3 in runtime.futexsleep1 (addr=, val=0x0, ns=) at .goroot/1.21.1/src/runtime/os_freebsd.go:174
#2  0x000000000040c0fe in runtime.notesleep.futexsleep.func1 () at .goroot/1.21.1/src/runtime/os_freebsd.go:162
#3  0x000000000040c067 in runtime.futexsleep (ns=0xffffffffffffffff, addr=, val=) at .goroot/1.21.1/src/runtime/os_freebsd.go:161
#4  runtime.notesleep (n=0xc000080550) at .goroot/1.21.1/src/runtime/lock_futex.go:160
#5  0x0000000000444bca in runtime.mPark () at .goroot/1.21.1/src/runtime/proc.go:1632
#6  runtime.stopm () at .goroot/1.21.1/src/runtime/proc.go:2536
#7  0x000000000044570a in runtime.startlockedm (gp=) at .goroot/1.21.1/src/runtime/proc.go:2808
#8  0x0000000000448c13 in runtime.schedule () at .goroot/1.21.1/src/runtime/proc.go:3628
#9  runtime.park_m (gp=0xc000006d00) at .goroot/1.21.1/src/runtime/proc.go:3745
#10 0x000000000047648e in runtime.mcall () at .goroot/1.21.1/src/runtime/asm_amd64.s:458
#11 0x0000000000000000 in ?? ()

Thread 2 (LWP 363928 of process 68756):
#0  runtime.usleep () at .goroot/1.21.1/src/runtime/sys_freebsd_amd64.s:477
#1  0x000000000044d84b in runtime.sysmon () at .goroot/1.21.1/src/runtime/proc.go:5528
#2  0x00000000004434d3 in runtime.mstart1 () at .goroot/1.21.1/src/runtime/proc.go:1600
#3  0x0000000000443416 in runtime.mstart0 () at .goroot/1.21.1/src/runtime/proc.go:1557
#4  0x0000000000476405 in runtime.mstart () at .goroot/1.21.1/src/runtime/asm_amd64.s:394
#5  0x0000000000479aae in runtime.thr_start () at .goroot/1.21.1/src/runtime/sys_freebsd_amd64.s:86
#6  0x0000000000000000 in ?? ()

Thread 1 (LWP 101416 of process 68756):
#0  0x000000000042e4b1 in runtime.(*mheap).initSpan (h=0xc3d1a0 , s=0x8477665f0, typ=0x0, spanclass=0x4b, base=, npages=0x2) at .goroot/1.21.1/src/runtime/mheap.go:1404
#1  0x000000000042e1f3 in runtime.(*mheap).allocSpan (h=0xc3d1a0 , npages=0x2, typ=0x0, spanclass=0x4b, s=) at .goroot/1.21.1/src/runtime/mheap.go:1344
#2  0x0000000000419a7f in runtime.(*mcentral).grow.(*mheap).alloc.func1 () at .goroot/1.21.1/src/runtime/mheap.go:968
#3  0x000000000047650a in runtime.systemstack () at .goroot/1.21.1/src/runtime/asm_amd64.s:509
#4  0x00007fffffffe9c8 in ?? ()
#5  0x000000000047a93f in runtime.newproc (fn=0x47638f ) at :1
#6  0x0000000000476405 in runtime.mstart () at .goroot/1.21.1/src/runtime/asm_amd64.s:394
#7  0x000000000047638f in runtime.rt0_go () at .goroot/1.21.1/src/runtime/asm_amd64.s:358
#8  0x0000000000000001 in ?? ()
#9  0x00007fffffffea18 in ?? ()
#10 0x0000000000000000 in ?? ()

Nothing there looks like user code to me. On some runs I do see a few things starting to get runtime.doInit()'d in the stacks (some compiled regex and so on), but this seems to panic very early. While I try to get a smaller repro, are there any things in the stack that jump out, or any suggestions on how to debug this?

@bcmills bcmills added the compiler/runtime Issues related to the Go compiler and/or runtime. label Sep 7, 2023
@cherrymui cherrymui changed the title 'fatal: morestack on g0' on FreeBSD amd64 with PGO cmd/compile: 'fatal: morestack on g0' on FreeBSD amd64 with PGO Sep 7, 2023
@cherrymui
Member

Thanks for the report!

(gdb) bt
#0  0x0000000000478186 in ?? ()
#1  0x0000000000476585 in runtime.morestack () at .goroot/1.21.1/src/runtime/asm_amd64.s:560
#2  0x000000000043a490 in runtime.netpoll (delay=, ~r0=...)
    at .goroot/1.21.1/src/runtime/netpoll_kqueue.go:121
#3  0x000000000044633f in runtime.findRunnable (gp=, inheritTime=, 
    tryWakeP=) at .goroot/1.21.1/src/runtime/proc.go:3191
#4  0x0000000000448c56 in runtime.schedule () at .goroot/1.21.1/src/runtime/proc.go:3582
#5  runtime.park_m (gp=0xc000006ea0) at .goroot/1.21.1/src/runtime/proc.go:3745
#6  0x000000000047648e in runtime.mcall () at .goroot/1.21.1/src/runtime/asm_amd64.s:458
#7  0x0000000000000000 in ?? ()

This is interesting. The stack looks totally valid, so I'm not sure why it calls morestack... Could you print the SP at frame 2 (the runtime.netpoll frame, and perhaps other frames as well) in GDB, and also dump the contents of the G structure pointed to by the R14 register (something like x/10a $r14)? Thanks.

@cherrymui cherrymui added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Sep 7, 2023
@cherrymui cherrymui added this to the Go1.22 milestone Sep 7, 2023
@elindsey
Author

elindsey commented Sep 7, 2023

I forgot to save the core last time, so this is a new execution but same backtrace. Let me know if that got all the info you were looking for!

GH was interpreting some of the <> as HTML tags, even in a pre block, so I put it in a gist. https://gist.github.com/elindsey/3959c40c20360d41a49f0bd3e6b5074b

@cherrymui
Member

Thanks! The SP and stack look quite valid.

(gdb) x/10a $r14
0xc0000071e0:	0xc00008a000	0xc00008c000

These are g.stack.lo and g.stack.hi, i.e. the stack bounds. The stack is 8 KB in size, which matches https://cs.opensource.google/go/go/+/master:src/runtime/proc.go;l=1941 (as this is a non-cgo program). An 8 KB g0 stack looks rather small to me. Maybe with PGO the stack frames are larger and just push it over the limit... Maybe we should increase the g0 stack size a bit...
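The 8 KB figure follows directly from the two words in the dump. A small sketch of that arithmetic is below. The `stackBounds` type is illustrative and only mirrors the lo/hi pair described above; it is not the runtime's own definition.

```go
package main

import "fmt"

// stackBounds mirrors the first two words dumped from the g struct
// (g.stack.lo and g.stack.hi). Illustrative type, not the runtime's.
type stackBounds struct {
	lo, hi uint64
}

// size returns the number of bytes between the two bounds.
func (s stackBounds) size() uint64 { return s.hi - s.lo }

func main() {
	// Values copied from the x/10a $r14 output above.
	g0 := stackBounds{lo: 0xc00008a000, hi: 0xc00008c000}
	fmt.Printf("g0 stack: %d bytes (%d KiB)\n", g0.size(), g0.size()/1024)
	// prints "g0 stack: 8192 bytes (8 KiB)"
}
```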

@cherrymui
Member

@elindsey could you check whether just increasing the g0 stack size to 16 KB fixes the issue? That is, apply this patch

diff --git a/src/runtime/proc.go b/src/runtime/proc.go
index 9fd200ea32..afb33c1e8b 100644
--- a/src/runtime/proc.go
+++ b/src/runtime/proc.go
@@ -1543,7 +1543,7 @@ func mstart0() {
 		// but is somewhat arbitrary.
 		size := gp.stack.hi
 		if size == 0 {
-			size = 8192 * sys.StackGuardMultiplier
+			size = 16384 * sys.StackGuardMultiplier
 		}
 		gp.stack.hi = uintptr(noescape(unsafe.Pointer(&size)))
 		gp.stack.lo = gp.stack.hi - size + 1024
@@ -1939,7 +1939,7 @@ func allocm(pp *p, fn func(), id int64) *m {
 	if iscgo || mStackIsSystemAllocated() {
 		mp.g0 = malg(-1)
 	} else {
-		mp.g0 = malg(8192 * sys.StackGuardMultiplier)
+		mp.g0 = malg(16384 * sys.StackGuardMultiplier)
 	}
 	mp.g0.m = mp

And rebuild the program with the same profile. Thanks.
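To make the effect of the two hunks concrete, here is a rough sketch of the size arithmetic they change. It assumes `sys.StackGuardMultiplier` is 1 and uses a made-up stack address. The helper names `malgSize` and `systemStackLo` are hypothetical, and the real `malg` may adjust the requested size further.

```go
package main

import "fmt"

// stackGuardMultiplier stands in for sys.StackGuardMultiplier.
// ASSUMPTION: 1, i.e. no scaling applied.
const stackGuardMultiplier = 1

// malgSize models the allocm hunk: the g0 stack size requested from malg.
func malgSize(base uint64) uint64 {
	return base * stackGuardMultiplier
}

// systemStackLo models the mstart0 hunk:
// gp.stack.lo = gp.stack.hi - size + 1024.
func systemStackLo(hi, base uint64) uint64 {
	size := base * stackGuardMultiplier
	return hi - size + 1024
}

func main() {
	const hi = 0x7fffffffe000 // hypothetical address, for illustration only
	for _, base := range []uint64{8192, 16384} {
		lo := systemStackLo(hi, base)
		fmt.Printf("base=%d: malg request=%d, mstart0 span=%d bytes\n",
			base, malgSize(base), hi-lo)
	}
	// prints:
	// base=8192: malg request=8192, mstart0 span=7168 bytes
	// base=16384: malg request=16384, mstart0 span=15360 bytes
}
```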

@elindsey
Author

elindsey commented Sep 7, 2023

Bumping the stack size to 16KB did fix it - I'm no longer getting the crash on startup. 🙂

@gopherbot
Contributor

Change https://go.dev/cl/526995 mentions this issue: runtime: increase g0 stack size in non-cgo case

@cherrymui cherrymui self-assigned this Sep 8, 2023
@cherrymui
Member

@elindsey thanks for confirming!

Since this issue and #62120 are similar with the same fix, I'll use a single backport issue for both. See #62537. Thanks.

@gopherbot
Contributor

Change https://go.dev/cl/527055 mentions this issue: [release-branch.go1.21] runtime: increase g0 stack size in non-cgo case

@elindsey
Author

elindsey commented Sep 8, 2023

Thank you very much @cherrymui!

gopherbot pushed a commit that referenced this issue Sep 22, 2023
Currently, for non-cgo programs, the g0 stack size is 8 KiB on
most platforms. With PGO which could cause aggressive inlining in
the runtime, the runtime stack frames are larger and could
overflow the 8 KiB g0 stack. Increase it to 16 KiB. This is only
one per OS thread, so it shouldn't increase memory use much.

Updates #62120.
Updates #62489.
Fixes #62537.

Change-Id: I565b154517021f1fd849424dafc3f0f26a755cac
Reviewed-on: https://go-review.googlesource.com/c/go/+/526995
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
(cherry picked from commit c6d550a)
Reviewed-on: https://go-review.googlesource.com/c/go/+/527055
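As a rough check on the memory claim in the commit message, the extra cost is 8 KiB per OS thread. The sketch below treats that as an upper bound: it assumes every thread gets a runtime-allocated g0 stack and no size multiplier applies. The thread counts are arbitrary examples.

```go
package main

import "fmt"

const (
	oldG0Size = 8 * 1024  // bytes, before the change
	newG0Size = 16 * 1024 // bytes, after the change
)

// extraG0Bytes returns the additional g0 stack memory, in bytes,
// for the given number of OS threads.
func extraG0Bytes(threads int) int {
	return threads * (newG0Size - oldG0Size)
}

func main() {
	for _, n := range []int{5, 100, 1000} {
		fmt.Printf("%4d threads: +%d KiB\n", n, extraG0Bytes(n)/1024)
	}
}
```

Even at a thousand threads the increase stays under 8 MiB.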
bradfitz pushed a commit to tailscale/go that referenced this issue Sep 25, 2023