runtime: running with delve on arm64 throws nosplit stack over 792 byte limit #64113
Comments
Thanks for filing a bug; I'm aware of this issue but forgot to write it down. I'll fix this before the release candidate.
I can reproduce that with a simple go build without delve:
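(Presumably this means passing delve's compile flags by hand, e.g. `go build -gcflags="all=-N -l" cmd/go` on darwin/arm64; the exact invocation is an assumption, but delve does disable optimizations and inlining when it builds the target.)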
Sorry for the delay here. We discussed this offline last week and I have a path forward, just haven't gotten around to it. I'll send a change to fix this today.
Change https://go.dev/cl/544535 mentions this issue.
@mknyszek Reopening this since I started seeing this again on master, even after your fix, upon running it again. I did some digging and found a fix; happy to submit it if you agree with the changes. I tried looking into using systemstack, but there is not much room for it. It seems like the culprit is nanotime in casgstatus. The function wrapper is probably not nice for nosplit, but it is the only way I found to suppress the issue, other than running nanotime on the system stack, which is probably not ok either.

Patch:

diff --git a/src/runtime/proc.go b/src/runtime/proc.go
index 6348335804..b9db1cef56 100644
--- a/src/runtime/proc.go
+++ b/src/runtime/proc.go
@@ -1133,22 +1133,30 @@ func casgstatus(gp *g, oldval, newval uint32) {
const yieldDelay = 5 * 1000
var nextYield int64
+ // Wrap nanotime to avoid a nosplit overflow build failure
+ // on some platforms when built with -N -l. See #64113.
+ var nowfunc = func() int64 {
+ return nanotime()
+ }
+
// loop if gp->atomicstatus is in a scan state giving
// GC time to finish and change the state to oldval.
for i := 0; !gp.atomicstatus.CompareAndSwap(oldval, newval); i++ {
if oldval == _Gwaiting && gp.atomicstatus.Load() == _Grunnable {
- throw("casgstatus: waiting for Gwaiting but is Grunnable")
+ systemstack(func() {
+ throw("casgstatus: waiting for Gwaiting but is Grunnable")
+ })
}
if i == 0 {
- nextYield = nanotime() + yieldDelay
+ nextYield = nowfunc() + yieldDelay
}
- if nanotime() < nextYield {
+ if nowfunc() < nextYield {
for x := 0; x < 10 && gp.atomicstatus.Load() != oldval; x++ {
procyield(1)
}
} else {
osyield()
- nextYield = nanotime() + yieldDelay/2
+ nextYield = nowfunc() + yieldDelay/2
}
}
@@ -1173,7 +1181,7 @@ func casgstatus(gp *g, oldval, newval uint32) {
// We transitioned out of runnable, so measure how much
// time we spent in this state and add it to
// runnableTime.
- now := nanotime()
+ now := nowfunc()
gp.runnableTime += now - gp.trackingStamp
gp.trackingStamp = 0
case _Gwaiting:
@@ -1186,7 +1194,7 @@ func casgstatus(gp *g, oldval, newval uint32) {
// a more representative estimate of the absolute value.
// gTrackingPeriod also represents an accurate sampling period
// because we can only enter this state from _Grunning.
- now := nanotime()
+ now := nowfunc()
sched.totalMutexWaitTime.Add((now - gp.trackingStamp) * gTrackingPeriod)
gp.trackingStamp = 0
}
@@ -1197,12 +1205,12 @@ func casgstatus(gp *g, oldval, newval uint32) {
break
}
// Blocking on a lock. Write down the timestamp.
- now := nanotime()
+ now := nowfunc()
gp.trackingStamp = now
case _Grunnable:
// We just transitioned into runnable, so record what
// time that happened.
- now := nanotime()
+ now := nowfunc()
gp.trackingStamp = now
case _Grunning:
// We're transitioning into running, so turn off
The error is again:

syscall.ptrace: nosplit stack over 792 byte limit
@mauri870 Thanks. Unfortunately that patch doesn't help because Running
Thanks for looking into it. I tried running nanotime on the system stack, but as you said it impacted performance heavily, so it's not really a solution we can consider.
@mknyszek What about wrapping casgstatus itself in systemstack? I see that casgstatus is called on the system stack in quite a lot of places; maybe that is fine to do in reentersyscall?
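For illustration, the call-site change being floated here would presumably look like the fragment below; casgstatus, systemstack, and the status constants are existing runtime identifiers, but the exact placement in reentersyscall is an assumption (and, per the later comments, this approach turned out to be too slow):

```go
// Hypothetical sketch, not the fix that was merged: perform the status CAS
// on the system stack so the nosplit chain on the goroutine stack stays
// short. In reentersyscall the transition is _Grunning -> _Gsyscall.
systemstack(func() {
	casgstatus(gp, _Grunning, _Gsyscall)
})
```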
I think we should just wrap the contents of
What did you use to measure it? It occurs to me that most platforms already have to switch to the system stack for nanotime. At this point, I think my main concern is more about having to add wrappers around quite a few call sites. Since this issue is specific to darwin, handling it in
I took a look at the code generation for
which spill registers to autotmps, but they are literally never used. We should be able to remove those without affecting the debugging experience. I wonder for
Change https://go.dev/cl/545275 mentions this issue.
I believe I misinterpreted my benchmark results. The one that became slow was wrapping casgstatus in systemstack. I tested the fix and it looks fine.
Change https://go.dev/cl/545276 mentions this issue.
On Darwin, the ptrace syscall is called in ptrace1, which is in turn called by ptrace. This allows ptrace1 to be disabled on iOS (by implementing ptrace differently). But we can also achieve this by adding a conditional directly in ptrace. This reduces stack usage with -N -l, while keeping ptrace disabled on iOS. For #64113.

Change-Id: I89d8e317e77352fffdbb5a25ba21ee9cdf2e1e20
Reviewed-on: https://go-review.googlesource.com/c/go/+/545276
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
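For readers unfamiliar with that corner of the syscall package, the shape of the change is roughly the sketch below. The names ptrace and ptrace1 come from the CL message; everything else (the iOS check, the error value) is an assumption for illustration, not the exact code that landed:

```go
package sketch

import (
	"runtime"
	"syscall"
)

// ptraceSketch illustrates folding the iOS special case into ptrace itself
// instead of keeping a separate ptrace1 frame, which is the extra frame the
// change removes from the nosplit call chain under -N -l.
func ptraceSketch(request int, pid int, addr uintptr, data uintptr) error {
	if runtime.GOOS == "ios" {
		// Previously handled by giving iOS a different implementation of
		// ptrace and excluding ptrace1 via build tags.
		return syscall.EPERM // assumption: the real error choice may differ
	}
	// On darwin the raw ptrace system call would be issued right here,
	// rather than through an extra ptrace1 call frame (details omitted).
	return nil
}
```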
I think this should be fixed now as of Cherry's fix (not mine).
CL https://go.dev/cl/545276 eliminates the ptrace1 frame, which should save ~80 bytes. I think this should get us back under the limit. I tried building cmd/go with -N -l and it doesn't overflow. In the next cycle maybe we'll make -N less aggressive on nosplit functions, while keeping a reasonable debugging experience.
What version of Go are you using (go version)?
tip

Does this issue reproduce with the latest release?
Only tip

What operating system and processor architecture are you using (go env)?
darwin/arm64

What did you do?
dlv debug cmd/go # any go program if that matters

What did you expect to see?
delve starts up

What did you see instead?
This was introduced by CL 494187. Based on the comments there and my short investigation, it appears that what is causing this is the new tracer allocated on the stack in exitsyscall.
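For background: //go:nosplit functions skip the usual stack-growth check, so the linker verifies at link time that their frames (plus any nosplit callees) fit in a small fixed budget; a value allocated on the stack, like the tracer state mentioned above, counts fully against that budget, and -N -l keeps the compiler from shrinking or eliminating it. The snippet below is a generic, self-contained illustration of the failure mode, not runtime code; building it should fail at link time with a similar "nosplit stack over ... byte limit" error (the exact limit depends on platform and build flags):

```go
package main

// A deliberately oversized frame in a nosplit function: because bigFrame
// opts out of the stack-growth check, the linker must prove its frame fits
// in the fixed nosplit budget, and this one does not.

//go:nosplit
func bigFrame() byte {
	var scratch [1024]byte // stack-allocated; counts fully against the nosplit limit
	for i := range scratch {
		scratch[i] = byte(i)
	}
	return scratch[0] ^ scratch[1023]
}

func main() {
	println(bigFrame())
}
```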
cc @mknyszek