-
Notifications
You must be signed in to change notification settings - Fork 17.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
runtime: framepointer-based stack unwinding omits memmove's callee #58835
Comments
Is this fixed by https://go.dev/cl/466316? |
I don't think so. Do you know if C memmove saves frame pointer (I guess it depends on implementation) and whether |
It's not fixed by CL 466316 (details below).
For what it's worth, I came across this while asking This would affect a question @aclements had elsewhere recently about whether, when using I don't know what performance is saved by not saving the frame pointer, but I want to make the point that the costs of various forms of work that take place in those frameless leaf functions can add up. I don't know my way around stack management, so maybe this is a nonsense suggestion: is there room for a middle-ground where the caller carves out an extra word of stack space (as it would for an additional Thanks!
|
Maybe the heuristics for this could be extended to take the "complexity" of the leaf function into account, similar to how it is done for inlining? Not sure how feasible this is, but looking at memmove's implementation, I suspect it wouldn't cause a noticeable perf hit if it pushed a frame pointer. But I haven't measured that 😅.
Not sure if this could work. I think it would cause the positions of the return addr and the frame pointer to be reversed on the stack. Frame pointer unwinders rely on the relative position of these two values on the stack. @rhysh quick question: Have you tried using the DWARF unwinder in perf for this? |
Thanks for the suggestion, but I think we'd want to keep the assembler simple and avoid surprises (e.g. it doesn't save frame pointer for a 50-instruction function, but suddenly does for a 51-instruction function?) If you think it is helpful, it wouldn't be hard to manually save the frame pointer, or declare a nonzero frame size (if this comes often, we can think about adding a new text flag, e.g. HAVEFRAME, to force saving frame pointer). |
Agree with @cherrymui, frameless leaf functions should remain frameless, for performance and code size. AFAIK, |
Indeed @cherrymui , changing the end of the TEXT directive from What sort of data would motivate accepting this change to
I haven't used that since Here's part of what I get from
And with @cherrymui 's suggested workaround and
|
go run omits DWARF info, but I'm not sure about
I think Either way, I'm +1 on your proposed change to memmove if it doesn't cause noticeable performance regressions. |
If benchmark results show minimal performance impact, and it helps debugging/profiling, I think it would be acceptable.
|
In C&RT triage, @rhysh are you planning on working on this? Assigning to you optimistically but feel free to unassign. |
This had been on my backlog, but I'm no longer planning to work on it. |
It looks like the immediate caller of
runtime.memmove
is reported correctly inruntime/pprof
profiles, but is missing inperf
's framepointer-based unwinding. I don't know whether this affects callers of other assembly functions, or if it's specific tomemmove
. CC @golang/runtime , and probably also of interest to @felixgeWhat version of Go are you using (
go version
)?Does this issue reproduce with the latest release?
Yes, go1.20.1 is the latest stable release. I first identified this behavior in production use with go1.19.6.
What operating system and processor architecture are you using (
go env
)?For the reproducer, I'm cross-compiling from darwin/arm64 to linux/amd64. The production behavior I'm reproducing was with linux/amd64 builds for linux/amd64.
go env
OutputWhat did you do?
I used Linux's
perf
to report collect data on an app's performance using the-g
flag to collect stack traces using the frame pointer.What did you expect to see?
I expected
go tool pprof
andperf record -g
+perf script
to agree on the call stacks, including the identity of the immediate caller ofruntime.memmove
. I expected the function namedcommand-line-arguments.leaf
to show up in the call stack, with an immediate caller ofcommand-line-arguments.root
and an immediate callee ofruntime.memmove
.What did you see instead?
With
go tool pprof
, I see reports of CPU time spent inruntime.memmove
, called bycommand-line-arguments.leaf
, called bycommand-line-arguments.root
, as I expect.With
perf record -g
andperf script
, I see the on-CPU time attributed toruntime.memmove
as called bycommand-line-arguments.root
. The...leaf
frame is missing, which I consider to be a bug.The text was updated successfully, but these errors were encountered: