runtime: use frame pointers for callers #16638

Open
dvyukov opened this Issue Aug 8, 2016 · 6 comments

8 participants
@dvyukov
Member

dvyukov commented Aug 8, 2016

Traceback is the main source of slowdown for the execution tracer. On net/http.BenchmarkClientServerParallel4, without the tracer:
BenchmarkClientServerParallel4-6 200000 10627 ns/op 4482 B/op 57 allocs/op
with the tracer:
BenchmarkClientServerParallel4-6 200000 16444 ns/op 4482 B/op 57 allocs/op
That's a 55% slowdown (16444 / 10627 ≈ 1.55). The top functions in the profile are:
6.09% http.test http.test [.] runtime.pcvalue
5.88% http.test http.test [.] runtime.gentraceback
5.41% http.test http.test [.] runtime.readvarint
4.31% http.test http.test [.] runtime.findfunc
2.98% http.test http.test [.] runtime.step
2.12% http.test http.test [.] runtime.mallocgc

runtime.callers, gcallers, and Callers are not interested in the frame, func, SP, args, etc. of each frame; they only need PC values. PC values can be obtained by walking frame pointers, which should be much faster. Note that these calls are always synchronous (they can't happen during a function prologue or in the middle of a goroutine switch), so they should be much simpler to handle.

We should use frame pointers in runtime.callers.
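To make the idea concrete, here is a minimal sketch of a frame-pointer walk that collects only PCs. It does not touch real machine frames: the stack is simulated as a slice of words, and the layout (each frame record holds the caller's saved frame pointer followed by the return PC, with 0 ending the chain) as well as the names simStack and fpCallers are assumptions for illustration, not the runtime's actual code.

```go
package main

import "fmt"

// simStack models a goroutine stack as a flat array of words. Hypothetical
// layout for illustration only: a frame record occupies two consecutive
// words, [saved caller FP, return PC], and an FP is an index into the
// array. An FP of 0 marks the end of the chain.
type simStack []uintptr

// fpCallers walks the frame-pointer linked list starting at fp and stores
// only the return PCs in pcs, which is all runtime.callers needs. It
// returns the number of PCs written.
func fpCallers(stack simStack, fp uintptr, pcs []uintptr) int {
	n := 0
	for fp != 0 && n < len(pcs) {
		pcs[n] = stack[fp+1] // return address into the caller
		n++
		fp = stack[fp] // follow the saved FP to the caller's frame record
	}
	return n
}

func main() {
	stack := make(simStack, 16)
	// Build a three-frame chain: record at 2 -> record at 6 -> record at 10.
	stack[2], stack[3] = 6, 0x3000
	stack[6], stack[7] = 10, 0x2000
	stack[10], stack[11] = 0, 0x1000 // outermost record: no saved caller FP

	pcs := make([]uintptr, 8)
	n := fpCallers(stack, 2, pcs)
	fmt.Printf("%#x\n", pcs[:n])
}
```

Each step is two loads and no table lookups, which is why this approach should be far cheaper than the pcvalue/findfunc work in gentraceback. The trade-off is that it only works if every frame maintains the chain.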

@aclements @ianlancetaylor @hyangah

@quentinmit quentinmit added this to the Go1.8 milestone Aug 8, 2016

@randall77

Contributor

randall77 commented Aug 11, 2016

That could work, and be much faster.
My only worry is that at some point we're going to tackle the "inline non-leaf functions" problem, and then the list of PCs from the frame pointer walk alone won't be enough. We'll need to somehow expand PCs that correspond to call sites that have been inlined into other functions. I'm not sure how that would work without doing everything the generic gentraceback is doing (findfunc, mainly).

@ianlancetaylor

Contributor

ianlancetaylor commented Aug 11, 2016

There is no need to handle inlining at traceback time. Today, each PC corresponds to a single file/line/function. When we can inline non-leaf functions, each PC will correspond to a list of file/line/function tuples. At traceback time, we only need that PC; at symbolization time, we need to expand that PC into a list of file/line/function tuples. There is already code for handling this in runtime.Frames.Next; we just need to move from FuncForPC to something that can return multiple file/line/function tuples.

The point is, traceback can be fast while still supporting non-leaf inlined functions when we interpret the traceback.
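The split described here, cheap PC capture at traceback time and expansion at symbolization time, can be sketched as follows. The Location and symtab types and the expand method are hypothetical; in the real runtime this role is played by runtime.CallersFrames and runtime.Frames.Next. The sketch only shows the shape of the idea: a PC inside a function with an inlined call site maps to several file/line/function tuples, innermost first.

```go
package main

import "fmt"

// Location is one file/line/function tuple (hypothetical type; the real
// runtime exposes similar data through runtime.Frame).
type Location struct {
	Func string
	File string
	Line int
}

// symtab is a hypothetical symbolization table. A PC in code where a call
// was inlined maps to several tuples, innermost first.
type symtab map[uintptr][]Location

// expand turns the plain PC list captured at traceback time into the full
// logical call stack, expanding inlined call sites.
func (s symtab) expand(pcs []uintptr) []Location {
	var out []Location
	for _, pc := range pcs {
		locs, ok := s[pc]
		if !ok {
			out = append(out, Location{Func: "?", File: "?"}) // unknown PC
			continue
		}
		out = append(out, locs...)
	}
	return out
}

func main() {
	tab := symtab{
		// PC 0x2000 lies in main.outer, into which main.inner was inlined.
		0x2000: {
			{Func: "main.inner", File: "a.go", Line: 12},
			{Func: "main.outer", File: "a.go", Line: 20},
		},
		0x1000: {{Func: "main.main", File: "a.go", Line: 30}},
	}
	for _, loc := range tab.expand([]uintptr{0x2000, 0x1000}) {
		fmt.Printf("%s %s:%d\n", loc.Func, loc.File, loc.Line)
	}
}
```

The traceback never consults the table; only the consumer of the PCs pays for expansion.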

@gopherbot

gopherbot commented Dec 1, 2016

CL https://golang.org/cl/33754 mentions this issue.

gopherbot pushed a commit that referenced this issue Dec 1, 2016

cmd/compile: generate frame pointers for otherwise frameless functions
func f() {
    g()
}

We mistakenly don't add a frame pointer for f.  This means f
isn't seen when walking the frame pointer linked list.  That
matters for kernel-gathered profiles, and is an impediment for
issues like #16638.

To fix, allocate a stack frame even for otherwise frameless functions
like f.  It is a bit tricky because we need to avoid some runtime
internals that really, really don't want one.

No test at the moment, as only kernel CPU profiles would catch it.
Tests will come with the implementation of #16638.

Fixes #18103

Change-Id: I411206cc9de4c8fdd265bee2e4fa61d161ad1847
Reviewed-on: https://go-review.googlesource.com/33754
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
@gopherbot

gopherbot commented Dec 2, 2016

CL https://golang.org/cl/33895 mentions this issue.

gopherbot pushed a commit that referenced this issue Dec 7, 2016

runtime: on stack copy, adjust BP
When we copy the stack, we need to adjust all BPs.
We correctly adjust the ones on the stack, but we also
need to adjust the one that is in g.sched.bp.

Like CL 33754, no test as only kernel-gathered profiles will notice.
Tests will come (in 1.9) with the implementation of #16638.

The invariant should hold that every frame pointer points to
somewhere within its stack.  After this CL, it is mostly true, but
something about cgo breaks it.  The runtime checks are disabled
until I figure that out.
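The adjustment in this commit can be modeled roughly as follows. This is a simplified sketch, not the runtime's copystack: stackRange, adjust, and copyStackFPs are hypothetical names, and the model assumes stacks grow downward with the in-use portion at the high end, so everything shifts by delta = new.hi - old.hi. Every saved frame pointer that points into the old stack is shifted by delta, and so is the frame pointer saved in the goroutine's scheduling context (the g.sched.bp case this CL fixes).

```go
package main

import "fmt"

// stackRange is a hypothetical model of a stack's address range [lo, hi).
type stackRange struct{ lo, hi uintptr }

// adjust shifts p by delta if it points into the old stack; other values,
// such as the 0 that ends a frame-pointer chain, are returned unchanged.
func adjust(p uintptr, old stackRange, delta uintptr) uintptr {
	if old.lo <= p && p < old.hi {
		return p + delta // uintptr arithmetic wraps, so a downward move works too
	}
	return p
}

// copyStackFPs models the frame-pointer fix-up after copying a stack. Stacks
// grow down and the used part sits at the high end, so everything moves by
// delta = newStk.hi - oldStk.hi. Both the saved FPs on the stack and the FP
// saved in the scheduling context (the analogue of g.sched.bp) are adjusted.
func copyStackFPs(savedFPs []uintptr, schedBP uintptr, oldStk, newStk stackRange) ([]uintptr, uintptr) {
	delta := newStk.hi - oldStk.hi
	out := make([]uintptr, len(savedFPs))
	for i, fp := range savedFPs {
		out[i] = adjust(fp, oldStk, delta)
	}
	return out, adjust(schedBP, oldStk, delta)
}

func main() {
	oldStk := stackRange{lo: 0x1000, hi: 0x2000}
	newStk := stackRange{lo: 0x8000, hi: 0xA000} // stack doubled in size
	fps, bp := copyStackFPs([]uintptr{0x1F00, 0x1FC0, 0}, 0x1E80, oldStk, newStk)
	fmt.Printf("saved FPs: %#x, sched BP: %#x\n", fps, bp)
}
```

Forgetting the schedBP argument in this model leaves a frame pointer aimed at the freed old stack, which is the kind of dangling BP the CL describes.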

Update #16638
Fixes #18174

Change-Id: I6023ee64adc80574ee3e76491d4f0fa5ede3dbdb
Reviewed-on: https://go-review.googlesource.com/33895
Reviewed-by: Austin Clements <austin@google.com>

@bradfitz bradfitz modified the milestones: Go1.10Early, Go1.9Early May 3, 2017

@bradfitz bradfitz added the Performance label May 3, 2017

@josharian

Contributor

josharian commented Jun 1, 2017

CL 43150 may also help speed up tracing; that list of hot functions looks familiar from when I was working on that CL.

@bradfitz bradfitz modified the milestones: Go1.10Early, Go1.10 Jun 14, 2017

@gopherbot

gopherbot commented Sep 21, 2017

Change https://golang.org/cl/33809 mentions this issue: runtime: use frame pointers for callers

@rsc rsc modified the milestones: Go1.10, Go1.11 Nov 22, 2017

@bradfitz bradfitz modified the milestones: Go1.11, Go1.12 Jun 19, 2018
