cmd/compile: enable mid-stack inlining #19348

Open
davidlazar opened this Issue Mar 1, 2017 · 17 comments

10 participants
Member

davidlazar commented Mar 1, 2017

Design doc: https://golang.org/design/19348-midstack-inlining

@davidlazar davidlazar added the Proposal label Mar 1, 2017

@davidlazar davidlazar self-assigned this Mar 1, 2017

CL https://golang.org/cl/37231 mentions this issue.

CL https://golang.org/cl/37233 mentions this issue.

gopherbot pushed a commit that referenced this issue Mar 3, 2017

cmd/compile,link: generate PC-value tables with inlining information
In order to generate accurate tracebacks, the runtime needs to know the
inlined call stack for a given PC. This creates two tables per function
for this purpose. The first table is the inlining tree (stored in the
function's funcdata), which has a node containing the file, line, and
function name for every inlined call. The second table is a PC-value
table that maps each PC to a node in the inlining tree (or -1 if the PC
is not the result of inlining).
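A minimal model of the two tables just described, assuming a plain slice-based layout. The names `inlinedCall`, `pcRange`, `lookup`, and `expand` are hypothetical; the real runtime uses a compact funcdata/pcdata encoding. Walking parent links from the node for a PC recovers the logical frames. The example data mirrors the `negate`/`main` traceback from the next commit.

```go
package main

import "fmt"

// inlinedCall is one node of a per-function inlining tree (hypothetical layout).
type inlinedCall struct {
	parent int    // index of the enclosing inlined call, or -1
	file   string // file of the call site
	line   int    // line of the call site
	fn     string // name of the function that was inlined
}

// pcRange maps PCs in [start, end) to an inlining-tree index (-1 = not inlined).
type pcRange struct {
	start, end uintptr
	index      int
}

// frame is one logical frame of a traceback.
type frame struct {
	fn, file string
	line     int
}

var exampleTree = []inlinedCall{
	{parent: -1, file: "/tmp/go/main.go", line: 14, fn: "main.(*point).negate"},
}

var exampleTable = []pcRange{
	{start: 0x00, end: 0x10, index: -1},
	{start: 0x10, end: 0x20, index: 0},
}

// lookup returns the inlining-tree index for pc, or -1.
func lookup(table []pcRange, pc uintptr) int {
	for _, r := range table {
		if pc >= r.start && pc < r.end {
			return r.index
		}
	}
	return -1
}

// expand turns one physical PC into its logical call stack, innermost first.
// file/line are the innermost position from the PC-line table.
func expand(tree []inlinedCall, table []pcRange, outerFn string, pc uintptr, file string, line int) []frame {
	var frames []frame
	for idx := lookup(table, pc); idx >= 0; idx = tree[idx].parent {
		call := tree[idx]
		frames = append(frames, frame{call.fn, file, line})
		// The caller's position is the call site recorded in this node.
		file, line = call.file, call.line
	}
	return append(frames, frame{outerFn, file, line})
}

func main() {
	for _, f := range expand(exampleTree, exampleTable, "main.main", 0x18, "/tmp/go/main.go", 8) {
		fmt.Printf("%s\n\t%s:%d\n", f.fn, f.file, f.line)
	}
}
```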

To give the appearance that inlining hasn't happened, the runtime also
needs the original source position information of inlined AST nodes.
Previously the compiler plastered over the line numbers of inlined AST
nodes with the line number of the call. This meant that the PC-line
table mapped each PC to the line number of the outermost call in its inlined
call stack, with no way to access the innermost line number.

Now the compiler retains line numbers of inlined AST nodes and writes
the innermost source position information to the PC-line and PC-file
tables. Some tools and tests expect to see outermost line numbers, so we
provide the OutermostLine function for displaying line info.

To keep track of the inlined call stack for an AST node, we extend the
src.PosBase type with an index into a global inlining tree. Every time
the compiler inlines a call, it creates a node in the global inlining
tree for the call, and writes its index to the PosBase of every inlined
AST node. The parent of this node is the inlining tree index of the
call. -1 signifies no parent.
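A simplified sketch of the global inlining tree and of recovering both the innermost and outermost line for a position. `pos`, `inlNode`, `inlTree`, `stamp`, and `outermostLine` are hypothetical stand-ins (the compiler stores the index on `src.PosBase` and exposes `OutermostLine`); re-indexing positions that were already inlined into a body before that body is itself inlined is not modeled.

```go
package main

import "fmt"

// pos is a simplified source position: the original (innermost) line plus
// an index into the global inlining tree, or -1 if the code wasn't inlined.
type pos struct {
	line   int
	inlIdx int
}

// inlNode is one inlining event in the global tree.
type inlNode struct {
	parent   int    // index of the enclosing inlined call, or -1
	callee   string // function whose body was inlined
	callLine int    // line of the call that was inlined
}

type inlTree struct{ nodes []inlNode }

// add records a new inlined call and returns its index.
func (t *inlTree) add(parent int, callee string, callLine int) int {
	t.nodes = append(t.nodes, inlNode{parent, callee, callLine})
	return len(t.nodes) - 1
}

// stamp tags every position of an inlined body with idx while keeping its
// original line, instead of overwriting it with the call's line.
func stamp(body []pos, idx int) []pos {
	out := make([]pos, len(body))
	for i, p := range body {
		out[i] = pos{line: p.line, inlIdx: idx}
	}
	return out
}

// outermostLine follows parent links to the root and returns the line of
// the outermost call, i.e. what the compiler used to store directly.
func (t *inlTree) outermostLine(p pos) int {
	if p.inlIdx < 0 {
		return p.line
	}
	n := t.nodes[p.inlIdx]
	for n.parent >= 0 {
		n = t.nodes[n.parent]
	}
	return n.callLine
}

func main() {
	// f calls g at line 20; g calls h at line 11; h's body is at line 3.
	var tree inlTree
	g := tree.add(-1, "g", 20)
	h := tree.add(g, "h", 11)
	body := stamp([]pos{{line: 3, inlIdx: -1}}, h)
	fmt.Println(body[0].line, tree.outermostLine(body[0])) // 3 20
}
```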

For each function, the compiler creates a local inlining tree and a
PC-value table mapping each PC to an index in the local tree.  These are
written to an object file, which is read by the linker.  The linker
re-encodes these tables compactly by deduplicating function names and
file names.
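One way to picture the per-function step and the linker's re-encoding: copy only the global nodes a function's PCs use (plus their ancestors) into a compact local tree, then replace repeated names with indices into a table of unique strings. This is a hypothetical sketch; `node`, `localTree`, and `intern` are illustrative names, not compiler or linker APIs.

```go
package main

import "fmt"

// node is a hypothetical inlining-tree entry.
type node struct {
	parent int    // -1 for a call made directly from the physical function
	fn     string // inlined function name
	file   string // file of the call site
}

// localTree copies the global nodes used by one function, plus their
// ancestors, into a compact tree with renumbered parent links.
func localTree(global []node, used []int) ([]node, map[int]int) {
	remap := map[int]int{}
	var local []node
	var visit func(g int) int
	visit = func(g int) int {
		if g < 0 {
			return -1
		}
		if l, ok := remap[g]; ok {
			return l
		}
		n := global[g]
		n.parent = visit(n.parent) // ancestors are copied first
		local = append(local, n)
		remap[g] = len(local) - 1
		return remap[g]
	}
	for _, g := range used {
		visit(g)
	}
	return local, remap
}

// intern replaces repeated strings with indices into a table of unique
// strings, a simple stand-in for deduplicating function and file names.
func intern(names []string) (table []string, ids []int) {
	seen := map[string]int{}
	for _, s := range names {
		id, ok := seen[s]
		if !ok {
			id = len(table)
			seen[s] = id
			table = append(table, s)
		}
		ids = append(ids, id)
	}
	return table, ids
}

func main() {
	global := []node{
		{-1, "g", "a.go"}, // 0: g inlined into some function
		{0, "h", "b.go"},  // 1: h inlined into that copy of g
		{-1, "x", "a.go"}, // 2: belongs to a different function
	}
	local, remap := localTree(global, []int{1})
	fmt.Println(len(local), remap[1], local[remap[1]].parent) // 2 1 0

	table, ids := intern([]string{"a.go", "b.go", "a.go"})
	fmt.Println(table, ids) // [a.go b.go] [0 1 0]
}
```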

This change increases the size of binaries by 4-5%. For example, this is
how the go1 benchmark binary is impacted by this change:

section             old bytes   new bytes   delta
.text               3.49M ± 0%  3.49M ± 0%   +0.06%
.rodata             1.12M ± 0%  1.21M ± 0%   +8.21%
.gopclntab          1.50M ± 0%  1.68M ± 0%  +11.89%
.debug_line          338k ± 0%   435k ± 0%  +28.78%
Total               9.21M ± 0%  9.58M ± 0%   +4.01%

Updates #19348.

Change-Id: Ic4f180c3b516018138236b0c35e0218270d957d3
Reviewed-on: https://go-review.googlesource.com/37231
Run-TryBot: David Lazar <lazard@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>

gopherbot pushed a commit that referenced this issue Mar 3, 2017

runtime: use inlining tables to generate accurate tracebacks
The code in https://play.golang.org/p/aYQPrTtzoK now produces the
following stack trace:

goroutine 1 [running]:
main.(*point).negate(...)
	/tmp/go/main.go:8
main.main()
	/tmp/go/main.go:14 +0x23

Previously the stack trace missed the inlined call:

goroutine 1 [running]:
main.main()
	/tmp/go/main.go:14 +0x23

Fixes #10152.
Updates #19348.

Change-Id: Ib43c67012f53da0ef1a1e69bcafb65b57d9cecb2
Reviewed-on: https://go-review.googlesource.com/37233
Run-TryBot: David Lazar <lazard@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Member

dhananjay92 commented Mar 4, 2017

This is awesome.

I probably missed some discussion, but is there a design doc or proposal doc I can look at?

Out of curiosity, is there a plan to emit this information as part of DWARF? It would be a nice feature if debuggers could access the InlTree info (right now they can't print correct backtraces for inlined calls; I confirmed with 781fd39).

Member

davidlazar commented Mar 6, 2017

There is an outdated proposal doc. I'll update and publish it this week. In the meantime, these slides give an overview of the approach: https://golang.org/s/go19inliningtalk

I haven't looked at the DWARF yet, but the plan is to add inlining info to the DWARF tables before we turn on mid-stack inlining for 1.9.

CL https://golang.org/cl/37854 mentions this issue.

CL https://golang.org/cl/38090 mentions this issue.

gopherbot pushed a commit to golang/proposal that referenced this issue Mar 11, 2017

design: add mid-stack inlining design doc
For golang/go#19348.

Change-Id: Ibf3e3817b35226a33d961e76fedb924e15e37069
Reviewed-on: https://go-review.googlesource.com/38090
Run-TryBot: David Lazar <lazard@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>

@gopherbot gopherbot added this to the Proposal milestone Mar 20, 2017

Contributor

rsc commented Mar 27, 2017

It seems clear we're going to do this, assuming the right tuning (not yet done!). The tuning itself doesn't have to go through the proposal process. Accepting proposal.

@rsc rsc changed the title from proposal: mid-stack inlining in the Go compiler to cmd/compile: enable mid-stack inlining in the Go compiler Mar 27, 2017

@rsc rsc changed the title from cmd/compile: enable mid-stack inlining in the Go compiler to cmd/compile: enable mid-stack inlining Mar 27, 2017

@rsc rsc modified the milestones: Go1.9Maybe, Proposal Mar 27, 2017

gopherbot pushed a commit that referenced this issue Mar 29, 2017

runtime: handle inlined calls in runtime.Callers
The `skip` argument passed to runtime.Caller and runtime.Callers should
be interpreted as the number of logical calls to skip (rather than the
number of physical stack frames to skip). This changes runtime.Callers
to skip inlined calls in addition to physical stack frames.

The result value of runtime.Callers is a slice of program counters
([]uintptr) representing physical stack frames. If the `skip` parameter
to runtime.Callers skips part-way into a physical frame, there is no
convenient way to encode that in the resulting slice. To avoid changing
the API in an incompatible way, our solution is to store the number of
skipped logical calls of the first frame in the _second_ uintptr
returned by runtime.Callers. Since this number is a small integer, we
encode it as a valid PC value into a small symbol called:

    runtime.skipPleaseUseCallersFrames

For example, if f() calls g(), g() calls `runtime.Callers(2, pcs)`, and
g() is inlined into f, then the frame for f will be partially skipped,
resulting in the following slice:

    pcs = []uintptr{pc_in_f, runtime.skipPleaseUseCallersFrames+1, ...}

We store the skip PC in pcs[1] instead of pcs[0] so that `pcs[i:]` will
truncate the captured stack trace rather than grow it for all i.
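A pure-Go model of the encoding described above, assuming each physical frame carries its logical calls innermost first. `physFrame`, `callers`, `expand`, and `skipMarker` are hypothetical; `skipMarker` stands in for `runtime.skipPleaseUseCallersFrames`, and later Go releases may handle this differently. The last line shows why the marker lives in `pcs[1]`: slicing off the first entry only drops frames.

```go
package main

import "fmt"

// physFrame is one physical stack frame with its logical calls, innermost
// first (g inlined into f gives []string{"g", "f"}).
type physFrame struct {
	pc      uintptr
	logical []string
}

// skipMarker plays the role of runtime.skipPleaseUseCallersFrames.
const skipMarker uintptr = 0x1000

func isMarker(pc uintptr) bool { return pc > skipMarker && pc < skipMarker+0x100 }

// callers counts skip in logical calls. If the skip ends part-way into a
// physical frame, it stores skipMarker+k in the second slot.
func callers(stack []physFrame, skip int) []uintptr {
	var out []uintptr
	for _, fr := range stack {
		if skip >= len(fr.logical) {
			skip -= len(fr.logical)
			continue
		}
		out = append(out, fr.pc)
		if skip > 0 {
			out = append(out, skipMarker+uintptr(skip))
			skip = 0
		}
	}
	return out
}

// expand decodes a pcs slice back into logical function names.
func expand(stack []physFrame, pcs []uintptr) []string {
	byPC := make(map[uintptr][]string, len(stack))
	for _, fr := range stack {
		byPC[fr.pc] = fr.logical
	}
	var names []string
	for i := 0; i < len(pcs); i++ {
		logical := byPC[pcs[i]] // nil for a marker value
		if i == 0 && len(pcs) > 1 && isMarker(pcs[1]) {
			logical = logical[int(pcs[1]-skipMarker):]
			i++ // the marker has been consumed
		}
		names = append(names, logical...)
	}
	return names
}

func main() {
	// f calls g, g calls runtime.Callers(2, pcs), and g is inlined into f.
	stack := []physFrame{
		{0xA, []string{"runtime.Callers"}},
		{0xB, []string{"g", "f"}},
		{0xC, []string{"main"}},
	}
	pcs := callers(stack, 2)
	fmt.Printf("%#x\n", pcs)            // [0xb 0x1001 0xc]
	fmt.Println(expand(stack, pcs))     // [f main]
	fmt.Println(expand(stack, pcs[1:])) // [main]
}
```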

Updates #19348.

Change-Id: I1c56f89ac48c29e6f52a5d085567c6d77d499cf1
Reviewed-on: https://go-review.googlesource.com/37854
Run-TryBot: David Lazar <lazard@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>

It seems that func Caller(skip int) in runtime/extern.go also needs to be updated for this change, as it currently calls findfunc(pc), similarly to FuncForPC.

Member

davidlazar commented Apr 7, 2017

Indeed. I have a CL that updates runtime.Caller but haven't mailed it out yet.

CL https://golang.org/cl/40270 mentions this issue.

lparth added a commit to lparth/go that referenced this issue Apr 13, 2017

runtime: handle inlined calls in runtime.Callers

gopherbot pushed a commit that referenced this issue Apr 18, 2017

runtime: skip logical frames in runtime.Caller
This rewrites runtime.Caller in terms of stackExpander, which already
handles inlined frames and partially skipped frames. This also has the
effect of making runtime.Caller understand cgo frames if there is a cgo
symbolizer.

Updates #19348.

Change-Id: Icdf4df921aab5aa394d4d92e3becc4dd169c9a6e
Reviewed-on: https://go-review.googlesource.com/40270
Run-TryBot: David Lazar <lazard@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
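A small usage sketch of what logical skipping buys a caller-reporting helper: `where` may be inlined, yet `runtime.Caller(1)` should still name the line that called it. The helper names `where` and `check` are hypothetical; `check` compares against `runtime.Caller(0)` on the following line.

```go
package main

import (
	"fmt"
	"path/filepath"
	"runtime"
)

// where reports the file and line of its caller. Because runtime.Caller's
// skip counts logical calls, inlining where should not change the result.
func where() (string, int) {
	_, file, line, ok := runtime.Caller(1) // 1 = the caller of where
	if !ok {
		return "unknown", 0
	}
	return filepath.Base(file), line
}

// check calls where and, on the very next line, records its own position,
// so the two line numbers should differ by exactly one.
func check() (whereLine, hereLine int) {
	_, whereLine = where()
	_, _, hereLine, _ = runtime.Caller(0)
	return whereLine, hereLine
}

func main() {
	file, line := where()
	fmt.Printf("called from %s:%d\n", file, line)
	w, h := check()
	fmt.Println(w == h-1) // true
}
```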
Contributor

cespare commented Jun 8, 2017

Is -l=4 going to be the default for Go 1.9?

Contributor

dr2chase commented Jun 8, 2017

Not yet; it has high compilation costs, largely because we need to be much pickier about how we read export data.

Owner

bradfitz commented Jun 8, 2017

Is -l=4 going to be the default for Go 1.9?

No.

Maybe in Go 1.10.

jadbox commented Jun 17, 2017

Is the recommendation then to use "-l=4" in Go 1.9 for production builds where runtime performance matters most?

Contributor

rsc commented Jun 19, 2017

No, -l=4 is explicitly untested and unsupported for production use. If you do that and you get programs that break, you get to keep both pieces.

@bradfitz bradfitz modified the milestones: Go1.9Maybe, Go1.10 Jul 20, 2017

Change https://golang.org/cl/74110 mentions this issue: cmd/compile: don't export unreachable inline method bodies

gopherbot pushed a commit that referenced this issue Oct 31, 2017

cmd/compile: don't export unreachable inline method bodies
Previously, anytime we exported a function or method declaration
(which includes methods for every type transitively exported), we
included the inline function bodies, if any. However, in many cases,
it's impossible (or at least very unlikely) for the importing package
to call the method.

For example:

    package p
    type T int
    func (t T) M() { t.u() }
    func (t T) u() {}
    func (t T) v() {}

T.M and T.u are inlineable, and they're both reachable through calls
to T.M, which is exported. However, T.v is also inlineable, but cannot
be reached.

Exception: if p.T is embedded in another type q.U, p.T.v will be
promoted to q.U.v, and the generated wrapper function could have
inlined the call to p.T.v. However, in practice, this doesn't happen,
and a missed inlining opportunity doesn't affect correctness.

To implement this, this CL introduces an extra flood fill pass before
exporting to mark inline bodies that are actually reachable, so the
exporter can skip over methods like T.v.
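A minimal flood-fill sketch over the package p example, assuming each inlineable function is described by whether it is exported and which functions its body calls. `inlBody`, `reachable`, and `examplePkg` are hypothetical names, not the compiler's; the embedding exception above is not modeled.

```go
package main

import (
	"fmt"
	"sort"
)

// inlBody describes an inlineable function for this sketch.
type inlBody struct {
	exported bool     // reachable directly from importing packages
	calls    []string // functions called from the inlineable body
}

// examplePkg is the package p example: M calls u, v is never called.
var examplePkg = map[string]inlBody{
	"T.M": {exported: true, calls: []string{"T.u"}},
	"T.u": {},
	"T.v": {},
}

// reachable flood-fills from exported inlineable functions through the
// calls in their bodies and returns the set whose bodies must be exported.
func reachable(funcs map[string]inlBody) map[string]bool {
	marked := map[string]bool{}
	var work []string
	for name, f := range funcs {
		if f.exported {
			work = append(work, name)
		}
	}
	for len(work) > 0 {
		name := work[len(work)-1]
		work = work[:len(work)-1]
		if marked[name] {
			continue
		}
		f, ok := funcs[name]
		if !ok {
			continue // not inlineable, so there is no body to export
		}
		marked[name] = true
		work = append(work, f.calls...)
	}
	return marked
}

func main() {
	var names []string
	for n := range reachable(examplePkg) {
		names = append(names, n)
	}
	sort.Strings(names)
	fmt.Println(names) // [T.M T.u]
}
```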

This measurably reduces Kubernetes build time on an HP Z620, as measured
by "time go build -a k8s.io/kubernetes/cmd/...":

    == before ==
    real    0m44.658s
    user    11m19.136s
    sys     0m53.844s

    == after ==
    real    0m41.702s
    user    10m29.732s
    sys     0m50.908s

It also significantly cuts down the cost of enabling mid-stack
inlining (-l=4):

    == before (-l=4) ==
    real    1m19.236s
    user    20m6.528s
    sys     1m17.328s

    == after (-l=4) ==
    real    0m59.100s
    user    13m12.808s
    sys     0m58.776s
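The size of these improvements can be checked with a little arithmetic over the timings above. This sketch parses the `time` output with `time.ParseDuration` and reports the percentage reduction and the overhead of `-l=4` relative to the default build; `reduction` and `overhead` are illustrative helpers. By this calculation, the CL cuts `-l=4` wall-clock time by about 25% and shrinks its overhead over the default build from roughly 77% to roughly 42%.

```go
package main

import (
	"fmt"
	"time"
)

// mustParse converts strings such as "0m44.658s" into a time.Duration.
func mustParse(s string) time.Duration {
	d, err := time.ParseDuration(s)
	if err != nil {
		panic(err)
	}
	return d
}

// reduction returns the percentage decrease from before to after.
func reduction(before, after string) float64 {
	b, a := mustParse(before), mustParse(after)
	return 100 * float64(b-a) / float64(b)
}

// overhead returns how much slower with is than without, in percent.
func overhead(without, with string) float64 {
	return 100 * (float64(mustParse(with))/float64(mustParse(without)) - 1)
}

func main() {
	fmt.Printf("default real: -%.1f%%\n", reduction("0m44.658s", "0m41.702s"))   // -6.6%
	fmt.Printf("default user: -%.1f%%\n", reduction("11m19.136s", "10m29.732s")) // -7.3%
	fmt.Printf("-l=4 real:    -%.1f%%\n", reduction("1m19.236s", "0m59.100s"))   // -25.4%
	fmt.Printf("-l=4 user:    -%.1f%%\n", reduction("20m6.528s", "13m12.808s"))  // -34.3%
	fmt.Printf("-l=4 overhead before: +%.0f%%\n", overhead("0m44.658s", "1m19.236s")) // +77%
	fmt.Printf("-l=4 overhead after:  +%.0f%%\n", overhead("0m41.702s", "0m59.100s")) // +42%
}
```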

Updates #19348.

Change-Id: Iade58233ca42af823a1630517a53848b5d3c7a7e
Reviewed-on: https://go-review.googlesource.com/74110
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
Contributor

cherrymui commented Nov 29, 2017

I guess we're not going to enable this by default for Go 1.10? @aclements

@cherrymui cherrymui modified the milestones: Go1.10, Go1.11 Nov 29, 2017
