
cmd/compile: enable mid-stack inlining #19348

Open
davidlazar opened this Issue Mar 1, 2017 · 57 comments

Comments

@davidlazar
Member

davidlazar commented Mar 1, 2017

Design doc: https://golang.org/design/19348-midstack-inlining

@davidlazar davidlazar added the Proposal label Mar 1, 2017

@davidlazar davidlazar self-assigned this Mar 1, 2017

@gopherbot

gopherbot commented Mar 1, 2017

CL https://golang.org/cl/37231 mentions this issue.

@gopherbot

gopherbot commented Mar 3, 2017

CL https://golang.org/cl/37233 mentions this issue.

gopherbot pushed a commit that referenced this issue Mar 3, 2017

cmd/compile,link: generate PC-value tables with inlining information
In order to generate accurate tracebacks, the runtime needs to know the
inlined call stack for a given PC. This creates two tables per function
for this purpose. The first table is the inlining tree (stored in the
function's funcdata), which has a node containing the file, line, and
function name for every inlined call. The second table is a PC-value
table that maps each PC to a node in the inlining tree (or -1 if the PC
is not the result of inlining).

To give the appearance that inlining hasn't happened, the runtime also
needs the original source position information of inlined AST nodes.
Previously the compiler plastered over the line numbers of inlined AST
nodes with the line number of the call. This meant that the PC-line
table mapped each PC to line number of the outermost call in its inlined
call stack, with no way to access the innermost line number.

Now the compiler retains line numbers of inlined AST nodes and writes
the innermost source position information to the PC-line and PC-file
tables. Some tools and tests expect to see outermost line numbers, so we
provide the OutermostLine function for displaying line info.

To keep track of the inlined call stack for an AST node, we extend the
src.PosBase type with an index into a global inlining tree. Every time
the compiler inlines a call, it creates a node in the global inlining
tree for the call, and writes its index to the PosBase of every inlined
AST node. The parent of this node is the inlining tree index of the
call. -1 signifies no parent.

For each function, the compiler creates a local inlining tree and a
PC-value table mapping each PC to an index in the local tree.  These are
written to an object file, which is read by the linker.  The linker
re-encodes these tables compactly by deduplicating function names and
file names.

This change increases the size of binaries by 4-5%. For example, this is
how the go1 benchmark binary is impacted by this change:

section             old bytes   new bytes   delta
.text               3.49M ± 0%  3.49M ± 0%   +0.06%
.rodata             1.12M ± 0%  1.21M ± 0%   +8.21%
.gopclntab          1.50M ± 0%  1.68M ± 0%  +11.89%
.debug_line          338k ± 0%   435k ± 0%  +28.78%
Total               9.21M ± 0%  9.58M ± 0%   +4.01%

Updates #19348.

Change-Id: Ic4f180c3b516018138236b0c35e0218270d957d3
Reviewed-on: https://go-review.googlesource.com/37231
Run-TryBot: David Lazar <lazard@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>

gopherbot pushed a commit that referenced this issue Mar 3, 2017

runtime: use inlining tables to generate accurate tracebacks
The code in https://play.golang.org/p/aYQPrTtzoK now produces the
following stack trace:

goroutine 1 [running]:
main.(*point).negate(...)
	/tmp/go/main.go:8
main.main()
	/tmp/go/main.go:14 +0x23

Previously the stack trace missed the inlined call:

goroutine 1 [running]:
main.main()
	/tmp/go/main.go:14 +0x23

Fixes #10152.
Updates #19348.

Change-Id: Ib43c67012f53da0ef1a1e69bcafb65b57d9cecb2
Reviewed-on: https://go-review.googlesource.com/37233
Run-TryBot: David Lazar <lazard@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
@dhananjay92

Member

dhananjay92 commented Mar 4, 2017

This is awesome.

I probably missed some discussion, but is there a design doc or proposal doc I can look at?

Out of curiosity, is there a plan to emit this information as part of DWARF? It would be a nice feature if debuggers can access the InlTree info (right now they can't print correct backtraces for inlined calls; I confirmed with 781fd39).

@davidlazar

Member

davidlazar commented Mar 6, 2017

There is an outdated proposal doc. I'll update and publish it this week. In the meantime, these slides give an overview of the approach: https://golang.org/s/go19inliningtalk

I haven't looked at the DWARF yet, but the plan is to add inlining info to the DWARF tables before we turn on mid-stack inlining for 1.9.

@gopherbot

gopherbot commented Mar 6, 2017

CL https://golang.org/cl/37854 mentions this issue.

@gopherbot

gopherbot commented Mar 10, 2017

CL https://golang.org/cl/38090 mentions this issue.

gopherbot pushed a commit to golang/proposal that referenced this issue Mar 11, 2017

design: add mid-stack inlining design doc
For golang/go#19348.

Change-Id: Ibf3e3817b35226a33d961e76fedb924e15e37069
Reviewed-on: https://go-review.googlesource.com/38090
Run-TryBot: David Lazar <lazard@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>

@gopherbot gopherbot added this to the Proposal milestone Mar 20, 2017

@rsc

Contributor

rsc commented Mar 27, 2017

It seems clear we're going to do this, assuming the right tuning (not yet done!). The tuning itself doesn't have to go through the proposal process. Accepting proposal.

@rsc rsc changed the title from proposal: mid-stack inlining in the Go compiler to cmd/compile: enable mid-stack inlining in the Go compiler Mar 27, 2017

@rsc rsc changed the title from cmd/compile: enable mid-stack inlining in the Go compiler to cmd/compile: enable mid-stack inlining Mar 27, 2017

@rsc rsc modified the milestones: Go1.9Maybe, Proposal Mar 27, 2017

gopherbot pushed a commit that referenced this issue Mar 29, 2017

runtime: handle inlined calls in runtime.Callers
The `skip` argument passed to runtime.Caller and runtime.Callers should
be interpreted as the number of logical calls to skip (rather than the
number of physical stack frames to skip). This changes runtime.Callers
to skip inlined calls in addition to physical stack frames.

The result value of runtime.Callers is a slice of program counters
([]uintptr) representing physical stack frames. If the `skip` parameter
to runtime.Callers skips part-way into a physical frame, there is no
convenient way to encode that in the resulting slice. To avoid changing
the API in an incompatible way, our solution is to store the number of
skipped logical calls of the first frame in the _second_ uintptr
returned by runtime.Callers. Since this number is a small integer, we
encode it as a valid PC value into a small symbol called:

    runtime.skipPleaseUseCallersFrames

For example, if f() calls g(), g() calls `runtime.Callers(2, pcs)`, and
g() is inlined into f, then the frame for f will be partially skipped,
resulting in the following slice:

    pcs = []uintptr{pc_in_f, runtime.skipPleaseUseCallersFrames+1, ...}

We store the skip PC in pcs[1] instead of pcs[0] so that `pcs[i:]` will
truncate the captured stack trace rather than grow it for all i.

Updates #19348.

Change-Id: I1c56f89ac48c29e6f52a5d085567c6d77d499cf1
Reviewed-on: https://go-review.googlesource.com/37854
Run-TryBot: David Lazar <lazard@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
@bilokurov

bilokurov commented Apr 7, 2017

It seems that func Caller(skip int) in runtime/extern.go also needs to be updated for this change, as it currently calls findfunc(pc), similarly to FuncForPC.

@davidlazar

Member

davidlazar commented Apr 7, 2017

Indeed. I have a CL that updates runtime.Caller but haven't mailed it out yet.

@gopherbot

gopherbot commented Apr 10, 2017

CL https://golang.org/cl/40270 mentions this issue.

lparth added a commit to lparth/go that referenced this issue Apr 13, 2017

runtime: handle inlined calls in runtime.Callers

gopherbot pushed a commit that referenced this issue Apr 18, 2017

runtime: skip logical frames in runtime.Caller
This rewrites runtime.Caller in terms of stackExpander, which already
handles inlined frames and partially skipped frames. This also has the
effect of making runtime.Caller understand cgo frames if there is a cgo
symbolizer.

Updates #19348.

Change-Id: Icdf4df921aab5aa394d4d92e3becc4dd169c9a6e
Reviewed-on: https://go-review.googlesource.com/40270
Run-TryBot: David Lazar <lazard@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
@cespare

Contributor

cespare commented Jun 8, 2017

Is -l=4 going to be the default for Go 1.9?

@dr2chase

Contributor

dr2chase commented Jun 8, 2017

Not yet; it has high compilation costs, largely because we need to be much pickier about how we read export data.

@bradfitz

Member

bradfitz commented Jun 8, 2017

Is -l=4 going to be the default for Go 1.9?

No.

Maybe in Go 1.10.

@dr2chase

Contributor

dr2chase commented May 31, 2018

See also: https://go-review.googlesource.com/c/go/+/109918

Quick summary of where we are:

  • calling panic no longer forbids inlining.
  • -l=4 gets midstack inlining, if you want to play with it. Binaries get bigger, compiles take longer. We're interested in feedback on how this works for people, especially when it doesn't work.
  • the compiler itself is not helped by midstack inlining; a -l=4-built (but not -l=4-compiling) compiler runs slower than normal.
  • some benchmarks speed up nicely

The bigger+slower compiler is worrisome, which is the main reason this is not enabled yet; if this happened to your binary, you'd not be happy. The minimum plan is to understand how to manage inlining so bigger+slower at least doesn't happen to the compiler, and hope that it generalizes. A more ambitious plan is to build some sort of a feedback framework so that it's clear where inlining would actually help, instead of just guessing. Or we could use machine learning....

@CAFxX

Contributor

CAFxX commented Jun 1, 2018

Is -l=4 still unsupported for production use? Or is it now supported for production but with potential performance regressions (like, say, -O3)?

@dr2chase

Contributor

dr2chase commented Jun 1, 2018

I am not sure of the official position, and it's not tested as much as it should be (i.e., I need to see about whether we can have a -l=4 test box) but it's supposed to at least execute correctly and we'd like to know when it doesn't, which I think is different from "you own the pieces". Debugging is also not well-tested for -l=4 binaries.

I've been rebenchmarking the compiler to check how inlining changes its performance, and the short answer is it isn't worse, but it isn't better, and without a noinline annotation on one method it doubles the size of the binary (with the annotation, it's only 50% larger). I don't think we want the noinline annotations to become part of common practice for using Go (we use them in tests, very helpful there), but on the other hand it can also be a good way of figuring out the sort of inlining mistakes the compiler needs to not make in order to turn this on in general.

@btracey

Contributor

btracey commented Jun 1, 2018

I ran some of our BLAS benchmarks with no significant effect. Do you know if functions with asm stubs can be inlined (assuming the build tags are such that the asm is not actually used)?

chromium-infra-bot pushed a commit to luci/luci-go that referenced this issue Jun 2, 2018

stringset: switch Set to a concrete type
There had been a plan to make a concurrency-safe Set implementation, but it's not
happening. Replace the interface Set with the concrete type set, which turns
method calls through an interface into direct calls that can be inlined by the
compiler.

Because of golang/go#19348, inline all the functions
(it doesn't make the code longer!) to ensure that most methods can be inlined.

Change-Id: Ib8dd36a16ab57c1383edd236ad62b78b4a10091c
Reviewed-on: https://chromium-review.googlesource.com/1083531
Reviewed-by: Andrii Shyshkalov <tandrii@chromium.org>
Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org>

@laboger

Contributor

laboger commented Jun 4, 2018

I've been rebenchmarking the compiler to check how inlining changes its performance, and the short answer is it isn't worse, but it isn't better, and without a noinline annotation on one method it doubles the size of the binary (with the annotation, it's only 50% larger).

Is this a statement for amd64 or for other GOARCHes too? I would expect to see more improvement on ppc64x because of the high cost of loading and storing the arguments and return values.

@ugorji

Contributor

ugorji commented Sep 22, 2018

Ping. Any chance this gets in for go 1.12?

@dr2chase

Contributor

dr2chase commented Sep 23, 2018

Someone needs to look into better heuristics, because the current rules tend to over-bloat the generated binary. Someone is not supposed to be me, though I really want it to happen.

@ugorji

Contributor

ugorji commented Oct 29, 2018

Phew - this feature may never get done ;(

@gopherbot

gopherbot commented Nov 5, 2018

Change https://golang.org/cl/147361 mentions this issue: cmd/compile: encourage inlining of functions with single-call bodies

gopherbot pushed a commit that referenced this issue Nov 8, 2018

cmd/compile: encourage inlining of functions with single-call bodies
This is a simple tweak to allow a bit more mid-stack inlining.
In cases like this:

func f() {
    g()
}

We'd really like to inline f into its callers. It can't hurt.

We implement this optimization by making calls a bit cheaper, enough
to afford a single call in the function body, but not 2.
The remaining budget allows for some argument modification, or perhaps
a wrapping conditional:

func f(x int) {
    g(x, 0)
}
func f(x int) {
    if x > 0 {
        g()
    }
}

Update #19348

Change-Id: Ifb1ea0dd1db216c3fd5c453c31c3355561fe406f
Reviewed-on: https://go-review.googlesource.com/c/147361
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: David Chase <drchase@google.com>
@dr2chase

Contributor

dr2chase commented Nov 12, 2018

Do we call this fixed (1.12) or work-in-progress (1.13)?
Either way, we're not done with inlining, but we're also unlikely to do more in 1.12.

@randall77

Contributor

randall77 commented Nov 12, 2018

I'm happy to punt any future work to 1.13.

@randall77 randall77 modified the milestones: Go1.12, Go1.13 Nov 12, 2018

@ugorji

Contributor

ugorji commented Nov 16, 2018

@dr2chase @randall77

https://golang.org/cl/147361 sets inlineExtraCallCost = 60.

I want to make an argument for setting inlineExtraCallCost = 56. It is a similarly conservative value: it preserves the original premise of allowing at most one call in an inlined function, keeps the increase in the cmd/compile and cmd/go binaries under 5% (as 60 does), and allows slightly more code to be inlined.

I captured most of my arguments in https://golang.org/cl/147361 , but want to capture it here in the issue so it doesn't get lost.

To illustrate, I will first show the cost increases for cmd/go and cmd/compile, for various settings of inlineExtraCallCost. Then I will show some typical sample code which costs just 1-4 more than the budget; switching from 60 to 56 will allow these functions to be inlined.

Cost increases for cmd/go and cmd/compile for various settings of inlineExtraCallCost

I updated $GOROOT/src/cmd/compile/internal/gc/inl.go to set inlineExtraCallCost to 80, 60, 56, 55, 54, 53, 50, 41, 40, 30, and 1; ran make.bash for each; and collected the sizes of $GOROOT/bin/go and $GOROOT/pkg/tool/darwin_amd64/compile. I then checked how each increased against the baseline of 80 (the value as of Go 1.11).

Results:

cc = 60: go: +2.945%, compile: +4.049%
cc = 56: go: +3.243%, compile: +4.992%
cc = 55: go: +3.362%, compile: +5.224%
cc = 54: go: +3.297%, compile: +12.178%
cc = 53: go: +3.354%, compile: +12.213%
cc = 50: go: +3.502%, compile: +12.352%
cc = 41: go: +4.133%, compile: +12.585%
cc = 40: go: +4.167%, compile: +12.621%
cc = 30: go: +4.802%, compile: +15.026%
cc = 1: go: +13.013%, compile: +32.246%

This shows that, up to about cc=55, the size increases are modest (in line with what cc=60 gives).

Typical sample code which costs just 1-4 more than the budget

The sample codebase which illustrates my usage is below. My library (github.com/ugorji/go/codec) is an encoder/decoder which can work off []byte or io.Reader/Writer.

//+build ignore

// To test this (assuming file is called inlining.go), use:
//
// go build -gcflags -m=2 inlining.go 2>&1 | grep "cannot inline" | grep -v "go:noinline"
// go run inlining.go
//
// Ideally, this is a buffered reader/writer, where you are reading/writing bytes a few at a time.
// If buffer holds 4096, and you read a token at a time (as in a decoder), then you
// may read 4096 times before having to fill again. Each read is just getting an element
// of an array, and incrementing a cursor.
//
// Paying the cost of a method call is too much.
// Yet that cost is paid, for the rare times that a fill() is needed.
//
// Note: inlineExtraCallCost=56 is best compromise, allowing some internal helper calls to be inlined.

package main

import (
	"fmt"
	"io"
)

type Rh struct {
	R
}

type R struct {
	cursor int
	avail  int
	bytes  bool
	buffer []byte
}

func main() {
	var r Rh
	r.buffer = make([]byte, 128)
	for i := range r.buffer {
		r.buffer[i] = 'A'
	}
	fmt.Printf("Rh.readn  5: %s\n", r.readn(5))
	fmt.Printf("Rh.readn 17: %s\n", r.readn(17))
	fmt.Printf("Rh.readn 96: %s\n", r.readn(96))

	fmt.Printf("Rh.writen:  %d\n", r.writen([]byte("hello")))
	fmt.Printf("Rh.writen2: %d\n", r.writen2('h', 'e'))
	fmt.Printf("Rh.writen22: %d\n", r.writen22('h', 'e'))
	fmt.Printf("R.writen2: %d\n", r.R.writen2('h', 'e'))
}

//go:noinline
func (r *R) fill() { // not inlineable
	// in reality, this reaches out to the network to fill the buffer
	r.avail = len(r.buffer)
	r.cursor = 0
}

//go:noinline
func (r *R) doWriten2(b1, b2 byte) {
}

// inlineable method - to see how it affects inlining cost
func (r *R) doReadn(n int) []byte { // inlineable // cost=38
	if r.avail == 0 { // cost=5
		panic(io.EOF) // cost=3
	}
	if n > r.avail { // cost=5
		panic(io.ErrUnexpectedEOF) //cost=3
	}
	r.avail -= n                           // cost=4
	r.cursor += n                          // cost=4
	return r.buffer[r.cursor-n : r.cursor] // slicing cost=9, return = ?
}

func (r *R) readn(n int) []byte { // cost=107
	if n > r.avail { // cost=5
		r.fill() // cost=63
	}
	return r.doReadn(n) // cost=39
}

// simulate accessing methods/fields of struct
func (r *R) writen2(b1, b2 byte) int { // cost=80
	if r.bytes { // cost=2
		r.buffer = append(r.buffer, b1, b2) // cost=8
	} else {
		r.doWriten2(b1, b2) // cost = 65 (call=60 + 2 args + ???)
	}
	return len(r.buffer) // cost=5 (return cost=1, len cost=4)
}

// simulate accessing methods/fields of embedded member
func (r *Rh) writen(b []byte) int { // cost=83
	if r.bytes { // cost=3
		r.buffer = append(r.buffer, b...) // cost=9
	} else {
		r.fill() // cost = 65
	}
	return len(r.buffer) // cost=6 (return cost=1, len cost=5)
}

// simulate accessing methods/fields of struct members
func (r *Rh) writen2(b1, b2 byte) int { // cost=86
	if r.R.bytes { // cost=3
		r.R.buffer = append(r.R.buffer, b1, b2) // cost=10
	} else {
		r.R.doWriten2(b1, b2) // cost = 67 (call=60 + 2 args + ???)
	}
	return len(r.R.buffer) // cost=6 (return cost=1, len cost=5)
}

// simulate accessing methods/fields of struct members
func (r *Rh) writen22(b1, b2 byte) int { // cost=88
	return r.R.writen2(b1, b2)
}

Running

go build -gcflags -m=2 inlining.go 2>&1 | grep "cannot inline" | grep -v "go:noinline"

We get

./inlining.go:76:6: cannot inline (*R).readn: function too complex: cost 107 exceeds budget 80
./inlining.go:94:6: cannot inline (*Rh).writen: function too complex: cost 83 exceeds budget 80
./inlining.go:104:6: cannot inline (*Rh).writen2: function too complex: cost 86 exceeds budget 80
./inlining.go:114:6: cannot inline (*Rh).writen22: function too complex: cost 88 exceeds budget 80
./inlining.go:36:6: cannot inline main: unhandled op RANGE

With inlineExtraCallCost=56, writen will be inlined. This allows us to do something similar in the code, i.e. inline the fast path while ensuring the slow path is not inlined, keeping the cost under 80 so the whole thing is inlined. This lets the append(...) and b[n] operations be inlined, eliding a function call on this fast path.

Currently, in github.com/ugorji/go/codec, in my critical path, I get:

./encode.go:999:6: cannot inline (*encWriterSwitch).writen1: function too complex: cost 81 exceeds budget 80
./encode.go:985:6: cannot inline (*encWriterSwitch).writeb: function too complex: cost 81 exceeds budget 80
./encode.go:1006:6: cannot inline (*encWriterSwitch).writen2: function too complex: cost 84 exceeds budget 80
./encode.go:992:6: cannot inline (*encWriterSwitch).writestr: function too complex: cost 81 exceeds budget 80

This is so close, and cc=56 would allow all these functions to be inlined.

It would be nice to validate that cc=55 or cc=56 is fair and possibly get it in for Go 1.12.

Thanks much!

@cristaloleg

cristaloleg commented Nov 16, 2018

I have a not-so-idiomatic idea, but can we just export inlineExtraCallCost as a GOINLINEEXPERIMENT (like it was for vendoring) and give it to those who need it? 😃

@dr2chase

Contributor

dr2chase commented Nov 16, 2018

Thanks for doing this, I will run a bunch of benchmarks over the weekend to see how it generalizes.
It would be really interesting to know what happened between 55 and 54.

@dr2chase

Contributor

dr2chase commented Nov 19, 2018

How do you feel about 57?

I ran my pile of selected benchmarks from github over the weekend, "stuff happens" for a couple of them at 56 but not at 57. There seems to be minor improvement at 57 over 60, though most changes are indistinguishable from noise. Making sense of why things sometimes gets notably worse would be interesting.

@ugorji, how much faster does your code run with the lower call cost for inlining?

Summary of binary sizes, compile times, and benchmark runs

@ugorji

Contributor

ugorji commented Nov 19, 2018

Thanks @dr2chase I will run my code tomorrow with 57 and report on my findings.

Also, is it possible to share your summaries outside google, or at least share with me directly so I can view - email is ugorji @ gmail dot com .

Thanks.

@ugorji

Contributor

ugorji commented Nov 19, 2018

@dr2chase

Ran my code with inlineExtraCallCost=80, 60, and 57, captured my benchmark runtimes, and did some analysis:

# running with cc=80, and checking for k=60, k=57
(compared to cc=80)    cc = 60: bytes: -3.107%, io-static-buf: -2.470%, io-dynamic-buf: -1.545%
(compared to cc=80)    cc = 57: bytes: -8.395%, io-static-buf: -7.246%, io-dynamic-buf: -6.676%

# running with cc=60, and checking for k=57
(compared to cc=60)    cc = 57: bytes: -5.457%, io-static-buf: -4.896%, io-dynamic-buf: -5.211%

In plain english, with cc=57, my usecase runs about 8.4% faster compared to cc=80, and about 5.5% faster compared to cc=60, for the common case where folks just want to encode into a []byte (not io.Reader). This is a significant performance improvement in my use-case, and encourages folks to not do the codecgen path (which kubernetes did previously before they moved to another library and etcd still does).

Thanks so much for taking the time to investigate.

The simple script I used is below:

declare -a zb zi zf
# runtimes for cc=80, 60 and 57 below
zb[80]=3786935
zi[80]=4333331
zf[80]=4195797
zb[60]=3669257
zi[60]=4226259
zf[60]=4130957
zb[57]=3469019
zi[57]=4019335
zf[57]=3915682

cc=80
for k in 60 57
do
  b=$(bc -l <<< "scale=3;(${zb[${k}]}-${zb[${cc}]})*100/${zb[${cc}]}")
  i=$(bc -l <<< "scale=3;(${zi[${k}]}-${zi[${cc}]})*100/${zi[${cc}]}")
  f=$(bc -l <<< "scale=3;(${zf[${k}]}-${zf[${cc}]})*100/${zf[${cc}]}")
  echo "(compared to cc=${cc})    cc = $k: bytes: ${b}%, io-static-buf: ${i}%, io-dynamic-buf: ${f}%"
done

@dr2chase

Contributor

dr2chase commented Nov 19, 2018

It is supposed to be possible to share that, but I managed not to.
I'm not sure how I did it in the past.
Here is a PDF:
Fine tuning inline call cost parameter.pdf

@ugorji

Contributor

ugorji commented Nov 26, 2018

Thanks @dr2chase for the extremely detailed analysis you did.

Looking forward to the CL.

@ugorji

Contributor

ugorji commented Nov 30, 2018

@dr2chase @khr

Any chance we get cc=57 in by beta?

@gopherbot

gopherbot commented Nov 30, 2018

Change https://golang.org/cl/151977 mentions this issue: cmd/compile: decrease inlining call cost from 60 to 57

gopherbot pushed a commit that referenced this issue Dec 1, 2018

cmd/compile: decrease inlining call cost from 60 to 57
A Go user made a well-documented request for a slightly
lower threshold.  I tested against a selection of other
people's benchmarks, and saw a tiny benefit (possibly noise)
at equally tiny cost, and no unpleasant surprises observed
in benchmarking.

I.e., might help, doesn't hurt, low risk, request was
delivered on a silver platter.

It did, however, change the behavior of one test because
now bytes.Buffer.Grow is eligible for inlining.

Updates #19348.

Change-Id: I85e3088a4911290872b8c6bda9601b5354c48695
Reviewed-on: https://go-review.googlesource.com/c/151977
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>