cmd/compile: improve inlining cost model #17566
josharian added the Performance and ToolSpeed labels and the Go1.9 milestone on Oct 24, 2016.
josharian referenced this issue from #16706 (closed), "cmd/go: add test to ensure upx can compress our binaries", on Oct 24, 2016.
I'm going to toss in a few more ideas to consider.

An unexported function that is only called once is often cheaper to inline.

Functions that include tests of parameter values can be cheaper to inline for specific calls that pass constant arguments for those parameters. That is, the cost of inlining is not determined solely by the function itself; it is also determined by the nature of the call.

Functions that only make function calls in error cases, which is fairly common, can be cheaper to handle as a mix of inlining and outlining: you inline the main control flow but leave the error handling in a separate function. This may be particularly worth considering when inlining across packages, as the export data only needs to include the main control flow. (Error cases are detectable as the control flow blocks that return a non-nil value for a single result parameter of type error.)

One of the most important optimizations for large programs is feedback-directed optimization, also known as profile-guided optimization. One of the most important lessons to learn from feedback/profiling is which functions are worth inlining, both on a per-call basis and on a "most calls pass X as argument N" basis. Therefore, while we have no FDO/PGO framework at present, any work on inlining should consider how to incorporate information gleaned from such a framework once it exists.

Pareto optimal is a nice goal, but I suspect it is somewhat unrealistic. It's almost always possible to find a horrible decision made by any specific algorithm, but the algorithm can still be better on realistic benchmarks.
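The idea that inlining cost depends on the call as well as the callee can be sketched as a cost function that takes both a callee summary and facts about one call site. Everything here is hypothetical: the `calleeSummary` and `callSite` types, the simplifying assumption that code guarded by a test on a constant argument becomes entirely dead, and the arbitrary halving for an unexported function with a single call site are illustrations, not the gc compiler's model.

```go
package main

import "fmt"

// calleeSummary is a hypothetical per-function summary an inliner might keep.
type calleeSummary struct {
	exported  bool
	callSites int // static call sites of this function in the package
	bodyCost  int // cost of the whole body, e.g. a node count
	// guardedCost[i] is the cost of code reachable only through a test of
	// parameter i. Simplifying assumption: a constant argument for parameter i
	// selects the cheap side, so all of that code is dead at this call site.
	guardedCost map[int]int
}

// callSite records which parameters receive constant arguments at one call.
type callSite struct {
	constArgs []int
}

// inlineCost returns a call-site-specific cost estimate.
func inlineCost(c calleeSummary, s callSite) int {
	cost := c.bodyCost
	for _, i := range s.constArgs {
		cost -= c.guardedCost[i]
	}
	if !c.exported && c.callSites == 1 {
		// The out-of-line copy can be discarded after inlining, so the net
		// growth is smaller; halving is an arbitrary stand-in for that.
		cost /= 2
	}
	if cost < 0 {
		cost = 0
	}
	return cost
}

func main() {
	f := calleeSummary{exported: true, callSites: 3, bodyCost: 100, guardedCost: map[int]int{0: 60}}
	fmt.Println(inlineCost(f, callSite{}))                     // 100
	fmt.Println(inlineCost(f, callSite{constArgs: []int{0}})) // 40
}
```

The same callee gets different costs at different call sites, which is the point: a single per-function number cannot capture this.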
A common case where this would apply is when calling marshallers/unmarshallers that use
Along the lines of what iant@ said, it's common for C++ compilers to take into account whether a call site appears in a loop (and thus might be "hotter"). This can help for toolchains that don't support FDO/PGO, or for applications in which FDO/PGO is not being used.
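A minimal sketch of loop-aware budgeting, assuming a hypothetical rule in which each enclosing loop raises the inlining budget by half of the base, capped at three levels. The `budgetFor` and `shouldInline` helpers, the factor, the cap, and the base budget of 80 are all illustrative assumptions; loop depth stands in as a static proxy for hotness when no profile is available.

```go
package main

import "fmt"

// budgetFor returns a hypothetical inlining budget for a call site nested
// loopDepth loops deep. Capping the depth keeps deeply nested loops from
// justifying unbounded code growth.
func budgetFor(base, loopDepth int) int {
	if loopDepth > 3 {
		loopDepth = 3
	}
	return base + loopDepth*base/2
}

// shouldInline compares a callee's cost against the site-specific budget.
func shouldInline(cost, base, loopDepth int) bool {
	return cost <= budgetFor(base, loopDepth)
}

func main() {
	const base = 80 // hypothetical base budget
	fmt.Println(shouldInline(100, base, 0)) // false: outside any loop
	fmt.Println(shouldInline(100, base, 1)) // true: budget is 120 inside one loop
}
```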
No pragmas that mandate inlining, please.
I already expressed my dislike for //go:noinline, and I will firmly object to any proposal for //go:mustinline or anything like it, even if it's limited to the runtime.
If we can't find good heuristics for the runtime package, I don't think they will handle real-world cases well.
Also, we first need to somehow fix the traceback for inlined non-leaf functions.
Another idea for the inlining decision is how much simpler the function body could become if inlined. This applies especially to functions that use reflect and have fast paths: if the input type matches the fast path, the inlined version might be really simple, even though the function itself is very complicated.
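As a hypothetical illustration of the fast-path point, consider a general formatter whose body is dominated by a reflect-based slow path but which starts with cheap type-switch cases. At a call site whose argument is statically a `string`, only the first case can apply, so an inliner able to fold the type switch after inlining would copy a few instructions rather than the whole body. The function name is invented, and this sketch makes no claim about what the gc compiler actually folds.

```go
package main

import (
	"fmt"
	"reflect"
	"strconv"
	"strings"
)

// describe has cheap fast paths for common types and a reflect-based slow path.
// A whole-body cost model charges for all of it, even at call sites where
// the argument type guarantees a fast path is taken.
func describe(v interface{}) string {
	switch x := v.(type) {
	case string:
		return x // fast path
	case int:
		return strconv.Itoa(x) // fast path
	}
	// Slow path: general reflection-based formatting.
	rv := reflect.ValueOf(v)
	if rv.Kind() == reflect.Slice {
		parts := make([]string, rv.Len())
		for i := range parts {
			parts[i] = fmt.Sprint(rv.Index(i).Interface())
		}
		return "[" + strings.Join(parts, " ") + "]"
	}
	return fmt.Sprint(v)
}

func main() {
	fmt.Println(describe("hi"))            // hi
	fmt.Println(describe([]int{1, 2, 3})) // [1 2 3]
}
```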
Couldn't we obtain a minor improvement in the cost model by measuring the size of the generated assembly? It would require preserving a copy of the tree until after compilation, and compiling bottom-up (the same way inlining is scheduled), but it would give a more accurate measure. There's a moderate chance of being able to judge how much constant parameters help at the SSA level, too. Note that this would require rearranging all of these transformations (inlining, escape analysis, closure conversion, compilation) to run one function, or one nest of mutually recursive functions, at a time, so that the results of compiling the bottom-most functions all the way to assembly would be available to inform inlining at the next level up.

I have also considered this. There'd be a lot of high-risk work rearranging the rest of the compiler to work this way. It could also hurt our chances of getting a big boost out of concurrent compilation: you want to start on the biggest, slowest functions ASAP, but those are the most likely to depend on many other functions.
It doesn't look that high-risk to me; it's just another iteration order. SSA also gives us a slightly more tractable place to compute things like "constant parameter values that shrink code size", even if only as crudely as looking for blocks directly conditional on comparisons with parameter values.
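The iteration order in question can be made concrete: visit the strongly connected components (SCCs) of the call graph so that every callee is handled before its callers, treating mutually recursive functions as one unit. This sketch uses Tarjan's algorithm on a hypothetical map-based call graph; in the proposed design, each emitted component would be compiled to machine code and its measured size recorded before its callers are considered for inlining.

```go
package main

import "fmt"

// bottomUp returns the SCCs of a call graph ordered so that each component
// comes after every component it calls. Tarjan's algorithm emits SCCs in
// exactly that (reverse topological) order.
func bottomUp(calls map[string][]string, roots []string) [][]string {
	index := map[string]int{}
	low := map[string]int{}
	onStack := map[string]bool{}
	var stack []string
	var out [][]string
	next := 0

	var visit func(f string)
	visit = func(f string) {
		index[f], low[f] = next, next
		next++
		stack = append(stack, f)
		onStack[f] = true
		for _, g := range calls[f] {
			if _, seen := index[g]; !seen {
				visit(g)
				low[f] = minInt(low[f], low[g])
			} else if onStack[g] {
				low[f] = minInt(low[f], index[g])
			}
		}
		if low[f] == index[f] {
			// f is the root of an SCC: pop it and everything above it.
			var scc []string
			for {
				g := stack[len(stack)-1]
				stack = stack[:len(stack)-1]
				onStack[g] = false
				scc = append(scc, g)
				if g == f {
					break
				}
			}
			out = append(out, scc)
		}
	}
	for _, r := range roots {
		if _, seen := index[r]; !seen {
			visit(r)
		}
	}
	return out
}

func minInt(a, b int) int {
	if a < b {
		return a
	}
	return b
}

func main() {
	calls := map[string][]string{
		"main": {"a"},
		"a":    {"b", "c"},
		"b":    {"a"}, // a and b are mutually recursive
	}
	for _, scc := range bottomUp(calls, []string{"main"}) {
		// Here each SCC would be compiled and its size recorded
		// before any of its callers is considered for inlining.
		fmt.Println(scc)
	}
	// Prints [c], then [b a], then [main].
}
```

Note that this order is inherently sequential along call chains, which is the concurrency concern raised above: a large caller cannot start until its callees are done.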
I think we could test the inlining benefits of bottom-up compilation pretty easily. One way is to do it just for inter-package compilation (as suggested above); another is to hack cmd/compile to dump each function's assembly size somewhere, and then hack cmd/go to compile all packages twice, using the dumped sizes in the second round.
Out of curiosity, why "often"? I can't think of a case, off the top of my head, in which the contrary is true. Also, just to understand, in
It is not true when the code looks like
Because in the normal case where you don't need to call
In package main, yes.
Oh I see, that makes sense. It would be nice (in other cases too) if setting up the stack frame could be sunk into the if, but it likely wouldn't be worth the extra effort.
The tyranny of unit-at-a-time :D
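A hypothetical sketch of the kind of code described, where inlining a once-called unexported function does not pay off: the only call sits on a cold branch, so inlining removes a call that almost never runs while making the hot function larger and possibly giving it a bigger stack frame that every call pays for. The names `verbose`, `report`, and `process` are invented for illustration.

```go
package main

import (
	"fmt"
	"os"
)

// verbose is a hypothetical flag that is rarely true in practice.
var verbose = false

// report is unexported and has exactly one caller, but that call is on a
// cold path. Imagine its body being much larger than this.
func report(sum int, xs []int) {
	fmt.Fprintf(os.Stderr, "sum=%d over %d values\n", sum, len(xs))
	for i, x := range xs {
		fmt.Fprintf(os.Stderr, "  [%d] %d\n", i, x)
	}
}

// process is the hot function. Inlining report here would grow process
// even though the common path never executes the call.
func process(xs []int) int {
	sum := 0
	for _, x := range xs {
		sum += x
	}
	if verbose {
		report(sum, xs)
	}
	return sum
}

func main() {
	fmt.Println(process([]int{1, 2, 3})) // 6
}
```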
RalphCorderoy commented on Jan 18, 2017:
Functions that start with a run of
@RalphCorderoy I've been thinking about the same kind of function body "chunking" for early returns. It is especially interesting for quick paths where the slow path is too big to inline. Unless the compiler does the chunking, I presume it's up to the developer to split the function in two.
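A minimal sketch of the manual split, assuming the fast path is "value already initialized": `Get` holds only the early-return check, and the locking slow path lives in a separate method, so the part an inliner would have to copy into callers is tiny. The `lazyInt` type is hypothetical, and whether `Get` is actually inlined depends on whether the compiler in use inlines functions that contain calls, which this thread treats as an open question for non-leaf functions.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// lazyInt computes a value once, on first use.
type lazyInt struct {
	done    uint32 // set to 1 (atomically) once val is ready
	mu      sync.Mutex
	val     int
	compute func() int
}

// Get is the fast path: a single atomic load and an early return.
func (l *lazyInt) Get() int {
	if atomic.LoadUint32(&l.done) == 1 {
		return l.val
	}
	return l.getSlow()
}

// getSlow is the rarely taken path, kept out of Get so Get stays small.
func (l *lazyInt) getSlow() int {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.done == 0 { // re-check under the lock
		l.val = l.compute()
		atomic.StoreUint32(&l.done, 1)
	}
	return l.val
}

func main() {
	calls := 0
	v := &lazyInt{compute: func() int { calls++; return 42 }}
	a, b := v.Get(), v.Get()
	fmt.Println(a, b, calls) // 42 42 1
}
```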
RalphCorderoy commented on Jan 19, 2017:
Hi @mvdan, split the function in two with the intention that the compiler then inlines the first, non-leaf one?
Yes, for example, here
Too late for 1.9.
josharian changed the milestone from Go1.9 to Go1.10 on May 18, 2017.
OneOfOne referenced this issue from #21536 (closed), "Proposal: cmd/compile: add a go:inline directive", on Aug 19, 2017.
gopherbot commented on Aug 20, 2017:
Change https://golang.org/cl/57410 mentions this issue.
A commit that references this issue was pushed on Aug 22, 2017.
mvdan referenced this issue from #21851 (closed), "cmd/compile: expand TestIntendedInlining to more packages and funcs", on Sep 12, 2017.
Another example of the current inlining heuristic punishing more readable code:
is more "expensive" than
This is based on real code from the regexp package (see https://go-review.googlesource.com/c/go/+/65491).
josharian commented on Oct 24, 2016:
The current inlining cost model is simplistic. Every gc.Node in a function has a cost of one. However, the actual impact of each node varies. Some nodes (OKEY) are placeholders that never generate any code. Some nodes (OAPPEND) generate lots of code.
In addition to leading to bad inlining decisions, this design means that any refactoring that changes the AST structure can have unexpected and significant impact on compiled code. See CL 31674 for an example.
Inlining occurs near the beginning of compilation, which makes good predictions hard. For example, `new`, `make`, or `&` might allocate (a large runtime call, with much code generated) or not (near zero code generated). As another example, code guarded by `if false` still gets counted. As a third example, we don't know whether bounds checks (which generate lots of code) will be eliminated or not.

One approach is to hand-write a better cost model: append is very expensive, things that might end up in a runtime call are moderately expensive, and pure structure and variable/const declarations are cheap or free.
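A minimal sketch contrasting the uniform one-per-node count with a hand-written weight table. The `Op` type below is a simplified stand-in: the names mirror real gc ops, but the op set and every weight are hypothetical choices that follow the rough guidance above (append expensive, possible runtime calls moderate, keys and declarations free).

```go
package main

import "fmt"

// Op is a simplified stand-in for gc node ops.
type Op int

const (
	OKEY     Op = iota // key in a composite literal: a placeholder, no code
	ODCL               // variable declaration
	ONAME              // reference to a name
	OLITERAL           // constant
	OAS                // assignment
	OADD               // addition
	ONEW               // may or may not allocate via a runtime call
	OAPPEND            // grows a slice: potentially lots of code
)

// uniformCost is the current model: every node costs 1.
func uniformCost(nodes []Op) int { return len(nodes) }

// weights is a hypothetical hand-written table; ops not listed cost 1.
var weights = map[Op]int{
	OKEY:    0,
	ODCL:    0,
	ONEW:    4,
	OAPPEND: 10,
}

// weightedCost sums per-op weights instead of counting nodes.
func weightedCost(nodes []Op) int {
	total := 0
	for _, op := range nodes {
		w, ok := weights[op]
		if !ok {
			w = 1
		}
		total += w
	}
	return total
}

func main() {
	// A struct literal with two keyed fields.
	lit := []Op{OKEY, OLITERAL, OKEY, OLITERAL}
	// x = append(x, y)
	app := []Op{OAS, ONAME, OAPPEND, ONAME, ONAME}
	fmt.Println(uniformCost(lit), weightedCost(lit)) // 4 2
	fmt.Println(uniformCost(app), weightedCost(app)) // 5 14
}
```

Under the uniform model the two snippets look almost equally expensive (4 vs. 5), while the weighted model separates them sharply, which is the kind of distinction the issue asks for.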
Another approach is to compile lots of code and generate a simple machine-built model (e.g. linear regression) from it.
I have tried both of these approaches, and believe both of them to be improvements, but did not mail either of them, for two reasons:
Three other related ideas:
cc @dr2chase @randall77 @ianlancetaylor @mdempsky