cmd/compile: enable mid-stack inlining #19348
Comments
davidlazar
added
the
Proposal
label
Mar 1, 2017
davidlazar
self-assigned this
Mar 1, 2017
gopherbot
commented
Mar 1, 2017
CL https://golang.org/cl/37231 mentions this issue.
josharian
referenced this issue
Mar 2, 2017
Open
cmd/compile: inline forwarding thunk functions #8421
gopherbot
commented
Mar 3, 2017
CL https://golang.org/cl/37233 mentions this issue.
pushed a commit
that referenced
this issue
Mar 3, 2017
pushed a commit
that referenced
this issue
Mar 3, 2017
dhananjay92
Mar 4, 2017
Member
This is awesome.
I probably missed some discussion, but is there a design doc or proposal doc I can look at?
Out of curiosity, is there a plan to emit this information as part of DWARF? It would be a nice feature if debuggers can access the InlTree info (right now they can't print correct backtraces for inlined calls; I confirmed with 781fd39).
davidlazar
Mar 6, 2017
Member
There is an outdated proposal doc. I'll update and publish it this week. In the meantime, these slides give an overview of the approach: https://golang.org/s/go19inliningtalk
I haven't looked at the DWARF yet, but the plan is to add inlining info to the DWARF tables before we turn on mid-stack inlining for 1.9.
gopherbot
commented
Mar 6, 2017
CL https://golang.org/cl/37854 mentions this issue.
gopherbot
commented
Mar 10, 2017
CL https://golang.org/cl/38090 mentions this issue.
pushed a commit
to golang/proposal
that referenced
this issue
Mar 11, 2017
gopherbot
added this to the Proposal milestone
Mar 20, 2017
rsc
Mar 27, 2017
Contributor
It seems clear we're going to do this, assuming the right tuning (not yet done!). The tuning itself doesn't have to go through the proposal process. Accepting proposal.
rsc
added
the
Proposal-Accepted
label
Mar 27, 2017
rsc
changed the title from
proposal: mid-stack inlining in the Go compiler
to
cmd/compile: enable mid-stack inlining in the Go compiler
Mar 27, 2017
rsc
changed the title from
cmd/compile: enable mid-stack inlining in the Go compiler
to
cmd/compile: enable mid-stack inlining
Mar 27, 2017
rsc
modified the milestones:
Go1.9Maybe,
Proposal
Mar 27, 2017
pushed a commit
that referenced
this issue
Mar 29, 2017
bilokurov
Apr 7, 2017
It seems that func Caller(skip int) in runtime/extern.go also needs to be updated for this change, as it currently calls findfunc(pc), similarly to FuncForPC.
davidlazar
Apr 7, 2017
Member
Indeed. I have a CL that updates runtime.Caller but haven't mailed it out yet.
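For readers following the `runtime.Caller` / `FuncForPC` point above: once calls can be inlined mid-stack, a single program counter may stand for several logical frames. The sketch below is not from the CL mentioned here; it only illustrates walking the stack with `runtime.Callers` plus `runtime.CallersFrames`, which is the API the runtime documents for getting per-frame function names. The helper names `callerNames`, `leaf`, and `middle` are made up for the example.

```go
package main

import (
	"fmt"
	"runtime"
)

// callerNames returns the function names on the current call stack,
// starting with the caller of callerNames. (Hypothetical helper.)
func callerNames() []string {
	pcs := make([]uintptr, 16)
	// Skip 2 frames: runtime.Callers itself and callerNames.
	n := runtime.Callers(2, pcs)
	frames := runtime.CallersFrames(pcs[:n])

	var names []string
	for {
		frame, more := frames.Next()
		names = append(names, frame.Function)
		if !more {
			break
		}
	}
	return names
}

// leaf and middle are small enough that a compiler may inline them;
// CallersFrames is expected to report them as separate frames either way.
func leaf() []string { return callerNames() }

func middle() []string { return leaf() }

func main() {
	for _, name := range middle() {
		fmt.Println(name)
	}
}
```

Code that maps a raw PC to one function with `runtime.FuncForPC` sees only the outermost function at that PC, which is why iterating over frames is the safer pattern here.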
mdwhatcott
referenced this issue
Apr 7, 2017
Closed
failure_report.go: Go 1.9 "runtime" package API modifications #7
gopherbot
commented
Apr 10, 2017
CL https://golang.org/cl/40270 mentions this issue.
added a commit
to lparth/go
that referenced
this issue
Apr 13, 2017
pushed a commit
that referenced
this issue
Apr 18, 2017
|
Is |
dr2chase
Jun 8, 2017
Contributor
Not yet, it has high compilation costs, largely because we need to be much pickier about how we read export data.
No. Maybe in Go 1.10.
This was referenced Jun 14, 2017
jadbox
Jun 17, 2017
Is the recommendation then to use "-l=4" in Go 1.9 for production builds where runtime performance is ideal?
rsc
Jun 19, 2017
Contributor
No, -l=4 is explicitly untested and unsupported for production use. If you do that and you get programs that break, you get to keep both pieces.
bradfitz
modified the milestones:
Go1.9Maybe,
Go1.10
Jul 20, 2017
bcmills
referenced this issue
Jul 31, 2017
Closed
proposal: hash: export a built-in hash function for comparable values #21195
petermattis
referenced this issue
Aug 22, 2017
Open
util/log: use runtime.CallersFrames instead of runtime.Callers #17770
mdempsky
referenced this issue
Oct 17, 2017
Open
cmd/compile: odd inlining heuristic under mid-stack inlining #22310
gopherbot
Oct 27, 2017
Change https://golang.org/cl/74110 mentions this issue: cmd/compile: don't export unreachable inline method bodies
pushed a commit
that referenced
this issue
Oct 31, 2017
cherrymui
Nov 29, 2017
Contributor
I guess we're not going to enable this by default for Go 1.10? @aclements
cherrymui
modified the milestones:
Go1.10,
Go1.11
Nov 29, 2017
ugorji
Feb 26, 2018
Contributor
Hi, is this still planned for Go 1.11? It's now been about 3 months since it was punted to Go 1.11, and about a year since this was published and prototyped. It would be nice to get this in for Go 1.11. At a minimum, it makes reflection faster (all those reflect methods that are non-inlineable because they contain a panic will now be inlined), which makes many heavily used things faster (printing via fmt, JSON encoding, etc.), and it should give most libraries a significant jump in performance by eliding the function call overhead for delegate functions and the like.
I know there was talk of export format changes blocking this. Are those done yet?
Thanks. I am writing this because it's in my list of things I am excited about for go 1.11, along with faster defer and support for co-operative coroutines (i.e. scheduler optimizing case where 2 goroutines serve as producer and consumer on a chan and can be scheduled "together" instead of each send/receive doing round-robin over all goroutines). Rust is also getting extremely compelling by September 2018, and it would be nice that performance is "comparable".
As they say, optimizations drive people to code the right way, as they don't see a loss. Without mid-stack inlining, I have written code where I have "manually inlined" functions to get better performance in my library, or I never use defer in my libraries because of the performance hit. That's the kind of mental overload that I would like to avoid.
randall77
Feb 26, 2018
Contributor
We'd like to get this done for 1.11.
I think the needed export format changes are done. I think Matthew has some more changes lined up to make things better (compile-time faster), but at this point they aren't blockers.
The major TODO at this point is to tune the inlining heuristic. Mid-stack inlining helps runtime speed, but it can make binaries bigger. A lot bigger in some cases; cmd/compile's text segment gets ~100% bigger. I don't think that's launchable as-is, so we need to figure out the right way to tweak the heuristics to preserve as much speed as we can while keeping binary size manageable. Ideas welcome; there's no obvious plan of attack here.
Yes, we'd definitely like to get rid of all the situations where people have had to manually inline things.
mvdan
Feb 26, 2018
Member
Has any thought been given to enabling a conservative version of mid-stack inlining in 1.11? That is, only doing the extra inlining where it means little or no increase in binary size.
dlsniper
Feb 26, 2018
Contributor
@randall77 would you consider having a conservative version, as @mvdan suggested, but allowing users to also experiment with this by providing a compiler directive, like //go:inline, which could perform the inline but only up to a max defined complexity?
randall77
Feb 26, 2018
Contributor
@mvdan: That's an option. It's not trivial to do, though, as at inlining decision time we don't know what the final binary size difference is going to end up being. We have to more or less guess based on the info we do have (# and kind of AST nodes).
@dlsniper: I'd like to avoid a //go:inline comment if we can. I don't think it solves the problem well, as the inlining decision should probably depend on characteristics of the call site (e.g. in a loop, constant arguments, etc.), not just the function being called.
mvdan
Feb 26, 2018
Member
I was thinking conservative in terms of the heuristic. For example, every extra level of inlining could increase the cost of the function by a constant, or by a percent.
I assume that this will come down to lots of testing and gathering of data, though. I'm not sure how useful it is to throw ideas at this issue before then :)
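To make the "cost grows with each extra level" idea concrete, here is a minimal sketch. It is not how cmd/compile computes inlining cost; `effectiveCost`, `canInline`, the percentage penalty, and the budget value are all assumptions chosen for illustration.

```go
package main

import "fmt"

// effectiveCost applies a hypothetical per-level penalty: each extra level
// of inlining depth inflates the callee's cost by penaltyPercent.
func effectiveCost(baseCost, depth, penaltyPercent int) int {
	cost := baseCost
	for i := 0; i < depth; i++ {
		cost += cost * penaltyPercent / 100
	}
	return cost
}

// canInline reports whether the penalized cost still fits the budget.
func canInline(baseCost, depth, penaltyPercent, budget int) bool {
	return effectiveCost(baseCost, depth, penaltyPercent) <= budget
}

func main() {
	const budget = 80 // assumed budget, for illustration only
	for depth := 0; depth <= 3; depth++ {
		fmt.Printf("depth=%d cost=%d inline=%v\n",
			depth, effectiveCost(60, depth, 25), canInline(60, depth, 25, budget))
	}
}
```

A constant penalty per level would work the same way with `cost += penalty` in the loop; which of the two behaves better is exactly the kind of question that needs measurement.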
CAFxX
Feb 26, 2018
Contributor
Maybe a silly idea but... What if, at least for this first version, mid-stack inlining were enabled only for functions that are transitively statically reachable from a benchmark function in the same package? It would be nice to extend this to use actual profiling information in the future.
CAFxX
Feb 26, 2018
Contributor
@randall77 what if the //go:inline was used to mark the callsite instead?
randall77
Feb 26, 2018
Contributor
Transitively reachable from a benchmark function sounds problematic. When doing a non-test build, the compiler never sees _test.go files, which is where all the Benchmark functions tend to be. And having the existence of a Benchmark function affect the performance of the function being benchmarked sounds like a recipe for HeisenBugs. On the plus side, though, it would encourage the writing of Benchmark functions.
The compiler has no support for //go: directives at statement or expression scope, only at global scope. Not that it couldn't be added, but it's significant work.
chewxy
Feb 26, 2018
Is there a way to track the inlined code such that #14840 would then be useful and eliminate more deadcode? The inlining process is going to touch the linker anyway, might as well make it useful?
randall77
Feb 26, 2018
Contributor
@chewxy The inlining process does not involve the linker. The compiler does ~all the work.
The inlining process will detect and remove dead code. If that ends up removing the last reference to a global, that global will be removed by the linker. But I don't think that will help with #14840, which is about globals with init functions.
CAFxX
Feb 27, 2018
Contributor
> The major TODO at this point is to tune the inlining heuristic. Mid-stack inlining helps runtime speed, but it can make binaries bigger. A lot bigger in some cases; cmd/compile's text segment gets ~100% bigger. I don't think that's launchable as-is, so we need to figure out the right way to tweak the heuristics to preserve as much speed as we can while keeping binary size manageable. Ideas welcome; there's no obvious plan of attack here.

Silly idea #2: how about brute forcing this?
- Grab an intern
- Gather a corpus of Go code with (macro?)benchmarks
- For each benchmark measure speed (+allocations?) and text size with inlining disabled (baseline)
- For each benchmark measure the same as above, with "random" inlining decisions in the functions that are transitively called by it; have the compiler log those decisions (repeat this step many times to generate many measures)
- Run some fancy ML method on the corpus of inlining decisions and benchmark results (relative to the baseline) to identify a set of inlining heuristics that yield good improvements at the expense of a reasonable increase in text size.
- Profit! Implement in the inliner the heuristics identified above

As a bonus point, the intern gets to write a paper about this.
dlsniper
Feb 27, 2018
Contributor
> @dlsniper: I'd like to avoid a //go:inline comment if we can. I don't think it solves the problem well, as the inlining decision should probably depend on characteristics of the call site (e.g. in a loop, constant arguments, etc.), not just the function being called.

> The compiler has no support for //go: directives at statement or expression scope, only at global scope. Not that it couldn't be added, but it's significant work.
@randall77 thank you for replying so quickly on this. I understand that there is a fair amount of work, and at the same time concern about how users will use this functionality. I think that the approach of having this enabled by default but with conservative defaults would be a good start.
However, what I have in mind when suggesting the introduction of //go:inline that could be added at call site is that the experienced users will have the understanding for how to use it and will be able to assert, via benchmarks, which approach works better for them when the compiler defaults are not enough.
From there, feedback could be collected, or usage in the wild observed, in order to allow further experimentation / changes to the heuristics / defaults. Much like what @CAFxX suggested but without dedicating an intern and a lot of hardware to running benchmarks. For example, in all of my use-cases so far, I would gladly trade a few more MB of binary size for better runtime speed. I understand that others may not wish to do the same, which is why I think that satisfying all these requirements would be better left to the users.
One of the other interesting things about having this as a compiler directive is that it allows fine-tuning the standard library code by performing analysis on the existing benchmarks.
- Do I think it could be potentially abused / misused by users that do not understand what this option will do? Yes, I do. But then the burden would be entirely on the users rather than on the Go team to figure out "the best" way to move forward with this.
- Who are the people that I target with this option? This allows people who understand what they are doing to further fine-tune their code at a level that they do not have access to today, which I believe is a good step in the direction of giving some control to the users while providing solid defaults.
- Do I like the idea of introducing more magic directives to the compiler? I do not, but I also do not see another way to give these hints to the compiler.
Hope this helps. I'll continue to watch this issue and look forward to how this will work out. Thank you.
ugorji
Feb 27, 2018
Contributor
@dlsniper @randall77 My only concern with enforcing the //go:inline is that it only scales for final executables, not for libraries. Imagine I put //go:inline all over my lib, and a user depends on my lib. It wasn't the user's decision - it was the author of the lib forcing his decision on the users.
If we do //go:inline, let it be a hint to the compiler: if this function doesn't make the cut but is within reason, pls inline it. E.g. let's say only functions up to a cost of 10 are inlined, but my function has a cost of 12, meaning it will not be inlined by default. But as the cost is within 30% over the threshold (i.e. no more than 10 + 30% = 13), and the author says "pls inline", it will be inlined; if the cost is more than 30% over, the hint will be disregarded.
This is similar to how the C++ inline keyword works, as a hint.
Now, I personally don't want a //go:inline. I prefer that the "general" (conservative) rules/heuristics for inlining are reasonable and fair and known and published. The compiler can still tweak outside of the general/published/conservative rules, but authors will work within those published ones and be happy.
My 2 cents.
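The "hint with bounded slack" rule described above can be written down in a few lines. This is only a sketch of that rule as stated in the comment; no `//go:inline` directive exists in the compiler, and `shouldInline`, its parameters, and the choice to treat the 13 boundary as inclusive are assumptions made for illustration.

```go
package main

import "fmt"

// shouldInline models a hypothetical hint: functions within budget are always
// inlined; functions over budget are inlined only if hinted AND their cost is
// at most slackPercent above the budget.
func shouldInline(cost, budget int, hinted bool, slackPercent int) bool {
	if cost <= budget {
		return true
	}
	if !hinted {
		return false
	}
	// Integer form of: cost <= budget * (1 + slackPercent/100).
	return cost*100 <= budget*(100+slackPercent)
}

func main() {
	fmt.Println(shouldInline(12, 10, true, 30))  // hinted, within slack
	fmt.Println(shouldInline(12, 10, false, 30)) // not hinted
	fmt.Println(shouldInline(14, 10, true, 30))  // hinted, beyond slack
}
```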
mvdan
Feb 27, 2018
Member
As far as I know, it has been core to Go's design (including its compiler) to have as few knobs and flags as possible. This includes flags like -O4 and compiler directives in the code.
There have been no knobs to control inlining until now; why should enabling mid-stack inlining change that?
aclements
Mar 7, 2018
Member
The problem with inlining directives is that people are notoriously bad at maintaining them as needs shift and code changes. We've resisted exposing inlining directives even to the runtime (which has several directives not available to user code) because we know they'll get stale and lead to code bloat and less performance. Instead, we have a test that checks that key functions are being inlined by the compiler's heuristics; and even that list gets out of date quickly.
Our long-term (albeit vague) plan is to use profile-guided optimization to make inlining decisions, rather than hand-crafted heuristics or developer annotations. It'll take a while to get there, but it fits very nicely with the Go model of doing the right thing automatically.
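As a rough illustration of the "test that checks that key functions are being inlined" approach, the sketch below scans compiler diagnostics for `can inline` lines and reports which expected functions are missing. It is not the actual test in the Go tree. It assumes diagnostics in the form printed by `go build -gcflags=-m` (for example `./x.go:3:6: can inline add`); the helper name `missingInlines` and the sample text are invented.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// missingInlines returns the names in want that the diagnostics text does not
// report as "can inline". The line format is an assumption; see the lead-in.
func missingInlines(diagnostics string, want []string) []string {
	const marker = ": can inline "
	inlinable := make(map[string]bool)
	for _, line := range strings.Split(diagnostics, "\n") {
		i := strings.Index(line, marker)
		if i < 0 {
			continue
		}
		// Keep only the function name; more verbose modes may append details.
		if fields := strings.Fields(line[i+len(marker):]); len(fields) > 0 {
			inlinable[fields[0]] = true
		}
	}

	var missing []string
	for _, name := range want {
		if !inlinable[name] {
			missing = append(missing, name)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	// Made-up sample output for demonstration.
	sample := "./x.go:3:6: can inline add\n" +
		"./x.go:9:6: cannot inline big: function too complex"
	fmt.Println(missingInlines(sample, []string{"add", "big"}))
}
```

A check like this has the staleness problem described above in a milder form: the list of expected functions still has to be maintained by hand.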
ugorji
Apr 23, 2018
Contributor
@dlsniper @randall77 @aclements Given that code-freeze is in a week, will this be making it into go 1.11 in some form? There seems to have been zero movement here.
There are some clear wins here, even with simple heuristics, e.g. inlining leaf functions that panic, short delegate functions that just call other functions, leaf functions with switch statements, etc. These will make reflect faster, which will impact just about every Go program (using json, fmt, etc).
Thanks.
randall77
Apr 23, 2018
Contributor
We're not sure yet. We'd like to get something in, but the current heuristics are too aggressive. We've seen code size blowups of 100%. We've also seen net slowdowns.
We're thinking of trying to enable this for 1.11, but with a much stricter heuristic. But we don't know what that heuristic might be yet. Unfortunately, this is no one's top priority at the moment.
If you have particular programs that do get significant speedups from mid-stack inlining, please post them. It will help us guide the choice of heuristic.
dr2chase
modified the milestones:
Go1.11,
Go1.12
May 31, 2018
dr2chase
May 31, 2018
Contributor
See also: https://go-review.googlesource.com/c/go/+/109918
Quick summary of where we are:
- calling panic no longer forbids inlining.
- -l=4 gets midstack inlining, if you want to play with it. Binaries get bigger, compiles take longer. We're interested in feedback on how this works for people, especially when it doesn't work.
- the compiler itself is not helped by midstack inlining; a -l=4-built (but not -l=4-compiling) compiler runs slower than normal.
- some benchmarks speed up nicely
The bigger+slower compiler is worrisome, which is the main reason this is not enabled yet; if this happened to your binary, you'd not be happy. The minimum plan is to understand how to manage inlining so bigger+slower at least doesn't happen to the compiler, and hope that it generalizes. A more ambitious plan is to build some sort of a feedback framework so that it's clear where inlining would actually help, instead of just guessing. Or we could use machine learning....
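For anyone who wants to try the experiment described in the summary above, the small program below has the shape under discussion: a leaf that may panic (`checkIndex`), a non-leaf that calls it (`at`), and a loop that calls the non-leaf (`sum`). All names are made up. Whether `at` is inlined into `sum` depends on the toolchain version and flags; building with `-gcflags=-m` prints the compiler's decisions, and `-l=4` is the experimental level mentioned in this thread for the toolchains of that period.

```go
package main

import "fmt"

// checkIndex is a leaf function that may panic.
func checkIndex(i, n int) {
	if i < 0 || i >= n {
		panic("index out of range")
	}
}

// at is a non-leaf function: it calls checkIndex before indexing.
// Inlining at into its callers is the "mid-stack" case.
func at(xs []int, i int) int {
	checkIndex(i, len(xs))
	return xs[i]
}

// sum calls at in a loop, so call overhead is paid once per element
// unless at (and checkIndex) are inlined.
func sum(xs []int) int {
	total := 0
	for i := range xs {
		total += at(xs, i)
	}
	return total
}

func main() {
	fmt.Println(sum([]int{1, 2, 3}))
}
```

Comparing the `-m` output and the binary size with and without the experimental level is one way to produce the kind of feedback requested here.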
CAFxX
Jun 1, 2018
Contributor
Is -l=4 still unsupported for production use? Or is it now supported for production but with potential performance regressions (like, say, -O3)?
dr2chase
Jun 1, 2018
Contributor
I am not sure of the official position, and it's not tested as much as it should be (i.e., I need to see about whether we can have a -l=4 test box) but it's supposed to at least execute correctly and we'd like to know when it doesn't, which I think is different from "you own the pieces". Debugging is also not well-tested for -l=4 binaries.
I've been rebenchmarking the compiler to check how inlining changes its performance, and the short answer is it isn't worse, but it isn't better, and without a noinline annotation on one method it doubles the size of the binary (with the annotation, it's only 50% larger). I don't think we want the noinline annotations to become part of common practice for using Go (we use them in tests, very helpful there) but on the other hand it can also be a good way of figuring out the sort of inlining mistakes the compiler needs to not make in order to turn this on in general.
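The noinline annotation mentioned above is the `//go:noinline` compiler directive, written directly above a function declaration. A minimal sketch, with hypothetical function names `mix` and `report`, of keeping one function out of the inliner:

```go
package main

import "fmt"

// mix is small enough that the compiler would normally consider
// inlining it; the directive below forbids that.
//
//go:noinline
func mix(a, b uint32) uint32 {
	return a*31 + b
}

// report calls mix. With the directive in place, that call stays
// a real function call instead of being expanded in line.
func report(a, b uint32) string {
	return fmt.Sprintf("%08x", mix(a, b))
}

func main() {
	fmt.Println(report(1, 2))
}
```

Removing the directive and comparing binary size or the `-m` output is one way to see what a single inlining decision costs.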
btracey
Jun 1, 2018
Contributor
I ran some of our BLAS benchmarks with no significant effect. Do you know if functions with asm stubs can be inlined (assuming the build tags are such that the asm is not actually used)?
pushed a commit
to luci/luci-go
that referenced
this issue
Jun 2, 2018
davecheney
Jun 2, 2018
Contributor
-gcflags="-l=4 -m"
Will give the definitive answer to that question.
laboger
Jun 4, 2018
Contributor
I've been rebenchmarking the compiler to check how inlining changes its performance, and the short answer is it isn't worse, but it isn't better, and without a noinline annotation on one method it doubles the size of the binary (with the annotation, it's only 50% larger).
Is this a statement for amd64 or for other GOARCHes too? I would expect to see more improvement on ppc64x because of the high cost of loading and storing the arguments and return values.
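One rough way to look at this question on a particular GOARCH is to time the same tiny function with and without inlining allowed. This is only a sketch: `addCall`, `addInline`, and `sink` are hypothetical names, and the compiler may optimize the inlinable loop further, so treat the numbers as indicative rather than as a measurement of call overhead alone.

```go
package main

import (
	"fmt"
	"runtime"
	"testing"
)

// addCall is always invoked through a real call.
//
//go:noinline
func addCall(a, b int) int { return a + b }

// addInline has the same body but may be inlined by the compiler.
func addInline(a, b int) int { return a + b }

// sink keeps the loop results live so the work is not discarded.
var sink int

func benchCall(b *testing.B) {
	s := 0
	for i := 0; i < b.N; i++ {
		s = addCall(s, i)
	}
	sink = s
}

func benchInline(b *testing.B) {
	s := 0
	for i := 0; i < b.N; i++ {
		s = addInline(s, i)
	}
	sink = s
}

func main() {
	fmt.Println("GOARCH:", runtime.GOARCH)
	fmt.Println("call:  ", testing.Benchmark(benchCall))
	fmt.Println("inline:", testing.Benchmark(benchInline))
}
```

Running the same program on amd64 and ppc64le gives a first impression of how much the argument and result traffic costs on each.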
davidlazar commented Mar 1, 2017 (edited Mar 11, 2017)
Design doc: https://golang.org/design/19348-midstack-inlining