Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: cmd/compile: add a go:inline directive #21536

Closed
OneOfOne opened this issue Aug 19, 2017 · 32 comments

Comments

Projects
None yet
@OneOfOne
Copy link
Contributor

commented Aug 19, 2017

The compiler (gc) has come a long way, however the inliner is sometimes lacking, for example any function with a for-loop no matter how simple it is, won't get inlined so we either have to use a goto loop instead (which is used few times in the runtime) or just have a one big ugly function.

My proposal is to add a go:inline directive that will force inlining, even if it's just for non-exported functions.

This will help make a lot of code much cleaner and allows separation of big functions that can't be currently inlined.

If the proposal is accepted, I'd like to work on it.

Another semi-common example:

func (xx *XXHash64) WriteString(s string) (int, error) {
	if len(s) == 0 {
		return 0, nil
	}
	return xx.Write([]byte(s))
}
// cannot inline (*XXHash64).WriteString: non-leaf method

Related: #17566 #14768

@dr2chase @josharian @randall77 @mvdan

@josharian

This comment has been minimized.

Copy link
Contributor

commented Aug 19, 2017

This proposal has basically no chance of being accepted.

Go eschews knobs.

Compiler directives are widely disliked.

I don't think it can be implemented reasonably, if at all. Functions are not eligible for inlining for many reasons. These include conceptual implementation difficulty: I have no idea how to inline a function containing defer. And practical implementation difficulty: It took a dedicated person a summer to get non-leaf inlining working correctly, even with part of the implementation already written. And it's still not released because of the binary size and compilation time aspects. Supporting such a directive would mean solving a host of such problems.

I personally have long wanted a much smaller directive that causes compilation to fail if a function is deemed ineligible for inlining. This would allow me to refactor without fear that future inlining changes will cause my no-op inlining to become a silent performance regression. Even this much more limited proposal has been rejected forcefully.

All that said, if you'd like to work on improving inlining, that'd be fabulous. There are many opportunities to improve inlining decisions, to improve how we do inlining (e.g. partial inlining), and to expand the set of functions that can be unlined.

@neelance

This comment has been minimized.

Copy link
Member

commented Aug 19, 2017

@josharian The flag -gcflags "-m" makes the go compiler print decisions on inlining. Maybe you can build a tool that parses this output and verifies it against some "must inline" list. Most people won't need it, but if you really do, then you could add that tool to your build pipeline.

@josharian

This comment has been minimized.

Copy link
Contributor

commented Aug 19, 2017

@neelance yes, or alternatively parse the output of go tool nm and make sure the inlined function has been dead-code-eliminated. (Note that I want it most of all for package runtime, so "my build pipeline" is probably go test runtime.)

@neelance

This comment has been minimized.

Copy link
Member

commented Aug 19, 2017

@josharian I already suspected that you want it for the stdlib. May I ask why you didn't build it yet? I get the point of not adding such a feature to the Go compiler itself, but a standalone tool doesn't do any harm if only used by individuals voluntarily. I just cringe when I hear "fear of refactoring". ;-)

@josharian

This comment has been minimized.

Copy link
Contributor

commented Aug 19, 2017

@neelance just haven't gotten to it yet; I have little contribution time at the moment and lots I want to do. I'd be delighted if someone else wanted to jump in and do this. The recently introduce runtime function tophash would be an excellent and relevant test case. :)

@mvdan mvdan changed the title [proposal] cmd/compile: add a go:inline directive Proposal cmd/compile: add a go:inline directive Aug 19, 2017

@mvdan mvdan changed the title Proposal cmd/compile: add a go:inline directive Proposal: cmd/compile: add a go:inline directive Aug 19, 2017

@gopherbot gopherbot added this to the Proposal milestone Aug 19, 2017

@gopherbot gopherbot added the Proposal label Aug 19, 2017

@gopherbot

This comment has been minimized.

Copy link

commented Aug 20, 2017

Change https://golang.org/cl/57410 mentions this issue: runtime: add TestIntendedInlining

@rsc

This comment has been minimized.

Copy link
Contributor

commented Aug 21, 2017

For the reasons Josh outlined above, we're not going to do this. Note also that even in C/C++, the "inline" directive does not really mean "this must be inlined". It is more a guideline than a rule.

@rsc rsc closed this Aug 21, 2017

gopherbot pushed a commit that referenced this issue Aug 22, 2017

runtime: add TestIntendedInlining
The intent is to allow more aggressive refactoring
in the runtime without silent performance changes.

The test would be useful for many functions.
I've seeded it with the runtime functions tophash and add;
it will grow organically (or wither!) from here.

Updates #21536 and #17566

Change-Id: Ib26d9cfd395e7a8844150224da0856add7bedc42
Reviewed-on: https://go-review.googlesource.com/57410
Reviewed-by: Martin Möhrmann <moehrmann@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
@mandarjog

This comment has been minimized.

Copy link

commented May 9, 2018

Any reason why inlining is more difficult in go than other languages like java or c++.

I was trying to find the rationale behind no branches / 40 expressions or less restrictions for inlined functions.
Any pointers are appreciated.

While trying to write performant code (dynamic proto3 marshaling), hand inlining a function brought down marshal time from 5ms to 4ms. That can be significant depending on your performance goals.

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented May 9, 2018

@mandarjog A closed issue is not the place for this discussion. General discussions about inlining in Go should use a forum; see https://golang.org/wiki/Questions . Specific discussions about the gc toolchain should take place on the golang-dev mailing list.

@josharian

This comment has been minimized.

Copy link
Contributor

commented Apr 1, 2019

I'm going to re-open this for further, given the recent discussion at #17566.

In an attempt to migrate discussion to here, to keep the threads distinct, I will also paste in some other folks' comments from there.

@josharian

This comment has been minimized.

Copy link
Contributor

commented Apr 1, 2019

@randall77 wrote:

The question I have for //go:inline is, does it apply to functions or to callsites? It sounds like the proposal is the former, but I worry that people would really want the latter.

@thanm wrote:

Callsite pragma sounds interesting to me. Maybe go a bit farther and say that such pragmas are required to be intra-package (caller and callee both in package X, and callee is not exported)? Even then a "must" inline pragmas is still a scary prospect. Well-intentioned users could easily blow up package compile times by accident. Also seems like the inliner would have to build some sort of call graph in order to detect cycles in must-inline directives.

@josharian

This comment has been minimized.

Copy link
Contributor

commented Apr 1, 2019

Even then a "must" inline pragmas is still a scary prospect.

Agreed. Any request for inlining will fundamentally need to be a request, not a promise. I imagine the initial implementation would be a relaxation or removal of the budget, but not the other constraints.

For callsite inlining requests, does that propagate to only the immediate function being called (possibly using mid-stack inlining), or to as much of the call tree from that point onwards as is possible?

What (if anything) does this do in the case of recursive calls which can be partially mid-stack inlined? There is discussion of this case in a few scattered issues and CLs, because it turns out to be important in a few places in the runtime.

@FiloSottile

This comment has been minimized.

Copy link
Member

commented Apr 1, 2019

My rationale for asking for this is at #17566 (comment). In particular, an argument that I believe does not get weaker with better (but not perfect) heuristics is that having the inliner decisions be opaque and subtle means that unrelated changes by other developers can significantly impact performance, and the only way to get robustness is currently writing the caller in assembly.

My proposal, instead of //go:inline, is

package runtime

// Inline requests, but does not guarantee, that the caller be inlined.
function Inline() {}

Like runtime.KeepAlive and testing.Helper. And it's not a magic comment.

It's per-function instead of per-callsite, but I never needed the latter myself.

@rasky

This comment has been minimized.

Copy link
Member

commented Apr 1, 2019

A totally different option is to have a “flatten” directive that forces a function to be a leaf, that is all calls within the function body are recursively inlined. This is usually simpler to reason about because people usually know which functions should go fast, while it’s sometimes harder to foresee whether a function should always be inlined or not.

Moreover the flatten directive doesn’t propagate transitively, so a module author that misuses flatten shouldn’t cause exponential compile time behavior like a force-inline directive might cause.

@FiloSottile

This comment has been minimized.

Copy link
Member

commented Apr 1, 2019

A small collection of examples where this would have been useful.

https://speakerdeck.com/gtank/micro-optimizing-go-code?slide=21
https://speakerdeck.com/gtank/i-wanna-go-fast?slide=12
https://lemire.me/blog/2017/09/05/go-does-not-inline-functions-when-it-should/
#17566 (comment)

I also had half a dozen people complain about it to me or to my immediate circle in the last year or so. I'll ask them to chime in with examples.

@cespare

This comment has been minimized.

Copy link
Contributor

commented Apr 1, 2019

I've had my share of fights with the inliner. Some representative examples are in some private code: a column-oriented database which naturally has some very important tight loops.

  • In two places, we have very small, simple functions containing loops that we had to rewrite with gotos in order to make them be inlined. (This is #14768, and I'm not sure whether an initial implementation of an inlining directive would let these be inlined or not anyway.)
  • We have a type with a few hot methods accompanied by a comment stating that it's important that all the methods are inlined and that any changes should be accompanied by manual verification of inlining. In the past I've had to adjust these to account for the inliner; for example, I originally deleted some if <sanity check> { panic } safety checks because panic prevented inlining, but more recently I was able to put them back in. (I still have to be careful about the size of the methods, though.)

In all these cases, it would be very helpful to have an assertion that can simply fail compilation if something cannot be inlined. Even without any fundamentally new inlining capabilities, it would greatly increase my confidence in making code adjustments and changing Go versions if I could tell the compiler my inlining wishes rather than just leaving scary comments for programmers.

@josharian

This comment has been minimized.

Copy link
Contributor

commented Apr 1, 2019

My proposal, instead of //go:inline, is [...] runtime.Inline

A function call isn't a great fit for this use case. What should happen here?

func f() {
  if false {
    runtime.Inline()
  }
}

What if it was x := false; if x instead of a constant? What if x was a false global that no one ever wrote to? In all these cases, testing.Helper and runtime.KeepAlive have clear semantics, by thinking of them as actual function invocations (which it is in the former case, and was for a release in the latter case).

I'll ask them to chime in with examples.

Please do. They are helpful not only in making the case, but in seeing opportunities for immediate improvement. George's problematic inlining case turned out to be a compiler bug around inlining intrinsics.

a comment stating that it's important that all the methods are inlined

Someone or other I proposed //go:mustinline, which would cause a compilation failure if inlining didn't happen (according to the normal rules). We ended up with TestIntendedInlining tests. Feel free to steal them. :)

A totally different option is to have a “flatten” directive

Very interesting. There are some tricky cases that I don't quite see how to express with this directive, though. Consider:

func f() {
  for {
    if rare {
      impossibleToInline()
    }
    definitelyNeedsToBeInlined()
  }
}

You can't flatten f, but flattening definitelyNeedsToBeInlined is impossible. I'd be good to try it out on a bunch of examples to see whether it works well in practice.

I suppose you could also fix this by tweaks the semantics of flatten. It could mean "fail to compile if you can't flatten the function", which is safer, but breaks the code above. It could also mean "flatten as much as you can, throughout the tree", which helps with the code above.

There's another problem with flattening across package lines: you can't request recursive inlining of a function that has already been compiled without recursive inlining--it breaks package-at-a-time compilation. I suppose you could specify that flatten only works within a package. That might suffice, and doing better inter-package inlining might take up the slack. But it is another complication.

@FiloSottile

This comment has been minimized.

Copy link
Member

commented Apr 1, 2019

It would be neat to find a way to enable something like

runtime.Inline(func() { a = mul(b, c) })

but I can't think of anything that doesn't involve complex closure edge-cases.

I'll ask them to chime in with examples.

Please do. They are helpful not only in making the case, but in seeing opportunities for immediate improvement. George's problematic inlining case turned out to be a compiler bug around inlining intrinsics.

I think his benchmarks are still useful to show the performance impact, even if they are not meaningful to discuss the heuristic. My point is that any imperfect heuristic is a balancing game and a ticking bomb. As soon as someone comes along and changes something innocent, performance might drop.

(We should still fix bugs in the heuristic, of course.)

@mundaym

This comment has been minimized.

Copy link
Member

commented Apr 3, 2019

One example of a function that could really use some sort of inlining guarantee I have is the quarterRound implementation in chacha20. If it stops being inlined the performance of the generic implementation of chacha20 would be awful. (The inlining cost is why it doesn't use bits.RotateLeft32.)

If we did the closure technique then I don't think we should use the runtime package since it isn't a runtime operation. Maybe a compiler/must package (or compiler/try)? If the closure were interpreted as reducing the inlining cost of the called function to 0 then it would be easy to write inlineable functions:

func quarterRound(a, b, c, d uint32) (uint32, uint32, uint32, uint32) {
    try.Inline(func() {
        a += b
        d ^= a
        ...
        b = (b << 7) | (b >> 25)
    })()
    return a, b, c, d
}

Still a bit ugly and hard to read though.

@rasky

This comment has been minimized.

Copy link
Member

commented Apr 3, 2019

I'll notice that this is one of those cases where the goal of the author is better described by simply saying:

func (s *Cipher) XORKeyStream(dst, src []byte) {
      compiler.Flatten()

      [...]
}

as the idea is that XORKeyStream must become linear code with no function calls, irrespective of how quarterRound, xor or bits.Rotate* are implemented.

@bcmills

This comment has been minimized.

Copy link
Member

commented Apr 3, 2019

I wonder to what extent we can separate the “what” from the “why”.

//go:inline specifies a particular mechanism, but I don't think the details of the concrete mechanism actually matter most of the time: it's important that calls to these functions be efficient, but whether that is done by inlining more aggressively, choosing a more efficient calling convention, or something else entirely should really be up to the compiler.

The thing that you need to communicate to the compiler is “this function is called frequently enough that it's worth spending the extra compile time and binary size to optimize heavily”. If you can't express that through profile-guided optimization, then perhaps an annotation like //go:hotfunction would more clearly express the relevant information (and more clearly capture the compiler's obligation — or lack thereof — to use any particular mechanism).

@josharian

This comment has been minimized.

Copy link
Contributor

commented Apr 4, 2019

The inlining cost is why it doesn't use bits.RotateLeft32.

bits.RotateLeft32 should have a cheaper inlining cost than the | expression. Please file an issue if not.

@josharian

This comment has been minimized.

Copy link
Contributor

commented Apr 4, 2019

perhaps an annotation like //go:hotfunction would more clearly express the relevant information

//go:O3? :P

Anyway, this is a nice idea. It's a good way to dip our toes in the FGO water (and ultimately perhaps decide to ignore the hint in favor of actual FGO?) and it is not hard to imagine finding other places in the compiler where we could choose to spend a few extra cycles on such functions.

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Apr 4, 2019

For what it's worth, GCC supports both __attribute__((hot)) and __attribute__((cold)) for functions and labels (putting the attribute on a label affects whether a branch to that label is predicted as taken). When profiling feedback is available, GCC ignores these attributes.

GCC also supports function attributes always_inline, noinline, and flatten. Among many others.

@rsc

This comment has been minimized.

Copy link
Contributor

commented Apr 10, 2019

Re #17566, package-runtime-limited annotations are fine to experiment with, without a proposal. We have plenty of those already.

But for the broader space of all Go packages, micromanaging optimizations at the source code level like //go:inline or __attribute__((hot)) etc cuts against the grain of Go's approach to software development. This is why we have only one GC tuning knob for example (just GOGC). Make the tools smart enough to make a good decision instead of turning all the programmers into tuning AIs.

Closing again.

@rsc rsc closed this Apr 10, 2019

@rsc

This comment has been minimized.

Copy link
Contributor

commented Apr 11, 2019

A few people pointed out that I didn't respond directly to the various examples of automatic inlining not being optimal for various performance results. To address that explicitly, the answer needs to be to improve the inlining rules. Look at what the compiler does, understand why it's wrong, and fix it.

An inline keyword is like C's old register keyword. It's telling the compiler something it should be able to do on its own. If our compiler is not doing a good enough job, the bugs to file are specific instances where it should be doing better, so we can fix those.

@FiloSottile

This comment has been minimized.

Copy link
Member

commented Apr 11, 2019

@rsc I did not reopen the conversation because I believe the issue is automatic inlining not being optimal.

What I tried to communicate at #21536 (comment) and #17566 (comment) is that:

  1. short of a feedback-guided optimization framework, I don't believe any heuristic can be good enough, so sometimes assembly will have to be written instead
  2. any implicit behavior is opaque and unreliable: if someone else later changes something about an inlined function that incorrectly makes it not inline, performance will tank, so it will either require coordination across every contributor to the code, or force assembly to be written
  3. forcing users to use assembly feels strictly worse than forcing them to add annotations

I believe 2. in particular is not in line with Go's approach to team collaboration and explicit behavior, and while annotations are also not in line with Go's approach, so is assembly (3.).

I'd love to be wrong about 1. but no one from the compiler team has expressed confidence that I am.

@randall77

This comment has been minimized.

Copy link
Contributor

commented Apr 11, 2019

I tend to agree with your first point, it's going to be hard to make inlining happen at all the points which are performance critical, and not blow up the binary by inlining too much everywhere else.

So I think we're going to need to be somewhat conservative in our rules. I'm ok with that. I'm a fan of simplicity even if it costs a little bit of performance.

I don't really agree with your second point. If you're modifying the code, you should expect possible performance changes, and you should run benchmarks as part of validating those changes. Inlining is only one of many ways in which you could break the performance characteristics of a function by modifying its code.

@dr2chase

This comment has been minimized.

Copy link
Contributor

commented Apr 11, 2019

I suspect a heuristic could be good enough, but we have not put adequate effort into finding such a heuristic, and are proceeding only gradually.

I did an experiment last night where I exposed all the inliner knobs and turned them up well past 11%. I'm working on the spreadsheet and how to share it but TLDR is:

  • some build user times increased by a factor of 2 (for 4x-8x-.6x, budget-large budget-call cost)
  • some run times decreased by almost 20% (same parameters)
  • worst case build user time increase was 2.7x.
  • median performance increase was about 3%

Ridiculous inlining tends to tickle build bugs in the runtime, which limited how far I could take this.

@seebs

This comment has been minimized.

Copy link
Contributor

commented Apr 11, 2019

I suspect the compiler is much better than I am at deciding what to inline roughly 95% of the time.

It's true that you should probably be benchmarking performance-sensitive code when changing it, but it's still useful to get more direct feedback on what specifically changed -- if a change is part of a larger change, it can be very non-obvious that the actual substantive performance change is a single function no longer being inlined. (There's also the risk that the change actually comes about because of a new compiler, or better yet, a new compiler and a code change, where either alone doesn't change the behavior.)

I do note that "inline", and "register" for that matter, can have additional compiler-hinting semantics that are sometimes non-obvious, like "also this implies you can't take the thing's address". I am not sure whether I like that or not -- I dislike the additional complexity, but I see the appeal of being able to give the compiler hints about a thing's behavior or assumptions.

I do sort of like an annotation for "be sure to let me know if this can't be inlined", but that's something I think we can already do with existing tools and a bit of shell scripting.

@rsc

This comment has been minimized.

Copy link
Contributor

commented Apr 11, 2019

  1. short of a feedback-guided optimization framework, I don't believe any heuristic can be good enough, so sometimes assembly will have to be written instead
  2. any implicit behavior is opaque and unreliable: if someone else later changes something about an inlined function that incorrectly makes it not inline, performance will tank, so it will either require coordination across every contributor to the code, or force assembly to be written
  3. forcing users to use assembly feels strictly worse than forcing them to add annotations

You are claiming that, without //go:inline, people must write assembly
to work around compiler inlining decisions that they need to override.

This is untrue. At most, people must copy and paste Go code. No assembly required.

A little copying is better than a new, subtle source code annotation.
(Not quite the same ring as the original, I admit.)

@rsc

This comment has been minimized.

Copy link
Contributor

commented Apr 11, 2019

A few other, much more minor points:

(1) short of a feedback-guided optimization framework, I don't believe any heuristic can be good enough

It depends on what "good enough" means. For one person, today's compiler might be "good enough". For someone else, no compiler would ever be "good enough". That needs to be defined better. And without filed bugs for the specific cases that aren't good enough today, how would we know? The first step has to be filing those bugs.

(2) any implicit behavior is opaque and unreliable: if someone else later changes something about an inlined function that incorrectly makes it not inline, performance will tank,

The performance cliff you are describing is true of almost every compiler optimization, but most notably escape analysis. We avoid making escape analysis "opaque" by giving developers tools to understand the compiler's decisions, specifically -gcflags=-m, which also reports inlining decisions. And we avoid making it "unreliable" by writing tests that specific cases are handled a particular way. If there are specific inlining cases that are important work today and are important to keep working (like maybe chacha20 quarterRound), it may be worth adding them as test cases (in $GOROOT/test/inline*.go).

so it will either require coordination across every contributor to the code,

As Keith pointed out, tests and benchmarks exist precisely to enable this kind of coordination. It is not some impossibility but instead a reality of all our development.

(3) forcing users to use assembly feels strictly worse than forcing them to add annotations

Regarding annotations more generally, there is a significant learning, maintenance, and semantic cost to annotations and they are historically avoided for this kind of thing in Go. Unlike modern C, Go is not meant to be a replacement for an assembler in all cases. If you really need very precise control over details of code generations, assembly can be an entirely appropriate answer. And that is OK! But again the proper phrasing of (3) would be "forcing users to copy and paste code". And that makes the argument for micromanagement of the compiler's decision here even weaker.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.