cmd/compile: inline forwarding thunk functions #8421

Open
dvyukov opened this Issue Jul 25, 2014 · 14 comments

@dvyukov
Member

dvyukov commented Jul 25, 2014

For the following functions (cl120140044):

func (v *Value) Load() interface{} {
    return valueLoad(v)
}

// Implemented in assembly.
func valueLoad(v *Value) interface{}

more time is spent in the Value.Load thunk than in the valueLoad that does actual work:

 26.20%  atomic.test  atomic.test        [.] sync/atomic_test.func.053
 14.78%  atomic.test  atomic.test        [.] sync/atomic.(*Value).Load
 14.41%  atomic.test  atomic.test        [.] assertE2Tret
 10.50%  atomic.test  atomic.test        [.] sync/atomic.valueLoad   

The compiler should inline the Value.Load thunk,
and probably also cases that add a few additional arguments, like:

func foo(x int) int {
    return bar(x, true)
}
@minux

Member

minux commented Jul 26, 2014

Comment 1:

Inlining is disabled for all non-leaf functions so that
stack traces can be more informative,
but I think the compiler could at least inline
compiler-generated wrappers.
@dvyukov

Member

dvyukov commented Jul 26, 2014

Comment 2:

The C world solves this with more elaborate debug info: basically, a single PC
can indicate that it is related to several frames.
@remyoudompheng

Contributor

remyoudompheng commented Jul 26, 2014

Comment 3:

I opened issue #4735 last year for the more detailed debug info, for the same purpose.
@rsc

Contributor

rsc commented Aug 5, 2014

Comment 4:

I might make an exception to the stack trace rule for this specific case, where a method
is calling a func without a body.
But not until the conversion from C to Go is done, so not for a while.

@rsc rsc added this to the Unplanned milestone Apr 10, 2015

@rsc rsc removed release-none labels Apr 10, 2015

@rsc rsc changed the title from cmd/gc: inline forwarding thunk functions to cmd/compile: inline forwarding thunk functions Jun 8, 2015

@cespare

Contributor

cespare commented Dec 1, 2016

I have also run across cases where it would be helpful to inline "forwarding" functions.

For instance, consider this pattern:

func F() { f() } // well-documented, exported function in x.go
func f() // stub for asm implementation in x_amd64.go
func f() { // pure-go implementation in x_other.go
  // ...
}

If this is a very fast function, having double function-call overhead instead of a single one is significant. (The alternative is to declare and document F twice.) And if you want to write the pure-Go implementation of f in x.go, so that a test can compare the asm implementation with the pure-Go one, that requires another level of forwarding (so triple function-call overhead):

// x.go
func F() { f() }
func fgo() { /* ... */ }
func f() // stub for asm implementation in x_amd64.go
func f() { fgo() } // x_other.go

(I also described this issue in this go-nuts thread.)

@josharian

Contributor

josharian commented Mar 2, 2017

There's a forthcoming proposal for mid-stack inlining in #19348 that I believe should address this.

@davidlazar davidlazar self-assigned this Mar 3, 2017

@Quasilyte

Contributor

Quasilyte commented Dec 8, 2018

See also https://golang.org/cl/147361.
I haven't tried it, but it looks like it addresses what this issue asks for.

@CAFxX

Contributor

CAFxX commented Dec 8, 2018

> See also golang.org/cl/147361.
> I haven't tried it, but it looks like it addresses what this issue asks for.

As a followup to that CL I started sending CLs to inline the fast paths of some of the sync facilities: https://go-review.googlesource.com/c/go/+/152698

I also tried to apply the same approach to the lock/unlock in runtime, but it turns out to be impossible for now: lock and unlock are mutually recursive with throw, and the compiler doesn't like that (both because they are mutually recursive and because it blows the inlining budget). The only way to solve it would be to move the call to throw out of the fast path.

@josharian

Contributor

josharian commented Dec 9, 2018

@CAFxX if you mark throw as //go:noinline does that help with the mutual recursion/budget problem?

@CAFxX

Contributor

CAFxX commented Dec 10, 2018

@josharian I didn't think of that but alas it doesn't work:

src/runtime/lock_futex.go:46:6: cannot inline lock: recursive

I suppose this is because the check for recursion is done before caninl and is likely unaware of noinline annotations. FWIW, maybe this check could be folded into caninl so that it is aware of actual inlining restrictions/decisions (I'm assuming the only reason inlining of recursive functions is disabled is to avoid inlining a copy of a function within that same function, although I'm not really sure of the root reason).

@CAFxX

Contributor

CAFxX commented Dec 10, 2018

Another thing I noticed, though it is only partially related, is that no DCE seems to be done before inlining, so things like the following happen (I'm compiling on linux/amd64 with race mode disabled). Notice all the "dead" ifs:

src/math/bits/bits.go:283:6: can inline Len as: func(uint) int { if UintSize == 32 {  }; return Len64(uint64(x)) }
src/runtime/cgocall.go:176:14: inlining call to dolockOSThread func() { if GOARCH == "wasm" {  }; _g_ := getg(); _g_.m.lockedg.set(_g_); _g_.lockedm.set(_g_.m) }
src/runtime/stack.go:1277:24: inlining call to stackmapdata func(*stackmap, int32) bitvector { if stackDebug > 0 {  }; return bitvector literal }
src/runtime/malloc.go:1105:6: can inline nextSample as: func() int32 { if GOOS == "plan9" {  }; return fastexprand(MemProfileRate) }
src/go/token/position.go:452:15: inlining call to sync.(*RWMutex).RLock method(*sync.RWMutex) func() { if bool(false) {  }; if atomic.AddInt32(&sync.rw.readerCount, int32(1)) < int32(0) { sync.runtime_SemacquireMutex(&sync.rw.readerSem, bool(false)) }; if bool(false) {  } }
src/runtime/type.go:269:20: inlining call to reflectOffsUnlock func() { if raceenabled {  }; unlock(&reflectOffs.lock) }

(longer output is here: https://gist.github.com/CAFxX/c901a4420216d154144b4ef4b469b2cf)

Now, all of these were clearly within the inlining budget, but it's likely that many others would be within the budget if DCE ran before inlining. I'm not sure if there is already a bug for this or if I should file one.

@josharian

Contributor

josharian commented Dec 10, 2018

> I suppose this is because the check for recursion is done before caninl

Yes. I'd need to double-check, but IIRC the recursion check is done by visitBottomUp, which is shared across a few use cases. We could add a flag to it to break on //go:noinline when it is used for inlining analysis. This is (probably) a straightforward change. If you'd like, I can look into it. Or you are welcome to.

> no DCE is done before inlining

We do basic DCE before inlining; see e.g. https://go-review.googlesource.com/c/go/+/37499/ and other links from #19699. See also the related discussion in #16871. There are others. The general problem is that inlining happens early, so we don't have much completed analysis to build a more thorough DCE on. Ideas for cheaply increasing early DCE coverage are always welcome (cheap both in execution time and code complexity).

@gopherbot

gopherbot commented Dec 13, 2018

Change https://golang.org/cl/154058 mentions this issue: cmd/compile: don't recurse into go:noinline during inlining walk

@josharian

Contributor

josharian commented Dec 13, 2018

> I suppose this is because the check for recursion is done before caninl

@CAFxX I had a few minutes to kill so I whipped up CL 154058, if you'd like to patch that in and then experiment some with your runtime lock/unlock changes. And it should be relatively clear how to beef up the change if there are mechanisms other than go:noinline that you want to use to avoid a recursive walk. (See inl.go for how to write said mechanisms.)
