cmd/compile: inline forwarding thunk functions #8421

Open
dvyukov opened this Issue Jul 25, 2014 · 14 comments

@dvyukov
Member

dvyukov commented Jul 25, 2014

For the following functions (cl120140044):

func (v *Value) Load() interface{} {
    return valueLoad(v)
}

// Implemented in assembly.
func valueLoad(v *Value) interface{}

more time is spent in the Value.Load thunk than in the valueLoad that does actual work:

 26.20%  atomic.test  atomic.test        [.] sync/atomic_test.func.053
 14.78%  atomic.test  atomic.test        [.] sync/atomic.(*Value).Load
 14.41%  atomic.test  atomic.test        [.] assertE2Tret
 10.50%  atomic.test  atomic.test        [.] sync/atomic.valueLoad   

The compiler should inline the Value.Load thunk,
and probably also cases that add a few additional arguments, like:

func foo(x int) int {
    return bar(x, true)
}
@minux

Member

minux commented Jul 26, 2014

Comment 1:

Inlining is disabled for all non-leaf functions so that
stack traces can be more informative,
but I think the compiler could at least inline
compiler-generated wrappers.
@dvyukov

Member

dvyukov commented Jul 26, 2014

Comment 2:

The C world solves this with more elaborate debug info: basically, a single PC
can indicate that it is related to several frames.
@remyoudompheng

Contributor

remyoudompheng commented Jul 26, 2014

Comment 3:

I opened issue #4735 last year for the more detailed debug info, for the same purpose.
@rsc

Contributor

rsc commented Aug 5, 2014

Comment 4:

I might make an exception to the stack trace rule for this specific case, where a method
is calling a func without a body.
But not until the conversion from C to Go is done, so not for a while.

@rsc rsc added this to the Unplanned milestone Apr 10, 2015

@rsc rsc removed release-none labels Apr 10, 2015

@rsc rsc changed the title from cmd/gc: inline forwarding thunk functions to cmd/compile: inline forwarding thunk functions Jun 8, 2015

@cespare

Contributor

cespare commented Dec 1, 2016

I have also run across cases where it would be helpful to inline "forwarding" functions.

For instance, consider this pattern:

func F() { f() } // well-documented, exported function in x.go
func f() // stub for asm implementation in x_amd64.go
func f() { // pure-go implementation in x_other.go
  // ...
}

If this is a very fast function, having double function-call overhead instead of a single one is significant. (The alternative is to declare and document F twice.) And if you want to write the pure-Go implementation of f in x.go, so that a test can compare the asm implementation with the pure-Go one, that requires another level of forwarding (so triple function-call overhead):

// x.go
func F() { f() }
func fgo() { /* ... */ }
func f() // stub for asm implementation in x_amd64.go
func f() { fgo() } // x_other.go

(I also described this issue in this go-nuts thread.)

@josharian

Contributor

josharian commented Mar 2, 2017

There's a forthcoming proposal for mid-stack inlining in #19348 that I believe should address this.

@davidlazar davidlazar self-assigned this Mar 3, 2017

@Quasilyte

Contributor

Quasilyte commented Dec 8, 2018

See also https://golang.org/cl/147361.
I haven't tried it, but it looks like it addresses what this issue asks for.

@CAFxX

Contributor

CAFxX commented Dec 8, 2018

> See also golang.org/cl/147361.
> I haven't tried it, but it looks like it addresses what this issue asks for.

As a followup to that CL I started sending CLs to inline the fast paths of some of the sync facilities: https://go-review.googlesource.com/c/go/+/152698

I also tried to apply the same approach to the lock/unlock in runtime, but it turns out to be impossible for now: lock and unlock are mutually recursive with throw, and the compiler doesn't like that (both because they are mutually recursive and because it blows the inlining budget). The only way to solve it would be to move the call to throw out of the fast path.

@josharian

Contributor

josharian commented Dec 9, 2018

@CAFxX if you mark throw as //go:noinline does that help with the mutual recursion/budget problem?

@CAFxX

Contributor

CAFxX commented Dec 10, 2018

@josharian I didn't think of that but alas it doesn't work:

src/runtime/lock_futex.go:46:6: cannot inline lock: recursive

I suppose this is because the check for recursion is done before caninl and is likely unaware of noinline annotations. FWIW, maybe this check could be folded into caninl so that it is aware of actual inlining restrictions/decisions (I'm assuming the only reason inlining of recursive functions is disabled is to avoid inlining a copy of a function within that same function, although I'm not really sure of the root reason).

@CAFxX

Contributor

CAFxX commented Dec 10, 2018

Another thing I noticed, though it is only partially related, is that no DCE seems to be done before inlining, so things like the following happen (I'm compiling on linux/amd64 with race mode disabled). Notice all the "dead" ifs:

src/math/bits/bits.go:283:6: can inline Len as: func(uint) int { if UintSize == 32 {  }; return Len64(uint64(x)) }
src/runtime/cgocall.go:176:14: inlining call to dolockOSThread func() { if GOARCH == "wasm" {  }; _g_ := getg(); _g_.m.lockedg.set(_g_); _g_.lockedm.set(_g_.m) }
src/runtime/stack.go:1277:24: inlining call to stackmapdata func(*stackmap, int32) bitvector { if stackDebug > 0 {  }; return bitvector literal }
src/runtime/malloc.go:1105:6: can inline nextSample as: func() int32 { if GOOS == "plan9" {  }; return fastexprand(MemProfileRate) }
src/go/token/position.go:452:15: inlining call to sync.(*RWMutex).RLock method(*sync.RWMutex) func() { if bool(false) {  }; if atomic.AddInt32(&sync.rw.readerCount, int32(1)) < int32(0) { sync.runtime_SemacquireMutex(&sync.rw.readerSem, bool(false)) }; if bool(false) {  } }
src/runtime/type.go:269:20: inlining call to reflectOffsUnlock func() { if raceenabled {  }; unlock(&reflectOffs.lock) }

(longer output is here: https://gist.github.com/CAFxX/c901a4420216d154144b4ef4b469b2cf)

Now, all of these were clearly within the inlining budget, but it's likely that many others would be within the budget if DCE ran before inlining. I'm not sure if there is already a bug for this or if I should file one.

@josharian

Contributor

josharian commented Dec 10, 2018

> I suppose this is because the check for recursion is done before caninl

Yes. I'd need to double-check, but IIRC the recursion check is done by visitBottomUp, which is shared across a few use cases. We could add a flag to it to break on //go:noinline when it is used for inlining analysis. This is (probably) a straightforward change. If you'd like, I can look into it. Or you are welcome to.

> no DCE is done before inlining

We do basic DCE before inlining; see e.g. https://go-review.googlesource.com/c/go/+/37499/ and other links from #19699. See also the related discussion in #16871. There are others. The general problem is that inlining happens early, so we don't have much completed analysis to build a more thorough DCE on. Ideas for cheaply increasing early DCE coverage are always welcome (cheap both in execution time and code complexity).

@gopherbot

gopherbot commented Dec 13, 2018

Change https://golang.org/cl/154058 mentions this issue: cmd/compile: don't recurse into go:noinline during inlining walk

@josharian

Contributor

josharian commented Dec 13, 2018

> I suppose this is because the check for recursion is done before caninl

@CAFxX I had a few minutes to kill so I whipped up CL 154058, if you'd like to patch that in and then experiment some with your runtime lock/unlock changes. And it should be relatively clear how to beef up the change if there are mechanisms other than go:noinline that you want to use to avoid a recursive walk. (See inl.go for how to write said mechanisms.)
