Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: inline forwarding thunk functions #8421

Closed
dvyukov opened this issue Jul 25, 2014 · 16 comments
Closed

cmd/compile: inline forwarding thunk functions #8421

dvyukov opened this issue Jul 25, 2014 · 16 comments

Comments

@dvyukov
Copy link
Member

dvyukov commented Jul 25, 2014

For the following functions (cl120140044):

func (v *Value) Load() interface{} {
    return valueLoad(v)
}

// Implemented in assembly.
func valueLoad(v *Value) interface{}

more time is spent in the Value.Load thunk than in the valueLoad that does actual work:

 26.20%  atomic.test  atomic.test        [.] sync/atomic_test.func.053
 14.78%  atomic.test  atomic.test        [.] sync/atomic.(*Value).Load
 14.41%  atomic.test  atomic.test        [.] assertE2Tret
 10.50%  atomic.test  atomic.test        [.] sync/atomic.valueLoad   

Compiler should inline the Value.Load thunk.
And probably cases that add few additional arguments like:

func foo(x int) int {
    return bar(x, true)
}
@minux
Copy link
Member

minux commented Jul 26, 2014

Comment 1:

inlining is disabled for all non-leaf function so that the
stack trace could be more informative.
but I think at least the compiler could inline compiler
generated wrappers.

@dvyukov
Copy link
Member Author

dvyukov commented Jul 26, 2014

Comment 2:

C world solves this with more elaborate debug info. Basically a PC can say that it is
related to several frames.

@remyoudompheng
Copy link
Contributor

Comment 3:

I opened issue #4735 last year for the more detailed debug info, for the same purpose.

@rsc
Copy link
Contributor

rsc commented Aug 5, 2014

Comment 4:

I might make an exception to the stack trace rule for this specific case, where a method
is calling a func without a body.
But not until the conversion from C to Go is done, so not for a while.

@rsc rsc added this to the Unplanned milestone Apr 10, 2015
@rsc rsc changed the title cmd/gc: inline forwarding thunk functions cmd/compile: inline forwarding thunk functions Jun 8, 2015
@cespare
Copy link
Contributor

cespare commented Dec 1, 2016

I have also run across cases where it would be helpful to inline "forwarding" functions.

For instance, consider this pattern:

func F() { f() } // well-documented, exported function in x.go
func f() // stub for asm implementation in x_amd64.go
func f() { // pure-go implementation in x_other.go
  // ...
}

If this is a very fast function, having a double function-call overhead instead of single is significant. (The alternative is to declare and document F twice.) And if you want to write the pure-Go implementation of f in x.go, so that a test may compare the asm with the pure-go implementation, then that requires another level of forwarding (so triple function-call overhead):

// x.go
func F() { f() }
func fgo() { /* ... */ }
func f() // stub for asm implementation in x_amd64.go
func f() { fgo() } // x_other.go

(I also described this issue in this go-nuts thread.)

@josharian
Copy link
Contributor

There's a forthcoming proposal for mid-stack inlining in #19348 that I believe should address this.

@davidlazar davidlazar self-assigned this Mar 3, 2017
@quasilyte
Copy link
Contributor

See also https://golang.org/cl/147361.
I haven't tried it, but it looks like something that addresses what this issue asks to address.

@CAFxX
Copy link
Contributor

CAFxX commented Dec 8, 2018

See also golang.org/cl/147361.
I haven't tried it, but it looks like something that addresses what this issue asks to address.

As a followup to that CL I started sending CLs to inline the fast paths of some of the sync facilities: https://go-review.googlesource.com/c/go/+/152698

I also tried to apply the same approach to the lock/unlock in runtime but it turns out it's impossible for now, as lock and unlock turn out to be mutually recursive with throw, and the compiler doesn't like that (both because they are mutually recursive, and because it blows the inlining budget). The only way to solve it would be to move the call to throw out of the fast path.

@josharian
Copy link
Contributor

@CAFxX if you mark throw as //go:noinline does that help with the mutual recursion/budget problem?

@CAFxX
Copy link
Contributor

CAFxX commented Dec 10, 2018

@josharian I didn't think of that but alas it doesn't work:

src/runtime/lock_futex.go:46:6: cannot inline lock: recursive

I suppose this is because the check for recursion is done before caninl and is, likely, unaware of noinline annotations. FWIW maybe this check could be folded in caninl so that it is aware of actual inlining restrictions/decisions (I'm assuming the only reason why inlining of recursive functions is disabled is to avoid inlining a copy of a function within that same function, although I'm not really sure of the root reason for this).

@CAFxX
Copy link
Contributor

CAFxX commented Dec 10, 2018

Another thing I noticed, that is only partially related though, is that it seems that no DCE is done before inlining, so things like the following happen (I'm compiling on linux amd64, race mode is disabled). Notice all the "dead" ifs:

src/math/bits/bits.go:283:6: can inline Len as: func(uint) int { if UintSize == 32 {  }; return Len64(uint64(x)) }
src/runtime/cgocall.go:176:14: inlining call to dolockOSThread func() { if GOARCH == "wasm" {  }; _g_ := getg(); _g_.m.lockedg.set(_g_); _g_.lockedm.set(_g_.m) }
src/runtime/stack.go:1277:24: inlining call to stackmapdata func(*stackmap, int32) bitvector { if stackDebug > 0 {  }; return bitvector literal }
src/runtime/malloc.go:1105:6: can inline nextSample as: func() int32 { if GOOS == "plan9" {  }; return fastexprand(MemProfileRate) }
src/go/token/position.go:452:15: inlining call to sync.(*RWMutex).RLock method(*sync.RWMutex) func() { if bool(false) {  }; if atomic.AddInt32(&sync.rw.readerCount, int32(1)) < int32(0) { sync.runtime_SemacquireMutex(&sync.rw.readerSem, bool(false)) }; if bool(false) {  } }
src/runtime/type.go:269:20: inlining call to reflectOffsUnlock func() { if raceenabled {  }; unlock(&reflectOffs.lock) }

(longer output is here: https://gist.github.com/CAFxX/c901a4420216d154144b4ef4b469b2cf)

Now, all of these were clearly within the inlining budget, but it's likely that many others would be within the budget if DCE would run before inlining. I'm not sure if there is already a bug for this or if I should file one.

@josharian
Copy link
Contributor

I suppose this is because the check for recursion is done before caninl

Yes. I'd need to double-check, but IIRC the recursion check is done by visitBottomUp, which is shared across a few use cases. But we could add a flag to it to break on //go:noinline when using for inlining analysis. This is (probably) a straightforward change. If you'd like, I can look into it. Or you are welcome to.

no DCE is done before inlining

We do basic DCE before inlining. See e.g. https://go-review.googlesource.com/c/go/+/37499/ and other links from #19699. See also a related discussion in #16871. There are others. The general problem is that inlining happens early, and we don't have much analysis completed to work off of to do a more complete DCE. Ideas for cheaply increasing early DCE coverage are always welcome (cheap both in execution time and code complexity).

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/154058 mentions this issue: cmd/compile: don't recurse into go:noinline during inlining walk

@josharian
Copy link
Contributor

I suppose this is because the check for recursion is done before caninl

@CAFxX I had a few minutes to kill so I whipped up CL 154058, if you'd like to patch that in and then experiment some with your runtime lock/unlock changes. And it should be relatively clear how to beef up the change if there are mechanisms other than go:noinline that you want to use to avoid a recursive walk. (See inl.go for how to write said mechanisms.)

josharian added a commit to josharian/go that referenced this issue Dec 30, 2018
DO NOT SUBMIT

[Needs tests, compilebench, better write-up]

Updates golang#8421

For code:

package p

//go:noinline
func f() { g() }

func g() { f() }

Before:

$ go tool compile -m -m x.go
x.go:4:6: cannot inline f: recursive
x.go:6:6: cannot inline g: recursive

After:

$ go tool compile -m -m x.go
x.go:6:6: can inline g as: func() { f() }
x.go:4:6: cannot inline f: marked go:noinline
x.go:4:13: inlining call to g func() { f() }

Change-Id: I49b4095451f4f23ca2e0778fdc617d9cba8698b2
@randall77
Copy link
Contributor

This should be fixed with mid-stack inlining.

@cespare
Copy link
Contributor

cespare commented Apr 2, 2019

@randall77 awesome! Just tested out tip and indeed, the forwarding overhead I mentioned in #8421 (comment) (both the "double" and "triple" versions) are gone.

josharian added a commit to josharian/go that referenced this issue May 7, 2019
DO NOT SUBMIT

[Needs tests, compilebench, better write-up]

Updates golang#8421

For code:

package p

//go:noinline
func f() { g() }

func g() { f() }

Before:

$ go tool compile -m -m x.go
x.go:4:6: cannot inline f: recursive
x.go:6:6: cannot inline g: recursive

After:

$ go tool compile -m -m x.go
x.go:6:6: can inline g as: func() { f() }
x.go:4:6: cannot inline f: marked go:noinline
x.go:4:13: inlining call to g func() { f() }

Change-Id: I49b4095451f4f23ca2e0778fdc617d9cba8698b2
@golang golang locked and limited conversation to collaborators Apr 1, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests