Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: no closure inlining even in the simple case? #35196

Open
ivoras opened this issue Oct 27, 2019 · 5 comments

Comments

@ivoras
Copy link

@ivoras ivoras commented Oct 27, 2019

I hesitate to call this an issue, since I know it's a complex topic, and I've read about half a dozen issue reports touching this topic, which kind-of sort-of fix some inlining cases, but not all of them.

I'm using a simple pattern for avoiding lock leakage:

// WithRWMutex extends the RWMutex type with convenient .With(func) functions
type WithRWMutex struct {
	sync.RWMutex
}

// WithRLock executes the given function with the mutex rlocked
func (m *WithRWMutex) WithRLock(f func()) {
	m.RWMutex.RLock()
	f()
	m.RWMutex.RUnlock()
}

// WithWLock executes the given function with the mutex wlocked
func (m *WithRWMutex) WithWLock(f func()) {
	m.RWMutex.Lock()
	f()
	m.RWMutex.Unlock()
}

in practice it's used like this:

beforeCode()
o.WithWLock(func() {
 doSomething(o)
})
afterCode()

Full example here: https://go.godbolt.org/z/mkvMbZ

I'd expect that when the closure passed to WithWlock() is simple enough (i.e. no need for a new stack), inlining would collapse all the scaffolding and compile it to something like:

beforeCode()
m.RWMutex.Lock()
doSomething(o)
m.RWMutex.Unlock()
afterCode()

But looking at the assembly code (https://go.godbolt.org/z/mkvMbZ), the inner function is compiled once, its address taken, passed to the WithWLock() function, i.e. a full closure call is being done.

This is on go 1.13.

It seems that a full closure call is an expensive choice for this particular pattern?

@randall77

This comment has been minimized.

Copy link
Contributor

@randall77 randall77 commented Oct 27, 2019

Our current inline heuristic thinks that (*WithRWMutex).WithRLock is too expensive to inline. Its body contains 3 calls, which is above the cost threshold.

Even if I hack the compiler to inline in this case, it still doesn't inline the anonymous func. But it does do so in simpler cases. Not sure what's going on there, probably a phase ordering issue.

@ivoras

This comment has been minimized.

Copy link
Author

@ivoras ivoras commented Oct 27, 2019

Hmmm... which criteria is used which results in 3 function calls being considered expensive? Wouldn't the assembly for that case be something like 3 CALL instructions with some low-weight scaffolding for the arguments and the stack?

Or is it just the middle call, the one which calls the closure function, the one which drives the cost up, since it's a call via a function pointer?

Re: hacking the compiler - could the "inlining threshold" be a compiler argument?

@randall77

This comment has been minimized.

Copy link
Contributor

@randall77 randall77 commented Oct 27, 2019

Hmmm... which criteria is used which results in 3 function calls being considered expensive? Wouldn't the assembly for that case be something like 3 CALL instructions with some low-weight scaffolding for the arguments and the stack?

We have a cost budget of 80, and each call is cost 57. That normally allows only one call in an inlineable body (with a few exceptions, like intrinsics). See #19348 (comment)

Or is it just the middle call, the one which calls the closure function, the one which drives the cost up, since it's a call via a function pointer?

The closureness of the call isn't used as part of the cost, other than it must be able to resolve the target function.

Re: hacking the compiler - could the "inlining threshold" be a compiler argument?

We do have a -l argument to the compiler which adjusts inlining decisions. But that flag is mostly for testing/debugging, it's not intended for performance tuning. In general we try to avoid providing knobs like this, as it leads to all kinds of thorny questions, like "what if two packages want two different inlining thresholds?" "is is the threshold of the calling package, or the called package?".

@ivoras

This comment has been minimized.

Copy link
Author

@ivoras ivoras commented Oct 27, 2019

Thank you, I appreciate your answers and your expertise.

On my system, the -l flag is documented as "disable inlining" so not really usable for experimenting.

But even if inlining worked, is the compiler smart enough to inline the closure as a whole inside the one and only caller (since the closure is an anonymous function used exactly once)?

FWIW, 2 more cents: I'm guessing the issue is because "go build" also builds required packages, so there could be mixed results from multiple invocations. Doesn't that point to flags like inline thresholds being global settings on a build machine, for all packages. Maybe in a config file, an environment variable, etc.?

@randall77

This comment has been minimized.

Copy link
Contributor

@randall77 randall77 commented Oct 27, 2019

But even if inlining worked, is the compiler smart enough to inline the closure as a whole inside the one and only caller (since the closure is an anonymous function used exactly once)?

This code optimizes to just a call to doSomething.

	func() {
		doSomething(o)
	}()

FWIW, 2 more cents: I'm guessing the issue is because "go build" also builds required packages, so there could be mixed results from multiple invocations. Doesn't that point to flags like inline thresholds being global settings on a build machine, for all packages. Maybe in a config file, an environment variable, etc.?

Sure, but that doesn't really solve the problem, it just punts it to the user. I import 2 packages, A and B. A says "I am fastest when the inlining threshold is 63". B says "I am fastest when the inlining threshold is 95". What should I set my inlining threshold to?

We've basically decided that tuning to this level is not worth the complication it introduces to the ecosystem.

@FiloSottile FiloSottile changed the title No closure inlining even in the simple case? cmd/compile: no closure inlining even in the simple case? Oct 28, 2019
@FiloSottile FiloSottile added this to the Unplanned milestone Oct 28, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.