New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: odd inlining heuristic under mid-stack inlining #22310

Open
mdempsky opened this Issue Oct 17, 2017 · 6 comments

Comments

Projects
None yet
7 participants
@mdempsky
Member

mdempsky commented Oct 17, 2017

Suppose:

func F() { ...; G(); ... }
func G() { ... }

When midstack inlining (#19348) is enabled (-l=4), F is inlinable under two conditions:

  1. G is inlineable, and F's total cost including the cost of inlining G falls under the max budget.
  2. G is not inlineable, and F's total cost (excluding the cost of inlining G, since it won't be inlined) falls under the max budget.

However, this leaves out a third possibility:

  1. G is inlineable, and F's total cost falls under the max budget as long as G is not inlined.

This seems counter-intuitive to me: that simplifying G from being non-inlineable (case 2) to inlineable (case 3), might cause F to no longer be inlineable.

On the other hand, maybe this is desirable: if we can't inline F+G, maybe it's better to make non-inline calls to F+G, than to inline F and make a non-inline call to G.

What inlining heuristics do other compilers use in situations like this?

/cc @ianlancetaylor

@mvdan

This comment has been minimized.

Member

mvdan commented Oct 17, 2017

Is there a metric to try to predict if a func call would happen often enough for the call overhead to be noticeable? For example if G is run inside a loop, but F isn't, that likely means that you would want to inline G instead of F.

Not saying that this can be measured accurately on a per-func basis, but some cases like I described above could lead to a clear winner when a comparison is made.

@ianlancetaylor

This comment has been minimized.

Contributor

ianlancetaylor commented Oct 17, 2017

GCC uses a number of heuristics. GCC also has multiple inliners that work at different points during the compilation.

Among the heuristics are:

  • It is good if the call site is passing an argument that will change an indirect call (via a function pointer) in the inlined function into a direct call
  • It is good if the call site is passing an argument that will change the loop iteration count or loop stride from a variable to a constant
  • It is bad if the call site and the inlined function are in the same strongly connected subset of the call graph, on the theory that this will tend to increase stack frame size to no benefit (a strongly connected subset is a subset in which every function calls every other function)
  • It is (less) bad if the inlined function is itself in a strongly connected subset, even if the call site is not
  • It is good if the inlined function is actually declared inline
  • It is (very slightly) bad if the inlined function and the call site are in different compilation units, on the theory that most people put the functions they want to inline in header files anyhow
  • It is good if the call site is passing an argument that will make some array index into a constant in the inlined function
  • It is good if the inlined function is known to be hot based on profiling information
  • It is good if the inlined function can be inlined at all call sites (typically only possible for static functions)

Beyond these heuristics (I'm probably missing some) GCC estimates the time required to execute the inlined function at the call site, and compares that to the time required to call the function (these time estimates are all based on profiling data if available). If that is positive, and if the size increase is acceptable based on the command line arguments (-Os and so forth), the function is inlined. GCC sorts the inline possibilities from cheapest to most expensive, and does the cheapest first, then recomputes the other values if they have changed.

@thanm

This comment has been minimized.

Member

thanm commented Oct 18, 2017

LLVM inliner makes use of synthetic/static profile (via static branch prediction, Wu & Larus type stuff) to drive inlining decisions when there is no actual dynamic execution profile. I would guess that GCC does the same.

@rasky

This comment has been minimized.

Member

rasky commented Oct 21, 2017

A specialization of Ian's heuristics for go would be:

  • It is good if the inline function if the call site is passing an argument to a interface{} that is used as type-switch.

See binary.Read for an example where there are many fast-paths that could be inlined.

@ianlancetaylor ianlancetaylor added this to the Go1.11 milestone Mar 29, 2018

@dsnet

This comment has been minimized.

Member

dsnet commented Apr 14, 2018

Here's a concrete example of interfaces with type-switch (simpler than binary.Read) that is a great candidate for inlining.

func ValueOf(v interface{}) Value {
	switch v := v.(type) {
	case bool:
		return valueOfBool(v)
	case int32:
		return valueOfInt(int64(v))
	case int64:
		return valueOfInt(v)
	case uint32:
		return valueOfUint(uint64(v))
	case uint64:
		return valueOfUint(v)
	case float32:
		return valueOfFloat(float64(v))
	case float64:
		return valueOfFloat(v)
	case string:
		return valueOfString(v)
	case []byte:
		return valueOfBytes(v)
	case EnumNumber:
		return valueOfEnum(v)
	default:
		panic(fmt.Sprintf("invalid type: %T", v))
	}
}

Each individual valueOfX is inlineable. However, the type-switch is somewhat large, which makes the entire function not suitable for inlining. In practice, almost every call of ValueOf has a concrete type passed in, for which the compiler can statically prove that only one case is executed and everything else is dead code. Thus, calling ValueOf(int32(0)) should be exactly equivalent to valueOfInt(int64(v)).

@thanm thanm modified the milestones: Go1.11, Go1.12 Jun 19, 2018

@randall77

This comment has been minimized.

Contributor

randall77 commented Dec 12, 2018

Punting to 1.13, too late for anything major in 1.12.

@randall77 randall77 modified the milestones: Go1.12, Go1.13 Dec 12, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment