Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: PGO fails to do multiple levels of inlining into a single function #69046

Open
prattmic opened this issue Aug 23, 2024 · 2 comments
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@prattmic
Copy link
Member

Reproducer:

-- dep/dep.go --
package dep

var sink int

//go:noinline
func spin() {
        for i := 0; i < 1000000; i++ {
                sink = i
        }
}

func b() {
        // Two calls, too hot to inline, darn...
        spin()
        spin()
        for i := 0; i < 1000000; i++ {
                sink = i
        }
}

func a() {
        // Two calls, too hot to inline, darn...
        b()
        b()
        for i := 0; i < 1000000; i++ {
                sink = i
        }
}

//go:noinline
func Foo() {
        a()
}
-- dep/dep_test.go --
package dep

import "testing"

func BenchmarkMain(b *testing.B) {
        for i := 0; i < b.N; i++ {
                Foo()
        }
}
-- main.go --
package main

import "example.com/pgo/dep"

func main() {
        dep.Foo()
}
-- go.mod --
module example.com/pgo

go 1.23

https://go.dev/play/p/QxBt13wV3Uh

Run the benchmark to collect a profile, and then use that as a PGO profile:

$ go test -bench . -cpuprofile=cpu.pprof ./dep
$ go build "-gcflags=example.com/pgo/dep=-m=2 -d=pgodebug=1" -pgo=cpu.pprof                         
# example.com/pgo/dep
hot-callsite-thres-from-CDF=0.6493506493506493
dep/dep.go:6:6: cannot inline spin: marked go:noinline
hot-node enabled increased budget=2000 for func=example.com/pgo/dep.b
dep/dep.go:12:6: can inline b with cost 133 as: func() { spin(); spin(); for loop }
hot-node enabled increased budget=2000 for func=example.com/pgo/dep.a
hot-budget check allows inlining for call example.com/pgo/dep.b (cost 133) at dep/dep.go:23:3 in function example.com/pgo/dep.a
hot-budget check allows inlining for call example.com/pgo/dep.b (cost 133) at dep/dep.go:24:3 in function example.com/pgo/dep.a
dep/dep.go:21:6: can inline a with cost 285 as: func() { b(); b(); for loop }
dep/dep.go:31:6: cannot inline Foo: marked go:noinline
hot-budget check allows inlining for call example.com/pgo/dep.b (cost 133) at dep/dep.go:23:3 in function example.com/pgo/dep.a
dep/dep.go:23:3: inlining call to b
hot-budget check allows inlining for call example.com/pgo/dep.b (cost 133) at dep/dep.go:24:3 in function example.com/pgo/dep.a
dep/dep.go:24:3: inlining call to b
hot-budget check allows inlining for call example.com/pgo/dep.a (cost 285) at dep/dep.go:32:3 in function example.com/pgo/dep.Foo
dep/dep.go:32:3: inlining call to a

The hot-budget lines note that PGO allows inlining b into a and a into Foo. dep/dep.go:32:3: inlining call to a indicates that a was inlined into Foo, but there should then be a subsequent inline b into Foo. The output makes this a bit confusing, but objdump makes it clear that b was not inlined:

0000000000467a80 <example.com/pgo/dep.Foo>:
...
  467a8b:       |  |         e8 b0 ff ff ff             call   467a40 <example.com/pgo/dep.b>
  467a90:       |  |         e8 ab ff ff ff             call   467a40 <example.com/pgo/dep.b>
...

A bit of custom logging in the compiler makes it more clear what is happening:

dep/dep.go:31:6: function Foo: DevirtualizeAndInlineFunc
dep/dep.go:31:6: function Foo: TryInlineCall to a
hot-budget check allows inlining for call example.com/pgo/dep.a (cost 285) at dep/dep.go:32:3 in function example.com/pgo/dep.Foo
dep/dep.go:32:3: inlining call to a
dep/dep.go:31:6: function Foo: TryInlineCall to b
dep/dep.go:31:6: canInlineCallExpr from Foo to b: cost too high: maxCost 80, callSiteScore 133, hot false
dep/dep.go:31:6: mkinlcall b: !canInlineCallExpr

It looks like what is happening here is:

  1. a in inlined into Foo.
  2. We then attempt to inline b into Foo (from the inlined body of a).
  3. canInlineCallExpr checks inlineCostOK of Foo -> b. This consults the PGO edge map, which does not contain a hot edge from Foo -> b because PGO includes inline calls in edges. i.e., it has hot edges Foo -> a and a -> b.
  4. Inline does not occur because we didn't increase the budget and base cost is too high.

I believe what should happen is if inlining a call inside of an ir.InlinedCallExpr, we should use the original name of the InlinedCallExpr to consult the PGO edge map.

cc @cherrymui

@prattmic prattmic added the NeedsFix The path to resolution is known, but the work has not been done. label Aug 23, 2024
@prattmic prattmic added this to the Go1.24 milestone Aug 23, 2024
@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Aug 23, 2024
@prattmic
Copy link
Member Author

I believe what should happen is if inlining a call inside of an ir.InlinedCallExpr, we should use the original name of the InlinedCallExpr to consult the PGO edge map.

Getting the original name looks somewhat painful, as it looks like the only reference we keep to it is a reference to the inline tree via an inline mark statement in the expression init: https://cs.opensource.google/go/go/+/master:src/cmd/compile/internal/noder/reader.go;drc=4e1cc09f8b9bcef2b6d0839a7d0026b50c21998b;bpv=1;bpt=1;l=3513

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsFix The path to resolution is known, but the work has not been done.
Projects
Development

No branches or pull requests

3 participants