cmd/compile: missed obvious inlining #58905
Comments
It seems to me that in In |
The inliner doesn't track an ongoing cost within a function and stop inlining once some threshold is reached. With one exception, the threshold it applies is solely on the complexity of the target of a given call edge. The one exception is that if we're inlining into a "big" function, we reduce the threshold for all call edges coming from the big function, but none of the functions involved here are close to the "big" threshold. In this case, (We're looking into changing up all of this, but that's how it currently works.) |
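The per-edge rule described in this comment can be sketched as a toy model. This is not the compiler's code: the type fn, the function shouldInline, and all three constants are hypothetical names and values chosen for illustration. What it shows is that the decision looks only at the callee's cost and at whether the caller is "big", with no running total kept for the caller.

```go
package main

import "fmt"

// Illustrative values only; treat the exact numbers as assumptions,
// not as the compiler's real thresholds.
const (
	maxBudget      = 80   // assumed cost ceiling for an inlinable callee
	bigFuncNodes   = 5000 // assumed size above which a caller counts as "big"
	bigFuncMaxCost = 20   // assumed reduced ceiling for calls from a big caller
)

// fn is a hypothetical stand-in for a compiled function.
type fn struct {
	name  string
	cost  int // estimated complexity of the body when used as a callee
	nodes int // size of the body when used as a caller
}

// shouldInline decides a single call edge. Only the callee's cost and the
// caller's "big" status matter; nothing accumulates as more calls are inlined.
func shouldInline(caller, callee fn) bool {
	limit := maxBudget
	if caller.nodes > bigFuncNodes {
		limit = bigFuncMaxCost
	}
	return callee.cost <= limit
}

func main() {
	small := fn{name: "small", cost: 30, nodes: 40}
	caller := fn{name: "caller", cost: 300, nodes: 400}
	huge := fn{name: "huge", cost: 9000, nodes: 6000}

	fmt.Println(shouldInline(caller, small)) // true: 30 <= 80
	fmt.Println(shouldInline(huge, small))   // false: 30 > 20 for a big caller
}
```

Because the limit depends only on the two endpoints of the edge, inlining many small callees into one ordinary caller never trips a threshold in this model.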
Sounds like #54632 |
I don't see how that's related. All of the function references in my code are static calls, and this is a case where we simply don't inline a call, versus producing bad code from inlining. |
Just in case it matters:
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/austin/.cache/go-build"
GOENV="/home/austin/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/austin/r/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/austin/r/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/home/austin/go.tmp"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/home/austin/go.tmp/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="devel go1.21-eaa00d878e Thu Mar 2 17:29:22 2023 -0500"
GCCGO="gccgo"
GOAMD64="v1"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/home/austin/go.tmp/src/go.mod"
GOWORK=""
CGO_CFLAGS="-O2 -g"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-O2 -g"
CGO_FFLAGS="-O2 -g"
CGO_LDFLAGS="-O2 -g"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build2641378236=/tmp/go-build -gno-record-gcc-switches"
Bootstrap is go1.17.13. |
I pulled the patch, e.g.
and I can see the weird behavior. To get more info on the ordering, I added some code to the inliner's
From the resulting trace output, it looks like the odd ordering is due to cycles in the call graph. Specifically it looks like there is a giant cycle that includes
This would probably explain some of the weirdness, since We could try to do some work to process nodes within the SCC in a better order, although I am not sure exactly how much work that would be. |
Here is a smaller example program that illustrates the problem (easier
When the inliner processes this package, it calls ir.VisitFuncsBottomUp to collect up strongly connected components in the call graph; then handles all functions in a given SCC (either a trivial SCC or a non-trivial SCC where recursion is actually happening) as a clump. For each function "f" in an SCC clump, the inliner first calls CanInline (to see if "f" itself can be inlined), then calls InlineCalls on the body of "f" (to inline things into "f"). For non-trivial SCCs, this implies that when processing a function earlier in the clump, we may miss inlining opportunities since functions later in the clump haven't been tested for inlinability yet. In the example above, we'll have a clump { "root", "mid" }, then all other functions will be in their own trivial SCCs. Thus the "-m=2" output for this package looks like:
Note that even though "mid" is inlinable, we don't inline the call to "mid" from "root", because at the point we are doing inlining in "root", "mid" hasn't been checked yet. I am going to send a CL that changes this behavior for non-trivial SCCs (e.g. does all "CanInline" calls in an SCC before any "InlineCalls" operations). |
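The ordering problem described above can be reproduced with a small simulation. This is a toy model under stated assumptions, not the inliner's implementation: the type fn and the helpers canInline, inlineCalls, and process are hypothetical, "cheap" stands in for the real cost analysis, and the call edges between "root" and "mid" are assumed, since the thread only says the two form one clump. It compares the interleaved per-function order with an order that runs every inlinability check in the clump first.

```go
package main

import "fmt"

// fn is a hypothetical stand-in for one function in an SCC clump.
type fn struct {
	name      string
	cheap     bool     // stands in for "body is small enough to inline"
	calls     []string // static callees (assumed edges)
	inlinable bool     // set by canInline
}

// canInline models the check of whether f itself can be inlined.
func canInline(f *fn) { f.inlinable = f.cheap }

// inlineCalls models inlining into f. It can only use callees that have
// already been marked inlinable at the moment it runs.
func inlineCalls(f *fn, byName map[string]*fn) []string {
	var done []string
	for _, c := range f.calls {
		if g, ok := byName[c]; ok && g.inlinable {
			done = append(done, f.name+"->"+c)
		}
	}
	return done
}

// process handles one clump and returns the call edges that were inlined.
// With twoPhase false it interleaves the two steps per function; with
// twoPhase true it checks every function before inlining into any of them.
func process(clump []*fn, twoPhase bool) []string {
	byName := map[string]*fn{}
	for _, f := range clump {
		f.inlinable = false // reset so the clump can be processed again
		byName[f.name] = f
	}
	var inlined []string
	if twoPhase {
		for _, f := range clump {
			canInline(f)
		}
		for _, f := range clump {
			inlined = append(inlined, inlineCalls(f, byName)...)
		}
		return inlined
	}
	for _, f := range clump {
		canInline(f)
		inlined = append(inlined, inlineCalls(f, byName)...)
	}
	return inlined
}

func main() {
	clump := []*fn{
		{name: "root", cheap: false, calls: []string{"mid"}},
		{name: "mid", cheap: true, calls: []string{"root"}},
	}
	fmt.Println(process(clump, false)) // []
	fmt.Println(process(clump, true))  // [root->mid]
}
```

In the interleaved run, "root" is processed before "mid" has been checked, so the edge is skipped; in the two-phase run the same edge is found without changing the visit order.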
Change https://go.dev/cl/474955 mentions this issue: |
Thanks for figuring this out, Than. This lends more support to the idea I've been pushing that inlining shouldn't be bottom-up at all, but should be done starting with the cheapest edge. That would fix SCC ordering issues by eliminating the SCC entirely. |
For grins, I wrote a small CL that gathers statistics on how many functions are presented to the inliner as part of a non-trivial SCC. Here's what I got for
So it seems that SCC functions are relatively uncommon but not unheard of. Note that this is total functions presented to the inliner (which is not the same as total functions in the final executable). |
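For readers unfamiliar with the term, a count of this kind can be sketched on a toy call graph. The code below is an illustration only and is unrelated to the CL mentioned above: it uses Tarjan's algorithm (a standard technique, not necessarily what the compiler does), and the function names sccs and countInNontrivial as well as the example graph are hypothetical. A function is counted if its component has more than one member or if it calls itself.

```go
package main

import "fmt"

// sccs returns the strongly connected components of a call graph using
// Tarjan's algorithm. Components are emitted callees first.
func sccs(nodes []string, calls map[string][]string) [][]string {
	index := map[string]int{}
	low := map[string]int{}
	onStack := map[string]bool{}
	var stack []string
	var out [][]string
	next := 0

	var visit func(v string)
	visit = func(v string) {
		index[v], low[v] = next, next
		next++
		stack = append(stack, v)
		onStack[v] = true
		for _, w := range calls[v] {
			if _, seen := index[w]; !seen {
				visit(w)
				if low[w] < low[v] {
					low[v] = low[w]
				}
			} else if onStack[w] && index[w] < low[v] {
				low[v] = index[w]
			}
		}
		if low[v] == index[v] { // v is the root of a component
			var comp []string
			for {
				w := stack[len(stack)-1]
				stack = stack[:len(stack)-1]
				onStack[w] = false
				comp = append(comp, w)
				if w == v {
					break
				}
			}
			out = append(out, comp)
		}
	}
	for _, v := range nodes {
		if _, seen := index[v]; !seen {
			visit(v)
		}
	}
	return out
}

// countInNontrivial counts functions that sit in a component with actual
// recursion: more than one member, or a single self-calling function.
func countInNontrivial(comps [][]string, calls map[string][]string) int {
	n := 0
	for _, c := range comps {
		if len(c) > 1 {
			n += len(c)
			continue
		}
		for _, w := range calls[c[0]] {
			if w == c[0] {
				n++
				break
			}
		}
	}
	return n
}

func main() {
	nodes := []string{"main", "root", "mid", "leaf"}
	calls := map[string][]string{
		"main": {"root"},
		"root": {"mid"},
		"mid":  {"root", "leaf"},
	}
	comps := sccs(nodes, calls)
	fmt.Println(countInNontrivial(comps, calls), "of", len(nodes)) // 2 of 4
}
```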
This patch changes the relative order of "CanInline" and "InlineCalls" operations within the inliner for clumps of functions corresponding to strongly connected components in the call graph. This helps increase the amount of inlining within SCCs, particularly in Go's runtime package, which has a couple of very large SCCs.

For a given SCC of the form { fn1, fn2, ... fnk }, the inliner would (prior to this point) walk through the list of functions and for each function first compute inlinability ("CanInline") and then perform inlining ("InlineCalls"). This meant that if there was an inlinable call from fn3 to fn4 (for example), this call would never be inlined, since at the point fn3 was visited, we would not have computed inlinability for fn4.

We now do inlinability analysis for all functions in an SCC first, then do actual inlining for everything. This results in 47 additional inlines in the Go runtime package (a fairly modest increase percentage-wise of 0.6%).

Updates #58905.

Change-Id: I48dbb1ca16f0b12f256d9eeba8cf7f3e6dd853cd
Reviewed-on: https://go-review.googlesource.com/c/go/+/474955
Run-TryBot: Than McIntosh <thanm@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Let me know if this issue can be closed out... thanks |
Thanks. My particular issue was fixed by your change, so I'll close this. There's still the open question of how to deal better with large SCCs, but that's part of a bigger project. :) |
Change https://go.dev/cl/492015 mentions this issue: |
Delete the "InlineSCCOnePass" debugging flag and the inliner fallback code that kicks in if it is used. The change it was intended to guard has been working on tip for some time, no need for the fallback any more.

Updates #58905.

Change-Id: I2e1dbc7640902d9402213db5ad338be03deb96c5
Reviewed-on: https://go-review.googlesource.com/c/go/+/492015
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Run-TryBot: Than McIntosh <thanm@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
I'm debugging a performance regression in CL 466099. One cause is that the compiler does not inline the call from newInlineUnwinder to resolveInternal, even though resolveInternal is inlinable according to the -m output and the call from next to resolveInternal a few lines down does get inlined. This appears to be because this call to typecheck.HaveInlineBody returns false for resolveInternal when we're visiting newInlineUnwinder and true when we're visiting next. Beyond that I get past my depth.

The -m output seems to indicate that the inliner visits newInlineUnwinder, then resolveInternal, then next, which is surprising because that's not in bottom-up order (and this is regardless of source order). If visiting a function is what creates the inline body, then this visit order is clearly a problem.

Relevant -m output:
cc @mdempsky @thanm