cmd/compile: doesn't inline basic funcs and doesn't optimize code after inlining #70235
Comments
I believe what is happening here is that the compiler is preferring to inline There are of course different choices we could make in the heuristics. For instance, we have thought about making the threshold higher when there is only a single callsite of a function. (Hard to tell in general with exported functions, but exported functions in package
It isn't a problem for a const string. It is just saying "the first argument at this callsite escapes to the heap".
Randall, thanks a lot for your fast reply. While this logic works as an explanation, I must say it is naive to believe that the compiler is smart enough to make the best possible decisions. Remember that Linus Torvalds commented on Intel's similarly blind belief that compilers would be smart enough to optimize code well for VLIW (IA64), and they failed. I know this has been said millions of times, but w/o
@randall77 what about the 2nd example? Inlining works fine in example 2, but the generated code is obviously very sub-optimal.
True, the compiler is not good at pushing work like that into conditionals. It has very little information about where subsequent uses of memory locations are.
That's #21536.
We did add PGO to increase inlining thresholds in hot code. I don't think that applies in this case, but more generally, it is a workaround for not-enough inlining.
While I love Golang, I have worked for decades with the Linux kernel and know for sure how small compiler hints (inline, likely/unlikely, free, etc.) can make a huge impact here and there. It's unavoidable. If there were a fund for performance optimizations, we would donate to it. Otherwise, Rust and the like will take over from Golang in the web services field :/
I am just learning Go, so things are fresh for me, but I have a background in 1990s QBASIC (compiled) and Python. I don't think the Go compiler is wrong here; the code it was given is just not optimal. The var level = 0 is a global variable kept on the heap, and it is not even well declared. They recommend putting variables under main so they are kept on the stack, but even as a global it would work. With BASIC I had global variables and never had a compile problem; even if I run simple code in 2024 on a PC 286 (12.5 MHz, 4 MB RAM), everything compiles and runs fast. The string "span" is passed to the function and is not even used in StartSpan. Try the playground below first: https://go.dev/play/p/hZchFGhz9ey
Or try it with the global variable as it is, and then with the variable moved under main.
Go version
go version go1.22.6 darwin/arm64
Output of go env in your module/workspace:
What did you do?
I'm trying to add a basic tracing library to our code and found many cases where Go doesn't inline basic functions, so I'm providing examples here that are simplified to the minimum, w/o code doing any real tracing.
What did you see happen?
Example 1
StartSpan doesn't inline into main() in this example. gcflags -m says './main.go:9:15: "StartSpan" escapes to heap', though it is unclear what exactly escapes here and why that is a problem for a const string.
This seems to be a problem, as in tracing one frequently needs to pass a variadic number of attributes.
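The original listing for example 1 did not survive in this copy of the issue. The following is only a minimal sketch of the kind of code being described, not the reporter's actual program: the body of StartSpan, its bool return value, and the meaning of the level bit are all assumptions made for illustration.

```go
package main

import "fmt"

// level is a hypothetical global bitmask; bit 0 is assumed to enable tracing.
var level = 0

// StartSpan is an assumed stand-in for the function in the issue.
// It calls the variadic fmt.Println only when tracing is enabled and
// reports whether it printed anything.
func StartSpan(name string) bool {
	if level&1 != 0 {
		fmt.Println("start", name) // variadic call: arguments are boxed into a []any
		return true
	}
	return false
}

func main() {
	// Tracing is disabled by default, so nothing is printed here.
	StartSpan("StartSpan")
}
```

Building a file like this with go build -gcflags=-m prints the compiler's inlining and escape-analysis decisions, which is how the diagnostics quoted above are obtained. The exact messages depend on the Go version and on the real code.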
Replacing fmt.Println() with some non-variadic function helps in this example.
Example 2
OK, let's make the compiler inline StartSpan by replacing Println() with a different function that takes a variadic KV list:
Now StartSpan is inlined into main, but its arguments are prepared BEFORE the check for if level&1 != 0,
and that takes a significant amount of code like this (actually, I saw 2 pages of asm with allocations and WriteBarrier in real life):
So it seems the compiler is unable to detect and reorder code efficiently in situations like this... :/
I believe such situations are pretty common in loggers, tracing, and other typical scenarios, and they deserve optimization.
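Until the compiler can sink argument construction into the conditional, one manual workaround is to test the level at the call site before building any attributes. The sketch below is only an illustration under assumptions: KV, attr, Enabled, record, and the built counter are hypothetical names, not from the original code. The counter exists only to make it observable whether the arguments were constructed.

```go
package main

import "fmt"

// KV is a hypothetical key/value attribute type.
type KV struct {
	Key   string
	Value any
}

// level is a hypothetical global bitmask; bit 0 is assumed to enable tracing.
var level = 0

// built counts constructed attributes (illustration only).
var built int

// attr constructs one attribute and records that it was built.
func attr(key string, v any) KV {
	built++
	return KV{Key: key, Value: v}
}

// Enabled reports whether tracing is switched on.
func Enabled() bool { return level&1 != 0 }

// record is a stand-in sink that returns the number of attributes it received.
func record(name string, attrs []KV) int { return len(attrs) }

// StartSpan checks the level itself, but by then the caller has
// already evaluated every argument.
func StartSpan(name string, attrs ...KV) int {
	if !Enabled() {
		return 0
	}
	return record(name, attrs)
}

func main() {
	// Unguarded call: both attributes are built even though tracing is off.
	StartSpan("op", attr("user", 42), attr("path", "/x"))
	fmt.Println("built without guard:", built) // prints "built without guard: 2"

	built = 0
	// Guarded call: the cheap check runs first, so nothing is built.
	if Enabled() {
		StartSpan("op", attr("user", 42), attr("path", "/x"))
	}
	fmt.Println("built with guard:", built) // prints "built with guard: 0"
}
```

The guard has to be repeated at every call site, which is exactly the boilerplate the issue hopes the compiler could make unnecessary.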
What did you expect to see?
Efficient inlining of the function StartSpan in both examples, and checking the level variable BEFORE initializing lots of argument objects.