Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: doesn't inline basic funcs and doesn't optimize code after inlining #70235

Open
david-its opened this issue Nov 6, 2024 · 8 comments
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Milestone

Comments

@david-its
Copy link

Go version

go version go1.22.6 darwin/arm64

Output of go env in your module/workspace:

GO111MODULE=''
GOARCH='arm64'
GOBIN=''
GOCACHE='/Users/xxx/Library/Caches/go-build'
GOENV='/Users/xxx/Library/Application Support/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='arm64'
GOHOSTOS='darwin'
GOINSECURE=''
GOMODCACHE='/Users/xxx/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='darwin'
GOPATH='/Users/xxx/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/opt/homebrew/Cellar/go/1.22.6/libexec'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/opt/homebrew/Cellar/go/1.22.6/libexec/pkg/tool/darwin_arm64'
GOVCS=''
GOVERSION='go1.22.6'
GCCGO='gccgo'
AR='ar'
CC='cc'
CXX='c++'
CGO_ENABLED='1'
GOMOD='/dev/null'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -ffile-prefix-map=/var/folders/6n/4bnk08l915n2qsvtzm_lmtv40000gn/T/go-build2946791027=/tmp/go-build -gno-record-gcc-switches -fno-common'

What did you do?

I'm trying to add a basic tracing library to our code and found out so many example when golang doesn't inline basic cases, so providing here a "simplified" to the minimum examples w/o code doing a real tracing.

What did you see happen?

example 1

package main

import "fmt"

var level = 0

func StartSpan(name string) {
        if level&1 != 0 {
                fmt.Println("StartSpan")
        }
}

func main() {
        StartSpan("span")
}

StartSpan doesn't inline in this example in main().
gcflags -m says that './main.go:9:15: "StartSpan" escapes to heap', though it is unclear what exactly escapes here and why it is a problem for const string.

Seems this is a problem as in tracing one frequently need to provide variadic number of attributes.
Replacing fmt.Println() to some non-variadic function helps in this example.

Example 2

Ok, lets make compiler to inline StartSpan by replacing Println() to a different function, but having a variadic KV list:

func StartSpan(name string, kv ...KV) {
        if level&1 != 0 {
                // do smth useless, just need a call to some other func here
                fn(name, kv)
        }
}

var list []KV

func fn(name string, kv []KV) {
        // do smth with kv... e.g. copy elements to some global list
        for idx := range kv {
                list = append(list, kv[idx])
        }
}

func main() {
        StartSpan("span", KV{"k1", "v1"}, KV{"k2", "v2"})
}

now StartSpan is inlined to main, but its arguments are prepared BEFORE check for if level&1 != 0 and it takes significant amount of code like this (actually, I saw 2 pages of asm with allocations and WriteBarrier in real life):

main.main STEXT size=432 args=0x0 locals=0xb8 funcid=0x0 align=0x0
...
        0x001c 00028 (/tmp/test/main.go:27)     STP     (ZR, ZR), main..autotmp_6-80(SP)
        0x0020 00032 (/tmp/test/main.go:27)     STP     (ZR, ZR), main..autotmp_6-64(SP)
        0x0024 00036 (/tmp/test/main.go:27)     STP     (ZR, ZR), main..autotmp_6-48(SP)
        0x0028 00040 (/tmp/test/main.go:27)     STP     (ZR, ZR), main..autotmp_6-32(SP)
        0x002c 00044 (/tmp/test/main.go:27)     MOVD    $2, R5
        0x0030 00048 (/tmp/test/main.go:27)     MOVD    R5, main..autotmp_6-72(SP)
        0x0034 00052 (/tmp/test/main.go:27)     MOVD    $go:string."k1"(SB), R6
        0x003c 00060 (/tmp/test/main.go:27)     MOVD    R6, main..autotmp_6-80(SP)
        0x0040 00064 (/tmp/test/main.go:27)     MOVD    R5, main..autotmp_6-56(SP)
        0x0044 00068 (/tmp/test/main.go:27)     MOVD    $go:string."v1"(SB), R6
        0x004c 00076 (/tmp/test/main.go:27)     MOVD    R6, main..autotmp_6-64(SP)
        0x0050 00080 (/tmp/test/main.go:27)     MOVD    R5, main..autotmp_6-40(SP)
        0x0054 00084 (/tmp/test/main.go:27)     MOVD    $go:string."k2"(SB), R6
        0x005c 00092 (/tmp/test/main.go:27)     MOVD    R6, main..autotmp_6-48(SP)
        0x0060 00096 (/tmp/test/main.go:27)     MOVD    R5, main..autotmp_6-24(SP)
        0x0064 00100 (/tmp/test/main.go:27)     MOVD    $go:string."v2"(SB), R5
        0x006c 00108 (/tmp/test/main.go:27)     MOVD    R5, main..autotmp_6-32(SP)
        0x0070 00112 (<unknown line number>)    NOP
        0x0070 00112 (<unknown line number>)    PCDATA  $0, $-3

while the level flag is checked just here:
        
        0x0070 00112 (/tmp/test/main.go:11)     MOVD    main.level(SB), R5
        0x0078 00120 (/tmp/test/main.go:11)     PCDATA  $0, $-1
        0x0078 00120 (/tmp/test/main.go:11)     TBZ     $0, R5, 132
...

so it seems like compiler is absolutely incapable to detect and reorder code efficiently in situations like this... :/
I believe such situations are pretty common in loggers, tracing and other typical scenarios and it deserves optimization.

What did you expect to see?

efficient inlining of function StartSpan in both examples and checking level variable BEFORE initializing lots of argument objects.

@randall77
Copy link
Contributor

StartSpan doesn't inline in this example in main().

I believe what is happening here is that the compiler is preferring to inline fmt.Println into StartSpan in preference to inlining StartSpan into main. When it does the former then StartSpan is too big to inline into the latter.

There are of course different choices we could make in the heuristics. For instance, we have thought about making the threshold higher when there is only a single callsite of a function. (Hard to tell in general with exported functions, but exported functions in package main are an exception.)

gcflags -m says that './main.go:9:15: "StartSpan" escapes to heap', though it is unclear what exactly escapes here and why it is a problem for const string.

It isn't a problem for a const string. It is just saying "the first argument at this callsite escapes to the heap".

@david-its
Copy link
Author

I believe what is happening here is that the compiler is preferring to inline fmt.Println into StartSpan in preference to inlining StartSpan into main. When it does the former then StartSpan is too big to inline into the latter.

Randal, thx a lot for your fast reply. While this logic sounds as an explanation I must say it is naive to believe that compiler is smart enough to make best decisions possible. Remember Linus Torvalds commented on similar Intel blind believes that compilers are smart enough to optimize code greatly for VLIW (IA64) and they failed.

I know this was commented millions of times, but w/o inline directive such simple yet vital scenarios will never be handled correctly :/ And this truly impacts the believe in the language which is otherwise great.

@david-its
Copy link
Author

@randall77 what about 2nd example? inlining works fine in example 2, but code generated is obviously super sub-optimal.

@david-its david-its changed the title compiler doesn't inline basic funcs compiler doesn't inline basic funcs and doesn't optimize code after inlining Nov 7, 2024
@randall77
Copy link
Contributor

now StartSpan is inlined to main, but its arguments are prepared BEFORE check for if level&1 != 0

True, the compiler is not good at pushing work like that into conditionals. It has very little information about where subsequent uses of memory locations are.
There's superfluous zeroing there also, that's #47107.

I know this was commented millions of times, but w/o inline directive such simple yet vital scenarios will never be handled correctly

That's #21536.

@dr2chase
Copy link
Contributor

dr2chase commented Nov 7, 2024

We did add PGO to increase inlining thresholds in hot code. I don't think that applies in this case, but more generally, it is a workaround to not-enough inlining.

@dr2chase dr2chase added this to the Backlog milestone Nov 7, 2024
@dr2chase dr2chase added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Nov 7, 2024
@seankhliao seankhliao changed the title compiler doesn't inline basic funcs and doesn't optimize code after inlining cmd/compile: doesn't inline basic funcs and doesn't optimize code after inlining Nov 7, 2024
@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Nov 7, 2024
@david-its
Copy link
Author

We did add PGO to increase inlining thresholds in hot code. I don't think that applies in this case, but more generally, it is a workaround to not-enough inlining.

While I love Golang, I worked tens of years with linux kernel and know for sure how small compiler hints (inline, likely/unlikely, free etc.) can make a huge impact here and there. It's unavoidable. If there was some fund to make performance optimizations we would donate it. Otherwise, Rust and such will take over from Golang in the web services field :/

@Liebenfiels
Copy link

Liebenfiels commented Nov 11, 2024

I am just learning GO- so have things fresh, but have background with 1990 QBASIC (compiled) and Python.
I learn BASIC since last year based on old books and computers/ Python since this year.
Keeping things simply make it work as in QBASIC, overcomplicating simply functions is asking for problems.

I don’t think GO compiler is wrong here, just pick up not optimal code.

The var level =0 is global variable kept on heap and it is not even well declared. They recommend to put variables under main to be kept on stack, but even as global would work. With BASIC I had global variables and never had compile problem, even if I run simply code in 2024 on PC286 (12.5 MHz, 4mb ram) all compiles and run fast.
I keep functions simple.

That string “span” is passed to function and not even used at all in StartSpan.
For clarity I would pass "span" and under function used span instead of name so it is clear.
That will not print with level = 0 as it is under if !=0, either change for test to 1 or add else as below for testing.

Try as below first playground https://go.dev/play/p/hZchFGhz9ey

package main

import "fmt"

var level = 0

func StartSpan() {
	if level&1 != 0 {
		fmt.Println("StartSpan")
	} else {
		fmt.Println("Test print when level=0")
	}
}

func main() {
	StartSpan()
}

Or as but try with global variable as it is or then put under main.
Play ground https://go.dev/play/p/7rDB36og1vQ

package main

import "fmt"

var level = 0

func StartSpan(name string) {
	if level&1 != 0 {
		fmt.Println("StartSpan", name)
	} else {
		fmt.Println("Test print when level=0", name)
	}
}

func main() {
	StartSpan("span")
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Projects
Development

No branches or pull requests

7 participants