cmd/compile/internal/amd64: huge performance degradation (register allocation issue?) #50821

Open
ikravets opened this issue Jan 26, 2022 · 1 comment
Labels
NeedsInvestigation

@ikravets ikravets commented Jan 26, 2022

What version of Go are you using (go version)?

$ go version
go version go1.17.6 linux/amd64

Does this issue reproduce with the latest release?

yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/user/.cache/go-build"
GOENV="/home/user/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/user/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/user/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/home/user/opt/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/home/user/opt/go/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="go1.17.6"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/dev/null"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build2989573659=/tmp/go-build -gno-record-gcc-switches"

What did you do?

I wrote a function that, I suspect, stresses register allocation in the Go compiler.
The problem I (accidentally) discovered is that swapping two independent lines in the source code causes variables to be spilled from registers to the stack. This reduces the performance of my benchmark by about 30%!

Excerpt from the "fast" code version:

// ...lines identical in both versions...
diagonalValue = state[begin-1]
end := min(col+maxDist, patternLen)
// ...lines identical in both versions...

Excerpt from the "slow" code version:

// ...lines identical in both versions...
end := min(col+maxDist, patternLen)
diagonalValue = state[begin-1]
// ...lines identical in both versions...

As you can see, the two lines are independent, so the compiler is free to reorder them.
I have a self-contained 150-line source file, ready for go test -bench, which reproduces this issue. Because the code is proprietary, I can only share this file privately with the compiler developers, and only for the purpose of reproducing the issue. Please advise on how to proceed.
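
For readers who cannot see the proprietary file, here is a minimal sketch of the shape such a comparison might take. It is not the reporter's code: the function names readThenEnd and endThenRead, the loop body, and the input data are hypothetical, and only the two swapped lines mirror the excerpts above. It is not claimed to reproduce the slowdown; it only shows two functions that differ in the order of two independent statements, checked for equal results and timed with testing.Benchmark.

```go
package main

import (
	"fmt"
	"testing"
)

// min mirrors the helper the excerpts call (Go 1.17 has no builtin min).
func min(a, b int) int {
	if a < b {
		return a
	}
	return b
}

// readThenEnd uses the ordering from the "fast" excerpt.
// The surrounding loop is a hypothetical stand-in, not the original code.
func readThenEnd(state []int, maxDist, patternLen int) int {
	total := 0
	var diagonalValue int
	for col := 1; col <= patternLen; col++ {
		begin := col - maxDist
		if begin < 1 {
			begin = 1
		}
		diagonalValue = state[begin-1]
		end := min(col+maxDist, patternLen)
		for i := begin; i <= end; i++ {
			total += state[i] ^ diagonalValue
			diagonalValue = state[i]
		}
	}
	return total
}

// endThenRead is identical except that the two independent lines are swapped,
// as in the "slow" excerpt.
func endThenRead(state []int, maxDist, patternLen int) int {
	total := 0
	var diagonalValue int
	for col := 1; col <= patternLen; col++ {
		begin := col - maxDist
		if begin < 1 {
			begin = 1
		}
		end := min(col+maxDist, patternLen)
		diagonalValue = state[begin-1]
		for i := begin; i <= end; i++ {
			total += state[i] ^ diagonalValue
			diagonalValue = state[i]
		}
	}
	return total
}

// sink keeps the compiler from discarding the benchmarked calls.
var sink int

func main() {
	const patternLen, maxDist = 64, 4
	state := make([]int, patternLen+1)
	for i := range state {
		state[i] = i * 7 % 13
	}

	// Both orderings must compute the same value.
	fmt.Println("same result:",
		readThenEnd(state, maxDist, patternLen) == endThenRead(state, maxDist, patternLen))

	variants := []struct {
		name string
		f    func([]int, int, int) int
	}{
		{"readThenEnd", readThenEnd},
		{"endThenRead", endThenRead},
	}
	for _, v := range variants {
		r := testing.Benchmark(func(b *testing.B) {
			for i := 0; i < b.N; i++ {
				sink = v.f(state, maxDist, patternLen)
			}
		})
		fmt.Printf("%s: %d ns/op\n", v.name, r.NsPerOp())
	}
}
```

Building with go build -gcflags=-S prints the assembly listing for each function, including the STEXT header line, so the two orderings can be compared for size and frame layout as well as timing.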

What did you expect to see?

I expect to see both versions compiled into the same binary code with the same performance.

What did you see instead?

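The function's assembly header changed from STEXT nosplit size=1158 args=0x68 locals=0x40 funcid=0x0 to STEXT size=1355 args=0x68 locals=0x78 funcid=0x0: the code is larger, locals grew from 0x40 to 0x78, and the function is no longer nosplit. Performance dropped by ~30% in throughput terms, from 2618 ns/op to 3905 ns/op (about 49% more time per operation).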
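
To make the size of the change concrete, the following sketch parses the two STEXT header lines quoted above and prints the differences. The helper name field is hypothetical, and the sketch assumes the size and locals values are byte counts; the header strings and timings are copied from this report.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// field extracts the integer value of "key=value" from a STEXT header line.
// strconv.ParseInt with base 0 accepts both decimal (1158) and hex (0x40).
func field(header, key string) (int64, error) {
	prefix := key + "="
	for _, tok := range strings.Fields(header) {
		if strings.HasPrefix(tok, prefix) {
			return strconv.ParseInt(strings.TrimPrefix(tok, prefix), 0, 64)
		}
	}
	return 0, fmt.Errorf("no %s field in %q", prefix, header)
}

func main() {
	fast := "STEXT nosplit size=1158 args=0x68 locals=0x40 funcid=0x0"
	slow := "STEXT size=1355 args=0x68 locals=0x78 funcid=0x0"

	for _, key := range []string{"size", "locals"} {
		a, errA := field(fast, key)
		b, errB := field(slow, key)
		if errA != nil || errB != nil {
			fmt.Println("parse error:", errA, errB)
			return
		}
		fmt.Printf("%s: %d -> %d (%+d)\n", key, a, b, b-a)
	}

	// Time per operation, slow divided by fast.
	fmt.Printf("ns/op ratio: %.2f\n", 3905.0/2618.0)
}
```

The locals growth from 64 to 120 is consistent with extra stack slots for spilled values, though the header alone does not show which variables were spilled.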

@mknyszek mknyszek commented Jan 26, 2022

CC @dr2chase

This sounds like something you were discussing recently, but maybe I'm wrong.

@mknyszek mknyszek added the NeedsInvestigation label Jan 26, 2022
@mknyszek mknyszek added this to the Backlog milestone Jan 26, 2022