cmd/compile: poor register allocation with PGO #58298

@Deleplace

What version of Go are you using (go version)?

$ go version
go version go1.20 darwin/arm64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="arm64"
GOBIN=""
GOCACHE="/Users/deleplace/Library/Caches/go-build"
GOENV="/Users/deleplace/Library/Application Support/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="arm64"
GOHOSTOS="darwin"
GOINSECURE=""
GOMODCACHE="/Users/deleplace/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/deleplace/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/darwin_arm64"
GOVCS=""
GOVERSION="go1.20"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/Users/deleplace/Documents/2023/02/03/go.mod"
GOWORK=""
CGO_CFLAGS="-O2 -g"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-O2 -g"
CGO_FFLAGS="-O2 -g"
CGO_LDFLAGS="-O2 -g"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/p4/g7fnpss96g5708_mmv3cxzgh0000gn/T/go-build3787014677=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

I wanted to try the amazing new PGO (Profile-Guided Optimization) capability of Go 1.20.

For this I wrote a program that converts all JPEG files in a directory into PNG. It is mostly CPU/memory-bound, with some I/O to read and write the local filesystem.

I wanted to observe:

  • What % of perf gain PGO would provide,
  • If the perf gain would be consistent when running multiple times,
  • If the perf gain would be sensitive to the original sample inputs used to create the profiles (when executed with new inputs),
  • If the perf gain would be sensitive to the level of concurrency.

I tried 6 settings: {sequential, concurrent} x {3, 10, 82} images to process.

Source at https://github.com/Deleplace/pgo-test

To create the 6 profiles easily, I executed main() inside a test:

for n in 3 10 82; do
	for conc in 0 1; do
		go test -cpuprofile=conc${conc}_${n}.prof -args -concurrent=$conc ./${n}images
	done
done

To create the 6 PGO-optimized executables:

for n in 3 10 82; do
	for conc in 0 1; do
		go build -pgo=conc${conc}_${n}.prof -o process_conc${conc}_${n}
	done
done

What did you expect to see?

  • A 2-4% perf gain

What did you see instead?

  • A consistent ~5% perf loss

The PGO-enabled programs always seem a bit slower than the original program, regardless of the exact inputs used to produce the profiles.
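
One way to quantify a difference like the ~5% loss above is to build a baseline with PGO explicitly disabled and compare wall-clock times. The sketch below is only an illustration and not part of the original measurements: `pct_change` is a hypothetical helper, `process_baseline` is an assumed output name, and the commented invocation assumes the executables accept the same `-concurrent` flag and directory argument that the `go test` command above passes after `-args`.

```shell
#!/bin/sh
# pct_change BASE NEW
# Hypothetical helper: prints how much NEW differs from BASE, in percent.
# A positive result means NEW took longer than BASE (a slowdown).
pct_change() {
	awk -v base="$1" -v new="$2" \
		'BEGIN { printf "%+.1f%%\n", (new - base) / base * 100 }'
}

# Assumed usage (names and timings are placeholders, not measured data):
#   go build -pgo=off -o process_baseline
#   time ./process_baseline -concurrent=0 ./82images
#   time ./process_conc0_82 -concurrent=0 ./82images
#   pct_change <baseline_seconds> <pgo_seconds>
```

Running each executable several times and comparing typical timings, rather than single runs, helps separate a consistent difference from run-to-run noise.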

Labels

  • NeedsInvestigation: Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
  • compiler/runtime: Issues related to the Go compiler and/or runtime.
