New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: reordering struct field accesses can alter performance #22479

Open
mvdan opened this Issue Oct 28, 2017 · 1 comment

Comments

Projects
None yet
4 participants
@mvdan
Member

mvdan commented Oct 28, 2017

What version of Go are you using (go version)?

go version devel +47c868dc1c Sat Oct 28 11:53:49 2017 +0000 linux/amd64

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/mvdan/go/land:/home/mvdan/go"
GORACE=""
GOROOT="/home/mvdan/tip"
GOTOOLDIR="/home/mvdan/tip/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build268906853=/tmp/go-build -gno-record-gcc-switches"

What did you do?

https://play.golang.org/p/d5g7tuaxHW

go test -bench=.

What did you expect to see?

Both of them performing equally.

What did you see instead?

With 6 runs of each on an idle machine:

name          time/op
Separate-4    2.48ns ± 1%
Contiguous-4  2.15ns ± 2%

This is a minifiedin version of a performance issue I had in a very hot function. In particular, the function is the ASCII fast path of a rune advance method.

Here's the assembly for the two funcs:

"".(*T).separate STEXT nosplit size=26 args=0x10 locals=0x0
        0x0000 00000 (f_test.go:9)      TEXT    "".(*T).separate(SB), NOSPLIT, $0-16
        0x0000 00000 (f_test.go:9)      MOVQ    "".t+8(SP), AX
        0x0005 00005 (f_test.go:10)     INCQ    8(AX)
        0x0009 00009 (f_test.go:11)     MOVQ    $0, (AX)
        0x0010 00016 (f_test.go:12)     MOVQ    8(AX), AX
        0x0014 00020 (f_test.go:12)     MOVQ    AX, "".~r0+16(SP)
        0x0019 00025 (f_test.go:12)     RET
"".(*T).contiguous STEXT nosplit size=29 args=0x10 locals=0x0
        0x0000 00000 (f_test.go:15)     TEXT    "".(*T).contiguous(SB), NOSPLIT, $0-16
        0x0000 00000 (f_test.go:15)     MOVQ    "".t+8(SP), AX
        0x0005 00005 (f_test.go:16)     MOVQ    $0, (AX)
        0x000c 00012 (f_test.go:17)     MOVQ    8(AX), CX
        0x0010 00016 (f_test.go:17)     INCQ    CX
        0x0013 00019 (f_test.go:17)     MOVQ    CX, 8(AX)
        0x0017 00023 (f_test.go:18)     MOVQ    CX, "".~r0+16(SP)
        0x001c 00028 (f_test.go:18)     RET

Funnily enough, the faster one results in an extra instruction. How that makes sense is beyond me. Perhaps it's because the first one accesses 8(AX) three times, and the second only twice?

My understanding of the compiler and assembly are limited, so any pointers welcome.

/cc @randall77 @philhofer

@randall77

This comment has been minimized.

Contributor

randall77 commented Oct 28, 2017

Pasting code here:

func (t *T) separate() int {
	t.b++
	t.a = 0
	return t.b
}

func (t *T) contiguous() int {
	t.a = 0
	t.b++
	return t.b
}

separate needs an additional load, because the compiler currently thinks that the write to t.a invalidates the t.b value that it already loaded. So it needs to reload the t.b value instead of using the registerized value.

The extra instruction doesn't really matter - it's the extra load that matters.

This is a tricky problem to solve in general, because it requires some alias analysis. Perhaps something could be done for this simpler situation (different constant offsets from the same pointer).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment