cmd/compile: Get rid of redundant TEST instructions by using the ZF flag of DEC/INC #49113

jake-ciolek · 2021-10-22T11:56:04Z

What version of Go are you using (`go version`)?

go tip

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (`go env`)?

go env Output

GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOINSECURE=""
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOVCS=""
GOVERSION="devel go1.18-9ff91b9098 Fri Oct 22 00:57:18 2021 +0000"
GCCGO="gccgo"
GOAMD64="v1"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/dev/null"
GOWORK=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -arch x86_64 -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/7d/d_44vx7920vfy_tmc_l6xgprjh03ym/T/go-build2772586434=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

Dumped the Go assembly itself, but this happens for anything compiled with gc.

What did you expect to see?

The compiler making use of the ZF flag value when emitting DEC(Q|L) / INC(Q|L) instructions.

What did you see instead?

Redundant TEST(Q|L) instructions for checking if the result is zero, for example:

 _math/big.nat.shl:
 
decq	%rdx
testq	%rdx, %rdx
 
 _cmd/go/internal/work.(*Builder).Do.func3:
 
decl	%esi
testl	%esi, %esi
 
 _go/token.(*FileSet).file:
 
incl	%edx
testl	%edx, %edx
 
 _cmd/go/internal/work.gcToolchain.gc:
 
incq	%r9
testq	%r9, %r9

We could get rid of testq and use the value of ZF set by INCQ/DECQ instead.

I have looked into this briefly and it seems that it might be better to move the INC/DEC implementation to AMD64.rules instead? Currently it's handled in a special way inside of cmd/cmpile/internal/amd64/ssa.go

The text was updated successfully, but these errors were encountered:

martisch · 2021-10-22T12:56:49Z

The non optimization of using flags from arithmetic instructions directly is a general gap (not just INC/DEC).

I think I got a bit around it by defining new instruction for MUL with flag output then optimizing the SETcc CMP away in https://go-review.googlesource.com/c/go/+/141820. If we going to do this more generally we should probably look for a generic way how to easily use the flag output of any instruction in rules files for branching instructions that can consume those flags.

jake-ciolek · 2021-10-22T13:21:57Z

I think it might go even beyond arithmetic. I've looked at doing this for AND(Q|L)const/XOR(Q|L)const , but ran into issues related to handling blocks...

I've changed the Op definition so it emitted two values (Uint and Flags) but was unable to make it work for some reason. It was somehow failing when handling OpAMD64SETEQ / OpAMD64SETNE and others.

I think we could get a lot of speedup on x86 by doing this.

josharian · 2021-10-23T06:01:26Z

I'd love to see a general purpose approach to this. Maybe a pass that occurs after flagalloc, to avoid the issues around spilling flags.

josharian · 2021-10-23T06:02:39Z

I've changed the Op definition so it emitted two values (Uint and Flags) but was unable to make it work for some reason.

Feel free to post a CL showing your changes and the errors if you'd like some help getting unstuck. (Although as mentioned I'd prefer something general purpose.)

ulikunitz · 2021-10-23T07:33:37Z

Are there micro-benchmarks proving this is a speed increase? Chips are doing macro-op and micro-op fusions nowadays. The test and the following jump are very clearly in that territory. You might end-up with smaller binaries, which will help caching and decoding, but no real speed increase on the µOp level.

jake-ciolek · 2021-10-23T08:36:26Z

@ulikunitz

I believe reducing code size will improve performance, for example this change where we replace longer forms of AND with a shorter one clearly boosted performance:

https://go-review.googlesource.com/c/go/+/354970

Additionally, even though TEST/JUMP fuses, DEC/JUMP fuses too. Putting a test in between makes A DEC/TEST/JUMP execute one extra instruction.

The same goes for AND, ADD and SUB.

jake-ciolek · 2021-10-23T08:38:28Z

There's another thing, for which I think I'll start a new issue.

Sometimes we generate code that prevents macro-op fusion from happening because there's an instruction between TEST and a JUMP or a CMP and a JUMP. That instruction could be put before TEST/CMP and the execution of the program wouldn't change, but the speed would increase.

martisch · 2021-10-23T12:30:24Z

@jake-ciolek RE: removing instructions in between. I think often this can increase performance but I think we should still prove that with microbenchmarks or prove its not worse but better for binary size. There AFAIK can be cases were it doesnt improve performance and not all amd64 do op fusion or deal with partial register flags updates (INC/DEC) equally well without delays. (e.g. if I remember correctly Sandy Bridge has a flag merging uOP delay after some INC/DEC anyway.)

mvdan added the Performance label Oct 22, 2021

dmitshur added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Oct 25, 2021

dmitshur added this to the Backlog milestone Oct 25, 2021

gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jul 13, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

cmd/compile: Get rid of redundant TEST instructions by using the ZF flag of DEC/INC #49113

cmd/compile: Get rid of redundant TEST instructions by using the ZF flag of DEC/INC #49113

jake-ciolek commented Oct 22, 2021 •

edited

martisch commented Oct 22, 2021 •

edited

jake-ciolek commented Oct 22, 2021 •

edited

josharian commented Oct 23, 2021

josharian commented Oct 23, 2021

ulikunitz commented Oct 23, 2021

jake-ciolek commented Oct 23, 2021

jake-ciolek commented Oct 23, 2021 •

edited

martisch commented Oct 23, 2021 •

edited

cmd/compile: Get rid of redundant TEST instructions by using the ZF flag of DEC/INC #49113

cmd/compile: Get rid of redundant TEST instructions by using the ZF flag of DEC/INC #49113

Comments

jake-ciolek commented Oct 22, 2021 • edited

What version of Go are you using (go version)?

Does this issue reproduce with the latest release?

What operating system and processor architecture are you using (go env)?

What did you do?

What did you expect to see?

What did you see instead?

martisch commented Oct 22, 2021 • edited

jake-ciolek commented Oct 22, 2021 • edited

josharian commented Oct 23, 2021

josharian commented Oct 23, 2021

ulikunitz commented Oct 23, 2021

jake-ciolek commented Oct 23, 2021

jake-ciolek commented Oct 23, 2021 • edited

martisch commented Oct 23, 2021 • edited

jake-ciolek commented Oct 22, 2021 •

edited

What version of Go are you using (`go version`)?

What operating system and processor architecture are you using (`go env`)?

martisch commented Oct 22, 2021 •

edited

jake-ciolek commented Oct 22, 2021 •

edited

jake-ciolek commented Oct 23, 2021 •

edited

martisch commented Oct 23, 2021 •

edited