Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: Get rid of redundant TEST instructions by using the ZF flag of DEC/INC #49113

Open
jake-ciolek opened this issue Oct 22, 2021 · 8 comments

Comments

@jake-ciolek
Copy link
Contributor

@jake-ciolek jake-ciolek commented Oct 22, 2021

What version of Go are you using (go version)?

go tip

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOINSECURE=""
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOVCS=""
GOVERSION="devel go1.18-9ff91b9098 Fri Oct 22 00:57:18 2021 +0000"
GCCGO="gccgo"
GOAMD64="v1"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/dev/null"
GOWORK=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -arch x86_64 -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/7d/d_44vx7920vfy_tmc_l6xgprjh03ym/T/go-build2772586434=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

Dumped the Go assembly itself, but this happens for anything compiled with gc.

What did you expect to see?

The compiler making use of the ZF flag value when emitting DEC(Q|L) / INC(Q|L) instructions.

What did you see instead?

Redundant TEST(Q|L) instructions for checking if the result is zero, for example:

 _math/big.nat.shl:
 
decq	%rdx
testq	%rdx, %rdx
 
 _cmd/go/internal/work.(*Builder).Do.func3:
 
decl	%esi
testl	%esi, %esi
 
 _go/token.(*FileSet).file:
 
incl	%edx
testl	%edx, %edx
 
 _cmd/go/internal/work.gcToolchain.gc:
 
incq	%r9
testq	%r9, %r9

We could get rid of testq and use the value of ZF set by INCQ/DECQ instead.

I have looked into this briefly and it seems that it might be better to move the INC/DEC implementation to AMD64.rules instead? Currently it's handled in a special way inside of cmd/cmpile/internal/amd64/ssa.go

@martisch
Copy link
Contributor

@martisch martisch commented Oct 22, 2021

The non optimization of using flags from arithmetic instructions directly is a general gap (not just INC/DEC).

I think I got a bit around it by defining new instruction for MUL with flag output then optimizing the SETcc CMP away in https://go-review.googlesource.com/c/go/+/141820. If we going to do this more generally we should probably look for a generic way how to easily use the flag output of any instruction in rules files for branching instructions that can consume those flags.

Loading

@jake-ciolek
Copy link
Contributor Author

@jake-ciolek jake-ciolek commented Oct 22, 2021

I think it might go even beyond arithmetic. I've looked at doing this for AND(Q|L)const/XOR(Q|L)const , but ran into issues related to handling blocks...

I've changed the Op definition so it emitted two values (Uint and Flags) but was unable to make it work for some reason. It was somehow failing when handling OpAMD64SETEQ / OpAMD64SETNE and others.

I think we could get a lot of speedup on x86 by doing this.

Loading

@josharian
Copy link
Contributor

@josharian josharian commented Oct 23, 2021

I'd love to see a general purpose approach to this. Maybe a pass that occurs after flagalloc, to avoid the issues around spilling flags.

Loading

@josharian
Copy link
Contributor

@josharian josharian commented Oct 23, 2021

I've changed the Op definition so it emitted two values (Uint and Flags) but was unable to make it work for some reason.

Feel free to post a CL showing your changes and the errors if you'd like some help getting unstuck. (Although as mentioned I'd prefer something general purpose.)

Loading

@ulikunitz
Copy link
Contributor

@ulikunitz ulikunitz commented Oct 23, 2021

Are there micro-benchmarks proving this is a speed increase? Chips are doing macro-op and micro-op fusions nowadays. The test and the following jump are very clearly in that territory. You might end-up with smaller binaries, which will help caching and decoding, but no real speed increase on the µOp level.

Loading

@jake-ciolek
Copy link
Contributor Author

@jake-ciolek jake-ciolek commented Oct 23, 2021

@ulikunitz

I believe reducing code size will improve performance, for example this change where we replace longer forms of AND with a shorter one clearly boosted performance:

https://go-review.googlesource.com/c/go/+/354970

Additionally, even though TEST/JUMP fuses, DEC/JUMP fuses too. Putting a test in between makes A DEC/TEST/JUMP execute one extra instruction.

The same goes for AND, ADD and SUB.

Loading

@jake-ciolek
Copy link
Contributor Author

@jake-ciolek jake-ciolek commented Oct 23, 2021

There's another thing, for which I think I'll start a new issue.

Sometimes we generate code that prevents macro-op fusion from happening because there's an instruction between TEST and a JUMP or a CMP and a JUMP. That instruction could be put before TEST/CMP and the execution of the program wouldn't change, but the speed would increase.

Loading

@martisch
Copy link
Contributor

@martisch martisch commented Oct 23, 2021

@jake-ciolek RE: removing instructions in between. I think often this can increase performance but I think we should still prove that with microbenchmarks or prove its not worse but better for binary size. There AFAIK can be cases were it doesnt improve performance and not all amd64 do op fusion or deal with partial register flags updates (INC/DEC) equally well without delays. (e.g. if I remember correctly Sandy Bridge has a flag merging uOP delay after some INC/DEC anyway.)

Loading

@dmitshur dmitshur added this to the Backlog milestone Oct 25, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
6 participants