Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: suboptimal code generated for simple integer comparison if-block #33012

Open
tstanev opened this issue Jul 9, 2019 · 1 comment
Open

Comments

@tstanev
Copy link

@tstanev tstanev commented Jul 9, 2019

What version of Go are you using (go version)?

$ go version
go1.12.7 darwin/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/stanevt/Library/Caches/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/Users/stanevt/go"
GOPROXY=""
GORACE=""
GOROOT="/usr/local/go"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/darwin_amd64"
GCCGO="gccgo"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/zx/b2z35dfn0272q_n6g410wv340000gn/T/go-build707218538=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

The program linked below generates an array of random uint8s and then repeatedly runs a function that finds the second highest value in the array of random uint8s. The logic inside the inner loop is simple, it keeps track of the highest value and the second highest value using an if-else condition.

The problem is that the code generated by the compiler using the naive implementation (shown in max2_slow() in the example) runs slow -- 810 ms for the test. Adding a continue statement in the final else block generates better code that runs the test in 540 milliseconds.

For comparison, the same logic as the max2_slow, written in JavaScript runs the same test in 690 ms and it doesn't make a difference if the logic is converted to if-continue.

Just for reference, the same logic in C runs much faster (490ms without loop unrolling, 350 with unrolling), but the C compiler seems to make use of conditional move instructions, which neither Golang nor V8 seem able to generate.

The program:

https://play.golang.org/p/uV2tVrLifCq

(Note that timings will not show in the web based play page, because it does not have a functioning clock)

What did you expect to see?

I would expect the code generated for the plain if-else condition to run as fast as the code generated by manually adding "continue" statement.

What did you see instead?

The code generated by the if-else condition runs 50% slower compared to the code with the artificially added continue statement.

@bcmills bcmills added the Performance label Jul 15, 2019
@dmitshur dmitshur added this to the Go1.14 milestone Jul 15, 2019
@dmitshur dmitshur changed the title Suboptimal code generated for simple integer comparison if-block cmd/compile: suboptimal code generated for simple integer comparison if-block Jul 15, 2019
@randall77

This comment has been minimized.

Copy link
Contributor

@randall77 randall77 commented Oct 7, 2019

This one is a bit tricky. Sort of the luck of the draw, the val > max1 path is processed first during register allocation. Because of the way regalloc works, the reg-reg moves from that branch end up at the end of the loop. The other paths through the loop then have to issue countervailing moves to pre-undo what those moves do. By using "continue", you avoid that merge point and the need to do/undo the moves on every iteration.

At least, I think that is what is going on.

I don't see any easy fix for this. If we knew that both ifs were unlikely, the layout pass would fix it for us. FDO would help with that. Without FDO, I don't see any easy way to know. The register allocator is correctly optimizing for the case that the first if is likely taken.

Maybe there's a way to detect that we would need undo register moves, and lay things out a bit differently as a result. Not sure.

In any case, it is unlikely that this will be fixed for 1.14. Moving to unplanned.

@randall77 randall77 modified the milestones: Go1.14, Unplanned Oct 7, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.