Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: optimize xor to complement #39710

Open
jaddr2line opened this issue Jun 19, 2020 · 4 comments
Open

cmd/compile: optimize xor to complement #39710

jaddr2line opened this issue Jun 19, 2020 · 4 comments

Comments

@jaddr2line
Copy link

@jaddr2line jaddr2line commented Jun 19, 2020

What version of Go are you using (go version)?

$ go version
go version go1.14.4 linux/amd64

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/jadenw/.cache/go-build"
GOENV="/home/jadenw/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/jadenw/GOPATH"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/lib/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/lib/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build102234605=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Tried to flip bits in a large section of a bitmap using ^= ^uint64(0) in a loop.

What did you expect to see?

The compiler treats this as a bitwise complement operation.

What did you see instead?

The compiler emits an xor against a constant, and treats it differently than a bitwise complement.

https://godbolt.org/z/vxQyD3

If I understand correctly, this could be solved by adding the following rule to generic.rules:

(Xor(64|32|16|8) (Const(64|32|16|8) [-1]) x) => (Com(64|32|16|8) x)
@randall77
Copy link
Contributor

@randall77 randall77 commented Jun 19, 2020

Sure, that would work.
For some architectures it won't matter, as XOR -1, R and NOT R are, as far as the chip is concerned, the same instruction. It helps a bit on x86 for code size.

@randall77 randall77 added this to the Unplanned milestone Jun 19, 2020
@shawn-xdji
Copy link
Contributor

@shawn-xdji shawn-xdji commented Jun 29, 2020

Would like to suggest evaluating expressions that have more than two operands as well, like:
res := -1 ^ x ^ const_c
as per our recent investigation on arm64 for rewriting 'MVN XOR' to EON, (https://go-review.googlesource.com/c/go/+/239638), reducing '-1 ^ x' too early may result in non-optimized code for aforementioned scenarios (NOT + XOR vs. XOR), since current rules in generic.rules prefer to associating constants first.

@jaddr2line
Copy link
Author

@jaddr2line jaddr2line commented Jun 29, 2020

Huh. If we wanted to make them interchangeable for optimization purposes, we would probably need some rules like these:

(Xor(64|32|16|8) (Const(64|32|16|8) [c]) (Com(64|32|16|8) x)) => (Xor(64|32|16|8) (Const(64|32|16|8) [^c]) x)
(Com(64|32|16|8) (Xor(64|32|16|8) (Const(64|32|16|8) [c]) x)) => (Xor(64|32|16|8) (Const(64|32|16|8) [^c]) x)

At which point I am left wondering whether the existence of Com at this phase of compilation is actually worthwhile. Maybe this should be instead always represented as an Xor op and converted to a complement by the arch rules on platforms with a separate complement instruction?

@randall77
Copy link
Contributor

@randall77 randall77 commented Jun 29, 2020

res := -1 ^ x ^ const_c

We try to lift constants to the outermost level, and then fold them. For instance, we have these rules for xor:

// x ^ (C ^ z) -> C ^ (x ^ z)
(Xor64 (Xor64 i:(Const64 <t>) z) x) && (z.Op != OpConst64 && x.Op != OpConst64) => (Xor64 i (Xor64 <t> z x))
// C ^ (D ^ x) -> (C ^ D) ^ x
(Xor64 (Const64 <t> [c]) (Xor64 (Const64 <t> [d]) x)) => (Xor64 (Const64 <t> [c^d]) x)

Which should do the right thing for this example, at least in generic.rules.

When we lower, we do need rules to combine (for amd64) NOT and XORconst. They shouldn't be hard to do.

Yes, we could eliminate Com at the generic level. I'm not sure that would buy us much, as NOTs can be introduced in other ways at the arch-specific level, and thus the arch-specific rules need to know how to combine XOR and NOT anyway.
But I'm not against removing Com if it simplifies things.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
4 participants
You can’t perform that action at this time.