cmd/compile: optimize unsigned comparisons with zero/one #21439

Open
josharian opened this Issue Aug 14, 2017 · 5 comments

@josharian
Contributor

josharian commented Aug 14, 2017

I started on a minor optimization effort earlier this summer that I won't have time to see through for 1.10. This issue is a snapshot of the work and a description of what to do next, in case anyone else wants to pick an architecture to work on and run with it.

Comparing unsigned ints with zero/one is special and (perhaps surprisingly) not uncommon. One example: Given unsigned x, x > 0 iff x != 0. This is helpful, because on many architectures, x == 0 and x != 0 are faster and/or shorter. For example, on amd64, we can use TEST instead of CMP, and arm64 has dedicated instructions for comparison with zero.
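As a concrete (hypothetical) illustration of that equivalence, the two functions below must return the same result for every input, so the compiler may emit the cheaper zero test (e.g. TESTQ x, x instead of CMPQ x, $0 on amd64) for both:

package main

import "fmt"

// gtZero and neZero are hypothetical examples: for an unsigned x,
// x > 0 holds exactly when x != 0, so both comparisons can lower to
// the same zero test.
func gtZero(x uint64) bool { return x > 0 }
func neZero(x uint64) bool { return x != 0 }

func main() {
	for _, x := range []uint64{0, 1, 1 << 63} {
		fmt.Println(x, gtZero(x) == neZero(x)) // always prints true
	}
}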

Here are some generic rewrite rules that I believe to be sound:

(Greater64U x zero:(Const64 [0])) -> (Neq64 x zero)
(Greater32U x zero:(Const32 [0])) -> (Neq32 x zero)
(Greater16U x zero:(Const16 [0])) -> (Neq16 x zero)
(Greater8U  x zero:(Const8  [0])) -> (Neq8  x zero)

(Leq64U x zero:(Const64 [0])) -> (Eq64 x zero)
(Leq32U x zero:(Const32 [0])) -> (Eq32 x zero)
(Leq16U x zero:(Const16 [0])) -> (Eq16 x zero)
(Leq8U  x zero:(Const8  [0])) -> (Eq8  x zero)

(Geq64U _ (Const64 [0])) -> (ConstBool [1])
(Geq32U _ (Const32 [0])) -> (ConstBool [1])
(Geq16U _ (Const16 [0])) -> (ConstBool [1])
(Geq8U  _ (Const8  [0])) -> (ConstBool [1])

(Less64U _ (Const64 [0])) -> (ConstBool [0])
(Less32U _ (Const32 [0])) -> (ConstBool [0])
(Less16U _ (Const16 [0])) -> (ConstBool [0])
(Less8U  _ (Const8  [0])) -> (ConstBool [0])

(Less64U x (Const64 <t> [1])) -> (Eq64 x (Const64 <t> [0]))
(Less32U x (Const32 <t> [1])) -> (Eq32 x (Const32 <t> [0]))
(Less16U x (Const16 <t> [1])) -> (Eq16 x (Const16 <t> [0]))
(Less8U  x (Const8  <t> [1])) -> (Eq8  x (Const8  <t> [0]))

(Geq64U x (Const64 <t> [1])) -> (Neq64 x (Const64 <t> [0]))
(Geq32U x (Const32 <t> [1])) -> (Neq32 x (Const32 <t> [0]))
(Geq16U x (Const16 <t> [1])) -> (Neq16 x (Const16 <t> [0]))
(Geq8U  x (Const8  <t> [1])) -> (Neq8  x (Const8  <t> [0]))
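For illustration, here is a rough Go-level sketch (function names are hypothetical) of the source patterns the rules above target, one comparison per group, with the simpler form each one should rewrite to for unsigned operands:

package p

// Each comparison below corresponds to one group of rules above.
func gtZero(x uint32) bool  { return x > 0 }  // Greater32U with 0 -> x != 0
func leqZero(x uint32) bool { return x <= 0 } // Leq32U with 0     -> x == 0
func geqZero(x uint32) bool { return x >= 0 } // Geq32U with 0     -> always true
func ltZero(x uint32) bool  { return x < 0 }  // Less32U with 0    -> always false
func ltOne(x uint32) bool   { return x < 1 }  // Less32U with 1    -> x == 0
func geqOne(x uint32) bool  { return x >= 1 } // Geq32U with 1     -> x != 0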

Unfortunately, doing this at the generic level is probably not ideal, since (a) not all architectures have special handling of eq/neq with 0, and (b) it might interfere with the prove pass.

So the remaining work here is to port these rules to arch-specific rules as applicable, and confirm/disconfirm that they are used and that they are worthwhile.

(Related aside: Why doesn't amd64.rules have rules like (SETNE (CMPQ x zero:(MOVQconst))) -> (SETNE (TESTQ x x))?)

cc @martisch @philhofer

@josharian josharian added this to the Unplanned milestone Aug 14, 2017

@martisch

Member

martisch commented Aug 14, 2017

Interesting (and not uncommon). I had a similar thought the other day about checking and ensuring that the Go compiler (gc) emits TEST instead of CMP where that is simpler/shorter, but I wasn't sure how much of that was already done.

I'm happy to have a look at this for amd64 and 386 (also the SETcc part).

BTW (this needs a separate issue): we should clear the SETcc target register with an XOR to avoid false dependencies, but we need to be careful that the XOR neither dirties the flags after the TEST/CMP nor clears the register we are actually testing.

@philhofer

Contributor

philhofer commented Aug 14, 2017

I'm happy to take the arm/arm64 parts of this.

@stemar94

stemar94 commented Aug 15, 2017

These might also be related, but much less common:

(Less64 (Sub64 x y) (Const64 [0])) -> (Less64 x y)
(Less32 (Sub32 x y) (Const32 [0])) -> (Less32 x y)
(Less16 (Sub16 x y) (Const16 [0])) -> (Less16 x y)
(Less8 (Sub8 x y) (Const8 [0])) -> (Less8 x y)

(Leq64 (Sub64 x y) (Const64 [0])) -> (Leq64 x y)
(Leq32 (Sub32 x y) (Const32 [0])) -> (Leq32 x y)
(Leq16 (Sub16 x y) (Const16 [0])) -> (Leq16 x y)
(Leq8 (Sub8 x y) (Const8 [0])) -> (Leq8 x y)

(Neq64 (Sub64 x y) (Const64 [0])) -> (Neq64 x y)
(Neq32 (Sub32 x y) (Const32 [0])) -> (Neq32 x y)
(Neq16 (Sub16 x y) (Const16 [0])) -> (Neq16 x y)
(Neq8 (Sub8 x y) (Const8 [0])) -> (Neq8 x y)

(Eq64 (Sub64 x y) (Const64 [0])) -> (Eq64 x y)
(Eq32 (Sub32 x y) (Const32 [0])) -> (Eq32 x y)
(Eq16 (Sub16 x y) (Const16 [0])) -> (Eq16 x y)
(Eq8 (Sub8 x y) (Const8 [0])) -> (Eq8 x y)

(Geq64 (Sub64 x y) (Const64 [0])) -> (Geq64 x y)
(Geq32 (Sub32 x y) (Const32 [0])) -> (Geq32 x y)
(Geq16 (Sub16 x y) (Const16 [0])) -> (Geq16 x y)
(Geq8 (Sub8 x y) (Const8 [0])) -> (Geq8 x y)

(Greater64 (Sub64 x y) (Const64 [0])) -> (Greater64 x y)
(Greater32 (Sub32 x y) (Const32 [0])) -> (Greater32 x y)
(Greater16 (Sub16 x y) (Const16 [0])) -> (Greater16 x y)
(Greater8 (Sub8 x y) (Const8 [0])) -> (Greater8 x y)

I would have a CL ready for these.

@martisch

Member

martisch commented Aug 15, 2017

@stemar94 Since overflow in Go is defined (it wraps around) and the compiler is not allowed to optimize it away, I think this does not always apply:

uint8: ((0 - 1) > 0) != (0 > 1)
int8: ((-128 - 1) > 0) != (-128 > 1)

https://play.golang.org/p/2R2nEnvZej
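A minimal runnable sketch of the counterexample above (the subtraction wraps, so the rewritten comparison gives a different answer):

package main

import "fmt"

// Because integer overflow wraps in Go, (x - y) > 0 is not equivalent to x > y.
func main() {
	var x, y uint8 = 0, 1
	fmt.Println((x-y) > 0, x > y) // true false: 0 - 1 wraps to 255

	var a, b int8 = -128, 1
	fmt.Println((a-b) > 0, a > b) // true false: -128 - 1 wraps to 127
}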

@benshi001

This comment has been minimized.

Member

benshi001 commented Oct 12, 2017

I did the optimization of comparisons with zero using TST/TEQ/CMN on ARM32 in 1ec78d1.
