Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: optimize comparisons to min/max (u)ints by checking for overflow #41664

Closed
josharian opened this issue Sep 27, 2020 · 2 comments
Closed
Labels
help wanted NeedsFix Performance
Milestone

Comments

@josharian
Copy link
Contributor

@josharian josharian commented Sep 27, 2020

package p

import "math"

func f(x int64) bool {
	return x == math.MinInt64
}

On amd64, the core of this compiles to:

	0x0005 00005 (z.go:6)	MOVQ	$-9223372036854775808, CX
	0x000f 00015 (z.go:6)	CMPQ	AX, CX
	0x0012 00018 (z.go:6)	SETEQ	"".~r1+16(SP)

It would be cheaper and smaller instead to decrement CX and check the flags for underflow. A similar trick can be used for checking min and max ints and uints of all sizes. It might also be useful in the division fix-up code, where we must check for min int divisor.

cc @randall77 @dr2chase @martisch @mundaym

@josharian josharian added this to the Unplanned milestone Sep 27, 2020
@andybons andybons added the NeedsFix label Sep 29, 2020
@quasilyte
Copy link
Contributor

@quasilyte quasilyte commented Nov 29, 2021

I tried to implement this. The idea looks interesting, at least on the surface.

Unfortunately, I get 5-8% slowdown in comparison with the original ("unoptimized") code. It does generate smaller code, but it runs slower.

The benchmarking code:

package foo

import (
	"testing"
	"math"
)

//go:noinline
func checkInt64(i int64, u uint64) int {
	if i == math.MaxInt64 {
		if u != math.MaxUint64 {
			return 10
		}
		return 1
	}
	if i == math.MinInt64 {
		return 2
	}
	return 0
}

func BenchmarkCheckInt(b *testing.B) {
	for i := 0; i < b.N; i++ {
		checkInt64(0, 0)
		checkInt64(math.MaxInt64, 0)
		checkInt64(math.MinInt64, 0)
		checkInt64(math.MaxInt64, math.MaxUint64)
		checkInt64(math.MinInt64, math.MaxUint64)
	}
}

Optimization rules:

// Convert `x == int64max` to increment+overflow check,
// for int64min test do a decrement+overflow check.
// For unsigned values, check for the carry flag instead.
//
// TODO: we could emit INC/DEC instead of ADDQ, but that would require a separate
// flag-aware SSA op: INC/DEC do not set carry flag, so we can't produce them for ADDQconstcarry.
// This rewrite needs overflow flags that are modified by both ADD and INC instructions.
((EQ|NE) (CMPQ <flags> x intmin:(MOVQconst [c])) yes no) && c == math.MinInt64 && intmin.Uses == 1 =>
  ((OS|OC) (Select1 <flags> (ADDQconstcarry [-1] x)) yes no)
((SETEQ|SETNE) (CMPQ <flags> x intmin:(MOVQconst [c]))) && c == math.MinInt64 && intmin.Uses == 1 =>
  ((SETO|SETNO) (Select1 <flags> (ADDQconstcarry [-1] x)))
((EQ|NE) (CMPQ <flags> x intmax:(MOVQconst [c])) yes no) && c == math.MaxInt64 && intmax.Uses == 1 =>
  ((OS|OC) (Select1 <flags> (ADDQconstcarry [1] x)) yes no)
((SETEQ|SETNE) (CMPQ <flags> x intmax:(MOVQconst [c]))) && c == math.MaxInt64 && intmax.Uses == 1 =>
  ((SETO|SETNO) (Select1 <flags> (ADDQconstcarry [1] x)))
// For the reference:
// SETB is "overflow flag is set" (SETCS in asm)
// SETAE is "overflow flag is not set" (SETCC in asm); also known as SETNB
// UGE is AJCS in asm
// ULT is AJCC in asm
((EQ|NE) intmax:(CMPQconst <flags> x [c]) yes no) && uint64(c) == math.MaxUint64 && intmax.Uses == 1 =>
  ((ULT|UGE) (Select1 <flags> (ADDQconstcarry [1] x)) yes no)
((SETEQ|SETNE) (CMPQ <flags> x intmin:(MOVQconst [c]))) && uint64(c) == math.MaxUint64 && intmin.Uses == 1 =>
  ((SETB|SETAE) (Select1 <flags> (ADDQconstcarry [1] x)))

I can submit a CL with tests if anyone is interested in seeing the whole thing.

Note: I haven't tried changing ADDQ 1/-1 to INC/DEC. I'm not sure it can help there.

Note: gcc/clang for C and C++ seem not to use this optimization idea either. Maybe there is a good reason not to do this.

I see two ways right now:

  1. Figuring out what's wrong with the results (using different bench or reading the CPU counters for more info) or the implementation.
  2. Marking this idea as wontfix with this issue being closed.

This is better than someone else spending their evening (or weekend) pursuing something that will result in disappointment. :)

@josharian
Copy link
Contributor Author

@josharian josharian commented Nov 29, 2021

Thanks so much for digging into this! Sounds like we shouldn't do this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted NeedsFix Performance
Projects
None yet
Development

No branches or pull requests

3 participants