Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: optimize bitset tests #31904

Closed
rillig opened this issue May 8, 2019 · 7 comments

Comments

@rillig
Copy link
Contributor

commented May 8, 2019

What version of Go are you using (go version)?

go version go1.12.1 windows/amd64

Does this issue reproduce with the latest release?

yes

What operating system and processor architecture are you using (go env)?

set GOARCH=amd64
set GOBIN=
set GOCACHE=C:\Users\rillig\AppData\Local\go-build
set GOEXE=.exe
set GOFLAGS=
set GOHOSTARCH=amd64
set GOHOSTOS=windows
set GOOS=windows
set GOPATH=C:\Users\rillig\go
set GOPROXY=
set GORACE=
set GOROOT=C:\Program Files\Go
set GOTMPDIR=
set GOTOOLDIR=C:\Program Files\Go\pkg\tool\windows_amd64
set GCCGO=gccgo
set CC=gcc
set CXX=g++
set CGO_ENABLED=1
set GOMOD=
set CGO_CFLAGS=-g -O2
set CGO_CPPFLAGS=
set CGO_CXXFLAGS=-g -O2
set CGO_FFLAGS=-g -O2
set CGO_LDFLAGS=-g -O2
set PKG_CONFIG=pkg-config
set GOGCCFLAGS=-m64 -mthreads -fmessage-length=0 -fdebug-prefix-map=C:\Users\rillig\Program Files\cygwin\tmp\go-build152819318=/tmp/go-build -gno-record-gcc-switches

What did you do?

https://play.golang.org/p/G1EhaxsYsAV

What did you expect to see?

When a bitset is queried for a single bit, it doesn't matter whether set & q is compared to 0 or to the expected bit. Comparing to 0 saves an instruction on x86_64 and doesn't modify a register.

Therefore I expected the generated code to use TESTQ $0x8, AX instead of the combined ANDQ and CMPQ.

I didn't measure how often this pattern appears in the wild, but when I saw the generated code I immediately wondered why it uses two instructions when the same effect can be achieved with a single instruction and even fewer side effects.

What did you see instead?

  bitset.go:13          MOVQ os.Args+8(SB), AX
  bitset.go:14          NOPL
  bitset.go:14          XORPS X0, X0
  bitset.go:14          MOVUPS X0, 0x40(SP)
  bitset.go:14          LEAQ runtime.types+65664(SB), CX
  bitset.go:14          MOVQ CX, 0x40(SP)
  bitset.go:9           ANDQ $0x8, AX              <-- seems inefficient
  bitset.go:9           CMPQ $0x8, AX              <-- seems inefficient
  bitset.go:9           SETE AL
  bitset.go:14          MOVZX AL, AX
@randall77

This comment has been minimized.

Copy link
Contributor

commented May 8, 2019

Indeed. We currently handle x & 8 == 0 but not x & 8 == 8.

@randall77 randall77 added this to the Go1.14 milestone May 8, 2019

@gopherbot

This comment has been minimized.

Copy link

commented May 8, 2019

Change https://golang.org/cl/175957 mentions this issue: cmd/compile: optimize bitset tests

@tmthrgd

This comment has been minimized.

Copy link
Contributor

commented May 8, 2019

This optimisation is only valid for x & y == y if y has exactly one bit set (i.e. is a power of two).

See this example: https://play.golang.org/p/fAqS02z3Fgm

@cuonglm

This comment has been minimized.

Copy link
Contributor

commented May 8, 2019

@tmthrgd it's right, finding a way to fix it.

@randall77 is there anyway to check a *ssa.Value is power of two? I see there's a function isPowerOfTwo, but it gets an int64 instead.

@randall77

This comment has been minimized.

Copy link
Contributor

commented May 8, 2019

isPowerOfTwo is probably the right one to use.
You're going to need to make the rules require that one argument is a constant.

It would be nice if you could do this optimization in the generic rules, something like:

(Eq64 (And64 x (Const64 [y])) (Const64 [y])) && isPowerOfTwo(y) -> (Ne64 (And64 x (Const64 [y])) (Const64 [0]))

(And 32/16/8 versions, Ne -> Eq also)
That way all the architectures get the optimization.

@cuonglm

This comment has been minimized.

Copy link
Contributor

commented May 8, 2019

@randall77 interesting, with these rules added in generic.rules:

// Optimize bitsets
(Eq(8|16|32|64) (And(8|16|32|64) <t> x (Const(8|16|32|64) <t> [y])) (Const(8|16|32|64) <t> [y])) && isPowerOfTwo(y)
  -> (Neq(8|16|32|64) (And(8|16|32|64) <t> x (Const(8|16|32|64) <t> [y])) (Const(8|16|32|64) <t> [0]))
(Neq(8|16|32|64) (And(8|16|32|64) <t> x (Const(8|16|32|64) <t> [y])) (Const(8|16|32|64) <t> [y])) && isPowerOfTwo(y)
  -> (Eq(8|16|32|64) (And(8|16|32|64) <t> x (Const(8|16|32|64) <t> [y])) (Const(8|16|32|64) <t> [0]))

For return set&8 == 8, the assembly code generate using BTL instead of TESTQ:

"".AllBitsSet STEXT nosplit size=15 args=0x18 locals=0x0
	0x0000 00000 (main.go:8)	TEXT	"".AllBitsSet(SB), NOSPLIT|ABIInternal, $0-24
	0x0000 00000 (main.go:8)	FUNCDATA	$0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x0000 00000 (main.go:8)	FUNCDATA	$1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x0000 00000 (main.go:8)	FUNCDATA	$2, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x0000 00000 (main.go:12)	PCDATA	$0, $0
	0x0000 00000 (main.go:12)	PCDATA	$1, $0
	0x0000 00000 (main.go:12)	MOVQ	"".set+8(SP), AX
	0x0005 00005 (main.go:12)	BTL	$3, AX
	0x0009 00009 (main.go:12)	SETCS	"".~r2+24(SP)
	0x000e 00014 (main.go:12)	RET

I benched with no speed improvement compare to ANDQ + COMPQ.

@randall77

This comment has been minimized.

Copy link
Contributor

commented May 8, 2019

Yes, it should use BTL. It's smaller, but probably not any faster than TESTQ.
It might be hard to find a benchmark which cares. I'm happy for this to go in as long as the benchmarks don't get worse.

@andybons andybons added the NeedsFix label May 8, 2019

@gopherbot gopherbot closed this in c5f142f Aug 27, 2019

tomocy added a commit to tomocy/go that referenced this issue Sep 1, 2019
cmd/compile: optimize bitset tests
The assembly output for x & c == c, where c is power of 2:

	MOVQ	"".set+8(SP), AX
	ANDQ	$8, AX
	CMPQ	AX, $8
	SETEQ	"".~r2+24(SP)

With optimization using bitset:

	MOVQ	"".set+8(SP), AX
	BTL	$3, AX
	SETCS	"".~r2+24(SP)

output less than 1 instruction.

However, there is no speed improvement:

name         old time/op  new time/op  delta
AllBitSet-8  0.35ns ± 0%  0.35ns ± 0%   ~     (all equal)

Fixes golang#31904

Change-Id: I5dca4e410bf45716ed2145e3473979ec997e35d4
Reviewed-on: https://go-review.googlesource.com/c/go/+/175957
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
t4n6a1ka added a commit to t4n6a1ka/go that referenced this issue Sep 5, 2019
cmd/compile: optimize bitset tests
The assembly output for x & c == c, where c is power of 2:

	MOVQ	"".set+8(SP), AX
	ANDQ	$8, AX
	CMPQ	AX, $8
	SETEQ	"".~r2+24(SP)

With optimization using bitset:

	MOVQ	"".set+8(SP), AX
	BTL	$3, AX
	SETCS	"".~r2+24(SP)

output less than 1 instruction.

However, there is no speed improvement:

name         old time/op  new time/op  delta
AllBitSet-8  0.35ns ± 0%  0.35ns ± 0%   ~     (all equal)

Fixes golang#31904

Change-Id: I5dca4e410bf45716ed2145e3473979ec997e35d4
Reviewed-on: https://go-review.googlesource.com/c/go/+/175957
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants
You can’t perform that action at this time.