Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: handle 64-bit masks better on 32-bit systems #32781

Open
rsc opened this issue Jun 26, 2019 · 2 comments

Comments

Projects
None yet
3 participants
@rsc
Copy link
Contributor

commented Jun 26, 2019

On a 32-bit system, an expression like uint64value & constantUint64Mask could easily be decomposed two 32-bit mask operations. That does not seem to happen (see below, especially the OR instructions). It seems like probably it should.

$ cat x.go
package p

func f(x uint64) uint64 {
	return x & (1<<60 - 1)
}
$ GOARCH=arm go build -gcflags=-S x.go
# command-line-arguments
"".f STEXT size=40 args=0x10 locals=0x0 leaf
	0x0000 00000 (/Users/rsc/x.go:3)	TEXT	"".f(SB), LEAF|NOFRAME|ABIInternal, $-4-16
	0x0000 00000 (/Users/rsc/x.go:3)	FUNCDATA	$0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x0000 00000 (/Users/rsc/x.go:3)	FUNCDATA	$1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x0000 00000 (/Users/rsc/x.go:3)	FUNCDATA	$2, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x0000 00000 (/Users/rsc/x.go:4)	PCDATA	$0, $0
	0x0000 00000 (/Users/rsc/x.go:4)	PCDATA	$1, $0
	0x0000 00000 (/Users/rsc/x.go:4)	MOVW	"".x(FP), R0
	0x0004 00004 (/Users/rsc/x.go:4)	SRL	$28, R0, R1
	0x0008 00008 (/Users/rsc/x.go:4)	MOVW	"".x+4(FP), R2
	0x000c 00012 (/Users/rsc/x.go:4)	ORR	R2<<4, R1, R1
	0x0010 00016 (/Users/rsc/x.go:4)	BFXU	$28, R0, $0, R0
	0x0014 00020 (/Users/rsc/x.go:4)	ORR	R1<<28, R0, R0
	0x0018 00024 (/Users/rsc/x.go:4)	MOVW	R0, "".~r1+8(FP)
	0x001c 00028 (/Users/rsc/x.go:4)	SRL	$4, R1, R0
	0x0020 00032 (/Users/rsc/x.go:4)	MOVW	R0, "".~r1+12(FP)
	0x0024 00036 (/Users/rsc/x.go:4)	JMP	(R14)
...
$ GOARCH=386 go build -gcflags=-S x.go
# command-line-arguments
"".f STEXT size=62 args=0x10 locals=0x0
	0x0000 00000 (/Users/rsc/x.go:3)	TEXT	"".f(SB), ABIInternal, $0-16
	0x0000 00000 (/Users/rsc/x.go:3)	MOVL	(TLS), CX
	0x0007 00007 (/Users/rsc/x.go:3)	CMPL	SP, 8(CX)
	0x000a 00010 (/Users/rsc/x.go:3)	JLS	55
	0x000c 00012 (/Users/rsc/x.go:3)	FUNCDATA	$0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x000c 00012 (/Users/rsc/x.go:3)	FUNCDATA	$1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x000c 00012 (/Users/rsc/x.go:3)	FUNCDATA	$2, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x000c 00012 (/Users/rsc/x.go:4)	PCDATA	$0, $0
	0x000c 00012 (/Users/rsc/x.go:4)	PCDATA	$1, $0
	0x000c 00012 (/Users/rsc/x.go:4)	MOVL	"".x+8(SP), AX
	0x0010 00016 (/Users/rsc/x.go:4)	SHLL	$4, AX
	0x0013 00019 (/Users/rsc/x.go:4)	MOVL	"".x+4(SP), CX
	0x0017 00023 (/Users/rsc/x.go:4)	MOVL	CX, DX
	0x0019 00025 (/Users/rsc/x.go:4)	SHRL	$28, CX
	0x001c 00028 (/Users/rsc/x.go:4)	ORL	AX, CX
	0x001e 00030 (/Users/rsc/x.go:4)	SHLL	$4, DX
	0x0021 00033 (/Users/rsc/x.go:4)	SHRL	$4, DX
	0x0024 00036 (/Users/rsc/x.go:4)	MOVL	CX, AX
	0x0026 00038 (/Users/rsc/x.go:4)	SHLL	$28, CX
	0x0029 00041 (/Users/rsc/x.go:4)	ORL	DX, CX
	0x002b 00043 (/Users/rsc/x.go:4)	MOVL	CX, "".~r1+12(SP)
	0x002f 00047 (/Users/rsc/x.go:4)	SHRL	$4, AX
	0x0032 00050 (/Users/rsc/x.go:4)	MOVL	AX, "".~r1+16(SP)
	0x0036 00054 (/Users/rsc/x.go:4)	RET
	0x0037 00055 (/Users/rsc/x.go:4)	NOP
	0x0037 00055 (/Users/rsc/x.go:3)	PCDATA	$1, $-1
	0x0037 00055 (/Users/rsc/x.go:3)	PCDATA	$0, $-1
	0x0037 00055 (/Users/rsc/x.go:3)	CALL	runtime.morestack_noctxt(SB)
	0x003c 00060 (/Users/rsc/x.go:3)	JMP	0

@rsc rsc added this to the Go1.14 milestone Jun 26, 2019

@rsc rsc added the NeedsFix label Jun 26, 2019

@cherrymui

This comment has been minimized.

Copy link
Contributor

commented Jun 26, 2019

Emmm. I'm pretty sure it decomposes 64-bit mask to two 32-bit masks when I wrote the code a few years ago. https://go.googlesource.com/go/+/6d1aaf143c879ae7acbd20a1e60bf74681bf6055/src/cmd/compile/internal/ssa/gen/dec64.rules#88

This particular case is interesting, as the generic opt pass (runs before decompose pass) changes the AND to a pair of shifts (x<<4>>4), and the decomposing shifts is more complex.

If it is a different mask, which cannot be rewritten to shifts, it works as expected.

func f(x uint64) uint64 {
	return x & 0x1234567890abcdef
}
	0x0000 00000 (x64.go:4)	MOVW	"".x(FP), R0
	0x0004 00004 (x64.go:4)	AND	$-1867788817, R0, R0
	0x000c 00012 (x64.go:4)	MOVW	R0, "".~r1+8(FP)
	0x0010 00016 (x64.go:4)	MOVW	"".x+4(FP), R0
	0x0014 00020 (x64.go:4)	AND	$305419896, R0, R0
	0x001c 00028 (x64.go:4)	MOVW	R0, "".~r1+12(FP)
	0x0020 00032 (x64.go:4)	JMP	(R14)

Maybe we should either not do the mask-to-shift rewriting for 64-bit ops on 32-bit architectures, or somehow recognizes this pattern and do better in decomposing.

@gopherbot

This comment has been minimized.

Copy link

commented Jun 26, 2019

Change https://golang.org/cl/183957 mentions this issue: cmd/compile: not rewrite 64-bit AND to shifts on 32-bit machines

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.