cmd/compile: optimize away some MOVQconverts #21572

josharian opened this Issue Aug 23, 2017 · 4 comments
josharian commented Aug 23, 2017

We have to keep uintptrs and unsafe.Pointers separate so that the compiler can generate accurate stack maps. However, in some cases this separation generates unnecessary register moves.

Here's the example from the runtime I'm looking at. mapaccess1_fast32 currently ends:

	for {
		for i, k := uintptr(0), b.keys(); i < bucketCnt; i, k = i+1, add(k, 4) {
			if *(*uint32)(k) == key && b.tophash[i] != empty {
				return add(unsafe.Pointer(b), dataOffset+bucketCnt*4+i*uintptr(t.valuesize))
			}
		}
		b = b.overflow(t)
		if b == nil {
			return unsafe.Pointer(&zeroVal[0])
		}
	}

This has an unnecessary nil check of b in the inner loop when evaluating b.tophash, so I'd like to change the outer loop structure to remove it:

	for ; b != nil; b = b.overflow(t) {
		for i, k := uintptr(0), b.keys(); i < bucketCnt; i, k = i+1, add(k, 4) {
			if *(*uint32)(k) == key && b.tophash[i] != empty {
				return add(unsafe.Pointer(b), dataOffset+bucketCnt*4+i*uintptr(t.valuesize))
			}
		}
	}
	return unsafe.Pointer(&zeroVal[0])

With this new structure, the nil check is gone, but we now have an extra register-register move, instruction 0x009f:

	0x0096 00150 (hashmap_fast.go:42)	MOVQ	"".t+40(SP), CX
	0x009b 00155 (hashmap_fast.go:42)	MOVWLZX	84(CX), DX
	0x009f 00159 (hashmap_fast.go:42)	MOVQ	AX, BX
	0x00a2 00162 (hashmap_fast.go:42)	LEAQ	-8(BX)(DX*1), DX
	0x00a7 00167 (hashmap_fast.go:42)	TESTB	AL, (CX)
	0x00a9 00169 (hashmap_fast.go:42)	MOVQ	(DX), AX
	0x00ac 00172 (hashmap_fast.go:42)	TESTQ	AX, AX
	0x00af 00175 (hashmap_fast.go:42)	JEQ	185

The register-register move is there because calculating b.overflow involves a uintptr/unsafe.Pointer conversion, which gets translated into a MOVQconvert; regalloc allocates a register for the converted value. However, the register move is pointless; the destination register (BX) is used in an LEAQ instruction and is dead thereafter.

In general, it seems that we should be able to rewrite away some MOVQconverts when the converted value is used exactly once, immediately, as part of pointer arithmetic, which is the typical usage. The hard part is making sure that such rewrite rules are safe.
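As a rough illustration only, in the style of cmd/compile/internal/ssa/gen/AMD64.rules; this rule is hypothetical, not a proposal, and is precisely the kind of rewrite whose safety would have to be established first:

	// Hypothetical: drop the convert when its only use is address arithmetic.
	// Unsafe as written -- it changes which value the stack map sees as live.
	(LEAQ1 [off] (MOVQconvert ptr mem) idx) -> (LEAQ1 [off] ptr idx)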

This should help codegen for the runtime, which does lots of pointer arithmetic.

cc @randall77

@josharian josharian added this to the Unplanned milestone Aug 23, 2017

randall77 commented Aug 23, 2017

Maybe we can just mark MOVQconvert as resultInArg0. The register allocator will insert a copy if really needed, but hopefully will get rid of it most of the time.
The SSA->Prog conversion for MOVQconvert would just be a no-op.
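For concreteness, a sketch of what that might look like in the AMD64 ops table (field names per cmd/compile/internal/ssa/gen/AMD64Ops.go; the exact entry here is from memory and may not match the tree):

	// Tying the result to arg0 lets regalloc reuse the input register,
	// so the SSA->Prog pass can emit nothing for the op.
	{name: "MOVQconvert", argLength: 2, reg: gp11, resultInArg0: true},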

josharian commented Aug 24, 2017

That works beautifully on 386+amd64, thanks!

On other architectures, it fails because the g reg is allowed as an input but not an output for MOVWconvert (and friends). I'm not sure how best to approach that. In the meantime, will mail a CL for x86 only.

gopherbot commented Aug 24, 2017

Change https://golang.org/cl/58371 mentions this issue: cmd/compile: mark MOVQconvert as resultInArg0 on x86 architectures

gopherbot commented Aug 24, 2017

Change https://golang.org/cl/58372 mentions this issue: runtime: refactor walking of bucket overflows

gopherbot pushed a commit that referenced this issue Aug 24, 2017

cmd/compile: mark MOVQconvert as resultInArg0 on x86 architectures
This prevents unnecessary reg-reg moves during pointer arithmetic.

This change reduces the size of the full hello world binary by 0.4%.

Updates #21572

Change-Id: Ia0427021e5c94545a0dbd83a6801815806e5b12d
Reviewed-on: https://go-review.googlesource.com/58371
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
Reviewed-by: Keith Randall <khr@golang.org>

gopherbot pushed a commit that referenced this issue Aug 24, 2017

runtime: refactor walking of bucket overflows
This eliminates a nil check of b while evaluating b.tophash,
which is in the inner loop of many hot map functions.
It also makes the code a bit clearer.

Also remove some gotos in favor of labeled breaks.

On non-x86 architectures, this change introduces a pointless reg-reg move,
although the cause is well-understood (#21572).

Change-Id: Ib7ee58b59ea5463b92e1590c8b8f5c0ef87d410a
Reviewed-on: https://go-review.googlesource.com/58372
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Martin Möhrmann <moehrmann@google.com>
Reviewed-by: Keith Randall <khr@golang.org>