cmd/compile: prefer to cheaply rematerialize than copy registers #24132

@josharian

package p

func f(x int) (uint32, uint32) {
	var a, b uint32
	for {
		a++
		if x == 0 {
			break
		}
		x--
		b += 2
	}
	return a, b
}

This compiles as:

"".f STEXT nosplit size=33 args=0x10 locals=0x0
	0x0000 00000 (x.go:3)	TEXT	"".f(SB), NOSPLIT, $0-16
	0x0000 00000 (x.go:3)	FUNCDATA	$0, gclocals·f207267fbf96a0178e8758c6e3e0ce28(SB)
	0x0000 00000 (x.go:3)	FUNCDATA	$1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x0000 00000 (x.go:3)	MOVQ	"".x+8(SP), AX
	0x0005 00005 (x.go:3)	XORL	CX, CX
	0x0007 00007 (x.go:3)	MOVL	CX, DX
	0x0009 00009 (x.go:5)	JMP	17
	0x000b 00011 (x.go:10)	DECQ	AX
	0x000e 00014 (x.go:11)	ADDL	$2, DX
	0x0011 00017 (x.go:6)	INCL	CX
	0x0013 00019 (x.go:7)	TESTQ	AX, AX
	0x0016 00022 (x.go:7)	JNE	11
	0x0018 00024 (x.go:13)	MOVL	CX, "".~r1+16(SP)
	0x001c 00028 (x.go:13)	MOVL	DX, "".~r2+20(SP)
	0x0020 00032 (x.go:13)	RET

This issue is about instruction 0x0007, MOVL CX, DX. I think we should prefer XORL DX, DX. The reasoning is that it is no larger (both encode in 2 bytes, going by the offsets in the listing) and it avoids a false dependency between registers. This is only preferable when rematerialization is cheaper than a register copy, which is a special but common case, such as zeroing.
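
To make the proposed preference concrete, here is a minimal toy sketch in Go. It is not the compiler's regalloc code: the `op`, `value`, `cheapRemat`, and `placeIn` names are hypothetical and exist only for illustration. It assumes that zeroing is the only case where rematerializing is cheaper than copying, and it ignores that XORL clobbers flags, which a real implementation would have to account for.

```go
package main

import "fmt"

// op is a toy opcode. These names are hypothetical, not the compiler's SSA ops.
type op int

const (
	opConstZero op = iota // a zero constant, which XORL r, r can recreate
	opAdd                 // an arithmetic result, which cannot be recreated for free
)

// value is a toy stand-in for an SSA value that already lives in register reg.
type value struct {
	op  op
	reg string
}

// cheapRemat reports whether re-emitting v is cheaper than a register copy.
// In this sketch only zeroing qualifies (an assumption, not the real rule).
func cheapRemat(v value) bool {
	return v.op == opConstZero
}

// placeIn returns the instruction that puts v into register dst:
// a fresh XORL when rematerialization is cheap, a register copy otherwise.
func placeIn(v value, dst string) string {
	if cheapRemat(v) {
		return fmt.Sprintf("XORL %s, %s", dst, dst)
	}
	return fmt.Sprintf("MOVL %s, %s", v.reg, dst)
}

func main() {
	fmt.Println(placeIn(value{opConstZero, "CX"}, "DX")) // zero already in CX, wanted in DX
	fmt.Println(placeIn(value{opAdd, "CX"}, "DX"))       // non-constant value must be copied
}
```

With the zero value from the listing, the sketch picks XORL DX, DX instead of MOVL CX, DX; any other value still falls back to the copy.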

I believe that the modification to regalloc should occur in processDest, but that's as far as I got.

cc @cherrymui @randall77
