Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: regression: unnecessary spilling to the stack #61356

Closed
dominikh opened this issue Jul 14, 2023 · 10 comments
Closed

cmd/compile: regression: unnecessary spilling to the stack #61356

dominikh opened this issue Jul 14, 2023 · 10 comments
Assignees
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Milestone

Comments

@dominikh
Copy link
Member

What version of Go are you using (go version)?

$ go version
go version devel go1.21-c30faf9c54 Fri Jul 14 12:49:10 2023 +0000 linux/amd64

Does this issue reproduce with the latest release?

No.

What operating system and processor architecture are you using (go env)?

Linux, amd64

What did you do?

Compile

func pack20(in *[20]uint64) uint64 {
	var out uint64
	out |= 4
	out |= in[0] << 4
	out |= in[1] << 7
	out |= in[2] << 10
	out |= in[3] << 13
	out |= in[4] << 16
	out |= in[5] << 19
	out |= in[6] << 22
	out |= in[7] << 25
	out |= in[8] << 28
	out |= in[9] << 31
	out |= in[10] << 34
	out |= in[11] << 37
	out |= in[12] << 40
	out |= in[13] << 43
	out |= in[14] << 46
	out |= in[15] << 49
	out |= in[16] << 52
	out |= in[17] << 55
	out |= in[18] << 58
	out |= in[19] << 61
	return out
}

What did you expect to see?

Output similar to that of Go 1.20.6:

command-line-arguments.pack20 STEXT nosplit size=233 args=0x8 locals=0x0 funcid=0x0 align=0x0
	0x0000 00000 (/home/dominikh/prj/src/example.com/foo.go:3)	TEXT	command-line-arguments.pack20(SB), NOSPLIT|ABIInternal, $0-8
	0x0000 00000 (/home/dominikh/prj/src/example.com/foo.go:3)	FUNCDATA	$0, gclocals·wgcWObbY2HYnK2SU/U22lA==(SB)
	0x0000 00000 (/home/dominikh/prj/src/example.com/foo.go:3)	FUNCDATA	$1, gclocals·J5F+7Qw7O7ve2QcWC7DpeQ==(SB)
	0x0000 00000 (/home/dominikh/prj/src/example.com/foo.go:3)	FUNCDATA	$5, command-line-arguments.pack20.arginfo1(SB)
	0x0000 00000 (/home/dominikh/prj/src/example.com/foo.go:3)	FUNCDATA	$6, command-line-arguments.pack20.argliveinfo(SB)
	0x0000 00000 (/home/dominikh/prj/src/example.com/foo.go:3)	PCDATA	$3, $1
	0x0000 00000 (/home/dominikh/prj/src/example.com/foo.go:6)	MOVQ	(AX), CX
	0x0003 00003 (/home/dominikh/prj/src/example.com/foo.go:6)	SHLQ	$4, CX
	0x0007 00007 (/home/dominikh/prj/src/example.com/foo.go:7)	MOVQ	8(AX), DX
	0x000b 00011 (/home/dominikh/prj/src/example.com/foo.go:7)	SHLQ	$7, DX
	0x000f 00015 (/home/dominikh/prj/src/example.com/foo.go:7)	ORQ	DX, CX
	0x0012 00018 (/home/dominikh/prj/src/example.com/foo.go:8)	MOVQ	16(AX), DX
	0x0016 00022 (/home/dominikh/prj/src/example.com/foo.go:8)	SHLQ	$10, DX
	0x001a 00026 (/home/dominikh/prj/src/example.com/foo.go:8)	ORQ	CX, DX
	0x001d 00029 (/home/dominikh/prj/src/example.com/foo.go:9)	MOVQ	24(AX), CX
	0x0021 00033 (/home/dominikh/prj/src/example.com/foo.go:9)	SHLQ	$13, CX
	0x0025 00037 (/home/dominikh/prj/src/example.com/foo.go:9)	ORQ	DX, CX
	0x0028 00040 (/home/dominikh/prj/src/example.com/foo.go:10)	MOVQ	32(AX), DX
	0x002c 00044 (/home/dominikh/prj/src/example.com/foo.go:10)	SHLQ	$16, DX
	0x0030 00048 (/home/dominikh/prj/src/example.com/foo.go:10)	ORQ	CX, DX
	0x0033 00051 (/home/dominikh/prj/src/example.com/foo.go:11)	MOVQ	40(AX), CX
	0x0037 00055 (/home/dominikh/prj/src/example.com/foo.go:11)	SHLQ	$19, CX
	0x003b 00059 (/home/dominikh/prj/src/example.com/foo.go:11)	ORQ	DX, CX
	0x003e 00062 (/home/dominikh/prj/src/example.com/foo.go:12)	MOVQ	48(AX), DX
	0x0042 00066 (/home/dominikh/prj/src/example.com/foo.go:12)	SHLQ	$22, DX
	0x0046 00070 (/home/dominikh/prj/src/example.com/foo.go:12)	ORQ	CX, DX
	0x0049 00073 (/home/dominikh/prj/src/example.com/foo.go:13)	MOVQ	56(AX), CX
	0x004d 00077 (/home/dominikh/prj/src/example.com/foo.go:13)	SHLQ	$25, CX
	0x0051 00081 (/home/dominikh/prj/src/example.com/foo.go:13)	ORQ	DX, CX
	0x0054 00084 (/home/dominikh/prj/src/example.com/foo.go:14)	MOVQ	64(AX), DX
	0x0058 00088 (/home/dominikh/prj/src/example.com/foo.go:14)	SHLQ	$28, DX
	0x005c 00092 (/home/dominikh/prj/src/example.com/foo.go:14)	ORQ	CX, DX
	0x005f 00095 (/home/dominikh/prj/src/example.com/foo.go:15)	MOVQ	72(AX), CX
	0x0063 00099 (/home/dominikh/prj/src/example.com/foo.go:15)	SHLQ	$31, CX
	0x0067 00103 (/home/dominikh/prj/src/example.com/foo.go:15)	ORQ	DX, CX
	0x006a 00106 (/home/dominikh/prj/src/example.com/foo.go:16)	MOVQ	80(AX), DX
	0x006e 00110 (/home/dominikh/prj/src/example.com/foo.go:16)	SHLQ	$34, DX
	0x0072 00114 (/home/dominikh/prj/src/example.com/foo.go:16)	ORQ	CX, DX
	0x0075 00117 (/home/dominikh/prj/src/example.com/foo.go:17)	MOVQ	88(AX), CX
	0x0079 00121 (/home/dominikh/prj/src/example.com/foo.go:17)	SHLQ	$37, CX
	0x007d 00125 (/home/dominikh/prj/src/example.com/foo.go:17)	ORQ	DX, CX
	0x0080 00128 (/home/dominikh/prj/src/example.com/foo.go:18)	MOVQ	96(AX), DX
	0x0084 00132 (/home/dominikh/prj/src/example.com/foo.go:18)	SHLQ	$40, DX
	0x0088 00136 (/home/dominikh/prj/src/example.com/foo.go:18)	ORQ	CX, DX
	0x008b 00139 (/home/dominikh/prj/src/example.com/foo.go:19)	MOVQ	104(AX), CX
	0x008f 00143 (/home/dominikh/prj/src/example.com/foo.go:19)	SHLQ	$43, CX
	0x0093 00147 (/home/dominikh/prj/src/example.com/foo.go:19)	ORQ	DX, CX
	0x0096 00150 (/home/dominikh/prj/src/example.com/foo.go:20)	MOVQ	112(AX), DX
	0x009a 00154 (/home/dominikh/prj/src/example.com/foo.go:20)	SHLQ	$46, DX
	0x009e 00158 (/home/dominikh/prj/src/example.com/foo.go:20)	ORQ	CX, DX
	0x00a1 00161 (/home/dominikh/prj/src/example.com/foo.go:21)	MOVQ	120(AX), CX
	0x00a5 00165 (/home/dominikh/prj/src/example.com/foo.go:21)	SHLQ	$49, CX
	0x00a9 00169 (/home/dominikh/prj/src/example.com/foo.go:21)	ORQ	DX, CX
	0x00ac 00172 (/home/dominikh/prj/src/example.com/foo.go:22)	MOVQ	128(AX), DX
	0x00b3 00179 (/home/dominikh/prj/src/example.com/foo.go:22)	SHLQ	$52, DX
	0x00b7 00183 (/home/dominikh/prj/src/example.com/foo.go:22)	ORQ	CX, DX
	0x00ba 00186 (/home/dominikh/prj/src/example.com/foo.go:23)	MOVQ	136(AX), CX
	0x00c1 00193 (/home/dominikh/prj/src/example.com/foo.go:23)	SHLQ	$55, CX
	0x00c5 00197 (/home/dominikh/prj/src/example.com/foo.go:23)	ORQ	DX, CX
	0x00c8 00200 (/home/dominikh/prj/src/example.com/foo.go:24)	MOVQ	144(AX), DX
	0x00cf 00207 (/home/dominikh/prj/src/example.com/foo.go:24)	SHLQ	$58, DX
	0x00d3 00211 (/home/dominikh/prj/src/example.com/foo.go:24)	ORQ	CX, DX
	0x00d6 00214 (/home/dominikh/prj/src/example.com/foo.go:25)	MOVQ	152(AX), AX
	0x00dd 00221 (/home/dominikh/prj/src/example.com/foo.go:25)	SHLQ	$61, AX
	0x00e1 00225 (/home/dominikh/prj/src/example.com/foo.go:25)	ORQ	DX, AX
	0x00e4 00228 (/home/dominikh/prj/src/example.com/foo.go:25)	ORQ	$4, AX
	0x00e8 00232 (/home/dominikh/prj/src/example.com/foo.go:26)	RET

What did you see instead?

command-line-arguments.pack20 STEXT nosplit size=314 args=0x8 locals=0x40 funcid=0x0 align=0x0
	0x0000 00000 (/home/dominikh/prj/src/example.com/foo.go:3)	TEXT	command-line-arguments.pack20(SB), NOSPLIT|ABIInternal, $64-8
	0x0000 00000 (/home/dominikh/prj/src/example.com/foo.go:3)	PUSHQ	BP
	0x0001 00001 (/home/dominikh/prj/src/example.com/foo.go:3)	MOVQ	SP, BP
	0x0004 00004 (/home/dominikh/prj/src/example.com/foo.go:3)	SUBQ	$56, SP
	0x0008 00008 (/home/dominikh/prj/src/example.com/foo.go:3)	FUNCDATA	$0, gclocals·wgcWObbY2HYnK2SU/U22lA==(SB)
	0x0008 00008 (/home/dominikh/prj/src/example.com/foo.go:3)	FUNCDATA	$1, gclocals·J5F+7Qw7O7ve2QcWC7DpeQ==(SB)
	0x0008 00008 (/home/dominikh/prj/src/example.com/foo.go:3)	FUNCDATA	$5, command-line-arguments.pack20.arginfo1(SB)
	0x0008 00008 (/home/dominikh/prj/src/example.com/foo.go:3)	FUNCDATA	$6, command-line-arguments.pack20.argliveinfo(SB)
	0x0008 00008 (/home/dominikh/prj/src/example.com/foo.go:3)	PCDATA	$3, $1
	0x0008 00008 (/home/dominikh/prj/src/example.com/foo.go:6)	MOVQ	8(AX), CX
	0x000c 00012 (/home/dominikh/prj/src/example.com/foo.go:8)	MOVQ	16(AX), DX
	0x0010 00016 (/home/dominikh/prj/src/example.com/foo.go:9)	MOVQ	24(AX), BX
	0x0014 00020 (/home/dominikh/prj/src/example.com/foo.go:10)	MOVQ	32(AX), SI
	0x0018 00024 (/home/dominikh/prj/src/example.com/foo.go:11)	MOVQ	40(AX), DI
	0x001c 00028 (/home/dominikh/prj/src/example.com/foo.go:12)	MOVQ	48(AX), R8
	0x0020 00032 (/home/dominikh/prj/src/example.com/foo.go:13)	MOVQ	56(AX), R9
	0x0024 00036 (/home/dominikh/prj/src/example.com/foo.go:14)	MOVQ	64(AX), R10
	0x0028 00040 (/home/dominikh/prj/src/example.com/foo.go:15)	MOVQ	72(AX), R11
	0x002c 00044 (/home/dominikh/prj/src/example.com/foo.go:16)	MOVQ	80(AX), R12
	0x0030 00048 (/home/dominikh/prj/src/example.com/foo.go:17)	MOVQ	88(AX), R13
	0x0034 00052 (/home/dominikh/prj/src/example.com/foo.go:18)	MOVQ	96(AX), R15
	0x0038 00056 (/home/dominikh/prj/src/example.com/foo.go:18)	MOVQ	R15, command-line-arguments..autotmp_3+48(SP)
	0x003d 00061 (/home/dominikh/prj/src/example.com/foo.go:19)	MOVQ	104(AX), R15
	0x0041 00065 (/home/dominikh/prj/src/example.com/foo.go:19)	MOVQ	R15, command-line-arguments..autotmp_4+40(SP)
	0x0046 00070 (/home/dominikh/prj/src/example.com/foo.go:20)	MOVQ	112(AX), R15
	0x004a 00074 (/home/dominikh/prj/src/example.com/foo.go:20)	MOVQ	R15, command-line-arguments..autotmp_5+32(SP)
	0x004f 00079 (/home/dominikh/prj/src/example.com/foo.go:21)	MOVQ	120(AX), R15
	0x0053 00083 (/home/dominikh/prj/src/example.com/foo.go:21)	MOVQ	R15, command-line-arguments..autotmp_6+24(SP)
	0x0058 00088 (/home/dominikh/prj/src/example.com/foo.go:22)	MOVQ	128(AX), R15
	0x005f 00095 (/home/dominikh/prj/src/example.com/foo.go:22)	MOVQ	R15, command-line-arguments..autotmp_7+16(SP)
	0x0064 00100 (/home/dominikh/prj/src/example.com/foo.go:23)	MOVQ	136(AX), R15
	0x006b 00107 (/home/dominikh/prj/src/example.com/foo.go:23)	MOVQ	R15, command-line-arguments..autotmp_8+8(SP)
	0x0070 00112 (/home/dominikh/prj/src/example.com/foo.go:24)	MOVQ	144(AX), R15
	0x0077 00119 (/home/dominikh/prj/src/example.com/foo.go:24)	MOVQ	R15, command-line-arguments..autotmp_9(SP)
	0x007b 00123 (/home/dominikh/prj/src/example.com/foo.go:25)	MOVQ	152(AX), R15
	0x0082 00130 (/home/dominikh/prj/src/example.com/foo.go:6)	MOVQ	(AX), AX
	0x0085 00133 (/home/dominikh/prj/src/example.com/foo.go:6)	SHLQ	$4, AX
	0x0089 00137 (/home/dominikh/prj/src/example.com/foo.go:7)	SHLQ	$7, CX
	0x008d 00141 (/home/dominikh/prj/src/example.com/foo.go:7)	ORQ	CX, AX
	0x0090 00144 (/home/dominikh/prj/src/example.com/foo.go:8)	SHLQ	$10, DX
	0x0094 00148 (/home/dominikh/prj/src/example.com/foo.go:8)	ORQ	DX, AX
	0x0097 00151 (/home/dominikh/prj/src/example.com/foo.go:9)	SHLQ	$13, BX
	0x009b 00155 (/home/dominikh/prj/src/example.com/foo.go:9)	ORQ	BX, AX
	0x009e 00158 (/home/dominikh/prj/src/example.com/foo.go:10)	SHLQ	$16, SI
	0x00a2 00162 (/home/dominikh/prj/src/example.com/foo.go:10)	ORQ	SI, AX
	0x00a5 00165 (/home/dominikh/prj/src/example.com/foo.go:11)	SHLQ	$19, DI
	0x00a9 00169 (/home/dominikh/prj/src/example.com/foo.go:11)	ORQ	DI, AX
	0x00ac 00172 (/home/dominikh/prj/src/example.com/foo.go:12)	SHLQ	$22, R8
	0x00b0 00176 (/home/dominikh/prj/src/example.com/foo.go:12)	ORQ	R8, AX
	0x00b3 00179 (/home/dominikh/prj/src/example.com/foo.go:13)	SHLQ	$25, R9
	0x00b7 00183 (/home/dominikh/prj/src/example.com/foo.go:13)	ORQ	R9, AX
	0x00ba 00186 (/home/dominikh/prj/src/example.com/foo.go:14)	SHLQ	$28, R10
	0x00be 00190 (/home/dominikh/prj/src/example.com/foo.go:14)	ORQ	R10, AX
	0x00c1 00193 (/home/dominikh/prj/src/example.com/foo.go:15)	SHLQ	$31, R11
	0x00c5 00197 (/home/dominikh/prj/src/example.com/foo.go:15)	ORQ	R11, AX
	0x00c8 00200 (/home/dominikh/prj/src/example.com/foo.go:16)	SHLQ	$34, R12
	0x00cc 00204 (/home/dominikh/prj/src/example.com/foo.go:16)	ORQ	R12, AX
	0x00cf 00207 (/home/dominikh/prj/src/example.com/foo.go:17)	SHLQ	$37, R13
	0x00d3 00211 (/home/dominikh/prj/src/example.com/foo.go:17)	ORQ	R13, AX
	0x00d6 00214 (/home/dominikh/prj/src/example.com/foo.go:18)	MOVQ	command-line-arguments..autotmp_3+48(SP), CX
	0x00db 00219 (/home/dominikh/prj/src/example.com/foo.go:18)	SHLQ	$40, CX
	0x00df 00223 (/home/dominikh/prj/src/example.com/foo.go:18)	ORQ	CX, AX
	0x00e2 00226 (/home/dominikh/prj/src/example.com/foo.go:19)	MOVQ	command-line-arguments..autotmp_4+40(SP), CX
	0x00e7 00231 (/home/dominikh/prj/src/example.com/foo.go:19)	SHLQ	$43, CX
	0x00eb 00235 (/home/dominikh/prj/src/example.com/foo.go:19)	ORQ	CX, AX
	0x00ee 00238 (/home/dominikh/prj/src/example.com/foo.go:20)	MOVQ	command-line-arguments..autotmp_5+32(SP), CX
	0x00f3 00243 (/home/dominikh/prj/src/example.com/foo.go:20)	SHLQ	$46, CX
	0x00f7 00247 (/home/dominikh/prj/src/example.com/foo.go:20)	ORQ	CX, AX
	0x00fa 00250 (/home/dominikh/prj/src/example.com/foo.go:21)	MOVQ	command-line-arguments..autotmp_6+24(SP), CX
	0x00ff 00255 (/home/dominikh/prj/src/example.com/foo.go:21)	SHLQ	$49, CX
	0x0103 00259 (/home/dominikh/prj/src/example.com/foo.go:21)	ORQ	CX, AX
	0x0106 00262 (/home/dominikh/prj/src/example.com/foo.go:22)	MOVQ	command-line-arguments..autotmp_7+16(SP), CX
	0x010b 00267 (/home/dominikh/prj/src/example.com/foo.go:22)	SHLQ	$52, CX
	0x010f 00271 (/home/dominikh/prj/src/example.com/foo.go:22)	ORQ	CX, AX
	0x0112 00274 (/home/dominikh/prj/src/example.com/foo.go:23)	MOVQ	command-line-arguments..autotmp_8+8(SP), CX
	0x0117 00279 (/home/dominikh/prj/src/example.com/foo.go:23)	SHLQ	$55, CX
	0x011b 00283 (/home/dominikh/prj/src/example.com/foo.go:23)	ORQ	CX, AX
	0x011e 00286 (/home/dominikh/prj/src/example.com/foo.go:24)	MOVQ	command-line-arguments..autotmp_9(SP), CX
	0x0122 00290 (/home/dominikh/prj/src/example.com/foo.go:24)	SHLQ	$58, CX
	0x0126 00294 (/home/dominikh/prj/src/example.com/foo.go:24)	ORQ	CX, AX
	0x0129 00297 (/home/dominikh/prj/src/example.com/foo.go:25)	SHLQ	$61, R15
	0x012d 00301 (/home/dominikh/prj/src/example.com/foo.go:25)	ORQ	R15, AX
	0x0130 00304 (/home/dominikh/prj/src/example.com/foo.go:25)	ORQ	$4, AX
	0x0134 00308 (/home/dominikh/prj/src/example.com/foo.go:26)	ADDQ	$56, SP
	0x0138 00312 (/home/dominikh/prj/src/example.com/foo.go:26)	POPQ	BP
	0x0139 00313 (/home/dominikh/prj/src/example.com/foo.go:26)	RET

Go 1.21 seems to be loading all of the values up front, only to spill some of them to the stack. This seems strictly worse than the old output.

/cc @prattmic

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jul 14, 2023
@prattmic
Copy link
Member

cc @golang/compiler @randall77

@prattmic
Copy link
Member

@dominikh noted in chat that the slice equivalent of this function still compiles at tip to pretty much the same thing as the 1.20 output above: https://go.dev/play/p/Ty9G09raoMM.go

@prattmic prattmic added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Jul 14, 2023
@prattmic prattmic added this to the Go1.21 milestone Jul 14, 2023
@randall77
Copy link
Contributor

This is almost certainly the new scheduler: https://go-review.googlesource.com/c/go/+/270940

Both the new and the old scheduler basically have no heuristics here. There are lots of scheduleable values at one time and the scheduler choses kind of arbitrarily. In particular, the arbitrary choice between issuing another load and issuing a shift/or.
So although the old scheduler got this one right, it was not because of any explicit reason - it just got lucky.

I'm not sure why the slice version is different. Maybe the value numbering is different which causes a different arbitrary choice to be made.

This problem is a bit tricky because I think we need to encode the size of the register set somehow. We do want to issue the loads early when there are only a few of them. But when there are lots of them we want to prioritize issuing uses of those loads.

@randall77
Copy link
Contributor

Ah, the difference between array and slice is from here: https://github.com/golang/go/blob/master/src/cmd/compile/internal/ssa/schedule.go#L68
There's a special case for the entry block that upsets the otherwise "schedule in program order" tie breaker.
@dr2chase

@randall77
Copy link
Contributor

Hm, looks like that special case isn't needed (anymore?). Also looks like I'm the one that added it.
I think we can just delete it. There's still the underlying issue that the scheduler is pretty simplistic in its priority decisions, but at least it fixes the immediate issue.

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/509856 mentions this issue: cmd/compile: get rid of special case in scheduler for entry block

@dominikh
Copy link
Member Author

Another example that got worse with Go 1.21, even with the CL applied: https://go.dev/play/p/21VpDblOelZ.go (only the extra spilling is a regression, the other issues with the generated code already exist in Go 1.20.)

@randall77
Copy link
Contributor

That special case was needed to pass TestInlineLines in cmd/compile/internal/ssa/debug_lines_test.go:225 when I checked in that code in CL 270940. Since then CL 468455 modified that test subtly. With that test modification, the special case isn't needed anymore.

@randall77
Copy link
Contributor

@dominikh That looks like a more general problem of the SETCS opcode (and friends) not having an indexed store version.
Could you open a separate issue?

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/510435 mentions this issue: cmd/compile: add indexed SET* opcodes for amd64

@randall77 randall77 modified the milestones: Go1.21, Go1.22 Jul 17, 2023
gopherbot pushed a commit that referenced this issue Jul 26, 2023
Update #61356

Change-Id: I391af98563b1c068208784c80ea736c78c29639d
Reviewed-on: https://go-review.googlesource.com/c/go/+/510435
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Martin Möhrmann <martin@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Martin Möhrmann <moehrmann@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Projects
None yet
Development

No branches or pull requests

5 participants