Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pass: register allocation failure involving X0, Y0 #65

Open
mmcloughlin opened this issue Mar 18, 2019 · 7 comments
Open

pass: register allocation failure involving X0, Y0 #65

mmcloughlin opened this issue Mar 18, 2019 · 7 comments
Labels
bug Something isn't working

Comments

@mmcloughlin
Copy link
Owner

mmcloughlin commented Mar 18, 2019

@zeebo reports that:

    VMOVDQU(data, X0)
    VMOVDQU(key, X1)
    VINSERTI128(Imm(1), data.Offset(16), Y0, Y0)
    VINSERTI128(Imm(1), key.Offset(16), Y1, Y1)

fails with an impossible register allocation. This is a direct translation from

  4005e0:    c5 fa 6f 06              vmovdqu (%rsi),%xmm0
  4005e4:    c5 fa 6f 0a              vmovdqu (%rdx),%xmm1
  4005e8:    c4 e3 7d 38 46 10 01     vinserti128 $0x1,0x10(%rsi),%ymm0,%ymm0
  4005ef:    c4 e3 75 38 4a 10 01     vinserti128 $0x1,0x10(%rdx),%ymm1,%ymm1

Also

using virtual registers avoids the impossible allocation, but generates the wrong code. the xmm registers aren't used after that line, and so it just loads both into X0, and avoids Y0 in the VINSERTI128

Full example: https://play.golang.org/p/NwsVZI9CVef (fails with 16 loop iterations). Note it was possible to work around the problem by lifting the allocations out of the loop.

Slack discussion https://gophers.slack.com/archives/C0VP8EF3R/p1552876283318100

@zeebo
Copy link

zeebo commented Jan 21, 2020

I hit this again, recently. I was trying to write my own abstraction around register allocation with some basic automatic stack spills, and so it was handing out physical YMM registers. Unfortunately, I can't use YMM and XMM registers at the same time, even if I do something like Y0.AsX().

It turns out that quite a bit of patterns use the XMM registers to read the lower halves of YMM registers.

@mmcloughlin
Copy link
Owner Author

@zeebo turns out the register allocator has a fundamental flaw in the way it handles aliased registers, see #100. I assume your issue may have the same root cause, and if so the PR #121 might help you. Would you be able to try that and report back if it resolves your problem?

Either way, it would be great if you could share some code, both to help debug and to serve as a regression test.

@zeebo
Copy link

zeebo commented Jan 21, 2020

Thanks for the pointer to that PR. It seems to do significantly better, but it's not quite fixed yet. Here's a simple reproducer:

func TestAsX(c ctx) {
	TEXT("test", NOSPLIT, "func()")
	VINSERTI128(Imm(1), Y0.AsX(), Y1, Y2)
	RET()
}

This fails with non physical register found. Please let me know if I can help in any other way.

@mmcloughlin
Copy link
Owner Author

Thanks! That's with the PR applied? If so I'm pretty sure I know what the problem is.

@zeebo
Copy link

zeebo commented Jan 21, 2020

Yeah, I'm running on a27b743.

mmcloughlin added a commit that referenced this issue Jan 23, 2020
Test for casting a physical register.

Updates #65
@mmcloughlin
Copy link
Owner Author

Using #6 on this example we see that liveness analysis is reporting many registers live in their top 128-bits:

function {
	name       = "accum"
	attributes = 0x0004 NOSPLIT
	doc        = []string(nil)
	signature  = (acc *byte, data *byte, key *byte)
	local size = 0
	nodes {
		instruction {
			addr        = 0x0xc0000f8000
			opcode      = "MOVQ"
			terminal    = false
			branch      = false
			conditional = false
			operands {
				0: acc+0(FP)
				1: <virtual:0:1:8>
			}
			inputs {
				0: acc+0(FP)
			}
			outputs {
				0: <virtual:0:1:8>
			}
			pred {
			}
			succ {
				0x0xc0000f80c0
			}
			livein {
				00030201: 20
				00010201: 20
				000d0201: 20
				001e0201: 20
				00090201: 20
				001f0201: 20
				00150201: 20
				001b0201: 20
				00060201: 20
				00180201: 20
				00270201: 20
				001c0201: 20
				00130201: 20
				000f0201: 20
				00280201: 20
				00040201: 20
				002b0201: 20
				002a0201: 20
				00070201: 20
				00000201: 20
				00240201: 20
				000a0201: 20
				002e0201: 20
				002d0201: 20
				00100201: 20
				00120201: 20
				00160201: 20
				000c0201: 20
				00250201: 20
				00210201: 20
				00190201: 20
				00220201: 20
			}
			...

@mmcloughlin
Copy link
Owner Author

Okay the problem is caused by the following:

  1. According to avo the VINSERTI128 instruction needs to read the entire 256-bits of y0. In fact it only depends on one half, but which half is determined by the immediate value.
  2. avo thinks that the VMOVDQU writing to x0 only writes the low 128 bits of y0. In fact in VEX.128 encoded mode this will actually write the whole 256 bit register (zeroing the high 128 bits), but avo does not know this pass: handle upper half zeroing in vector registers #128.

The result of these two problems is that avo sees a false dependency on the high 128-bits of y0, and the same with y1. Therefore it incorrectly thinks there are 32 live registers, preventing register allocation.

Fixing either of the two problems would be enough. VINSERTI128 is somewhat of a special case (given the "dynamic" dependency on the immediate value), but #128 is probably easier to fix and would cause a wider class of problems.

mmcloughlin added a commit that referenced this issue Jan 28, 2020
Adds a regression test based on klauspost/compress#186. This necessitated some related changes:

* Mark "RET" as a terminal instruction
* printer refactor to maintain compatibility with asmfmt
* Tweaks to other regression tests to ensure they are run correctly in CI

Updates #100 #65 #8
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants