reg,pass: refactor allocation of aliased registers #121

mmcloughlin · 2020-01-19T08:31:30Z

Issue #100 demonstrated that register allocation for aliased registers is
fundamentally broken. The root of the issue is that currently accesses to the
same virtual register with different masks are treated as different registers.
This PR takes a different approach:

Liveness analysis is masked: we now properly consider which parts of a register are live
Register allocation produces a mapping from virtual to physical ID, and aliasing is applied later

klauspost · 2020-01-19T11:14:00Z

This is super awesome. All my code "compiles" now and the power feels... overwhelming.

Still have no idea if my code works, but at least now I can generate it and start debugging 👍

Untested.

codecov-io · 2020-01-20T00:05:48Z

Codecov Report

Merging #121 into master will increase coverage by 0.01%.
The diff coverage is 91.03%.

@@            Coverage Diff             @@
##           master     #121      +/-   ##
==========================================
+ Coverage    77.6%   77.62%   +0.01%     
==========================================
  Files          55       55              
  Lines       33326    33376      +50     
==========================================
+ Hits        25864    25909      +45     
  Misses       7354     7354              
- Partials      108      113       +5

Flag	Coverage Δ
#integration	`6.4% <81.6%> (+0.18%)`	⬆️
#unittests	`75.58% <50.94%> (-0.09%)`	⬇️

Impacted Files	Coverage Δ
pass/pass.go	`61.53% <ø> (ø)`	⬆️
ir/ir.go	`90.47% <ø> (ø)`	⬆️
operand/types.go	`95.77% <100%> (ø)`	⬆️
build/global.go	`80.43% <100%> (+0.88%)`	⬆️
reg/collection.go	`100% <100%> (+6.25%)`	⬆️
reg/x86.go	`100% <100%> (ø)`	⬆️
reg/types.go	`84.25% <81.25%> (-6.33%)`	⬇️
pass/reg.go	`84.37% <87.5%> (-3.13%)`	⬇️
reg/set.go	`87.27% <93.18%> (+5.22%)`	⬆️
pass/alloc.go	`89.79% <95.34%> (-0.21%)`	⬇️
... and 1 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 126469f...8c19550. Read the comment docs.

klauspost · 2020-01-20T17:25:15Z

It seems like something is still a bit weird. I have two pieces of code that after some changes that shouldn't cause allocation changes are now causing issues.

Case 1

This is probably the easiest to understand. In this case it simply skips over the piece of code that are inlined below. Adding the branch should not make any difference on allocation, but it seems to do so.

Case 2

This skips a whole bunch of code if taken, but none of the registers allocated in that piece of code (rep, left, right) is referenced in where it is jumping to. The registers at the destination should already be carried over because the rest of the code can end up there anyway.

It can easily be a problem in my code, which is why it would be great to have more info than "failed to allocate registers".

klauspost · 2020-01-20T18:08:42Z

FWIW, this commit is where it 'broke'. No other changes than size-related AFAICT.

I am looking at changing parts of it so I can see where it breaks.

Edit: ok, made a "mimimal" reproducer:

These changes breaks the allocator. Before this change things are fine, but when the changes are applied it runs out of registers. There is nothing in the change that by my understanding should change register allocation since we are just using 'As32()' to get the 32 bit equivalent.

mmcloughlin · 2020-01-20T18:55:35Z

@klauspost Sorry there's another problem, and thanks for your effort debugging this!

There's something I've been aware of but didn't think was going to be an immediate problem, specifically the behavior of 32-bit operands in 64-bit mode. From the Intel Manual:

When in 64-bit mode, operand size determines the number of valid bits in the destination general-purpose register:

64-bit operands generate a 64-bit result in the destination general-purpose register.

32-bit operands generate a 32-bit result, zero-extended to a 64-bit result in the destination general-purpose register.

8-bit and 16-bit operands generate an 8-bit or 16-bit result. The upper 56 bits or 48 bits (respectively) of the destination general-purpose register are not modified by the operation. If the result of an 8-bit or 16-bit operation is intended for 64-bit address calculation, explicitly sign-extend the register to the full 64-bits.

This means that 32-bit instructions will actually cause a write to the full 64-bit register. However the current implementation in avo does not handle this correctly: it sees a write to a 32-bit register and doesn't account for the zero-extension. So if a 32-bit write is followed by a 64-bit read of the same register, liveness analysis would consider there to be a dependency on the high 32-bits of the register, when in fact there is no such dependency.

The fact that your problem manifests when you change sizes from 64-bit to 32-bit makes me think the above issue might be to blame. However this is just a guess at this point, I'll have to dig into the details (I'll have time later today).

Updates #6 #100 #121

klauspost · 2020-01-20T21:15:23Z

Not completely sure I understand what it does. Kinda just assumed a GP64() lived as long as it was referenced (or an AsXX(), referring to the same register).

Anyway, if you don't crack it, a workaround would be great. I can at least run parts of the code now, and I got a reasonable setup with delve, so I can actually debug it somewhat, so making progress.

mmcloughlin · 2020-01-20T21:15:59Z

Okay, using my experimental debug printer (https://github.com/mmcloughlin/avo/compare/regalloc-debug) we see

function {
	name       = "encodeBlockAsm"
	attributes = 0x0000 0
	doc        = []string{"encodeBlockAsm encodes a non-empty src to a guaranteed-large-enough dst.", "It assumes that the varint-encoded length of the decompressed bytes has already been written.", ""}
	signature  = (dst []byte, src []byte) int
	local size = 65568
	nodes {
		instruction {
			addr        = 0x0xc00010a000
			opcode      = "MOVQ"
			terminal    = false
			branch      = false
			conditional = false
			operands {
				0: $0x00000200
				1: AX
			}
			inputs {
			}
			outputs {
				0: <virtual:0:1:8>
			}
			pred {
			}
			succ {
				0x0xc00010a0c0
			}
			livein {
				000b0101: 08
				00220101: 08
				001a0101: 08
				00250101: 08
				005e0101: 08
			}
			liveout {
				00000101: 0f
				000b0101: 08
				00220101: 08
				001a0101: 08
				00250101: 08
				005e0101: 08
			}
		}
		...

Note the livein and liveout sections show a map from register ID to live mask. The 0x08 masks here indicate that avo thinks the high 32 bits of those registers are live, lending some credence to the theory in my last comment.

klauspost · 2020-01-20T21:37:13Z

ok, I managed to work around it by not converting registers as much.

mmcloughlin · 2020-01-20T22:43:58Z

ok, I managed to work around it by not converting registers as much.

Good to hear. I'm going to try to work on a proper fix for this.

Not completely sure I understand what it does. Kinda just assumed a GP64() lived as long as it was referenced (or an AsXX(), referring to the same register).

Just to clarify what the problem is, consider the following:

...
x := GP64()
MOVW(U16(42), x.As16())
ADDQ(x, ...)
...

The move instruction only writes the low 16 bits, and the add consume the full 64 bits. Therefore avo liveness analysis would consider the high 48 bits to be live prior to the move instruction (all the way to the beginning of the function if there's no other write to x). This is actually correct, since a 16-bit operation will only write to the low 16 bits of the register. But what about the same thing in 32 bit?

...
x := GP64()
MOVL(U32(42), x.As32())
ADDQ(x, ...)
...

This should not exhibit the same problem as the 16 bit version, because in x86-64 the 32-bit write caused by MOVL will be zero extended to a 64-bit write. Hence it's just a 64-bit write followed by a 64-bit read, perfectly fine. The problem is that avo at the moment does not account for the zero-extension, and it sees a 32-bit write followed by a 64-bit read, and thinks that the upper 32 bits of x is live prior to the move instruction.

Ultimately this means avo thinks there are more live registers than there actually are, and in your case this is bad enough it prevented register allocation. If I have avo account for the zero-extension, the problem should be resolved. I hope 😅

Test to ensure that `avo` correctly accounts for zero-extension of 32-bit operands to 64 bits. Updates #121

Updates #121

Test to ensure that `avo` correctly accounts for zero-extension of 32-bit operands to 64 bits. Updates #121

Updates #121

mmcloughlin · 2020-01-21T06:28:37Z

@klauspost I have added an additional pass that handles the 32-bit zero extension issue I described. With this change the code you referenced in #121 (comment) doesn't fail allocation.

Please let me know how you get on with this new version. I actually think your change in klauspost/compress@214d2e2 is probably preferable anyway, not just as a workaround for this problem.

Once again, thanks for working through this with me 😄

Requires all virtual registers to specify a mask rather than just a size. Currently avo allows specifying an 8-bit register, which could resolve to a low or high byte register. This complicates liveness analysis and register allocation for what ends up being a very niche use case. Removes the `reg.Width` type since this is no longer used anywhere. Updates #100

Provides register IDs. These will be used to help unify handling of physical/virtual registers in liveness and allocation phases. Updates #100

For liveness and allocation we actually want to know which parts of each register are live at each time. The `reg.Set` structure was not correct. This diff replaces it with `reg.MaskSet`, mapping a register ID to the active mask. Updates #100

Refactors the register allocator to be based on IDs, with masks applied later. This is just the result of hacking and slashing; it builds but has not been tested at all. Updates #100

Refactored allocator works on the ./examples/... generators. Updates #100

Small example that triggers the aliasing problem. The refactored allocator does not have the same problem. Updates #100

Create a synthentic test case to verify correct handling of masking in liveness analysis and the allocator. Updates #100

Test to ensure that `avo` correctly accounts for zero-extension of 32-bit operands to 64 bits. Updates #121

By writing to all the registers early on, we ensure the allocation will actually fail.

Updates #121

Test for casting a physical register. Updates #65

klauspost added a commit to klauspost/compress that referenced this pull request Jan 19, 2020

Enable all assembly, requires mmcloughlin/avo#121

ab87e9f

Untested.

mmcloughlin marked this pull request as ready for review January 20, 2020 00:12

mmcloughlin force-pushed the regalloc-refactor branch from 15142b7 to de139fc Compare January 20, 2020 06:19

mmcloughlin added a commit that referenced this pull request Jan 20, 2020

printer: include liveness in debug output

06c05ad

Updates #6 #100 #121

mmcloughlin added a commit that referenced this pull request Jan 20, 2020

tests/alloc/upper32: test for 32-bit operands

69a4930

Test to ensure that `avo` correctly accounts for zero-extension of 32-bit operands to 64 bits. Updates #121

mmcloughlin added a commit that referenced this pull request Jan 21, 2020

pass: ZeroExtend32BitOutputs

6a6a813

Updates #121

mmcloughlin added a commit that referenced this pull request Jan 21, 2020

tests/alloc/upper32: test for 32-bit operands

e4203cf

Test to ensure that `avo` correctly accounts for zero-extension of 32-bit operands to 64 bits. Updates #121

mmcloughlin added a commit that referenced this pull request Jan 21, 2020

pass: ZeroExtend32BitOutputs

bd3ac12

Updates #121

mmcloughlin mentioned this pull request Jan 21, 2020

pass: register allocation failure involving X0, Y0 #65

Open

mmcloughlin added 11 commits January 22, 2020 20:23

reg: register IDs

6d86634

Provides register IDs. These will be used to help unify handling of physical/virtual registers in liveness and allocation phases. Updates #100

reg: MaskSet instead of Set

45f3337

For liveness and allocation we actually want to know which parts of each register are live at each time. The `reg.Set` structure was not correct. This diff replaces it with `reg.MaskSet`, mapping a register ID to the active mask. Updates #100

pass,reg: refactor allocator

43acf85

Refactors the register allocator to be based on IDs, with masks applied later. This is just the result of hacking and slashing; it builds but has not been tested at all. Updates #100

reg,pass: make it work on examples

83def08

Refactored allocator works on the ./examples/... generators. Updates #100

tests/fixedbugs/issue100: issue #100 reproducer

0b9b6ca

Small example that triggers the aliasing problem. The refactored allocator does not have the same problem. Updates #100

tests/alloc/masks: test allocator mask handling

7fdf98d

Create a synthentic test case to verify correct handling of masking in liveness analysis and the allocator. Updates #100

reg: fix/remove tests

60b98cb

pass: fix unit tests

8556567

lint

8913c44

reg: rename id to idx

f401d22

mmcloughlin added 12 commits January 22, 2020 20:23

tests/fixedbugs/issue100: package doc

3aee513

tests/alloc/upper32: test for 32-bit operands

b3461a8

Test to ensure that `avo` correctly accounts for zero-extension of 32-bit operands to 64 bits. Updates #121

tests/alloc/upper32: make it actually fail

4c0a4aa

By writing to all the registers early on, we ensure the allocation will actually fail.

pass: ZeroExtend32BitOutputs

7142b5f

Updates #121

reg: fix test of As*() methods

68daa41

pass: remove debug print

b01718a

wip

d4c787e

fix bad merge

cf6b89c

wip

3d11825

tests/fixedbugs/issue65: regression test

f0d5f19

Test for casting a physical register. Updates #65

reg: test As*() preserves interfaces

95f98a9

pass: fix TestZeroExtend32BitOutputs

d8b7ea5

mmcloughlin force-pushed the regalloc-refactor branch from a27b743 to d8b7ea5 Compare January 23, 2020 06:10

mmcloughlin added 2 commits January 22, 2020 22:26

reg: ID tests

c8a7ab7

reg: basic Lookup*() tests

8c19550

mmcloughlin merged commit f40d602 into master Jan 23, 2020

This was referenced Jan 23, 2020

pass: unexpected register allocation failure #100

Closed

pass: handle upper half zeroing in vector registers #128

Open

mmcloughlin deleted the regalloc-refactor branch January 27, 2020 20:21

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

reg,pass: refactor allocation of aliased registers #121

reg,pass: refactor allocation of aliased registers #121

mmcloughlin commented Jan 19, 2020

klauspost commented Jan 19, 2020

codecov-io commented Jan 20, 2020 •

edited

klauspost commented Jan 20, 2020

klauspost commented Jan 20, 2020 •

edited

mmcloughlin commented Jan 20, 2020

klauspost commented Jan 20, 2020

mmcloughlin commented Jan 20, 2020

klauspost commented Jan 20, 2020

mmcloughlin commented Jan 20, 2020

mmcloughlin commented Jan 21, 2020

reg,pass: refactor allocation of aliased registers #121

reg,pass: refactor allocation of aliased registers #121

Conversation

mmcloughlin commented Jan 19, 2020

klauspost commented Jan 19, 2020

codecov-io commented Jan 20, 2020 • edited

Codecov Report

klauspost commented Jan 20, 2020

klauspost commented Jan 20, 2020 • edited

mmcloughlin commented Jan 20, 2020

klauspost commented Jan 20, 2020

mmcloughlin commented Jan 20, 2020

klauspost commented Jan 20, 2020

mmcloughlin commented Jan 20, 2020

mmcloughlin commented Jan 21, 2020

codecov-io commented Jan 20, 2020 •

edited

klauspost commented Jan 20, 2020 •

edited