[RegAlloc] Remove default restriction on non-trivial rematerialization #159211
I've attached the exact results for each test below.
AArch64 llvm-test-suite
RISC-V llvm-test-suite
x86_64 llvm-test-suite
RISC-V SPEC CPU 2017
x86_64 SPEC CPU 2017
So, I think this is generally the right direction, and am very supportive of the work.
However, I want to suggest a code structure change. I think we need to split the current isTriviallyReMaterializable into two. Version 1 keeps the current behavior. Version 2 explicitly allows the virtual regs, and the caller takes the responsibility for checking liveness.
As was discussed in the old Phabricator review, I think there's an important difference between "we know this instruction is going to be rematerializable in the future", and "we know this instruction is rematerializable right now". The latter question gets to use a lot more information.
The tricky bit is that I think we already have this distinction in the current code, and just don't realize it. Several of the backends (AMDGPU, RISC-V for vector ops) already allow rematerialization of instructions with live virt regs!
My suggestion would be something along the lines of removing Trivially from the name of isTriviallyReMaterializable, and instead passing a boolean argument named "DisallowVRegUses". Most callers pass true, with the one in LRE passing false.
This also allows targets to "opt in" to the new behavior. Targets that want to keep the old behavior could unconditionally pass true to the generic implementation in their target hook.
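To make the suggestion concrete, here is a minimal sketch on a toy model. Everything in it (`OperandKind`, `Operand`, `Instr`, `toyIsReMaterializable`, `conservativeTargetHook`) is a hypothetical stand-in invented for illustration, not LLVM's real classes or signatures; only the `DisallowVRegUses` flag name comes from the comment above.

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy model, not LLVM's MachineOperand / MachineInstr.
enum class OperandKind { Immediate, ConstPhysReg, VirtReg };

struct Operand {
  OperandKind kind;
  bool isDef; // true for the register the instruction defines
};

struct Instr {
  std::vector<Operand> operands;
};

// One predicate instead of two entry points.
//   DisallowVRegUses == true : old "trivial" behavior, any vreg use rejects.
//   DisallowVRegUses == false: vreg uses are accepted, and the caller takes
//                              responsibility for checking liveness.
bool toyIsReMaterializable(const Instr &MI, bool DisallowVRegUses) {
  assert(!MI.operands.empty() && "expected at least a def operand");
  for (const Operand &MO : MI.operands) {
    if (MO.kind != OperandKind::VirtReg)
      continue; // immediates and constant physregs are always fine
    if (MO.isDef)
      continue; // the defined register itself is not a use
    if (DisallowVRegUses)
      return false; // a virtual-register use makes this non-trivial
  }
  return true;
}

// A target that wants to keep the old behavior passes true unconditionally.
bool conservativeTargetHook(const Instr &MI) {
  return toyIsReMaterializable(MI, /*DisallowVRegUses=*/true);
}
```

In this model an instruction such as `%2 = add %1, 4` is rejected when the flag is true and accepted when it is false, while `%1 = li 4` is accepted either way.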
#160153 is a starting point on the alternative approach I was suggesting in my last comment.
Just to note, a bunch of changes have gone in with the goal of making this change more straightforward.

An API reorg (#160377) made it easier to audit the call sites and their expectations. We've been auditing call sites one by one to try and figure out the expected behavior between trivial and non-trivial remat (since we actually have both already, just in a much less aggressive form).

We had two heuristic changes which were prerequisites for this change; both have now landed.

At this point, I think we're ready to rebase this, double check the perf impact again, and then move forward with landing this in the next few days. We'll need to audit the remaining call sites for non-trivial remat one more time, and maybe we'll find another blocker, but at the moment, I don't know of any.

Luke, when you rebase, please make sure to adjust the framing on the review description. As we've discussed, this isn't actually introducing the concept of non-trivial remat - we had two backends abusing the prior APIs to achieve this - it's "simply" greatly increasing how aggressive we are about non-trivial remat by default. Framing it that way should make it easier to understand for later readers.
b51bc96 to 608eabb
Stacked on llvm#159180.

Unless overridden by the target, we currently only allow rematerialization of instructions with immediate or constant physical register operands, i.e. no virtual registers.

The comment states that this is because we might lengthen the live range of a virtual register, but we don't actually do this. LiveRangeEdit::allUsesAvailableAt makes sure that we only rematerialize instructions whose virtual registers are already live at the use sites.

This patch relaxes this constraint, which significantly reduces the number of reloads across various targets.

This is another attempt at https://reviews.llvm.org/D106408, and llvm#159180 aims to address the issue with the weights that may have caused the previous regressions.
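The liveness argument can be illustrated on a toy model. The sketch below is not the LLVM implementation: `Segment`, `LiveMap`, `valueLiveAt` and `usesAvailableAt` are hypothetical names, program points are plain integers, and the extra requirement that the same value (not just any value) is live at both points is an assumption of this model.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical toy model: each virtual register has half-open live
// segments [start, end), each tagged with the id of the value it carries.
struct Segment {
  int start;
  int end;
  int valueId;
};

using LiveMap = std::map<int, std::vector<Segment>>; // vreg -> segments

// Returns the value id live for `vreg` at `point`, or -1 if it is not live.
int valueLiveAt(const LiveMap &LM, int vreg, int point) {
  auto it = LM.find(vreg);
  if (it == LM.end())
    return -1;
  for (const Segment &s : it->second)
    if (s.start <= point && point < s.end)
      return s.valueId;
  return -1;
}

// True if every vreg read by the original instruction at `origPoint` is
// live with the same value at `usePoint`. When this holds, recomputing the
// instruction at `usePoint` does not lengthen any live range.
bool usesAvailableAt(const LiveMap &LM, const std::vector<int> &usedVRegs,
                     int origPoint, int usePoint) {
  assert(origPoint <= usePoint && "remat point should follow the original def");
  for (int vreg : usedVRegs) {
    int orig = valueLiveAt(LM, vreg, origPoint);
    if (orig == -1 || orig != valueLiveAt(LM, vreg, usePoint))
      return false;
  }
  return true;
}
```

With this check in front of it, an instruction with virtual register uses is only rematerialized where its inputs are already live, which is why dropping the blanket restriction does not by itself extend live ranges.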
608eabb to 6c47182
This should be ready for review now. I've rebased and rerun the results on rva23u64 -O3 and arm64-apple-darwin -O3, and there was virtually no change (< 0.1%) from the previous results in the number of registers spilled/reloaded. I've also updated the PR description to clarify that we actually had non-trivial remat previously, and to mention the other work that went into untangling the API.
@llvm/pr-subscribers-backend-systemz
@llvm/pr-subscribers-backend-risc-v

Author: Luke Lau (lukel97)

Changes

In the register allocator we define non-trivial rematerialization as the rematerialization of an instruction with virtual register uses.

We have been able to perform non-trivial rematerialization for a while, but it has been prevented by default unless specifically overridden by the target.

https://reviews.llvm.org/D106408 had originally tried to remove this restriction but it was reverted after some performance regressions were reported. We think it is likely that the regressions were caused by the fact that the old isTriviallyReMaterializable API sometimes returned true for non-trivial rematerializations.

However, #160377 recently split the API out into separate non-trivial and trivial versions and updated the call sites accordingly, and #160709 and #159180 fixed heuristics which weren't accounting for the difference between non-trivial and trivial.

With these fixes in place, this patch proposes to again allow non-trivial rematerialization by default, which significantly reduces the number of spills and reloads across various targets.

I wasn't able to build SPEC CPU 2017 on arm64-apple-darwin due to incompatibilities with the macOS SDK headers.

This also allows us to rematerialize loads and stores on RISC-V in a future patch.

Patch is 218.03 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/159211.diff

28 Files Affected:
diff --git a/llvm/lib/CodeGen/TargetInstrInfo.cpp b/llvm/lib/CodeGen/TargetInstrInfo.cpp
index 2f3b7a2c8fcdf..3c41bbeb4b327 100644
--- a/llvm/lib/CodeGen/TargetInstrInfo.cpp
+++ b/llvm/lib/CodeGen/TargetInstrInfo.cpp
@@ -1657,12 +1657,6 @@ bool TargetInstrInfo::isReMaterializableImpl(
// same virtual register, though.
if (MO.isDef() && Reg != DefReg)
return false;
-
- // Don't allow any virtual-register uses. Rematting an instruction with
- // virtual register uses would length the live ranges of the uses, which
- // is not necessarily a good idea, certainly not "trivial".
- if (MO.isUse())
- return false;
}
// Everything checked out.
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/split-wide-shifts-multiway.ll b/llvm/test/CodeGen/AArch64/GlobalISel/split-wide-shifts-multiway.ll
index ed68723e470a2..41f7ab89094ad 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/split-wide-shifts-multiway.ll
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/split-wide-shifts-multiway.ll
@@ -1219,14 +1219,14 @@ define void @test_shl_i1024(ptr %result, ptr %input, i32 %shift) {
;
; GISEL-LABEL: test_shl_i1024:
; GISEL: ; %bb.0: ; %entry
-; GISEL-NEXT: sub sp, sp, #416
-; GISEL-NEXT: stp x28, x27, [sp, #320] ; 16-byte Folded Spill
-; GISEL-NEXT: stp x26, x25, [sp, #336] ; 16-byte Folded Spill
-; GISEL-NEXT: stp x24, x23, [sp, #352] ; 16-byte Folded Spill
-; GISEL-NEXT: stp x22, x21, [sp, #368] ; 16-byte Folded Spill
-; GISEL-NEXT: stp x20, x19, [sp, #384] ; 16-byte Folded Spill
-; GISEL-NEXT: stp x29, x30, [sp, #400] ; 16-byte Folded Spill
-; GISEL-NEXT: .cfi_def_cfa_offset 416
+; GISEL-NEXT: sub sp, sp, #432
+; GISEL-NEXT: stp x28, x27, [sp, #336] ; 16-byte Folded Spill
+; GISEL-NEXT: stp x26, x25, [sp, #352] ; 16-byte Folded Spill
+; GISEL-NEXT: stp x24, x23, [sp, #368] ; 16-byte Folded Spill
+; GISEL-NEXT: stp x22, x21, [sp, #384] ; 16-byte Folded Spill
+; GISEL-NEXT: stp x20, x19, [sp, #400] ; 16-byte Folded Spill
+; GISEL-NEXT: stp x29, x30, [sp, #416] ; 16-byte Folded Spill
+; GISEL-NEXT: .cfi_def_cfa_offset 432
; GISEL-NEXT: .cfi_offset w30, -8
; GISEL-NEXT: .cfi_offset w29, -16
; GISEL-NEXT: .cfi_offset w19, -24
@@ -1242,38 +1242,44 @@ define void @test_shl_i1024(ptr %result, ptr %input, i32 %shift) {
; GISEL-NEXT: ldp x10, x11, [x1]
; GISEL-NEXT: mov w8, w2
; GISEL-NEXT: lsr x9, x8, #6
-; GISEL-NEXT: and x16, x8, #0x3f
+; GISEL-NEXT: and x12, x8, #0x3f
+; GISEL-NEXT: str x0, [sp, #144] ; 8-byte Folded Spill
+; GISEL-NEXT: and x14, x8, #0x3f
; GISEL-NEXT: mov w13, #64 ; =0x40
-; GISEL-NEXT: sub x21, x13, x16
-; GISEL-NEXT: str x0, [sp, #112] ; 8-byte Folded Spill
-; GISEL-NEXT: mov x24, x16
-; GISEL-NEXT: lsl x25, x10, x16
+; GISEL-NEXT: and x16, x8, #0x3f
+; GISEL-NEXT: lsl x0, x10, x12
; GISEL-NEXT: cmp x9, #0
-; GISEL-NEXT: lsr x26, x10, x21
-; GISEL-NEXT: lsl x2, x11, x16
-; GISEL-NEXT: lsr x23, x11, x21
-; GISEL-NEXT: mov x22, x21
-; GISEL-NEXT: csel x12, x25, xzr, eq
+; GISEL-NEXT: sub x2, x13, x14
+; GISEL-NEXT: lsr x3, x10, x2
+; GISEL-NEXT: lsl x6, x11, x14
+; GISEL-NEXT: and x14, x8, #0x3f
+; GISEL-NEXT: csel x12, x0, xzr, eq
; GISEL-NEXT: cmp x9, #1
-; GISEL-NEXT: str x1, [sp, #312] ; 8-byte Folded Spill
+; GISEL-NEXT: lsr x20, x11, x2
; GISEL-NEXT: csel x12, xzr, x12, eq
; GISEL-NEXT: cmp x9, #2
-; GISEL-NEXT: str x23, [sp, #208] ; 8-byte Folded Spill
+; GISEL-NEXT: mov x24, x0
; GISEL-NEXT: csel x12, xzr, x12, eq
; GISEL-NEXT: cmp x9, #3
-; GISEL-NEXT: stp x24, x22, [sp, #40] ; 16-byte Folded Spill
+; GISEL-NEXT: mov x7, x3
; GISEL-NEXT: csel x12, xzr, x12, eq
; GISEL-NEXT: cmp x9, #4
+; GISEL-NEXT: mov x28, x1
; GISEL-NEXT: csel x12, xzr, x12, eq
; GISEL-NEXT: cmp x9, #5
+; GISEL-NEXT: and x21, x8, #0x3f
; GISEL-NEXT: csel x12, xzr, x12, eq
; GISEL-NEXT: cmp x9, #6
+; GISEL-NEXT: str x6, [sp, #24] ; 8-byte Folded Spill
; GISEL-NEXT: csel x12, xzr, x12, eq
; GISEL-NEXT: cmp x9, #7
+; GISEL-NEXT: str x28, [sp, #304] ; 8-byte Folded Spill
; GISEL-NEXT: csel x12, xzr, x12, eq
; GISEL-NEXT: cmp x9, #8
+; GISEL-NEXT: str x7, [sp, #272] ; 8-byte Folded Spill
; GISEL-NEXT: csel x12, xzr, x12, eq
; GISEL-NEXT: cmp x9, #9
+; GISEL-NEXT: str x20, [sp, #112] ; 8-byte Folded Spill
; GISEL-NEXT: csel x12, xzr, x12, eq
; GISEL-NEXT: cmp x9, #10
; GISEL-NEXT: csel x12, xzr, x12, eq
@@ -1290,13 +1296,13 @@ define void @test_shl_i1024(ptr %result, ptr %input, i32 %shift) {
; GISEL-NEXT: cmp x8, #0
; GISEL-NEXT: csel x10, x10, x12, eq
; GISEL-NEXT: tst x8, #0x3f
-; GISEL-NEXT: str x10, [sp, #192] ; 8-byte Folded Spill
-; GISEL-NEXT: csel x10, xzr, x26, eq
+; GISEL-NEXT: str x10, [sp, #232] ; 8-byte Folded Spill
+; GISEL-NEXT: csel x10, xzr, x3, eq
; GISEL-NEXT: cmp x9, #0
-; GISEL-NEXT: orr x10, x2, x10
+; GISEL-NEXT: orr x10, x6, x10
; GISEL-NEXT: csel x10, x10, xzr, eq
; GISEL-NEXT: cmp x9, #1
-; GISEL-NEXT: csel x10, x25, x10, eq
+; GISEL-NEXT: csel x10, x0, x10, eq
; GISEL-NEXT: cmp x9, #2
; GISEL-NEXT: csel x10, xzr, x10, eq
; GISEL-NEXT: cmp x9, #3
@@ -1327,25 +1333,24 @@ define void @test_shl_i1024(ptr %result, ptr %input, i32 %shift) {
; GISEL-NEXT: cmp x9, #15
; GISEL-NEXT: csel x13, xzr, x13, eq
; GISEL-NEXT: cmp x8, #0
-; GISEL-NEXT: lsl x20, x12, x16
+; GISEL-NEXT: lsl x26, x12, x14
; GISEL-NEXT: csel x11, x11, x13, eq
; GISEL-NEXT: tst x8, #0x3f
-; GISEL-NEXT: str x11, [sp, #184] ; 8-byte Folded Spill
-; GISEL-NEXT: csel x11, xzr, x23, eq
+; GISEL-NEXT: str x11, [sp, #224] ; 8-byte Folded Spill
+; GISEL-NEXT: csel x11, xzr, x20, eq
; GISEL-NEXT: cmp x9, #0
-; GISEL-NEXT: orr x11, x20, x11
-; GISEL-NEXT: lsr x15, x12, x21
-; GISEL-NEXT: lsl x14, x10, x16
+; GISEL-NEXT: orr x11, x26, x11
+; GISEL-NEXT: lsr x15, x12, x2
+; GISEL-NEXT: lsl x30, x10, x16
; GISEL-NEXT: csel x11, x11, xzr, eq
; GISEL-NEXT: tst x8, #0x3f
-; GISEL-NEXT: lsr x17, x10, x21
-; GISEL-NEXT: csel x13, xzr, x26, eq
+; GISEL-NEXT: lsr x17, x10, x2
+; GISEL-NEXT: csel x13, xzr, x3, eq
; GISEL-NEXT: cmp x9, #1
-; GISEL-NEXT: str x20, [sp, #8] ; 8-byte Folded Spill
-; GISEL-NEXT: orr x13, x2, x13
+; GISEL-NEXT: orr x13, x6, x13
; GISEL-NEXT: csel x11, x13, x11, eq
; GISEL-NEXT: cmp x9, #2
-; GISEL-NEXT: csel x11, x25, x11, eq
+; GISEL-NEXT: csel x11, x0, x11, eq
; GISEL-NEXT: cmp x9, #3
; GISEL-NEXT: csel x11, xzr, x11, eq
; GISEL-NEXT: cmp x9, #4
@@ -1375,23 +1380,23 @@ define void @test_shl_i1024(ptr %result, ptr %input, i32 %shift) {
; GISEL-NEXT: cmp x8, #0
; GISEL-NEXT: csel x11, x12, x11, eq
; GISEL-NEXT: tst x8, #0x3f
-; GISEL-NEXT: str x11, [sp, #176] ; 8-byte Folded Spill
+; GISEL-NEXT: str x11, [sp, #216] ; 8-byte Folded Spill
; GISEL-NEXT: csel x11, xzr, x15, eq
; GISEL-NEXT: cmp x9, #0
-; GISEL-NEXT: orr x11, x14, x11
+; GISEL-NEXT: orr x11, x30, x11
; GISEL-NEXT: csel x11, x11, xzr, eq
; GISEL-NEXT: tst x8, #0x3f
-; GISEL-NEXT: csel x12, xzr, x23, eq
+; GISEL-NEXT: csel x12, xzr, x20, eq
; GISEL-NEXT: cmp x9, #1
-; GISEL-NEXT: orr x12, x20, x12
+; GISEL-NEXT: orr x12, x26, x12
; GISEL-NEXT: csel x11, x12, x11, eq
; GISEL-NEXT: tst x8, #0x3f
-; GISEL-NEXT: csel x12, xzr, x26, eq
+; GISEL-NEXT: csel x12, xzr, x3, eq
; GISEL-NEXT: cmp x9, #2
-; GISEL-NEXT: orr x12, x2, x12
+; GISEL-NEXT: orr x12, x6, x12
; GISEL-NEXT: csel x11, x12, x11, eq
; GISEL-NEXT: cmp x9, #3
-; GISEL-NEXT: csel x11, x25, x11, eq
+; GISEL-NEXT: csel x11, x0, x11, eq
; GISEL-NEXT: cmp x9, #4
; GISEL-NEXT: csel x11, xzr, x11, eq
; GISEL-NEXT: cmp x9, #5
@@ -1421,33 +1426,33 @@ define void @test_shl_i1024(ptr %result, ptr %input, i32 %shift) {
; GISEL-NEXT: lsl x0, x12, x16
; GISEL-NEXT: csel x10, x10, x13, eq
; GISEL-NEXT: tst x8, #0x3f
-; GISEL-NEXT: str x10, [sp, #168] ; 8-byte Folded Spill
+; GISEL-NEXT: str x10, [sp, #208] ; 8-byte Folded Spill
; GISEL-NEXT: csel x10, xzr, x17, eq
; GISEL-NEXT: cmp x9, #0
; GISEL-NEXT: orr x10, x0, x10
-; GISEL-NEXT: lsr x27, x12, x21
+; GISEL-NEXT: lsr x4, x12, x2
; GISEL-NEXT: lsl x19, x11, x16
; GISEL-NEXT: csel x10, x10, xzr, eq
; GISEL-NEXT: tst x8, #0x3f
-; GISEL-NEXT: lsr x3, x11, x21
+; GISEL-NEXT: mov x16, x15
; GISEL-NEXT: csel x13, xzr, x15, eq
; GISEL-NEXT: cmp x9, #1
-; GISEL-NEXT: stp x27, x0, [sp, #240] ; 16-byte Folded Spill
-; GISEL-NEXT: orr x13, x14, x13
-; GISEL-NEXT: mov x7, x3
+; GISEL-NEXT: str x4, [sp, #248] ; 8-byte Folded Spill
+; GISEL-NEXT: orr x13, x30, x13
+; GISEL-NEXT: str x0, [sp, #48] ; 8-byte Folded Spill
; GISEL-NEXT: csel x10, x13, x10, eq
; GISEL-NEXT: tst x8, #0x3f
-; GISEL-NEXT: csel x13, xzr, x23, eq
+; GISEL-NEXT: csel x13, xzr, x20, eq
; GISEL-NEXT: cmp x9, #2
-; GISEL-NEXT: orr x13, x20, x13
+; GISEL-NEXT: orr x13, x26, x13
; GISEL-NEXT: csel x10, x13, x10, eq
; GISEL-NEXT: tst x8, #0x3f
-; GISEL-NEXT: csel x13, xzr, x26, eq
+; GISEL-NEXT: csel x13, xzr, x3, eq
; GISEL-NEXT: cmp x9, #3
-; GISEL-NEXT: orr x13, x2, x13
+; GISEL-NEXT: orr x13, x6, x13
; GISEL-NEXT: csel x10, x13, x10, eq
; GISEL-NEXT: cmp x9, #4
-; GISEL-NEXT: csel x10, x25, x10, eq
+; GISEL-NEXT: csel x10, x24, x10, eq
; GISEL-NEXT: cmp x9, #5
; GISEL-NEXT: csel x10, xzr, x10, eq
; GISEL-NEXT: cmp x9, #6
@@ -1473,8 +1478,8 @@ define void @test_shl_i1024(ptr %result, ptr %input, i32 %shift) {
; GISEL-NEXT: cmp x8, #0
; GISEL-NEXT: csel x10, x12, x10, eq
; GISEL-NEXT: tst x8, #0x3f
-; GISEL-NEXT: str x10, [sp, #160] ; 8-byte Folded Spill
-; GISEL-NEXT: csel x10, xzr, x27, eq
+; GISEL-NEXT: str x10, [sp, #200] ; 8-byte Folded Spill
+; GISEL-NEXT: csel x10, xzr, x4, eq
; GISEL-NEXT: cmp x9, #0
; GISEL-NEXT: orr x10, x19, x10
; GISEL-NEXT: csel x10, x10, xzr, eq
@@ -1486,20 +1491,22 @@ define void @test_shl_i1024(ptr %result, ptr %input, i32 %shift) {
; GISEL-NEXT: tst x8, #0x3f
; GISEL-NEXT: csel x12, xzr, x15, eq
; GISEL-NEXT: cmp x9, #2
-; GISEL-NEXT: orr x12, x14, x12
+; GISEL-NEXT: and x15, x8, #0x3f
+; GISEL-NEXT: orr x12, x30, x12
; GISEL-NEXT: csel x10, x12, x10, eq
; GISEL-NEXT: tst x8, #0x3f
-; GISEL-NEXT: csel x12, xzr, x23, eq
+; GISEL-NEXT: csel x12, xzr, x20, eq
; GISEL-NEXT: cmp x9, #3
-; GISEL-NEXT: orr x12, x20, x12
+; GISEL-NEXT: orr x12, x26, x12
; GISEL-NEXT: csel x10, x12, x10, eq
; GISEL-NEXT: tst x8, #0x3f
-; GISEL-NEXT: csel x12, xzr, x26, eq
+; GISEL-NEXT: csel x12, xzr, x3, eq
; GISEL-NEXT: cmp x9, #4
-; GISEL-NEXT: orr x12, x2, x12
+; GISEL-NEXT: lsr x3, x11, x2
+; GISEL-NEXT: orr x12, x6, x12
; GISEL-NEXT: csel x10, x12, x10, eq
; GISEL-NEXT: cmp x9, #5
-; GISEL-NEXT: csel x10, x25, x10, eq
+; GISEL-NEXT: csel x10, x24, x10, eq
; GISEL-NEXT: cmp x9, #6
; GISEL-NEXT: csel x10, xzr, x10, eq
; GISEL-NEXT: cmp x9, #7
@@ -1522,21 +1529,23 @@ define void @test_shl_i1024(ptr %result, ptr %input, i32 %shift) {
; GISEL-NEXT: cmp x9, #15
; GISEL-NEXT: csel x13, xzr, x13, eq
; GISEL-NEXT: cmp x8, #0
-; GISEL-NEXT: lsl x4, x12, x16
+; GISEL-NEXT: lsl x22, x12, x15
; GISEL-NEXT: csel x11, x11, x13, eq
; GISEL-NEXT: tst x8, #0x3f
-; GISEL-NEXT: str x11, [sp, #152] ; 8-byte Folded Spill
+; GISEL-NEXT: str x11, [sp, #192] ; 8-byte Folded Spill
; GISEL-NEXT: csel x11, xzr, x3, eq
; GISEL-NEXT: cmp x9, #0
-; GISEL-NEXT: orr x11, x4, x11
-; GISEL-NEXT: lsl x30, x10, x16
-; GISEL-NEXT: lsr x28, x10, x21
+; GISEL-NEXT: orr x11, x22, x11
+; GISEL-NEXT: lsl x5, x10, x15
+; GISEL-NEXT: lsr x27, x10, x2
; GISEL-NEXT: csel x11, x11, xzr, eq
; GISEL-NEXT: tst x8, #0x3f
-; GISEL-NEXT: csel x13, xzr, x27, eq
+; GISEL-NEXT: csel x13, xzr, x4, eq
; GISEL-NEXT: cmp x9, #1
-; GISEL-NEXT: str x30, [sp, #200] ; 8-byte Folded Spill
+; GISEL-NEXT: mov x25, x27
; GISEL-NEXT: orr x13, x19, x13
+; GISEL-NEXT: mov x14, x5
+; GISEL-NEXT: str x27, [sp, #328] ; 8-byte Folded Spill
; GISEL-NEXT: csel x11, x13, x11, eq
; GISEL-NEXT: tst x8, #0x3f
; GISEL-NEXT: csel x13, xzr, x17, eq
@@ -1544,30 +1553,29 @@ define void @test_shl_i1024(ptr %result, ptr %input, i32 %shift) {
; GISEL-NEXT: orr x13, x0, x13
; GISEL-NEXT: csel x11, x13, x11, eq
; GISEL-NEXT: tst x8, #0x3f
-; GISEL-NEXT: csel x13, xzr, x15, eq
+; GISEL-NEXT: csel x13, xzr, x16, eq
; GISEL-NEXT: cmp x9, #3
-; GISEL-NEXT: orr x13, x14, x13
+; GISEL-NEXT: orr x13, x30, x13
; GISEL-NEXT: csel x11, x13, x11, eq
; GISEL-NEXT: tst x8, #0x3f
-; GISEL-NEXT: csel x13, xzr, x23, eq
+; GISEL-NEXT: csel x13, xzr, x20, eq
; GISEL-NEXT: cmp x9, #4
-; GISEL-NEXT: orr x13, x20, x13
+; GISEL-NEXT: orr x13, x26, x13
; GISEL-NEXT: csel x11, x13, x11, eq
; GISEL-NEXT: tst x8, #0x3f
-; GISEL-NEXT: csel x13, xzr, x26, eq
+; GISEL-NEXT: csel x13, xzr, x7, eq
; GISEL-NEXT: cmp x9, #5
-; GISEL-NEXT: orr x13, x2, x13
+; GISEL-NEXT: orr x13, x6, x13
; GISEL-NEXT: csel x11, x13, x11, eq
; GISEL-NEXT: cmp x9, #6
-; GISEL-NEXT: lsr x13, x12, x21
-; GISEL-NEXT: csel x11, x25, x11, eq
+; GISEL-NEXT: lsr x13, x12, x2
+; GISEL-NEXT: csel x11, x24, x11, eq
; GISEL-NEXT: cmp x9, #7
; GISEL-NEXT: csel x11, xzr, x11, eq
; GISEL-NEXT: cmp x9, #8
-; GISEL-NEXT: mov x6, x13
+; GISEL-NEXT: mov x15, x13
; GISEL-NEXT: csel x11, xzr, x11, eq
; GISEL-NEXT: cmp x9, #9
-; GISEL-NEXT: str x6, [sp, #256] ; 8-byte Folded Spill
; GISEL-NEXT: csel x11, xzr, x11, eq
; GISEL-NEXT: cmp x9, #10
; GISEL-NEXT: csel x11, xzr, x11, eq
@@ -1584,18 +1592,18 @@ define void @test_shl_i1024(ptr %result, ptr %input, i32 %shift) {
; GISEL-NEXT: cmp x8, #0
; GISEL-NEXT: csel x11, x12, x11, eq
; GISEL-NEXT: tst x8, #0x3f
-; GISEL-NEXT: str x11, [sp, #144] ; 8-byte Folded Spill
+; GISEL-NEXT: str x11, [sp, #184] ; 8-byte Folded Spill
; GISEL-NEXT: csel x11, xzr, x13, eq
; GISEL-NEXT: cmp x9, #0
-; GISEL-NEXT: orr x11, x30, x11
+; GISEL-NEXT: orr x11, x5, x11
; GISEL-NEXT: csel x11, x11, xzr, eq
; GISEL-NEXT: tst x8, #0x3f
; GISEL-NEXT: csel x12, xzr, x3, eq
; GISEL-NEXT: cmp x9, #1
-; GISEL-NEXT: orr x12, x4, x12
+; GISEL-NEXT: orr x12, x22, x12
; GISEL-NEXT: csel x11, x12, x11, eq
; GISEL-NEXT: tst x8, #0x3f
-; GISEL-NEXT: csel x12, xzr, x27, eq
+; GISEL-NEXT: csel x12, xzr, x4, eq
; GISEL-NEXT: cmp x9, #2
; GISEL-NEXT: orr x12, x19, x12
; GISEL-NEXT: csel x11, x12, x11, eq
@@ -1605,22 +1613,22 @@ define void @test_shl_i1024(ptr %result, ptr %input, i32 %shift) {
; GISEL-NEXT: orr x12, x0, x12
; GISEL-NEXT: csel x11, x12, x11, eq
; GISEL-NEXT: tst x8, #0x3f
-; GISEL-NEXT: csel x12, xzr, x15, eq
+; GISEL-NEXT: csel x12, xzr, x16, eq
; GISEL-NEXT: cmp x9, #4
-; GISEL-NEXT: orr x12, x14, x12
+; GISEL-NEXT: orr x12, x30, x12
; GISEL-NEXT: csel x11, x12, x11, eq
; GISEL-NEXT: tst x8, #0x3f
-; GISEL-NEXT: csel x12, xzr, x23, eq
+; GISEL-NEXT: csel x12, xzr, x20, eq
; GISEL-NEXT: cmp x9, #5
-; GISEL-NEXT: orr x12, x20, x12
+; GISEL-NEXT: orr x12, x26, x12
; GISEL-NEXT: csel x11, x12, x11, eq
; GISEL-NEXT: tst x8, #0x3f
-; GISEL-NEXT: csel x12, xzr, x26, eq
+; GISEL-NEXT: csel x12, xzr, x7, eq
; GISEL-NEXT: cmp x9, #6
-; GISEL-NEXT: orr x12, x2, x12
+; GISEL-NEXT: orr x12, x6, x12
; GISEL-NEXT: csel x11, x12, x11, eq
; GISEL-NEXT: cmp x9, #7
-; GISEL-NEXT: csel x11, x25, x11, eq
+; GISEL-NEXT: csel x11, x24, x11, eq
; GISEL-NEXT: cmp x9, #8
; GISEL-NEXT: csel x11, xzr, x11, eq
; GISEL-NEXT: cmp x9, #9
@@ -1635,39 +1643,34 @@ define void @test_shl_i1024(ptr %result, ptr %input, i32 %shift) {
; GISEL-NEXT: csel x11, xzr, x11, eq
; GISEL-NEXT: cmp x9, #14
; GISEL-NEXT: csel x12, xzr, x11, eq
-; GISEL-NEXT: ldp x11, x5, [x1, #64]
+; GISEL-NEXT: ldp x11, x1, [x1, #64]
; GISEL-NEXT: cmp x9, #15
; GISEL-NEXT: csel x12, xzr, x12, eq
; GISEL-NEXT: cmp x8, #0
; GISEL-NEXT: csel x12, x10, x12, eq
; GISEL-NEXT: tst x8, #0x3f
-; GISEL-NEXT: lsl x21, x11, x16
-; GISEL-NEXT: str x12, [sp, #136] ; 8-byte Folded Spill
-; GISEL-NEXT: csel x12, xzr, x28, eq
+; GISEL-NEXT: lsl x23, x11, x21
+; GISEL-NEXT: str x12, [sp, #176] ; 8-byte Folded Spill
+; GISEL-NEXT: csel x12, xzr, x27, eq
; GISEL-NEXT: cmp x9, #0
-; GISEL-NEXT: orr x12, x21, x12
-; GISEL-NEXT: lsr x10, x11, x22
-; GISEL-NEXT: mov x16, x19
+; GISEL-NEXT: orr x12, x23, x12
+; GISEL-NEXT: lsr x21, x11, x2
+; GISEL-NEXT: str x23, [sp, #288] ; 8-byte Folded Spill
; GISEL-NEXT: csel x12, x12, xzr, eq
; GISEL-NEXT: tst x8, #0x3f
-; GISEL-NEXT: mov x1, x16
; GISEL-NEXT: csel x13, xzr, x13, eq
; GISEL-NEXT: cmp x9, #1
-; GISEL-NEXT: str x16, [sp, #304] ; 8-byte Folded Spill
-; GISEL-NEXT: orr x13, x30, x13
+; GISEL-NEXT: orr x13, x5, x13
; GISEL-NEXT: csel x12, x13, x12, eq
; GISEL-NEXT: tst x8, #0x3f
; GISEL-NEXT: csel x13, xzr, x3, eq
; GISEL-NEXT: cmp x9, #2
-; GISEL-NEXT: lsl x3, x5, x24
-; GISEL-NEXT: orr x13, x4, x13
+; GISEL-NEXT: orr x13, x22, x13
; GISEL-NEXT: csel x12, x13, x12, eq
; GISEL-NEXT: tst x8, #0x3f
-; GISEL-NEXT: stp x21, x3, [sp, #216] ; 16-byte Folded Spill
-; GISEL-NEXT: csel x13, xzr, x27, eq
+; GISEL-NEXT: csel x13, xzr, x4, eq
; GISEL-NEXT: cmp x9, #3
; GISEL-NEXT: orr x13, x19, x13
-; GISEL-NEXT: mov x19, x28
; GISEL-NEXT: csel x12, x13, x12, eq
; GISEL-NEXT: tst x8, #0x3f
; GISEL-NEXT: csel x13, xzr, x17, eq
@@ -1675,27 +1678,30 @@ define void @test_shl_i1024(ptr %result, ptr %input, i32 %shift) {
; GISEL-NEXT: orr x13, x0, x13
; GISEL-NEXT: csel x12, x13, x12, eq
; GISEL-NEXT: tst x8, #0x3f
-; GISEL-NEXT: csel x13, xzr, x15, eq
+; GISEL-NEXT: csel x13, xzr, x16, eq
; GISEL-NEXT: cmp x9, #5
-; GISEL-NEXT: orr x13, x14, x13
+; GISEL-NEXT: orr x13, x30, x13
; GISEL-NEXT: csel x12, x13, x12, eq
; GISEL-NEXT: tst x8, #0x3f
-; GISEL-NEXT: csel x13, xzr, x23, eq
+; GISEL-NEXT: csel x13, xzr, x20, eq
; GISEL-NEXT: cmp x9, #6
-; GISEL-NEXT: orr x13, x20, x13
+; GISEL-NEXT: orr x13, x26, x13
; GISEL-NEXT: csel x12, x13, x12, eq
; GISEL-NEXT: tst x8, #0x3f
-; GISEL-NEXT: csel x13, xzr, x26, eq
+; GISEL-NEXT: csel x13, xzr, x7, eq
; GISEL-NEXT: cmp x9, #7
-; GISEL-NEXT: orr x13, x2, x13
+; GISEL-NEXT: orr x13, x6, x13
; GISEL-NEXT: csel x12, x13, x12, eq
; GISEL-NEXT: cmp x9, #8
-; GISEL-NEXT: csel x12, x25, x12, eq
+; GISEL-NEXT: and x13, x8, #0x3f
+; GISEL-NEXT: csel x12, x24, x12, eq
; GISEL-NEXT: cmp x9, #9
+; GISEL-NEXT: lsl x10, x1, x13
; GISEL-NEXT: csel x12, xzr, x12, eq
; GISEL-NEXT: cmp x9, #10
; GISEL-NEXT: csel x12, xzr, x12, eq
; GISEL-NEXT: cmp x9, #11
+; GISEL-NEXT: stp x10, x15, [sp, #312] ; 16-byte Folded Spill
; GISEL-NEXT: csel x12, xzr, x12, eq
; GISEL-NEXT: cmp x9, #12
; GISEL-NEXT: csel x12, xzr, x12, eq
@@ -1708,69 +1714,69 @@ define void @test_shl_i1024(ptr %result, ptr %input, i32 %shift) {
; GISEL-NEXT: cmp x8, #0
; GISEL-NEXT: csel x11, x11, x12, eq
; GISEL-NEXT: tst x8, #0x3f
-; GISEL-NEXT: str x11, [sp, #128] ; 8-byte Folded Spill
-; GISEL-NEXT: csel x11, xzr, x10, eq
+; GISEL-NEXT: str x11, [sp, #168] ; 8-byte Folded Spill
+; GISEL-NEXT: csel x11, xzr, x21, eq
; GISEL-NEXT: cmp x9, #0
-; GISEL-NEXT: orr x11, x3, x11
+; GISEL-NEXT: orr x11, x10, x11
+; GISEL-NEXT: mov x10, x23
; GISEL-NEXT: csel x11, x11, xzr, eq
; GISEL-NEXT: tst x8, #0x3f
-; GISEL-NEXT: csel x12, xzr, x28, eq
+; GISEL-NEXT: csel x12, xzr, x27, eq
; GISEL-NEXT: cmp x9, #1
-; GISEL-NEXT: mov x28, x4
-; GISEL-NEXT: orr x12, x21, x12
-; GISEL-NEXT: str x28, [sp, #32] ; 8-byte Folded Spill
+; GISEL-NEXT: mov x27, x24
+; GISEL-NEXT: orr x12, x23, x12
+; GISEL...
[truncated]
; GISEL-NEXT: cmp x9, #0
; GISEL-NEXT: orr x10, x0, x10
-; GISEL-NEXT: lsr x27, x12, x21
+; GISEL-NEXT: lsr x4, x12, x2
; GISEL-NEXT: lsl x19, x11, x16
; GISEL-NEXT: csel x10, x10, xzr, eq
; GISEL-NEXT: tst x8, #0x3f
-; GISEL-NEXT: lsr x3, x11, x21
+; GISEL-NEXT: mov x16, x15
; GISEL-NEXT: csel x13, xzr, x15, eq
; GISEL-NEXT: cmp x9, #1
-; GISEL-NEXT: stp x27, x0, [sp, #240] ; 16-byte Folded Spill
-; GISEL-NEXT: orr x13, x14, x13
-; GISEL-NEXT: mov x7, x3
+; GISEL-NEXT: str x4, [sp, #248] ; 8-byte Folded Spill
+; GISEL-NEXT: orr x13, x30, x13
+; GISEL-NEXT: str x0, [sp, #48] ; 8-byte Folded Spill
; GISEL-NEXT: csel x10, x13, x10, eq
; GISEL-NEXT: tst x8, #0x3f
-; GISEL-NEXT: csel x13, xzr, x23, eq
+; GISEL-NEXT: csel x13, xzr, x20, eq
; GISEL-NEXT: cmp x9, #2
-; GISEL-NEXT: orr x13, x20, x13
+; GISEL-NEXT: orr x13, x26, x13
; GISEL-NEXT: csel x10, x13, x10, eq
; GISEL-NEXT: tst x8, #0x3f
-; GISEL-NEXT: csel x13, xzr, x26, eq
+; GISEL-NEXT: csel x13, xzr, x3, eq
; GISEL-NEXT: cmp x9, #3
-; GISEL-NEXT: orr x13, x2, x13
+; GISEL-NEXT: orr x13, x6, x13
; GISEL-NEXT: csel x10, x13, x10, eq
; GISEL-NEXT: cmp x9, #4
-; GISEL-NEXT: csel x10, x25, x10, eq
+; GISEL-NEXT: csel x10, x24, x10, eq
; GISEL-NEXT: cmp x9, #5
; GISEL-NEXT: csel x10, xzr, x10, eq
; GISEL-NEXT: cmp x9, #6
@@ -1473,8 +1478,8 @@ define void @test_shl_i1024(ptr %result, ptr %input, i32 %shift) {
; GISEL-NEXT: cmp x8, #0
; GISEL-NEXT: csel x10, x12, x10, eq
; GISEL-NEXT: tst x8, #0x3f
-; GISEL-NEXT: str x10, [sp, #160] ; 8-byte Folded Spill
-; GISEL-NEXT: csel x10, xzr, x27, eq
+; GISEL-NEXT: str x10, [sp, #200] ; 8-byte Folded Spill
+; GISEL-NEXT: csel x10, xzr, x4, eq
; GISEL-NEXT: cmp x9, #0
; GISEL-NEXT: orr x10, x19, x10
; GISEL-NEXT: csel x10, x10, xzr, eq
@@ -1486,20 +1491,22 @@ define void @test_shl_i1024(ptr %result, ptr %input, i32 %shift) {
; GISEL-NEXT: tst x8, #0x3f
; GISEL-NEXT: csel x12, xzr, x15, eq
; GISEL-NEXT: cmp x9, #2
-; GISEL-NEXT: orr x12, x14, x12
+; GISEL-NEXT: and x15, x8, #0x3f
+; GISEL-NEXT: orr x12, x30, x12
; GISEL-NEXT: csel x10, x12, x10, eq
; GISEL-NEXT: tst x8, #0x3f
-; GISEL-NEXT: csel x12, xzr, x23, eq
+; GISEL-NEXT: csel x12, xzr, x20, eq
; GISEL-NEXT: cmp x9, #3
-; GISEL-NEXT: orr x12, x20, x12
+; GISEL-NEXT: orr x12, x26, x12
; GISEL-NEXT: csel x10, x12, x10, eq
; GISEL-NEXT: tst x8, #0x3f
-; GISEL-NEXT: csel x12, xzr, x26, eq
+; GISEL-NEXT: csel x12, xzr, x3, eq
; GISEL-NEXT: cmp x9, #4
-; GISEL-NEXT: orr x12, x2, x12
+; GISEL-NEXT: lsr x3, x11, x2
+; GISEL-NEXT: orr x12, x6, x12
; GISEL-NEXT: csel x10, x12, x10, eq
; GISEL-NEXT: cmp x9, #5
-; GISEL-NEXT: csel x10, x25, x10, eq
+; GISEL-NEXT: csel x10, x24, x10, eq
; GISEL-NEXT: cmp x9, #6
; GISEL-NEXT: csel x10, xzr, x10, eq
; GISEL-NEXT: cmp x9, #7
@@ -1522,21 +1529,23 @@ define void @test_shl_i1024(ptr %result, ptr %input, i32 %shift) {
; GISEL-NEXT: cmp x9, #15
; GISEL-NEXT: csel x13, xzr, x13, eq
; GISEL-NEXT: cmp x8, #0
-; GISEL-NEXT: lsl x4, x12, x16
+; GISEL-NEXT: lsl x22, x12, x15
; GISEL-NEXT: csel x11, x11, x13, eq
; GISEL-NEXT: tst x8, #0x3f
-; GISEL-NEXT: str x11, [sp, #152] ; 8-byte Folded Spill
+; GISEL-NEXT: str x11, [sp, #192] ; 8-byte Folded Spill
; GISEL-NEXT: csel x11, xzr, x3, eq
; GISEL-NEXT: cmp x9, #0
-; GISEL-NEXT: orr x11, x4, x11
-; GISEL-NEXT: lsl x30, x10, x16
-; GISEL-NEXT: lsr x28, x10, x21
+; GISEL-NEXT: orr x11, x22, x11
+; GISEL-NEXT: lsl x5, x10, x15
+; GISEL-NEXT: lsr x27, x10, x2
; GISEL-NEXT: csel x11, x11, xzr, eq
; GISEL-NEXT: tst x8, #0x3f
-; GISEL-NEXT: csel x13, xzr, x27, eq
+; GISEL-NEXT: csel x13, xzr, x4, eq
; GISEL-NEXT: cmp x9, #1
-; GISEL-NEXT: str x30, [sp, #200] ; 8-byte Folded Spill
+; GISEL-NEXT: mov x25, x27
; GISEL-NEXT: orr x13, x19, x13
+; GISEL-NEXT: mov x14, x5
+; GISEL-NEXT: str x27, [sp, #328] ; 8-byte Folded Spill
; GISEL-NEXT: csel x11, x13, x11, eq
; GISEL-NEXT: tst x8, #0x3f
; GISEL-NEXT: csel x13, xzr, x17, eq
@@ -1544,30 +1553,29 @@ define void @test_shl_i1024(ptr %result, ptr %input, i32 %shift) {
; GISEL-NEXT: orr x13, x0, x13
; GISEL-NEXT: csel x11, x13, x11, eq
; GISEL-NEXT: tst x8, #0x3f
-; GISEL-NEXT: csel x13, xzr, x15, eq
+; GISEL-NEXT: csel x13, xzr, x16, eq
; GISEL-NEXT: cmp x9, #3
-; GISEL-NEXT: orr x13, x14, x13
+; GISEL-NEXT: orr x13, x30, x13
; GISEL-NEXT: csel x11, x13, x11, eq
; GISEL-NEXT: tst x8, #0x3f
-; GISEL-NEXT: csel x13, xzr, x23, eq
+; GISEL-NEXT: csel x13, xzr, x20, eq
; GISEL-NEXT: cmp x9, #4
-; GISEL-NEXT: orr x13, x20, x13
+; GISEL-NEXT: orr x13, x26, x13
; GISEL-NEXT: csel x11, x13, x11, eq
; GISEL-NEXT: tst x8, #0x3f
-; GISEL-NEXT: csel x13, xzr, x26, eq
+; GISEL-NEXT: csel x13, xzr, x7, eq
; GISEL-NEXT: cmp x9, #5
-; GISEL-NEXT: orr x13, x2, x13
+; GISEL-NEXT: orr x13, x6, x13
; GISEL-NEXT: csel x11, x13, x11, eq
; GISEL-NEXT: cmp x9, #6
-; GISEL-NEXT: lsr x13, x12, x21
-; GISEL-NEXT: csel x11, x25, x11, eq
+; GISEL-NEXT: lsr x13, x12, x2
+; GISEL-NEXT: csel x11, x24, x11, eq
; GISEL-NEXT: cmp x9, #7
; GISEL-NEXT: csel x11, xzr, x11, eq
; GISEL-NEXT: cmp x9, #8
-; GISEL-NEXT: mov x6, x13
+; GISEL-NEXT: mov x15, x13
; GISEL-NEXT: csel x11, xzr, x11, eq
; GISEL-NEXT: cmp x9, #9
-; GISEL-NEXT: str x6, [sp, #256] ; 8-byte Folded Spill
; GISEL-NEXT: csel x11, xzr, x11, eq
; GISEL-NEXT: cmp x9, #10
; GISEL-NEXT: csel x11, xzr, x11, eq
@@ -1584,18 +1592,18 @@ define void @test_shl_i1024(ptr %result, ptr %input, i32 %shift) {
; GISEL-NEXT: cmp x8, #0
; GISEL-NEXT: csel x11, x12, x11, eq
; GISEL-NEXT: tst x8, #0x3f
-; GISEL-NEXT: str x11, [sp, #144] ; 8-byte Folded Spill
+; GISEL-NEXT: str x11, [sp, #184] ; 8-byte Folded Spill
; GISEL-NEXT: csel x11, xzr, x13, eq
; GISEL-NEXT: cmp x9, #0
-; GISEL-NEXT: orr x11, x30, x11
+; GISEL-NEXT: orr x11, x5, x11
; GISEL-NEXT: csel x11, x11, xzr, eq
; GISEL-NEXT: tst x8, #0x3f
; GISEL-NEXT: csel x12, xzr, x3, eq
; GISEL-NEXT: cmp x9, #1
-; GISEL-NEXT: orr x12, x4, x12
+; GISEL-NEXT: orr x12, x22, x12
; GISEL-NEXT: csel x11, x12, x11, eq
; GISEL-NEXT: tst x8, #0x3f
-; GISEL-NEXT: csel x12, xzr, x27, eq
+; GISEL-NEXT: csel x12, xzr, x4, eq
; GISEL-NEXT: cmp x9, #2
; GISEL-NEXT: orr x12, x19, x12
; GISEL-NEXT: csel x11, x12, x11, eq
@@ -1605,22 +1613,22 @@ define void @test_shl_i1024(ptr %result, ptr %input, i32 %shift) {
; GISEL-NEXT: orr x12, x0, x12
; GISEL-NEXT: csel x11, x12, x11, eq
; GISEL-NEXT: tst x8, #0x3f
-; GISEL-NEXT: csel x12, xzr, x15, eq
+; GISEL-NEXT: csel x12, xzr, x16, eq
; GISEL-NEXT: cmp x9, #4
-; GISEL-NEXT: orr x12, x14, x12
+; GISEL-NEXT: orr x12, x30, x12
; GISEL-NEXT: csel x11, x12, x11, eq
; GISEL-NEXT: tst x8, #0x3f
-; GISEL-NEXT: csel x12, xzr, x23, eq
+; GISEL-NEXT: csel x12, xzr, x20, eq
; GISEL-NEXT: cmp x9, #5
-; GISEL-NEXT: orr x12, x20, x12
+; GISEL-NEXT: orr x12, x26, x12
; GISEL-NEXT: csel x11, x12, x11, eq
; GISEL-NEXT: tst x8, #0x3f
-; GISEL-NEXT: csel x12, xzr, x26, eq
+; GISEL-NEXT: csel x12, xzr, x7, eq
; GISEL-NEXT: cmp x9, #6
-; GISEL-NEXT: orr x12, x2, x12
+; GISEL-NEXT: orr x12, x6, x12
; GISEL-NEXT: csel x11, x12, x11, eq
; GISEL-NEXT: cmp x9, #7
-; GISEL-NEXT: csel x11, x25, x11, eq
+; GISEL-NEXT: csel x11, x24, x11, eq
; GISEL-NEXT: cmp x9, #8
; GISEL-NEXT: csel x11, xzr, x11, eq
; GISEL-NEXT: cmp x9, #9
@@ -1635,39 +1643,34 @@ define void @test_shl_i1024(ptr %result, ptr %input, i32 %shift) {
; GISEL-NEXT: csel x11, xzr, x11, eq
; GISEL-NEXT: cmp x9, #14
; GISEL-NEXT: csel x12, xzr, x11, eq
-; GISEL-NEXT: ldp x11, x5, [x1, #64]
+; GISEL-NEXT: ldp x11, x1, [x1, #64]
; GISEL-NEXT: cmp x9, #15
; GISEL-NEXT: csel x12, xzr, x12, eq
; GISEL-NEXT: cmp x8, #0
; GISEL-NEXT: csel x12, x10, x12, eq
; GISEL-NEXT: tst x8, #0x3f
-; GISEL-NEXT: lsl x21, x11, x16
-; GISEL-NEXT: str x12, [sp, #136] ; 8-byte Folded Spill
-; GISEL-NEXT: csel x12, xzr, x28, eq
+; GISEL-NEXT: lsl x23, x11, x21
+; GISEL-NEXT: str x12, [sp, #176] ; 8-byte Folded Spill
+; GISEL-NEXT: csel x12, xzr, x27, eq
; GISEL-NEXT: cmp x9, #0
-; GISEL-NEXT: orr x12, x21, x12
-; GISEL-NEXT: lsr x10, x11, x22
-; GISEL-NEXT: mov x16, x19
+; GISEL-NEXT: orr x12, x23, x12
+; GISEL-NEXT: lsr x21, x11, x2
+; GISEL-NEXT: str x23, [sp, #288] ; 8-byte Folded Spill
; GISEL-NEXT: csel x12, x12, xzr, eq
; GISEL-NEXT: tst x8, #0x3f
-; GISEL-NEXT: mov x1, x16
; GISEL-NEXT: csel x13, xzr, x13, eq
; GISEL-NEXT: cmp x9, #1
-; GISEL-NEXT: str x16, [sp, #304] ; 8-byte Folded Spill
-; GISEL-NEXT: orr x13, x30, x13
+; GISEL-NEXT: orr x13, x5, x13
; GISEL-NEXT: csel x12, x13, x12, eq
; GISEL-NEXT: tst x8, #0x3f
; GISEL-NEXT: csel x13, xzr, x3, eq
; GISEL-NEXT: cmp x9, #2
-; GISEL-NEXT: lsl x3, x5, x24
-; GISEL-NEXT: orr x13, x4, x13
+; GISEL-NEXT: orr x13, x22, x13
; GISEL-NEXT: csel x12, x13, x12, eq
; GISEL-NEXT: tst x8, #0x3f
-; GISEL-NEXT: stp x21, x3, [sp, #216] ; 16-byte Folded Spill
-; GISEL-NEXT: csel x13, xzr, x27, eq
+; GISEL-NEXT: csel x13, xzr, x4, eq
; GISEL-NEXT: cmp x9, #3
; GISEL-NEXT: orr x13, x19, x13
-; GISEL-NEXT: mov x19, x28
; GISEL-NEXT: csel x12, x13, x12, eq
; GISEL-NEXT: tst x8, #0x3f
; GISEL-NEXT: csel x13, xzr, x17, eq
@@ -1675,27 +1678,30 @@ define void @test_shl_i1024(ptr %result, ptr %input, i32 %shift) {
; GISEL-NEXT: orr x13, x0, x13
; GISEL-NEXT: csel x12, x13, x12, eq
; GISEL-NEXT: tst x8, #0x3f
-; GISEL-NEXT: csel x13, xzr, x15, eq
+; GISEL-NEXT: csel x13, xzr, x16, eq
; GISEL-NEXT: cmp x9, #5
-; GISEL-NEXT: orr x13, x14, x13
+; GISEL-NEXT: orr x13, x30, x13
; GISEL-NEXT: csel x12, x13, x12, eq
; GISEL-NEXT: tst x8, #0x3f
-; GISEL-NEXT: csel x13, xzr, x23, eq
+; GISEL-NEXT: csel x13, xzr, x20, eq
; GISEL-NEXT: cmp x9, #6
-; GISEL-NEXT: orr x13, x20, x13
+; GISEL-NEXT: orr x13, x26, x13
; GISEL-NEXT: csel x12, x13, x12, eq
; GISEL-NEXT: tst x8, #0x3f
-; GISEL-NEXT: csel x13, xzr, x26, eq
+; GISEL-NEXT: csel x13, xzr, x7, eq
; GISEL-NEXT: cmp x9, #7
-; GISEL-NEXT: orr x13, x2, x13
+; GISEL-NEXT: orr x13, x6, x13
; GISEL-NEXT: csel x12, x13, x12, eq
; GISEL-NEXT: cmp x9, #8
-; GISEL-NEXT: csel x12, x25, x12, eq
+; GISEL-NEXT: and x13, x8, #0x3f
+; GISEL-NEXT: csel x12, x24, x12, eq
; GISEL-NEXT: cmp x9, #9
+; GISEL-NEXT: lsl x10, x1, x13
; GISEL-NEXT: csel x12, xzr, x12, eq
; GISEL-NEXT: cmp x9, #10
; GISEL-NEXT: csel x12, xzr, x12, eq
; GISEL-NEXT: cmp x9, #11
+; GISEL-NEXT: stp x10, x15, [sp, #312] ; 16-byte Folded Spill
; GISEL-NEXT: csel x12, xzr, x12, eq
; GISEL-NEXT: cmp x9, #12
; GISEL-NEXT: csel x12, xzr, x12, eq
@@ -1708,69 +1714,69 @@ define void @test_shl_i1024(ptr %result, ptr %input, i32 %shift) {
; GISEL-NEXT: cmp x8, #0
; GISEL-NEXT: csel x11, x11, x12, eq
; GISEL-NEXT: tst x8, #0x3f
-; GISEL-NEXT: str x11, [sp, #128] ; 8-byte Folded Spill
-; GISEL-NEXT: csel x11, xzr, x10, eq
+; GISEL-NEXT: str x11, [sp, #168] ; 8-byte Folded Spill
+; GISEL-NEXT: csel x11, xzr, x21, eq
; GISEL-NEXT: cmp x9, #0
-; GISEL-NEXT: orr x11, x3, x11
+; GISEL-NEXT: orr x11, x10, x11
+; GISEL-NEXT: mov x10, x23
; GISEL-NEXT: csel x11, x11, xzr, eq
; GISEL-NEXT: tst x8, #0x3f
-; GISEL-NEXT: csel x12, xzr, x28, eq
+; GISEL-NEXT: csel x12, xzr, x27, eq
; GISEL-NEXT: cmp x9, #1
-; GISEL-NEXT: mov x28, x4
-; GISEL-NEXT: orr x12, x21, x12
-; GISEL-NEXT: str x28, [sp, #32] ; 8-byte Folded Spill
+; GISEL-NEXT: mov x27, x24
+; GISEL-NEXT: orr x12, x23, x12
+; GISEL...
[truncated]
Because this now gets rematerialized, we need another way of showing that the register pressure is too high, so I copied what was originally done in https://reviews.llvm.org/D106408 and converted it to an MIR test.
LGTM
A couple of points for later reference:
- The cause of the compile-time concern reported against https://reviews.llvm.org/D106408 was never identified (i.e. no reproducer was shared), so we may see a regression on some workloads after this lands. Please do not revert unless a reproducer is available! I have a rough idea of the possible cause, but need a reproducer to confirm a fix.
- If for some reason this doesn't stick, we will probably move to enabling this selectively by target. AMDGPU already does this. RISC-V allows a couple of specific cases, and we do want this more broadly.
- I have audited the remaining call sites of TII->isReMaterializable. I think the ones that are left all want the non-trivial behavior; hopefully we didn't miss anything.
- See #161972 for a possible opt-quality improvement. (I wonder if the scheme in eliminateDeadDefs interacts with the compile time point above.)
LLVM Buildbot has detected a new failure on a builder. Full details are available at: https://lab.llvm.org/buildbot/#/builders/169/builds/15646
In the register allocator, we define non-trivial rematerialization as the rematerialization of an instruction with virtual register uses.
We have been able to perform non-trivial rematerialization for a while, but it has been prevented by default unless specifically overridden by the target in TargetInstrInfo::isReMaterializableImpl. The comment in the default implementation justifies the restriction by saying that we might increase the live range of a virtual register, but we don't actually do this: LiveRangeEdit::allUsesAvailableAt makes sure that we only rematerialize instructions whose virtual registers are already live at the use sites.
https://reviews.llvm.org/D106408 had originally tried to remove this restriction, but it was reverted after some performance regressions were reported. We think it is likely that the regressions were caused by the fact that the old isTriviallyReMaterializable API sometimes returned true for non-trivial rematerializations.
However, #160377 recently split the API into separate trivial and non-trivial versions and updated the call sites accordingly, and #160709 and #159180 fixed heuristics that weren't accounting for the difference between the two.
With these fixes in place, this patch proposes to again allow non-trivial rematerialization by default, which significantly reduces the number of spills and reloads across various targets.
-target riscv64-linux-gnu -march=rva23u64 -O3
-target arm64-apple-darwin -O3
-target x86_64-linux-gnu -O3
-target riscv64-linux-gnu -march=rva23u64 -O3
-target x86_64-linux-gnu -O3
I wasn't able to build SPEC CPU 2017 on arm64-apple-darwin due to incompatibilities with the macOS SDK headers.
This also allows us to rematerialize loads and stores on RISC-V in a future patch.
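To make the trivial vs. non-trivial distinction from the description concrete, here is a minimal toy sketch. It is not LLVM code: SlotIndex, VReg, LiveRange, Instr, isTrivial and canRematAt are hypothetical stand-ins invented for illustration, and the check is only loosely inspired by the idea behind LiveRangeEdit::allUsesAvailableAt (the real implementation works on live intervals and value numbers).

```cpp
#include <cassert>
#include <map>
#include <vector>

// Toy model only: these types are hypothetical stand-ins, not LLVM classes.
using SlotIndex = unsigned; // position of an instruction in the function
using VReg = unsigned;      // virtual register number

// A virtual register's value is readable by instructions located strictly
// after its definition and no later than its last existing use.
struct LiveRange {
  SlotIndex Def;
  SlotIndex LastUse;
  bool availableAt(SlotIndex Idx) const { return Def < Idx && Idx <= LastUse; }
};

struct Instr {
  SlotIndex Idx;              // where the original instruction sits
  std::vector<VReg> VRegUses; // virtual registers it reads
};

// "Trivial" in the sense used above: no virtual register uses at all.
bool isTrivial(const Instr &MI) { return MI.VRegUses.empty(); }

// Re-creating MI at UseIdx is acceptable only if every virtual register it
// reads is already available there, so no live range has to be extended.
bool canRematAt(const Instr &MI, SlotIndex UseIdx,
                const std::map<VReg, LiveRange> &Ranges) {
  for (VReg R : MI.VRegUses) {
    auto It = Ranges.find(R);
    if (It == Ranges.end())
      return false; // unknown register: be conservative
    // The original instruction must itself have been able to read R.
    assert(It->second.availableAt(MI.Idx) && "use not live at original instr");
    if (!It->second.availableAt(UseIdx))
      return false; // would lengthen R's live range
  }
  return true;
}
```

In this model, with a register defined at index 0 and last used at index 20, an instruction at index 5 that reads it could be re-created at index 15 but not at index 25, while an instruction with no virtual register uses could be re-created anywhere.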