[RegisterCoalescer] Improve register allocation for return values by limiting rematerialization #163047
@llvm/pr-subscribers-clang @llvm/pr-subscribers-backend-aarch64
Author: guan jian (rez5427)
Changes
Hi, I recently found this case:
Clang produces this:
If a2 and a0 were swapped, one instruction could be removed. I added this case to the RISC-V tests.
This patch detects when all of the following conditions are met:
In such cases, rematerialization is skipped in RegisterCoalescer.
Patch is 128.37 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/163047.diff
87 Files Affected:
diff --git a/llvm/lib/CodeGen/RegisterCoalescer.cpp b/llvm/lib/CodeGen/RegisterCoalescer.cpp
index ebfea8e5581bf..c29491488f06e 100644
--- a/llvm/lib/CodeGen/RegisterCoalescer.cpp
+++ b/llvm/lib/CodeGen/RegisterCoalescer.cpp
@@ -1326,6 +1326,41 @@ bool RegisterCoalescer::reMaterializeDef(const CoalescerPair &CP,
if (!TII->isAsCheapAsAMove(*DefMI))
return false;
+ // Skip rematerialization for physical registers used as return values within
+ // the same basic block to enable better coalescing.
+ if (DstReg.isPhysical()) {
+ MachineBasicBlock *MBB = CopyMI->getParent();
+ if (DefMI->getParent() == MBB) {
+ // Check if there's already an identical instruction before CopyMI
+ // If so, allow rematerialization to avoid redundant instructions
+ bool FoundCopy = false;
+ for (MachineInstr &MI : *MBB) {
+ if (&MI == CopyMI) {
+ FoundCopy = true;
+ continue;
+ }
+
+ // Before CopyMI: check for duplicate instructions
+ if (!FoundCopy && &MI != DefMI &&
+ MI.isIdenticalTo(*DefMI, MachineInstr::IgnoreDefs)) {
+ break; // Found duplicate, allow rematerialization
+ } else if (FoundCopy) {
+ // After CopyMI: check if used as return register
+ // If the register is redefined, it's not a return register
+ if (MI.modifiesRegister(DstReg, TRI))
+ break;
+
+ // If there's a return instruction that uses this register, skip remat
+ if (MI.isReturn() && MI.readsRegister(DstReg, TRI)) {
+ LLVM_DEBUG(dbgs() << "\tSkip remat for return register: "
+ << printReg(DstReg, TRI) << '\n');
+ return false;
+ }
+ }
+ }
+ }
+ }
+
if (!TII->isReMaterializable(*DefMI))
return false;
diff --git a/llvm/test/CodeGen/AArch64/aarch64_win64cc_vararg.ll b/llvm/test/CodeGen/AArch64/aarch64_win64cc_vararg.ll
index 7d488c9ca2002..ea268ed83f3de 100644
--- a/llvm/test/CodeGen/AArch64/aarch64_win64cc_vararg.ll
+++ b/llvm/test/CodeGen/AArch64/aarch64_win64cc_vararg.ll
@@ -52,9 +52,8 @@ define win64cc ptr @f9(i64 %a0, i64 %a1, i64 %a2, i64 %a3, i64 %a4, i64 %a5, i64
; CHECK-LABEL: f9:
; CHECK: // %bb.0: // %entry
; CHECK-NEXT: str x18, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT: add x8, sp, #24
; CHECK-NEXT: add x0, sp, #24
-; CHECK-NEXT: str x8, [sp, #8]
+; CHECK-NEXT: str x0, [sp, #8]
; CHECK-NEXT: ldr x18, [sp], #16 // 8-byte Folded Reload
; CHECK-NEXT: ret
;
@@ -78,9 +77,8 @@ define win64cc ptr @f8(i64 %a0, i64 %a1, i64 %a2, i64 %a3, i64 %a4, i64 %a5, i64
; CHECK-LABEL: f8:
; CHECK: // %bb.0: // %entry
; CHECK-NEXT: str x18, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT: add x8, sp, #16
; CHECK-NEXT: add x0, sp, #16
-; CHECK-NEXT: str x8, [sp, #8]
+; CHECK-NEXT: str x0, [sp, #8]
; CHECK-NEXT: ldr x18, [sp], #16 // 8-byte Folded Reload
; CHECK-NEXT: ret
;
@@ -104,10 +102,9 @@ define win64cc ptr @f7(i64 %a0, i64 %a1, i64 %a2, i64 %a3, i64 %a4, i64 %a5, i64
; CHECK-LABEL: f7:
; CHECK: // %bb.0: // %entry
; CHECK-NEXT: str x18, [sp, #-32]! // 8-byte Folded Spill
-; CHECK-NEXT: add x8, sp, #24
; CHECK-NEXT: add x0, sp, #24
; CHECK-NEXT: str x7, [sp, #24]
-; CHECK-NEXT: str x8, [sp, #8]
+; CHECK-NEXT: str x0, [sp, #8]
; CHECK-NEXT: ldr x18, [sp], #32 // 8-byte Folded Reload
; CHECK-NEXT: ret
;
diff --git a/llvm/test/CodeGen/AArch64/arm64-neon-copy.ll b/llvm/test/CodeGen/AArch64/arm64-neon-copy.ll
index e18a5f695ba29..98c3071de3ae8 100644
--- a/llvm/test/CodeGen/AArch64/arm64-neon-copy.ll
+++ b/llvm/test/CodeGen/AArch64/arm64-neon-copy.ll
@@ -2156,6 +2156,7 @@ define <4 x i16> @concat_vector_v4i16_const() {
; CHECK-LABEL: concat_vector_v4i16_const:
; CHECK: // %bb.0:
; CHECK-NEXT: movi v0.2d, #0000000000000000
+; CHECK-NEXT: // kill: def $d0 killed $d0 killed $q0
; CHECK-NEXT: ret
%r = shufflevector <1 x i16> zeroinitializer, <1 x i16> undef, <4 x i32> zeroinitializer
ret <4 x i16> %r
@@ -2183,6 +2184,7 @@ define <8 x i8> @concat_vector_v8i8_const() {
; CHECK-LABEL: concat_vector_v8i8_const:
; CHECK: // %bb.0:
; CHECK-NEXT: movi v0.2d, #0000000000000000
+; CHECK-NEXT: // kill: def $d0 killed $d0 killed $q0
; CHECK-NEXT: ret
%r = shufflevector <1 x i8> zeroinitializer, <1 x i8> undef, <8 x i32> zeroinitializer
ret <8 x i8> %r
diff --git a/llvm/test/CodeGen/AArch64/arm64-vector-ext.ll b/llvm/test/CodeGen/AArch64/arm64-vector-ext.ll
index 197a385b0e7cb..91f0fbcd5c46b 100644
--- a/llvm/test/CodeGen/AArch64/arm64-vector-ext.ll
+++ b/llvm/test/CodeGen/AArch64/arm64-vector-ext.ll
@@ -1,15 +1,16 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 6
; RUN: llc < %s -mtriple=arm64-eabi -aarch64-neon-syntax=apple | FileCheck %s
-;CHECK: @func30
-;CHECK: movi.4h v1, #1
-;CHECK: and.8b v0, v0, v1
-;CHECK: ushll.4s v0, v0, #0
-;CHECK: str q0, [x0]
-;CHECK: ret
-
%T0_30 = type <4 x i1>
%T1_30 = type <4 x i32>
define void @func30(%T0_30 %v0, ptr %p1) {
+; CHECK-LABEL: func30:
+; CHECK: // %bb.0:
+; CHECK-NEXT: movi.4h v1, #1
+; CHECK-NEXT: and.8b v0, v0, v1
+; CHECK-NEXT: ushll.4s v0, v0, #0
+; CHECK-NEXT: str q0, [x0]
+; CHECK-NEXT: ret
%r = zext %T0_30 %v0 to %T1_30
store %T1_30 %r, ptr %p1
ret void
@@ -18,9 +19,11 @@ define void @func30(%T0_30 %v0, ptr %p1) {
; Extend from v1i1 was crashing things (PR20791). Make sure we do something
; sensible instead.
define <1 x i32> @autogen_SD7918() {
-; CHECK-LABEL: autogen_SD7918
-; CHECK: movi.2d v0, #0000000000000000
-; CHECK-NEXT: ret
+; CHECK-LABEL: autogen_SD7918:
+; CHECK: // %bb.0:
+; CHECK-NEXT: movi.2d v0, #0000000000000000
+; CHECK-NEXT: // kill: def $d0 killed $d0 killed $q0
+; CHECK-NEXT: ret
%I29 = insertelement <1 x i1> zeroinitializer, i1 false, i32 0
%ZE = zext <1 x i1> %I29 to <1 x i32>
ret <1 x i32> %ZE
diff --git a/llvm/test/CodeGen/AArch64/arm64-vshuffle.ll b/llvm/test/CodeGen/AArch64/arm64-vshuffle.ll
index b225d9a1acaf5..fd0f2433f2c2b 100644
--- a/llvm/test/CodeGen/AArch64/arm64-vshuffle.ll
+++ b/llvm/test/CodeGen/AArch64/arm64-vshuffle.ll
@@ -1,9 +1,11 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 6
; RUN: llc < %s -mtriple=arm64-apple-ios7.0 -mcpu=cyclone | FileCheck %s
define <8 x i1> @test1() {
; CHECK-LABEL: test1:
; CHECK: ; %bb.0: ; %entry
; CHECK-NEXT: movi.16b v0, #0
+; CHECK-NEXT: ; kill: def $d0 killed $d0 killed $q0
; CHECK-NEXT: ret
entry:
%Shuff = shufflevector <8 x i1> <i1 0, i1 1, i1 2, i1 3, i1 4, i1 5, i1 6,
@@ -58,9 +60,14 @@ bb:
; CHECK: .byte 0 ; 0x0
; CHECK: .byte 0 ; 0x0
define <16 x i1> @test4(ptr %ptr, i32 %v) {
-; CHECK-LABEL: _test4:
-; CHECK: adrp x[[REG3:[0-9]+]], lCPI3_0@PAGE
-; CHECK: ldr q[[REG2:[0-9]+]], [x[[REG3]], lCPI3_0@PAGEOFF]
+; CHECK-LABEL: test4:
+; CHECK: ; %bb.0: ; %bb
+; CHECK-NEXT: Lloh0:
+; CHECK-NEXT: adrp x8, lCPI3_0@PAGE
+; CHECK-NEXT: Lloh1:
+; CHECK-NEXT: ldr q0, [x8, lCPI3_0@PAGEOFF]
+; CHECK-NEXT: ret
+; CHECK-NEXT: .loh AdrpLdr Lloh0, Lloh1
bb:
%Shuff = shufflevector <16 x i1> zeroinitializer,
<16 x i1> <i1 0, i1 1, i1 1, i1 0, i1 0, i1 1, i1 0, i1 0, i1 0, i1 1,
diff --git a/llvm/test/CodeGen/AArch64/bitcast.ll b/llvm/test/CodeGen/AArch64/bitcast.ll
index 20f19fddf790a..d462d2269f6bc 100644
--- a/llvm/test/CodeGen/AArch64/bitcast.ll
+++ b/llvm/test/CodeGen/AArch64/bitcast.ll
@@ -8,6 +8,7 @@ define <4 x i16> @foo1(<2 x i32> %a) {
; CHECK-SD-LABEL: foo1:
; CHECK-SD: // %bb.0:
; CHECK-SD-NEXT: movi v0.2d, #0000000000000000
+; CHECK-SD-NEXT: // kill: def $d0 killed $d0 killed $q0
; CHECK-SD-NEXT: ret
;
; CHECK-GI-LABEL: foo1:
@@ -28,6 +29,7 @@ define <4 x i16> @foo2(<2 x i32> %a) {
; CHECK-SD-LABEL: foo2:
; CHECK-SD: // %bb.0:
; CHECK-SD-NEXT: movi v0.2d, #0000000000000000
+; CHECK-SD-NEXT: // kill: def $d0 killed $d0 killed $q0
; CHECK-SD-NEXT: ret
;
; CHECK-GI-LABEL: foo2:
diff --git a/llvm/test/CodeGen/AArch64/combine-mul.ll b/llvm/test/CodeGen/AArch64/combine-mul.ll
index ff6d1a571a084..5d65b21f902b7 100644
--- a/llvm/test/CodeGen/AArch64/combine-mul.ll
+++ b/llvm/test/CodeGen/AArch64/combine-mul.ll
@@ -18,6 +18,7 @@ define <4 x i1> @PR48683_vec(<4 x i32> %x) {
; CHECK-LABEL: PR48683_vec:
; CHECK: // %bb.0:
; CHECK-NEXT: movi v0.2d, #0000000000000000
+; CHECK-NEXT: // kill: def $d0 killed $d0 killed $q0
; CHECK-NEXT: ret
%a = mul <4 x i32> %x, %x
%b = and <4 x i32> %a, <i32 2, i32 2, i32 2, i32 2>
@@ -29,6 +30,7 @@ define <4 x i1> @PR48683_vec_undef(<4 x i32> %x) {
; CHECK-LABEL: PR48683_vec_undef:
; CHECK: // %bb.0:
; CHECK-NEXT: movi v0.2d, #0000000000000000
+; CHECK-NEXT: // kill: def $d0 killed $d0 killed $q0
; CHECK-NEXT: ret
%a = mul <4 x i32> %x, %x
%b = and <4 x i32> %a, <i32 2, i32 2, i32 2, i32 undef>
diff --git a/llvm/test/CodeGen/AArch64/ext-narrow-index.ll b/llvm/test/CodeGen/AArch64/ext-narrow-index.ll
index f62cfef9baf28..017971df99d6e 100644
--- a/llvm/test/CodeGen/AArch64/ext-narrow-index.ll
+++ b/llvm/test/CodeGen/AArch64/ext-narrow-index.ll
@@ -251,6 +251,7 @@ define <8 x i8> @i8_zero_off22(<16 x i8> %arg1) {
; CHECK-SD-LABEL: i8_zero_off22:
; CHECK-SD: // %bb.0: // %entry
; CHECK-SD-NEXT: movi v0.2d, #0000000000000000
+; CHECK-SD-NEXT: // kill: def $d0 killed $d0 killed $q0
; CHECK-SD-NEXT: ret
;
; CHECK-GISEL-LABEL: i8_zero_off22:
@@ -302,6 +303,7 @@ define <4 x i16> @i16_zero_off8(<8 x i16> %arg1) {
; CHECK-LABEL: i16_zero_off8:
; CHECK: // %bb.0: // %entry
; CHECK-NEXT: movi v0.2d, #0000000000000000
+; CHECK-NEXT: // kill: def $d0 killed $d0 killed $q0
; CHECK-NEXT: ret
entry:
%shuffle = shufflevector <8 x i16> %arg1, <8 x i16> zeroinitializer, <4 x i32> <i32 8, i32 9, i32 10, i32 11>
@@ -346,6 +348,7 @@ define <2 x i32> @i32_zero_off4(<4 x i32> %arg1) {
; CHECK-LABEL: i32_zero_off4:
; CHECK: // %bb.0: // %entry
; CHECK-NEXT: movi v0.2d, #0000000000000000
+; CHECK-NEXT: // kill: def $d0 killed $d0 killed $q0
; CHECK-NEXT: ret
entry:
%shuffle = shufflevector <4 x i32> %arg1, <4 x i32> zeroinitializer, <2 x i32> <i32 4, i32 5>
diff --git a/llvm/test/CodeGen/AArch64/fast-isel-const-float.ll b/llvm/test/CodeGen/AArch64/fast-isel-const-float.ll
index 4de2c934a672e..fbb71ba1c295f 100644
--- a/llvm/test/CodeGen/AArch64/fast-isel-const-float.ll
+++ b/llvm/test/CodeGen/AArch64/fast-isel-const-float.ll
@@ -9,6 +9,7 @@ define float @select_fp_const() {
; GISEL-LABEL: select_fp_const:
; GISEL: // %bb.0: // %entry
; GISEL-NEXT: movi v0.2s, #79, lsl #24
+; GISEL-NEXT: // kill: def $s0 killed $s0 killed $d0
; GISEL-NEXT: ret
;
; FISEL-LABEL: select_fp_const:
diff --git a/llvm/test/CodeGen/AArch64/movi64_sve.ll b/llvm/test/CodeGen/AArch64/movi64_sve.ll
index 1d4e00d0c3d10..3253b35d77470 100644
--- a/llvm/test/CodeGen/AArch64/movi64_sve.ll
+++ b/llvm/test/CodeGen/AArch64/movi64_sve.ll
@@ -12,6 +12,7 @@ define <2 x i64> @movi_1_v2i64() {
; SVE-LABEL: movi_1_v2i64:
; SVE: // %bb.0:
; SVE-NEXT: mov z0.d, #1 // =0x1
+; SVE-NEXT: // kill: def $q0 killed $q0 killed $z0
; SVE-NEXT: ret
ret <2 x i64> splat (i64 1)
}
@@ -26,6 +27,7 @@ define <2 x i64> @movi_127_v2i64() {
; SVE-LABEL: movi_127_v2i64:
; SVE: // %bb.0:
; SVE-NEXT: mov z0.d, #127 // =0x7f
+; SVE-NEXT: // kill: def $q0 killed $q0 killed $z0
; SVE-NEXT: ret
ret <2 x i64> splat (i64 127)
}
@@ -40,6 +42,7 @@ define <2 x i64> @movi_m128_v2i64() {
; SVE-LABEL: movi_m128_v2i64:
; SVE: // %bb.0:
; SVE-NEXT: mov z0.d, #-128 // =0xffffffffffffff80
+; SVE-NEXT: // kill: def $q0 killed $q0 killed $z0
; SVE-NEXT: ret
ret <2 x i64> splat (i64 -128)
}
@@ -54,6 +57,7 @@ define <2 x i64> @movi_256_v2i64() {
; SVE-LABEL: movi_256_v2i64:
; SVE: // %bb.0:
; SVE-NEXT: mov z0.d, #256 // =0x100
+; SVE-NEXT: // kill: def $q0 killed $q0 killed $z0
; SVE-NEXT: ret
ret <2 x i64> splat (i64 256)
}
@@ -68,6 +72,7 @@ define <2 x i64> @movi_32512_v2i64() {
; SVE-LABEL: movi_32512_v2i64:
; SVE: // %bb.0:
; SVE-NEXT: mov z0.d, #32512 // =0x7f00
+; SVE-NEXT: // kill: def $q0 killed $q0 killed $z0
; SVE-NEXT: ret
ret <2 x i64> splat (i64 32512)
}
@@ -82,6 +87,7 @@ define <2 x i64> @movi_m32768_v2i64() {
; SVE-LABEL: movi_m32768_v2i64:
; SVE: // %bb.0:
; SVE-NEXT: mov z0.d, #-32768 // =0xffffffffffff8000
+; SVE-NEXT: // kill: def $q0 killed $q0 killed $z0
; SVE-NEXT: ret
ret <2 x i64> splat (i64 -32768)
}
@@ -98,6 +104,7 @@ define <4 x i32> @movi_v4i32_1() {
; SVE-LABEL: movi_v4i32_1:
; SVE: // %bb.0:
; SVE-NEXT: mov z0.d, #127 // =0x7f
+; SVE-NEXT: // kill: def $q0 killed $q0 killed $z0
; SVE-NEXT: ret
ret <4 x i32> <i32 127, i32 0, i32 127, i32 0>
}
@@ -112,6 +119,7 @@ define <4 x i32> @movi_v4i32_2() {
; SVE-LABEL: movi_v4i32_2:
; SVE: // %bb.0:
; SVE-NEXT: mov z0.d, #32512 // =0x7f00
+; SVE-NEXT: // kill: def $q0 killed $q0 killed $z0
; SVE-NEXT: ret
ret <4 x i32> <i32 32512, i32 0, i32 32512, i32 0>
}
@@ -126,6 +134,7 @@ define <8 x i16> @movi_v8i16_1() {
; SVE-LABEL: movi_v8i16_1:
; SVE: // %bb.0:
; SVE-NEXT: mov z0.d, #127 // =0x7f
+; SVE-NEXT: // kill: def $q0 killed $q0 killed $z0
; SVE-NEXT: ret
ret <8 x i16> <i16 127, i16 0, i16 0, i16 0, i16 127, i16 0, i16 0, i16 0>
}
@@ -140,6 +149,7 @@ define <8 x i16> @movi_v8i16_2() {
; SVE-LABEL: movi_v8i16_2:
; SVE: // %bb.0:
; SVE-NEXT: mov z0.d, #32512 // =0x7f00
+; SVE-NEXT: // kill: def $q0 killed $q0 killed $z0
; SVE-NEXT: ret
ret <8 x i16> <i16 32512, i16 0, i16 0, i16 0, i16 32512, i16 0, i16 0, i16 0>
}
@@ -154,6 +164,7 @@ define <16 x i8> @movi_v16i8_1() {
; SVE-LABEL: movi_v16i8_1:
; SVE: // %bb.0:
; SVE-NEXT: mov z0.d, #127 // =0x7f
+; SVE-NEXT: // kill: def $q0 killed $q0 killed $z0
; SVE-NEXT: ret
ret <16 x i8> <i8 127, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 127, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0>
}
@@ -168,6 +179,7 @@ define <16 x i8> @movi_v16i8_2() {
; SVE-LABEL: movi_v16i8_2:
; SVE: // %bb.0:
; SVE-NEXT: mov z0.d, #32512 // =0x7f00
+; SVE-NEXT: // kill: def $q0 killed $q0 killed $z0
; SVE-NEXT: ret
ret <16 x i8> <i8 0, i8 127, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 127, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0>
}
diff --git a/llvm/test/CodeGen/AArch64/neon-abd.ll b/llvm/test/CodeGen/AArch64/neon-abd.ll
index 314edd2fc81a7..c81438aa2250e 100644
--- a/llvm/test/CodeGen/AArch64/neon-abd.ll
+++ b/llvm/test/CodeGen/AArch64/neon-abd.ll
@@ -525,6 +525,7 @@ define <4 x i16> @combine_sabd_4h_zerosign(<4 x i16> %a, <4 x i16> %b) #0 {
; CHECK-LABEL: combine_sabd_4h_zerosign:
; CHECK: // %bb.0:
; CHECK-NEXT: movi v0.2d, #0000000000000000
+; CHECK-NEXT: // kill: def $d0 killed $d0 killed $q0
; CHECK-NEXT: ret
%a.ext = ashr <4 x i16> %a, <i16 7, i16 8, i16 9, i16 10>
%b.ext = ashr <4 x i16> %b, <i16 11, i16 12, i16 13, i16 14>
diff --git a/llvm/test/CodeGen/AArch64/neon-compare-instructions.ll b/llvm/test/CodeGen/AArch64/neon-compare-instructions.ll
index 11b3b62ec1c8d..47ceeece0a6e5 100644
--- a/llvm/test/CodeGen/AArch64/neon-compare-instructions.ll
+++ b/llvm/test/CodeGen/AArch64/neon-compare-instructions.ll
@@ -2482,6 +2482,7 @@ define <2 x i32> @fcmal2xfloat(<2 x float> %A, <2 x float> %B) {
; CHECK-SD-LABEL: fcmal2xfloat:
; CHECK-SD: // %bb.0:
; CHECK-SD-NEXT: movi v0.2d, #0xffffffffffffffff
+; CHECK-SD-NEXT: // kill: def $d0 killed $d0 killed $q0
; CHECK-SD-NEXT: ret
;
; CHECK-GI-LABEL: fcmal2xfloat:
@@ -2535,6 +2536,7 @@ define <2 x i32> @fcmnv2xfloat(<2 x float> %A, <2 x float> %B) {
; CHECK-LABEL: fcmnv2xfloat:
; CHECK: // %bb.0:
; CHECK-NEXT: movi v0.2d, #0000000000000000
+; CHECK-NEXT: // kill: def $d0 killed $d0 killed $q0
; CHECK-NEXT: ret
%tmp3 = fcmp false <2 x float> %A, %B
%tmp4 = sext <2 x i1> %tmp3 to <2 x i32>
diff --git a/llvm/test/CodeGen/AArch64/neon-mov.ll b/llvm/test/CodeGen/AArch64/neon-mov.ll
index 5be9394f61b30..4e5b099d62e7f 100644
--- a/llvm/test/CodeGen/AArch64/neon-mov.ll
+++ b/llvm/test/CodeGen/AArch64/neon-mov.ll
@@ -16,6 +16,7 @@ define <8 x i8> @movi8b_0() {
; CHECK-LABEL: movi8b_0:
; CHECK: // %bb.0:
; CHECK-NEXT: movi v0.2d, #0000000000000000
+; CHECK-NEXT: // kill: def $d0 killed $d0 killed $q0
; CHECK-NEXT: ret
ret <8 x i8> zeroinitializer
}
@@ -48,6 +49,7 @@ define <2 x i32> @movi2s_0() {
; CHECK-LABEL: movi2s_0:
; CHECK: // %bb.0:
; CHECK-NEXT: movi v0.2d, #0000000000000000
+; CHECK-NEXT: // kill: def $d0 killed $d0 killed $q0
; CHECK-NEXT: ret
ret <2 x i32> zeroinitializer
}
@@ -417,6 +419,7 @@ define <2 x float> @fmov2s_0() {
; CHECK-LABEL: fmov2s_0:
; CHECK: // %bb.0:
; CHECK-NEXT: movi v0.2d, #0000000000000000
+; CHECK-NEXT: // kill: def $d0 killed $d0 killed $q0
; CHECK-NEXT: ret
ret <2 x float> zeroinitializer
}
diff --git a/llvm/test/CodeGen/AArch64/remat-const-float-simd.ll b/llvm/test/CodeGen/AArch64/remat-const-float-simd.ll
index 2a19d258f1adf..6f1b68dbcd667 100644
--- a/llvm/test/CodeGen/AArch64/remat-const-float-simd.ll
+++ b/llvm/test/CodeGen/AArch64/remat-const-float-simd.ll
@@ -10,6 +10,7 @@ define float @foo() {
; CHECK-NEON-LABEL: foo:
; CHECK-NEON: // %bb.0: // %entry
; CHECK-NEON-NEXT: movi v0.2s, #79, lsl #24
+; CHECK-NEON-NEXT: // kill: def $s0 killed $s0 killed $d0
; CHECK-NEON-NEXT: ret
;
; CHECK-SCALAR-LABEL: foo:
diff --git a/llvm/test/CodeGen/AArch64/sve-implicit-zero-filling.ll b/llvm/test/CodeGen/AArch64/sve-implicit-zero-filling.ll
index ebec275c92c52..1bdfac8d6c979 100644
--- a/llvm/test/CodeGen/AArch64/sve-implicit-zero-filling.ll
+++ b/llvm/test/CodeGen/AArch64/sve-implicit-zero-filling.ll
@@ -195,8 +195,8 @@ define <vscale x 2 x i64> @zero_fill_non_zero_index(<vscale x 2 x i1> %pg, <vsca
define <vscale x 4 x i64> @zero_fill_type_mismatch(<vscale x 2 x i1> %pg, <vscale x 2 x i64> %a) #0 {
; CHECK-LABEL: zero_fill_type_mismatch:
; CHECK: // %bb.0:
-; CHECK-NEXT: uminv d0, p0, z0.d
; CHECK-NEXT: movi v1.2d, #0000000000000000
+; CHECK-NEXT: uminv d0, p0, z0.d
; CHECK-NEXT: ret
%t1 = call i64 @llvm.aarch64.sve.uminv.nxv2i64(<vscale x 2 x i1> %pg, <vscale x 2 x i64> %a)
%t2 = insertelement <vscale x 4 x i64> zeroinitializer, i64 %t1, i64 0
diff --git a/llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-insert-vector-elt.ll b/llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-insert-vector-elt.ll
index ad00e99b704dd..275d13ebfd949 100644
--- a/llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-insert-vector-elt.ll
+++ b/llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-insert-vector-elt.ll
@@ -419,6 +419,7 @@ define <1 x i64> @insertelement_v1i64(<1 x i64> %op1) {
; CHECK-LABEL: insertelement_v1i64:
; CHECK: // %bb.0:
; CHECK-NEXT: mov z0.d, #5 // =0x5
+; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0
; CHECK-NEXT: ret
;
; NONEON-NOSVE-LABEL: insertelement_v1i64:
diff --git a/llvm/test/CodeGen/AArch64/sve-streaming-mode-test-register-mov.ll b/llvm/test/CodeGen/AArch64/sve-streaming-mode-test-register-mov.ll
index 37435e35ceabf..9c7a3d5046d0e 100644
--- a/llvm/test/CodeGen/AArch64/sve-streaming-mode-test-register-mov.ll
+++ b/llvm/test/CodeGen/AArch64/sve-streaming-mode-test-register-mov.ll
@@ -39,6 +39,7 @@ define <2 x i64> @fixed_vec_zero_constant() {
; CHECK-LABEL: fixed_vec_zero_constant:
; CHECK: // %bb.0:
; CHECK-NEXT: mov z0.d, #0 // =0x0
+; CHECK-NEXT: // kill: def $q0 killed $q0 killed $z0
; CHECK-NEXT: ret
;
; NONEON-NOSVE-LABEL: fixed_vec_zero_constant:
@@ -53,6 +54,7 @@ define <2 x double> @fixed_vec_fp_zero_constant() {
; CHECK-LABEL: fixed_vec_fp_zero_constant:
; CHECK: // %bb.0:
; CHECK-NEXT: mov z0.d, #0 // =0x0
+; CHECK-NEXT: // kill: def $q0 killed $q0 killed $z0
; CHECK-NEXT: ret
;
; NONEON-NOSVE-LABEL: fixed_vec_fp_zero_constant:
diff --git a/llvm/test/CodeGen/AArch64/win64_vararg.ll b/llvm/test/CodeGen/AA...
[truncated]
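For readers who want to experiment with the scan added to reMaterializeDef without building LLVM, here is a minimal standalone sketch of the same control flow. Everything in it is an assumption made for illustration: `Inst`, `sameIgnoringDefs`, `skipRematForReturn`, and `exampleBlock` are hypothetical names, registers are plain strings, and the two preconditions from the patch (DstReg is physical, DefMI and CopyMI share a basic block) are taken as given. It also inverts the return value: it returns true when remat should be skipped, whereas the patch returns false from reMaterializeDef.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, heavily simplified stand-in for a MachineInstr.
struct Inst {
  std::string Opcode;             // e.g. "add", "str", "copy", "ret"
  std::vector<std::string> Uses;  // registers and immediates read
  std::vector<std::string> Defs;  // registers written
  bool IsReturn;                  // true for return instructions
};

static bool contains(const std::vector<std::string> &V, const std::string &R) {
  return std::find(V.begin(), V.end(), R) != V.end();
}

// Rough analogue of isIdenticalTo(..., IgnoreDefs): compare opcode and uses only.
static bool sameIgnoringDefs(const Inst &A, const Inst &B) {
  return A.Opcode == B.Opcode && A.Uses == B.Uses;
}

// Returns true when rematerialization should be skipped, i.e. when DstReg
// reaches a return after the copy without being redefined and no duplicate
// of the defining instruction appears before the copy.
bool skipRematForReturn(const std::vector<Inst> &Block, std::size_t DefIdx,
                        std::size_t CopyIdx, const std::string &DstReg) {
  assert(DefIdx < Block.size() && CopyIdx < Block.size());
  bool FoundCopy = false;
  for (std::size_t I = 0; I < Block.size(); ++I) {
    if (I == CopyIdx) {
      FoundCopy = true;
      continue;
    }
    const Inst &MI = Block[I];
    if (!FoundCopy) {
      // Before the copy: a duplicate of the def means remat stays allowed.
      if (I != DefIdx && sameIgnoringDefs(MI, Block[DefIdx]))
        break;
    } else {
      // After the copy: a redefinition means DstReg is not the return value.
      if (contains(MI.Defs, DstReg))
        break;
      if (MI.IsReturn && contains(MI.Uses, DstReg))
        return true;
    }
  }
  return false;
}

// Block loosely modelled on the f9 test above, before coalescing:
//   %0 = add sp, 24 ; str %0, [sp, 8] ; x0 = copy %0 ; ret x0
std::vector<Inst> exampleBlock() {
  return {Inst{"add", {"sp", "24"}, {"%0"}, false},
          Inst{"str", {"%0", "sp", "8"}, {}, false},
          Inst{"copy", {"%0"}, {"x0"}, false},
          Inst{"ret", {"x0"}, {}, true}};
}
```

Calling `skipRematForReturn(exampleBlock(), 0, 2, "x0")` walks past the store, reaches the return that reads x0, and reports that remat should be skipped, which corresponds to the single `add x0, sp, #24` in the updated CHECK lines. Like the patch, the sketch scans the whole block linearly from the top rather than starting at the copy.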
✅ With the latest revision this PR passed the C/C++ code formatter.
@arsenm Can you please review this?
From the motivating case, do you know why MachineCSE fails to optimize this?
MachineCSE runs before the register coalescer, so it will see something like:
So MachineCSE will not eliminate this pattern.
IIRC, MachineCSE tries copy propagation. It appears to fail in this case. Maybe we should find out why.
Let me check this.
I checked MachineCSE.cpp; there is a comment at the top:
This requirement is essential because SSA form guarantees that each virtual register is defined exactly once and never overwritten. Consider the following pseudo-MachineIR:
If the machine function were not in SSA form, the pass could not know that $x10 was redefined between %0 and %1. In addition, the function PerformTrivialCopyPropagation skips physical registers via:
✅ With the latest revision this PR passed the undef deprecator.
ping