[RISCV] Enable LUI/AUIPC+ADDI/ADDIW reg alloc hint by default #155693
This block of code is currently conditional on the fusions being enabled, but as far as I can tell it does no harm to enable it generally. The net effect is that generically compiled code runs slightly better on machines that implement this fusion.
The actual motivation is merely to stop confusing myself when I see the sequence in code; the register allocator's choice to sometimes blow two registers instead of one is just generally weird, and my eyes spot it when scanning disassembly.
(Note that this is just the regalloc hint; the scheduling changes remain conditional, and probably should remain so.)
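As a rough illustration of the condition the hint checks once the subtarget gating is dropped, here is a toy model. The `Opcode`, `Inst`, and `fusionHint` names are hypothetical and are not LLVM's `MachineInstr` API; the sketch also ignores debug instructions, which the real code skips over.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Toy model only: hypothetical types, not LLVM's MachineInstr API.
enum class Opcode { LUI, AUIPC, ADDI, ADDIW, Other };

struct Inst {
  Opcode Op;
  int Def; // virtual register written by the instruction
  int Use; // virtual register read, or -1 if none
};

// Returns the virtual register whose physical register the instruction at
// Idx would like to reuse for its own result: an ADDI/ADDIW whose source is
// defined by the immediately preceding LUI/AUIPC. No subtarget feature is
// consulted, which mirrors the unconditional form of the check.
std::optional<int> fusionHint(const std::vector<Inst> &Block, std::size_t Idx) {
  if (Idx == 0 || Idx >= Block.size())
    return std::nullopt;
  const Inst &MI = Block[Idx];
  if (MI.Op != Opcode::ADDI && MI.Op != Opcode::ADDIW)
    return std::nullopt;
  const Inst &Prev = Block[Idx - 1];
  if (Prev.Op != Opcode::LUI && Prev.Op != Opcode::AUIPC)
    return std::nullopt;
  if (Prev.Def != MI.Use)
    return std::nullopt;
  return Prev.Def;
}
```

For a block holding `lui v1, 5` followed by `addi v2, v1, -480`, `fusionHint` returns `v1`, so an allocator honoring the hint would try to give both virtual registers the same physical register (`lui a3, 5` / `addi a3, a3, -480` rather than `lui a0, 5` / `addi a3, a0, -480`). It is only a hint; the allocator may still pick differently.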
@llvm/pr-subscribers-llvm-globalisel @llvm/pr-subscribers-backend-risc-v
Author: Philip Reames (preames)
Patch is 29.02 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/155693.diff
17 Files Affected:
diff --git a/llvm/lib/Target/RISCV/RISCVRegisterInfo.cpp b/llvm/lib/Target/RISCV/RISCVRegisterInfo.cpp
index f3966a55ce7d1..40b641680b2ce 100644
--- a/llvm/lib/Target/RISCV/RISCVRegisterInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVRegisterInfo.cpp
@@ -966,7 +966,9 @@ bool RISCVRegisterInfo::getRegAllocationHints(
       }
     }
-    // Add a hint if it would allow auipc/lui+addi(w) fusion.
+    // Add a hint if it would allow auipc/lui+addi(w) fusion. We do this even
+    // without the fusions explicitly enabled as the impact is rarely negative
+    // and some cores do implement this fusion.
     if ((MI.getOpcode() == RISCV::ADDIW || MI.getOpcode() == RISCV::ADDI) &&
         MI.getOperand(1).isReg()) {
       const MachineBasicBlock &MBB = *MI.getParent();
@@ -974,9 +976,7 @@ bool RISCVRegisterInfo::getRegAllocationHints(
       // Is the previous instruction a LUI or AUIPC that can be fused?
       if (I != MBB.begin()) {
         I = skipDebugInstructionsBackward(std::prev(I), MBB.begin());
-        if (((I->getOpcode() == RISCV::LUI && Subtarget.hasLUIADDIFusion()) ||
-             (I->getOpcode() == RISCV::AUIPC &&
-              Subtarget.hasAUIPCADDIFusion())) &&
+        if ((I->getOpcode() == RISCV::LUI || I->getOpcode() == RISCV::AUIPC) &&
             I->getOperand(0).getReg() == MI.getOperand(1).getReg()) {
           if (OpIdx == 0)
             tryAddHint(MO, MI.getOperand(1), /*NeedGPRC=*/false);
diff --git a/llvm/test/CodeGen/RISCV/GlobalISel/vararg.ll b/llvm/test/CodeGen/RISCV/GlobalISel/vararg.ll
index afef96db5e290..1ec80a4978699 100644
--- a/llvm/test/CodeGen/RISCV/GlobalISel/vararg.ll
+++ b/llvm/test/CodeGen/RISCV/GlobalISel/vararg.ll
@@ -1155,8 +1155,8 @@ define void @va3_caller() nounwind {
; RV32: # %bb.0:
; RV32-NEXT: addi sp, sp, -16
; RV32-NEXT: sw ra, 12(sp) # 4-byte Folded Spill
-; RV32-NEXT: lui a0, 5
-; RV32-NEXT: addi a3, a0, -480
+; RV32-NEXT: lui a3, 5
+; RV32-NEXT: addi a3, a3, -480
; RV32-NEXT: li a0, 2
; RV32-NEXT: li a1, 1111
; RV32-NEXT: li a2, 0
@@ -1184,8 +1184,8 @@ define void @va3_caller() nounwind {
; RV32-WITHFP-NEXT: sw ra, 12(sp) # 4-byte Folded Spill
; RV32-WITHFP-NEXT: sw s0, 8(sp) # 4-byte Folded Spill
; RV32-WITHFP-NEXT: addi s0, sp, 16
-; RV32-WITHFP-NEXT: lui a0, 5
-; RV32-WITHFP-NEXT: addi a3, a0, -480
+; RV32-WITHFP-NEXT: lui a3, 5
+; RV32-WITHFP-NEXT: addi a3, a3, -480
; RV32-WITHFP-NEXT: li a0, 2
; RV32-WITHFP-NEXT: li a1, 1111
; RV32-WITHFP-NEXT: li a2, 0
diff --git a/llvm/test/CodeGen/RISCV/ctlz-cttz-ctpop.ll b/llvm/test/CodeGen/RISCV/ctlz-cttz-ctpop.ll
index 530980c13116c..908a12331d1bb 100644
--- a/llvm/test/CodeGen/RISCV/ctlz-cttz-ctpop.ll
+++ b/llvm/test/CodeGen/RISCV/ctlz-cttz-ctpop.ll
@@ -393,8 +393,8 @@ define i64 @test_cttz_i64(i64 %a) nounwind {
; RV32I-NEXT: # %bb.1: # %cond.false
; RV32I-NEXT: neg a1, a0
; RV32I-NEXT: and a1, a0, a1
-; RV32I-NEXT: lui a2, 30667
-; RV32I-NEXT: addi s2, a2, 1329
+; RV32I-NEXT: lui s2, 30667
+; RV32I-NEXT: addi s2, s2, 1329
; RV32I-NEXT: mv s4, a0
; RV32I-NEXT: mv a0, a1
; RV32I-NEXT: mv a1, s2
@@ -460,8 +460,8 @@ define i64 @test_cttz_i64(i64 %a) nounwind {
; RV32M-NEXT: or a2, a0, a1
; RV32M-NEXT: beqz a2, .LBB3_3
; RV32M-NEXT: # %bb.1: # %cond.false
-; RV32M-NEXT: lui a2, 30667
-; RV32M-NEXT: addi a3, a2, 1329
+; RV32M-NEXT: lui a3, 30667
+; RV32M-NEXT: addi a3, a3, 1329
; RV32M-NEXT: lui a2, %hi(.LCPI3_0)
; RV32M-NEXT: addi a2, a2, %lo(.LCPI3_0)
; RV32M-NEXT: bnez a0, .LBB3_4
@@ -847,8 +847,8 @@ define i64 @test_cttz_i64_zero_undef(i64 %a) nounwind {
; RV32I-NEXT: mv s2, a0
; RV32I-NEXT: neg a0, a0
; RV32I-NEXT: and a0, s2, a0
-; RV32I-NEXT: lui a1, 30667
-; RV32I-NEXT: addi s3, a1, 1329
+; RV32I-NEXT: lui s3, 30667
+; RV32I-NEXT: addi s3, s3, 1329
; RV32I-NEXT: mv a1, s3
; RV32I-NEXT: call __mulsi3
; RV32I-NEXT: mv s0, a0
@@ -900,8 +900,8 @@ define i64 @test_cttz_i64_zero_undef(i64 %a) nounwind {
;
; RV32M-LABEL: test_cttz_i64_zero_undef:
; RV32M: # %bb.0:
-; RV32M-NEXT: lui a2, 30667
-; RV32M-NEXT: addi a3, a2, 1329
+; RV32M-NEXT: lui a3, 30667
+; RV32M-NEXT: addi a3, a3, 1329
; RV32M-NEXT: lui a2, %hi(.LCPI7_0)
; RV32M-NEXT: addi a2, a2, %lo(.LCPI7_0)
; RV32M-NEXT: bnez a0, .LBB7_2
diff --git a/llvm/test/CodeGen/RISCV/ctz_zero_return_test.ll b/llvm/test/CodeGen/RISCV/ctz_zero_return_test.ll
index a1061fbbbbf02..29de02af09c8f 100644
--- a/llvm/test/CodeGen/RISCV/ctz_zero_return_test.ll
+++ b/llvm/test/CodeGen/RISCV/ctz_zero_return_test.ll
@@ -43,8 +43,8 @@ define signext i32 @ctz_dereferencing_pointer(ptr %b) nounwind {
; RV32I-NEXT: lw s4, 4(a0)
; RV32I-NEXT: neg a0, s2
; RV32I-NEXT: and a0, s2, a0
-; RV32I-NEXT: lui a1, 30667
-; RV32I-NEXT: addi s1, a1, 1329
+; RV32I-NEXT: lui s1, 30667
+; RV32I-NEXT: addi s1, s1, 1329
; RV32I-NEXT: mv a1, s1
; RV32I-NEXT: call __mulsi3
; RV32I-NEXT: mv s0, a0
@@ -563,8 +563,8 @@ define signext i32 @ctz4(i64 %b) nounwind {
; RV32I-NEXT: mv s0, a0
; RV32I-NEXT: neg a0, a0
; RV32I-NEXT: and a0, s0, a0
-; RV32I-NEXT: lui a1, 30667
-; RV32I-NEXT: addi s3, a1, 1329
+; RV32I-NEXT: lui s3, 30667
+; RV32I-NEXT: addi s3, s3, 1329
; RV32I-NEXT: mv a1, s3
; RV32I-NEXT: call __mulsi3
; RV32I-NEXT: mv s1, a0
diff --git a/llvm/test/CodeGen/RISCV/double-convert.ll b/llvm/test/CodeGen/RISCV/double-convert.ll
index 9c81bc2851347..8124d00e63fa7 100644
--- a/llvm/test/CodeGen/RISCV/double-convert.ll
+++ b/llvm/test/CodeGen/RISCV/double-convert.ll
@@ -1691,9 +1691,8 @@ define signext i16 @fcvt_w_s_sat_i16(double %a) nounwind {
; RV32I-NEXT: sw s4, 8(sp) # 4-byte Folded Spill
; RV32I-NEXT: mv s0, a1
; RV32I-NEXT: mv s1, a0
-; RV32I-NEXT: lui a0, 265728
-; RV32I-NEXT: addi a3, a0, -64
-; RV32I-NEXT: mv a0, s1
+; RV32I-NEXT: lui a3, 265728
+; RV32I-NEXT: addi a3, a3, -64
; RV32I-NEXT: li a2, 0
; RV32I-NEXT: call __gtdf2
; RV32I-NEXT: mv s2, a0
diff --git a/llvm/test/CodeGen/RISCV/float-convert.ll b/llvm/test/CodeGen/RISCV/float-convert.ll
index 6e49d479cf0b9..72578193ee4bf 100644
--- a/llvm/test/CodeGen/RISCV/float-convert.ll
+++ b/llvm/test/CodeGen/RISCV/float-convert.ll
@@ -1474,8 +1474,8 @@ define signext i16 @fcvt_w_s_sat_i16(float %a) nounwind {
; RV32I-NEXT: # %bb.1: # %start
; RV32I-NEXT: lui s1, 1048568
; RV32I-NEXT: .LBB24_2: # %start
-; RV32I-NEXT: lui a0, 290816
-; RV32I-NEXT: addi a1, a0, -512
+; RV32I-NEXT: lui a1, 290816
+; RV32I-NEXT: addi a1, a1, -512
; RV32I-NEXT: mv a0, s0
; RV32I-NEXT: call __gtsf2
; RV32I-NEXT: blez a0, .LBB24_4
@@ -1516,8 +1516,8 @@ define signext i16 @fcvt_w_s_sat_i16(float %a) nounwind {
; RV64I-NEXT: # %bb.1: # %start
; RV64I-NEXT: lui s1, 1048568
; RV64I-NEXT: .LBB24_2: # %start
-; RV64I-NEXT: lui a0, 290816
-; RV64I-NEXT: addi a1, a0, -512
+; RV64I-NEXT: lui a1, 290816
+; RV64I-NEXT: addi a1, a1, -512
; RV64I-NEXT: mv a0, s0
; RV64I-NEXT: call __gtsf2
; RV64I-NEXT: blez a0, .LBB24_4
@@ -1640,8 +1640,8 @@ define zeroext i16 @fcvt_wu_s_sat_i16(float %a) nounwind {
; RV32I-NEXT: mv a0, s2
; RV32I-NEXT: call __fixunssfsi
; RV32I-NEXT: mv s1, a0
-; RV32I-NEXT: lui a0, 292864
-; RV32I-NEXT: addi a1, a0, -256
+; RV32I-NEXT: lui a1, 292864
+; RV32I-NEXT: addi a1, a1, -256
; RV32I-NEXT: mv a0, s2
; RV32I-NEXT: call __gtsf2
; RV32I-NEXT: lui a1, 16
@@ -1677,8 +1677,8 @@ define zeroext i16 @fcvt_wu_s_sat_i16(float %a) nounwind {
; RV64I-NEXT: mv a0, s2
; RV64I-NEXT: call __fixunssfdi
; RV64I-NEXT: mv s1, a0
-; RV64I-NEXT: lui a0, 292864
-; RV64I-NEXT: addi a1, a0, -256
+; RV64I-NEXT: lui a1, 292864
+; RV64I-NEXT: addi a1, a1, -256
; RV64I-NEXT: mv a0, s2
; RV64I-NEXT: call __gtsf2
; RV64I-NEXT: lui a1, 16
diff --git a/llvm/test/CodeGen/RISCV/half-convert.ll b/llvm/test/CodeGen/RISCV/half-convert.ll
index 961c6cd78212b..6cebf8b2828bf 100644
--- a/llvm/test/CodeGen/RISCV/half-convert.ll
+++ b/llvm/test/CodeGen/RISCV/half-convert.ll
@@ -328,8 +328,8 @@ define i16 @fcvt_si_h_sat(half %a) nounwind {
; RV32I-NEXT: # %bb.1: # %start
; RV32I-NEXT: lui s1, 1048568
; RV32I-NEXT: .LBB1_2: # %start
-; RV32I-NEXT: lui a0, 290816
-; RV32I-NEXT: addi a1, a0, -512
+; RV32I-NEXT: lui a1, 290816
+; RV32I-NEXT: addi a1, a1, -512
; RV32I-NEXT: mv a0, s0
; RV32I-NEXT: call __gtsf2
; RV32I-NEXT: blez a0, .LBB1_4
@@ -371,8 +371,8 @@ define i16 @fcvt_si_h_sat(half %a) nounwind {
; RV64I-NEXT: # %bb.1: # %start
; RV64I-NEXT: lui s1, 1048568
; RV64I-NEXT: .LBB1_2: # %start
-; RV64I-NEXT: lui a0, 290816
-; RV64I-NEXT: addi a1, a0, -512
+; RV64I-NEXT: lui a1, 290816
+; RV64I-NEXT: addi a1, a1, -512
; RV64I-NEXT: mv a0, s0
; RV64I-NEXT: call __gtsf2
; RV64I-NEXT: blez a0, .LBB1_4
@@ -812,8 +812,8 @@ define i16 @fcvt_ui_h_sat(half %a) nounwind {
; RV32I-NEXT: li a1, 0
; RV32I-NEXT: call __gesf2
; RV32I-NEXT: mv s2, a0
-; RV32I-NEXT: lui a0, 292864
-; RV32I-NEXT: addi a1, a0, -256
+; RV32I-NEXT: lui a1, 292864
+; RV32I-NEXT: addi a1, a1, -256
; RV32I-NEXT: mv a0, s3
; RV32I-NEXT: call __gtsf2
; RV32I-NEXT: bgtz a0, .LBB3_2
@@ -850,8 +850,8 @@ define i16 @fcvt_ui_h_sat(half %a) nounwind {
; RV64I-NEXT: li a1, 0
; RV64I-NEXT: call __gesf2
; RV64I-NEXT: mv s2, a0
-; RV64I-NEXT: lui a0, 292864
-; RV64I-NEXT: addi a1, a0, -256
+; RV64I-NEXT: lui a1, 292864
+; RV64I-NEXT: addi a1, a1, -256
; RV64I-NEXT: mv a0, s3
; RV64I-NEXT: call __gtsf2
; RV64I-NEXT: bgtz a0, .LBB3_2
@@ -6416,8 +6416,8 @@ define signext i16 @fcvt_w_s_sat_i16(half %a) nounwind {
; RV32I-NEXT: # %bb.1: # %start
; RV32I-NEXT: lui s1, 1048568
; RV32I-NEXT: .LBB32_2: # %start
-; RV32I-NEXT: lui a0, 290816
-; RV32I-NEXT: addi a1, a0, -512
+; RV32I-NEXT: lui a1, 290816
+; RV32I-NEXT: addi a1, a1, -512
; RV32I-NEXT: mv a0, s0
; RV32I-NEXT: call __gtsf2
; RV32I-NEXT: blez a0, .LBB32_4
@@ -6461,8 +6461,8 @@ define signext i16 @fcvt_w_s_sat_i16(half %a) nounwind {
; RV64I-NEXT: # %bb.1: # %start
; RV64I-NEXT: lui s1, 1048568
; RV64I-NEXT: .LBB32_2: # %start
-; RV64I-NEXT: lui a0, 290816
-; RV64I-NEXT: addi a1, a0, -512
+; RV64I-NEXT: lui a1, 290816
+; RV64I-NEXT: addi a1, a1, -512
; RV64I-NEXT: mv a0, s0
; RV64I-NEXT: call __gtsf2
; RV64I-NEXT: blez a0, .LBB32_4
@@ -6903,8 +6903,8 @@ define zeroext i16 @fcvt_wu_s_sat_i16(half %a) nounwind {
; RV32I-NEXT: li a1, 0
; RV32I-NEXT: call __gesf2
; RV32I-NEXT: mv s1, a0
-; RV32I-NEXT: lui a0, 292864
-; RV32I-NEXT: addi a1, a0, -256
+; RV32I-NEXT: lui a1, 292864
+; RV32I-NEXT: addi a1, a1, -256
; RV32I-NEXT: mv a0, s2
; RV32I-NEXT: call __gtsf2
; RV32I-NEXT: blez a0, .LBB34_2
@@ -6944,8 +6944,8 @@ define zeroext i16 @fcvt_wu_s_sat_i16(half %a) nounwind {
; RV64I-NEXT: li a1, 0
; RV64I-NEXT: call __gesf2
; RV64I-NEXT: mv s1, a0
-; RV64I-NEXT: lui a0, 292864
-; RV64I-NEXT: addi a1, a0, -256
+; RV64I-NEXT: lui a1, 292864
+; RV64I-NEXT: addi a1, a1, -256
; RV64I-NEXT: mv a0, s2
; RV64I-NEXT: call __gtsf2
; RV64I-NEXT: blez a0, .LBB34_2
diff --git a/llvm/test/CodeGen/RISCV/imm.ll b/llvm/test/CodeGen/RISCV/imm.ll
index 8c9a9b43952ba..fad51697264f4 100644
--- a/llvm/test/CodeGen/RISCV/imm.ll
+++ b/llvm/test/CodeGen/RISCV/imm.ll
@@ -837,8 +837,8 @@ define i64 @imm64_5() nounwind {
define i64 @imm64_6() nounwind {
; RV32I-LABEL: imm64_6:
; RV32I: # %bb.0:
-; RV32I-NEXT: lui a0, 74565
-; RV32I-NEXT: addi a1, a0, 1656
+; RV32I-NEXT: lui a1, 74565
+; RV32I-NEXT: addi a1, a1, 1656
; RV32I-NEXT: li a0, 0
; RV32I-NEXT: ret
;
@@ -3895,8 +3895,8 @@ define i64 @imm_neg_10307948543() {
define i64 @li_rori_1() {
; RV32I-LABEL: li_rori_1:
; RV32I: # %bb.0:
-; RV32I-NEXT: lui a0, 1048567
-; RV32I-NEXT: addi a1, a0, 2047
+; RV32I-NEXT: lui a1, 1048567
+; RV32I-NEXT: addi a1, a1, 2047
; RV32I-NEXT: li a0, -1
; RV32I-NEXT: ret
;
diff --git a/llvm/test/CodeGen/RISCV/machinelicm-address-pseudos.ll b/llvm/test/CodeGen/RISCV/machinelicm-address-pseudos.ll
index 8deb17582cb11..2a1ba2f77edee 100644
--- a/llvm/test/CodeGen/RISCV/machinelicm-address-pseudos.ll
+++ b/llvm/test/CodeGen/RISCV/machinelicm-address-pseudos.ll
@@ -145,113 +145,59 @@ ret:
@gd = external thread_local global i32
define void @test_la_tls_gd(i32 signext %n) nounwind {
-; RV32NOFUSION-LABEL: test_la_tls_gd:
-; RV32NOFUSION: # %bb.0: # %entry
-; RV32NOFUSION-NEXT: addi sp, sp, -16
-; RV32NOFUSION-NEXT: sw ra, 12(sp) # 4-byte Folded Spill
-; RV32NOFUSION-NEXT: sw s0, 8(sp) # 4-byte Folded Spill
-; RV32NOFUSION-NEXT: sw s1, 4(sp) # 4-byte Folded Spill
-; RV32NOFUSION-NEXT: sw s2, 0(sp) # 4-byte Folded Spill
-; RV32NOFUSION-NEXT: mv s0, a0
-; RV32NOFUSION-NEXT: li s2, 0
-; RV32NOFUSION-NEXT: .Lpcrel_hi3:
-; RV32NOFUSION-NEXT: auipc a0, %tls_gd_pcrel_hi(gd)
-; RV32NOFUSION-NEXT: addi s1, a0, %pcrel_lo(.Lpcrel_hi3)
-; RV32NOFUSION-NEXT: .LBB3_1: # %loop
-; RV32NOFUSION-NEXT: # =>This Inner Loop Header: Depth=1
-; RV32NOFUSION-NEXT: mv a0, s1
-; RV32NOFUSION-NEXT: call __tls_get_addr
-; RV32NOFUSION-NEXT: lw zero, 0(a0)
-; RV32NOFUSION-NEXT: addi s2, s2, 1
-; RV32NOFUSION-NEXT: blt s2, s0, .LBB3_1
-; RV32NOFUSION-NEXT: # %bb.2: # %ret
-; RV32NOFUSION-NEXT: lw ra, 12(sp) # 4-byte Folded Reload
-; RV32NOFUSION-NEXT: lw s0, 8(sp) # 4-byte Folded Reload
-; RV32NOFUSION-NEXT: lw s1, 4(sp) # 4-byte Folded Reload
-; RV32NOFUSION-NEXT: lw s2, 0(sp) # 4-byte Folded Reload
-; RV32NOFUSION-NEXT: addi sp, sp, 16
-; RV32NOFUSION-NEXT: ret
-;
-; RV64NOFUSION-LABEL: test_la_tls_gd:
-; RV64NOFUSION: # %bb.0: # %entry
-; RV64NOFUSION-NEXT: addi sp, sp, -32
-; RV64NOFUSION-NEXT: sd ra, 24(sp) # 8-byte Folded Spill
-; RV64NOFUSION-NEXT: sd s0, 16(sp) # 8-byte Folded Spill
-; RV64NOFUSION-NEXT: sd s1, 8(sp) # 8-byte Folded Spill
-; RV64NOFUSION-NEXT: sd s2, 0(sp) # 8-byte Folded Spill
-; RV64NOFUSION-NEXT: mv s0, a0
-; RV64NOFUSION-NEXT: li s2, 0
-; RV64NOFUSION-NEXT: .Lpcrel_hi3:
-; RV64NOFUSION-NEXT: auipc a0, %tls_gd_pcrel_hi(gd)
-; RV64NOFUSION-NEXT: addi s1, a0, %pcrel_lo(.Lpcrel_hi3)
-; RV64NOFUSION-NEXT: .LBB3_1: # %loop
-; RV64NOFUSION-NEXT: # =>This Inner Loop Header: Depth=1
-; RV64NOFUSION-NEXT: mv a0, s1
-; RV64NOFUSION-NEXT: call __tls_get_addr
-; RV64NOFUSION-NEXT: lw zero, 0(a0)
-; RV64NOFUSION-NEXT: addiw s2, s2, 1
-; RV64NOFUSION-NEXT: blt s2, s0, .LBB3_1
-; RV64NOFUSION-NEXT: # %bb.2: # %ret
-; RV64NOFUSION-NEXT: ld ra, 24(sp) # 8-byte Folded Reload
-; RV64NOFUSION-NEXT: ld s0, 16(sp) # 8-byte Folded Reload
-; RV64NOFUSION-NEXT: ld s1, 8(sp) # 8-byte Folded Reload
-; RV64NOFUSION-NEXT: ld s2, 0(sp) # 8-byte Folded Reload
-; RV64NOFUSION-NEXT: addi sp, sp, 32
-; RV64NOFUSION-NEXT: ret
-;
-; RV32FUSION-LABEL: test_la_tls_gd:
-; RV32FUSION: # %bb.0: # %entry
-; RV32FUSION-NEXT: addi sp, sp, -16
-; RV32FUSION-NEXT: sw ra, 12(sp) # 4-byte Folded Spill
-; RV32FUSION-NEXT: sw s0, 8(sp) # 4-byte Folded Spill
-; RV32FUSION-NEXT: sw s1, 4(sp) # 4-byte Folded Spill
-; RV32FUSION-NEXT: sw s2, 0(sp) # 4-byte Folded Spill
-; RV32FUSION-NEXT: mv s0, a0
-; RV32FUSION-NEXT: li s2, 0
-; RV32FUSION-NEXT: .Lpcrel_hi3:
-; RV32FUSION-NEXT: auipc s1, %tls_gd_pcrel_hi(gd)
-; RV32FUSION-NEXT: addi s1, s1, %pcrel_lo(.Lpcrel_hi3)
-; RV32FUSION-NEXT: .LBB3_1: # %loop
-; RV32FUSION-NEXT: # =>This Inner Loop Header: Depth=1
-; RV32FUSION-NEXT: mv a0, s1
-; RV32FUSION-NEXT: call __tls_get_addr
-; RV32FUSION-NEXT: lw zero, 0(a0)
-; RV32FUSION-NEXT: addi s2, s2, 1
-; RV32FUSION-NEXT: blt s2, s0, .LBB3_1
-; RV32FUSION-NEXT: # %bb.2: # %ret
-; RV32FUSION-NEXT: lw ra, 12(sp) # 4-byte Folded Reload
-; RV32FUSION-NEXT: lw s0, 8(sp) # 4-byte Folded Reload
-; RV32FUSION-NEXT: lw s1, 4(sp) # 4-byte Folded Reload
-; RV32FUSION-NEXT: lw s2, 0(sp) # 4-byte Folded Reload
-; RV32FUSION-NEXT: addi sp, sp, 16
-; RV32FUSION-NEXT: ret
+; RV32I-LABEL: test_la_tls_gd:
+; RV32I: # %bb.0: # %entry
+; RV32I-NEXT: addi sp, sp, -16
+; RV32I-NEXT: sw ra, 12(sp) # 4-byte Folded Spill
+; RV32I-NEXT: sw s0, 8(sp) # 4-byte Folded Spill
+; RV32I-NEXT: sw s1, 4(sp) # 4-byte Folded Spill
+; RV32I-NEXT: sw s2, 0(sp) # 4-byte Folded Spill
+; RV32I-NEXT: mv s0, a0
+; RV32I-NEXT: li s2, 0
+; RV32I-NEXT: .Lpcrel_hi3:
+; RV32I-NEXT: auipc s1, %tls_gd_pcrel_hi(gd)
+; RV32I-NEXT: addi s1, s1, %pcrel_lo(.Lpcrel_hi3)
+; RV32I-NEXT: .LBB3_1: # %loop
+; RV32I-NEXT: # =>This Inner Loop Header: Depth=1
+; RV32I-NEXT: mv a0, s1
+; RV32I-NEXT: call __tls_get_addr
+; RV32I-NEXT: lw zero, 0(a0)
+; RV32I-NEXT: addi s2, s2, 1
+; RV32I-NEXT: blt s2, s0, .LBB3_1
+; RV32I-NEXT: # %bb.2: # %ret
+; RV32I-NEXT: lw ra, 12(sp) # 4-byte Folded Reload
+; RV32I-NEXT: lw s0, 8(sp) # 4-byte Folded Reload
+; RV32I-NEXT: lw s1, 4(sp) # 4-byte Folded Reload
+; RV32I-NEXT: lw s2, 0(sp) # 4-byte Folded Reload
+; RV32I-NEXT: addi sp, sp, 16
+; RV32I-NEXT: ret
;
-; RV64FUSION-LABEL: test_la_tls_gd:
-; RV64FUSION: # %bb.0: # %entry
-; RV64FUSION-NEXT: addi sp, sp, -32
-; RV64FUSION-NEXT: sd ra, 24(sp) # 8-byte Folded Spill
-; RV64FUSION-NEXT: sd s0, 16(sp) # 8-byte Folded Spill
-; RV64FUSION-NEXT: sd s1, 8(sp) # 8-byte Folded Spill
-; RV64FUSION-NEXT: sd s2, 0(sp) # 8-byte Folded Spill
-; RV64FUSION-NEXT: mv s0, a0
-; RV64FUSION-NEXT: li s2, 0
-; RV64FUSION-NEXT: .Lpcrel_hi3:
-; RV64FUSION-NEXT: auipc s1, %tls_gd_pcrel_hi(gd)
-; RV64FUSION-NEXT: addi s1, s1, %pcrel_lo(.Lpcrel_hi3)
-; RV64FUSION-NEXT: .LBB3_1: # %loop
-; RV64FUSION-NEXT: # =>This Inner Loop Header: Depth=1
-; RV64FUSION-NEXT: mv a0, s1
-; RV64FUSION-NEXT: call __tls_get_addr
-; RV64FUSION-NEXT: lw zero, 0(a0)
-; RV64FUSION-NEXT: addiw s2, s2, 1
-; RV64FUSION-NEXT: blt s2, s0, .LBB3_1
-; RV64FUSION-NEXT: # %bb.2: # %ret
-; RV64FUSION-NEXT: ld ra, 24(sp) # 8-byte Folded Reload
-; RV64FUSION-NEXT: ld s0, 16(sp) # 8-byte Folded Reload
-; RV64FUSION-NEXT: ld s1, 8(sp) # 8-byte Folded Reload
-; RV64FUSION-NEXT: ld s2, 0(sp) # 8-byte Folded Reload
-; RV64FUSION-NEXT: addi sp, sp, 32
-; RV64FUSION-NEXT: ret
+; RV64I-LABEL: test_la_tls_gd:
+; RV64I: # %bb.0: # %entry
+; RV64I-NEXT: addi sp, sp, -32
+; RV64I-NEXT: sd ra, 24(sp) # 8-byte Folded Spill
+; RV64I-NEXT: sd s0, 16(sp) # 8-byte Folded Spill
+; RV64I-NEXT: sd s1, 8(sp) # 8-byte Folded Spill
+; RV64I-NEXT: sd s2, 0(sp) # 8-byte Folded Spill
+; RV64I-NEXT: mv s0, a0
+; RV64I-NEXT: li s2, 0
+; RV64I-NEXT: .Lpcrel_hi3:
+; RV64I-NEXT: auipc s1, %tls_gd_pcrel_hi(gd)
+; RV64I-NEXT: addi s1, s1, %pcrel_lo(.Lpcrel_hi3)
+; RV64I-NEXT: .LBB3_1: # %loop
+; RV64I-NEXT: # =>This Inner Loop Header: Depth=1
+; RV64I-NEXT: mv a0, s1
+; RV64I-NEXT: call __tls_get_addr
+; RV64I-NEXT: lw zero, 0(a0)
+; RV64I-NEXT: addiw s2, s2, 1
+; RV64I-NEXT: blt s2, s0, .LBB3_1
+; RV64I-NEXT: # %bb.2: # %ret
+; RV64I-NEXT: ld ra, 24(sp) # 8-byte Folded Reload
+; RV64I-NEXT: ld s0, 16(sp) # 8-byte Folded Reload
+; RV64I-NEXT: ld s1, 8(sp) # 8-byte Folded Reload
+; RV64I-NEXT: ld s2, 0(sp) # 8-byte Folded Reload
+; RV64I-NEXT: addi sp, sp, 32
+; RV64I-NEXT: ret
entry:
br label %loop
@@ -265,3 +211,8 @@ loop:
ret:
ret void
}
+;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
+; RV32FUSION: {{.*}}
+; RV32NOFUSION: {{.*}}
+; RV64FUSION: {{.*}}
+; RV64NOFUSION: {{.*}}
diff --git a/llvm/test/CodeGen/RISCV/rv32xtheadbb.ll b/llvm/test/CodeGen/RISCV/rv32xtheadbb.ll
index 1360a29a3e10f..aa02d46c34550 100644
--- a/llvm/test/CodeGen/RISCV/rv32xtheadbb.ll
+++ b/llvm/test/CodeGen/RISCV/rv32xtheadbb.ll
@@ -238,8 +238,8 @@ define i64 @cttz_i64(i64 %a) nounwind {
; RV32I-NEXT: # %bb.1: # %cond.false
; RV32I-NEXT: neg a1, a0
; RV32I-NEXT: and a1, a0, a1
-; RV32I-NEXT: lui a2, 30667
-; RV32I-NEXT: addi s2, a2, 1329
+; RV32I-NEXT: lui s2, 30667
+; RV32I-NEXT: addi s2, s2, 1329
; RV32I-NEXT: ...
[truncated]
LGTM
Obviously I know this has landed already etc, but out of curiosity I looked at the impact on an rva22u64 llvm-test-suite build. Interestingly there's a minor reduction in static instruction count (
or similarly:
i.e. before the register allocator dirtied |
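For readers less familiar with the sequence in question: most of the `lui`+`addi` pairs in the updated tests materialize a 32-bit constant. The sketch below shows how such a constant splits into the two immediates. `LuiAddi`, `splitImm32`, and `materialize` are hypothetical helper names for illustration, not LLVM functions, and the model only covers the 32-bit view (it ignores the sign extension `lui` performs on RV64).

```cpp
#include <cassert>
#include <cstdint>

// Illustrative only: the immediates of a lui+addi pair.
struct LuiAddi {
  uint32_t Hi20; // operand of lui (upper 20 bits)
  int32_t Lo12;  // operand of addi, in [-2048, 2047]
};

// Split a 32-bit constant into lui/addi immediates.
LuiAddi splitImm32(uint32_t Value) {
  // addi sign-extends its 12-bit immediate, so the upper part is rounded
  // up whenever bit 11 of the constant is set.
  uint32_t Hi20 = ((Value + 0x800u) >> 12) & 0xFFFFFu;
  int32_t Lo12 = static_cast<int32_t>(Value & 0xFFFu);
  if (Lo12 >= 0x800)
    Lo12 -= 0x1000;
  return {Hi20, Lo12};
}

// Recombine the pair the way the hardware would: (Hi20 << 12) + Lo12,
// with 32-bit wraparound.
uint32_t materialize(const LuiAddi &P) {
  return (P.Hi20 << 12) + static_cast<uint32_t>(P.Lo12);
}
```

For example, `splitImm32(20000)` yields the `lui 5` / `addi -480` pair seen in `va3_caller`, and `splitImm32(0x077CB531)` yields the `lui 30667` / `addi 1329` pair from the `cttz` tests. Because the `addi` consumes the `lui` result directly, writing both to one register leaves the intermediate value with no separate register of its own.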