
Conversation

preames
Collaborator

@preames preames commented Aug 27, 2025

This block of code is currently conditional on the fusions being enabled but as far as I can tell, does no harm to generally enable. The net effect is the generically compiled code runs slightly better on machines with this fusion.

The actual motivation is merely to stop confusing myself when I see the sequence in code; the register allocator's choice to sometimes blow two registers instead of one is just generally weird, and my eyes spot it when scanning disassembly.

(Note that this is just the regalloc hint; the scheduling changes remain conditional, and probably should remain so.)

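For illustration, the effect of the hint can be seen directly in the test updates in the diff below (e.g. `vararg.ll`): without the hint, the register allocator may place the LUI result in a scratch register, producing a two-register sequence that cannot match the lui+addi fusion pattern.

```
# Before (no hint): LUI result lands in a0, so both a0 and a3
# are consumed and the pair does not match the fusion pattern.
lui   a0, 5
addi  a3, a0, -480

# After (with hint): a single register is reused, matching the
# lui+addi fusion shape.
lui   a3, 5
addi  a3, a3, -480
```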
@llvmbot
Member

llvmbot commented Aug 27, 2025

@llvm/pr-subscribers-llvm-globalisel

@llvm/pr-subscribers-backend-risc-v

Author: Philip Reames (preames)

Changes

This block of code is currently conditional on the fusions being enabled but as far as I can tell, does no harm to generally enable. The net effect is the generically compiled code runs slightly better on machines with this fusion.

The actual motivation is merely to stop confusing myself when I see the sequence in code; the register allocator's choice to sometimes blow two registers instead of one is just generally weird, and my eyes spot it when scanning disassembly.

(Note that this is just the regalloc hint; the scheduling changes remain conditional, and probably should remain so.)


Patch is 29.02 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/155693.diff

17 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVRegisterInfo.cpp (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/GlobalISel/vararg.ll (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/ctlz-cttz-ctpop.ll (+8-8)
  • (modified) llvm/test/CodeGen/RISCV/ctz_zero_return_test.ll (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/double-convert.ll (+2-3)
  • (modified) llvm/test/CodeGen/RISCV/float-convert.ll (+8-8)
  • (modified) llvm/test/CodeGen/RISCV/half-convert.ll (+16-16)
  • (modified) llvm/test/CodeGen/RISCV/imm.ll (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/machinelicm-address-pseudos.ll (+57-106)
  • (modified) llvm/test/CodeGen/RISCV/rv32xtheadbb.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/rv32zbb.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/shl-cttz.ll (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/srem-vector-lkk.ll (+14-14)
  • (modified) llvm/test/CodeGen/RISCV/urem-vector-lkk.ll (+10-10)
  • (modified) llvm/test/CodeGen/RISCV/xqccmp-additional-stack.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/xqcibm-cto-clo-brev.ll (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/zcmp-additional-stack.ll (+2-2)
diff --git a/llvm/lib/Target/RISCV/RISCVRegisterInfo.cpp b/llvm/lib/Target/RISCV/RISCVRegisterInfo.cpp
index f3966a55ce7d1..40b641680b2ce 100644
--- a/llvm/lib/Target/RISCV/RISCVRegisterInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVRegisterInfo.cpp
@@ -966,7 +966,9 @@ bool RISCVRegisterInfo::getRegAllocationHints(
       }
     }
 
-    // Add a hint if it would allow auipc/lui+addi(w) fusion.
+    // Add a hint if it would allow auipc/lui+addi(w) fusion.  We do this even
+    // without the fusions explicitly enabled as the impact is rarely negative
+    // and some cores do implement this fusion.
     if ((MI.getOpcode() == RISCV::ADDIW || MI.getOpcode() == RISCV::ADDI) &&
         MI.getOperand(1).isReg()) {
       const MachineBasicBlock &MBB = *MI.getParent();
@@ -974,9 +976,7 @@ bool RISCVRegisterInfo::getRegAllocationHints(
       // Is the previous instruction a LUI or AUIPC that can be fused?
       if (I != MBB.begin()) {
         I = skipDebugInstructionsBackward(std::prev(I), MBB.begin());
-        if (((I->getOpcode() == RISCV::LUI && Subtarget.hasLUIADDIFusion()) ||
-             (I->getOpcode() == RISCV::AUIPC &&
-              Subtarget.hasAUIPCADDIFusion())) &&
+        if ((I->getOpcode() == RISCV::LUI || I->getOpcode() == RISCV::AUIPC) &&
             I->getOperand(0).getReg() == MI.getOperand(1).getReg()) {
           if (OpIdx == 0)
             tryAddHint(MO, MI.getOperand(1), /*NeedGPRC=*/false);
diff --git a/llvm/test/CodeGen/RISCV/GlobalISel/vararg.ll b/llvm/test/CodeGen/RISCV/GlobalISel/vararg.ll
index afef96db5e290..1ec80a4978699 100644
--- a/llvm/test/CodeGen/RISCV/GlobalISel/vararg.ll
+++ b/llvm/test/CodeGen/RISCV/GlobalISel/vararg.ll
@@ -1155,8 +1155,8 @@ define void @va3_caller() nounwind {
 ; RV32:       # %bb.0:
 ; RV32-NEXT:    addi sp, sp, -16
 ; RV32-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32-NEXT:    lui a0, 5
-; RV32-NEXT:    addi a3, a0, -480
+; RV32-NEXT:    lui a3, 5
+; RV32-NEXT:    addi a3, a3, -480
 ; RV32-NEXT:    li a0, 2
 ; RV32-NEXT:    li a1, 1111
 ; RV32-NEXT:    li a2, 0
@@ -1184,8 +1184,8 @@ define void @va3_caller() nounwind {
 ; RV32-WITHFP-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
 ; RV32-WITHFP-NEXT:    sw s0, 8(sp) # 4-byte Folded Spill
 ; RV32-WITHFP-NEXT:    addi s0, sp, 16
-; RV32-WITHFP-NEXT:    lui a0, 5
-; RV32-WITHFP-NEXT:    addi a3, a0, -480
+; RV32-WITHFP-NEXT:    lui a3, 5
+; RV32-WITHFP-NEXT:    addi a3, a3, -480
 ; RV32-WITHFP-NEXT:    li a0, 2
 ; RV32-WITHFP-NEXT:    li a1, 1111
 ; RV32-WITHFP-NEXT:    li a2, 0
diff --git a/llvm/test/CodeGen/RISCV/ctlz-cttz-ctpop.ll b/llvm/test/CodeGen/RISCV/ctlz-cttz-ctpop.ll
index 530980c13116c..908a12331d1bb 100644
--- a/llvm/test/CodeGen/RISCV/ctlz-cttz-ctpop.ll
+++ b/llvm/test/CodeGen/RISCV/ctlz-cttz-ctpop.ll
@@ -393,8 +393,8 @@ define i64 @test_cttz_i64(i64 %a) nounwind {
 ; RV32I-NEXT:  # %bb.1: # %cond.false
 ; RV32I-NEXT:    neg a1, a0
 ; RV32I-NEXT:    and a1, a0, a1
-; RV32I-NEXT:    lui a2, 30667
-; RV32I-NEXT:    addi s2, a2, 1329
+; RV32I-NEXT:    lui s2, 30667
+; RV32I-NEXT:    addi s2, s2, 1329
 ; RV32I-NEXT:    mv s4, a0
 ; RV32I-NEXT:    mv a0, a1
 ; RV32I-NEXT:    mv a1, s2
@@ -460,8 +460,8 @@ define i64 @test_cttz_i64(i64 %a) nounwind {
 ; RV32M-NEXT:    or a2, a0, a1
 ; RV32M-NEXT:    beqz a2, .LBB3_3
 ; RV32M-NEXT:  # %bb.1: # %cond.false
-; RV32M-NEXT:    lui a2, 30667
-; RV32M-NEXT:    addi a3, a2, 1329
+; RV32M-NEXT:    lui a3, 30667
+; RV32M-NEXT:    addi a3, a3, 1329
 ; RV32M-NEXT:    lui a2, %hi(.LCPI3_0)
 ; RV32M-NEXT:    addi a2, a2, %lo(.LCPI3_0)
 ; RV32M-NEXT:    bnez a0, .LBB3_4
@@ -847,8 +847,8 @@ define i64 @test_cttz_i64_zero_undef(i64 %a) nounwind {
 ; RV32I-NEXT:    mv s2, a0
 ; RV32I-NEXT:    neg a0, a0
 ; RV32I-NEXT:    and a0, s2, a0
-; RV32I-NEXT:    lui a1, 30667
-; RV32I-NEXT:    addi s3, a1, 1329
+; RV32I-NEXT:    lui s3, 30667
+; RV32I-NEXT:    addi s3, s3, 1329
 ; RV32I-NEXT:    mv a1, s3
 ; RV32I-NEXT:    call __mulsi3
 ; RV32I-NEXT:    mv s0, a0
@@ -900,8 +900,8 @@ define i64 @test_cttz_i64_zero_undef(i64 %a) nounwind {
 ;
 ; RV32M-LABEL: test_cttz_i64_zero_undef:
 ; RV32M:       # %bb.0:
-; RV32M-NEXT:    lui a2, 30667
-; RV32M-NEXT:    addi a3, a2, 1329
+; RV32M-NEXT:    lui a3, 30667
+; RV32M-NEXT:    addi a3, a3, 1329
 ; RV32M-NEXT:    lui a2, %hi(.LCPI7_0)
 ; RV32M-NEXT:    addi a2, a2, %lo(.LCPI7_0)
 ; RV32M-NEXT:    bnez a0, .LBB7_2
diff --git a/llvm/test/CodeGen/RISCV/ctz_zero_return_test.ll b/llvm/test/CodeGen/RISCV/ctz_zero_return_test.ll
index a1061fbbbbf02..29de02af09c8f 100644
--- a/llvm/test/CodeGen/RISCV/ctz_zero_return_test.ll
+++ b/llvm/test/CodeGen/RISCV/ctz_zero_return_test.ll
@@ -43,8 +43,8 @@ define signext i32 @ctz_dereferencing_pointer(ptr %b) nounwind {
 ; RV32I-NEXT:    lw s4, 4(a0)
 ; RV32I-NEXT:    neg a0, s2
 ; RV32I-NEXT:    and a0, s2, a0
-; RV32I-NEXT:    lui a1, 30667
-; RV32I-NEXT:    addi s1, a1, 1329
+; RV32I-NEXT:    lui s1, 30667
+; RV32I-NEXT:    addi s1, s1, 1329
 ; RV32I-NEXT:    mv a1, s1
 ; RV32I-NEXT:    call __mulsi3
 ; RV32I-NEXT:    mv s0, a0
@@ -563,8 +563,8 @@ define signext i32 @ctz4(i64 %b) nounwind {
 ; RV32I-NEXT:    mv s0, a0
 ; RV32I-NEXT:    neg a0, a0
 ; RV32I-NEXT:    and a0, s0, a0
-; RV32I-NEXT:    lui a1, 30667
-; RV32I-NEXT:    addi s3, a1, 1329
+; RV32I-NEXT:    lui s3, 30667
+; RV32I-NEXT:    addi s3, s3, 1329
 ; RV32I-NEXT:    mv a1, s3
 ; RV32I-NEXT:    call __mulsi3
 ; RV32I-NEXT:    mv s1, a0
diff --git a/llvm/test/CodeGen/RISCV/double-convert.ll b/llvm/test/CodeGen/RISCV/double-convert.ll
index 9c81bc2851347..8124d00e63fa7 100644
--- a/llvm/test/CodeGen/RISCV/double-convert.ll
+++ b/llvm/test/CodeGen/RISCV/double-convert.ll
@@ -1691,9 +1691,8 @@ define signext i16 @fcvt_w_s_sat_i16(double %a) nounwind {
 ; RV32I-NEXT:    sw s4, 8(sp) # 4-byte Folded Spill
 ; RV32I-NEXT:    mv s0, a1
 ; RV32I-NEXT:    mv s1, a0
-; RV32I-NEXT:    lui a0, 265728
-; RV32I-NEXT:    addi a3, a0, -64
-; RV32I-NEXT:    mv a0, s1
+; RV32I-NEXT:    lui a3, 265728
+; RV32I-NEXT:    addi a3, a3, -64
 ; RV32I-NEXT:    li a2, 0
 ; RV32I-NEXT:    call __gtdf2
 ; RV32I-NEXT:    mv s2, a0
diff --git a/llvm/test/CodeGen/RISCV/float-convert.ll b/llvm/test/CodeGen/RISCV/float-convert.ll
index 6e49d479cf0b9..72578193ee4bf 100644
--- a/llvm/test/CodeGen/RISCV/float-convert.ll
+++ b/llvm/test/CodeGen/RISCV/float-convert.ll
@@ -1474,8 +1474,8 @@ define signext i16 @fcvt_w_s_sat_i16(float %a) nounwind {
 ; RV32I-NEXT:  # %bb.1: # %start
 ; RV32I-NEXT:    lui s1, 1048568
 ; RV32I-NEXT:  .LBB24_2: # %start
-; RV32I-NEXT:    lui a0, 290816
-; RV32I-NEXT:    addi a1, a0, -512
+; RV32I-NEXT:    lui a1, 290816
+; RV32I-NEXT:    addi a1, a1, -512
 ; RV32I-NEXT:    mv a0, s0
 ; RV32I-NEXT:    call __gtsf2
 ; RV32I-NEXT:    blez a0, .LBB24_4
@@ -1516,8 +1516,8 @@ define signext i16 @fcvt_w_s_sat_i16(float %a) nounwind {
 ; RV64I-NEXT:  # %bb.1: # %start
 ; RV64I-NEXT:    lui s1, 1048568
 ; RV64I-NEXT:  .LBB24_2: # %start
-; RV64I-NEXT:    lui a0, 290816
-; RV64I-NEXT:    addi a1, a0, -512
+; RV64I-NEXT:    lui a1, 290816
+; RV64I-NEXT:    addi a1, a1, -512
 ; RV64I-NEXT:    mv a0, s0
 ; RV64I-NEXT:    call __gtsf2
 ; RV64I-NEXT:    blez a0, .LBB24_4
@@ -1640,8 +1640,8 @@ define zeroext i16 @fcvt_wu_s_sat_i16(float %a) nounwind {
 ; RV32I-NEXT:    mv a0, s2
 ; RV32I-NEXT:    call __fixunssfsi
 ; RV32I-NEXT:    mv s1, a0
-; RV32I-NEXT:    lui a0, 292864
-; RV32I-NEXT:    addi a1, a0, -256
+; RV32I-NEXT:    lui a1, 292864
+; RV32I-NEXT:    addi a1, a1, -256
 ; RV32I-NEXT:    mv a0, s2
 ; RV32I-NEXT:    call __gtsf2
 ; RV32I-NEXT:    lui a1, 16
@@ -1677,8 +1677,8 @@ define zeroext i16 @fcvt_wu_s_sat_i16(float %a) nounwind {
 ; RV64I-NEXT:    mv a0, s2
 ; RV64I-NEXT:    call __fixunssfdi
 ; RV64I-NEXT:    mv s1, a0
-; RV64I-NEXT:    lui a0, 292864
-; RV64I-NEXT:    addi a1, a0, -256
+; RV64I-NEXT:    lui a1, 292864
+; RV64I-NEXT:    addi a1, a1, -256
 ; RV64I-NEXT:    mv a0, s2
 ; RV64I-NEXT:    call __gtsf2
 ; RV64I-NEXT:    lui a1, 16
diff --git a/llvm/test/CodeGen/RISCV/half-convert.ll b/llvm/test/CodeGen/RISCV/half-convert.ll
index 961c6cd78212b..6cebf8b2828bf 100644
--- a/llvm/test/CodeGen/RISCV/half-convert.ll
+++ b/llvm/test/CodeGen/RISCV/half-convert.ll
@@ -328,8 +328,8 @@ define i16 @fcvt_si_h_sat(half %a) nounwind {
 ; RV32I-NEXT:  # %bb.1: # %start
 ; RV32I-NEXT:    lui s1, 1048568
 ; RV32I-NEXT:  .LBB1_2: # %start
-; RV32I-NEXT:    lui a0, 290816
-; RV32I-NEXT:    addi a1, a0, -512
+; RV32I-NEXT:    lui a1, 290816
+; RV32I-NEXT:    addi a1, a1, -512
 ; RV32I-NEXT:    mv a0, s0
 ; RV32I-NEXT:    call __gtsf2
 ; RV32I-NEXT:    blez a0, .LBB1_4
@@ -371,8 +371,8 @@ define i16 @fcvt_si_h_sat(half %a) nounwind {
 ; RV64I-NEXT:  # %bb.1: # %start
 ; RV64I-NEXT:    lui s1, 1048568
 ; RV64I-NEXT:  .LBB1_2: # %start
-; RV64I-NEXT:    lui a0, 290816
-; RV64I-NEXT:    addi a1, a0, -512
+; RV64I-NEXT:    lui a1, 290816
+; RV64I-NEXT:    addi a1, a1, -512
 ; RV64I-NEXT:    mv a0, s0
 ; RV64I-NEXT:    call __gtsf2
 ; RV64I-NEXT:    blez a0, .LBB1_4
@@ -812,8 +812,8 @@ define i16 @fcvt_ui_h_sat(half %a) nounwind {
 ; RV32I-NEXT:    li a1, 0
 ; RV32I-NEXT:    call __gesf2
 ; RV32I-NEXT:    mv s2, a0
-; RV32I-NEXT:    lui a0, 292864
-; RV32I-NEXT:    addi a1, a0, -256
+; RV32I-NEXT:    lui a1, 292864
+; RV32I-NEXT:    addi a1, a1, -256
 ; RV32I-NEXT:    mv a0, s3
 ; RV32I-NEXT:    call __gtsf2
 ; RV32I-NEXT:    bgtz a0, .LBB3_2
@@ -850,8 +850,8 @@ define i16 @fcvt_ui_h_sat(half %a) nounwind {
 ; RV64I-NEXT:    li a1, 0
 ; RV64I-NEXT:    call __gesf2
 ; RV64I-NEXT:    mv s2, a0
-; RV64I-NEXT:    lui a0, 292864
-; RV64I-NEXT:    addi a1, a0, -256
+; RV64I-NEXT:    lui a1, 292864
+; RV64I-NEXT:    addi a1, a1, -256
 ; RV64I-NEXT:    mv a0, s3
 ; RV64I-NEXT:    call __gtsf2
 ; RV64I-NEXT:    bgtz a0, .LBB3_2
@@ -6416,8 +6416,8 @@ define signext i16 @fcvt_w_s_sat_i16(half %a) nounwind {
 ; RV32I-NEXT:  # %bb.1: # %start
 ; RV32I-NEXT:    lui s1, 1048568
 ; RV32I-NEXT:  .LBB32_2: # %start
-; RV32I-NEXT:    lui a0, 290816
-; RV32I-NEXT:    addi a1, a0, -512
+; RV32I-NEXT:    lui a1, 290816
+; RV32I-NEXT:    addi a1, a1, -512
 ; RV32I-NEXT:    mv a0, s0
 ; RV32I-NEXT:    call __gtsf2
 ; RV32I-NEXT:    blez a0, .LBB32_4
@@ -6461,8 +6461,8 @@ define signext i16 @fcvt_w_s_sat_i16(half %a) nounwind {
 ; RV64I-NEXT:  # %bb.1: # %start
 ; RV64I-NEXT:    lui s1, 1048568
 ; RV64I-NEXT:  .LBB32_2: # %start
-; RV64I-NEXT:    lui a0, 290816
-; RV64I-NEXT:    addi a1, a0, -512
+; RV64I-NEXT:    lui a1, 290816
+; RV64I-NEXT:    addi a1, a1, -512
 ; RV64I-NEXT:    mv a0, s0
 ; RV64I-NEXT:    call __gtsf2
 ; RV64I-NEXT:    blez a0, .LBB32_4
@@ -6903,8 +6903,8 @@ define zeroext i16 @fcvt_wu_s_sat_i16(half %a) nounwind {
 ; RV32I-NEXT:    li a1, 0
 ; RV32I-NEXT:    call __gesf2
 ; RV32I-NEXT:    mv s1, a0
-; RV32I-NEXT:    lui a0, 292864
-; RV32I-NEXT:    addi a1, a0, -256
+; RV32I-NEXT:    lui a1, 292864
+; RV32I-NEXT:    addi a1, a1, -256
 ; RV32I-NEXT:    mv a0, s2
 ; RV32I-NEXT:    call __gtsf2
 ; RV32I-NEXT:    blez a0, .LBB34_2
@@ -6944,8 +6944,8 @@ define zeroext i16 @fcvt_wu_s_sat_i16(half %a) nounwind {
 ; RV64I-NEXT:    li a1, 0
 ; RV64I-NEXT:    call __gesf2
 ; RV64I-NEXT:    mv s1, a0
-; RV64I-NEXT:    lui a0, 292864
-; RV64I-NEXT:    addi a1, a0, -256
+; RV64I-NEXT:    lui a1, 292864
+; RV64I-NEXT:    addi a1, a1, -256
 ; RV64I-NEXT:    mv a0, s2
 ; RV64I-NEXT:    call __gtsf2
 ; RV64I-NEXT:    blez a0, .LBB34_2
diff --git a/llvm/test/CodeGen/RISCV/imm.ll b/llvm/test/CodeGen/RISCV/imm.ll
index 8c9a9b43952ba..fad51697264f4 100644
--- a/llvm/test/CodeGen/RISCV/imm.ll
+++ b/llvm/test/CodeGen/RISCV/imm.ll
@@ -837,8 +837,8 @@ define i64 @imm64_5() nounwind {
 define i64 @imm64_6() nounwind {
 ; RV32I-LABEL: imm64_6:
 ; RV32I:       # %bb.0:
-; RV32I-NEXT:    lui a0, 74565
-; RV32I-NEXT:    addi a1, a0, 1656
+; RV32I-NEXT:    lui a1, 74565
+; RV32I-NEXT:    addi a1, a1, 1656
 ; RV32I-NEXT:    li a0, 0
 ; RV32I-NEXT:    ret
 ;
@@ -3895,8 +3895,8 @@ define i64 @imm_neg_10307948543() {
 define i64 @li_rori_1() {
 ; RV32I-LABEL: li_rori_1:
 ; RV32I:       # %bb.0:
-; RV32I-NEXT:    lui a0, 1048567
-; RV32I-NEXT:    addi a1, a0, 2047
+; RV32I-NEXT:    lui a1, 1048567
+; RV32I-NEXT:    addi a1, a1, 2047
 ; RV32I-NEXT:    li a0, -1
 ; RV32I-NEXT:    ret
 ;
diff --git a/llvm/test/CodeGen/RISCV/machinelicm-address-pseudos.ll b/llvm/test/CodeGen/RISCV/machinelicm-address-pseudos.ll
index 8deb17582cb11..2a1ba2f77edee 100644
--- a/llvm/test/CodeGen/RISCV/machinelicm-address-pseudos.ll
+++ b/llvm/test/CodeGen/RISCV/machinelicm-address-pseudos.ll
@@ -145,113 +145,59 @@ ret:
 @gd = external thread_local global i32
 
 define void @test_la_tls_gd(i32 signext %n) nounwind {
-; RV32NOFUSION-LABEL: test_la_tls_gd:
-; RV32NOFUSION:       # %bb.0: # %entry
-; RV32NOFUSION-NEXT:    addi sp, sp, -16
-; RV32NOFUSION-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32NOFUSION-NEXT:    sw s0, 8(sp) # 4-byte Folded Spill
-; RV32NOFUSION-NEXT:    sw s1, 4(sp) # 4-byte Folded Spill
-; RV32NOFUSION-NEXT:    sw s2, 0(sp) # 4-byte Folded Spill
-; RV32NOFUSION-NEXT:    mv s0, a0
-; RV32NOFUSION-NEXT:    li s2, 0
-; RV32NOFUSION-NEXT:  .Lpcrel_hi3:
-; RV32NOFUSION-NEXT:    auipc a0, %tls_gd_pcrel_hi(gd)
-; RV32NOFUSION-NEXT:    addi s1, a0, %pcrel_lo(.Lpcrel_hi3)
-; RV32NOFUSION-NEXT:  .LBB3_1: # %loop
-; RV32NOFUSION-NEXT:    # =>This Inner Loop Header: Depth=1
-; RV32NOFUSION-NEXT:    mv a0, s1
-; RV32NOFUSION-NEXT:    call __tls_get_addr
-; RV32NOFUSION-NEXT:    lw zero, 0(a0)
-; RV32NOFUSION-NEXT:    addi s2, s2, 1
-; RV32NOFUSION-NEXT:    blt s2, s0, .LBB3_1
-; RV32NOFUSION-NEXT:  # %bb.2: # %ret
-; RV32NOFUSION-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32NOFUSION-NEXT:    lw s0, 8(sp) # 4-byte Folded Reload
-; RV32NOFUSION-NEXT:    lw s1, 4(sp) # 4-byte Folded Reload
-; RV32NOFUSION-NEXT:    lw s2, 0(sp) # 4-byte Folded Reload
-; RV32NOFUSION-NEXT:    addi sp, sp, 16
-; RV32NOFUSION-NEXT:    ret
-;
-; RV64NOFUSION-LABEL: test_la_tls_gd:
-; RV64NOFUSION:       # %bb.0: # %entry
-; RV64NOFUSION-NEXT:    addi sp, sp, -32
-; RV64NOFUSION-NEXT:    sd ra, 24(sp) # 8-byte Folded Spill
-; RV64NOFUSION-NEXT:    sd s0, 16(sp) # 8-byte Folded Spill
-; RV64NOFUSION-NEXT:    sd s1, 8(sp) # 8-byte Folded Spill
-; RV64NOFUSION-NEXT:    sd s2, 0(sp) # 8-byte Folded Spill
-; RV64NOFUSION-NEXT:    mv s0, a0
-; RV64NOFUSION-NEXT:    li s2, 0
-; RV64NOFUSION-NEXT:  .Lpcrel_hi3:
-; RV64NOFUSION-NEXT:    auipc a0, %tls_gd_pcrel_hi(gd)
-; RV64NOFUSION-NEXT:    addi s1, a0, %pcrel_lo(.Lpcrel_hi3)
-; RV64NOFUSION-NEXT:  .LBB3_1: # %loop
-; RV64NOFUSION-NEXT:    # =>This Inner Loop Header: Depth=1
-; RV64NOFUSION-NEXT:    mv a0, s1
-; RV64NOFUSION-NEXT:    call __tls_get_addr
-; RV64NOFUSION-NEXT:    lw zero, 0(a0)
-; RV64NOFUSION-NEXT:    addiw s2, s2, 1
-; RV64NOFUSION-NEXT:    blt s2, s0, .LBB3_1
-; RV64NOFUSION-NEXT:  # %bb.2: # %ret
-; RV64NOFUSION-NEXT:    ld ra, 24(sp) # 8-byte Folded Reload
-; RV64NOFUSION-NEXT:    ld s0, 16(sp) # 8-byte Folded Reload
-; RV64NOFUSION-NEXT:    ld s1, 8(sp) # 8-byte Folded Reload
-; RV64NOFUSION-NEXT:    ld s2, 0(sp) # 8-byte Folded Reload
-; RV64NOFUSION-NEXT:    addi sp, sp, 32
-; RV64NOFUSION-NEXT:    ret
-;
-; RV32FUSION-LABEL: test_la_tls_gd:
-; RV32FUSION:       # %bb.0: # %entry
-; RV32FUSION-NEXT:    addi sp, sp, -16
-; RV32FUSION-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32FUSION-NEXT:    sw s0, 8(sp) # 4-byte Folded Spill
-; RV32FUSION-NEXT:    sw s1, 4(sp) # 4-byte Folded Spill
-; RV32FUSION-NEXT:    sw s2, 0(sp) # 4-byte Folded Spill
-; RV32FUSION-NEXT:    mv s0, a0
-; RV32FUSION-NEXT:    li s2, 0
-; RV32FUSION-NEXT:  .Lpcrel_hi3:
-; RV32FUSION-NEXT:    auipc s1, %tls_gd_pcrel_hi(gd)
-; RV32FUSION-NEXT:    addi s1, s1, %pcrel_lo(.Lpcrel_hi3)
-; RV32FUSION-NEXT:  .LBB3_1: # %loop
-; RV32FUSION-NEXT:    # =>This Inner Loop Header: Depth=1
-; RV32FUSION-NEXT:    mv a0, s1
-; RV32FUSION-NEXT:    call __tls_get_addr
-; RV32FUSION-NEXT:    lw zero, 0(a0)
-; RV32FUSION-NEXT:    addi s2, s2, 1
-; RV32FUSION-NEXT:    blt s2, s0, .LBB3_1
-; RV32FUSION-NEXT:  # %bb.2: # %ret
-; RV32FUSION-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32FUSION-NEXT:    lw s0, 8(sp) # 4-byte Folded Reload
-; RV32FUSION-NEXT:    lw s1, 4(sp) # 4-byte Folded Reload
-; RV32FUSION-NEXT:    lw s2, 0(sp) # 4-byte Folded Reload
-; RV32FUSION-NEXT:    addi sp, sp, 16
-; RV32FUSION-NEXT:    ret
+; RV32I-LABEL: test_la_tls_gd:
+; RV32I:       # %bb.0: # %entry
+; RV32I-NEXT:    addi sp, sp, -16
+; RV32I-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
+; RV32I-NEXT:    sw s0, 8(sp) # 4-byte Folded Spill
+; RV32I-NEXT:    sw s1, 4(sp) # 4-byte Folded Spill
+; RV32I-NEXT:    sw s2, 0(sp) # 4-byte Folded Spill
+; RV32I-NEXT:    mv s0, a0
+; RV32I-NEXT:    li s2, 0
+; RV32I-NEXT:  .Lpcrel_hi3:
+; RV32I-NEXT:    auipc s1, %tls_gd_pcrel_hi(gd)
+; RV32I-NEXT:    addi s1, s1, %pcrel_lo(.Lpcrel_hi3)
+; RV32I-NEXT:  .LBB3_1: # %loop
+; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
+; RV32I-NEXT:    mv a0, s1
+; RV32I-NEXT:    call __tls_get_addr
+; RV32I-NEXT:    lw zero, 0(a0)
+; RV32I-NEXT:    addi s2, s2, 1
+; RV32I-NEXT:    blt s2, s0, .LBB3_1
+; RV32I-NEXT:  # %bb.2: # %ret
+; RV32I-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
+; RV32I-NEXT:    lw s0, 8(sp) # 4-byte Folded Reload
+; RV32I-NEXT:    lw s1, 4(sp) # 4-byte Folded Reload
+; RV32I-NEXT:    lw s2, 0(sp) # 4-byte Folded Reload
+; RV32I-NEXT:    addi sp, sp, 16
+; RV32I-NEXT:    ret
 ;
-; RV64FUSION-LABEL: test_la_tls_gd:
-; RV64FUSION:       # %bb.0: # %entry
-; RV64FUSION-NEXT:    addi sp, sp, -32
-; RV64FUSION-NEXT:    sd ra, 24(sp) # 8-byte Folded Spill
-; RV64FUSION-NEXT:    sd s0, 16(sp) # 8-byte Folded Spill
-; RV64FUSION-NEXT:    sd s1, 8(sp) # 8-byte Folded Spill
-; RV64FUSION-NEXT:    sd s2, 0(sp) # 8-byte Folded Spill
-; RV64FUSION-NEXT:    mv s0, a0
-; RV64FUSION-NEXT:    li s2, 0
-; RV64FUSION-NEXT:  .Lpcrel_hi3:
-; RV64FUSION-NEXT:    auipc s1, %tls_gd_pcrel_hi(gd)
-; RV64FUSION-NEXT:    addi s1, s1, %pcrel_lo(.Lpcrel_hi3)
-; RV64FUSION-NEXT:  .LBB3_1: # %loop
-; RV64FUSION-NEXT:    # =>This Inner Loop Header: Depth=1
-; RV64FUSION-NEXT:    mv a0, s1
-; RV64FUSION-NEXT:    call __tls_get_addr
-; RV64FUSION-NEXT:    lw zero, 0(a0)
-; RV64FUSION-NEXT:    addiw s2, s2, 1
-; RV64FUSION-NEXT:    blt s2, s0, .LBB3_1
-; RV64FUSION-NEXT:  # %bb.2: # %ret
-; RV64FUSION-NEXT:    ld ra, 24(sp) # 8-byte Folded Reload
-; RV64FUSION-NEXT:    ld s0, 16(sp) # 8-byte Folded Reload
-; RV64FUSION-NEXT:    ld s1, 8(sp) # 8-byte Folded Reload
-; RV64FUSION-NEXT:    ld s2, 0(sp) # 8-byte Folded Reload
-; RV64FUSION-NEXT:    addi sp, sp, 32
-; RV64FUSION-NEXT:    ret
+; RV64I-LABEL: test_la_tls_gd:
+; RV64I:       # %bb.0: # %entry
+; RV64I-NEXT:    addi sp, sp, -32
+; RV64I-NEXT:    sd ra, 24(sp) # 8-byte Folded Spill
+; RV64I-NEXT:    sd s0, 16(sp) # 8-byte Folded Spill
+; RV64I-NEXT:    sd s1, 8(sp) # 8-byte Folded Spill
+; RV64I-NEXT:    sd s2, 0(sp) # 8-byte Folded Spill
+; RV64I-NEXT:    mv s0, a0
+; RV64I-NEXT:    li s2, 0
+; RV64I-NEXT:  .Lpcrel_hi3:
+; RV64I-NEXT:    auipc s1, %tls_gd_pcrel_hi(gd)
+; RV64I-NEXT:    addi s1, s1, %pcrel_lo(.Lpcrel_hi3)
+; RV64I-NEXT:  .LBB3_1: # %loop
+; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
+; RV64I-NEXT:    mv a0, s1
+; RV64I-NEXT:    call __tls_get_addr
+; RV64I-NEXT:    lw zero, 0(a0)
+; RV64I-NEXT:    addiw s2, s2, 1
+; RV64I-NEXT:    blt s2, s0, .LBB3_1
+; RV64I-NEXT:  # %bb.2: # %ret
+; RV64I-NEXT:    ld ra, 24(sp) # 8-byte Folded Reload
+; RV64I-NEXT:    ld s0, 16(sp) # 8-byte Folded Reload
+; RV64I-NEXT:    ld s1, 8(sp) # 8-byte Folded Reload
+; RV64I-NEXT:    ld s2, 0(sp) # 8-byte Folded Reload
+; RV64I-NEXT:    addi sp, sp, 32
+; RV64I-NEXT:    ret
 entry:
   br label %loop
 
@@ -265,3 +211,8 @@ loop:
 ret:
   ret void
 }
+;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
+; RV32FUSION: {{.*}}
+; RV32NOFUSION: {{.*}}
+; RV64FUSION: {{.*}}
+; RV64NOFUSION: {{.*}}
diff --git a/llvm/test/CodeGen/RISCV/rv32xtheadbb.ll b/llvm/test/CodeGen/RISCV/rv32xtheadbb.ll
index 1360a29a3e10f..aa02d46c34550 100644
--- a/llvm/test/CodeGen/RISCV/rv32xtheadbb.ll
+++ b/llvm/test/CodeGen/RISCV/rv32xtheadbb.ll
@@ -238,8 +238,8 @@ define i64 @cttz_i64(i64 %a) nounwind {
 ; RV32I-NEXT:  # %bb.1: # %cond.false
 ; RV32I-NEXT:    neg a1, a0
 ; RV32I-NEXT:    and a1, a0, a1
-; RV32I-NEXT:    lui a2, 30667
-; RV32I-NEXT:    addi s2, a2, 1329
+; RV32I-NEXT:    lui s2, 30667
+; RV32I-NEXT:    addi s2, s2, 1329
 ; RV32I-NEXT:    ...
[truncated]

Collaborator

@topperc topperc left a comment


LGTM

@preames preames merged commit 58df9b1 into llvm:main Aug 27, 2025
12 checks passed
@preames preames deleted the pr-riscv-fusion-hint-by-default branch August 27, 2025 22:59
@asb
Contributor

asb commented Sep 6, 2025

Obviously I know this has landed already etc, but out of curiosity I looked at the impact on an rva22u64 llvm-test-suite build. Interestingly, there's a minor reduction in static instruction count (190840 insertions(+), 196374 deletions(-)), and looking at the causes, these are all cases involving function calls, looking something like:

-       auipc   a0, %pcrel_hi(.L.str.5)
-       addi    a1, a0, %pcrel_lo(.Lpcrel_hi17)
+       auipc   a1, %pcrel_hi(.L.str.5)
+       addi    a1, a1, %pcrel_lo(.Lpcrel_hi17)
        mv      a2, sp
-       mv      a0, s1
        call    __isoc99_fscanf

or similarly:

-       auipc   a0, %pcrel_hi(.L.str.33)
-       addi    a1, a0, %pcrel_lo(.Lpcrel_hi42)
+       auipc   a1, %pcrel_hi(.L.str.33)
+       addi    a1, a1, %pcrel_lo(.Lpcrel_hi42)
        li      a2, 60
-       mv      a0, s1
        call    memcpy

i.e. before, the register allocator dirtied a0 and then had to do additional register shuffling for the following function call, when it could have used a different register at no cost. It does make me wonder if there are other cases where the register allocator is choosing to use more registers than appears necessary...

