
Conversation

preames
Collaborator

@preames preames commented Aug 27, 2025

This block of code is currently conditional on the fusions being enabled but as far as I can tell, does no harm to generally enable. The net effect is the generically compiled code runs slightly better on machines with this fusion.

The actual motivation is merely to stop confusing myself when I see the sequence in code; the register allocator's choice to sometimes blow two registers instead of one is just generally weird, and my eyes spot it when scanning disassembly.

(Note that this is just the regalloc hint; the scheduling changes remain conditional, and probably should remain so.)

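For illustration, the effect of the hint can be seen directly in the test updates in the diff below (e.g. `vararg.ll`): without the hint, the register allocator may place the LUI result in a scratch register, producing a two-register sequence that cannot match the lui+addi fusion pattern.

```
# Before (no hint): LUI result lands in a0, so both a0 and a3
# are consumed and the pair does not match the fusion pattern.
lui   a0, 5
addi  a3, a0, -480

# After (with hint): a single register is reused, matching the
# lui+addi fusion shape.
lui   a3, 5
addi  a3, a3, -480
```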
@llvmbot
Member

llvmbot commented Aug 27, 2025

@llvm/pr-subscribers-llvm-globalisel

@llvm/pr-subscribers-backend-risc-v

Author: Philip Reames (preames)

Changes

This block of code is currently conditional on the fusions being enabled but as far as I can tell, does no harm to generally enable. The net effect is the generically compiled code runs slightly better on machines with this fusion.

The actual motivation is merely to stop confusing myself when I see the sequence in code; the register allocator's choice to sometimes blow two registers instead of one is just generally weird, and my eyes spot it when scanning disassembly.

(Note that this is just the regalloc hint; the scheduling changes remain conditional, and probably should remain so.)


Patch is 29.02 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/155693.diff

17 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVRegisterInfo.cpp (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/GlobalISel/vararg.ll (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/ctlz-cttz-ctpop.ll (+8-8)
  • (modified) llvm/test/CodeGen/RISCV/ctz_zero_return_test.ll (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/double-convert.ll (+2-3)
  • (modified) llvm/test/CodeGen/RISCV/float-convert.ll (+8-8)
  • (modified) llvm/test/CodeGen/RISCV/half-convert.ll (+16-16)
  • (modified) llvm/test/CodeGen/RISCV/imm.ll (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/machinelicm-address-pseudos.ll (+57-106)
  • (modified) llvm/test/CodeGen/RISCV/rv32xtheadbb.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/rv32zbb.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/shl-cttz.ll (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/srem-vector-lkk.ll (+14-14)
  • (modified) llvm/test/CodeGen/RISCV/urem-vector-lkk.ll (+10-10)
  • (modified) llvm/test/CodeGen/RISCV/xqccmp-additional-stack.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/xqcibm-cto-clo-brev.ll (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/zcmp-additional-stack.ll (+2-2)
diff --git a/llvm/lib/Target/RISCV/RISCVRegisterInfo.cpp b/llvm/lib/Target/RISCV/RISCVRegisterInfo.cpp
index f3966a55ce7d1..40b641680b2ce 100644
--- a/llvm/lib/Target/RISCV/RISCVRegisterInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVRegisterInfo.cpp
@@ -966,7 +966,9 @@ bool RISCVRegisterInfo::getRegAllocationHints(
       }
     }
 
-    // Add a hint if it would allow auipc/lui+addi(w) fusion.
+    // Add a hint if it would allow auipc/lui+addi(w) fusion.  We do this even
+    // without the fusions explicitly enabled as the impact is rarely negative
+    // and some cores do implement this fusion.
     if ((MI.getOpcode() == RISCV::ADDIW || MI.getOpcode() == RISCV::ADDI) &&
         MI.getOperand(1).isReg()) {
       const MachineBasicBlock &MBB = *MI.getParent();
@@ -974,9 +976,7 @@ bool RISCVRegisterInfo::getRegAllocationHints(
       // Is the previous instruction a LUI or AUIPC that can be fused?
       if (I != MBB.begin()) {
         I = skipDebugInstructionsBackward(std::prev(I), MBB.begin());
-        if (((I->getOpcode() == RISCV::LUI && Subtarget.hasLUIADDIFusion()) ||
-             (I->getOpcode() == RISCV::AUIPC &&
-              Subtarget.hasAUIPCADDIFusion())) &&
+        if ((I->getOpcode() == RISCV::LUI || I->getOpcode() == RISCV::AUIPC) &&
             I->getOperand(0).getReg() == MI.getOperand(1).getReg()) {
           if (OpIdx == 0)
             tryAddHint(MO, MI.getOperand(1), /*NeedGPRC=*/false);
diff --git a/llvm/test/CodeGen/RISCV/GlobalISel/vararg.ll b/llvm/test/CodeGen/RISCV/GlobalISel/vararg.ll
index afef96db5e290..1ec80a4978699 100644
--- a/llvm/test/CodeGen/RISCV/GlobalISel/vararg.ll
+++ b/llvm/test/CodeGen/RISCV/GlobalISel/vararg.ll
@@ -1155,8 +1155,8 @@ define void @va3_caller() nounwind {
 ; RV32:       # %bb.0:
 ; RV32-NEXT:    addi sp, sp, -16
 ; RV32-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32-NEXT:    lui a0, 5
-; RV32-NEXT:    addi a3, a0, -480
+; RV32-NEXT:    lui a3, 5
+; RV32-NEXT:    addi a3, a3, -480
 ; RV32-NEXT:    li a0, 2
 ; RV32-NEXT:    li a1, 1111
 ; RV32-NEXT:    li a2, 0
@@ -1184,8 +1184,8 @@ define void @va3_caller() nounwind {
 ; RV32-WITHFP-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
 ; RV32-WITHFP-NEXT:    sw s0, 8(sp) # 4-byte Folded Spill
 ; RV32-WITHFP-NEXT:    addi s0, sp, 16
-; RV32-WITHFP-NEXT:    lui a0, 5
-; RV32-WITHFP-NEXT:    addi a3, a0, -480
+; RV32-WITHFP-NEXT:    lui a3, 5
+; RV32-WITHFP-NEXT:    addi a3, a3, -480
 ; RV32-WITHFP-NEXT:    li a0, 2
 ; RV32-WITHFP-NEXT:    li a1, 1111
 ; RV32-WITHFP-NEXT:    li a2, 0
diff --git a/llvm/test/CodeGen/RISCV/ctlz-cttz-ctpop.ll b/llvm/test/CodeGen/RISCV/ctlz-cttz-ctpop.ll
index 530980c13116c..908a12331d1bb 100644
--- a/llvm/test/CodeGen/RISCV/ctlz-cttz-ctpop.ll
+++ b/llvm/test/CodeGen/RISCV/ctlz-cttz-ctpop.ll
@@ -393,8 +393,8 @@ define i64 @test_cttz_i64(i64 %a) nounwind {
 ; RV32I-NEXT:  # %bb.1: # %cond.false
 ; RV32I-NEXT:    neg a1, a0
 ; RV32I-NEXT:    and a1, a0, a1
-; RV32I-NEXT:    lui a2, 30667
-; RV32I-NEXT:    addi s2, a2, 1329
+; RV32I-NEXT:    lui s2, 30667
+; RV32I-NEXT:    addi s2, s2, 1329
 ; RV32I-NEXT:    mv s4, a0
 ; RV32I-NEXT:    mv a0, a1
 ; RV32I-NEXT:    mv a1, s2
@@ -460,8 +460,8 @@ define i64 @test_cttz_i64(i64 %a) nounwind {
 ; RV32M-NEXT:    or a2, a0, a1
 ; RV32M-NEXT:    beqz a2, .LBB3_3
 ; RV32M-NEXT:  # %bb.1: # %cond.false
-; RV32M-NEXT:    lui a2, 30667
-; RV32M-NEXT:    addi a3, a2, 1329
+; RV32M-NEXT:    lui a3, 30667
+; RV32M-NEXT:    addi a3, a3, 1329
 ; RV32M-NEXT:    lui a2, %hi(.LCPI3_0)
 ; RV32M-NEXT:    addi a2, a2, %lo(.LCPI3_0)
 ; RV32M-NEXT:    bnez a0, .LBB3_4
@@ -847,8 +847,8 @@ define i64 @test_cttz_i64_zero_undef(i64 %a) nounwind {
 ; RV32I-NEXT:    mv s2, a0
 ; RV32I-NEXT:    neg a0, a0
 ; RV32I-NEXT:    and a0, s2, a0
-; RV32I-NEXT:    lui a1, 30667
-; RV32I-NEXT:    addi s3, a1, 1329
+; RV32I-NEXT:    lui s3, 30667
+; RV32I-NEXT:    addi s3, s3, 1329
 ; RV32I-NEXT:    mv a1, s3
 ; RV32I-NEXT:    call __mulsi3
 ; RV32I-NEXT:    mv s0, a0
@@ -900,8 +900,8 @@ define i64 @test_cttz_i64_zero_undef(i64 %a) nounwind {
 ;
 ; RV32M-LABEL: test_cttz_i64_zero_undef:
 ; RV32M:       # %bb.0:
-; RV32M-NEXT:    lui a2, 30667
-; RV32M-NEXT:    addi a3, a2, 1329
+; RV32M-NEXT:    lui a3, 30667
+; RV32M-NEXT:    addi a3, a3, 1329
 ; RV32M-NEXT:    lui a2, %hi(.LCPI7_0)
 ; RV32M-NEXT:    addi a2, a2, %lo(.LCPI7_0)
 ; RV32M-NEXT:    bnez a0, .LBB7_2
diff --git a/llvm/test/CodeGen/RISCV/ctz_zero_return_test.ll b/llvm/test/CodeGen/RISCV/ctz_zero_return_test.ll
index a1061fbbbbf02..29de02af09c8f 100644
--- a/llvm/test/CodeGen/RISCV/ctz_zero_return_test.ll
+++ b/llvm/test/CodeGen/RISCV/ctz_zero_return_test.ll
@@ -43,8 +43,8 @@ define signext i32 @ctz_dereferencing_pointer(ptr %b) nounwind {
 ; RV32I-NEXT:    lw s4, 4(a0)
 ; RV32I-NEXT:    neg a0, s2
 ; RV32I-NEXT:    and a0, s2, a0
-; RV32I-NEXT:    lui a1, 30667
-; RV32I-NEXT:    addi s1, a1, 1329
+; RV32I-NEXT:    lui s1, 30667
+; RV32I-NEXT:    addi s1, s1, 1329
 ; RV32I-NEXT:    mv a1, s1
 ; RV32I-NEXT:    call __mulsi3
 ; RV32I-NEXT:    mv s0, a0
@@ -563,8 +563,8 @@ define signext i32 @ctz4(i64 %b) nounwind {
 ; RV32I-NEXT:    mv s0, a0
 ; RV32I-NEXT:    neg a0, a0
 ; RV32I-NEXT:    and a0, s0, a0
-; RV32I-NEXT:    lui a1, 30667
-; RV32I-NEXT:    addi s3, a1, 1329
+; RV32I-NEXT:    lui s3, 30667
+; RV32I-NEXT:    addi s3, s3, 1329
 ; RV32I-NEXT:    mv a1, s3
 ; RV32I-NEXT:    call __mulsi3
 ; RV32I-NEXT:    mv s1, a0
diff --git a/llvm/test/CodeGen/RISCV/double-convert.ll b/llvm/test/CodeGen/RISCV/double-convert.ll
index 9c81bc2851347..8124d00e63fa7 100644
--- a/llvm/test/CodeGen/RISCV/double-convert.ll
+++ b/llvm/test/CodeGen/RISCV/double-convert.ll
@@ -1691,9 +1691,8 @@ define signext i16 @fcvt_w_s_sat_i16(double %a) nounwind {
 ; RV32I-NEXT:    sw s4, 8(sp) # 4-byte Folded Spill
 ; RV32I-NEXT:    mv s0, a1
 ; RV32I-NEXT:    mv s1, a0
-; RV32I-NEXT:    lui a0, 265728
-; RV32I-NEXT:    addi a3, a0, -64
-; RV32I-NEXT:    mv a0, s1
+; RV32I-NEXT:    lui a3, 265728
+; RV32I-NEXT:    addi a3, a3, -64
 ; RV32I-NEXT:    li a2, 0
 ; RV32I-NEXT:    call __gtdf2
 ; RV32I-NEXT:    mv s2, a0
diff --git a/llvm/test/CodeGen/RISCV/float-convert.ll b/llvm/test/CodeGen/RISCV/float-convert.ll
index 6e49d479cf0b9..72578193ee4bf 100644
--- a/llvm/test/CodeGen/RISCV/float-convert.ll
+++ b/llvm/test/CodeGen/RISCV/float-convert.ll
@@ -1474,8 +1474,8 @@ define signext i16 @fcvt_w_s_sat_i16(float %a) nounwind {
 ; RV32I-NEXT:  # %bb.1: # %start
 ; RV32I-NEXT:    lui s1, 1048568
 ; RV32I-NEXT:  .LBB24_2: # %start
-; RV32I-NEXT:    lui a0, 290816
-; RV32I-NEXT:    addi a1, a0, -512
+; RV32I-NEXT:    lui a1, 290816
+; RV32I-NEXT:    addi a1, a1, -512
 ; RV32I-NEXT:    mv a0, s0
 ; RV32I-NEXT:    call __gtsf2
 ; RV32I-NEXT:    blez a0, .LBB24_4
@@ -1516,8 +1516,8 @@ define signext i16 @fcvt_w_s_sat_i16(float %a) nounwind {
 ; RV64I-NEXT:  # %bb.1: # %start
 ; RV64I-NEXT:    lui s1, 1048568
 ; RV64I-NEXT:  .LBB24_2: # %start
-; RV64I-NEXT:    lui a0, 290816
-; RV64I-NEXT:    addi a1, a0, -512
+; RV64I-NEXT:    lui a1, 290816
+; RV64I-NEXT:    addi a1, a1, -512
 ; RV64I-NEXT:    mv a0, s0
 ; RV64I-NEXT:    call __gtsf2
 ; RV64I-NEXT:    blez a0, .LBB24_4
@@ -1640,8 +1640,8 @@ define zeroext i16 @fcvt_wu_s_sat_i16(float %a) nounwind {
 ; RV32I-NEXT:    mv a0, s2
 ; RV32I-NEXT:    call __fixunssfsi
 ; RV32I-NEXT:    mv s1, a0
-; RV32I-NEXT:    lui a0, 292864
-; RV32I-NEXT:    addi a1, a0, -256
+; RV32I-NEXT:    lui a1, 292864
+; RV32I-NEXT:    addi a1, a1, -256
 ; RV32I-NEXT:    mv a0, s2
 ; RV32I-NEXT:    call __gtsf2
 ; RV32I-NEXT:    lui a1, 16
@@ -1677,8 +1677,8 @@ define zeroext i16 @fcvt_wu_s_sat_i16(float %a) nounwind {
 ; RV64I-NEXT:    mv a0, s2
 ; RV64I-NEXT:    call __fixunssfdi
 ; RV64I-NEXT:    mv s1, a0
-; RV64I-NEXT:    lui a0, 292864
-; RV64I-NEXT:    addi a1, a0, -256
+; RV64I-NEXT:    lui a1, 292864
+; RV64I-NEXT:    addi a1, a1, -256
 ; RV64I-NEXT:    mv a0, s2
 ; RV64I-NEXT:    call __gtsf2
 ; RV64I-NEXT:    lui a1, 16
diff --git a/llvm/test/CodeGen/RISCV/half-convert.ll b/llvm/test/CodeGen/RISCV/half-convert.ll
index 961c6cd78212b..6cebf8b2828bf 100644
--- a/llvm/test/CodeGen/RISCV/half-convert.ll
+++ b/llvm/test/CodeGen/RISCV/half-convert.ll
@@ -328,8 +328,8 @@ define i16 @fcvt_si_h_sat(half %a) nounwind {
 ; RV32I-NEXT:  # %bb.1: # %start
 ; RV32I-NEXT:    lui s1, 1048568
 ; RV32I-NEXT:  .LBB1_2: # %start
-; RV32I-NEXT:    lui a0, 290816
-; RV32I-NEXT:    addi a1, a0, -512
+; RV32I-NEXT:    lui a1, 290816
+; RV32I-NEXT:    addi a1, a1, -512
 ; RV32I-NEXT:    mv a0, s0
 ; RV32I-NEXT:    call __gtsf2
 ; RV32I-NEXT:    blez a0, .LBB1_4
@@ -371,8 +371,8 @@ define i16 @fcvt_si_h_sat(half %a) nounwind {
 ; RV64I-NEXT:  # %bb.1: # %start
 ; RV64I-NEXT:    lui s1, 1048568
 ; RV64I-NEXT:  .LBB1_2: # %start
-; RV64I-NEXT:    lui a0, 290816
-; RV64I-NEXT:    addi a1, a0, -512
+; RV64I-NEXT:    lui a1, 290816
+; RV64I-NEXT:    addi a1, a1, -512
 ; RV64I-NEXT:    mv a0, s0
 ; RV64I-NEXT:    call __gtsf2
 ; RV64I-NEXT:    blez a0, .LBB1_4
@@ -812,8 +812,8 @@ define i16 @fcvt_ui_h_sat(half %a) nounwind {
 ; RV32I-NEXT:    li a1, 0
 ; RV32I-NEXT:    call __gesf2
 ; RV32I-NEXT:    mv s2, a0
-; RV32I-NEXT:    lui a0, 292864
-; RV32I-NEXT:    addi a1, a0, -256
+; RV32I-NEXT:    lui a1, 292864
+; RV32I-NEXT:    addi a1, a1, -256
 ; RV32I-NEXT:    mv a0, s3
 ; RV32I-NEXT:    call __gtsf2
 ; RV32I-NEXT:    bgtz a0, .LBB3_2
@@ -850,8 +850,8 @@ define i16 @fcvt_ui_h_sat(half %a) nounwind {
 ; RV64I-NEXT:    li a1, 0
 ; RV64I-NEXT:    call __gesf2
 ; RV64I-NEXT:    mv s2, a0
-; RV64I-NEXT:    lui a0, 292864
-; RV64I-NEXT:    addi a1, a0, -256
+; RV64I-NEXT:    lui a1, 292864
+; RV64I-NEXT:    addi a1, a1, -256
 ; RV64I-NEXT:    mv a0, s3
 ; RV64I-NEXT:    call __gtsf2
 ; RV64I-NEXT:    bgtz a0, .LBB3_2
@@ -6416,8 +6416,8 @@ define signext i16 @fcvt_w_s_sat_i16(half %a) nounwind {
 ; RV32I-NEXT:  # %bb.1: # %start
 ; RV32I-NEXT:    lui s1, 1048568
 ; RV32I-NEXT:  .LBB32_2: # %start
-; RV32I-NEXT:    lui a0, 290816
-; RV32I-NEXT:    addi a1, a0, -512
+; RV32I-NEXT:    lui a1, 290816
+; RV32I-NEXT:    addi a1, a1, -512
 ; RV32I-NEXT:    mv a0, s0
 ; RV32I-NEXT:    call __gtsf2
 ; RV32I-NEXT:    blez a0, .LBB32_4
@@ -6461,8 +6461,8 @@ define signext i16 @fcvt_w_s_sat_i16(half %a) nounwind {
 ; RV64I-NEXT:  # %bb.1: # %start
 ; RV64I-NEXT:    lui s1, 1048568
 ; RV64I-NEXT:  .LBB32_2: # %start
-; RV64I-NEXT:    lui a0, 290816
-; RV64I-NEXT:    addi a1, a0, -512
+; RV64I-NEXT:    lui a1, 290816
+; RV64I-NEXT:    addi a1, a1, -512
 ; RV64I-NEXT:    mv a0, s0
 ; RV64I-NEXT:    call __gtsf2
 ; RV64I-NEXT:    blez a0, .LBB32_4
@@ -6903,8 +6903,8 @@ define zeroext i16 @fcvt_wu_s_sat_i16(half %a) nounwind {
 ; RV32I-NEXT:    li a1, 0
 ; RV32I-NEXT:    call __gesf2
 ; RV32I-NEXT:    mv s1, a0
-; RV32I-NEXT:    lui a0, 292864
-; RV32I-NEXT:    addi a1, a0, -256
+; RV32I-NEXT:    lui a1, 292864
+; RV32I-NEXT:    addi a1, a1, -256
 ; RV32I-NEXT:    mv a0, s2
 ; RV32I-NEXT:    call __gtsf2
 ; RV32I-NEXT:    blez a0, .LBB34_2
@@ -6944,8 +6944,8 @@ define zeroext i16 @fcvt_wu_s_sat_i16(half %a) nounwind {
 ; RV64I-NEXT:    li a1, 0
 ; RV64I-NEXT:    call __gesf2
 ; RV64I-NEXT:    mv s1, a0
-; RV64I-NEXT:    lui a0, 292864
-; RV64I-NEXT:    addi a1, a0, -256
+; RV64I-NEXT:    lui a1, 292864
+; RV64I-NEXT:    addi a1, a1, -256
 ; RV64I-NEXT:    mv a0, s2
 ; RV64I-NEXT:    call __gtsf2
 ; RV64I-NEXT:    blez a0, .LBB34_2
diff --git a/llvm/test/CodeGen/RISCV/imm.ll b/llvm/test/CodeGen/RISCV/imm.ll
index 8c9a9b43952ba..fad51697264f4 100644
--- a/llvm/test/CodeGen/RISCV/imm.ll
+++ b/llvm/test/CodeGen/RISCV/imm.ll
@@ -837,8 +837,8 @@ define i64 @imm64_5() nounwind {
 define i64 @imm64_6() nounwind {
 ; RV32I-LABEL: imm64_6:
 ; RV32I:       # %bb.0:
-; RV32I-NEXT:    lui a0, 74565
-; RV32I-NEXT:    addi a1, a0, 1656
+; RV32I-NEXT:    lui a1, 74565
+; RV32I-NEXT:    addi a1, a1, 1656
 ; RV32I-NEXT:    li a0, 0
 ; RV32I-NEXT:    ret
 ;
@@ -3895,8 +3895,8 @@ define i64 @imm_neg_10307948543() {
 define i64 @li_rori_1() {
 ; RV32I-LABEL: li_rori_1:
 ; RV32I:       # %bb.0:
-; RV32I-NEXT:    lui a0, 1048567
-; RV32I-NEXT:    addi a1, a0, 2047
+; RV32I-NEXT:    lui a1, 1048567
+; RV32I-NEXT:    addi a1, a1, 2047
 ; RV32I-NEXT:    li a0, -1
 ; RV32I-NEXT:    ret
 ;
diff --git a/llvm/test/CodeGen/RISCV/machinelicm-address-pseudos.ll b/llvm/test/CodeGen/RISCV/machinelicm-address-pseudos.ll
index 8deb17582cb11..2a1ba2f77edee 100644
--- a/llvm/test/CodeGen/RISCV/machinelicm-address-pseudos.ll
+++ b/llvm/test/CodeGen/RISCV/machinelicm-address-pseudos.ll
@@ -145,113 +145,59 @@ ret:
 @gd = external thread_local global i32
 
 define void @test_la_tls_gd(i32 signext %n) nounwind {
-; RV32NOFUSION-LABEL: test_la_tls_gd:
-; RV32NOFUSION:       # %bb.0: # %entry
-; RV32NOFUSION-NEXT:    addi sp, sp, -16
-; RV32NOFUSION-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32NOFUSION-NEXT:    sw s0, 8(sp) # 4-byte Folded Spill
-; RV32NOFUSION-NEXT:    sw s1, 4(sp) # 4-byte Folded Spill
-; RV32NOFUSION-NEXT:    sw s2, 0(sp) # 4-byte Folded Spill
-; RV32NOFUSION-NEXT:    mv s0, a0
-; RV32NOFUSION-NEXT:    li s2, 0
-; RV32NOFUSION-NEXT:  .Lpcrel_hi3:
-; RV32NOFUSION-NEXT:    auipc a0, %tls_gd_pcrel_hi(gd)
-; RV32NOFUSION-NEXT:    addi s1, a0, %pcrel_lo(.Lpcrel_hi3)
-; RV32NOFUSION-NEXT:  .LBB3_1: # %loop
-; RV32NOFUSION-NEXT:    # =>This Inner Loop Header: Depth=1
-; RV32NOFUSION-NEXT:    mv a0, s1
-; RV32NOFUSION-NEXT:    call __tls_get_addr
-; RV32NOFUSION-NEXT:    lw zero, 0(a0)
-; RV32NOFUSION-NEXT:    addi s2, s2, 1
-; RV32NOFUSION-NEXT:    blt s2, s0, .LBB3_1
-; RV32NOFUSION-NEXT:  # %bb.2: # %ret
-; RV32NOFUSION-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32NOFUSION-NEXT:    lw s0, 8(sp) # 4-byte Folded Reload
-; RV32NOFUSION-NEXT:    lw s1, 4(sp) # 4-byte Folded Reload
-; RV32NOFUSION-NEXT:    lw s2, 0(sp) # 4-byte Folded Reload
-; RV32NOFUSION-NEXT:    addi sp, sp, 16
-; RV32NOFUSION-NEXT:    ret
-;
-; RV64NOFUSION-LABEL: test_la_tls_gd:
-; RV64NOFUSION:       # %bb.0: # %entry
-; RV64NOFUSION-NEXT:    addi sp, sp, -32
-; RV64NOFUSION-NEXT:    sd ra, 24(sp) # 8-byte Folded Spill
-; RV64NOFUSION-NEXT:    sd s0, 16(sp) # 8-byte Folded Spill
-; RV64NOFUSION-NEXT:    sd s1, 8(sp) # 8-byte Folded Spill
-; RV64NOFUSION-NEXT:    sd s2, 0(sp) # 8-byte Folded Spill
-; RV64NOFUSION-NEXT:    mv s0, a0
-; RV64NOFUSION-NEXT:    li s2, 0
-; RV64NOFUSION-NEXT:  .Lpcrel_hi3:
-; RV64NOFUSION-NEXT:    auipc a0, %tls_gd_pcrel_hi(gd)
-; RV64NOFUSION-NEXT:    addi s1, a0, %pcrel_lo(.Lpcrel_hi3)
-; RV64NOFUSION-NEXT:  .LBB3_1: # %loop
-; RV64NOFUSION-NEXT:    # =>This Inner Loop Header: Depth=1
-; RV64NOFUSION-NEXT:    mv a0, s1
-; RV64NOFUSION-NEXT:    call __tls_get_addr
-; RV64NOFUSION-NEXT:    lw zero, 0(a0)
-; RV64NOFUSION-NEXT:    addiw s2, s2, 1
-; RV64NOFUSION-NEXT:    blt s2, s0, .LBB3_1
-; RV64NOFUSION-NEXT:  # %bb.2: # %ret
-; RV64NOFUSION-NEXT:    ld ra, 24(sp) # 8-byte Folded Reload
-; RV64NOFUSION-NEXT:    ld s0, 16(sp) # 8-byte Folded Reload
-; RV64NOFUSION-NEXT:    ld s1, 8(sp) # 8-byte Folded Reload
-; RV64NOFUSION-NEXT:    ld s2, 0(sp) # 8-byte Folded Reload
-; RV64NOFUSION-NEXT:    addi sp, sp, 32
-; RV64NOFUSION-NEXT:    ret
-;
-; RV32FUSION-LABEL: test_la_tls_gd:
-; RV32FUSION:       # %bb.0: # %entry
-; RV32FUSION-NEXT:    addi sp, sp, -16
-; RV32FUSION-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32FUSION-NEXT:    sw s0, 8(sp) # 4-byte Folded Spill
-; RV32FUSION-NEXT:    sw s1, 4(sp) # 4-byte Folded Spill
-; RV32FUSION-NEXT:    sw s2, 0(sp) # 4-byte Folded Spill
-; RV32FUSION-NEXT:    mv s0, a0
-; RV32FUSION-NEXT:    li s2, 0
-; RV32FUSION-NEXT:  .Lpcrel_hi3:
-; RV32FUSION-NEXT:    auipc s1, %tls_gd_pcrel_hi(gd)
-; RV32FUSION-NEXT:    addi s1, s1, %pcrel_lo(.Lpcrel_hi3)
-; RV32FUSION-NEXT:  .LBB3_1: # %loop
-; RV32FUSION-NEXT:    # =>This Inner Loop Header: Depth=1
-; RV32FUSION-NEXT:    mv a0, s1
-; RV32FUSION-NEXT:    call __tls_get_addr
-; RV32FUSION-NEXT:    lw zero, 0(a0)
-; RV32FUSION-NEXT:    addi s2, s2, 1
-; RV32FUSION-NEXT:    blt s2, s0, .LBB3_1
-; RV32FUSION-NEXT:  # %bb.2: # %ret
-; RV32FUSION-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32FUSION-NEXT:    lw s0, 8(sp) # 4-byte Folded Reload
-; RV32FUSION-NEXT:    lw s1, 4(sp) # 4-byte Folded Reload
-; RV32FUSION-NEXT:    lw s2, 0(sp) # 4-byte Folded Reload
-; RV32FUSION-NEXT:    addi sp, sp, 16
-; RV32FUSION-NEXT:    ret
+; RV32I-LABEL: test_la_tls_gd:
+; RV32I:       # %bb.0: # %entry
+; RV32I-NEXT:    addi sp, sp, -16
+; RV32I-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
+; RV32I-NEXT:    sw s0, 8(sp) # 4-byte Folded Spill
+; RV32I-NEXT:    sw s1, 4(sp) # 4-byte Folded Spill
+; RV32I-NEXT:    sw s2, 0(sp) # 4-byte Folded Spill
+; RV32I-NEXT:    mv s0, a0
+; RV32I-NEXT:    li s2, 0
+; RV32I-NEXT:  .Lpcrel_hi3:
+; RV32I-NEXT:    auipc s1, %tls_gd_pcrel_hi(gd)
+; RV32I-NEXT:    addi s1, s1, %pcrel_lo(.Lpcrel_hi3)
+; RV32I-NEXT:  .LBB3_1: # %loop
+; RV32I-NEXT:    # =>This Inner Loop Header: Depth=1
+; RV32I-NEXT:    mv a0, s1
+; RV32I-NEXT:    call __tls_get_addr
+; RV32I-NEXT:    lw zero, 0(a0)
+; RV32I-NEXT:    addi s2, s2, 1
+; RV32I-NEXT:    blt s2, s0, .LBB3_1
+; RV32I-NEXT:  # %bb.2: # %ret
+; RV32I-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
+; RV32I-NEXT:    lw s0, 8(sp) # 4-byte Folded Reload
+; RV32I-NEXT:    lw s1, 4(sp) # 4-byte Folded Reload
+; RV32I-NEXT:    lw s2, 0(sp) # 4-byte Folded Reload
+; RV32I-NEXT:    addi sp, sp, 16
+; RV32I-NEXT:    ret
 ;
-; RV64FUSION-LABEL: test_la_tls_gd:
-; RV64FUSION:       # %bb.0: # %entry
-; RV64FUSION-NEXT:    addi sp, sp, -32
-; RV64FUSION-NEXT:    sd ra, 24(sp) # 8-byte Folded Spill
-; RV64FUSION-NEXT:    sd s0, 16(sp) # 8-byte Folded Spill
-; RV64FUSION-NEXT:    sd s1, 8(sp) # 8-byte Folded Spill
-; RV64FUSION-NEXT:    sd s2, 0(sp) # 8-byte Folded Spill
-; RV64FUSION-NEXT:    mv s0, a0
-; RV64FUSION-NEXT:    li s2, 0
-; RV64FUSION-NEXT:  .Lpcrel_hi3:
-; RV64FUSION-NEXT:    auipc s1, %tls_gd_pcrel_hi(gd)
-; RV64FUSION-NEXT:    addi s1, s1, %pcrel_lo(.Lpcrel_hi3)
-; RV64FUSION-NEXT:  .LBB3_1: # %loop
-; RV64FUSION-NEXT:    # =>This Inner Loop Header: Depth=1
-; RV64FUSION-NEXT:    mv a0, s1
-; RV64FUSION-NEXT:    call __tls_get_addr
-; RV64FUSION-NEXT:    lw zero, 0(a0)
-; RV64FUSION-NEXT:    addiw s2, s2, 1
-; RV64FUSION-NEXT:    blt s2, s0, .LBB3_1
-; RV64FUSION-NEXT:  # %bb.2: # %ret
-; RV64FUSION-NEXT:    ld ra, 24(sp) # 8-byte Folded Reload
-; RV64FUSION-NEXT:    ld s0, 16(sp) # 8-byte Folded Reload
-; RV64FUSION-NEXT:    ld s1, 8(sp) # 8-byte Folded Reload
-; RV64FUSION-NEXT:    ld s2, 0(sp) # 8-byte Folded Reload
-; RV64FUSION-NEXT:    addi sp, sp, 32
-; RV64FUSION-NEXT:    ret
+; RV64I-LABEL: test_la_tls_gd:
+; RV64I:       # %bb.0: # %entry
+; RV64I-NEXT:    addi sp, sp, -32
+; RV64I-NEXT:    sd ra, 24(sp) # 8-byte Folded Spill
+; RV64I-NEXT:    sd s0, 16(sp) # 8-byte Folded Spill
+; RV64I-NEXT:    sd s1, 8(sp) # 8-byte Folded Spill
+; RV64I-NEXT:    sd s2, 0(sp) # 8-byte Folded Spill
+; RV64I-NEXT:    mv s0, a0
+; RV64I-NEXT:    li s2, 0
+; RV64I-NEXT:  .Lpcrel_hi3:
+; RV64I-NEXT:    auipc s1, %tls_gd_pcrel_hi(gd)
+; RV64I-NEXT:    addi s1, s1, %pcrel_lo(.Lpcrel_hi3)
+; RV64I-NEXT:  .LBB3_1: # %loop
+; RV64I-NEXT:    # =>This Inner Loop Header: Depth=1
+; RV64I-NEXT:    mv a0, s1
+; RV64I-NEXT:    call __tls_get_addr
+; RV64I-NEXT:    lw zero, 0(a0)
+; RV64I-NEXT:    addiw s2, s2, 1
+; RV64I-NEXT:    blt s2, s0, .LBB3_1
+; RV64I-NEXT:  # %bb.2: # %ret
+; RV64I-NEXT:    ld ra, 24(sp) # 8-byte Folded Reload
+; RV64I-NEXT:    ld s0, 16(sp) # 8-byte Folded Reload
+; RV64I-NEXT:    ld s1, 8(sp) # 8-byte Folded Reload
+; RV64I-NEXT:    ld s2, 0(sp) # 8-byte Folded Reload
+; RV64I-NEXT:    addi sp, sp, 32
+; RV64I-NEXT:    ret
 entry:
   br label %loop
 
@@ -265,3 +211,8 @@ loop:
 ret:
   ret void
 }
+;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
+; RV32FUSION: {{.*}}
+; RV32NOFUSION: {{.*}}
+; RV64FUSION: {{.*}}
+; RV64NOFUSION: {{.*}}
diff --git a/llvm/test/CodeGen/RISCV/rv32xtheadbb.ll b/llvm/test/CodeGen/RISCV/rv32xtheadbb.ll
index 1360a29a3e10f..aa02d46c34550 100644
--- a/llvm/test/CodeGen/RISCV/rv32xtheadbb.ll
+++ b/llvm/test/CodeGen/RISCV/rv32xtheadbb.ll
@@ -238,8 +238,8 @@ define i64 @cttz_i64(i64 %a) nounwind {
 ; RV32I-NEXT:  # %bb.1: # %cond.false
 ; RV32I-NEXT:    neg a1, a0
 ; RV32I-NEXT:    and a1, a0, a1
-; RV32I-NEXT:    lui a2, 30667
-; RV32I-NEXT:    addi s2, a2, 1329
+; RV32I-NEXT:    lui s2, 30667
+; RV32I-NEXT:    addi s2, s2, 1329
 ; RV32I-NEXT:    ...
[truncated]

Collaborator

@topperc topperc left a comment


LGTM

@preames preames merged commit 58df9b1 into llvm:main Aug 27, 2025
12 checks passed
@preames preames deleted the pr-riscv-fusion-hint-by-default branch August 27, 2025 22:59
@asb
Contributor

asb commented Sep 6, 2025

Obviously I know this has landed already etc, but out of curiosity I looked at the impact on an rva22u64 llvm-test-suite build. Interestingly, there's a minor reduction in static instruction count (190840 insertions(+), 196374 deletions(-)), and looking at the causes, these are all cases involving function calls, looking something like:

-       auipc   a0, %pcrel_hi(.L.str.5)
-       addi    a1, a0, %pcrel_lo(.Lpcrel_hi17)
+       auipc   a1, %pcrel_hi(.L.str.5)
+       addi    a1, a1, %pcrel_lo(.Lpcrel_hi17)
        mv      a2, sp
-       mv      a0, s1
        call    __isoc99_fscanf

or similarly:

-       auipc   a0, %pcrel_hi(.L.str.33)
-       addi    a1, a0, %pcrel_lo(.Lpcrel_hi42)
+       auipc   a1, %pcrel_hi(.L.str.33)
+       addi    a1, a1, %pcrel_lo(.Lpcrel_hi42)
        li      a2, 60
-       mv      a0, s1
        call    memcpy

i.e. before, the register allocator dirtied a0 and then had to do additional register shuffling for the following function call, when it could have used a different register at no cost. It does make me wonder if there are other cases where the register allocator is choosing to use more registers than appears necessary...

