
[RISCV] Rematerialize load #73924

Open · wants to merge 4 commits into main
Conversation

niwinanto
Contributor

Rematerializing loads is disabled for the RISC-V backend, which can cause performance issues such as the one described here: https://discourse.llvm.org/t/how-to-copy-propagate-physical-register-introduced-before-ra/74828

This patch enables rematerialization of loads to address the use case discussed in the above thread.
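For illustration, here is a minimal, self-contained sketch of the safety scan this patch performs (not LLVM code; `Instr` and `canRematLoad` are hypothetical stand-ins for `MachineInstr` and the patch's loop): a load's result may be rematerialized at its single use only if no call and no potentially aliasing store occurs between the definition and the use within the same basic block.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified model of a machine instruction;
// real LLVM code queries MachineInstr::isCall()/mayStore().
struct Instr {
  bool isCall = false;
  bool mayStore = false;
};

// Conservative check mirroring the patch's loop: walk the
// instructions strictly between the defining load (defIdx) and its
// single use (useIdx); reject rematerialization if anything in
// between could modify memory or clobber state via a call.
bool canRematLoad(const std::vector<Instr> &block, std::size_t defIdx,
                  std::size_t useIdx) {
  for (std::size_t i = defIdx + 1; i < useIdx && i < block.size(); ++i)
    if (block[i].isCall || block[i].mayStore)
      return false;
  return true;
}
```

Any intervening `mayStore` instruction is treated as potentially aliasing the loaded location, which matches the conservative direction the review below converges on.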

@llvmbot
Collaborator

llvmbot commented Nov 30, 2023

@llvm/pr-subscribers-llvm-regalloc

@llvm/pr-subscribers-backend-risc-v

Author: Niwin Anto (niwinanto)

Changes



Patch is 211.24 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/73924.diff

20 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVInstrInfo.cpp (+48)
  • (modified) llvm/lib/Target/RISCV/RISCVInstrInfo.h (+1)
  • (modified) llvm/lib/Target/RISCV/RISCVInstrInfo.td (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/callee-saved-gprs.ll (+216-216)
  • (modified) llvm/test/CodeGen/RISCV/calling-conv-half.ll (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/ctlz-cttz-ctpop.ll (+44-44)
  • (modified) llvm/test/CodeGen/RISCV/ctz_zero_return_test.ll (+38-40)
  • (modified) llvm/test/CodeGen/RISCV/fastcc-without-f-reg.ll (+544-608)
  • (modified) llvm/test/CodeGen/RISCV/forced-atomics.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/nontemporal.ll (+160-160)
  • (modified) llvm/test/CodeGen/RISCV/pr64645.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/push-pop-popret.ll (+384-384)
  • (added) llvm/test/CodeGen/RISCV/remat-stack-load-aggressive.ll (+62)
  • (modified) llvm/test/CodeGen/RISCV/rv32xtheadbb.ll (+20-21)
  • (modified) llvm/test/CodeGen/RISCV/rv32zbb.ll (+20-21)
  • (modified) llvm/test/CodeGen/RISCV/rvv/no-reserved-frame.ll (+10-11)
  • (modified) llvm/test/CodeGen/RISCV/rvv/rvv-out-arguments.ll (+12-12)
  • (modified) llvm/test/CodeGen/RISCV/srem-seteq-illegal-types.ll (+3-3)
  • (modified) llvm/test/CodeGen/RISCV/wide-scalar-shift-by-byte-multiple-legalization.ll (+132-132)
  • (modified) llvm/test/CodeGen/RISCV/wide-scalar-shift-legalization.ll (+88-88)
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp b/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
index 6c5712dc795bc75..6e427ec32816b74 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
@@ -21,6 +21,7 @@
 #include "llvm/Analysis/MemoryLocation.h"
 #include "llvm/CodeGen/LiveIntervals.h"
 #include "llvm/CodeGen/LiveVariables.h"
+#include "llvm/CodeGen/MachineBasicBlock.h"
 #include "llvm/CodeGen/MachineCombinerPattern.h"
 #include "llvm/CodeGen/MachineFunctionPass.h"
 #include "llvm/CodeGen/MachineInstrBuilder.h"
@@ -46,6 +47,11 @@ static cl::opt<bool> PreferWholeRegisterMove(
     "riscv-prefer-whole-register-move", cl::init(false), cl::Hidden,
     cl::desc("Prefer whole register move for vector registers."));
 
+static cl::opt<bool>
+    AggressiveLoadRemat("riscv-enable-load-remat-aggressive", cl::init(true),
+                        cl::Hidden,
+                        cl::desc("Rematerialize load aggressively"));
+
 static cl::opt<MachineTraceStrategy> ForceMachineCombinerStrategy(
     "riscv-force-machine-combiner-strategy", cl::Hidden,
     cl::desc("Force machine combiner to use a specific strategy for machine "
@@ -1567,6 +1573,48 @@ bool RISCVInstrInfo::isAsCheapAsAMove(const MachineInstr &MI) const {
   return MI.isAsCheapAsAMove();
 }
 
+bool RISCVInstrInfo::isReallyTriviallyReMaterializable(
+    const MachineInstr &MI) const {
+  if (TargetInstrInfo::isReallyTriviallyReMaterializable(MI))
+    return true;
+
+  const MachineFunction &MF = *MI.getMF();
+  const MachineRegisterInfo &MRI = MF.getRegInfo();
+
+  const MachineOperand &Dest = MI.getOperand(0);
+  if (!MRI.hasOneUse(Dest.getReg()))
+    return false;
+
+  MachineInstr *UseMI = &*MRI.use_instr_begin(Dest.getReg());
+  MachineBasicBlock::const_iterator DefItr(MI);
+  MachineBasicBlock::const_iterator UseItr(UseMI);
+
+  const MachineBasicBlock *MBB = nullptr;
+  if ((MBB = MI.getParent()) != UseMI->getParent())
+    return false;
+
+  // When loading from stack and the stack slot is not modified before its use,
+  // then materialize this load.
+  int FrameIdx = 0;
+  if (isLoadFromStackSlot(MI, FrameIdx) && AggressiveLoadRemat) {
+    for (; DefItr != UseItr && DefItr != MBB->end(); DefItr++) {
+      int StoreFrameIdx = 0;
+      if ((*DefItr).isCall() || (isStoreToStackSlot(*DefItr, StoreFrameIdx) &&
+                                 StoreFrameIdx == FrameIdx))
+        return false;
+    }
+    return true;
+  } else if (MI.mayLoad() && AggressiveLoadRemat) {
+    for (; DefItr != UseItr && DefItr != MBB->end(); DefItr++) {
+      if ((*DefItr).isCall() || (*DefItr).mayStore())
+        return false;
+    }
+    return true;
+  }
+
+  return false;
+}
+
 std::optional<DestSourcePair>
 RISCVInstrInfo::isCopyInstrImpl(const MachineInstr &MI) const {
   if (MI.isMoveReg())
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfo.h b/llvm/lib/Target/RISCV/RISCVInstrInfo.h
index 8f860077c303170..c572281edde49da 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfo.h
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfo.h
@@ -138,6 +138,7 @@ class RISCVInstrInfo : public RISCVGenInstrInfo {
                                bool) const override;
 
   bool isAsCheapAsAMove(const MachineInstr &MI) const override;
+  bool isReallyTriviallyReMaterializable(const MachineInstr &MI) const override;
 
   std::optional<DestSourcePair>
   isCopyInstrImpl(const MachineInstr &MI) const override;
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfo.td b/llvm/lib/Target/RISCV/RISCVInstrInfo.td
index edc08187d8f775a..78128a4a89892df 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfo.td
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfo.td
@@ -626,7 +626,7 @@ def BGE  : BranchCC_rri<0b101, "bge">;
 def BLTU : BranchCC_rri<0b110, "bltu">;
 def BGEU : BranchCC_rri<0b111, "bgeu">;
 
-let IsSignExtendingOpW = 1 in {
+let IsSignExtendingOpW = 1, isReMaterializable = 1, isAsCheapAsAMove = 1 in {
 def LB  : Load_ri<0b000, "lb">, Sched<[WriteLDB, ReadMemBase]>;
 def LH  : Load_ri<0b001, "lh">, Sched<[WriteLDH, ReadMemBase]>;
 def LW  : Load_ri<0b010, "lw">, Sched<[WriteLDW, ReadMemBase]>;
diff --git a/llvm/test/CodeGen/RISCV/callee-saved-gprs.ll b/llvm/test/CodeGen/RISCV/callee-saved-gprs.ll
index 09ecbbc7e8feb81..201850060fe04a4 100644
--- a/llvm/test/CodeGen/RISCV/callee-saved-gprs.ll
+++ b/llvm/test/CodeGen/RISCV/callee-saved-gprs.ll
@@ -50,16 +50,16 @@ define void @callee() nounwind {
 ; RV32I-NEXT:    sw s9, 36(sp) # 4-byte Folded Spill
 ; RV32I-NEXT:    sw s10, 32(sp) # 4-byte Folded Spill
 ; RV32I-NEXT:    sw s11, 28(sp) # 4-byte Folded Spill
-; RV32I-NEXT:    lui a6, %hi(var)
-; RV32I-NEXT:    lw a0, %lo(var)(a6)
+; RV32I-NEXT:    lui a4, %hi(var)
+; RV32I-NEXT:    lw a0, %lo(var)(a4)
 ; RV32I-NEXT:    sw a0, 24(sp) # 4-byte Folded Spill
-; RV32I-NEXT:    lw a0, %lo(var+4)(a6)
+; RV32I-NEXT:    lw a0, %lo(var+4)(a4)
 ; RV32I-NEXT:    sw a0, 20(sp) # 4-byte Folded Spill
-; RV32I-NEXT:    lw a0, %lo(var+8)(a6)
+; RV32I-NEXT:    lw a0, %lo(var+8)(a4)
 ; RV32I-NEXT:    sw a0, 16(sp) # 4-byte Folded Spill
-; RV32I-NEXT:    lw a0, %lo(var+12)(a6)
+; RV32I-NEXT:    lw a0, %lo(var+12)(a4)
 ; RV32I-NEXT:    sw a0, 12(sp) # 4-byte Folded Spill
-; RV32I-NEXT:    addi a5, a6, %lo(var)
+; RV32I-NEXT:    addi a5, a4, %lo(var)
 ; RV32I-NEXT:    lw a0, 16(a5)
 ; RV32I-NEXT:    sw a0, 8(sp) # 4-byte Folded Spill
 ; RV32I-NEXT:    lw a0, 20(a5)
@@ -84,18 +84,18 @@ define void @callee() nounwind {
 ; RV32I-NEXT:    lw s10, 92(a5)
 ; RV32I-NEXT:    lw s11, 96(a5)
 ; RV32I-NEXT:    lw ra, 100(a5)
-; RV32I-NEXT:    lw a7, 104(a5)
-; RV32I-NEXT:    lw a4, 108(a5)
-; RV32I-NEXT:    lw a0, 124(a5)
-; RV32I-NEXT:    lw a1, 120(a5)
-; RV32I-NEXT:    lw a2, 116(a5)
-; RV32I-NEXT:    lw a3, 112(a5)
-; RV32I-NEXT:    sw a0, 124(a5)
-; RV32I-NEXT:    sw a1, 120(a5)
-; RV32I-NEXT:    sw a2, 116(a5)
-; RV32I-NEXT:    sw a3, 112(a5)
-; RV32I-NEXT:    sw a4, 108(a5)
-; RV32I-NEXT:    sw a7, 104(a5)
+; RV32I-NEXT:    lw a6, 104(a5)
+; RV32I-NEXT:    lw a3, 108(a5)
+; RV32I-NEXT:    lw a7, 124(a5)
+; RV32I-NEXT:    lw a0, 120(a5)
+; RV32I-NEXT:    lw a1, 116(a5)
+; RV32I-NEXT:    lw a2, 112(a5)
+; RV32I-NEXT:    sw a7, 124(a5)
+; RV32I-NEXT:    sw a0, 120(a5)
+; RV32I-NEXT:    sw a1, 116(a5)
+; RV32I-NEXT:    sw a2, 112(a5)
+; RV32I-NEXT:    sw a3, 108(a5)
+; RV32I-NEXT:    sw a6, 104(a5)
 ; RV32I-NEXT:    sw ra, 100(a5)
 ; RV32I-NEXT:    sw s11, 96(a5)
 ; RV32I-NEXT:    sw s10, 92(a5)
@@ -121,13 +121,13 @@ define void @callee() nounwind {
 ; RV32I-NEXT:    lw a0, 8(sp) # 4-byte Folded Reload
 ; RV32I-NEXT:    sw a0, 16(a5)
 ; RV32I-NEXT:    lw a0, 12(sp) # 4-byte Folded Reload
-; RV32I-NEXT:    sw a0, %lo(var+12)(a6)
+; RV32I-NEXT:    sw a0, %lo(var+12)(a4)
 ; RV32I-NEXT:    lw a0, 16(sp) # 4-byte Folded Reload
-; RV32I-NEXT:    sw a0, %lo(var+8)(a6)
+; RV32I-NEXT:    sw a0, %lo(var+8)(a4)
 ; RV32I-NEXT:    lw a0, 20(sp) # 4-byte Folded Reload
-; RV32I-NEXT:    sw a0, %lo(var+4)(a6)
+; RV32I-NEXT:    sw a0, %lo(var+4)(a4)
 ; RV32I-NEXT:    lw a0, 24(sp) # 4-byte Folded Reload
-; RV32I-NEXT:    sw a0, %lo(var)(a6)
+; RV32I-NEXT:    sw a0, %lo(var)(a4)
 ; RV32I-NEXT:    lw ra, 76(sp) # 4-byte Folded Reload
 ; RV32I-NEXT:    lw s0, 72(sp) # 4-byte Folded Reload
 ; RV32I-NEXT:    lw s1, 68(sp) # 4-byte Folded Reload
@@ -161,16 +161,16 @@ define void @callee() nounwind {
 ; RV32I-WITH-FP-NEXT:    sw s10, 32(sp) # 4-byte Folded Spill
 ; RV32I-WITH-FP-NEXT:    sw s11, 28(sp) # 4-byte Folded Spill
 ; RV32I-WITH-FP-NEXT:    addi s0, sp, 80
-; RV32I-WITH-FP-NEXT:    lui a6, %hi(var)
-; RV32I-WITH-FP-NEXT:    lw a0, %lo(var)(a6)
+; RV32I-WITH-FP-NEXT:    lui a4, %hi(var)
+; RV32I-WITH-FP-NEXT:    lw a0, %lo(var)(a4)
 ; RV32I-WITH-FP-NEXT:    sw a0, -56(s0) # 4-byte Folded Spill
-; RV32I-WITH-FP-NEXT:    lw a0, %lo(var+4)(a6)
+; RV32I-WITH-FP-NEXT:    lw a0, %lo(var+4)(a4)
 ; RV32I-WITH-FP-NEXT:    sw a0, -60(s0) # 4-byte Folded Spill
-; RV32I-WITH-FP-NEXT:    lw a0, %lo(var+8)(a6)
+; RV32I-WITH-FP-NEXT:    lw a0, %lo(var+8)(a4)
 ; RV32I-WITH-FP-NEXT:    sw a0, -64(s0) # 4-byte Folded Spill
-; RV32I-WITH-FP-NEXT:    lw a0, %lo(var+12)(a6)
+; RV32I-WITH-FP-NEXT:    lw a0, %lo(var+12)(a4)
 ; RV32I-WITH-FP-NEXT:    sw a0, -68(s0) # 4-byte Folded Spill
-; RV32I-WITH-FP-NEXT:    addi a5, a6, %lo(var)
+; RV32I-WITH-FP-NEXT:    addi a5, a4, %lo(var)
 ; RV32I-WITH-FP-NEXT:    lw a0, 16(a5)
 ; RV32I-WITH-FP-NEXT:    sw a0, -72(s0) # 4-byte Folded Spill
 ; RV32I-WITH-FP-NEXT:    lw a0, 20(a5)
@@ -195,20 +195,20 @@ define void @callee() nounwind {
 ; RV32I-WITH-FP-NEXT:    lw s10, 88(a5)
 ; RV32I-WITH-FP-NEXT:    lw s11, 92(a5)
 ; RV32I-WITH-FP-NEXT:    lw ra, 96(a5)
-; RV32I-WITH-FP-NEXT:    lw t0, 100(a5)
-; RV32I-WITH-FP-NEXT:    lw a7, 104(a5)
-; RV32I-WITH-FP-NEXT:    lw a4, 108(a5)
-; RV32I-WITH-FP-NEXT:    lw a0, 124(a5)
-; RV32I-WITH-FP-NEXT:    lw a1, 120(a5)
-; RV32I-WITH-FP-NEXT:    lw a2, 116(a5)
-; RV32I-WITH-FP-NEXT:    lw a3, 112(a5)
-; RV32I-WITH-FP-NEXT:    sw a0, 124(a5)
-; RV32I-WITH-FP-NEXT:    sw a1, 120(a5)
-; RV32I-WITH-FP-NEXT:    sw a2, 116(a5)
-; RV32I-WITH-FP-NEXT:    sw a3, 112(a5)
-; RV32I-WITH-FP-NEXT:    sw a4, 108(a5)
-; RV32I-WITH-FP-NEXT:    sw a7, 104(a5)
-; RV32I-WITH-FP-NEXT:    sw t0, 100(a5)
+; RV32I-WITH-FP-NEXT:    lw a7, 100(a5)
+; RV32I-WITH-FP-NEXT:    lw a6, 104(a5)
+; RV32I-WITH-FP-NEXT:    lw a3, 108(a5)
+; RV32I-WITH-FP-NEXT:    lw t0, 124(a5)
+; RV32I-WITH-FP-NEXT:    lw a0, 120(a5)
+; RV32I-WITH-FP-NEXT:    lw a1, 116(a5)
+; RV32I-WITH-FP-NEXT:    lw a2, 112(a5)
+; RV32I-WITH-FP-NEXT:    sw t0, 124(a5)
+; RV32I-WITH-FP-NEXT:    sw a0, 120(a5)
+; RV32I-WITH-FP-NEXT:    sw a1, 116(a5)
+; RV32I-WITH-FP-NEXT:    sw a2, 112(a5)
+; RV32I-WITH-FP-NEXT:    sw a3, 108(a5)
+; RV32I-WITH-FP-NEXT:    sw a6, 104(a5)
+; RV32I-WITH-FP-NEXT:    sw a7, 100(a5)
 ; RV32I-WITH-FP-NEXT:    sw ra, 96(a5)
 ; RV32I-WITH-FP-NEXT:    sw s11, 92(a5)
 ; RV32I-WITH-FP-NEXT:    sw s10, 88(a5)
@@ -234,13 +234,13 @@ define void @callee() nounwind {
 ; RV32I-WITH-FP-NEXT:    lw a0, -72(s0) # 4-byte Folded Reload
 ; RV32I-WITH-FP-NEXT:    sw a0, 16(a5)
 ; RV32I-WITH-FP-NEXT:    lw a0, -68(s0) # 4-byte Folded Reload
-; RV32I-WITH-FP-NEXT:    sw a0, %lo(var+12)(a6)
+; RV32I-WITH-FP-NEXT:    sw a0, %lo(var+12)(a4)
 ; RV32I-WITH-FP-NEXT:    lw a0, -64(s0) # 4-byte Folded Reload
-; RV32I-WITH-FP-NEXT:    sw a0, %lo(var+8)(a6)
+; RV32I-WITH-FP-NEXT:    sw a0, %lo(var+8)(a4)
 ; RV32I-WITH-FP-NEXT:    lw a0, -60(s0) # 4-byte Folded Reload
-; RV32I-WITH-FP-NEXT:    sw a0, %lo(var+4)(a6)
+; RV32I-WITH-FP-NEXT:    sw a0, %lo(var+4)(a4)
 ; RV32I-WITH-FP-NEXT:    lw a0, -56(s0) # 4-byte Folded Reload
-; RV32I-WITH-FP-NEXT:    sw a0, %lo(var)(a6)
+; RV32I-WITH-FP-NEXT:    sw a0, %lo(var)(a4)
 ; RV32I-WITH-FP-NEXT:    lw ra, 76(sp) # 4-byte Folded Reload
 ; RV32I-WITH-FP-NEXT:    lw s0, 72(sp) # 4-byte Folded Reload
 ; RV32I-WITH-FP-NEXT:    lw s1, 68(sp) # 4-byte Folded Reload
@@ -260,16 +260,16 @@ define void @callee() nounwind {
 ; RV32IZCMP-LABEL: callee:
 ; RV32IZCMP:       # %bb.0:
 ; RV32IZCMP-NEXT:    cm.push {ra, s0-s11}, -96
-; RV32IZCMP-NEXT:    lui a6, %hi(var)
-; RV32IZCMP-NEXT:    lw a0, %lo(var)(a6)
+; RV32IZCMP-NEXT:    lui a4, %hi(var)
+; RV32IZCMP-NEXT:    lw a0, %lo(var)(a4)
 ; RV32IZCMP-NEXT:    sw a0, 28(sp) # 4-byte Folded Spill
-; RV32IZCMP-NEXT:    lw a0, %lo(var+4)(a6)
+; RV32IZCMP-NEXT:    lw a0, %lo(var+4)(a4)
 ; RV32IZCMP-NEXT:    sw a0, 24(sp) # 4-byte Folded Spill
-; RV32IZCMP-NEXT:    lw a0, %lo(var+8)(a6)
+; RV32IZCMP-NEXT:    lw a0, %lo(var+8)(a4)
 ; RV32IZCMP-NEXT:    sw a0, 20(sp) # 4-byte Folded Spill
-; RV32IZCMP-NEXT:    lw a0, %lo(var+12)(a6)
+; RV32IZCMP-NEXT:    lw a0, %lo(var+12)(a4)
 ; RV32IZCMP-NEXT:    sw a0, 16(sp) # 4-byte Folded Spill
-; RV32IZCMP-NEXT:    addi a5, a6, %lo(var)
+; RV32IZCMP-NEXT:    addi a5, a4, %lo(var)
 ; RV32IZCMP-NEXT:    lw a0, 16(a5)
 ; RV32IZCMP-NEXT:    sw a0, 12(sp) # 4-byte Folded Spill
 ; RV32IZCMP-NEXT:    lw a0, 20(a5)
@@ -289,28 +289,28 @@ define void @callee() nounwind {
 ; RV32IZCMP-NEXT:    lw s11, 72(a5)
 ; RV32IZCMP-NEXT:    lw ra, 76(a5)
 ; RV32IZCMP-NEXT:    lw s1, 80(a5)
-; RV32IZCMP-NEXT:    lw t3, 84(a5)
-; RV32IZCMP-NEXT:    lw t2, 88(a5)
-; RV32IZCMP-NEXT:    lw t1, 92(a5)
-; RV32IZCMP-NEXT:    lw t0, 96(a5)
+; RV32IZCMP-NEXT:    lw t2, 84(a5)
+; RV32IZCMP-NEXT:    lw t1, 88(a5)
+; RV32IZCMP-NEXT:    lw t0, 92(a5)
+; RV32IZCMP-NEXT:    lw a7, 96(a5)
 ; RV32IZCMP-NEXT:    lw s0, 100(a5)
-; RV32IZCMP-NEXT:    lw a7, 104(a5)
-; RV32IZCMP-NEXT:    lw a4, 108(a5)
-; RV32IZCMP-NEXT:    lw a0, 124(a5)
-; RV32IZCMP-NEXT:    lw a1, 120(a5)
-; RV32IZCMP-NEXT:    lw a2, 116(a5)
-; RV32IZCMP-NEXT:    lw a3, 112(a5)
-; RV32IZCMP-NEXT:    sw a0, 124(a5)
-; RV32IZCMP-NEXT:    sw a1, 120(a5)
-; RV32IZCMP-NEXT:    sw a2, 116(a5)
-; RV32IZCMP-NEXT:    sw a3, 112(a5)
-; RV32IZCMP-NEXT:    sw a4, 108(a5)
-; RV32IZCMP-NEXT:    sw a7, 104(a5)
+; RV32IZCMP-NEXT:    lw a6, 104(a5)
+; RV32IZCMP-NEXT:    lw a3, 108(a5)
+; RV32IZCMP-NEXT:    lw t3, 124(a5)
+; RV32IZCMP-NEXT:    lw a0, 120(a5)
+; RV32IZCMP-NEXT:    lw a1, 116(a5)
+; RV32IZCMP-NEXT:    lw a2, 112(a5)
+; RV32IZCMP-NEXT:    sw t3, 124(a5)
+; RV32IZCMP-NEXT:    sw a0, 120(a5)
+; RV32IZCMP-NEXT:    sw a1, 116(a5)
+; RV32IZCMP-NEXT:    sw a2, 112(a5)
+; RV32IZCMP-NEXT:    sw a3, 108(a5)
+; RV32IZCMP-NEXT:    sw a6, 104(a5)
 ; RV32IZCMP-NEXT:    sw s0, 100(a5)
-; RV32IZCMP-NEXT:    sw t0, 96(a5)
-; RV32IZCMP-NEXT:    sw t1, 92(a5)
-; RV32IZCMP-NEXT:    sw t2, 88(a5)
-; RV32IZCMP-NEXT:    sw t3, 84(a5)
+; RV32IZCMP-NEXT:    sw a7, 96(a5)
+; RV32IZCMP-NEXT:    sw t0, 92(a5)
+; RV32IZCMP-NEXT:    sw t1, 88(a5)
+; RV32IZCMP-NEXT:    sw t2, 84(a5)
 ; RV32IZCMP-NEXT:    sw s1, 80(a5)
 ; RV32IZCMP-NEXT:    sw ra, 76(a5)
 ; RV32IZCMP-NEXT:    sw s11, 72(a5)
@@ -331,13 +331,13 @@ define void @callee() nounwind {
 ; RV32IZCMP-NEXT:    lw a0, 12(sp) # 4-byte Folded Reload
 ; RV32IZCMP-NEXT:    sw a0, 16(a5)
 ; RV32IZCMP-NEXT:    lw a0, 16(sp) # 4-byte Folded Reload
-; RV32IZCMP-NEXT:    sw a0, %lo(var+12)(a6)
+; RV32IZCMP-NEXT:    sw a0, %lo(var+12)(a4)
 ; RV32IZCMP-NEXT:    lw a0, 20(sp) # 4-byte Folded Reload
-; RV32IZCMP-NEXT:    sw a0, %lo(var+8)(a6)
+; RV32IZCMP-NEXT:    sw a0, %lo(var+8)(a4)
 ; RV32IZCMP-NEXT:    lw a0, 24(sp) # 4-byte Folded Reload
-; RV32IZCMP-NEXT:    sw a0, %lo(var+4)(a6)
+; RV32IZCMP-NEXT:    sw a0, %lo(var+4)(a4)
 ; RV32IZCMP-NEXT:    lw a0, 28(sp) # 4-byte Folded Reload
-; RV32IZCMP-NEXT:    sw a0, %lo(var)(a6)
+; RV32IZCMP-NEXT:    sw a0, %lo(var)(a4)
 ; RV32IZCMP-NEXT:    cm.popret {ra, s0-s11}, 96
 ;
 ; RV32IZCMP-WITH-FP-LABEL: callee:
@@ -357,16 +357,16 @@ define void @callee() nounwind {
 ; RV32IZCMP-WITH-FP-NEXT:    sw s10, 32(sp) # 4-byte Folded Spill
 ; RV32IZCMP-WITH-FP-NEXT:    sw s11, 28(sp) # 4-byte Folded Spill
 ; RV32IZCMP-WITH-FP-NEXT:    addi s0, sp, 80
-; RV32IZCMP-WITH-FP-NEXT:    lui a6, %hi(var)
-; RV32IZCMP-WITH-FP-NEXT:    lw a0, %lo(var)(a6)
+; RV32IZCMP-WITH-FP-NEXT:    lui a4, %hi(var)
+; RV32IZCMP-WITH-FP-NEXT:    lw a0, %lo(var)(a4)
 ; RV32IZCMP-WITH-FP-NEXT:    sw a0, -56(s0) # 4-byte Folded Spill
-; RV32IZCMP-WITH-FP-NEXT:    lw a0, %lo(var+4)(a6)
+; RV32IZCMP-WITH-FP-NEXT:    lw a0, %lo(var+4)(a4)
 ; RV32IZCMP-WITH-FP-NEXT:    sw a0, -60(s0) # 4-byte Folded Spill
-; RV32IZCMP-WITH-FP-NEXT:    lw a0, %lo(var+8)(a6)
+; RV32IZCMP-WITH-FP-NEXT:    lw a0, %lo(var+8)(a4)
 ; RV32IZCMP-WITH-FP-NEXT:    sw a0, -64(s0) # 4-byte Folded Spill
-; RV32IZCMP-WITH-FP-NEXT:    lw a0, %lo(var+12)(a6)
+; RV32IZCMP-WITH-FP-NEXT:    lw a0, %lo(var+12)(a4)
 ; RV32IZCMP-WITH-FP-NEXT:    sw a0, -68(s0) # 4-byte Folded Spill
-; RV32IZCMP-WITH-FP-NEXT:    addi a5, a6, %lo(var)
+; RV32IZCMP-WITH-FP-NEXT:    addi a5, a4, %lo(var)
 ; RV32IZCMP-WITH-FP-NEXT:    lw a0, 16(a5)
 ; RV32IZCMP-WITH-FP-NEXT:    sw a0, -72(s0) # 4-byte Folded Spill
 ; RV32IZCMP-WITH-FP-NEXT:    lw a0, 20(a5)
@@ -386,30 +386,30 @@ define void @callee() nounwind {
 ; RV32IZCMP-WITH-FP-NEXT:    lw s10, 68(a5)
 ; RV32IZCMP-WITH-FP-NEXT:    lw s11, 72(a5)
 ; RV32IZCMP-WITH-FP-NEXT:    lw ra, 76(a5)
-; RV32IZCMP-WITH-FP-NEXT:    lw t4, 80(a5)
-; RV32IZCMP-WITH-FP-NEXT:    lw t3, 84(a5)
-; RV32IZCMP-WITH-FP-NEXT:    lw t2, 88(a5)
+; RV32IZCMP-WITH-FP-NEXT:    lw t3, 80(a5)
+; RV32IZCMP-WITH-FP-NEXT:    lw t2, 84(a5)
+; RV32IZCMP-WITH-FP-NEXT:    lw t1, 88(a5)
 ; RV32IZCMP-WITH-FP-NEXT:    lw s1, 92(a5)
-; RV32IZCMP-WITH-FP-NEXT:    lw t1, 96(a5)
-; RV32IZCMP-WITH-FP-NEXT:    lw t0, 100(a5)
-; RV32IZCMP-WITH-FP-NEXT:    lw a7, 104(a5)
-; RV32IZCMP-WITH-FP-NEXT:    lw a4, 108(a5)
-; RV32IZCMP-WITH-FP-NEXT:    lw a0, 124(a5)
-; RV32IZCMP-WITH-FP-NEXT:    lw a1, 120(a5)
-; RV32IZCMP-WITH-FP-NEXT:    lw a2, 116(a5)
-; RV32IZCMP-WITH-FP-NEXT:    lw a3, 112(a5)
-; RV32IZCMP-WITH-FP-NEXT:    sw a0, 124(a5)
-; RV32IZCMP-WITH-FP-NEXT:    sw a1, 120(a5)
-; RV32IZCMP-WITH-FP-NEXT:    sw a2, 116(a5)
-; RV32IZCMP-WITH-FP-NEXT:    sw a3, 112(a5)
-; RV32IZCMP-WITH-FP-NEXT:    sw a4, 108(a5)
-; RV32IZCMP-WITH-FP-NEXT:    sw a7, 104(a5)
-; RV32IZCMP-WITH-FP-NEXT:    sw t0, 100(a5)
-; RV32IZCMP-WITH-FP-NEXT:    sw t1, 96(a5)
+; RV32IZCMP-WITH-FP-NEXT:    lw t0, 96(a5)
+; RV32IZCMP-WITH-FP-NEXT:    lw a7, 100(a5)
+; RV32IZCMP-WITH-FP-NEXT:    lw a6, 104(a5)
+; RV32IZCMP-WITH-FP-NEXT:    lw a3, 108(a5)
+; RV32IZCMP-WITH-FP-NEXT:    lw t4, 124(a5)
+; RV32IZCMP-WITH-FP-NEXT:    lw a0, 120(a5)
+; RV32IZCMP-WITH-FP-NEXT:    lw a1, 116(a5)
+; RV32IZCMP-WITH-FP-NEXT:    lw a2, 112(a5)
+; RV32IZCMP-WITH-FP-NEXT:    sw t4, 124(a5)
+; RV32IZCMP-WITH-FP-NEXT:    sw a0, 120(a5)
+; RV32IZCMP-WITH-FP-NEXT:    sw a1, 116(a5)
+; RV32IZCMP-WITH-FP-NEXT:    sw a2, 112(a5)
+; RV32IZCMP-WITH-FP-NEXT:    sw a3, 108(a5)
+; RV32IZCMP-WITH-FP-NEXT:    sw a6, 104(a5)
+; RV32IZCMP-WITH-FP-NEXT:    sw a7, 100(a5)
+; RV32IZCMP-WITH-FP-NEXT:    sw t0, 96(a5)
 ; RV32IZCMP-WITH-FP-NEXT:    sw s1, 92(a5)
-; RV32IZCMP-WITH-FP-NEXT:    sw t2, 88(a5)
-; RV32IZCMP-WITH-FP-NEXT:    sw t3, 84(a5)
-; RV32IZCMP-WITH-FP-NEXT:    sw t4, 80(a5)
+; RV32IZCMP-WITH-FP-NEXT:    sw t1, 88(a5)
+; RV32IZCMP-WITH-FP-NEXT:    sw t2, 84(a5)
+; RV32IZCMP-WITH-FP-NEXT:    sw t3, 80(a5)
 ; RV32IZCMP-WITH-FP-NEXT:    sw ra, 76(a5)
 ; RV32IZCMP-WITH-FP-NEXT:    sw s11, 72(a5)
 ; RV32IZCMP-WITH-FP-NEXT:    sw s10, 68(a5)
@@ -430,13 +430,13 @@ define void @callee() nounwind {
 ; RV32IZCMP-WITH-FP-NEXT:    lw a0, -72(s0) # 4-byte Folded Reload
 ; RV32IZCMP-WITH-FP-NEXT:    sw a0, 16(a5)
 ; RV32IZCMP-WITH-FP-NEXT:    lw a0, -68(s0) # 4-byte Folded Reload
-; RV32IZCMP-WITH-FP-NEXT:    sw a0, %lo(var+12)(a6)
+; RV32IZCMP-WITH-FP-NEXT:    sw a0, %lo(var+12)(a4)
 ; RV32IZCMP-WITH-FP-NEXT:    lw a0, -64(s0) # 4-byte Folded Reload
-; RV32IZCMP-WITH-FP-NEXT:    sw a0, %lo(var+8)(a6)
+; RV32IZCMP-WITH-FP-NEXT:    sw a0, %lo(var+8)(a4)
 ; RV32IZCMP-WITH-FP-NEXT:    lw a0, -60(s0) # 4-byte Folded Reload
-; RV32IZCMP-WITH-FP-NEXT:    sw a0, %lo(var+4)(a6)
+; RV32IZCMP-WITH-FP-NEXT:    sw a0, %lo(var+4)(a4)
 ; RV32IZCMP-WITH-FP-NEXT:    lw a0, -56(s0) # 4-byte Folded Reload
-; RV32IZCMP-WITH-FP-NEXT:    sw a0, %lo(var)(a6)
+; RV32IZCMP-WITH-FP-NEXT:    sw a0, %lo(var)(a4)
 ; RV32IZCMP-WITH-FP-NEXT:    lw ra, 76(sp) # 4-byte Folded Reload
 ; RV32IZCMP-WITH-FP-NEXT:    lw s0, 72(sp) # 4-byte Folded Reload
 ; RV32IZCMP-WITH-FP-NEXT:    lw s1, 68(sp) # 4-byte Folded Reload
@@ -469,16 +469,16 @@ define void @callee() nounwind {
 ; RV64I-NEXT:    sd s9, 72(sp) # 8-byte Folded Spill
 ; RV64I-NEXT:    sd s10, 64(sp) # 8-byte Folded Spill
 ; RV64I-NEXT:    sd s11, 56(sp) # 8-byte Folded Spill
-; RV64I-NEXT:    lui a6, %hi(var)
-; RV64I-NEXT:    lw a0, %lo(var)(a6)
+; RV64I-NEXT:    lui a4, %hi(var)
+; RV64I-NEXT:    lw a0, %lo(var)(a4)
 ; RV64I-NEXT:    sd a0, 48(sp) # 8-byte Folded Spill
-; RV64I-NEXT:    lw a0, %lo(var+4)(a6)
+; RV64I-NEXT:    lw a0, %lo(var+4)(a4)
 ; RV64I-NEXT:    sd a0, 40(sp) # 8-byte Folded Spill
-; RV64I-NEXT:    lw a0, %lo(var+8)(a6)
+; RV64I-NEXT:    lw a0, %lo(var+8)(a4)
 ; RV64I-NEXT:    sd a0, 32(sp) # 8-byte Folded Spill
-; RV64I-NEXT:    lw a0, %lo(var+12)(a6)
+; RV64I-NEXT:    lw a0, %lo(var+12)(a4)
 ; RV64I-NEXT:    sd a0, 24(sp) # 8-byte Folded Spill
-; RV64I-NEXT:    addi a5, a6, %lo(var)
+; RV64I-NEXT:    addi a5, a4, %lo(var)
 ; RV64I-NEXT:    lw a0, 16(a5)
 ; RV64I-NEXT:    sd a0, 16(sp) # 8-byte Folded Spill
 ; RV64I-NEXT:    lw a0, 20(a5)
@@ -503,18 +503,18 @@ define void @callee() nounwind {
 ; RV64I-NEXT:    lw s10, 92(a5)
 ; RV64I-NEXT:    lw s11, 96(a5)
 ; RV64I-NEXT:    lw ra, 100(a5)
-; RV64I-NEXT:    lw a7, 104(a5)
...
[truncated]

if (isLoadFromStackSlot(MI, FrameIdx) && AggressiveLoadRemat) {
for (; DefItr != UseItr && DefItr != MBB->end(); DefItr++) {
int StoreFrameIdx = 0;
if ((*DefItr).isCall() || (isStoreToStackSlot(*DefItr, StoreFrameIdx) &&
Contributor Author

@topperc Copied from the stale PR: _What if there is a store, but we don't know exactly where it stores to? Don't we need to conservatively assume it could write to the slot we're loading?_

Contributor Author

@niwinanto niwinanto Nov 30, 2023

@topperc You are right; in that case the special handling for stack loads duplicates the generic load case, so I am removing this part.

@@ -626,7 +626,7 @@ def BGE : BranchCC_rri<0b101, "bge">;
def BLTU : BranchCC_rri<0b110, "bltu">;
def BGEU : BranchCC_rri<0b111, "bgeu">;

let IsSignExtendingOpW = 1 in {
let IsSignExtendingOpW = 1, isReMaterializable = 1, isAsCheapAsAMove = 1 in {
Collaborator

isAsCheapAsAMove definitely isn't true for loads

Contributor Author

@jrtc27 Yeah, it doesn't make sense for isAsCheapAsAMove to be true for loads. However, to achieve the use case I would need to force RegisterCoalescer::reMaterializeTrivialDef to rematerialize loads even for instructions that are not asCheapAsAMove. That could affect other targets as well, and I am not sure about its side effects. Maybe I can introduce a target hook in TII (e.g. allowAggressiveRemat) so targets can decide. In general, though, I don't see a problem with this remat. What do you think?
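The target-hook idea floated above could look roughly like the following self-contained sketch. All names here are hypothetical stand-ins (the real TargetInstrInfo interface differs): a virtual predicate defaults to false, so only targets that override it let the coalescer rematerialize defs that are not as cheap as a move.

```cpp
#include <cassert>

// Hypothetical, minimal stand-ins for LLVM classes, for illustration only.
struct MachineInstr {
  bool cheapAsMove = false;
};

struct TargetInstrInfo {
  // Proposed hook: targets opt in to rematerializing defs that are not
  // as cheap as a move. The conservative default keeps current behavior.
  virtual bool allowAggressiveRemat(const MachineInstr &) const {
    return false;
  }
  virtual ~TargetInstrInfo() = default;
};

struct RISCVInstrInfo : TargetInstrInfo {
  bool allowAggressiveRemat(const MachineInstr &) const override {
    return true;
  }
};

// Simplified version of the decision the coalescer would make.
bool shouldRemat(const TargetInstrInfo &TII, const MachineInstr &MI) {
  return MI.cheapAsMove || TII.allowAggressiveRemat(MI);
}
```

With this shape, a target that does nothing sees no change, which addresses the concern about side effects on other targets.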


github-actions bot commented Dec 12, 2023

✅ With the latest revision this PR passed the C/C++ code formatter.

@@ -626,7 +626,7 @@ def BGE : BranchCC_rri<0b101, "bge">;
def BLTU : BranchCC_rri<0b110, "bltu">;
def BGEU : BranchCC_rri<0b111, "bgeu">;

let IsSignExtendingOpW = 1 in {
let IsSignExtendingOpW = 1, isReMaterializable = 1 in {
Collaborator

You missed ld

Contributor Author

I tried to include ld as you suggested, but rv64-stackmap.ll fails with "error: ran out of registers during register allocation". Specifically, this test case:

; Spilled stack map values.
;
; Verify 28 stack map entries.
;
; CHECK-LABEL:  .word   .L{{.*}}-spilledValue
; CHECK-NEXT:   .half   0   
; CHECK-NEXT:   .half   28  
;
; Check that at least one is a spilled entry from RBP.
; Location: Indirect RBP + ... 
; CHECK:        .byte   3   
; CHECK-NEXT:   .byte   0   
; CHECK-NEXT:   .half   8   
; CHECK-NEXT:   .half   2   
; CHECK-NEXT:   .half   0   
; CHECK-NEXT:   .word
define void @spilledValue(i64 %arg0, i64 %arg1, i64 %arg2, i64 %arg3, i64 %arg4, i64 %l0, i64 %l1, i64 %l2, i64 %l3, i64 %l4, i64 %l5, i64 %l6, i64 %l7, i64 %l8, i64 %l9, i64 %l10, i64 %l11, i64 %l12, i64 %l13, i64 %l14, i64 %l15, i64 %l16, i64 %l17, i64 %l18, i64 %l19, i64 %l20, i64 %l21, i64 %l22, i64 %l23, i64 %l24, i64 %l25, i64 %l26, i64 %l27) { 
entry:
  call void (i64, i32, i8*, i32, ...) @llvm.experimental.patchpoint.void(i64 11, i32 28, i8* null, i32 5, i64 %arg0, i64 %arg1, i64 %arg2, i64 %arg3, i64 %arg4, i64 %l0, i64 %l1, i64 %l2, i64 %l3, i64 %l4, i64 %l5, i64 %l6, i64 %l7, i64 %l8, i64 %l9, i64 %l10, i64 %l11, i64 %l12, i64 %l13, i64 %l14, i64 %l15, i64 %l16, i64 %l17, i64 %l18, i64 %l19, i64 %l20, i64 %l21, i64 %l22, i64 %l23, i64 %l24, i64 %l25, i64 %l26, i64 %l27)
  ret void
}

declare void @llvm.experimental.stackmap(i64, i32, ...)
declare void @llvm.experimental.patchpoint.void(i64, i32, i8*, i32, ...)
declare i64 @llvm.experimental.patchpoint.i64(i64, i32, i8*, i32, ...)

I would like to investigate a bit more to understand what is going wrong.

Contributor Author

MachineInstr *UseMI = &*MRI.use_instr_begin(Dest.getReg());
MachineBasicBlock::const_iterator DefItr(MI);
MachineBasicBlock::const_iterator UseItr(UseMI);
const MachineBasicBlock *MBB = nullptr;
Collaborator

Why is this initialized to nullptr instead of MI.getParent()?

Contributor Author

I will update.

Contributor

@arsenm arsenm left a comment

I don't understand why RISCV is overloading this in the first place; it looks like it has augmented a bunch of generic heuristics without any real target-specific handling.

@LiqinWeng
Contributor

Have you tested the relevant benchmarks, such as SPEC?

@niwinanto
Contributor Author

I don't understand why RISCV is overloading this in the first place; it looks like it has augmented a bunch of generic heuristics without any real target-specific handling.

So, do you mean I should move this to a generic function?

@niwinanto
Contributor Author

Have you tested the relevant benchmarks, such as SPEC?

There was a slight improvement in CoreMark (ported to our experimental RISC-V target). I have not tried SPEC; I do not have access to it. But I can try other benchmarks: https://llvm.org/docs/TestSuiteGuide.html

@LiqinWeng
Contributor

Have you tested the relevant benchmarks, such as SPEC?

There was a slight improvement in CoreMark (ported to our experimental RISC-V target). I have not tried SPEC; I do not have access to it. But I can try other benchmarks: https://llvm.org/docs/TestSuiteGuide.html

I will test it with the SPEC h264ref case. Approximately how many points did CoreMark improve?

@@ -1567,6 +1573,36 @@ bool RISCVInstrInfo::isAsCheapAsAMove(const MachineInstr &MI) const {
return MI.isAsCheapAsAMove();
}

bool RISCVInstrInfo::isReallyTriviallyReMaterializable(
Contributor

This is a correctness predicate and you've added performance heuristics; this should go somewhere else
