
Conversation

lukel97 (Contributor) commented Sep 16, 2025

This aims to fix the issue that caused https://reviews.llvm.org/D106408 to be reverted.

CalcSpillWeights will reduce the weight of an interval by half if it's considered rematerializable, so it will be evicted before others.

It does this by checking TII.isTriviallyReMaterializable. However, rematerialization may still fail if any of the defining MI's uses aren't available at the locations where it needs to be rematerialized. LiveRangeEdit::canRematerializeAt calls allUsesAvailableAt to check this, but CalcSpillWeights doesn't, so the two diverge.

This fixes it by also checking allUsesAvailableAt in CalcSpillWeights.

In practice this has zero change on AArch64/X86-64/RISC-V as measured on llvm-test-suite, but it prevents weights from being perturbed in an upcoming patch which enables more rematerialization by re-attempting https://reviews.llvm.org/D106408.
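
For readers less familiar with the heuristic, here is a small self-contained sketch of the intended interaction. It is a toy model for illustration only, not the actual LLVM API (the real code lives in VirtRegAuxInfo in CalcSpillWeights.cpp); the struct and function names below are made up.

// Toy model: the weight is only halved if the defining instruction is
// trivially rematerializable AND its own register uses are available at
// every point where the value would have to be rematerialized.
#include <vector>

struct IntervalInfo {
  float Weight;                    // weight accumulated from the uses/defs
  bool TriviallyRemat;             // models TII.isTriviallyReMaterializable
  std::vector<bool> UsesAvailable; // models LIS.allUsesAvailableAt per use point
};

float applyRematDiscount(const IntervalInfo &I) {
  bool CanRemat = I.TriviallyRemat;
  for (bool Avail : I.UsesAvailable)
    CanRemat = CanRemat && Avail;  // the extra availability check this patch adds
  return CanRemat ? I.Weight * 0.5f : I.Weight;
}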

llvmbot (Member) commented Sep 16, 2025

@llvm/pr-subscribers-backend-amdgpu

@llvm/pr-subscribers-backend-risc-v

Author: Luke Lau (lukel97)

Changes

This aims to fix the issue that caused https://reviews.llvm.org/D106408 to be reverted.

CalcSpillWeights will reduce the weight of an interval by half if it's considered rematerializable, so it will be evicted before others.

It does this by checking TII.isTriviallyReMaterializable. However, rematerialization may still fail if any of the defining MI's uses aren't available at the locations where it needs to be rematerialized. LiveRangeEdit::canRematerializeAt calls allUsesAvailableAt to check this, but CalcSpillWeights doesn't, so the two diverge.

This fixes it by also checking allUsesAvailableAt in CalcSpillWeights. There may be a better place to share the function than LiveIntervals.

In practice this has zero change on AArch64/X86-64/RISC-V as measured on llvm-test-suite, but it prevents weights from being perturbed in an upcoming patch which enables more rematerialization by re-attempting https://reviews.llvm.org/D106408.


Patch is 154.09 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/159180.diff

17 Files Affected:

  • (modified) llvm/include/llvm/CodeGen/LiveIntervals.h (+4)
  • (modified) llvm/include/llvm/CodeGen/LiveRangeEdit.h (-5)
  • (modified) llvm/lib/CodeGen/CalcSpillWeights.cpp (+11)
  • (modified) llvm/lib/CodeGen/LiveIntervals.cpp (+52)
  • (modified) llvm/lib/CodeGen/LiveRangeEdit.cpp (+2-59)
  • (modified) llvm/test/CodeGen/RISCV/rvv/ctpop-vp.ll (+42-42)
  • (modified) llvm/test/CodeGen/RISCV/rvv/cttz-vp.ll (+43-43)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-ctlz-vp.ll (+96-96)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-ctpop-vp.ll (+26-26)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-cttz-vp.ll (+12-12)
  • (modified) llvm/test/CodeGen/RISCV/rvv/remat.ll (+5-5)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vfdiv-constrained-sdnode.ll (+18-38)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vfma-vp.ll (+485-424)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vfmadd-constrained-sdnode.ll (+62-38)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vfmadd-sdnode.ll (+112-156)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vfmsub-constrained-sdnode.ll (+6-5)
  • (modified) llvm/test/CodeGen/Thumb2/LowOverheadLoops/cond-vector-reduce-mve-codegen.ll (+10-5)
diff --git a/llvm/include/llvm/CodeGen/LiveIntervals.h b/llvm/include/llvm/CodeGen/LiveIntervals.h
index 1050b3daa0f57..6678ebaac333b 100644
--- a/llvm/include/llvm/CodeGen/LiveIntervals.h
+++ b/llvm/include/llvm/CodeGen/LiveIntervals.h
@@ -466,6 +466,10 @@ class LiveIntervals {
   /// have any segments or value numbers.
   LLVM_ABI void constructMainRangeFromSubranges(LiveInterval &LI);
 
+  /// \returns true if all registers used by \p OrigMI at \p OrigIdx are also
+  /// available with the same value at \p UseIdx.
+  bool allUsesAvailableAt(const MachineInstr &MI, SlotIndex UseIdx) const;
+
 private:
   /// Compute live intervals for all virtual registers.
   void computeVirtRegs();
diff --git a/llvm/include/llvm/CodeGen/LiveRangeEdit.h b/llvm/include/llvm/CodeGen/LiveRangeEdit.h
index 3d5df194c71c9..adcca23c24fb3 100644
--- a/llvm/include/llvm/CodeGen/LiveRangeEdit.h
+++ b/llvm/include/llvm/CodeGen/LiveRangeEdit.h
@@ -193,11 +193,6 @@ class LiveRangeEdit : private MachineRegisterInfo::Delegate {
     explicit Remat(const VNInfo *ParentVNI) : ParentVNI(ParentVNI) {}
   };
 
-  /// allUsesAvailableAt - Return true if all registers used by OrigMI at
-  /// OrigIdx are also available with the same value at UseIdx.
-  bool allUsesAvailableAt(const MachineInstr *OrigMI, SlotIndex OrigIdx,
-                          SlotIndex UseIdx) const;
-
   /// canRematerializeAt - Determine if ParentVNI can be rematerialized at
   /// UseIdx. It is assumed that parent_.getVNINfoAt(UseIdx) == ParentVNI.
   bool canRematerializeAt(Remat &RM, VNInfo *OrigVNI, SlotIndex UseIdx);
diff --git a/llvm/lib/CodeGen/CalcSpillWeights.cpp b/llvm/lib/CodeGen/CalcSpillWeights.cpp
index b16694eafd90e..40df8c4415887 100644
--- a/llvm/lib/CodeGen/CalcSpillWeights.cpp
+++ b/llvm/lib/CodeGen/CalcSpillWeights.cpp
@@ -124,6 +124,17 @@ bool VirtRegAuxInfo::isRematerializable(const LiveInterval &LI,
 
     if (!TII.isTriviallyReMaterializable(*MI))
       return false;
+
+    // If MI has register uses, it will only be rematerializable if its uses are
+    // also live at the indices it will be rematerialized at.
+    const MachineRegisterInfo &MRI = MI->getMF()->getRegInfo();
+    for (MachineInstr &Use : MRI.use_instructions(Reg)) {
+      SlotIndex UseIdx = LIS.getInstructionIndex(Use);
+      if (LI.getVNInfoAt(UseIdx) != VNI)
+        continue;
+      if (!LIS.allUsesAvailableAt(*MI, UseIdx))
+        return false;
+    }
   }
   return true;
 }
diff --git a/llvm/lib/CodeGen/LiveIntervals.cpp b/llvm/lib/CodeGen/LiveIntervals.cpp
index 3485a27335f13..24c54fd8ed64b 100644
--- a/llvm/lib/CodeGen/LiveIntervals.cpp
+++ b/llvm/lib/CodeGen/LiveIntervals.cpp
@@ -34,6 +34,7 @@
 #include "llvm/CodeGen/Passes.h"
 #include "llvm/CodeGen/SlotIndexes.h"
 #include "llvm/CodeGen/StackMaps.h"
+#include "llvm/CodeGen/TargetInstrInfo.h"
 #include "llvm/CodeGen/TargetRegisterInfo.h"
 #include "llvm/CodeGen/TargetSubtargetInfo.h"
 #include "llvm/CodeGen/VirtRegMap.h"
@@ -1820,3 +1821,54 @@ void LiveIntervals::constructMainRangeFromSubranges(LiveInterval &LI) {
   LICalc->reset(MF, getSlotIndexes(), DomTree, &getVNInfoAllocator());
   LICalc->constructMainRangeFromSubranges(LI);
 }
+
+bool LiveIntervals::allUsesAvailableAt(const MachineInstr &MI,
+                                       SlotIndex UseIdx) const {
+  SlotIndex OrigIdx = getInstructionIndex(MI).getRegSlot(true);
+  UseIdx = std::max(UseIdx, UseIdx.getRegSlot(true));
+  for (const MachineOperand &MO : MI.operands()) {
+    if (!MO.isReg() || !MO.getReg() || !MO.readsReg())
+      continue;
+
+    // We can't remat physreg uses, unless it is a constant or target wants
+    // to ignore this use.
+    if (MO.getReg().isPhysical()) {
+      if (MRI->isConstantPhysReg(MO.getReg()) || TII->isIgnorableUse(MO))
+        continue;
+      return false;
+    }
+
+    const LiveInterval &li = getInterval(MO.getReg());
+    const VNInfo *OVNI = li.getVNInfoAt(OrigIdx);
+    if (!OVNI)
+      continue;
+
+    // Don't allow rematerialization immediately after the original def.
+    // It would be incorrect if OrigMI redefines the register.
+    // See PR14098.
+    if (SlotIndex::isSameInstr(OrigIdx, UseIdx))
+      return false;
+
+    if (OVNI != li.getVNInfoAt(UseIdx))
+      return false;
+
+    // Check that subrange is live at UseIdx.
+    if (li.hasSubRanges()) {
+      const TargetRegisterInfo *TRI = MRI->getTargetRegisterInfo();
+      unsigned SubReg = MO.getSubReg();
+      LaneBitmask LM = SubReg ? TRI->getSubRegIndexLaneMask(SubReg)
+                              : MRI->getMaxLaneMaskForVReg(MO.getReg());
+      for (const LiveInterval::SubRange &SR : li.subranges()) {
+        if ((SR.LaneMask & LM).none())
+          continue;
+        if (!SR.liveAt(UseIdx))
+          return false;
+        // Early exit if all used lanes are checked. No need to continue.
+        LM &= ~SR.LaneMask;
+        if (LM.none())
+          break;
+      }
+    }
+  }
+  return true;
+}
diff --git a/llvm/lib/CodeGen/LiveRangeEdit.cpp b/llvm/lib/CodeGen/LiveRangeEdit.cpp
index 5514e4eb6cf3e..e08451d124606 100644
--- a/llvm/lib/CodeGen/LiveRangeEdit.cpp
+++ b/llvm/lib/CodeGen/LiveRangeEdit.cpp
@@ -101,60 +101,6 @@ bool LiveRangeEdit::anyRematerializable() {
   return !Remattable.empty();
 }
 
-/// allUsesAvailableAt - Return true if all registers used by OrigMI at
-/// OrigIdx are also available with the same value at UseIdx.
-bool LiveRangeEdit::allUsesAvailableAt(const MachineInstr *OrigMI,
-                                       SlotIndex OrigIdx,
-                                       SlotIndex UseIdx) const {
-  OrigIdx = OrigIdx.getRegSlot(true);
-  UseIdx = std::max(UseIdx, UseIdx.getRegSlot(true));
-  for (const MachineOperand &MO : OrigMI->operands()) {
-    if (!MO.isReg() || !MO.getReg() || !MO.readsReg())
-      continue;
-
-    // We can't remat physreg uses, unless it is a constant or target wants
-    // to ignore this use.
-    if (MO.getReg().isPhysical()) {
-      if (MRI.isConstantPhysReg(MO.getReg()) || TII.isIgnorableUse(MO))
-        continue;
-      return false;
-    }
-
-    LiveInterval &li = LIS.getInterval(MO.getReg());
-    const VNInfo *OVNI = li.getVNInfoAt(OrigIdx);
-    if (!OVNI)
-      continue;
-
-    // Don't allow rematerialization immediately after the original def.
-    // It would be incorrect if OrigMI redefines the register.
-    // See PR14098.
-    if (SlotIndex::isSameInstr(OrigIdx, UseIdx))
-      return false;
-
-    if (OVNI != li.getVNInfoAt(UseIdx))
-      return false;
-
-    // Check that subrange is live at UseIdx.
-    if (li.hasSubRanges()) {
-      const TargetRegisterInfo *TRI = MRI.getTargetRegisterInfo();
-      unsigned SubReg = MO.getSubReg();
-      LaneBitmask LM = SubReg ? TRI->getSubRegIndexLaneMask(SubReg)
-                              : MRI.getMaxLaneMaskForVReg(MO.getReg());
-      for (LiveInterval::SubRange &SR : li.subranges()) {
-        if ((SR.LaneMask & LM).none())
-          continue;
-        if (!SR.liveAt(UseIdx))
-          return false;
-        // Early exit if all used lanes are checked. No need to continue.
-        LM &= ~SR.LaneMask;
-        if (LM.none())
-          break;
-      }
-    }
-  }
-  return true;
-}
-
 bool LiveRangeEdit::canRematerializeAt(Remat &RM, VNInfo *OrigVNI,
                                        SlotIndex UseIdx) {
   assert(ScannedRemattable && "Call anyRematerializable first");
@@ -164,12 +110,10 @@ bool LiveRangeEdit::canRematerializeAt(Remat &RM, VNInfo *OrigVNI,
     return false;
 
   // No defining instruction provided.
-  SlotIndex DefIdx;
   assert(RM.OrigMI && "No defining instruction for remattable value");
-  DefIdx = LIS.getInstructionIndex(*RM.OrigMI);
 
   // Verify that all used registers are available with the same values.
-  if (!allUsesAvailableAt(RM.OrigMI, DefIdx, UseIdx))
+  if (!LIS.allUsesAvailableAt(*RM.OrigMI, UseIdx))
     return false;
 
   return true;
@@ -230,8 +174,7 @@ bool LiveRangeEdit::foldAsLoad(LiveInterval *LI,
 
   // Since we're moving the DefMI load, make sure we're not extending any live
   // ranges.
-  if (!allUsesAvailableAt(DefMI, LIS.getInstructionIndex(*DefMI),
-                          LIS.getInstructionIndex(*UseMI)))
+  if (!LIS.allUsesAvailableAt(*DefMI, LIS.getInstructionIndex(*UseMI)))
     return false;
 
   // We also need to make sure it is safe to move the load.
diff --git a/llvm/test/CodeGen/RISCV/rvv/ctpop-vp.ll b/llvm/test/CodeGen/RISCV/rvv/ctpop-vp.ll
index fba27e3d548cf..ee18a426c1b12 100644
--- a/llvm/test/CodeGen/RISCV/rvv/ctpop-vp.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/ctpop-vp.ll
@@ -2025,7 +2025,8 @@ define <vscale x 16 x i64> @vp_ctpop_nxv16i64(<vscale x 16 x i64> %va, <vscale x
 ; RV32-NEXT:    vsetvli a1, zero, e8, mf4, ta, ma
 ; RV32-NEXT:    vmv1r.v v7, v0
 ; RV32-NEXT:    csrr a1, vlenb
-; RV32-NEXT:    slli a1, a1, 5
+; RV32-NEXT:    li a2, 40
+; RV32-NEXT:    mul a1, a1, a2
 ; RV32-NEXT:    add a1, sp, a1
 ; RV32-NEXT:    addi a1, a1, 16
 ; RV32-NEXT:    vs8r.v v8, (a1) # vscale x 64-byte Folded Spill
@@ -2036,48 +2037,47 @@ define <vscale x 16 x i64> @vp_ctpop_nxv16i64(<vscale x 16 x i64> %va, <vscale x
 ; RV32-NEXT:    sub a3, a0, a1
 ; RV32-NEXT:    addi a2, a2, 1365
 ; RV32-NEXT:    vsetvli a4, zero, e32, m8, ta, ma
-; RV32-NEXT:    vmv.v.x v8, a2
+; RV32-NEXT:    vmv.v.x v24, a2
 ; RV32-NEXT:    sltu a2, a0, a3
 ; RV32-NEXT:    addi a2, a2, -1
 ; RV32-NEXT:    and a2, a2, a3
 ; RV32-NEXT:    vsetvli zero, a2, e64, m8, ta, ma
-; RV32-NEXT:    vsrl.vi v24, v16, 1, v0.t
+; RV32-NEXT:    vsrl.vi v8, v16, 1, v0.t
 ; RV32-NEXT:    csrr a3, vlenb
-; RV32-NEXT:    li a4, 40
-; RV32-NEXT:    mul a3, a3, a4
+; RV32-NEXT:    slli a3, a3, 5
 ; RV32-NEXT:    add a3, sp, a3
 ; RV32-NEXT:    addi a3, a3, 16
-; RV32-NEXT:    vs8r.v v8, (a3) # vscale x 64-byte Folded Spill
-; RV32-NEXT:    vand.vv v24, v24, v8, v0.t
-; RV32-NEXT:    vsub.vv v16, v16, v24, v0.t
+; RV32-NEXT:    vs8r.v v24, (a3) # vscale x 64-byte Folded Spill
+; RV32-NEXT:    vand.vv v8, v8, v24, v0.t
+; RV32-NEXT:    vsub.vv v8, v16, v8, v0.t
 ; RV32-NEXT:    lui a3, 209715
 ; RV32-NEXT:    addi a3, a3, 819
 ; RV32-NEXT:    vsetvli a4, zero, e32, m8, ta, ma
-; RV32-NEXT:    vmv.v.x v8, a3
+; RV32-NEXT:    vmv.v.x v24, a3
 ; RV32-NEXT:    vsetvli zero, a2, e64, m8, ta, ma
-; RV32-NEXT:    vand.vv v24, v16, v8, v0.t
-; RV32-NEXT:    vsrl.vi v16, v16, 2, v0.t
+; RV32-NEXT:    vand.vv v16, v8, v24, v0.t
+; RV32-NEXT:    vsrl.vi v8, v8, 2, v0.t
 ; RV32-NEXT:    csrr a3, vlenb
 ; RV32-NEXT:    li a4, 24
 ; RV32-NEXT:    mul a3, a3, a4
 ; RV32-NEXT:    add a3, sp, a3
 ; RV32-NEXT:    addi a3, a3, 16
-; RV32-NEXT:    vs8r.v v8, (a3) # vscale x 64-byte Folded Spill
-; RV32-NEXT:    vand.vv v16, v16, v8, v0.t
-; RV32-NEXT:    vadd.vv v16, v24, v16, v0.t
-; RV32-NEXT:    vsrl.vi v24, v16, 4, v0.t
-; RV32-NEXT:    vadd.vv v16, v16, v24, v0.t
+; RV32-NEXT:    vs8r.v v24, (a3) # vscale x 64-byte Folded Spill
+; RV32-NEXT:    vand.vv v8, v8, v24, v0.t
+; RV32-NEXT:    vadd.vv v8, v16, v8, v0.t
+; RV32-NEXT:    vsrl.vi v16, v8, 4, v0.t
+; RV32-NEXT:    vadd.vv v8, v8, v16, v0.t
 ; RV32-NEXT:    lui a3, 61681
 ; RV32-NEXT:    addi a3, a3, -241
 ; RV32-NEXT:    vsetvli a4, zero, e32, m8, ta, ma
-; RV32-NEXT:    vmv.v.x v8, a3
+; RV32-NEXT:    vmv.v.x v16, a3
 ; RV32-NEXT:    csrr a3, vlenb
 ; RV32-NEXT:    slli a3, a3, 4
 ; RV32-NEXT:    add a3, sp, a3
 ; RV32-NEXT:    addi a3, a3, 16
-; RV32-NEXT:    vs8r.v v8, (a3) # vscale x 64-byte Folded Spill
+; RV32-NEXT:    vs8r.v v16, (a3) # vscale x 64-byte Folded Spill
 ; RV32-NEXT:    vsetvli zero, a2, e64, m8, ta, ma
-; RV32-NEXT:    vand.vv v16, v16, v8, v0.t
+; RV32-NEXT:    vand.vv v16, v8, v16, v0.t
 ; RV32-NEXT:    lui a3, 4112
 ; RV32-NEXT:    addi a3, a3, 257
 ; RV32-NEXT:    vsetvli a4, zero, e32, m8, ta, ma
@@ -2098,32 +2098,32 @@ define <vscale x 16 x i64> @vp_ctpop_nxv16i64(<vscale x 16 x i64> %va, <vscale x
 ; RV32-NEXT:    mv a0, a1
 ; RV32-NEXT:  .LBB46_2:
 ; RV32-NEXT:    vmv1r.v v0, v7
-; RV32-NEXT:    slli a1, a1, 5
+; RV32-NEXT:    li a3, 40
+; RV32-NEXT:    mul a1, a1, a3
 ; RV32-NEXT:    add a1, sp, a1
 ; RV32-NEXT:    addi a1, a1, 16
 ; RV32-NEXT:    vl8r.v v8, (a1) # vscale x 64-byte Folded Reload
 ; RV32-NEXT:    vsetvli zero, a0, e64, m8, ta, ma
 ; RV32-NEXT:    vsrl.vi v16, v8, 1, v0.t
 ; RV32-NEXT:    csrr a0, vlenb
-; RV32-NEXT:    li a1, 40
-; RV32-NEXT:    mul a0, a0, a1
+; RV32-NEXT:    slli a0, a0, 5
 ; RV32-NEXT:    add a0, sp, a0
 ; RV32-NEXT:    addi a0, a0, 16
 ; RV32-NEXT:    vl8r.v v24, (a0) # vscale x 64-byte Folded Reload
-; RV32-NEXT:    vand.vv v24, v16, v24, v0.t
-; RV32-NEXT:    vsub.vv v24, v8, v24, v0.t
+; RV32-NEXT:    vand.vv v16, v16, v24, v0.t
+; RV32-NEXT:    vsub.vv v16, v8, v16, v0.t
 ; RV32-NEXT:    csrr a0, vlenb
 ; RV32-NEXT:    li a1, 24
 ; RV32-NEXT:    mul a0, a0, a1
 ; RV32-NEXT:    add a0, sp, a0
 ; RV32-NEXT:    addi a0, a0, 16
-; RV32-NEXT:    vl8r.v v16, (a0) # vscale x 64-byte Folded Reload
-; RV32-NEXT:    vand.vv v8, v24, v16, v0.t
-; RV32-NEXT:    vsrl.vi v24, v24, 2, v0.t
-; RV32-NEXT:    vand.vv v24, v24, v16, v0.t
-; RV32-NEXT:    vadd.vv v8, v8, v24, v0.t
-; RV32-NEXT:    vsrl.vi v24, v8, 4, v0.t
-; RV32-NEXT:    vadd.vv v8, v8, v24, v0.t
+; RV32-NEXT:    vl8r.v v24, (a0) # vscale x 64-byte Folded Reload
+; RV32-NEXT:    vand.vv v8, v16, v24, v0.t
+; RV32-NEXT:    vsrl.vi v16, v16, 2, v0.t
+; RV32-NEXT:    vand.vv v16, v16, v24, v0.t
+; RV32-NEXT:    vadd.vv v8, v8, v16, v0.t
+; RV32-NEXT:    vsrl.vi v16, v8, 4, v0.t
+; RV32-NEXT:    vadd.vv v8, v8, v16, v0.t
 ; RV32-NEXT:    csrr a0, vlenb
 ; RV32-NEXT:    slli a0, a0, 4
 ; RV32-NEXT:    add a0, sp, a0
@@ -2263,21 +2263,21 @@ define <vscale x 16 x i64> @vp_ctpop_nxv16i64_unmasked(<vscale x 16 x i64> %va,
 ; RV32-NEXT:    addi a4, a4, 16
 ; RV32-NEXT:    vs8r.v v0, (a4) # vscale x 64-byte Folded Spill
 ; RV32-NEXT:    vand.vv v24, v24, v0
-; RV32-NEXT:    vsub.vv v16, v16, v24
+; RV32-NEXT:    vsub.vv v24, v16, v24
 ; RV32-NEXT:    vsetvli a4, zero, e32, m8, ta, ma
 ; RV32-NEXT:    vmv.v.x v0, a3
 ; RV32-NEXT:    vsetvli zero, a2, e64, m8, ta, ma
-; RV32-NEXT:    vand.vv v24, v16, v0
-; RV32-NEXT:    vsrl.vi v16, v16, 2
+; RV32-NEXT:    vand.vv v16, v24, v0
+; RV32-NEXT:    vsrl.vi v24, v24, 2
 ; RV32-NEXT:    csrr a3, vlenb
 ; RV32-NEXT:    slli a3, a3, 4
 ; RV32-NEXT:    add a3, sp, a3
 ; RV32-NEXT:    addi a3, a3, 16
 ; RV32-NEXT:    vs8r.v v0, (a3) # vscale x 64-byte Folded Spill
-; RV32-NEXT:    vand.vv v16, v16, v0
+; RV32-NEXT:    vand.vv v24, v24, v0
+; RV32-NEXT:    vadd.vv v24, v16, v24
+; RV32-NEXT:    vsrl.vi v16, v24, 4
 ; RV32-NEXT:    vadd.vv v16, v24, v16
-; RV32-NEXT:    vsrl.vi v24, v16, 4
-; RV32-NEXT:    vadd.vv v16, v16, v24
 ; RV32-NEXT:    lui a3, 61681
 ; RV32-NEXT:    lui a4, 4112
 ; RV32-NEXT:    addi a3, a3, -241
@@ -2312,16 +2312,16 @@ define <vscale x 16 x i64> @vp_ctpop_nxv16i64_unmasked(<vscale x 16 x i64> %va,
 ; RV32-NEXT:    addi a0, a0, 16
 ; RV32-NEXT:    vl8r.v v0, (a0) # vscale x 64-byte Folded Reload
 ; RV32-NEXT:    vand.vv v24, v24, v0
-; RV32-NEXT:    vsub.vv v8, v8, v24
+; RV32-NEXT:    vsub.vv v24, v8, v24
 ; RV32-NEXT:    csrr a0, vlenb
 ; RV32-NEXT:    slli a0, a0, 4
 ; RV32-NEXT:    add a0, sp, a0
 ; RV32-NEXT:    addi a0, a0, 16
 ; RV32-NEXT:    vl8r.v v0, (a0) # vscale x 64-byte Folded Reload
-; RV32-NEXT:    vand.vv v24, v8, v0
-; RV32-NEXT:    vsrl.vi v8, v8, 2
-; RV32-NEXT:    vand.vv v8, v8, v0
-; RV32-NEXT:    vadd.vv v8, v24, v8
+; RV32-NEXT:    vand.vv v8, v24, v0
+; RV32-NEXT:    vsrl.vi v24, v24, 2
+; RV32-NEXT:    vand.vv v24, v24, v0
+; RV32-NEXT:    vadd.vv v8, v8, v24
 ; RV32-NEXT:    vsrl.vi v24, v8, 4
 ; RV32-NEXT:    vadd.vv v8, v8, v24
 ; RV32-NEXT:    csrr a0, vlenb
diff --git a/llvm/test/CodeGen/RISCV/rvv/cttz-vp.ll b/llvm/test/CodeGen/RISCV/rvv/cttz-vp.ll
index 6bf882fe47fef..52eaa51051631 100644
--- a/llvm/test/CodeGen/RISCV/rvv/cttz-vp.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/cttz-vp.ll
@@ -2193,7 +2193,8 @@ define <vscale x 16 x i64> @vp_cttz_nxv16i64(<vscale x 16 x i64> %va, <vscale x
 ; RV32-NEXT:    vsetvli a1, zero, e8, mf4, ta, ma
 ; RV32-NEXT:    vmv1r.v v7, v0
 ; RV32-NEXT:    csrr a1, vlenb
-; RV32-NEXT:    slli a1, a1, 5
+; RV32-NEXT:    li a2, 40
+; RV32-NEXT:    mul a1, a1, a2
 ; RV32-NEXT:    add a1, sp, a1
 ; RV32-NEXT:    addi a1, a1, 16
 ; RV32-NEXT:    vs8r.v v8, (a1) # vscale x 64-byte Folded Spill
@@ -2207,49 +2208,48 @@ define <vscale x 16 x i64> @vp_cttz_nxv16i64(<vscale x 16 x i64> %va, <vscale x
 ; RV32-NEXT:    lui a3, 349525
 ; RV32-NEXT:    addi a3, a3, 1365
 ; RV32-NEXT:    vsetvli zero, a2, e64, m8, ta, ma
-; RV32-NEXT:    vadd.vi v24, v16, -1, v0.t
+; RV32-NEXT:    vadd.vi v8, v16, -1, v0.t
 ; RV32-NEXT:    vnot.v v16, v16, v0.t
-; RV32-NEXT:    vand.vv v16, v16, v24, v0.t
+; RV32-NEXT:    vand.vv v8, v16, v8, v0.t
 ; RV32-NEXT:    vsetvli a4, zero, e32, m8, ta, ma
-; RV32-NEXT:    vmv.v.x v8, a3
+; RV32-NEXT:    vmv.v.x v24, a3
 ; RV32-NEXT:    vsetvli zero, a2, e64, m8, ta, ma
-; RV32-NEXT:    vsrl.vi v24, v16, 1, v0.t
+; RV32-NEXT:    vsrl.vi v16, v8, 1, v0.t
 ; RV32-NEXT:    csrr a3, vlenb
-; RV32-NEXT:    li a4, 40
-; RV32-NEXT:    mul a3, a3, a4
+; RV32-NEXT:    slli a3, a3, 5
 ; RV32-NEXT:    add a3, sp, a3
 ; RV32-NEXT:    addi a3, a3, 16
-; RV32-NEXT:    vs8r.v v8, (a3) # vscale x 64-byte Folded Spill
-; RV32-NEXT:    vand.vv v24, v24, v8, v0.t
-; RV32-NEXT:    vsub.vv v16, v16, v24, v0.t
+; RV32-NEXT:    vs8r.v v24, (a3) # vscale x 64-byte Folded Spill
+; RV32-NEXT:    vand.vv v16, v16, v24, v0.t
+; RV32-NEXT:    vsub.vv v8, v8, v16, v0.t
 ; RV32-NEXT:    lui a3, 209715
 ; RV32-NEXT:    addi a3, a3, 819
 ; RV32-NEXT:    vsetvli a4, zero, e32, m8, ta, ma
-; RV32-NEXT:    vmv.v.x v8, a3
+; RV32-NEXT:    vmv.v.x v24, a3
 ; RV32-NEXT:    vsetvli zero, a2, e64, m8, ta, ma
-; RV32-NEXT:    vand.vv v24, v16, v8, v0.t
-; RV32-NEXT:    vsrl.vi v16, v16, 2, v0.t
+; RV32-NEXT:    vand.vv v16, v8, v24, v0.t
+; RV32-NEXT:    vsrl.vi v8, v8, 2, v0.t
 ; RV32-NEXT:    csrr a3, vlenb
 ; RV32-NEXT:    li a4, 24
 ; RV32-NEXT:    mul a3, a3, a4
 ; RV32-NEXT:    add a3, sp, a3
 ; RV32-NEXT:    addi a3, a3, 16
-; RV32-NEXT:    vs8r.v v8, (a3) # vscale x 64-byte Folded Spill
-; RV32-NEXT:    vand.vv v16, v16, v8, v0.t
-; RV32-NEXT:    vadd.vv v16, v24, v16, v0.t
-; RV32-NEXT:    vsrl.vi v24, v16, 4, v0.t
-; RV32-NEXT:    vadd.vv v16, v16, v24, v0.t
+; RV32-NEXT:    vs8r.v v24, (a3) # vscale x 64-byte Folded Spill
+; RV32-NEXT:    vand.vv v8, v8, v24, v0.t
+; RV32-NEXT:    vadd.vv v8, v16, v8, v0.t
+; RV32-NEXT:    vsrl.vi v16, v8, 4, v0.t
+; RV32-NEXT:    vadd.vv v8, v8, v16, v0.t
 ; RV32-NEXT:    lui a3, 61681
 ; RV32-NEXT:    addi a3, a3, -241
 ; RV32-NEXT:    vsetvli a4, zero, e32, m8, ta, ma
-; RV32-NEXT:    vmv.v.x v8, a3
+; RV32-NEXT:    vmv.v.x v16, a3
 ; RV32-NEXT:    csrr a3, vlenb
 ; RV32-NEXT:    slli a3, a3, 4
 ; RV32-NEXT:    add a3, sp, a3
 ; RV32-NEXT:    addi a3, a3, 16
-; RV32-NEXT:    vs8r.v v8, (a3) # vscale x 64-byte Folded Spill
+; RV32-NEXT:    vs8r.v v16, (a3) # vscale x 64-byte Folded Spill
 ; RV32-NEXT:    vsetvli zero, a2, e64, m8, ta, ma
-; RV32-NEXT:    vand.vv v16, v16, v8, v0.t
+; RV32-NEXT:    vand.vv v16, v8, v16, v0.t
 ; RV32-NEXT:    lui a3, 4112
 ; RV32-NEXT:    addi a3, a3, 257
 ; RV32-NEXT:    vsetvli a4, zero, e32, m8, ta, ma
@@ -2270,35 +2270,35 @@ define <vscale x 16 x i64> @vp_cttz_nxv16i64(<vscale x 16 x i64> %va, <vscale x
 ; RV32-NEXT:    mv a0, a1
 ; RV32-NEXT:  .LBB46_2:
 ; RV32-NEXT:    vmv1r.v v0, v7
-; RV32-NEXT:    slli a1, a1, 5
+; RV32-NEXT:    li a3, 40
+; RV32-NEXT:    mul a1, a1, a3
 ; RV32-NEXT:    add a1, sp, a1
 ; RV32-NEXT:    addi a1, a1, 16
 ; RV32-NEXT:    vl8r.v v8, (a1) # vscale x 64-byte Folded Reload
 ; RV32-NEXT:    vsetvli zero, a0, e64, m8, ta, ma
-; RV32-NEXT:    vadd.vi v24, v8, -1, v0.t
+; RV32-NEXT:    vadd.vi v16, v8, -1, v0.t
 ; RV32-NEXT:    vnot.v v8, v8, v0.t
-; RV32-NEXT:    vand.vv v8, v8, v24, v0.t
+; RV32-NEXT:    vand.vv v8, v8, v16, v0.t
 ; RV32-NEXT:    vsrl.vi v16, v8, 1, v0.t
 ; RV32-NEXT:    csrr a0, vlenb
-; RV32-NEXT:    li a1, 40
-; RV32-NEXT:    mul a0, a0, a1
+; RV32-NEXT:    slli a0, a0, 5
 ; RV32-NEXT:    add a0, sp, a0
 ; RV32-NEXT:    addi a0, a0, 16
 ; RV32-NEXT:    vl8r.v v24, (a0) # vscale x 64-byte Folded Reload
-; RV32-NEXT:    vand.vv v24, v16, v24, v0.t
-; RV32-NEXT:    vsub.vv v8, v8, v24, v0.t
+; RV32-NEXT:    vand.vv v16, v16, v24, v0.t
+; RV32-NEXT:    vsub.vv v8, v8, v16, v0.t
 ; RV32-NEXT:    csrr a0, vlenb
 ; RV32-NE...
[truncated]

llvmbot (Member) commented Sep 16, 2025

@llvm/pr-subscribers-llvm-regalloc

lukel97 changed the title from "[RegAlloc] Account for non-rematerializable uses when applying weight discount" to "[RegAlloc] Account for use availability when applying rematerializable weight discount" on Sep 16, 2025

github-actions bot commented Sep 16, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

preames (Collaborator) commented Sep 17, 2025

I think there's a simpler alternative here which should be considered. Instead of attempting to extend the reasoning to handle live intervals, we should simply update the code to explicitly bail out for any vreg uses. This is non-functional with the current code structure, and would keep this code path performing the same heuristic when combined with your second change.

I'm open to being convinced that the complexity of actually checking liveness is needed, but I think that needs to be explicitly justified. I believe weights aren't recalculated, so there's some complexity/staleness risk as we run through the allocation queue. I don't actually think this is wrong, but if we don't have to have the complexity, I'd rather not.

rampitec (Collaborator) commented

In principle it looks correct, but it is really hard to reason about massive test changes.

lukel97 (Contributor, Author) commented Sep 18, 2025

I'm open to being convinced that the complexity of actually checking liveness is needed, but I think that needs to be explicitly justified.

I went ahead and tried out the simpler approach of considering any vreg uses as non-rematerializable, i.e.:

-    // If MI has register uses, it will only be rematerializable if its uses are
-    // also live at the indices it will be rematerialized at.
-    SmallVector<Register, 8> NewRegs;
-    LiveRangeEdit LRE(nullptr, NewRegs, *MI->getMF(), LIS, nullptr);
-    const MachineRegisterInfo &MRI = MI->getMF()->getRegInfo();
-    for (MachineInstr &Use : MRI.use_nodbg_instructions(Reg)) {
-      SlotIndex UseIdx = LIS.getInstructionIndex(Use);
-      if (LI.getVNInfoAt(UseIdx) != VNI)
-        continue;
-      if (!LRE.allUsesAvailableAt(MI, VNI->def, UseIdx))
+    for (auto &MO : MI->explicit_uses())
+      if (MO.isReg() && MO.getReg().isVirtual())
         return false;
-    }

When applied on top of #159211 we get slightly more spills and reloads vs checking the liveness.

On arm64-apple-darwin there's a 0.5% geomean increase in spills and reloads on llvm-test-suite, and a 1.5% increase on riscv64-linux-gnu -march=rva23u64.

On SPEC CPU 2017 rva23u64 there's a 2.2%/2.4% geomean increase in spills/reloads, but 14% more on 519.lbm_r.

Results for SPEC CPU 2017 rva23u64 check liveness vs no virt reg uses

Metric: regalloc.NumSpills,regalloc.NumReloads

Program                                       regalloc.NumSpills                regalloc.NumReloads               
                                              lhs                rhs      diff  lhs                 rhs      diff 
FP2017speed/619.lbm_s/619.lbm_s                  46.00              53.00 15.2%    47.00               54.00 14.9%
FP2017rate/519.lbm_r/519.lbm_r                   50.00              57.00 14.0%    51.00               58.00 13.7%
INT2017spe...31.deepsjeng_s/631.deepsjeng_s     265.00             283.00  6.8%   515.00              532.00  3.3%
INT2017rat...31.deepsjeng_r/531.deepsjeng_r     265.00             283.00  6.8%   515.00              532.00  3.3%
INT2017rat...23.xalancbmk_r/523.xalancbmk_r    1397.00            1461.00  4.6%  2243.00             2330.00  3.9%
INT2017spe...23.xalancbmk_s/623.xalancbmk_s    1397.00            1461.00  4.6%  2243.00             2330.00  3.9%
INT2017rate/557.xz_r/557.xz_r                   269.00             276.00  2.6%   470.00              486.00  3.4%
INT2017speed/657.xz_s/657.xz_s                  269.00             276.00  2.6%   470.00              486.00  3.4%
INT2017rate/520.omnetpp_r/520.omnetpp_r         646.00             660.00  2.2%  1189.00             1193.00  0.3%
INT2017spe...ed/620.omnetpp_s/620.omnetpp_s     646.00             660.00  2.2%  1189.00             1193.00  0.3%
FP2017rate/544.nab_r/544.nab_r                  690.00             700.00  1.4%  1026.00             1041.00  1.5%
FP2017speed/644.nab_s/644.nab_s                 690.00             700.00  1.4%  1026.00             1041.00  1.5%
FP2017rate/511.povray_r/511.povray_r           1703.00            1722.00  1.1%  2808.00             2817.00  0.3%
FP2017rate/526.blender_r/526.blender_r        12452.00           12530.00  0.6% 24688.00            24686.00 -0.0%
INT2017speed/625.x264_s/625.x264_s             1858.00            1867.00  0.5%  4063.00             4079.00  0.4%
INT2017rate/525.x264_r/525.x264_r              1858.00            1867.00  0.5%  4063.00             4079.00  0.4%
INT2017rate/502.gcc_r/502.gcc_r               11035.00           11060.00  0.2% 24268.00            24213.00 -0.2%
INT2017speed/602.gcc_s/602.gcc_s              11035.00           11060.00  0.2% 24268.00            24213.00 -0.2%
FP2017rate/510.parest_r/510.parest_r          42791.00           42817.00  0.1% 75706.00            75722.00  0.0%
FP2017rate/508.namd_r/508.namd_r               6568.00            6568.00  0.0% 15177.00            15235.00  0.4%
INT2017speed/641.leela_s/641.leela_s            300.00             300.00  0.0%   420.00              425.00  1.2%
INT2017rate/541.leela_r/541.leela_r             300.00             300.00  0.0%   420.00              425.00  1.2%
INT2017rat...00.perlbench_r/500.perlbench_r    4361.00            4337.00 -0.6%  9647.00             9667.00  0.2%
INT2017spe...00.perlbench_s/600.perlbench_s    4361.00            4337.00 -0.6%  9647.00             9667.00  0.2%
FP2017speed/638.imagick_s/638.imagick_s        3333.00            3311.00 -0.7%  7990.00             8453.00  5.8%
FP2017rate/538.imagick_r/538.imagick_r         3333.00            3311.00 -0.7%  7990.00             8453.00  5.8%
INT2017speed/605.mcf_s/605.mcf_s                104.00             103.00 -1.0%   196.00              196.00  0.0%
INT2017rate/505.mcf_r/505.mcf_r                 104.00             103.00 -1.0%   196.00              196.00  0.0%
FP2017rate...97.specrand_fr/997.specrand_fr       0.00               0.00                                         
FP2017spee...96.specrand_fs/996.specrand_fs       0.00               0.00                                         
INT2017rat...99.specrand_ir/999.specrand_ir       0.00               0.00                                         
INT2017spe...98.specrand_is/998.specrand_is       0.00               0.00                                         
                           Geomean difference                              2.2%                               2.4%

I believe weights aren't recalculated, so there's some complexity/staleness risk as we run through the allocation queue. I don't actually think this is wrong, but if we don't have to have the complexity, I'd rather not.

The weights are usually calculated once at the start, but if a live range is spilled or split then the weight of the changed interval is recomputed in LiveRangeEdit::calculateRegClassAndHint. So as a second experiment I tried recalculating the weights of any uses that may have become rematerializable, but there's not much of a difference: -0.1% reloads.
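
Roughly, that second experiment had the following shape (illustrative sketch only, not the exact code I measured; use_nodbg_instructions, defs() and calculateSpillWeightAndHint are the real APIs, but the hook point in calculateRegClassAndHint is simplified away):

// Illustrative sketch: after the interval for EditedReg has been spilled or
// split, revisit the intervals defined by instructions that read EditedReg,
// since their rematerializability (and hence their weight discount) may have
// changed.
static void recalcWeightsOfUsers(Register EditedReg, LiveIntervals &LIS,
                                 MachineRegisterInfo &MRI,
                                 VirtRegAuxInfo &VRAI) {
  for (MachineInstr &UseMI : MRI.use_nodbg_instructions(EditedReg))
    for (const MachineOperand &MO : UseMI.defs())
      if (MO.getReg().isVirtual() && LIS.hasInterval(MO.getReg()))
        VRAI.calculateSpillWeightAndHint(LIS.getInterval(MO.getReg()));
}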

Ultimately the spill weight is a heuristic so we have the choice of how accurate we want it to be. My guess is that the extra complexity of checking the liveness is worth an extra 2.2% reduction in spills/reloads, but recalculating the weights of uses isn't worth the 0.1%.

Not a strongly held opinion though, I'm open to avoiding the liveness check if reviewers prefer.

lukel97 (Contributor, Author) commented Sep 18, 2025

In principle it looks correct, but it is really hard to reason about massive test changes.

The sheer size of the test diff mostly comes from those amdgcn.bitcast tests. amdgcn.bitcast.1024.bit.ll is 237k lines alone. Do people read the FileCheck output of it or is it mostly to check that it doesn't crash?

It's worth noting though that most of the test diffs here are just noise. In practice on AArch64/RISC-V/X86 there are no changes in llvm-test-suite/SPEC CPU 2017.

preames (Collaborator) commented Sep 19, 2025

I went ahead and tried out the simpler approach of considering any vreg uses as non-rematerializable, i.e.:

Just noting that I have read this whole comment, and find it convincing. Thank you for doing the due diligence.

preames (Collaborator) left a comment


Remember to update the comment on the call to isRematerializable

We could consider a deeper rework of this. The 0.5 multiply is somewhat arbitrary, and we could consider e.g. scaling that decrease by the proportion of defs which are materialized. There's a bunch of options for playing with this if you want.
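
For example, something along these lines (rough sketch only, meant to slot into VirtRegAuxInfo::weightCalcHelper where LI, LIS and TotalWeight are in scope; defIsRematerializable is a hypothetical helper standing in for the isTriviallyReMaterializable + allUsesAvailableAt check above):

// Scale the discount by the fraction of the interval's defs that look
// rematerializable, instead of the all-or-nothing 0.5 multiply.
unsigned NumDefs = 0, NumRematDefs = 0;
for (const VNInfo *VNI : LI.valnos) {
  if (VNI->isUnused() || VNI->isPHIDef())
    continue;
  ++NumDefs;
  if (const MachineInstr *DefMI = LIS.getInstructionFromIndex(VNI->def))
    if (defIsRematerializable(*DefMI)) // hypothetical helper
      ++NumRematDefs;
}
if (NumDefs != 0) {
  float RematRatio = float(NumRematDefs) / float(NumDefs);
  TotalWeight *= 1.0f - 0.5f * RematRatio; // matches the current 0.5 when all defs remat
}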

// If MI has register uses, it will only be rematerializable if its uses are
// also live at the indices it will be rematerialized at.
for (MachineOperand &MO : MRI.reg_nodbg_operands(LI.reg())) {
  if (MO.isUndef() || !MO.readsReg())
Copy link
Contributor


Suggested change
  if (MO.isUndef() || !MO.readsReg())
  if (!MO.readsReg())

Redundant undef check
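
(For context on why it's redundant: MachineOperand::readsReg() already returns false for undef register operands. Roughly, paraphrased from MachineOperand.h; the exact body may differ between LLVM versions:)

// Paraphrase of MachineOperand::readsReg(), shown only to illustrate why the
// extra isUndef() test adds nothing.
bool readsReg() const {
  assert(isReg() && "Wrong MachineOperand accessor");
  // An undef operand never reads its register, so callers don't need a
  // separate isUndef() check.
  return !isUndef() && !isInternalRead() && (isUse() || getSubReg());
}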

Copy link
Collaborator

@preames preames left a comment


LGTM for the RISC-V tests and heuristic change. Please wait for @arsenm to confirm, particularly the AMDGPU tests.

; GCN-NEXT: S_NOP 0, implicit killed renamable $sgpr1
; GCN-NEXT: renamable $sgpr1 = SI_SPILL_S32_RESTORE %stack.0, implicit $exec, implicit $sgpr32 :: (load (s32) from %stack.0, addrspace 5)
; GCN-NEXT: SI_SPILL_S32_SAVE killed renamable $sgpr0, %stack.1, implicit $exec, implicit $sgpr32 :: (store (s32) into %stack.1, addrspace 5)
; GCN-NEXT: renamable $sgpr0 = IMPLICIT_DEF
Copy link
Contributor


I suppose this grew by one IMPLICIT_DEF

; CHECK-NEXT: SI_SPILL_AV128_SAVE $vgpr4_vgpr5_vgpr6_vgpr7, %stack.4, $sgpr32, 0, implicit $exec :: (store (s128) into %stack.4, align 4, addrspace 5)
; CHECK-NEXT: renamable $vgpr0 = V_TRUNC_F32_e32 killed $vgpr0, implicit $mode, implicit $exec
; CHECK-NEXT: SI_SPILL_AV32_SAVE killed $vgpr0, %stack.3, $sgpr32, 0, implicit $exec :: (store (s32) into %stack.3, addrspace 5)
; CHECK-NEXT: renamable $vgpr0_vgpr1_vgpr2_vgpr3 = SI_SPILL_AV128_RESTORE %stack.2, $sgpr32, 0, implicit $exec :: (load (s128) from %stack.2, align 4, addrspace 5)
Copy link
Contributor


This avoided a lot of spills. However, there's a good chance this is no longer going down the last-chance recoloring path, but that's inevitable with these sorts of tests.

@lukel97 lukel97 merged commit be23cdc into llvm:main Sep 25, 2025
9 checks passed
lukel97 added a commit to lukel97/llvm-project that referenced this pull request Sep 30, 2025
Stacked on llvm#159180.

Unless overridden by the target, we currently only allow rematerialization of instructions with immediate or constant physical register operands, i.e. no virtual registers.

The comment states that this is because we might increase a live range of the virtual register, but we don't actually do this. LiveRangeEdit::allUsesAvailableAt makes sure that we only rematerialize instructions whose virtual registers are already live at the use sites.

This patch relaxes this constraint which reduces a significant amount of reloads across various targets.

This is another attempt at https://reviews.llvm.org/D106408, but llvm#159180 aims to have addressed the issue with the weights that may have caused the previous regressions.
mahesh-attarde pushed a commit to mahesh-attarde/llvm-project that referenced this pull request Oct 3, 2025
…e weight discount (llvm#159180)

This aims to fix the issue that caused https://reviews.llvm.org/D106408
to be reverted.

CalcSpillWeights will reduce the weight of an interval by half if it's
considered rematerializable, so it will be evicted before others.

It does this by checking TII.isTriviallyReMaterializable. However
rematerialization may still fail if any of the defining MI's uses aren't
available at the locations it needs to be rematerialized.
LiveRangeEdit::canRematerializeAt calls allUsesAvailableAt to check this
but CalcSpillWeights doesn't, so the two diverge.

This fixes it by also checking allUsesAvailableAt in CalcSpillWeights. 

In practice this has zero change on AArch64/X86-64/RISC-V as measured on
llvm-test-suite, but prevents weights from being perturbed in an
upcoming patch which enables more rematerialization by re-attempting
https://reviews.llvm.org/D106408
lukel97 added a commit that referenced this pull request Oct 4, 2025
#159211)

In the register allocator we define non-trivial rematerialization as the
rematerialization of an instruction with virtual register uses.

We have been able to perform non-trivial rematerialization for a while,
but it has been prevented by default unless specifically overridden by
the target in `TargetInstrInfo::isReMaterializableImpl`. The
original reasoning for this given by the comment in the default
implementation is because we might increase a live range of the virtual
register, but we don't actually do this.
LiveRangeEdit::allUsesAvailableAt makes sure that we only rematerialize
instructions whose virtual registers are already live at the use sites.

https://reviews.llvm.org/D106408 had originally tried to remove this
restriction but it was reverted after some performance regressions were
reported. We think it is likely that the regressions were caused by the
fact that the old isTriviallyReMaterializable API sometimes returned
true for non-trivial rematerializations.

However #160377 recently split
the API out into a separate non-trivial and trivial version and updated
the call-sites accordingly, and
#160709 and #159180 fixed
heuristics which weren't accounting for the difference between
non-trivial and trivial.

With these fixes in place, this patch proposes to again allow
non-trivial rematerialization by default which reduces a significant
amount of spills and reloads across various targets.

For llvm-test-suite built with -O3 -flto, we get the following geomean
reduction in reloads:

- arm64-apple-darwin: 11.6%
- riscv64-linux-gnu: 8.1%
- x86_64-linux-gnu: 6.5%
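
For readers unfamiliar with the guarantee mentioned in that commit message, here is a minimal sketch of the invariant allUsesAvailableAt enforces. It is an illustration, not the LiveRangeEdit/LiveIntervals implementation, and it ignores subregister live ranges and slot-index subtleties:

#include "llvm/CodeGen/LiveIntervals.h"
#include "llvm/CodeGen/MachineInstr.h"

using namespace llvm;

// Sketch: rematerializing OrigMI at UseIdx is only safe if every virtual
// register it reads still holds the same value there that it held at the
// original definition point OrigIdx.
static bool usesStillAvailable(const MachineInstr &OrigMI, SlotIndex OrigIdx,
                               SlotIndex UseIdx, const LiveIntervals &LIS) {
  for (const MachineOperand &MO : OrigMI.operands()) {
    if (!MO.isReg() || !MO.getReg().isVirtual() || !MO.readsReg())
      continue;
    const LiveInterval &LI = LIS.getInterval(MO.getReg());
    const VNInfo *OrigVNI = LI.getVNInfoAt(OrigIdx);
    // Not live at the new location, or a different value is live there:
    // rematerialization would read the wrong thing.
    if (!OrigVNI || LI.getVNInfoAt(UseIdx) != OrigVNI)
      return false;
  }
  return true;
}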
aokblast pushed a commit to aokblast/llvm-project that referenced this pull request Oct 6, 2025
llvm#159211)
