[RegAlloc] Account for use availability when applying rematerializable weight discount #159180
Conversation
… discount

This aims to fix the issue that caused https://reviews.llvm.org/D106408 to be reverted.

CalcSpillWeights will reduce the weight of an interval by half if it's considered rematerializable, so it will be evicted before others. It does this by checking TII.isTriviallyReMaterializable. However, rematerialization may still fail if any of the defining MI's uses aren't available at the locations where it needs to be rematerialized. LiveRangeEdit::canRematerializeAt calls allUsesAvailableAt to check this.

This fixes it by also checking allUsesAvailableAt in CalcSpillWeights. There may be a better place to share the function than LiveIntervals.

In practice this has zero change on AArch64/X86-64/RISC-V as measured on llvm-test-suite, but prevents weights from being perturbed in an upcoming patch which enables more rematerialization by re-attempting https://reviews.llvm.org/D106408.
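For context while reading the thread: the halving discount being discussed is applied in CalcSpillWeights roughly as in the sketch below. This is a simplified illustration, not the exact trunk code; applyRematDiscount is an invented name, and IsRemat stands in for the result of VirtRegAuxInfo::isRematerializable.

// Minimal sketch of the spill-weight discount: if every def of the interval
// is judged rematerializable, the weight is halved so the allocator prefers
// to evict/spill that interval first.
static float applyRematDiscount(float TotalWeight, bool IsRemat) {
  if (IsRemat)
    TotalWeight *= 0.5F;
  return TotalWeight;
}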
@llvm/pr-subscribers-backend-amdgpu @llvm/pr-subscribers-backend-risc-v

Author: Luke Lau (lukel97)

Changes

This aims to fix the issue that caused https://reviews.llvm.org/D106408 to be reverted. CalcSpillWeights will reduce the weight of an interval by half if it's considered rematerializable, so it will be evicted before others. It does this by checking TII.isTriviallyReMaterializable. However, rematerialization may still fail if any of the defining MI's uses aren't available at the locations where it needs to be rematerialized. LiveRangeEdit::canRematerializeAt calls allUsesAvailableAt to check this but CalcSpillWeights doesn't, so the two diverge. This fixes it by also checking allUsesAvailableAt in CalcSpillWeights. There may be a better place to share the function than LiveIntervals. In practice this has zero change on AArch64/X86-64/RISC-V as measured on llvm-test-suite, but prevents weights from being perturbed in an upcoming patch which enables more rematerialization by re-attempting https://reviews.llvm.org/D106408.

Patch is 154.09 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/159180.diff

17 Files Affected:
diff --git a/llvm/include/llvm/CodeGen/LiveIntervals.h b/llvm/include/llvm/CodeGen/LiveIntervals.h
index 1050b3daa0f57..6678ebaac333b 100644
--- a/llvm/include/llvm/CodeGen/LiveIntervals.h
+++ b/llvm/include/llvm/CodeGen/LiveIntervals.h
@@ -466,6 +466,10 @@ class LiveIntervals {
/// have any segments or value numbers.
LLVM_ABI void constructMainRangeFromSubranges(LiveInterval &LI);
+ /// \returns true if all registers used by \p OrigMI at \p OrigIdx are also
+ /// available with the same value at \p UseIdx.
+ bool allUsesAvailableAt(const MachineInstr &MI, SlotIndex UseIdx) const;
+
private:
/// Compute live intervals for all virtual registers.
void computeVirtRegs();
diff --git a/llvm/include/llvm/CodeGen/LiveRangeEdit.h b/llvm/include/llvm/CodeGen/LiveRangeEdit.h
index 3d5df194c71c9..adcca23c24fb3 100644
--- a/llvm/include/llvm/CodeGen/LiveRangeEdit.h
+++ b/llvm/include/llvm/CodeGen/LiveRangeEdit.h
@@ -193,11 +193,6 @@ class LiveRangeEdit : private MachineRegisterInfo::Delegate {
explicit Remat(const VNInfo *ParentVNI) : ParentVNI(ParentVNI) {}
};
- /// allUsesAvailableAt - Return true if all registers used by OrigMI at
- /// OrigIdx are also available with the same value at UseIdx.
- bool allUsesAvailableAt(const MachineInstr *OrigMI, SlotIndex OrigIdx,
- SlotIndex UseIdx) const;
-
/// canRematerializeAt - Determine if ParentVNI can be rematerialized at
/// UseIdx. It is assumed that parent_.getVNINfoAt(UseIdx) == ParentVNI.
bool canRematerializeAt(Remat &RM, VNInfo *OrigVNI, SlotIndex UseIdx);
diff --git a/llvm/lib/CodeGen/CalcSpillWeights.cpp b/llvm/lib/CodeGen/CalcSpillWeights.cpp
index b16694eafd90e..40df8c4415887 100644
--- a/llvm/lib/CodeGen/CalcSpillWeights.cpp
+++ b/llvm/lib/CodeGen/CalcSpillWeights.cpp
@@ -124,6 +124,17 @@ bool VirtRegAuxInfo::isRematerializable(const LiveInterval &LI,
if (!TII.isTriviallyReMaterializable(*MI))
return false;
+
+ // If MI has register uses, it will only be rematerializable if its uses are
+ // also live at the indices it will be rematerialized at.
+ const MachineRegisterInfo &MRI = MI->getMF()->getRegInfo();
+ for (MachineInstr &Use : MRI.use_instructions(Reg)) {
+ SlotIndex UseIdx = LIS.getInstructionIndex(Use);
+ if (LI.getVNInfoAt(UseIdx) != VNI)
+ continue;
+ if (!LIS.allUsesAvailableAt(*MI, UseIdx))
+ return false;
+ }
}
return true;
}
diff --git a/llvm/lib/CodeGen/LiveIntervals.cpp b/llvm/lib/CodeGen/LiveIntervals.cpp
index 3485a27335f13..24c54fd8ed64b 100644
--- a/llvm/lib/CodeGen/LiveIntervals.cpp
+++ b/llvm/lib/CodeGen/LiveIntervals.cpp
@@ -34,6 +34,7 @@
#include "llvm/CodeGen/Passes.h"
#include "llvm/CodeGen/SlotIndexes.h"
#include "llvm/CodeGen/StackMaps.h"
+#include "llvm/CodeGen/TargetInstrInfo.h"
#include "llvm/CodeGen/TargetRegisterInfo.h"
#include "llvm/CodeGen/TargetSubtargetInfo.h"
#include "llvm/CodeGen/VirtRegMap.h"
@@ -1820,3 +1821,54 @@ void LiveIntervals::constructMainRangeFromSubranges(LiveInterval &LI) {
LICalc->reset(MF, getSlotIndexes(), DomTree, &getVNInfoAllocator());
LICalc->constructMainRangeFromSubranges(LI);
}
+
+bool LiveIntervals::allUsesAvailableAt(const MachineInstr &MI,
+ SlotIndex UseIdx) const {
+ SlotIndex OrigIdx = getInstructionIndex(MI).getRegSlot(true);
+ UseIdx = std::max(UseIdx, UseIdx.getRegSlot(true));
+ for (const MachineOperand &MO : MI.operands()) {
+ if (!MO.isReg() || !MO.getReg() || !MO.readsReg())
+ continue;
+
+ // We can't remat physreg uses, unless it is a constant or target wants
+ // to ignore this use.
+ if (MO.getReg().isPhysical()) {
+ if (MRI->isConstantPhysReg(MO.getReg()) || TII->isIgnorableUse(MO))
+ continue;
+ return false;
+ }
+
+ const LiveInterval &li = getInterval(MO.getReg());
+ const VNInfo *OVNI = li.getVNInfoAt(OrigIdx);
+ if (!OVNI)
+ continue;
+
+ // Don't allow rematerialization immediately after the original def.
+ // It would be incorrect if OrigMI redefines the register.
+ // See PR14098.
+ if (SlotIndex::isSameInstr(OrigIdx, UseIdx))
+ return false;
+
+ if (OVNI != li.getVNInfoAt(UseIdx))
+ return false;
+
+ // Check that subrange is live at UseIdx.
+ if (li.hasSubRanges()) {
+ const TargetRegisterInfo *TRI = MRI->getTargetRegisterInfo();
+ unsigned SubReg = MO.getSubReg();
+ LaneBitmask LM = SubReg ? TRI->getSubRegIndexLaneMask(SubReg)
+ : MRI->getMaxLaneMaskForVReg(MO.getReg());
+ for (const LiveInterval::SubRange &SR : li.subranges()) {
+ if ((SR.LaneMask & LM).none())
+ continue;
+ if (!SR.liveAt(UseIdx))
+ return false;
+ // Early exit if all used lanes are checked. No need to continue.
+ LM &= ~SR.LaneMask;
+ if (LM.none())
+ break;
+ }
+ }
+ }
+ return true;
+}
diff --git a/llvm/lib/CodeGen/LiveRangeEdit.cpp b/llvm/lib/CodeGen/LiveRangeEdit.cpp
index 5514e4eb6cf3e..e08451d124606 100644
--- a/llvm/lib/CodeGen/LiveRangeEdit.cpp
+++ b/llvm/lib/CodeGen/LiveRangeEdit.cpp
@@ -101,60 +101,6 @@ bool LiveRangeEdit::anyRematerializable() {
return !Remattable.empty();
}
-/// allUsesAvailableAt - Return true if all registers used by OrigMI at
-/// OrigIdx are also available with the same value at UseIdx.
-bool LiveRangeEdit::allUsesAvailableAt(const MachineInstr *OrigMI,
- SlotIndex OrigIdx,
- SlotIndex UseIdx) const {
- OrigIdx = OrigIdx.getRegSlot(true);
- UseIdx = std::max(UseIdx, UseIdx.getRegSlot(true));
- for (const MachineOperand &MO : OrigMI->operands()) {
- if (!MO.isReg() || !MO.getReg() || !MO.readsReg())
- continue;
-
- // We can't remat physreg uses, unless it is a constant or target wants
- // to ignore this use.
- if (MO.getReg().isPhysical()) {
- if (MRI.isConstantPhysReg(MO.getReg()) || TII.isIgnorableUse(MO))
- continue;
- return false;
- }
-
- LiveInterval &li = LIS.getInterval(MO.getReg());
- const VNInfo *OVNI = li.getVNInfoAt(OrigIdx);
- if (!OVNI)
- continue;
-
- // Don't allow rematerialization immediately after the original def.
- // It would be incorrect if OrigMI redefines the register.
- // See PR14098.
- if (SlotIndex::isSameInstr(OrigIdx, UseIdx))
- return false;
-
- if (OVNI != li.getVNInfoAt(UseIdx))
- return false;
-
- // Check that subrange is live at UseIdx.
- if (li.hasSubRanges()) {
- const TargetRegisterInfo *TRI = MRI.getTargetRegisterInfo();
- unsigned SubReg = MO.getSubReg();
- LaneBitmask LM = SubReg ? TRI->getSubRegIndexLaneMask(SubReg)
- : MRI.getMaxLaneMaskForVReg(MO.getReg());
- for (LiveInterval::SubRange &SR : li.subranges()) {
- if ((SR.LaneMask & LM).none())
- continue;
- if (!SR.liveAt(UseIdx))
- return false;
- // Early exit if all used lanes are checked. No need to continue.
- LM &= ~SR.LaneMask;
- if (LM.none())
- break;
- }
- }
- }
- return true;
-}
-
bool LiveRangeEdit::canRematerializeAt(Remat &RM, VNInfo *OrigVNI,
SlotIndex UseIdx) {
assert(ScannedRemattable && "Call anyRematerializable first");
@@ -164,12 +110,10 @@ bool LiveRangeEdit::canRematerializeAt(Remat &RM, VNInfo *OrigVNI,
return false;
// No defining instruction provided.
- SlotIndex DefIdx;
assert(RM.OrigMI && "No defining instruction for remattable value");
- DefIdx = LIS.getInstructionIndex(*RM.OrigMI);
// Verify that all used registers are available with the same values.
- if (!allUsesAvailableAt(RM.OrigMI, DefIdx, UseIdx))
+ if (!LIS.allUsesAvailableAt(*RM.OrigMI, UseIdx))
return false;
return true;
@@ -230,8 +174,7 @@ bool LiveRangeEdit::foldAsLoad(LiveInterval *LI,
// Since we're moving the DefMI load, make sure we're not extending any live
// ranges.
- if (!allUsesAvailableAt(DefMI, LIS.getInstructionIndex(*DefMI),
- LIS.getInstructionIndex(*UseMI)))
+ if (!LIS.allUsesAvailableAt(*DefMI, LIS.getInstructionIndex(*UseMI)))
return false;
// We also need to make sure it is safe to move the load.
diff --git a/llvm/test/CodeGen/RISCV/rvv/ctpop-vp.ll b/llvm/test/CodeGen/RISCV/rvv/ctpop-vp.ll
index fba27e3d548cf..ee18a426c1b12 100644
--- a/llvm/test/CodeGen/RISCV/rvv/ctpop-vp.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/ctpop-vp.ll
@@ -2025,7 +2025,8 @@ define <vscale x 16 x i64> @vp_ctpop_nxv16i64(<vscale x 16 x i64> %va, <vscale x
; RV32-NEXT: vsetvli a1, zero, e8, mf4, ta, ma
; RV32-NEXT: vmv1r.v v7, v0
; RV32-NEXT: csrr a1, vlenb
-; RV32-NEXT: slli a1, a1, 5
+; RV32-NEXT: li a2, 40
+; RV32-NEXT: mul a1, a1, a2
; RV32-NEXT: add a1, sp, a1
; RV32-NEXT: addi a1, a1, 16
; RV32-NEXT: vs8r.v v8, (a1) # vscale x 64-byte Folded Spill
@@ -2036,48 +2037,47 @@ define <vscale x 16 x i64> @vp_ctpop_nxv16i64(<vscale x 16 x i64> %va, <vscale x
; RV32-NEXT: sub a3, a0, a1
; RV32-NEXT: addi a2, a2, 1365
; RV32-NEXT: vsetvli a4, zero, e32, m8, ta, ma
-; RV32-NEXT: vmv.v.x v8, a2
+; RV32-NEXT: vmv.v.x v24, a2
; RV32-NEXT: sltu a2, a0, a3
; RV32-NEXT: addi a2, a2, -1
; RV32-NEXT: and a2, a2, a3
; RV32-NEXT: vsetvli zero, a2, e64, m8, ta, ma
-; RV32-NEXT: vsrl.vi v24, v16, 1, v0.t
+; RV32-NEXT: vsrl.vi v8, v16, 1, v0.t
; RV32-NEXT: csrr a3, vlenb
-; RV32-NEXT: li a4, 40
-; RV32-NEXT: mul a3, a3, a4
+; RV32-NEXT: slli a3, a3, 5
; RV32-NEXT: add a3, sp, a3
; RV32-NEXT: addi a3, a3, 16
-; RV32-NEXT: vs8r.v v8, (a3) # vscale x 64-byte Folded Spill
-; RV32-NEXT: vand.vv v24, v24, v8, v0.t
-; RV32-NEXT: vsub.vv v16, v16, v24, v0.t
+; RV32-NEXT: vs8r.v v24, (a3) # vscale x 64-byte Folded Spill
+; RV32-NEXT: vand.vv v8, v8, v24, v0.t
+; RV32-NEXT: vsub.vv v8, v16, v8, v0.t
; RV32-NEXT: lui a3, 209715
; RV32-NEXT: addi a3, a3, 819
; RV32-NEXT: vsetvli a4, zero, e32, m8, ta, ma
-; RV32-NEXT: vmv.v.x v8, a3
+; RV32-NEXT: vmv.v.x v24, a3
; RV32-NEXT: vsetvli zero, a2, e64, m8, ta, ma
-; RV32-NEXT: vand.vv v24, v16, v8, v0.t
-; RV32-NEXT: vsrl.vi v16, v16, 2, v0.t
+; RV32-NEXT: vand.vv v16, v8, v24, v0.t
+; RV32-NEXT: vsrl.vi v8, v8, 2, v0.t
; RV32-NEXT: csrr a3, vlenb
; RV32-NEXT: li a4, 24
; RV32-NEXT: mul a3, a3, a4
; RV32-NEXT: add a3, sp, a3
; RV32-NEXT: addi a3, a3, 16
-; RV32-NEXT: vs8r.v v8, (a3) # vscale x 64-byte Folded Spill
-; RV32-NEXT: vand.vv v16, v16, v8, v0.t
-; RV32-NEXT: vadd.vv v16, v24, v16, v0.t
-; RV32-NEXT: vsrl.vi v24, v16, 4, v0.t
-; RV32-NEXT: vadd.vv v16, v16, v24, v0.t
+; RV32-NEXT: vs8r.v v24, (a3) # vscale x 64-byte Folded Spill
+; RV32-NEXT: vand.vv v8, v8, v24, v0.t
+; RV32-NEXT: vadd.vv v8, v16, v8, v0.t
+; RV32-NEXT: vsrl.vi v16, v8, 4, v0.t
+; RV32-NEXT: vadd.vv v8, v8, v16, v0.t
; RV32-NEXT: lui a3, 61681
; RV32-NEXT: addi a3, a3, -241
; RV32-NEXT: vsetvli a4, zero, e32, m8, ta, ma
-; RV32-NEXT: vmv.v.x v8, a3
+; RV32-NEXT: vmv.v.x v16, a3
; RV32-NEXT: csrr a3, vlenb
; RV32-NEXT: slli a3, a3, 4
; RV32-NEXT: add a3, sp, a3
; RV32-NEXT: addi a3, a3, 16
-; RV32-NEXT: vs8r.v v8, (a3) # vscale x 64-byte Folded Spill
+; RV32-NEXT: vs8r.v v16, (a3) # vscale x 64-byte Folded Spill
; RV32-NEXT: vsetvli zero, a2, e64, m8, ta, ma
-; RV32-NEXT: vand.vv v16, v16, v8, v0.t
+; RV32-NEXT: vand.vv v16, v8, v16, v0.t
; RV32-NEXT: lui a3, 4112
; RV32-NEXT: addi a3, a3, 257
; RV32-NEXT: vsetvli a4, zero, e32, m8, ta, ma
@@ -2098,32 +2098,32 @@ define <vscale x 16 x i64> @vp_ctpop_nxv16i64(<vscale x 16 x i64> %va, <vscale x
; RV32-NEXT: mv a0, a1
; RV32-NEXT: .LBB46_2:
; RV32-NEXT: vmv1r.v v0, v7
-; RV32-NEXT: slli a1, a1, 5
+; RV32-NEXT: li a3, 40
+; RV32-NEXT: mul a1, a1, a3
; RV32-NEXT: add a1, sp, a1
; RV32-NEXT: addi a1, a1, 16
; RV32-NEXT: vl8r.v v8, (a1) # vscale x 64-byte Folded Reload
; RV32-NEXT: vsetvli zero, a0, e64, m8, ta, ma
; RV32-NEXT: vsrl.vi v16, v8, 1, v0.t
; RV32-NEXT: csrr a0, vlenb
-; RV32-NEXT: li a1, 40
-; RV32-NEXT: mul a0, a0, a1
+; RV32-NEXT: slli a0, a0, 5
; RV32-NEXT: add a0, sp, a0
; RV32-NEXT: addi a0, a0, 16
; RV32-NEXT: vl8r.v v24, (a0) # vscale x 64-byte Folded Reload
-; RV32-NEXT: vand.vv v24, v16, v24, v0.t
-; RV32-NEXT: vsub.vv v24, v8, v24, v0.t
+; RV32-NEXT: vand.vv v16, v16, v24, v0.t
+; RV32-NEXT: vsub.vv v16, v8, v16, v0.t
; RV32-NEXT: csrr a0, vlenb
; RV32-NEXT: li a1, 24
; RV32-NEXT: mul a0, a0, a1
; RV32-NEXT: add a0, sp, a0
; RV32-NEXT: addi a0, a0, 16
-; RV32-NEXT: vl8r.v v16, (a0) # vscale x 64-byte Folded Reload
-; RV32-NEXT: vand.vv v8, v24, v16, v0.t
-; RV32-NEXT: vsrl.vi v24, v24, 2, v0.t
-; RV32-NEXT: vand.vv v24, v24, v16, v0.t
-; RV32-NEXT: vadd.vv v8, v8, v24, v0.t
-; RV32-NEXT: vsrl.vi v24, v8, 4, v0.t
-; RV32-NEXT: vadd.vv v8, v8, v24, v0.t
+; RV32-NEXT: vl8r.v v24, (a0) # vscale x 64-byte Folded Reload
+; RV32-NEXT: vand.vv v8, v16, v24, v0.t
+; RV32-NEXT: vsrl.vi v16, v16, 2, v0.t
+; RV32-NEXT: vand.vv v16, v16, v24, v0.t
+; RV32-NEXT: vadd.vv v8, v8, v16, v0.t
+; RV32-NEXT: vsrl.vi v16, v8, 4, v0.t
+; RV32-NEXT: vadd.vv v8, v8, v16, v0.t
; RV32-NEXT: csrr a0, vlenb
; RV32-NEXT: slli a0, a0, 4
; RV32-NEXT: add a0, sp, a0
@@ -2263,21 +2263,21 @@ define <vscale x 16 x i64> @vp_ctpop_nxv16i64_unmasked(<vscale x 16 x i64> %va,
; RV32-NEXT: addi a4, a4, 16
; RV32-NEXT: vs8r.v v0, (a4) # vscale x 64-byte Folded Spill
; RV32-NEXT: vand.vv v24, v24, v0
-; RV32-NEXT: vsub.vv v16, v16, v24
+; RV32-NEXT: vsub.vv v24, v16, v24
; RV32-NEXT: vsetvli a4, zero, e32, m8, ta, ma
; RV32-NEXT: vmv.v.x v0, a3
; RV32-NEXT: vsetvli zero, a2, e64, m8, ta, ma
-; RV32-NEXT: vand.vv v24, v16, v0
-; RV32-NEXT: vsrl.vi v16, v16, 2
+; RV32-NEXT: vand.vv v16, v24, v0
+; RV32-NEXT: vsrl.vi v24, v24, 2
; RV32-NEXT: csrr a3, vlenb
; RV32-NEXT: slli a3, a3, 4
; RV32-NEXT: add a3, sp, a3
; RV32-NEXT: addi a3, a3, 16
; RV32-NEXT: vs8r.v v0, (a3) # vscale x 64-byte Folded Spill
-; RV32-NEXT: vand.vv v16, v16, v0
+; RV32-NEXT: vand.vv v24, v24, v0
+; RV32-NEXT: vadd.vv v24, v16, v24
+; RV32-NEXT: vsrl.vi v16, v24, 4
; RV32-NEXT: vadd.vv v16, v24, v16
-; RV32-NEXT: vsrl.vi v24, v16, 4
-; RV32-NEXT: vadd.vv v16, v16, v24
; RV32-NEXT: lui a3, 61681
; RV32-NEXT: lui a4, 4112
; RV32-NEXT: addi a3, a3, -241
@@ -2312,16 +2312,16 @@ define <vscale x 16 x i64> @vp_ctpop_nxv16i64_unmasked(<vscale x 16 x i64> %va,
; RV32-NEXT: addi a0, a0, 16
; RV32-NEXT: vl8r.v v0, (a0) # vscale x 64-byte Folded Reload
; RV32-NEXT: vand.vv v24, v24, v0
-; RV32-NEXT: vsub.vv v8, v8, v24
+; RV32-NEXT: vsub.vv v24, v8, v24
; RV32-NEXT: csrr a0, vlenb
; RV32-NEXT: slli a0, a0, 4
; RV32-NEXT: add a0, sp, a0
; RV32-NEXT: addi a0, a0, 16
; RV32-NEXT: vl8r.v v0, (a0) # vscale x 64-byte Folded Reload
-; RV32-NEXT: vand.vv v24, v8, v0
-; RV32-NEXT: vsrl.vi v8, v8, 2
-; RV32-NEXT: vand.vv v8, v8, v0
-; RV32-NEXT: vadd.vv v8, v24, v8
+; RV32-NEXT: vand.vv v8, v24, v0
+; RV32-NEXT: vsrl.vi v24, v24, 2
+; RV32-NEXT: vand.vv v24, v24, v0
+; RV32-NEXT: vadd.vv v8, v8, v24
; RV32-NEXT: vsrl.vi v24, v8, 4
; RV32-NEXT: vadd.vv v8, v8, v24
; RV32-NEXT: csrr a0, vlenb
diff --git a/llvm/test/CodeGen/RISCV/rvv/cttz-vp.ll b/llvm/test/CodeGen/RISCV/rvv/cttz-vp.ll
index 6bf882fe47fef..52eaa51051631 100644
--- a/llvm/test/CodeGen/RISCV/rvv/cttz-vp.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/cttz-vp.ll
@@ -2193,7 +2193,8 @@ define <vscale x 16 x i64> @vp_cttz_nxv16i64(<vscale x 16 x i64> %va, <vscale x
; RV32-NEXT: vsetvli a1, zero, e8, mf4, ta, ma
; RV32-NEXT: vmv1r.v v7, v0
; RV32-NEXT: csrr a1, vlenb
-; RV32-NEXT: slli a1, a1, 5
+; RV32-NEXT: li a2, 40
+; RV32-NEXT: mul a1, a1, a2
; RV32-NEXT: add a1, sp, a1
; RV32-NEXT: addi a1, a1, 16
; RV32-NEXT: vs8r.v v8, (a1) # vscale x 64-byte Folded Spill
@@ -2207,49 +2208,48 @@ define <vscale x 16 x i64> @vp_cttz_nxv16i64(<vscale x 16 x i64> %va, <vscale x
; RV32-NEXT: lui a3, 349525
; RV32-NEXT: addi a3, a3, 1365
; RV32-NEXT: vsetvli zero, a2, e64, m8, ta, ma
-; RV32-NEXT: vadd.vi v24, v16, -1, v0.t
+; RV32-NEXT: vadd.vi v8, v16, -1, v0.t
; RV32-NEXT: vnot.v v16, v16, v0.t
-; RV32-NEXT: vand.vv v16, v16, v24, v0.t
+; RV32-NEXT: vand.vv v8, v16, v8, v0.t
; RV32-NEXT: vsetvli a4, zero, e32, m8, ta, ma
-; RV32-NEXT: vmv.v.x v8, a3
+; RV32-NEXT: vmv.v.x v24, a3
; RV32-NEXT: vsetvli zero, a2, e64, m8, ta, ma
-; RV32-NEXT: vsrl.vi v24, v16, 1, v0.t
+; RV32-NEXT: vsrl.vi v16, v8, 1, v0.t
; RV32-NEXT: csrr a3, vlenb
-; RV32-NEXT: li a4, 40
-; RV32-NEXT: mul a3, a3, a4
+; RV32-NEXT: slli a3, a3, 5
; RV32-NEXT: add a3, sp, a3
; RV32-NEXT: addi a3, a3, 16
-; RV32-NEXT: vs8r.v v8, (a3) # vscale x 64-byte Folded Spill
-; RV32-NEXT: vand.vv v24, v24, v8, v0.t
-; RV32-NEXT: vsub.vv v16, v16, v24, v0.t
+; RV32-NEXT: vs8r.v v24, (a3) # vscale x 64-byte Folded Spill
+; RV32-NEXT: vand.vv v16, v16, v24, v0.t
+; RV32-NEXT: vsub.vv v8, v8, v16, v0.t
; RV32-NEXT: lui a3, 209715
; RV32-NEXT: addi a3, a3, 819
; RV32-NEXT: vsetvli a4, zero, e32, m8, ta, ma
-; RV32-NEXT: vmv.v.x v8, a3
+; RV32-NEXT: vmv.v.x v24, a3
; RV32-NEXT: vsetvli zero, a2, e64, m8, ta, ma
-; RV32-NEXT: vand.vv v24, v16, v8, v0.t
-; RV32-NEXT: vsrl.vi v16, v16, 2, v0.t
+; RV32-NEXT: vand.vv v16, v8, v24, v0.t
+; RV32-NEXT: vsrl.vi v8, v8, 2, v0.t
; RV32-NEXT: csrr a3, vlenb
; RV32-NEXT: li a4, 24
; RV32-NEXT: mul a3, a3, a4
; RV32-NEXT: add a3, sp, a3
; RV32-NEXT: addi a3, a3, 16
-; RV32-NEXT: vs8r.v v8, (a3) # vscale x 64-byte Folded Spill
-; RV32-NEXT: vand.vv v16, v16, v8, v0.t
-; RV32-NEXT: vadd.vv v16, v24, v16, v0.t
-; RV32-NEXT: vsrl.vi v24, v16, 4, v0.t
-; RV32-NEXT: vadd.vv v16, v16, v24, v0.t
+; RV32-NEXT: vs8r.v v24, (a3) # vscale x 64-byte Folded Spill
+; RV32-NEXT: vand.vv v8, v8, v24, v0.t
+; RV32-NEXT: vadd.vv v8, v16, v8, v0.t
+; RV32-NEXT: vsrl.vi v16, v8, 4, v0.t
+; RV32-NEXT: vadd.vv v8, v8, v16, v0.t
; RV32-NEXT: lui a3, 61681
; RV32-NEXT: addi a3, a3, -241
; RV32-NEXT: vsetvli a4, zero, e32, m8, ta, ma
-; RV32-NEXT: vmv.v.x v8, a3
+; RV32-NEXT: vmv.v.x v16, a3
; RV32-NEXT: csrr a3, vlenb
; RV32-NEXT: slli a3, a3, 4
; RV32-NEXT: add a3, sp, a3
; RV32-NEXT: addi a3, a3, 16
-; RV32-NEXT: vs8r.v v8, (a3) # vscale x 64-byte Folded Spill
+; RV32-NEXT: vs8r.v v16, (a3) # vscale x 64-byte Folded Spill
; RV32-NEXT: vsetvli zero, a2, e64, m8, ta, ma
-; RV32-NEXT: vand.vv v16, v16, v8, v0.t
+; RV32-NEXT: vand.vv v16, v8, v16, v0.t
; RV32-NEXT: lui a3, 4112
; RV32-NEXT: addi a3, a3, 257
; RV32-NEXT: vsetvli a4, zero, e32, m8, ta, ma
@@ -2270,35 +2270,35 @@ define <vscale x 16 x i64> @vp_cttz_nxv16i64(<vscale x 16 x i64> %va, <vscale x
; RV32-NEXT: mv a0, a1
; RV32-NEXT: .LBB46_2:
; RV32-NEXT: vmv1r.v v0, v7
-; RV32-NEXT: slli a1, a1, 5
+; RV32-NEXT: li a3, 40
+; RV32-NEXT: mul a1, a1, a3
; RV32-NEXT: add a1, sp, a1
; RV32-NEXT: addi a1, a1, 16
; RV32-NEXT: vl8r.v v8, (a1) # vscale x 64-byte Folded Reload
; RV32-NEXT: vsetvli zero, a0, e64, m8, ta, ma
-; RV32-NEXT: vadd.vi v24, v8, -1, v0.t
+; RV32-NEXT: vadd.vi v16, v8, -1, v0.t
; RV32-NEXT: vnot.v v8, v8, v0.t
-; RV32-NEXT: vand.vv v8, v8, v24, v0.t
+; RV32-NEXT: vand.vv v8, v8, v16, v0.t
; RV32-NEXT: vsrl.vi v16, v8, 1, v0.t
; RV32-NEXT: csrr a0, vlenb
-; RV32-NEXT: li a1, 40
-; RV32-NEXT: mul a0, a0, a1
+; RV32-NEXT: slli a0, a0, 5
; RV32-NEXT: add a0, sp, a0
; RV32-NEXT: addi a0, a0, 16
; RV32-NEXT: vl8r.v v24, (a0) # vscale x 64-byte Folded Reload
-; RV32-NEXT: vand.vv v24, v16, v24, v0.t
-; RV32-NEXT: vsub.vv v8, v8, v24, v0.t
+; RV32-NEXT: vand.vv v16, v16, v24, v0.t
+; RV32-NEXT: vsub.vv v8, v8, v16, v0.t
; RV32-NEXT: csrr a0, vlenb
; RV32-NE...
[truncated]
✅ With the latest revision this PR passed the C/C++ code formatter.
I think there's a simpler alternative here which should be considered. Instead of attempting to extend the reasoning to handle live intervals, we should simply update the code to explicitly bail out for any vreg uses. This is non-functional with the current code structure, and would keep this code path performing the same heuristic when combined with your second change. I'm open to being convinced that the complexity of actually checking liveness is needed, but I think that needs to be explicitly justified. I believe weights aren't recalculated, so there's some complexity/staleness risk as we run through the allocation queue. I don't actually think this is wrong, but if we don't have to have the complexity, I'd rather not.
In principle it looks correct, but it is really hard to reason about massive test changes.
I went ahead and tried out the simpler approach of considering any vreg uses as non-rematerializable, i.e.:

-  // If MI has register uses, it will only be rematerializable if its uses are
-  // also live at the indices it will be rematerialized at.
-  SmallVector<Register, 8> NewRegs;
-  LiveRangeEdit LRE(nullptr, NewRegs, *MI->getMF(), LIS, nullptr);
-  const MachineRegisterInfo &MRI = MI->getMF()->getRegInfo();
-  for (MachineInstr &Use : MRI.use_nodbg_instructions(Reg)) {
-    SlotIndex UseIdx = LIS.getInstructionIndex(Use);
-    if (LI.getVNInfoAt(UseIdx) != VNI)
-      continue;
-    if (!LRE.allUsesAvailableAt(MI, VNI->def, UseIdx))
+  for (auto &MO : MI->explicit_uses())
+    if (MO.isReg() && MO.getReg().isVirtual())
       return false;
-  }

When applied on top of #159211 we get slightly more spills and reloads vs checking the liveness. On arm64-apple-darwin there's a 0.5% geomean increase in spills and reloads on llvm-test-suite, and a 1.5% increase on riscv64-linux-gnu -march=rva23u64. On SPEC CPU 2017 rva23u64 there's a 2.2/2.4% geomean increase in spills/reloads, but 14% more on 519.lbm_r.

Results for SPEC CPU 2017 rva23u64, check liveness vs no virt reg uses:
The weights are usually calculated once at the start, but if a live range is spilled or split then it recomputes the weight of the interval that changed.

Ultimately the spill weight is a heuristic, so we have the choice of how accurate we want it to be. My guess is that the extra complexity of checking the liveness is worth an extra 2.2% reduction in spills/reloads, but recalculating the weights of uses isn't worth the 0.1%. Not a strongly held opinion though, I'm open to avoiding the liveness check if reviewers prefer.
The sheer size of the test diff mostly comes from those amdgcn.bitcast tests. amdgcn.bitcast.1024.bit.ll is 237k lines alone. Do people read the FileCheck output of it or is it mostly to check that it doesn't crash? It's worth noting though that most of the test diffs here are just noise. In practice on AArch64/RISC-V/X86 there are no changes in llvm-test-suite/SPEC CPU 2017.
This reverts commit 0ca6a24.
…ematerializable-allUsesAvailableAt
Just noting that I have read this whole comment, and find it convincing. Thank you for doing the due diligence.
Remember to update the comment on the call to isRematerializable
We could consider a deeper rework of this. The 0.5 multiply is somewhat arbitrary, and we could consider e.g. scaling that decrease by the proportion of defs which are rematerializable. There's a bunch of options for playing with this if you want.
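A minimal sketch of that proportional-discount idea, under the assumption that rematerializable defs are counted per value number. scaledRematDiscount is a hypothetical helper, not part of this patch, and for brevity it only applies the trivial per-def check, without the use-availability test this patch adds.

#include "llvm/CodeGen/LiveInterval.h"
#include "llvm/CodeGen/LiveIntervals.h"
#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/CodeGen/TargetInstrInfo.h"

using namespace llvm;

// Scale the 0.5 discount by the fraction of the interval's defs that look
// rematerializable, instead of applying it only when every def qualifies.
static float scaledRematDiscount(float TotalWeight, const LiveInterval &LI,
                                 const LiveIntervals &LIS,
                                 const TargetInstrInfo &TII) {
  unsigned NumDefs = 0, NumRematDefs = 0;
  for (const VNInfo *VNI : LI.valnos) {
    if (VNI->isUnused() || VNI->isPHIDef())
      continue;
    ++NumDefs;
    if (const MachineInstr *DefMI = LIS.getInstructionFromIndex(VNI->def))
      if (TII.isTriviallyReMaterializable(*DefMI))
        ++NumRematDefs;
  }
  if (NumDefs == 0)
    return TotalWeight;
  // Fully rematerializable intervals keep the existing 0.5 multiply;
  // partially rematerializable ones get a proportionally smaller discount.
  return TotalWeight * (1.0F - 0.5F * float(NumRematDefs) / float(NumDefs));
}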
// If MI has register uses, it will only be rematerializable if its uses are
// also live at the indices it will be rematerialized at.
for (MachineOperand &MO : MRI.reg_nodbg_operands(LI.reg())) {
  if (MO.isUndef() || !MO.readsReg())
Suggested change:
-  if (MO.isUndef() || !MO.readsReg())
+  if (!MO.readsReg())
Redundant undef check
…ematerializable-allUsesAvailableAt
LGTM for the RISC-V tests and heuristic change. Please wait for @arsenm to confirm, particularly the AMDGPU tests.
; GCN-NEXT: S_NOP 0, implicit killed renamable $sgpr1
; GCN-NEXT: renamable $sgpr1 = SI_SPILL_S32_RESTORE %stack.0, implicit $exec, implicit $sgpr32 :: (load (s32) from %stack.0, addrspace 5)
; GCN-NEXT: SI_SPILL_S32_SAVE killed renamable $sgpr0, %stack.1, implicit $exec, implicit $sgpr32 :: (store (s32) into %stack.1, addrspace 5)
; GCN-NEXT: renamable $sgpr0 = IMPLICIT_DEF |
I suppose this grew by one IMPLICIT_DEF
; CHECK-NEXT: SI_SPILL_AV128_SAVE $vgpr4_vgpr5_vgpr6_vgpr7, %stack.4, $sgpr32, 0, implicit $exec :: (store (s128) into %stack.4, align 4, addrspace 5)
; CHECK-NEXT: renamable $vgpr0 = V_TRUNC_F32_e32 killed $vgpr0, implicit $mode, implicit $exec
; CHECK-NEXT: SI_SPILL_AV32_SAVE killed $vgpr0, %stack.3, $sgpr32, 0, implicit $exec :: (store (s32) into %stack.3, addrspace 5)
; CHECK-NEXT: renamable $vgpr0_vgpr1_vgpr2_vgpr3 = SI_SPILL_AV128_RESTORE %stack.2, $sgpr32, 0, implicit $exec :: (load (s128) from %stack.2, align 4, addrspace 5)
This avoided a lot of spills. However, there's a good chance this is no longer going down the last-chance recoloring path, but that's inevitable with these sorts of tests.
Stacked on llvm#159180. Unless overridden by the target, we currently only allow rematerialization of instructions with immediate or constant physical register operands, i.e. no virtual registers. The comment states that this is because we might increase a live range of the virtual register, but we don't actually do this. LiveRangeEdit::allUsesAvailableAt makes sure that we only rematerialize instructions whose virtual registers are already live at the use sites. This patch relaxes this constraint, which reduces a significant amount of reloads across various targets. This is another attempt at https://reviews.llvm.org/D106408, but llvm#159180 aims to have addressed the issue with the weights that may have caused the previous regressions.
…e weight discount (llvm#159180) This aims to fix the issue that caused https://reviews.llvm.org/D106408 to be reverted. CalcSpillWeights will reduce the weight of an interval by half if it's considered rematerializable, so it will be evicted before others. It does this by checking TII.isTriviallyReMaterializable. However, rematerialization may still fail if any of the defining MI's uses aren't available at the locations where it needs to be rematerialized. LiveRangeEdit::canRematerializeAt calls allUsesAvailableAt to check this but CalcSpillWeights doesn't, so the two diverge. This fixes it by also checking allUsesAvailableAt in CalcSpillWeights. In practice this has zero change on AArch64/X86-64/RISC-V as measured on llvm-test-suite, but prevents weights from being perturbed in an upcoming patch which enables more rematerialization by re-attempting https://reviews.llvm.org/D106408.
#159211) In the register allocator we define non-trivial rematerialization as the rematerialization of an instruction with virtual register uses. We have been able to perform non-trivial rematerialization for a while, but it has been prevented by default unless specifically overridden by the target in `TargetTransformInfo::isReMaterializableImpl`. The original reasoning for this, given by the comment in the default implementation, is that we might increase a live range of the virtual register, but we don't actually do this. LiveRangeEdit::allUsesAvailableAt makes sure that we only rematerialize instructions whose virtual registers are already live at the use sites. https://reviews.llvm.org/D106408 had originally tried to remove this restriction but it was reverted after some performance regressions were reported. We think it is likely that the regressions were caused by the fact that the old isTriviallyReMaterializable API sometimes returned true for non-trivial rematerializations. However #160377 recently split the API out into separate non-trivial and trivial versions and updated the call sites accordingly, and #160709 and #159180 fixed heuristics which weren't accounting for the difference between non-trivial and trivial. With these fixes in place, this patch proposes to again allow non-trivial rematerialization by default, which reduces a significant amount of spills and reloads across various targets. For llvm-test-suite built with -O3 -flto, we get the following geomean reduction in reloads:
- arm64-apple-darwin: 11.6%
- riscv64-linux-gnu: 8.1%
- x86_64-linux-gnu: 6.5%
This aims to fix the issue that caused https://reviews.llvm.org/D106408 to be reverted.
CalcSpillWeights will reduce the weight of an interval by half if it's considered rematerializable, so it will be evicted before others.
It does this by checking TII.isTriviallyReMaterializable. However, rematerialization may still fail if any of the defining MI's uses aren't available at the locations where it needs to be rematerialized. LiveRangeEdit::canRematerializeAt calls allUsesAvailableAt to check this but CalcSpillWeights doesn't, so the two diverge.
This fixes it by also checking allUsesAvailableAt in CalcSpillWeights.
In practice this has zero change on AArch64/X86-64/RISC-V as measured on llvm-test-suite, but prevents weights from being perturbed in an upcoming patch which enables more rematerialization by re-attempting https://reviews.llvm.org/D106408.
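Restating the combined check for clarity, assuming the LiveIntervals::allUsesAvailableAt introduced by this patch: a def only earns the weight discount if it passes both tests below. defEarnsDiscount is an invented name used for illustration; the real loop lives in VirtRegAuxInfo::isRematerializable (see the diff above) and may differ in detail.

#include "llvm/CodeGen/LiveInterval.h"
#include "llvm/CodeGen/LiveIntervals.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"
#include "llvm/CodeGen/SlotIndexes.h"
#include "llvm/CodeGen/TargetInstrInfo.h"

using namespace llvm;

static bool defEarnsDiscount(const MachineInstr &DefMI, const LiveInterval &LI,
                             const VNInfo *VNI, const LiveIntervals &LIS,
                             const MachineRegisterInfo &MRI,
                             const TargetInstrInfo &TII) {
  // 1) The target must consider the defining instruction trivially
  //    rematerializable.
  if (!TII.isTriviallyReMaterializable(DefMI))
    return false;
  // 2) Every register DefMI reads must still hold the same value at each
  //    point where DefMI would have to be re-executed, i.e. at the uses of
  //    this value number.
  for (const MachineInstr &UseMI : MRI.use_instructions(LI.reg())) {
    SlotIndex UseIdx = LIS.getInstructionIndex(UseMI);
    if (LI.getVNInfoAt(UseIdx) != VNI)
      continue; // This use reads a different def of the vreg.
    if (!LIS.allUsesAvailableAt(DefMI, UseIdx))
      return false;
  }
  return true;
}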