
Conversation

Collaborator

@preames preames commented Sep 22, 2025

This is a proof of concept presenting an alternative to @lukel97's #159211. I am very supportive of the goal, but want to suggest a different approach.

This PR has two commits. The first reworks the existing isReallyTriviallyReMaterializable target hook and plumbs a boolean through so that we can distinguish between trivial and non-trivial rematerialization. (See the existing comments in the TargetInstrInfo.h header for the precise meaning of trivial.) The second adds the raw interface (implementation to be cleaned up) and illustrates its use in a single call site.

The motivation here is that I believe we are going to need a conceptual distinction between trivial and non-trivial rematerialization. I see three basic groupings in the usage:

  • Some callers are going to rematerialize immediately at some specific place. Examples include RegisterCoalescer, SplitKit, and InlineSpiller.
  • Some callers are heuristics trying to reason about future guaranteed remat. Examples include MachineLICM. If you look at the current implementation, you'll notice it does this as a post-check, specifically because some of our backends lie about what is trivial vs. non-trivial. (AMDGPU and RISC-V are the most guilty...)
  • Some callers are heuristics that fall somewhere in between. Examples include CalcSpillWeights. One advantage of staging the change this way is to allow individual review and discussion of these updates.

An alternate API approach would be to have TII always return the non-trivial result, and then copy the MachineLICM approach of explicitly checking for non-triviality afterwards. I mildly prefer this approach, mostly because a target could choose to disable non-trivial remat even if the caller could handle it. This seems like a useful way to allow staging and per-target selection, but the per-caller opt-in is the more important bit.

Member

llvmbot commented Sep 22, 2025

@llvm/pr-subscribers-llvm-regalloc
@llvm/pr-subscribers-backend-amdgpu
@llvm/pr-subscribers-backend-systemz
@llvm/pr-subscribers-backend-loongarch
@llvm/pr-subscribers-backend-risc-v

@llvm/pr-subscribers-backend-aarch64

Author: Philip Reames (preames)

Changes

This is a proof of concept presenting an alternative to @lukel97's #159211. I am very supportive of the goal, but want to suggest a different approach.

This PR has two commits. The first reworks the existing isReallyTriviallyReMaterializable target hook and plumbs a boolean through so that we can distinguish between trivial and non-trivial rematerialization. (See the existing comments in the TargetInstrInfo.h header for the precise meaning of trivial.) The second adds the raw interface (implementation to be cleaned up) and illustrates its use in a single call site.

The motivation here is that I believe we are going to need a conceptual distinction between trivial and non-trivial rematerialization. I see three basic groupings in the usage:

  • Some callers are going to rematerialize immediately at some specific place. Examples include RegisterCoalescer, SplitKit, and InlineSpiller.
  • Some callers are heuristics trying to reason about future guaranteed remat. Examples include MachineLICM. If you look at the current implementation, you'll notice it does this as a post-check, specifically because some of our backends lie about what is trivial vs. non-trivial. (AMDGPU and RISC-V are the most guilty...)
  • Some callers are heuristics that fall somewhere in between. Examples include CalcSpillWeights. One advantage of staging the change this way is to allow individual review and discussion of these updates.

An alternate API approach would be to have TII always return the non-trivial result, and then copy the MachineLICM approach of explicitly checking for non-triviality afterwards. I mildly prefer this approach, mostly because a target could choose to disable non-trivial remat even if the caller could handle it. This seems like a useful way to allow staging and per-target selection, but the per-caller opt-in is the more important bit.


Patch is 39.45 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/160153.diff

28 Files Affected:

  • (modified) llvm/include/llvm/CodeGen/TargetInstrInfo.h (+10-2)
  • (modified) llvm/lib/CodeGen/RegisterCoalescer.cpp (+1-1)
  • (modified) llvm/lib/CodeGen/TargetInstrInfo.cpp (+7-6)
  • (modified) llvm/lib/Target/AMDGPU/SIInstrInfo.cpp (+4-3)
  • (modified) llvm/lib/Target/AMDGPU/SIInstrInfo.h (+2-1)
  • (modified) llvm/lib/Target/ARM/ARMBaseInstrInfo.cpp (+3-3)
  • (modified) llvm/lib/Target/ARM/ARMBaseInstrInfo.h (+2-1)
  • (modified) llvm/lib/Target/LoongArch/LoongArchInstrInfo.td (+1-1)
  • (modified) llvm/lib/Target/PowerPC/PPCInstrInfo.cpp (+3-3)
  • (modified) llvm/lib/Target/PowerPC/PPCInstrInfo.h (+2-1)
  • (modified) llvm/lib/Target/RISCV/RISCVInstrInfo.cpp (+5-3)
  • (modified) llvm/lib/Target/RISCV/RISCVInstrInfo.h (+2-1)
  • (modified) llvm/lib/Target/RISCV/RISCVInstrInfo.td (+1-1)
  • (modified) llvm/lib/Target/WebAssembly/WebAssemblyInstrInfo.cpp (+4-4)
  • (modified) llvm/lib/Target/WebAssembly/WebAssemblyInstrInfo.h (+2-1)
  • (modified) llvm/lib/Target/X86/X86InstrInfo.cpp (+3-3)
  • (modified) llvm/lib/Target/X86/X86InstrInfo.h (+2-1)
  • (modified) llvm/test/CodeGen/AArch64/aarch64-matrix-umull-smull.ll (+6-6)
  • (modified) llvm/test/CodeGen/AArch64/machine-combiner-copy.ll (+1-1)
  • (modified) llvm/test/CodeGen/AArch64/machine-licm-sub-loop.ll (+23-24)
  • (modified) llvm/test/CodeGen/AArch64/peephole-and-tst.ll (+2-2)
  • (modified) llvm/test/CodeGen/AArch64/tbl-loops.ll (+4-4)
  • (modified) llvm/test/CodeGen/SystemZ/llvm.sincos.ll (+2-2)
  • (modified) llvm/test/CodeGen/Thumb2/LowOverheadLoops/reductions.ll (+3-3)
  • (modified) llvm/test/CodeGen/Thumb2/LowOverheadLoops/vcmp-vpst-combination.ll (+6-7)
  • (modified) llvm/test/CodeGen/Thumb2/mve-phireg.ll (+1-1)
  • (modified) llvm/test/CodeGen/Thumb2/mve-postinc-dct.ll (+49-46)
  • (modified) llvm/test/CodeGen/Thumb2/mve-qrintrsplat.ll (+10-12)
diff --git a/llvm/include/llvm/CodeGen/TargetInstrInfo.h b/llvm/include/llvm/CodeGen/TargetInstrInfo.h
index f2ad5ee249b46..b786f7d8565a4 100644
--- a/llvm/include/llvm/CodeGen/TargetInstrInfo.h
+++ b/llvm/include/llvm/CodeGen/TargetInstrInfo.h
@@ -171,7 +171,14 @@ class LLVM_ABI TargetInstrInfo : public MCInstrInfo {
     return (MI.getOpcode() == TargetOpcode::IMPLICIT_DEF &&
             MI.getNumOperands() == 1) ||
            (MI.getDesc().isRematerializable() &&
-            isReallyTriviallyReMaterializable(MI));
+            isReMaterializableImpl(MI, true));
+  }
+
+  bool isReMaterializable(const MachineInstr &MI) const {
+    return (MI.getOpcode() == TargetOpcode::IMPLICIT_DEF &&
+            MI.getNumOperands() == 1) ||
+           (MI.getDesc().isRematerializable() &&
+            isReMaterializableImpl(MI, false));
   }
 
   /// Given \p MO is a PhysReg use return if it can be ignored for the purpose
@@ -198,7 +205,8 @@ class LLVM_ABI TargetInstrInfo : public MCInstrInfo {
   /// predicate must return false if the instruction has any side effects other
   /// than producing a value, or if it requres any address registers that are
   /// not always available.
-  virtual bool isReallyTriviallyReMaterializable(const MachineInstr &MI) const;
+  virtual bool isReMaterializableImpl(const MachineInstr &MI,
+                                      bool OnlyTrivial) const;
 
   /// This method commutes the operands of the given machine instruction MI.
   /// The operands to be commuted are specified by their indices OpIdx1 and
diff --git a/llvm/lib/CodeGen/RegisterCoalescer.cpp b/llvm/lib/CodeGen/RegisterCoalescer.cpp
index db00f54daeb62..2d84d6b385956 100644
--- a/llvm/lib/CodeGen/RegisterCoalescer.cpp
+++ b/llvm/lib/CodeGen/RegisterCoalescer.cpp
@@ -1325,7 +1325,7 @@ bool RegisterCoalescer::reMaterializeTrivialDef(const CoalescerPair &CP,
   if (!TII->isAsCheapAsAMove(*DefMI))
     return false;
 
-  if (!TII->isTriviallyReMaterializable(*DefMI))
+  if (!TII->isReMaterializable(*DefMI))
     return false;
 
   if (!definesFullReg(*DefMI, SrcReg))
diff --git a/llvm/lib/CodeGen/TargetInstrInfo.cpp b/llvm/lib/CodeGen/TargetInstrInfo.cpp
index 5be89b49fb6ba..da91bc186aac9 100644
--- a/llvm/lib/CodeGen/TargetInstrInfo.cpp
+++ b/llvm/lib/CodeGen/TargetInstrInfo.cpp
@@ -1590,8 +1590,8 @@ MachineTraceStrategy TargetInstrInfo::getMachineCombinerTraceStrategy() const {
   return MachineTraceStrategy::TS_MinInstrCount;
 }
 
-bool TargetInstrInfo::isReallyTriviallyReMaterializable(
-    const MachineInstr &MI) const {
+bool TargetInstrInfo::isReMaterializableImpl(
+    const MachineInstr &MI, bool OnlyTrivial) const {
   const MachineFunction &MF = *MI.getMF();
   const MachineRegisterInfo &MRI = MF.getRegInfo();
 
@@ -1658,10 +1658,11 @@ bool TargetInstrInfo::isReallyTriviallyReMaterializable(
     if (MO.isDef() && Reg != DefReg)
       return false;
 
-    // Don't allow any virtual-register uses. Rematting an instruction with
-    // virtual register uses would length the live ranges of the uses, which
-    // is not necessarily a good idea, certainly not "trivial".
-    if (MO.isUse())
+    // If asked for trivial rematerialization, don't allow any
+    // virtual-register uses. Rematting an instruction with virtual register
+    // uses would lengthen the live ranges of the uses, which means
+    // rematerialization must become a per-user query which many callers
+    if (OnlyTrivial && MO.isUse())
       return false;
   }
 
diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
index 13d05ee54d7b3..dc9e41aa097fb 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
@@ -124,8 +124,8 @@ static bool canRemat(const MachineInstr &MI) {
   return false;
 }
 
-bool SIInstrInfo::isReallyTriviallyReMaterializable(
-    const MachineInstr &MI) const {
+bool SIInstrInfo::isReMaterializableImpl(const MachineInstr &MI,
+                                         bool OnlyTrivial) const {
 
   if (canRemat(MI)) {
     // Normally VALU use of exec would block the rematerialization, but that
@@ -139,13 +139,14 @@ bool SIInstrInfo::isReallyTriviallyReMaterializable(
     // There is difference to generic method which does not allow
     // rematerialization if there are virtual register uses. We allow this,
     // therefore this method includes SOP instructions as well.
+    // FIXME: This should only be done if OnlyTrivial is set.
     if (!MI.hasImplicitDef() &&
         MI.getNumImplicitOperands() == MI.getDesc().implicit_uses().size() &&
         !MI.mayRaiseFPException())
       return true;
   }
 
-  return TargetInstrInfo::isReallyTriviallyReMaterializable(MI);
+  return TargetInstrInfo::isReMaterializableImpl(MI, OnlyTrivial);
 }
 
 // Returns true if the scalar result of a VALU instruction depends on exec.
diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.h b/llvm/lib/Target/AMDGPU/SIInstrInfo.h
index e249fc6cbb79d..300aeaa31941e 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.h
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.h
@@ -244,7 +244,8 @@ class SIInstrInfo final : public AMDGPUGenInstrInfo {
     return ST;
   }
 
-  bool isReallyTriviallyReMaterializable(const MachineInstr &MI) const override;
+  bool isReMaterializableImpl(const MachineInstr &MI,
+                              bool OnlyTrivial) const override;
 
   bool isIgnorableUse(const MachineOperand &MO) const override;
 
diff --git a/llvm/lib/Target/ARM/ARMBaseInstrInfo.cpp b/llvm/lib/Target/ARM/ARMBaseInstrInfo.cpp
index 5c35b3327c16d..9eb65533b682d 100644
--- a/llvm/lib/Target/ARM/ARMBaseInstrInfo.cpp
+++ b/llvm/lib/Target/ARM/ARMBaseInstrInfo.cpp
@@ -6510,14 +6510,14 @@ bool ARMBaseInstrInfo::shouldOutlineFromFunctionByDefault(
   return Subtarget.isMClass() && MF.getFunction().hasMinSize();
 }
 
-bool ARMBaseInstrInfo::isReallyTriviallyReMaterializable(
-    const MachineInstr &MI) const {
+bool ARMBaseInstrInfo::isReMaterializableImpl(const MachineInstr &MI,
+                                              bool TrivialOnly) const {
   // Try hard to rematerialize any VCTPs because if we spill P0, it will block
   // the tail predication conversion. This means that the element count
   // register has to be live for longer, but that has to be better than
   // spill/restore and VPT predication.
   return (isVCTP(&MI) && !isPredicated(MI)) ||
-         TargetInstrInfo::isReallyTriviallyReMaterializable(MI);
+         TargetInstrInfo::isReMaterializableImpl(MI, TrivialOnly);
 }
 
 unsigned llvm::getBLXOpcode(const MachineFunction &MF) {
diff --git a/llvm/lib/Target/ARM/ARMBaseInstrInfo.h b/llvm/lib/Target/ARM/ARMBaseInstrInfo.h
index 71de3c6ad597a..57c53036349a6 100644
--- a/llvm/lib/Target/ARM/ARMBaseInstrInfo.h
+++ b/llvm/lib/Target/ARM/ARMBaseInstrInfo.h
@@ -479,7 +479,8 @@ class ARMBaseInstrInfo : public ARMGenInstrInfo {
   MachineInstr *canFoldIntoMOVCC(Register Reg, const MachineRegisterInfo &MRI,
                                  const TargetInstrInfo *TII) const;
 
-  bool isReallyTriviallyReMaterializable(const MachineInstr &MI) const override;
+  bool isReMaterializableImpl(const MachineInstr &MI,
+                              bool OnlyTrivial) const override;
 
 private:
   /// Modeling special VFP / NEON fp MLA / MLS hazards.
diff --git a/llvm/lib/Target/LoongArch/LoongArchInstrInfo.td b/llvm/lib/Target/LoongArch/LoongArchInstrInfo.td
index 20ccc622f58dc..9565a55e4c6c5 100644
--- a/llvm/lib/Target/LoongArch/LoongArchInstrInfo.td
+++ b/llvm/lib/Target/LoongArch/LoongArchInstrInfo.td
@@ -943,7 +943,7 @@ let Predicates = [IsLA64] in {
 def ADD_D : ALU_3R<0x00108000>;
 def SUB_D : ALU_3R<0x00118000>;
 // ADDI_D isn't always rematerializable, but isReMaterializable will be used as
-// a hint which is verified in isReallyTriviallyReMaterializable.
+// a hint which is verified in isReMaterializableImpl.
 // See LoongArchInstrInfo::isAsCheapAsAMove for more details.
 let isReMaterializable = 1, isAsCheapAsAMove = 1 in {
 def ADDI_D : ALU_2RI12<0x02c00000, simm12_addlike>;
diff --git a/llvm/lib/Target/PowerPC/PPCInstrInfo.cpp b/llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
index 55e38bcf4afc9..4d436d6d5d017 100644
--- a/llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
+++ b/llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
@@ -1075,8 +1075,8 @@ Register PPCInstrInfo::isLoadFromStackSlot(const MachineInstr &MI,
 
 // For opcodes with the ReMaterializable flag set, this function is called to
 // verify the instruction is really rematable.
-bool PPCInstrInfo::isReallyTriviallyReMaterializable(
-    const MachineInstr &MI) const {
+bool PPCInstrInfo::isReMaterializableImpl(const MachineInstr &MI,
+                                          bool OnlyTrivial) const {
   switch (MI.getOpcode()) {
   default:
     // Let base implementaion decide.
@@ -1112,7 +1112,7 @@ bool PPCInstrInfo::isReallyTriviallyReMaterializable(
   case PPC::DMXXSETACCZ:
     return true;
   }
-  return TargetInstrInfo::isReallyTriviallyReMaterializable(MI);
+  return TargetInstrInfo::isReMaterializableImpl(MI, OnlyTrivial);
 }
 
 Register PPCInstrInfo::isStoreToStackSlot(const MachineInstr &MI,
diff --git a/llvm/lib/Target/PowerPC/PPCInstrInfo.h b/llvm/lib/Target/PowerPC/PPCInstrInfo.h
index 63ebd65910572..61f580eb40344 100644
--- a/llvm/lib/Target/PowerPC/PPCInstrInfo.h
+++ b/llvm/lib/Target/PowerPC/PPCInstrInfo.h
@@ -530,7 +530,8 @@ class PPCInstrInfo : public PPCGenInstrInfo {
                              unsigned &SubIdx) const override;
   Register isLoadFromStackSlot(const MachineInstr &MI,
                                int &FrameIndex) const override;
-  bool isReallyTriviallyReMaterializable(const MachineInstr &MI) const override;
+  bool isReMaterializableImpl(const MachineInstr &MI,
+                              bool OnlyTrivial) const override;
   Register isStoreToStackSlot(const MachineInstr &MI,
                               int &FrameIndex) const override;
 
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp b/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
index 0ed97c61ec78a..dd3a163592f01 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
@@ -232,8 +232,8 @@ Register RISCVInstrInfo::isStoreToStackSlot(const MachineInstr &MI,
   return 0;
 }
 
-bool RISCVInstrInfo::isReallyTriviallyReMaterializable(
-    const MachineInstr &MI) const {
+bool RISCVInstrInfo::isReMaterializableImpl(const MachineInstr &MI,
+                                            bool OnlyTrivial) const {
   switch (RISCV::getRVVMCOpcode(MI.getOpcode())) {
   case RISCV::VMV_V_X:
   case RISCV::VFMV_V_F:
@@ -241,9 +241,11 @@ bool RISCVInstrInfo::isReallyTriviallyReMaterializable(
   case RISCV::VMV_S_X:
   case RISCV::VFMV_S_F:
   case RISCV::VID_V:
+    // FIXME: The v.x and v.f forms are not 'trivial' in the sense of
+    // this API. Split them!
     return MI.getOperand(1).isUndef();
   default:
-    return TargetInstrInfo::isReallyTriviallyReMaterializable(MI);
+    return TargetInstrInfo::isReMaterializableImpl(MI, OnlyTrivial);
   }
 }
 
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfo.h b/llvm/lib/Target/RISCV/RISCVInstrInfo.h
index 57ec431749ebe..2d19a292da67c 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfo.h
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfo.h
@@ -75,7 +75,8 @@ class RISCVInstrInfo : public RISCVGenInstrInfo {
   Register isStoreToStackSlot(const MachineInstr &MI, int &FrameIndex,
                               TypeSize &MemBytes) const override;
 
-  bool isReallyTriviallyReMaterializable(const MachineInstr &MI) const override;
+  bool isReMaterializableImpl(const MachineInstr &MI,
+                              bool OnlyTrivial) const override;
 
   bool shouldBreakCriticalEdgeToSink(MachineInstr &MI) const override {
     return MI.getOpcode() == RISCV::ADDI && MI.getOperand(1).isReg() &&
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfo.td b/llvm/lib/Target/RISCV/RISCVInstrInfo.td
index 47900cffa370c..9825733e29897 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfo.td
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfo.td
@@ -779,7 +779,7 @@ def SH : Store_rri<0b001, "sh">, Sched<[WriteSTH, ReadStoreData, ReadMemBase]>;
 def SW : Store_rri<0b010, "sw">, Sched<[WriteSTW, ReadStoreData, ReadMemBase]>;
 
 // ADDI isn't always rematerializable, but isReMaterializable will be used as
-// a hint which is verified in isReallyTriviallyReMaterializable.
+// a hint which is verified in isReMaterializableImpl.
 let isReMaterializable = 1, isAsCheapAsAMove = 1 in
 def ADDI  : ALU_ri<0b000, "addi">;
 
diff --git a/llvm/lib/Target/WebAssembly/WebAssemblyInstrInfo.cpp b/llvm/lib/Target/WebAssembly/WebAssemblyInstrInfo.cpp
index feac04a17068a..88521fa5c06f5 100644
--- a/llvm/lib/Target/WebAssembly/WebAssemblyInstrInfo.cpp
+++ b/llvm/lib/Target/WebAssembly/WebAssemblyInstrInfo.cpp
@@ -39,18 +39,18 @@ WebAssemblyInstrInfo::WebAssemblyInstrInfo(const WebAssemblySubtarget &STI)
                               WebAssembly::CATCHRET),
       RI(STI.getTargetTriple()) {}
 
-bool WebAssemblyInstrInfo::isReallyTriviallyReMaterializable(
-    const MachineInstr &MI) const {
+bool WebAssemblyInstrInfo::isReMaterializableImpl(const MachineInstr &MI,
+                                                  bool OnlyTrivial) const {
   switch (MI.getOpcode()) {
   case WebAssembly::CONST_I32:
   case WebAssembly::CONST_I64:
   case WebAssembly::CONST_F32:
   case WebAssembly::CONST_F64:
-    // TargetInstrInfo::isReallyTriviallyReMaterializable misses these
+    // TargetInstrInfo::isReMaterializableImpl misses these
     // because of the ARGUMENTS implicit def, so we manualy override it here.
     return true;
   default:
-    return TargetInstrInfo::isReallyTriviallyReMaterializable(MI);
+    return TargetInstrInfo::isReMaterializableImpl(MI, OnlyTrivial);
   }
 }
 
diff --git a/llvm/lib/Target/WebAssembly/WebAssemblyInstrInfo.h b/llvm/lib/Target/WebAssembly/WebAssemblyInstrInfo.h
index ba00097034bf5..aa11da3e3e388 100644
--- a/llvm/lib/Target/WebAssembly/WebAssemblyInstrInfo.h
+++ b/llvm/lib/Target/WebAssembly/WebAssemblyInstrInfo.h
@@ -37,7 +37,8 @@ class WebAssemblyInstrInfo final : public WebAssemblyGenInstrInfo {
 
   const WebAssemblyRegisterInfo &getRegisterInfo() const { return RI; }
 
-  bool isReallyTriviallyReMaterializable(const MachineInstr &MI) const override;
+  bool isReMaterializableImpl(const MachineInstr &MI,
+                              bool OnlyTrivial) const override;
 
   void copyPhysReg(MachineBasicBlock &MBB, MachineBasicBlock::iterator MI,
                    const DebugLoc &DL, Register DestReg, Register SrcReg,
diff --git a/llvm/lib/Target/X86/X86InstrInfo.cpp b/llvm/lib/Target/X86/X86InstrInfo.cpp
index 58d526269ff3c..4ccf7918b3741 100644
--- a/llvm/lib/Target/X86/X86InstrInfo.cpp
+++ b/llvm/lib/Target/X86/X86InstrInfo.cpp
@@ -755,8 +755,8 @@ static bool regIsPICBase(Register BaseReg, const MachineRegisterInfo &MRI) {
   return isPICBase;
 }
 
-bool X86InstrInfo::isReallyTriviallyReMaterializable(
-    const MachineInstr &MI) const {
+bool X86InstrInfo::isReMaterializableImpl(const MachineInstr &MI,
+                                          bool OnlyTrivial) const {
   switch (MI.getOpcode()) {
   default:
     // This function should only be called for opcodes with the ReMaterializable
@@ -951,7 +951,7 @@ bool X86InstrInfo::isReallyTriviallyReMaterializable(
     break;
   }
   }
-  return TargetInstrInfo::isReallyTriviallyReMaterializable(MI);
+  return TargetInstrInfo::isReMaterializableImpl(MI, OnlyTrivial);
 }
 
 void X86InstrInfo::reMaterialize(MachineBasicBlock &MBB,
diff --git a/llvm/lib/Target/X86/X86InstrInfo.h b/llvm/lib/Target/X86/X86InstrInfo.h
index 86133b3d969b1..1bd600c59cfc0 100644
--- a/llvm/lib/Target/X86/X86InstrInfo.h
+++ b/llvm/lib/Target/X86/X86InstrInfo.h
@@ -340,7 +340,8 @@ class X86InstrInfo final : public X86GenInstrInfo {
   Register isStoreToStackSlotPostFE(const MachineInstr &MI,
                                     int &FrameIndex) const override;
 
-  bool isReallyTriviallyReMaterializable(const MachineInstr &MI) const override;
+  bool isReMaterializableImpl(const MachineInstr &MI,
+                              bool OnlyTrivial) const override;
   void reMaterialize(MachineBasicBlock &MBB, MachineBasicBlock::iterator MI,
                      Register DestReg, unsigned SubIdx,
                      const MachineInstr &Orig,
diff --git a/llvm/test/CodeGen/AArch64/aarch64-matrix-umull-smull.ll b/llvm/test/CodeGen/AArch64/aarch64-matrix-umull-smull.ll
index 8655bb1292ef7..0d4c250c5df62 100644
--- a/llvm/test/CodeGen/AArch64/aarch64-matrix-umull-smull.ll
+++ b/llvm/test/CodeGen/AArch64/aarch64-matrix-umull-smull.ll
@@ -267,7 +267,7 @@ define void @larger_smull(ptr nocapture noundef readonly %x, i16 noundef %y, ptr
 ; CHECK-SD-NEXT:    and x9, x8, #0xfffffff0
 ; CHECK-SD-NEXT:    add x10, x2, #32
 ; CHECK-SD-NEXT:    add x11, x0, #16
-; CHECK-SD-NEXT:    mov x12, x9
+; CHECK-SD-NEXT:    and x12, x8, #0xfffffff0
 ; CHECK-SD-NEXT:  .LBB3_4: // %vector.body
 ; CHECK-SD-NEXT:    // =>This Inner Loop Header: Depth=1
 ; CHECK-SD-NEXT:    ldp q1, q2, [x11, #-16]
@@ -313,7 +313,7 @@ define void @larger_smull(ptr nocapture noundef readonly %x, i16 noundef %y, ptr
 ; CHECK-GI-NEXT:    and x10, x9, #0xfffffff0
 ; CHECK-GI-NEXT:    add x11, x2, #32
 ; CHECK-GI-NEXT:    add x12, x0, #16
-; CHECK-GI-NEXT:    mov x13, x10
+; CHECK-GI-NEXT:    and x13, x9, #0xfffffff0
 ; CHECK-GI-NEXT:    xtn v0.4h, v0.4s
 ; CHECK-GI-NEXT:  .LBB3_3: // %vector.body
 ; CHECK-GI-NEXT:    // =>This Inner Loop Header: Depth=1
@@ -428,7 +428,7 @@ define void @larger_umull(ptr nocapture noundef readonly %x, i16 noundef %y, ptr
 ; CHECK-SD-NEXT:    and x9, x8, #0xfffffff0
 ; CHECK-SD-NEXT:    add x10, x2, #32
 ; CHECK-SD-NEXT:    add x11, x0, #16
-; CHECK-SD-NEXT:    mov x12, x9
+; CHECK-SD-NEXT:    and x12, x8, #0xfffffff0
 ; CHECK-SD-NEXT:  .LBB4_4: // %vector.body
 ; CHECK-SD-NEXT:    // =>This Inner Loop Header: Depth=1
 ; CHECK-SD-NEXT:    ldp q1, q2, [x11, #-16]
@@ -472,7 +472,7 @@ define void @larger_umull(ptr nocapture noundef readonly %x, i16 noundef %y, ptr
 ; CHECK-GI-NEXT:    and x8, x9, #0xfffffff0
 ; CHECK-GI-NEXT:    add x10, x2, #32
 ; CHECK-GI-NEXT:    add x11, x0, #16
-; CHECK-GI-NEXT:    mov x12, x8
+; CHECK-GI-NEXT:    and x12, x9, #0xfffffff0
 ; CHECK-GI-NEXT:  .LBB4_3: // %vector.body
 ; CHECK-GI-NEXT:    // =>This Inner Loop Header: Depth=1
 ; CHECK-GI-NEXT:    and w13, w1, #0xffff
@@ -596,7 +596,7 @@ define i16 @red_mla_dup_ext_u8_s8_s16(ptr noalias nocapture noundef readonly %A,
 ; CHECK-SD-NEXT:    and x11, x10, #0xfffffff0
 ; CHECK-SD-NEXT:    fmov s2, w9
 ; CHECK-SD-NEXT:    add x8, x0, #8
-; CHECK-SD-NEXT:    mov x12, x11
+; CHECK-SD-NEXT:    and x12, x10, #0xfffffff0
 ; CHECK-SD-NEXT:  .LBB5_5: // %vector.body
 ; CHECK-SD-NEXT:    // =>This Inner Loop Header: Depth=1
 ; CHECK-SD-NEXT:    ldp d3, d4, [x8, #-8]
@@ -646,10 +646,10 @@ define i16 @red_mla_dup_ext_u8_s8_s16(ptr noalias nocapture noundef readonly %A,
 ; CHECK-GI-NEXT:    movi v0.2d, #0000000000000000
 ; CHECK-GI-NEXT:    movi v1.2d, #0000000000000000
 ; CHECK-GI-NEXT:    add x10, x0, #8
+; CHECK-GI-NEXT:    and x11, x8, #0xfffffff0
 ; CHECK-GI-NEXT:    sbfx w9, w9, #8, #8
 ; CHECK-GI-NEXT:    dup v2.8h, w9
 ; CHECK-GI-NEXT:    and x9, x8, #0xfffffff0
-; CHECK-GI-NEXT:    mov x11, x9
 ; CHECK-GI-NEXT:  .LBB5_5: // %vector.body
 ; CHECK-GI-NEXT:    // =>This Inner Loop Header: Depth=1
 ; CHECK-GI-NEXT:    ldp d3, d4, [x10, #-8]
diff --git a/llvm/test/CodeGen/AArch64/machine-combiner-copy.ll b/llvm/test/CodeGen/AArch64/machine-combiner-copy.ll
index 4c8e589391c3a..c23e4e182f2ef 100644
--- a/llvm/test/CodeGen/AArch64/machine-combiner-copy.ll
+++ b/llvm/test/CodeGen/AArch64/machine-combiner-copy.ll
@@ -17,7 +17,7 @@ define void @fma_dup_f16(ptr noalias nocapture noundef readonly %A, half noundef
 ; CHECK-NEXT:    and x9, x8, #0xfffffff0
 ; CHECK-NEXT:    add x10, x1, #16
 ; CHECK-NEXT:    add x11, x0, #16
-; CHECK-NEXT:    mov x12, x9
+; CHECK-NEXT:    and x12, x8, #0xfffffff0
 ; CHECK-NEXT:  .LBB0_4: // %vector.body
 ; CHECK-NEXT:    // =>This Inner Loop Header: Depth=1
 ; CHECK-NEXT:    ldp q1, q4, [x10, #-16]
diff --git a/llvm/test/CodeGen/AArch64/machine-licm-sub-loop.ll b/llvm/test/CodeGen/AArch64/machine-licm-sub-loop.ll
index f6bbdf5d95d87..1770bb9794f09 100644
--- a/llvm/test/CodeGen/AArch64/machine-licm-sub-loop.ll
+++ b/llvm/test/CodeGen/AArch64/machine-...
[truncated]

Member

llvmbot commented Sep 22, 2025

@llvm/pr-subscribers-backend-arm

Author: Philip Reames (preames)

Changes

This is a proof of concept to present an alternative approach to @lukel97 's #159211. I am very supportive of the goal, but want to suggest an alternate approach.

This PR has two commits. The first reworks the existing isReallyTriviallyReMaterializable target hook, and plumbs a boolean through so that we can distinguish between trivial and non-trivial rematerialization. (see the existing comments in the TargetInstInfo.h header for the precise meaning of trivial) The second adds the raw interface (implementation to be cleaned up), and illustrates it's use in a single call site.

The motivation here is that I believe we are going to need to have a conceptual distinction between trivial and non-trivial rematerialization. I see three basic groupings in the usage:

  • Some callers are immediately going to materialize at some specific place. Examples of the first include RegisterCoalescer, SplitKit, and InlineSpiller.
  • Some callers are heuristics trying to reason about future guaranteed remat. Examples include MachineLICM. If you look at the current implementation, you'll notice it is doing this as a post check specifically because some of our backends lie about what is trivial vs non-trivial. (AMDGPU and RISC-V are the most guilty...)
  • Some callers are heuristics which are somewhere in between. Examples include CalcSpillWeights. One advantage of staging the change this way is to allow individual review and discussion of these changes.

An alternate API approach would be to update TTI to always return the non-trivial result, and then copy the MachineLICM approach of explicitly checking for non-triviality afterwards. I mildly prefer this approach, mostly because a target could chose to disable non-trivial remat even if the caller could handle it. This seems like a useful way to allow staging and per-target selection, but the per caller opt-in is the more important bit.


Patch is 39.45 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/160153.diff

28 Files Affected:

  • (modified) llvm/include/llvm/CodeGen/TargetInstrInfo.h (+10-2)
  • (modified) llvm/lib/CodeGen/RegisterCoalescer.cpp (+1-1)
  • (modified) llvm/lib/CodeGen/TargetInstrInfo.cpp (+7-6)
  • (modified) llvm/lib/Target/AMDGPU/SIInstrInfo.cpp (+4-3)
  • (modified) llvm/lib/Target/AMDGPU/SIInstrInfo.h (+2-1)
  • (modified) llvm/lib/Target/ARM/ARMBaseInstrInfo.cpp (+3-3)
  • (modified) llvm/lib/Target/ARM/ARMBaseInstrInfo.h (+2-1)
  • (modified) llvm/lib/Target/LoongArch/LoongArchInstrInfo.td (+1-1)
  • (modified) llvm/lib/Target/PowerPC/PPCInstrInfo.cpp (+3-3)
  • (modified) llvm/lib/Target/PowerPC/PPCInstrInfo.h (+2-1)
  • (modified) llvm/lib/Target/RISCV/RISCVInstrInfo.cpp (+5-3)
  • (modified) llvm/lib/Target/RISCV/RISCVInstrInfo.h (+2-1)
  • (modified) llvm/lib/Target/RISCV/RISCVInstrInfo.td (+1-1)
  • (modified) llvm/lib/Target/WebAssembly/WebAssemblyInstrInfo.cpp (+4-4)
  • (modified) llvm/lib/Target/WebAssembly/WebAssemblyInstrInfo.h (+2-1)
  • (modified) llvm/lib/Target/X86/X86InstrInfo.cpp (+3-3)
  • (modified) llvm/lib/Target/X86/X86InstrInfo.h (+2-1)
  • (modified) llvm/test/CodeGen/AArch64/aarch64-matrix-umull-smull.ll (+6-6)
  • (modified) llvm/test/CodeGen/AArch64/machine-combiner-copy.ll (+1-1)
  • (modified) llvm/test/CodeGen/AArch64/machine-licm-sub-loop.ll (+23-24)
  • (modified) llvm/test/CodeGen/AArch64/peephole-and-tst.ll (+2-2)
  • (modified) llvm/test/CodeGen/AArch64/tbl-loops.ll (+4-4)
  • (modified) llvm/test/CodeGen/SystemZ/llvm.sincos.ll (+2-2)
  • (modified) llvm/test/CodeGen/Thumb2/LowOverheadLoops/reductions.ll (+3-3)
  • (modified) llvm/test/CodeGen/Thumb2/LowOverheadLoops/vcmp-vpst-combination.ll (+6-7)
  • (modified) llvm/test/CodeGen/Thumb2/mve-phireg.ll (+1-1)
  • (modified) llvm/test/CodeGen/Thumb2/mve-postinc-dct.ll (+49-46)
  • (modified) llvm/test/CodeGen/Thumb2/mve-qrintrsplat.ll (+10-12)
diff --git a/llvm/include/llvm/CodeGen/TargetInstrInfo.h b/llvm/include/llvm/CodeGen/TargetInstrInfo.h
index f2ad5ee249b46..b786f7d8565a4 100644
--- a/llvm/include/llvm/CodeGen/TargetInstrInfo.h
+++ b/llvm/include/llvm/CodeGen/TargetInstrInfo.h
@@ -171,7 +171,14 @@ class LLVM_ABI TargetInstrInfo : public MCInstrInfo {
     return (MI.getOpcode() == TargetOpcode::IMPLICIT_DEF &&
             MI.getNumOperands() == 1) ||
            (MI.getDesc().isRematerializable() &&
-            isReallyTriviallyReMaterializable(MI));
+            isReMaterializableImpl(MI, true));
+  }
+
+  bool isReMaterializable(const MachineInstr &MI) const {
+    return (MI.getOpcode() == TargetOpcode::IMPLICIT_DEF &&
+            MI.getNumOperands() == 1) ||
+           (MI.getDesc().isRematerializable() &&
+            isReMaterializableImpl(MI, false));
   }
 
   /// Given \p MO is a PhysReg use return if it can be ignored for the purpose
@@ -198,7 +205,8 @@ class LLVM_ABI TargetInstrInfo : public MCInstrInfo {
   /// predicate must return false if the instruction has any side effects other
   /// than producing a value, or if it requres any address registers that are
   /// not always available.
-  virtual bool isReallyTriviallyReMaterializable(const MachineInstr &MI) const;
+  virtual bool isReMaterializableImpl(const MachineInstr &MI,
+                                      bool OnlyTrivial) const;
 
   /// This method commutes the operands of the given machine instruction MI.
   /// The operands to be commuted are specified by their indices OpIdx1 and
diff --git a/llvm/lib/CodeGen/RegisterCoalescer.cpp b/llvm/lib/CodeGen/RegisterCoalescer.cpp
index db00f54daeb62..2d84d6b385956 100644
--- a/llvm/lib/CodeGen/RegisterCoalescer.cpp
+++ b/llvm/lib/CodeGen/RegisterCoalescer.cpp
@@ -1325,7 +1325,7 @@ bool RegisterCoalescer::reMaterializeTrivialDef(const CoalescerPair &CP,
   if (!TII->isAsCheapAsAMove(*DefMI))
     return false;
 
-  if (!TII->isTriviallyReMaterializable(*DefMI))
+  if (!TII->isReMaterializable(*DefMI))
     return false;
 
   if (!definesFullReg(*DefMI, SrcReg))
diff --git a/llvm/lib/CodeGen/TargetInstrInfo.cpp b/llvm/lib/CodeGen/TargetInstrInfo.cpp
index 5be89b49fb6ba..da91bc186aac9 100644
--- a/llvm/lib/CodeGen/TargetInstrInfo.cpp
+++ b/llvm/lib/CodeGen/TargetInstrInfo.cpp
@@ -1590,8 +1590,8 @@ MachineTraceStrategy TargetInstrInfo::getMachineCombinerTraceStrategy() const {
   return MachineTraceStrategy::TS_MinInstrCount;
 }
 
-bool TargetInstrInfo::isReallyTriviallyReMaterializable(
-    const MachineInstr &MI) const {
+bool TargetInstrInfo::isReMaterializableImpl(
+    const MachineInstr &MI, bool OnlyTrivial) const {
   const MachineFunction &MF = *MI.getMF();
   const MachineRegisterInfo &MRI = MF.getRegInfo();
 
@@ -1658,10 +1658,11 @@ bool TargetInstrInfo::isReallyTriviallyReMaterializable(
     if (MO.isDef() && Reg != DefReg)
       return false;
 
-    // Don't allow any virtual-register uses. Rematting an instruction with
-    // virtual register uses would length the live ranges of the uses, which
-    // is not necessarily a good idea, certainly not "trivial".
-    if (MO.isUse())
+    // If asked for trivial materialization, don't allow any virtual-register
+    // uses. Rematting an instruction with virtual register uses would lengthen
+    // the live ranges of the uses, which means rematerialization must become
+    // a per-user query which many callers don't want.
+    if (OnlyTrivial && MO.isUse())
       return false;
   }
 
diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
index 13d05ee54d7b3..dc9e41aa097fb 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
@@ -124,8 +124,8 @@ static bool canRemat(const MachineInstr &MI) {
   return false;
 }
 
-bool SIInstrInfo::isReallyTriviallyReMaterializable(
-    const MachineInstr &MI) const {
+bool SIInstrInfo::isReMaterializableImpl(const MachineInstr &MI,
+                                         bool OnlyTrivial) const {
 
   if (canRemat(MI)) {
     // Normally VALU use of exec would block the rematerialization, but that
@@ -139,13 +139,14 @@ bool SIInstrInfo::isReallyTriviallyReMaterializable(
     // There is difference to generic method which does not allow
     // rematerialization if there are virtual register uses. We allow this,
     // therefore this method includes SOP instructions as well.
+    // FIXME: This should only be done when OnlyTrivial is not set.
     if (!MI.hasImplicitDef() &&
         MI.getNumImplicitOperands() == MI.getDesc().implicit_uses().size() &&
         !MI.mayRaiseFPException())
       return true;
   }
 
-  return TargetInstrInfo::isReallyTriviallyReMaterializable(MI);
+  return TargetInstrInfo::isReMaterializableImpl(MI, OnlyTrivial);
 }
 
 // Returns true if the scalar result of a VALU instruction depends on exec.
diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.h b/llvm/lib/Target/AMDGPU/SIInstrInfo.h
index e249fc6cbb79d..300aeaa31941e 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.h
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.h
@@ -244,7 +244,8 @@ class SIInstrInfo final : public AMDGPUGenInstrInfo {
     return ST;
   }
 
-  bool isReallyTriviallyReMaterializable(const MachineInstr &MI) const override;
+  bool isReMaterializableImpl(const MachineInstr &MI,
+                              bool OnlyTrivial) const override;
 
   bool isIgnorableUse(const MachineOperand &MO) const override;
 
diff --git a/llvm/lib/Target/ARM/ARMBaseInstrInfo.cpp b/llvm/lib/Target/ARM/ARMBaseInstrInfo.cpp
index 5c35b3327c16d..9eb65533b682d 100644
--- a/llvm/lib/Target/ARM/ARMBaseInstrInfo.cpp
+++ b/llvm/lib/Target/ARM/ARMBaseInstrInfo.cpp
@@ -6510,14 +6510,14 @@ bool ARMBaseInstrInfo::shouldOutlineFromFunctionByDefault(
   return Subtarget.isMClass() && MF.getFunction().hasMinSize();
 }
 
-bool ARMBaseInstrInfo::isReallyTriviallyReMaterializable(
-    const MachineInstr &MI) const {
+bool ARMBaseInstrInfo::isReMaterializableImpl(const MachineInstr &MI,
+                                              bool OnlyTrivial) const {
   // Try hard to rematerialize any VCTPs because if we spill P0, it will block
   // the tail predication conversion. This means that the element count
   // register has to be live for longer, but that has to be better than
   // spill/restore and VPT predication.
   return (isVCTP(&MI) && !isPredicated(MI)) ||
-         TargetInstrInfo::isReallyTriviallyReMaterializable(MI);
+         TargetInstrInfo::isReMaterializableImpl(MI, OnlyTrivial);
 }
 
 unsigned llvm::getBLXOpcode(const MachineFunction &MF) {
diff --git a/llvm/lib/Target/ARM/ARMBaseInstrInfo.h b/llvm/lib/Target/ARM/ARMBaseInstrInfo.h
index 71de3c6ad597a..57c53036349a6 100644
--- a/llvm/lib/Target/ARM/ARMBaseInstrInfo.h
+++ b/llvm/lib/Target/ARM/ARMBaseInstrInfo.h
@@ -479,7 +479,8 @@ class ARMBaseInstrInfo : public ARMGenInstrInfo {
   MachineInstr *canFoldIntoMOVCC(Register Reg, const MachineRegisterInfo &MRI,
                                  const TargetInstrInfo *TII) const;
 
-  bool isReallyTriviallyReMaterializable(const MachineInstr &MI) const override;
+  bool isReMaterializableImpl(const MachineInstr &MI,
+                              bool OnlyTrivial) const override;
 
 private:
   /// Modeling special VFP / NEON fp MLA / MLS hazards.
diff --git a/llvm/lib/Target/LoongArch/LoongArchInstrInfo.td b/llvm/lib/Target/LoongArch/LoongArchInstrInfo.td
index 20ccc622f58dc..9565a55e4c6c5 100644
--- a/llvm/lib/Target/LoongArch/LoongArchInstrInfo.td
+++ b/llvm/lib/Target/LoongArch/LoongArchInstrInfo.td
@@ -943,7 +943,7 @@ let Predicates = [IsLA64] in {
 def ADD_D : ALU_3R<0x00108000>;
 def SUB_D : ALU_3R<0x00118000>;
 // ADDI_D isn't always rematerializable, but isReMaterializable will be used as
-// a hint which is verified in isReallyTriviallyReMaterializable.
+// a hint which is verified in isReMaterializableImpl.
 // See LoongArchInstrInfo::isAsCheapAsAMove for more details.
 let isReMaterializable = 1, isAsCheapAsAMove = 1 in {
 def ADDI_D : ALU_2RI12<0x02c00000, simm12_addlike>;
diff --git a/llvm/lib/Target/PowerPC/PPCInstrInfo.cpp b/llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
index 55e38bcf4afc9..4d436d6d5d017 100644
--- a/llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
+++ b/llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
@@ -1075,8 +1075,8 @@ Register PPCInstrInfo::isLoadFromStackSlot(const MachineInstr &MI,
 
 // For opcodes with the ReMaterializable flag set, this function is called to
 // verify the instruction is really rematable.
-bool PPCInstrInfo::isReallyTriviallyReMaterializable(
-    const MachineInstr &MI) const {
+bool PPCInstrInfo::isReMaterializableImpl(const MachineInstr &MI,
+                                          bool OnlyTrivial) const {
   switch (MI.getOpcode()) {
   default:
     // Let base implementaion decide.
@@ -1112,7 +1112,7 @@ bool PPCInstrInfo::isReallyTriviallyReMaterializable(
   case PPC::DMXXSETACCZ:
     return true;
   }
-  return TargetInstrInfo::isReallyTriviallyReMaterializable(MI);
+  return TargetInstrInfo::isReMaterializableImpl(MI, OnlyTrivial);
 }
 
 Register PPCInstrInfo::isStoreToStackSlot(const MachineInstr &MI,
diff --git a/llvm/lib/Target/PowerPC/PPCInstrInfo.h b/llvm/lib/Target/PowerPC/PPCInstrInfo.h
index 63ebd65910572..61f580eb40344 100644
--- a/llvm/lib/Target/PowerPC/PPCInstrInfo.h
+++ b/llvm/lib/Target/PowerPC/PPCInstrInfo.h
@@ -530,7 +530,8 @@ class PPCInstrInfo : public PPCGenInstrInfo {
                              unsigned &SubIdx) const override;
   Register isLoadFromStackSlot(const MachineInstr &MI,
                                int &FrameIndex) const override;
-  bool isReallyTriviallyReMaterializable(const MachineInstr &MI) const override;
+  bool isReMaterializableImpl(const MachineInstr &MI,
+                              bool OnlyTrivial) const override;
   Register isStoreToStackSlot(const MachineInstr &MI,
                               int &FrameIndex) const override;
 
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp b/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
index 0ed97c61ec78a..dd3a163592f01 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
@@ -232,8 +232,8 @@ Register RISCVInstrInfo::isStoreToStackSlot(const MachineInstr &MI,
   return 0;
 }
 
-bool RISCVInstrInfo::isReallyTriviallyReMaterializable(
-    const MachineInstr &MI) const {
+bool RISCVInstrInfo::isReMaterializableImpl(const MachineInstr &MI,
+                                            bool OnlyTrivial) const {
   switch (RISCV::getRVVMCOpcode(MI.getOpcode())) {
   case RISCV::VMV_V_X:
   case RISCV::VFMV_V_F:
@@ -241,9 +241,11 @@ bool RISCVInstrInfo::isReallyTriviallyReMaterializable(
   case RISCV::VMV_S_X:
   case RISCV::VFMV_S_F:
   case RISCV::VID_V:
+    // FIXME: the v.x and v.f forms are not 'trivial' in the meaning
+    // of this API.  Split them!
     return MI.getOperand(1).isUndef();
   default:
-    return TargetInstrInfo::isReallyTriviallyReMaterializable(MI);
+    return TargetInstrInfo::isReMaterializableImpl(MI, OnlyTrivial);
   }
 }
 
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfo.h b/llvm/lib/Target/RISCV/RISCVInstrInfo.h
index 57ec431749ebe..2d19a292da67c 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfo.h
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfo.h
@@ -75,7 +75,8 @@ class RISCVInstrInfo : public RISCVGenInstrInfo {
   Register isStoreToStackSlot(const MachineInstr &MI, int &FrameIndex,
                               TypeSize &MemBytes) const override;
 
-  bool isReallyTriviallyReMaterializable(const MachineInstr &MI) const override;
+  bool isReMaterializableImpl(const MachineInstr &MI,
+                              bool OnlyTrivial) const override;
 
   bool shouldBreakCriticalEdgeToSink(MachineInstr &MI) const override {
     return MI.getOpcode() == RISCV::ADDI && MI.getOperand(1).isReg() &&
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfo.td b/llvm/lib/Target/RISCV/RISCVInstrInfo.td
index 47900cffa370c..9825733e29897 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfo.td
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfo.td
@@ -779,7 +779,7 @@ def SH : Store_rri<0b001, "sh">, Sched<[WriteSTH, ReadStoreData, ReadMemBase]>;
 def SW : Store_rri<0b010, "sw">, Sched<[WriteSTW, ReadStoreData, ReadMemBase]>;
 
 // ADDI isn't always rematerializable, but isReMaterializable will be used as
-// a hint which is verified in isReallyTriviallyReMaterializable.
+// a hint which is verified in isReMaterializableImpl.
 let isReMaterializable = 1, isAsCheapAsAMove = 1 in
 def ADDI  : ALU_ri<0b000, "addi">;
 
diff --git a/llvm/lib/Target/WebAssembly/WebAssemblyInstrInfo.cpp b/llvm/lib/Target/WebAssembly/WebAssemblyInstrInfo.cpp
index feac04a17068a..88521fa5c06f5 100644
--- a/llvm/lib/Target/WebAssembly/WebAssemblyInstrInfo.cpp
+++ b/llvm/lib/Target/WebAssembly/WebAssemblyInstrInfo.cpp
@@ -39,18 +39,18 @@ WebAssemblyInstrInfo::WebAssemblyInstrInfo(const WebAssemblySubtarget &STI)
                               WebAssembly::CATCHRET),
       RI(STI.getTargetTriple()) {}
 
-bool WebAssemblyInstrInfo::isReallyTriviallyReMaterializable(
-    const MachineInstr &MI) const {
+bool WebAssemblyInstrInfo::isReMaterializableImpl(const MachineInstr &MI,
+                                                  bool OnlyTrivial) const {
   switch (MI.getOpcode()) {
   case WebAssembly::CONST_I32:
   case WebAssembly::CONST_I64:
   case WebAssembly::CONST_F32:
   case WebAssembly::CONST_F64:
-    // TargetInstrInfo::isReallyTriviallyReMaterializable misses these
+    // TargetInstrInfo::isReMaterializableImpl misses these
     // because of the ARGUMENTS implicit def, so we manualy override it here.
     return true;
   default:
-    return TargetInstrInfo::isReallyTriviallyReMaterializable(MI);
+    return TargetInstrInfo::isReMaterializableImpl(MI, OnlyTrivial);
   }
 }
 
diff --git a/llvm/lib/Target/WebAssembly/WebAssemblyInstrInfo.h b/llvm/lib/Target/WebAssembly/WebAssemblyInstrInfo.h
index ba00097034bf5..aa11da3e3e388 100644
--- a/llvm/lib/Target/WebAssembly/WebAssemblyInstrInfo.h
+++ b/llvm/lib/Target/WebAssembly/WebAssemblyInstrInfo.h
@@ -37,7 +37,8 @@ class WebAssemblyInstrInfo final : public WebAssemblyGenInstrInfo {
 
   const WebAssemblyRegisterInfo &getRegisterInfo() const { return RI; }
 
-  bool isReallyTriviallyReMaterializable(const MachineInstr &MI) const override;
+  bool isReMaterializableImpl(const MachineInstr &MI,
+                              bool OnlyTrivial) const override;
 
   void copyPhysReg(MachineBasicBlock &MBB, MachineBasicBlock::iterator MI,
                    const DebugLoc &DL, Register DestReg, Register SrcReg,
diff --git a/llvm/lib/Target/X86/X86InstrInfo.cpp b/llvm/lib/Target/X86/X86InstrInfo.cpp
index 58d526269ff3c..4ccf7918b3741 100644
--- a/llvm/lib/Target/X86/X86InstrInfo.cpp
+++ b/llvm/lib/Target/X86/X86InstrInfo.cpp
@@ -755,8 +755,8 @@ static bool regIsPICBase(Register BaseReg, const MachineRegisterInfo &MRI) {
   return isPICBase;
 }
 
-bool X86InstrInfo::isReallyTriviallyReMaterializable(
-    const MachineInstr &MI) const {
+bool X86InstrInfo::isReMaterializableImpl(const MachineInstr &MI,
+                                          bool OnlyTrivial) const {
   switch (MI.getOpcode()) {
   default:
     // This function should only be called for opcodes with the ReMaterializable
@@ -951,7 +951,7 @@ bool X86InstrInfo::isReallyTriviallyReMaterializable(
     break;
   }
   }
-  return TargetInstrInfo::isReallyTriviallyReMaterializable(MI);
+  return TargetInstrInfo::isReMaterializableImpl(MI, OnlyTrivial);
 }
 
 void X86InstrInfo::reMaterialize(MachineBasicBlock &MBB,
diff --git a/llvm/lib/Target/X86/X86InstrInfo.h b/llvm/lib/Target/X86/X86InstrInfo.h
index 86133b3d969b1..1bd600c59cfc0 100644
--- a/llvm/lib/Target/X86/X86InstrInfo.h
+++ b/llvm/lib/Target/X86/X86InstrInfo.h
@@ -340,7 +340,8 @@ class X86InstrInfo final : public X86GenInstrInfo {
   Register isStoreToStackSlotPostFE(const MachineInstr &MI,
                                     int &FrameIndex) const override;
 
-  bool isReallyTriviallyReMaterializable(const MachineInstr &MI) const override;
+  bool isReMaterializableImpl(const MachineInstr &MI,
+                              bool OnlyTrivial) const override;
   void reMaterialize(MachineBasicBlock &MBB, MachineBasicBlock::iterator MI,
                      Register DestReg, unsigned SubIdx,
                      const MachineInstr &Orig,
diff --git a/llvm/test/CodeGen/AArch64/aarch64-matrix-umull-smull.ll b/llvm/test/CodeGen/AArch64/aarch64-matrix-umull-smull.ll
index 8655bb1292ef7..0d4c250c5df62 100644
--- a/llvm/test/CodeGen/AArch64/aarch64-matrix-umull-smull.ll
+++ b/llvm/test/CodeGen/AArch64/aarch64-matrix-umull-smull.ll
@@ -267,7 +267,7 @@ define void @larger_smull(ptr nocapture noundef readonly %x, i16 noundef %y, ptr
 ; CHECK-SD-NEXT:    and x9, x8, #0xfffffff0
 ; CHECK-SD-NEXT:    add x10, x2, #32
 ; CHECK-SD-NEXT:    add x11, x0, #16
-; CHECK-SD-NEXT:    mov x12, x9
+; CHECK-SD-NEXT:    and x12, x8, #0xfffffff0
 ; CHECK-SD-NEXT:  .LBB3_4: // %vector.body
 ; CHECK-SD-NEXT:    // =>This Inner Loop Header: Depth=1
 ; CHECK-SD-NEXT:    ldp q1, q2, [x11, #-16]
@@ -313,7 +313,7 @@ define void @larger_smull(ptr nocapture noundef readonly %x, i16 noundef %y, ptr
 ; CHECK-GI-NEXT:    and x10, x9, #0xfffffff0
 ; CHECK-GI-NEXT:    add x11, x2, #32
 ; CHECK-GI-NEXT:    add x12, x0, #16
-; CHECK-GI-NEXT:    mov x13, x10
+; CHECK-GI-NEXT:    and x13, x9, #0xfffffff0
 ; CHECK-GI-NEXT:    xtn v0.4h, v0.4s
 ; CHECK-GI-NEXT:  .LBB3_3: // %vector.body
 ; CHECK-GI-NEXT:    // =>This Inner Loop Header: Depth=1
@@ -428,7 +428,7 @@ define void @larger_umull(ptr nocapture noundef readonly %x, i16 noundef %y, ptr
 ; CHECK-SD-NEXT:    and x9, x8, #0xfffffff0
 ; CHECK-SD-NEXT:    add x10, x2, #32
 ; CHECK-SD-NEXT:    add x11, x0, #16
-; CHECK-SD-NEXT:    mov x12, x9
+; CHECK-SD-NEXT:    and x12, x8, #0xfffffff0
 ; CHECK-SD-NEXT:  .LBB4_4: // %vector.body
 ; CHECK-SD-NEXT:    // =>This Inner Loop Header: Depth=1
 ; CHECK-SD-NEXT:    ldp q1, q2, [x11, #-16]
@@ -472,7 +472,7 @@ define void @larger_umull(ptr nocapture noundef readonly %x, i16 noundef %y, ptr
 ; CHECK-GI-NEXT:    and x8, x9, #0xfffffff0
 ; CHECK-GI-NEXT:    add x10, x2, #32
 ; CHECK-GI-NEXT:    add x11, x0, #16
-; CHECK-GI-NEXT:    mov x12, x8
+; CHECK-GI-NEXT:    and x12, x9, #0xfffffff0
 ; CHECK-GI-NEXT:  .LBB4_3: // %vector.body
 ; CHECK-GI-NEXT:    // =>This Inner Loop Header: Depth=1
 ; CHECK-GI-NEXT:    and w13, w1, #0xffff
@@ -596,7 +596,7 @@ define i16 @red_mla_dup_ext_u8_s8_s16(ptr noalias nocapture noundef readonly %A,
 ; CHECK-SD-NEXT:    and x11, x10, #0xfffffff0
 ; CHECK-SD-NEXT:    fmov s2, w9
 ; CHECK-SD-NEXT:    add x8, x0, #8
-; CHECK-SD-NEXT:    mov x12, x11
+; CHECK-SD-NEXT:    and x12, x10, #0xfffffff0
 ; CHECK-SD-NEXT:  .LBB5_5: // %vector.body
 ; CHECK-SD-NEXT:    // =>This Inner Loop Header: Depth=1
 ; CHECK-SD-NEXT:    ldp d3, d4, [x8, #-8]
@@ -646,10 +646,10 @@ define i16 @red_mla_dup_ext_u8_s8_s16(ptr noalias nocapture noundef readonly %A,
 ; CHECK-GI-NEXT:    movi v0.2d, #0000000000000000
 ; CHECK-GI-NEXT:    movi v1.2d, #0000000000000000
 ; CHECK-GI-NEXT:    add x10, x0, #8
+; CHECK-GI-NEXT:    and x11, x8, #0xfffffff0
 ; CHECK-GI-NEXT:    sbfx w9, w9, #8, #8
 ; CHECK-GI-NEXT:    dup v2.8h, w9
 ; CHECK-GI-NEXT:    and x9, x8, #0xfffffff0
-; CHECK-GI-NEXT:    mov x11, x9
 ; CHECK-GI-NEXT:  .LBB5_5: // %vector.body
 ; CHECK-GI-NEXT:    // =>This Inner Loop Header: Depth=1
 ; CHECK-GI-NEXT:    ldp d3, d4, [x10, #-8]
diff --git a/llvm/test/CodeGen/AArch64/machine-combiner-copy.ll b/llvm/test/CodeGen/AArch64/machine-combiner-copy.ll
index 4c8e589391c3a..c23e4e182f2ef 100644
--- a/llvm/test/CodeGen/AArch64/machine-combiner-copy.ll
+++ b/llvm/test/CodeGen/AArch64/machine-combiner-copy.ll
@@ -17,7 +17,7 @@ define void @fma_dup_f16(ptr noalias nocapture noundef readonly %A, half noundef
 ; CHECK-NEXT:    and x9, x8, #0xfffffff0
 ; CHECK-NEXT:    add x10, x1, #16
 ; CHECK-NEXT:    add x11, x0, #16
-; CHECK-NEXT:    mov x12, x9
+; CHECK-NEXT:    and x12, x8, #0xfffffff0
 ; CHECK-NEXT:  .LBB0_4: // %vector.body
 ; CHECK-NEXT:    // =>This Inner Loop Header: Depth=1
 ; CHECK-NEXT:    ldp q1, q4, [x10, #-16]
diff --git a/llvm/test/CodeGen/AArch64/machine-licm-sub-loop.ll b/llvm/test/CodeGen/AArch64/machine-licm-sub-loop.ll
index f6bbdf5d95d87..1770bb9794f09 100644
--- a/llvm/test/CodeGen/AArch64/machine-licm-sub-loop.ll
+++ b/llvm/test/CodeGen/AArch64/machine-...
[truncated]

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff origin/main HEAD --extensions h,cpp -- llvm/include/llvm/CodeGen/TargetInstrInfo.h llvm/lib/CodeGen/RegisterCoalescer.cpp llvm/lib/CodeGen/TargetInstrInfo.cpp llvm/lib/Target/AMDGPU/SIInstrInfo.cpp llvm/lib/Target/AMDGPU/SIInstrInfo.h llvm/lib/Target/ARM/ARMBaseInstrInfo.cpp llvm/lib/Target/ARM/ARMBaseInstrInfo.h llvm/lib/Target/PowerPC/PPCInstrInfo.cpp llvm/lib/Target/PowerPC/PPCInstrInfo.h llvm/lib/Target/RISCV/RISCVInstrInfo.cpp llvm/lib/Target/RISCV/RISCVInstrInfo.h llvm/lib/Target/WebAssembly/WebAssemblyInstrInfo.cpp llvm/lib/Target/WebAssembly/WebAssemblyInstrInfo.h llvm/lib/Target/X86/X86InstrInfo.cpp llvm/lib/Target/X86/X86InstrInfo.h

⚠️
The reproduction instructions above might return results for more than one PR
in a stack if you are using a stacked PR workflow. You can limit the results by
changing origin/main to the base branch/commit you want to compare against.
⚠️

View the diff from clang-format here.
diff --git a/llvm/lib/CodeGen/TargetInstrInfo.cpp b/llvm/lib/CodeGen/TargetInstrInfo.cpp
index da91bc186..981ad3e74 100644
--- a/llvm/lib/CodeGen/TargetInstrInfo.cpp
+++ b/llvm/lib/CodeGen/TargetInstrInfo.cpp
@@ -1590,8 +1590,8 @@ MachineTraceStrategy TargetInstrInfo::getMachineCombinerTraceStrategy() const {
   return MachineTraceStrategy::TS_MinInstrCount;
 }
 
-bool TargetInstrInfo::isReMaterializableImpl(
-    const MachineInstr &MI, bool OnlyTrivial) const {
+bool TargetInstrInfo::isReMaterializableImpl(const MachineInstr &MI,
+                                             bool OnlyTrivial) const {
   const MachineFunction &MF = *MI.getMF();
   const MachineRegisterInfo &MRI = MF.getRegInfo();
 

@preames preames changed the title [RegisterCoalescer] Enable non-trivial rematerialization [DRAFT][RegisterCoalescer] Enable non-trivial rematerialization Sep 22, 2025
@aemerson aemerson requested review from MatzeB and arsenm September 22, 2025 17:55
Contributor

@lukel97 lukel97 left a comment

In my head I had understood non-trivial rematerialization as rematerialization of an instruction with a virtual register use that needs its live range extended, mostly based off of this existing comment:

    // Rematting an instruction with
    // virtual register uses would length the live ranges of the uses, which
    // is not necessarily a good idea, certainly not "trivial".

But I think I agree now that a better definition is trivial = guaranteed to always be remateralizable, non-trivial = may succeed if the uses are live at the point of rematerialization.

I would prefer the alternative approach you suggested:

An alternate API approach would be to update TTI to always return the non-trivial result, and then copy the MachineLICM approach of explicitly checking for non-triviality afterwards.

Where isTriviallyReMaterializable would check for virtual registers. We'd then go through and replace existing uses of isTriviallyReMaterializable in the first group of users you identified (where they check for use availability) with isReMaterializable

In theory and with #159180, I would hope that there would never be any downside to rematerializing an instruction whose uses are live at the point of rematerialization, and the test runs in #159211 didn't show any indication to the contrary.

So in the interest of avoiding backend fragmentation, I think we should be optimistic here and assume that this is beneficial for all targets. If after this lands we get issues reported on specific targets, we can go back and revisit this to be opt-in

Comment on lines +1662 to +1664
// uses. Rematting an instruction with virtual register uses would lengthen
// the live ranges of the uses, which means rematerialization must become
// a per-user query which many callers don't want.
Contributor

I don't think the existing comment was accurate since we never lengthen live ranges when rematerializing. It's something we definitely need to consider if we ever do end up extending live ranges, but that's truly non-trivial and I don't know if that's on the roadmap any time soon

@preames
Collaborator Author

preames commented Sep 23, 2025

Where isTriviallyReMaterializable would check for virtual registers. We'd then go through and replace existing uses of isTriviallyReMaterializable in the first group of users you identified (where they check for use availability) with isReMaterializable

I'll throw up a patch tomorrow which does something along these lines as an NFC such that we can then go through call site by call site and make adjustment as warranted.

@preames
Collaborator Author

preames commented Sep 23, 2025

Where isTriviallyReMaterializable would check for virtual registers. We'd then go through and replace existing uses of isTriviallyReMaterializable in the first group of users you identified (where they check for use availability) with isReMaterializable

I'll throw up a patch tomorrow which does something along these lines as an NFC such that we can then go through call site by call site and make adjustment as warranted.

Actually, crud, I don't think I can do this as purely NFC. We have existing backends which lie about instructions being trivially rematerializable. I need a way to preserve that behavior through the transition of getting each of the callsites updated. I'll play with it more tomorrow, but I think this structure is at least a reasonable stepping stone even if the end goal is to fold it back away.

Contributor

@arsenm arsenm left a comment

Thanks for doing something about this huge mess. The backends were forced to lie based on the only type of remat being "trivial."

Another case I'm interested in that's nontrivial is rematerializing that requires rewriting the instruction. e.g. rematerializing only part of a value if only sub registers are demanded, or changing the use instruction to a different opcode based on register bank

@preames
Collaborator Author

preames commented Sep 23, 2025

First stepping stone towards the alternate approach: #160319

@preames
Collaborator Author

preames commented Sep 23, 2025

Another case I'm interested in that's nontrivial is rematerializing that requires rewriting the instruction. e.g. rematerializing only part of a value if only sub registers are demanded, or changing the use instruction to a different opcode based on register bank

Yeah, there's variants here we can explore once we get the virtual register use case handled properly and generically. One I'm interested in is multiple instruction remat. On RISC-V, we've got e.g. LUI+ADDI which is the canonical 32 bit constant materialization sequence, and is frequently macro fused into one op in hardware anyways. It sucks we can't remat that today. Another sub-category is the virtual register uses where we don't fit within the existing live interval of the use, but can trivially extend it. Each sub-category is non-trivial and requires thought and care as we move through.

@preames
Collaborator Author

preames commented Sep 23, 2025

Second part of the alternate approach here: #160377

Once that goes in, I'm going to go audit the remaining non-trivial call sites very carefully. :) Probably one or two small changes there.

I'm closing this one for now. I may return to the approach if we do need to enable on a per-target basis.

@preames preames closed this Sep 23, 2025
4 participants