Skip to content

Conversation

preames
Copy link
Collaborator

@preames preames commented Sep 23, 2025

This change builds on #160319 which tries to clarify which callers (not backends) assume that the result is actually trivial.

This change itself should be NFC. Essentially, I'm just renaming the existing isTrivialRematerializable to the non-trivial version and then adding a new trivial version (with the same name as the prior function) and simplifying a few callers which want that semantic.

This change does not enable non-trivial remat any more broadly than was already done for our targets which were lying through the old APIs; that will come separately. The goal here is simply to make the code easier to follow in terms of what assumptions are being made where.

This change builds on llvm#160319 which tries to clarify which *callers* (not backends) assume that the result is actually trivial.

This change itself should be NFC.  Essentially, I'm just renaming
the existing isTrivialRematerializable to the non-trivial version
and then adding a new trivial version (with the same name as the
prior function) and simplifying a few callers which want that
semantic.

This change does *not* enable non-trivial remat any more broadly than
was already done for our targets which were lying through the old
APIs; that will come separately.  The goal here is simply to make
the code easier to follow in terms of what assumptions are being
made where.
@llvmbot
Copy link
Member

llvmbot commented Sep 23, 2025

@llvm/pr-subscribers-llvm-regalloc
@llvm/pr-subscribers-debuginfo

@llvm/pr-subscribers-backend-amdgpu

Author: Philip Reames (preames)

Changes

This change builds on #160319 which tries to clarify which callers (not backends) assume that the result is actually trivial.

This change itself should be NFC. Essentially, I'm just renaming the existing isTrivialRematerializable to the non-trivial version and then adding a new trivial version (with the same name as the prior function) and simplifying a few callers which want that semantic.

This change does not enable non-trivial remat any more broadly than was already done for our targets which were lying through the old APIs; that will come separately. The goal here is simply to make the code easier to follow in terms of what assumptions are being made where.


Full diff: https://github.com/llvm/llvm-project/pull/160377.diff

11 Files Affected:

  • (modified) llvm/include/llvm/CodeGen/TargetInstrInfo.h (+15-3)
  • (modified) llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp (+1-5)
  • (modified) llvm/lib/CodeGen/CalcSpillWeights.cpp (+1-1)
  • (modified) llvm/lib/CodeGen/LiveRangeEdit.cpp (+2-2)
  • (modified) llvm/lib/CodeGen/MachineLICM.cpp (+4-23)
  • (modified) llvm/lib/CodeGen/RegAllocScore.cpp (+1-4)
  • (modified) llvm/lib/CodeGen/RegisterCoalescer.cpp (+1-1)
  • (modified) llvm/lib/CodeGen/TargetRegisterInfo.cpp (+1-1)
  • (modified) llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp (+3-3)
  • (modified) llvm/lib/Target/AMDGPU/GCNSchedStrategy.h (+1-1)
  • (modified) llvm/lib/Target/WebAssembly/WebAssemblyRegStackify.cpp (+1-4)
diff --git a/llvm/include/llvm/CodeGen/TargetInstrInfo.h b/llvm/include/llvm/CodeGen/TargetInstrInfo.h
index cefb0d35690ba..046b60256fb29 100644
--- a/llvm/include/llvm/CodeGen/TargetInstrInfo.h
+++ b/llvm/include/llvm/CodeGen/TargetInstrInfo.h
@@ -168,6 +168,19 @@ class LLVM_ABI TargetInstrInfo : public MCInstrInfo {
   /// registers so that the instructions result is independent of the place
   /// in the function.
   bool isTriviallyReMaterializable(const MachineInstr &MI) const {
+    if (!isReMaterializable(MI))
+      return false;
+    for (const MachineOperand &MO : MI.all_uses()) {
+      if (MO.getReg().isVirtual())
+        return false;
+    }
+    return true;
+  }
+
+  /// Return true if the instruction would be materializable at a point
+  /// in the containing function where all virtual register uses were
+  /// known to be live and available in registers.
+  bool isReMaterializable(const MachineInstr &MI) const {
     return (MI.getOpcode() == TargetOpcode::IMPLICIT_DEF &&
             MI.getNumOperands() == 1) ||
            (MI.getDesc().isRematerializable() &&
@@ -194,10 +207,9 @@ class LLVM_ABI TargetInstrInfo : public MCInstrInfo {
 protected:
   /// For instructions with opcodes for which the M_REMATERIALIZABLE flag is
   /// set, this hook lets the target specify whether the instruction is actually
-  /// trivially rematerializable, taking into consideration its operands. This
+  /// rematerializable, taking into consideration its operands. This
   /// predicate must return false if the instruction has any side effects other
-  /// than producing a value, or if it requres any address registers that are
-  /// not always available.
+  /// than producing a value.
   virtual bool isReMaterializableImpl(const MachineInstr &MI) const;
 
   /// This method commutes the operands of the given machine instruction MI.
diff --git a/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp b/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
index 1db742253d594..05d3d18aa9557 100644
--- a/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
+++ b/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
@@ -2212,11 +2212,7 @@ findPrologueEndLoc(const MachineFunction *MF) {
       -> std::optional<std::pair<const MachineInstr *, bool>> {
     // Is this instruction trivial data shuffling or frame-setup?
     bool isCopy = (TII.isCopyInstr(MI) ? true : false);
-    bool isTrivRemat =
-        TII.isTriviallyReMaterializable(MI) &&
-        llvm::all_of(MI.all_uses(), [](const MachineOperand &MO) {
-          return MO.getReg().isVirtual();
-        });
+    bool isTrivRemat = TII.isTriviallyReMaterializable(MI);
     bool isFrameSetup = MI.getFlag(MachineInstr::FrameSetup);
 
     if (!isFrameSetup && MI.getDebugLoc()) {
diff --git a/llvm/lib/CodeGen/CalcSpillWeights.cpp b/llvm/lib/CodeGen/CalcSpillWeights.cpp
index b16694eafd90e..f5a6eec57ee7e 100644
--- a/llvm/lib/CodeGen/CalcSpillWeights.cpp
+++ b/llvm/lib/CodeGen/CalcSpillWeights.cpp
@@ -122,7 +122,7 @@ bool VirtRegAuxInfo::isRematerializable(const LiveInterval &LI,
       assert(MI && "Dead valno in interval");
     }
 
-    if (!TII.isTriviallyReMaterializable(*MI))
+    if (!TII.isReMaterializable(*MI))
       return false;
   }
   return true;
diff --git a/llvm/lib/CodeGen/LiveRangeEdit.cpp b/llvm/lib/CodeGen/LiveRangeEdit.cpp
index 33e980a5993d3..2f65be51bb726 100644
--- a/llvm/lib/CodeGen/LiveRangeEdit.cpp
+++ b/llvm/lib/CodeGen/LiveRangeEdit.cpp
@@ -80,7 +80,7 @@ void LiveRangeEdit::scanRemattable() {
     MachineInstr *DefMI = LIS.getInstructionFromIndex(OrigVNI->def);
     if (!DefMI)
       continue;
-    if (TII.isTriviallyReMaterializable(*DefMI))
+    if (TII.isReMaterializable(*DefMI))
       Remattable.insert(OrigVNI);
   }
   ScannedRemattable = true;
@@ -387,7 +387,7 @@ void LiveRangeEdit::eliminateDeadDef(MachineInstr *MI, ToShrinkSet &ToShrink) {
     // register uses. That may provoke RA to split an interval at the KILL
     // and later result in an invalid live segment end.
     if (isOrigDef && DeadRemats && !HasLiveVRegUses &&
-        TII.isTriviallyReMaterializable(*MI)) {
+        TII.isReMaterializable(*MI)) {
       LiveInterval &NewLI = createEmptyIntervalFrom(Dest, false);
       VNInfo::Allocator &Alloc = LIS.getVNInfoAllocator();
       VNInfo *VNI = NewLI.getNextValue(Idx, Alloc);
diff --git a/llvm/lib/CodeGen/MachineLICM.cpp b/llvm/lib/CodeGen/MachineLICM.cpp
index 4f164e2d53460..7acddff753693 100644
--- a/llvm/lib/CodeGen/MachineLICM.cpp
+++ b/llvm/lib/CodeGen/MachineLICM.cpp
@@ -244,8 +244,6 @@ namespace {
 
     bool IsGuaranteedToExecute(MachineBasicBlock *BB, MachineLoop *CurLoop);
 
-    bool isTriviallyReMaterializable(const MachineInstr &MI) const;
-
     void EnterScope(MachineBasicBlock *MBB);
 
     void ExitScope(MachineBasicBlock *MBB);
@@ -771,23 +769,6 @@ bool MachineLICMImpl::IsGuaranteedToExecute(MachineBasicBlock *BB,
   return true;
 }
 
-/// Check if \p MI is trivially remateralizable and if it does not have any
-/// virtual register uses. Even though rematerializable RA might not actually
-/// rematerialize it in this scenario. In that case we do not want to hoist such
-/// instruction out of the loop in a belief RA will sink it back if needed.
-bool MachineLICMImpl::isTriviallyReMaterializable(
-    const MachineInstr &MI) const {
-  if (!TII->isTriviallyReMaterializable(MI))
-    return false;
-
-  for (const MachineOperand &MO : MI.all_uses()) {
-    if (MO.getReg().isVirtual())
-      return false;
-  }
-
-  return true;
-}
-
 void MachineLICMImpl::EnterScope(MachineBasicBlock *MBB) {
   LLVM_DEBUG(dbgs() << "Entering " << printMBBReference(*MBB) << '\n');
 
@@ -1300,9 +1281,9 @@ bool MachineLICMImpl::IsProfitableToHoist(MachineInstr &MI,
     return false;
   }
 
-  // Rematerializable instructions should always be hoisted providing the
-  // register allocator can just pull them down again when needed.
-  if (isTriviallyReMaterializable(MI))
+  // Trivially rematerializable instructions should always be hoisted
+  // providing the register allocator can just pull them down again when needed.
+  if (TII->isTriviallyReMaterializable(MI))
     return true;
 
   // FIXME: If there are long latency loop-invariant instructions inside the
@@ -1386,7 +1367,7 @@ bool MachineLICMImpl::IsProfitableToHoist(MachineInstr &MI,
 
   // High register pressure situation, only hoist if the instruction is going
   // to be remat'ed.
-  if (!isTriviallyReMaterializable(MI) &&
+  if (!TII->isTriviallyReMaterializable(MI) &&
       !MI.isDereferenceableInvariantLoad()) {
     LLVM_DEBUG(dbgs() << "Can't remat / high reg-pressure: " << MI);
     return false;
diff --git a/llvm/lib/CodeGen/RegAllocScore.cpp b/llvm/lib/CodeGen/RegAllocScore.cpp
index ce1eea3519b71..25a142a2a20d8 100644
--- a/llvm/lib/CodeGen/RegAllocScore.cpp
+++ b/llvm/lib/CodeGen/RegAllocScore.cpp
@@ -80,10 +80,7 @@ llvm::calculateRegAllocScore(const MachineFunction &MF,
       },
       [&](const MachineInstr &MI) {
         auto *TTI = MF.getSubtarget().getInstrInfo();
-        return TTI->isTriviallyReMaterializable(MI) &&
-               llvm::all_of(MI.all_uses(), [](const MachineOperand &MO) {
-                 return MO.getReg().isVirtual();
-               });
+        return TTI->isTriviallyReMaterializable(MI);
       });
 }
 
diff --git a/llvm/lib/CodeGen/RegisterCoalescer.cpp b/llvm/lib/CodeGen/RegisterCoalescer.cpp
index db00f54daeb62..2d84d6b385956 100644
--- a/llvm/lib/CodeGen/RegisterCoalescer.cpp
+++ b/llvm/lib/CodeGen/RegisterCoalescer.cpp
@@ -1325,7 +1325,7 @@ bool RegisterCoalescer::reMaterializeTrivialDef(const CoalescerPair &CP,
   if (!TII->isAsCheapAsAMove(*DefMI))
     return false;
 
-  if (!TII->isTriviallyReMaterializable(*DefMI))
+  if (!TII->isReMaterializable(*DefMI))
     return false;
 
   if (!definesFullReg(*DefMI, SrcReg))
diff --git a/llvm/lib/CodeGen/TargetRegisterInfo.cpp b/llvm/lib/CodeGen/TargetRegisterInfo.cpp
index c9e46182decc2..9e5c1bfdd745e 100644
--- a/llvm/lib/CodeGen/TargetRegisterInfo.cpp
+++ b/llvm/lib/CodeGen/TargetRegisterInfo.cpp
@@ -67,7 +67,7 @@ bool TargetRegisterInfo::shouldRegionSplitForVirtReg(
   const TargetInstrInfo *TII = MF.getSubtarget().getInstrInfo();
   const MachineRegisterInfo &MRI = MF.getRegInfo();
   MachineInstr *MI = MRI.getUniqueVRegDef(VirtReg.reg());
-  if (MI && TII->isTriviallyReMaterializable(*MI) &&
+  if (MI && TII->isReMaterializable(*MI) &&
       VirtReg.size() > HugeSizeForSplit)
     return false;
   return true;
diff --git a/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp b/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
index 254b75b784e75..f52581132a999 100644
--- a/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
+++ b/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
@@ -1779,7 +1779,7 @@ bool PreRARematStage::canIncreaseOccupancyOrReduceSpill() {
     for (auto MI = Region.first; MI != Region.second; ++MI) {
       // The instruction must be trivially rematerializable.
       MachineInstr &DefMI = *MI;
-      if (!isTriviallyReMaterializable(DefMI))
+      if (!isReMaterializable(DefMI))
         continue;
 
       // We only support rematerializing virtual registers with one definition.
@@ -2002,8 +2002,8 @@ void PreRARematStage::rematerialize() {
 }
 
 // Copied from MachineLICM
-bool PreRARematStage::isTriviallyReMaterializable(const MachineInstr &MI) {
-  if (!DAG.TII->isTriviallyReMaterializable(MI))
+bool PreRARematStage::isReMaterializable(const MachineInstr &MI) {
+  if (!DAG.TII->isReMaterializable(MI))
     return false;
 
   for (const MachineOperand &MO : MI.all_uses()) {
diff --git a/llvm/lib/Target/AMDGPU/GCNSchedStrategy.h b/llvm/lib/Target/AMDGPU/GCNSchedStrategy.h
index 790370ff8ab4d..256a422e22f5f 100644
--- a/llvm/lib/Target/AMDGPU/GCNSchedStrategy.h
+++ b/llvm/lib/Target/AMDGPU/GCNSchedStrategy.h
@@ -485,7 +485,7 @@ class PreRARematStage : public GCNSchedStage {
 
   /// Whether the MI is trivially rematerializable and does not have any virtual
   /// register use.
-  bool isTriviallyReMaterializable(const MachineInstr &MI);
+  bool isReMaterializable(const MachineInstr &MI);
 
   /// Rematerializes all instructions in PreRARematStage::Rematerializations
   /// and stores the achieved occupancy after remat in
diff --git a/llvm/lib/Target/WebAssembly/WebAssemblyRegStackify.cpp b/llvm/lib/Target/WebAssembly/WebAssemblyRegStackify.cpp
index 7591541779884..08ca20b5eef6e 100644
--- a/llvm/lib/Target/WebAssembly/WebAssemblyRegStackify.cpp
+++ b/llvm/lib/Target/WebAssembly/WebAssemblyRegStackify.cpp
@@ -260,10 +260,7 @@ static void query(const MachineInstr &MI, bool &Read, bool &Write,
 // Test whether Def is safe and profitable to rematerialize.
 static bool shouldRematerialize(const MachineInstr &Def,
                                 const WebAssemblyInstrInfo *TII) {
-  return Def.isAsCheapAsAMove() && TII->isTriviallyReMaterializable(Def) &&
-         llvm::all_of(Def.all_uses(), [](const MachineOperand &MO) {
-           return MO.getReg().isVirtual();
-         });
+  return Def.isAsCheapAsAMove() && TII->isTriviallyReMaterializable(Def);
 }
 
 // Identify the definition for this register at this point. This is a

Copy link

github-actions bot commented Sep 23, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

Copy link
Contributor

@lukel97 lukel97 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, went through the call sites and the choices between trivial and non-trivial rematerialization made sense.

preames and others added 3 commits September 24, 2025 07:05
Co-authored-by: Luke Lau <luke_lau@icloud.com>
In particular, the partial revert which effected llvm::calculateRegAllocScore
and adapt the code in that spot to use non-trivial remat for now.
// The instruction must be rematerializable.
MachineInstr &DefMI = *MI;
if (!isTriviallyReMaterializable(DefMI))
if (!isReMaterializable(DefMI))
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@arsenm I want to draw your attention to this change. I'm suspecting the use of non-trivial remat might be a latent bug here from the comments, and code structure. When I tried to make this actually trivial, I do see test failures though.

@preames
Copy link
Collaborator Author

preames commented Sep 24, 2025

@arsenm Can I get a second LGTM here? This is a conceptually big change despite being NFC, and I want to make sure we have agreement on direction before moving forward.

@preames preames merged commit ea721e2 into llvm:main Sep 25, 2025
9 checks passed
@preames preames deleted the pr-tti-remat-hook-split-nfc branch September 25, 2025 01:52
@llvm-ci
Copy link
Collaborator

llvm-ci commented Sep 25, 2025

LLVM Buildbot has detected a new failure on builder mlir-nvidia-gcc7 running on mlir-nvidia while building llvm at step 7 "test-build-check-mlir-build-only-check-mlir".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/116/builds/18839

Here is the relevant piece of the build log for the reference
Step 7 (test-build-check-mlir-build-only-check-mlir) failure: test (failure)
******************** TEST 'MLIR :: Integration/GPU/CUDA/async.mlir' FAILED ********************
Exit Code: 1

Command Output (stdout):
--
# RUN: at line 1
/vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.src/mlir/test/Integration/GPU/CUDA/async.mlir  | /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt -gpu-kernel-outlining  | /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt -pass-pipeline='builtin.module(gpu.module(strip-debuginfo,convert-gpu-to-nvvm),nvvm-attach-target)'  | /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt -gpu-async-region -gpu-to-llvm -reconcile-unrealized-casts -gpu-module-to-binary="format=fatbin"  | /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt -async-to-async-runtime -async-runtime-ref-counting  | /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt -convert-async-to-llvm -convert-func-to-llvm -convert-arith-to-llvm -convert-cf-to-llvm -reconcile-unrealized-casts  | /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-runner    --shared-libs=/vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/lib/libmlir_cuda_runtime.so    --shared-libs=/vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/lib/libmlir_async_runtime.so    --shared-libs=/vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/lib/libmlir_runner_utils.so    --entry-point-result=void -O0  | /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/FileCheck /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.src/mlir/test/Integration/GPU/CUDA/async.mlir
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.src/mlir/test/Integration/GPU/CUDA/async.mlir
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt -gpu-kernel-outlining
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt '-pass-pipeline=builtin.module(gpu.module(strip-debuginfo,convert-gpu-to-nvvm),nvvm-attach-target)'
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt -gpu-async-region -gpu-to-llvm -reconcile-unrealized-casts -gpu-module-to-binary=format=fatbin
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt -async-to-async-runtime -async-runtime-ref-counting
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt -convert-async-to-llvm -convert-func-to-llvm -convert-arith-to-llvm -convert-cf-to-llvm -reconcile-unrealized-casts
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-runner --shared-libs=/vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/lib/libmlir_cuda_runtime.so --shared-libs=/vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/lib/libmlir_async_runtime.so --shared-libs=/vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/lib/libmlir_runner_utils.so --entry-point-result=void -O0
# .---command stderr------------
# | 'cuStreamWaitEvent(stream, event, 0)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuEventDestroy(event)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuStreamWaitEvent(stream, event, 0)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuEventDestroy(event)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuStreamWaitEvent(stream, event, 0)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuStreamWaitEvent(stream, event, 0)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuEventDestroy(event)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuEventDestroy(event)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuEventSynchronize(event)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuEventDestroy(event)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# `-----------------------------
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/FileCheck /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.src/mlir/test/Integration/GPU/CUDA/async.mlir
# .---command stderr------------
# | /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.src/mlir/test/Integration/GPU/CUDA/async.mlir:68:12: error: CHECK: expected string not found in input
# |  // CHECK: [84, 84]
# |            ^
# | <stdin>:1:1: note: scanning from here
# | Unranked Memref base@ = 0x564b750324c0 rank = 1 offset = 0 sizes = [2] strides = [1] data = 
# | ^
# | <stdin>:2:1: note: possible intended match here
# | [42, 42]
# | ^
# | 
# | Input file: <stdin>
# | Check file: /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.src/mlir/test/Integration/GPU/CUDA/async.mlir
# | 
# | -dump-input=help explains the following input dump.
# | 
# | Input was:
# | <<<<<<
# |             1: Unranked Memref base@ = 0x564b750324c0 rank = 1 offset = 0 sizes = [2] strides = [1] data =  
# | check:68'0     X~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ error: no match found
# |             2: [42, 42] 
# | check:68'0     ~~~~~~~~~
# | check:68'1     ?         possible intended match
...

llvm::all_of(MI.all_uses(), [](const MachineOperand &MO) {
return MO.getReg().isVirtual();
});
bool isTrivRemat = TII.isTriviallyReMaterializable(MI);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little puzzled.
This patch changes the behavior of the code here.

So before, for isTrivRemat to be set, all registers in the MI had to be virtual.
With the patch we call

  /// Return true if the instruction is trivially rematerializable, meaning it
  /// has no side effects and requires no operands that aren't always available.
  /// This means the only allowed uses are constants and unallocatable physical
  /// registers so that the instructions result is independent of the place
  /// in the function.
  bool isTriviallyReMaterializable(const MachineInstr &MI) const {
    if (!isReMaterializable(MI))
      return false;
    for (const MachineOperand &MO : MI.all_uses()) {
      if (MO.getReg().isVirtual())
        return false;
    }
    return true;
  }

which will return false if any of the registers in the MI is virtual.

Switching
if (MO.getReg().isVirtual())
to
if (!MO.getReg().isVirtual())
makes it behave as before but then I guess it doesn't do what the method comment says.

I noticed this for a downstream testcase where the prologue_end placement changed with ca2e8fc and then again with this patch.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, you are completely correct. Oops. This was a major bug in #160319. The check should have been "are all of the operands not a virtual register", but this is not what actually got written and landed.

I think this change accidentally fixed this (by doing what the intention had been all along), but boy does that make both patch descriptions misleading. Do you think this is worth a revert-and-reapply on both?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so now it's doing what is intended. Good!

Ah, I don't know if you need to fiddle with revert/reapply, I just got confused about the ping/pong behavior of our testcase and that's clear now.
Thanks.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for flagging this. I'm going to put a note on each review for later reference as this could be quite confusing later when digging through history.

@preames
Copy link
Collaborator Author

preames commented Sep 25, 2025

Warning to anyone looking at this later, the landed version contained of #160319 contained a nasty bug where I'd accidentally reversed the condition being checked for. This was (accidentally) undone in this review, so this ends up not being NFC as intended. The final state is in line with the original intention of the two patches combined, so not going to revert. It just makes the descriptions really dang confusing. Sorry!

mahesh-attarde pushed a commit to mahesh-attarde/llvm-project that referenced this pull request Oct 3, 2025
…60377)

This change builds on llvm#160319
which tries to clarify which *callers* (not backends) assume that the
result is actually trivial.

This change itself should be NFC. Essentially, I'm just renaming the
existing isTrivialRematerializable to the non-trivial version and then
adding a new trivial version (with the same name as the prior function)
and simplifying a few callers which want that semantic.

This change does *not* enable non-trivial remat any more broadly than
was already done for our targets which were lying through the old APIs;
that will come separately. The goal here is simply to make the code
easier to follow in terms of what assumptions are being made where.

---------

Co-authored-by: Luke Lau <luke_lau@icloud.com>
lukel97 added a commit that referenced this pull request Oct 4, 2025
#159211)

In the register allocator we define non-trivial rematerialization as the
rematerlization of an instruction with virtual register uses.

We have been able to perform non-trivial rematerialization for a while,
but it has been prevented by default unless specifically overriden by
the target in `TargetTransformInfo::isReMaterializableImpl`. The
original reasoning for this given by the comment in the default
implementation is because we might increase a live range of the virtual
register, but we don't actually do this.
LiveRangeEdit::allUsesAvailableAt makes sure that we only rematerialize
instructions whose virtual registers are already live at the use sites.

https://reviews.llvm.org/D106408 had originally tried to remove this
restriction but it was reverted after some performance regressions were
reported. We think it is likely that the regressions were caused by the
fact that the old isTriviallyReMaterializable API sometimes returned
true for non-trivial rematerializations.

However #160377 recently split
the API out into a separate non-trivial and trivial version and updated
the call-sites accordingly, and
#160709 and #159180 fixed
heuristics which weren't accounting for the difference between
non-trivial and trivial.

With these fixes in place, this patch proposes to again allow
non-trivial rematerialization by default which reduces a significant
amount of spills and reloads across various targets.

For llvm-test-suite built with -O3 -flto, we get the following geomean
reduction in reloads:

- arm64-apple-darwin: 11.6%
- riscv64-linux-gnu: 8.1%
- x86_64-linux-gnu: 6.5%
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Oct 4, 2025
…erialization (#159211)

In the register allocator we define non-trivial rematerialization as the
rematerlization of an instruction with virtual register uses.

We have been able to perform non-trivial rematerialization for a while,
but it has been prevented by default unless specifically overriden by
the target in `TargetTransformInfo::isReMaterializableImpl`. The
original reasoning for this given by the comment in the default
implementation is because we might increase a live range of the virtual
register, but we don't actually do this.
LiveRangeEdit::allUsesAvailableAt makes sure that we only rematerialize
instructions whose virtual registers are already live at the use sites.

https://reviews.llvm.org/D106408 had originally tried to remove this
restriction but it was reverted after some performance regressions were
reported. We think it is likely that the regressions were caused by the
fact that the old isTriviallyReMaterializable API sometimes returned
true for non-trivial rematerializations.

However llvm/llvm-project#160377 recently split
the API out into a separate non-trivial and trivial version and updated
the call-sites accordingly, and
llvm/llvm-project#160709 and #159180 fixed
heuristics which weren't accounting for the difference between
non-trivial and trivial.

With these fixes in place, this patch proposes to again allow
non-trivial rematerialization by default which reduces a significant
amount of spills and reloads across various targets.

For llvm-test-suite built with -O3 -flto, we get the following geomean
reduction in reloads:

- arm64-apple-darwin: 11.6%
- riscv64-linux-gnu: 8.1%
- x86_64-linux-gnu: 6.5%
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants