[AMDGPU] Register allocation anti-hints to reduce MFMA hazard NOPs #156943

mssefat · 2025-09-04T18:15:11Z

[AMDGPU] Register allocation anti-hints to reduce MFMA hazard NOPs

Reduce unnecessary s_nop insertion for MFMA hazards by introducing anti-hints in RA that allows virtual registers to specify which other virtual registers they should avoid being allocated to the overlapping physical registers.

When subsequent instructions such as ds_read, buffer_load, or other memory/VALU instructions follow MFMA instructions, the register allocator often reuses the same VGPRs that MFMA instructions used as destinations or C matrix operands. This reuse creates hazards, forcing the hazard recognizer to insert s_nop instructions.

Example:
v_mfma_f32_16x16x32_fp8_fp8 v[26:29], v[102:103], v[6:7], v[134:137]
...
s_nop 5 ; <-- hazard mitigation
ds_read_b128 v[26:29], v125 offset:10240

This patch introduces a register anti-hint mechanism to reduce MFMA hazards:

Added anti-hint mechanism in MachineRegisterInfo and added serialization support for MIR testing.
In GCNPreRAOptimizations, track MFMA destination and accumulator registers, identify subsequent instructions whose register usage may trigger hazards, and add anti-hints relation between conflicting registers.
Modified AllocationOrder to respect anti-hints by moving anti-hinted physical registers to the end of the allocation order to make them last-resort choices.

llvmbot · 2025-09-04T18:15:47Z

@llvm/pr-subscribers-backend-amdgpu

Author: None (mssefat)

Changes

[AMDGPU] Improve register allocation to reduce MFMA hazard NOPs

Reduce unnecessary s_nop insertion for MFMA hazards by creating hints for register allocation.

When subsequent instructions such as ds_read, buffer_load, or other memory/VALU instructions
follow MFMA instructions, the register allocator often reuses the same VGPRs that MFMA instructions
used as destinations or C matrix operands. This reuse creates hazards, forcing the hazard
recognizer to insert s_nop instructions.

Example:
v_mfma_f32_16x16x32_fp8_fp8 v[26:29], v[102:103], v[6:7], v[134:137]
...
s_nop 5 ; <--
ds_read_b128 v[26:29], v125 offset:10240

This patch introduces a two-phase register hint mechanism to reduce MFMA hazards:

In GCNPreRAOptimizations, track MFMA destination and accumulator registers, identify
subsequent instructions' register usage that may trigger hazards, and create register avoid list in SIMachineFunctionInfo.
The register allocator uses this list to create hints by prioritizing registers that don't conflict with avoid registers.

Patch is 586.52 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/156943.diff

17 Files Affected:

(modified) llvm/lib/Target/AMDGPU/GCNPreRAOptimizations.cpp (+94)
(modified) llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h (+14)
(modified) llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp (+32)
(modified) llvm/lib/Target/AMDGPU/SIRegisterInfo.h (+3-1)
(modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.iglp.opt.exp.large.mir (+258-257)
(modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.iglp.opt.exp.small.mir (+271-271)
(modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.mfma.gfx90a.ll (+102-102)
(modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.mfma.gfx942.ll (+1802-1860)
(modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.mfma.gfx950.bf16.ll (+30-32)
(modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.mfma.gfx950.ll (+224-240)
(added) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.mfma.hint.hazard.barrier.gfx942.mir (+1292)
(modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.mfma.ll (+75-75)
(modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.mfma.scale.f32.32x32x64.f8f6f4.ll (+141-135)
(modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.mfma.xf32.gfx942.ll (+6-6)
(modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.smfmac.gfx950.ll (+238-231)
(modified) llvm/test/CodeGen/AMDGPU/rewrite-vgpr-mfma-to-agpr.ll (+45-81)
(modified) llvm/test/CodeGen/AMDGPU/unspill-vgpr-after-rewrite-vgpr-mfma.ll (+23-33)

diff --git a/llvm/lib/Target/AMDGPU/GCNPreRAOptimizations.cpp b/llvm/lib/Target/AMDGPU/GCNPreRAOptimizations.cpp
index 4deb2a9485e4d..6d2b10bdb5804 100644
--- a/llvm/lib/Target/AMDGPU/GCNPreRAOptimizations.cpp
+++ b/llvm/lib/Target/AMDGPU/GCNPreRAOptimizations.cpp
@@ -34,6 +34,7 @@
 #include "AMDGPU.h"
 #include "GCNSubtarget.h"
 #include "MCTargetDesc/AMDGPUMCTargetDesc.h"
+#include "SIMachineFunctionInfo.h"
 #include "SIRegisterInfo.h"
 #include "llvm/CodeGen/LiveIntervals.h"
 #include "llvm/CodeGen/MachineFunctionPass.h"
@@ -43,6 +44,12 @@ using namespace llvm;
 
 #define DEBUG_TYPE "amdgpu-pre-ra-optimizations"
 
+static cl::opt<bool> EnableRegisterAvoidListForMFMARegs(
+    "amdgpu-avoid-hazard-hint-for-mfma", cl::Hidden,
+    cl::desc("Enable Register Avoidance for "
+             "MFMA in GCNPreRAOptimizations stage."),
+    cl::init(true));
+
 namespace {
 
 class GCNPreRAOptimizationsImpl {
@@ -248,6 +255,93 @@ bool GCNPreRAOptimizationsImpl::run(MachineFunction &MF) {
 
   bool Changed = false;
 
+  // Single pass implementation
+  if (EnableRegisterAvoidListForMFMARegs && ST.hasMAIInsts()) {
+    // Max lookback window for RAW or WAW hazard
+    constexpr unsigned MaxLookbackWindow = 19;
+    SIMachineFunctionInfo *MFI = MF.getInfo<SIMachineFunctionInfo>();
+    for (const MachineBasicBlock &MBB : MF) {
+
+      SmallVector<std::pair<SlotIndex, SmallVector<Register, 4>>, 16>
+          RecentMFMAs;
+      for (const MachineInstr &MI : MBB) {
+        if (MI.isDebugInstr())
+          continue;
+        const SlotIndex CurrentSlot = LIS->getInstructionIndex(MI).getRegSlot();
+        // Handle MFMA instructions
+        if (SIInstrInfo::isMFMA(MI)) {
+          SmallVector<Register, 4> MFMARegisters;
+          auto collectMFMARegister = [&](unsigned OpIdx) {
+            if (OpIdx >= MI.getNumOperands())
+              return;
+
+            const MachineOperand &MO = MI.getOperand(OpIdx);
+            if (MO.isReg() && MO.getReg().isVirtual())
+              MFMARegisters.push_back(MO.getReg());
+          };
+          // Only collect Matrix C (operand 3) and destination (operand 0)
+          // registers
+          collectMFMARegister(0);
+          collectMFMARegister(3);
+
+          if (!MFMARegisters.empty()) {
+            RecentMFMAs.emplace_back(CurrentSlot, std::move(MFMARegisters));
+            // Maintain window
+            if (RecentMFMAs.size() > MaxLookbackWindow)
+              RecentMFMAs.erase(RecentMFMAs.begin());
+          }
+          continue;
+        }
+        bool ShouldCheckReuse = MI.mayLoad() || MI.mayStore() || MI.isCopy() ||
+                                SIInstrInfo::isVALU(MI);
+        // Skip non-relevant instructions, or skip until at least one MFMA is
+        // encountered
+        if (!ShouldCheckReuse || RecentMFMAs.empty())
+          continue;
+
+        // Process operands that might reuse MFMA registers
+        for (const MachineOperand &MO : MI.operands()) {
+          if (!MO.isReg() || !MO.getReg().isVirtual())
+            continue;
+
+          const Register CandidateReg = MO.getReg();
+          const TargetRegisterClass *CandidateRC =
+              MRI->getRegClass(CandidateReg);
+
+          // Only process VGPR registers
+          if (!TRI->isVGPRClass(CandidateRC))
+            continue;
+
+          for (auto It = RecentMFMAs.rbegin(); It != RecentMFMAs.rend(); ++It) {
+            const SmallVector<Register, 4> &MFMARegs = It->second;
+            for (Register MFMAReg : MFMARegs) {
+              // Verify register class compatibility
+              const TargetRegisterClass *MFMARC = MRI->getRegClass(MFMAReg);
+              if (!TRI->hasVGPRs(MFMARC))
+                continue;
+
+              // Check if MFMA register is dead at current instruction
+              const LiveInterval &MFMAInterval = LIS->getInterval(MFMAReg);
+              if (!MFMAInterval.liveAt(CurrentSlot)) {
+
+                // Add bidirectional avoidance hint
+                MFI->addRegisterToAvoid(CandidateReg, MFMAReg);
+                MFI->addRegisterToAvoid(MFMAReg, CandidateReg);
+
+                // Set hint if we found registers to avoid
+                MRI->setRegAllocationHint(
+                    MFMAReg, AMDGPURI::HasRegisterAvoidanceList, Register());
+                MRI->setRegAllocationHint(CandidateReg,
+                                          AMDGPURI::HasRegisterAvoidanceList,
+                                          Register());
+              }
+            }
+          }
+        }
+      }
+    }
+  }
+
   for (unsigned I = 0, E = MRI->getNumVirtRegs(); I != E; ++I) {
     Register Reg = Register::index2VirtReg(I);
     if (!LIS->hasInterval(Reg))
diff --git a/llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h b/llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h
index ca8f8033a2d54..17fb1f2a2db04 100644
--- a/llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h
+++ b/llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h
@@ -1207,6 +1207,20 @@ class SIMachineFunctionInfo final : public AMDGPUMachineFunction,
   unsigned getMaxNumWorkGroupsX() const { return MaxNumWorkGroups[0]; }
   unsigned getMaxNumWorkGroupsY() const { return MaxNumWorkGroups[1]; }
   unsigned getMaxNumWorkGroupsZ() const { return MaxNumWorkGroups[2]; }
+
+  // Map of registers to avoid for a given register
+  DenseMap<Register, SmallVector<Register, 8>> RegisterAvoidanceMap;
+
+  void addRegisterToAvoid(Register VirtReg, Register AvoidReg) {
+    RegisterAvoidanceMap[VirtReg].push_back(AvoidReg);
+  }
+
+  ArrayRef<Register> getRegistersToAvoid(Register VirtReg) const {
+    auto It = RegisterAvoidanceMap.find(VirtReg);
+    if (It != RegisterAvoidanceMap.end())
+      return It->second;
+    return ArrayRef<Register>();
+  }
 };
 
 } // end namespace llvm
diff --git a/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp b/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
index a1fcf26eab27b..61c4f19c7111a 100644
--- a/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
@@ -3838,6 +3838,38 @@ bool SIRegisterInfo::getRegAllocationHints(Register VirtReg,
     }
     return false;
   }
+  case AMDGPURI::HasRegisterAvoidanceList: {
+    const SIMachineFunctionInfo *MFI = MF.getInfo<SIMachineFunctionInfo>();
+    ArrayRef<Register> AvoidRegs = MFI->getRegistersToAvoid(VirtReg);
+
+    if (AvoidRegs.empty())
+      return TargetRegisterInfo::getRegAllocationHints(VirtReg, Order, Hints,
+                                                       MF, VRM);
+    // Collect physical registers to avoid
+    SmallSet<MCPhysReg, 32> AvoidPhysRegs;
+    for (Register AvoidReg : AvoidRegs) {
+      if (VRM && VRM->hasPhys(AvoidReg)) {
+        // Virtual register already mapped - try to avoid its physical register
+        MCPhysReg AvoidPhys = VRM->getPhys(AvoidReg);
+        for (MCRegAliasIterator AI(AvoidPhys, this, true); AI.isValid(); ++AI)
+          AvoidPhysRegs.insert(*AI);
+      }
+    }
+
+    if (AvoidPhysRegs.empty()) {
+      // No physical registers added yet - use default order
+      return TargetRegisterInfo::getRegAllocationHints(VirtReg, Order, Hints,
+                                                       MF, VRM);
+    }
+
+    // Prioritize registers that don't conflict with avoided registers
+    for (MCPhysReg PhysReg : Order) {
+      if (!AvoidPhysRegs.count(PhysReg) && !MRI.isReserved(PhysReg))
+        Hints.push_back(PhysReg);
+    }
+
+    return false;
+  }
   default:
     return TargetRegisterInfo::getRegAllocationHints(VirtReg, Order, Hints, MF,
                                                      VRM);
diff --git a/llvm/lib/Target/AMDGPU/SIRegisterInfo.h b/llvm/lib/Target/AMDGPU/SIRegisterInfo.h
index eeefef1116aa3..8f1ca4f18afb9 100644
--- a/llvm/lib/Target/AMDGPU/SIRegisterInfo.h
+++ b/llvm/lib/Target/AMDGPU/SIRegisterInfo.h
@@ -31,9 +31,11 @@ class RegisterBank;
 struct SGPRSpillBuilder;
 
 /// Register allocation hint types. Helps eliminate unneeded COPY with True16
+/// HasRegisterAvoidanceList helps with minimizing usage of conflicting physical
+/// registers
 namespace AMDGPURI {
 
-enum { Size16 = 1, Size32 = 2 };
+enum { Size16 = 1, Size32 = 2, HasRegisterAvoidanceList = 3 };
 
 } // end namespace AMDGPURI
 
diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.iglp.opt.exp.large.mir b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.iglp.opt.exp.large.mir
index aad6e031aa9ed..3996a94e0347e 100644
--- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.iglp.opt.exp.large.mir
+++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.iglp.opt.exp.large.mir
@@ -15,9 +15,12 @@
   ; GCN-NEXT:    ; implicit-def: $vgpr76_vgpr77_vgpr78_vgpr79
   ; GCN-NEXT:    ; implicit-def: $vgpr106
   ; GCN-NEXT:    ; implicit-def: $vgpr132
+  ; GCN-NEXT:    ; implicit-def: $vgpr112
+  ; GCN-NEXT:    ; implicit-def: $vgpr113
+  ; GCN-NEXT:    ; implicit-def: $vgpr114
+  ; GCN-NEXT:    ; implicit-def: $vgpr115
   ; GCN-NEXT:    ; implicit-def: $vgpr133
   ; GCN-NEXT:    ; implicit-def: $vgpr139
-  ; GCN-NEXT:    ; implicit-def: $vgpr112_vgpr113_vgpr114_vgpr115_vgpr116_vgpr117_vgpr118_vgpr119_vgpr120_vgpr121_vgpr122_vgpr123_vgpr124_vgpr125_vgpr126_vgpr127
   ; GCN-NEXT:    ; iglp_opt mask(0x00000002)
   ; GCN-NEXT:    ; implicit-def: $sgpr0
   ; GCN-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
@@ -167,46 +170,45 @@
   ; GCN-NEXT:    buffer_wbl2 sc0 sc1
   ; GCN-NEXT:    s_waitcnt lgkmcnt(0)
   ; GCN-NEXT:    ds_write_b128 v95, v[68:71] offset:1024
-  ; GCN-NEXT:    ; implicit-def: $vgpr64
-  ; GCN-NEXT:    v_mfma_f32_32x32x8_f16 v[0:15], v[72:73], v[76:77], v[0:15]
-  ; GCN-NEXT:    v_add_u32_e32 v72, 0xc0, v93
-  ; GCN-NEXT:    ; implicit-def: $vgpr73
-  ; GCN-NEXT:    v_add_u32_e32 v76, v132, v64
   ; GCN-NEXT:    s_waitcnt lgkmcnt(0)
   ; GCN-NEXT:    buffer_load_dwordx4 v[64:67], v92, s[8:11], 0 offen offset:192 sc0 sc1
   ; GCN-NEXT:    s_waitcnt vmcnt(0)
   ; GCN-NEXT:    buffer_inv sc0 sc1
+  ; GCN-NEXT:    v_mfma_f32_32x32x8_f16 v[0:15], v[72:73], v[76:77], v[0:15]
+  ; GCN-NEXT:    v_add_u32_e32 v72, 0xc0, v93
+  ; GCN-NEXT:    v_add_u32_e32 v73, v132, v112
   ; GCN-NEXT:    buffer_load_dwordx4 v[68:71], v72, s[8:11], 0 offen sc0 sc1
   ; GCN-NEXT:    s_waitcnt vmcnt(0)
   ; GCN-NEXT:    buffer_inv sc0 sc1
   ; GCN-NEXT:    ; kill: killed $vgpr72
-  ; GCN-NEXT:    v_add_u32_e32 v72, v132, v73
-  ; GCN-NEXT:    buffer_load_dwordx2 v[98:99], v76, s[0:3], 0 offen sc0 sc1
+  ; GCN-NEXT:    v_add_u32_e32 v72, v132, v113
+  ; GCN-NEXT:    buffer_load_dwordx2 v[98:99], v73, s[0:3], 0 offen sc0 sc1
   ; GCN-NEXT:    s_waitcnt vmcnt(0)
   ; GCN-NEXT:    buffer_inv sc0 sc1
   ; GCN-NEXT:    buffer_load_dwordx2 v[102:103], v72, s[0:3], 0 offen sc0 sc1
   ; GCN-NEXT:    s_waitcnt vmcnt(0)
   ; GCN-NEXT:    buffer_inv sc0 sc1
-  ; GCN-NEXT:    v_mfma_f32_32x32x8_f16 v[0:15], v[74:75], v[78:79], v[0:15]
-  ; GCN-NEXT:    ; implicit-def: $vgpr74
-  ; GCN-NEXT:    v_add_u32_e32 v72, v132, v74
-  ; GCN-NEXT:    ; implicit-def: $vgpr75
+  ; GCN-NEXT:    v_add_u32_e32 v72, v132, v114
   ; GCN-NEXT:    buffer_load_dwordx2 v[100:101], v72, s[0:3], 0 offen sc0 sc1
   ; GCN-NEXT:    s_waitcnt vmcnt(0)
   ; GCN-NEXT:    buffer_inv sc0 sc1
-  ; GCN-NEXT:    v_add_u32_e32 v72, v132, v75
+  ; GCN-NEXT:    v_add_u32_e32 v72, v132, v115
   ; GCN-NEXT:    buffer_load_dwordx2 v[104:105], v72, s[0:3], 0 offen sc0 sc1
   ; GCN-NEXT:    s_waitcnt vmcnt(0)
   ; GCN-NEXT:    buffer_inv sc0 sc1
   ; GCN-NEXT:    ;;#ASMSTART
   ; GCN-NEXT:    s_waitcnt vmcnt(8)
   ; GCN-NEXT:    ;;#ASMEND
+  ; GCN-NEXT:    v_mfma_f32_32x32x8_f16 v[0:15], v[74:75], v[78:79], v[0:15]
+  ; GCN-NEXT:    ; kill: killed $vgpr73
   ; GCN-NEXT:    ds_read_b128 v[72:75], v94
   ; GCN-NEXT:    s_waitcnt lgkmcnt(0)
   ; GCN-NEXT:    buffer_inv sc0 sc1
-  ; GCN-NEXT:    ; kill: killed $vgpr76
   ; GCN-NEXT:    ; implicit-def: $vgpr76_vgpr77_vgpr78_vgpr79
   ; GCN-NEXT:    ; implicit-def: $sgpr8
+  ; GCN-NEXT:    ; implicit-def: $vgpr112
+  ; GCN-NEXT:    ; implicit-def: $vgpr113
+  ; GCN-NEXT:    ; implicit-def: $vgpr114
   ; GCN-NEXT:    v_mfma_f32_32x32x8_f16 v[48:63], v[72:73], v[76:77], v[48:63]
   ; GCN-NEXT:    v_mfma_f32_32x32x8_f16 v[48:63], v[74:75], v[78:79], v[48:63]
   ; GCN-NEXT:    ds_read_b128 v[72:75], v94 offset:512
@@ -411,8 +413,6 @@
   ; GCN-NEXT:    v_max3_f32 v64, v64, v65, v66
   ; GCN-NEXT:    ; implicit-def: $vgpr65
   ; GCN-NEXT:    ; implicit-def: $vgpr66
-  ; GCN-NEXT:    ; implicit-def: $vgpr68
-  ; GCN-NEXT:    ; implicit-def: $vgpr67
   ; GCN-NEXT:    v_add_u32_e32 v65, s7, v65
   ; GCN-NEXT:    v_and_b32_e32 v65, 0x1fffffff, v65
   ; GCN-NEXT:    v_mul_lo_u32 v65, v65, s6
@@ -440,40 +440,36 @@
   ; GCN-NEXT:    buffer_wbl2 sc0 sc1
   ; GCN-NEXT:    s_waitcnt lgkmcnt(0)
   ; GCN-NEXT:    ds_write_b64 v138, v[96:97]
-  ; GCN-NEXT:    v_add_u32_e32 v68, v132, v68
+  ; GCN-NEXT:    ; implicit-def: $vgpr96
   ; GCN-NEXT:    v_cndmask_b32_e64 v64, v65, v64, s[6:7]
   ; GCN-NEXT:    v_max_f32_e32 v64, v64, v64
   ; GCN-NEXT:    ; implicit-def: $vgpr65
   ; GCN-NEXT:    v_max_f32_e32 v66, v65, v65
   ; GCN-NEXT:    v_max_f32_e32 v134, v66, v64
-  ; GCN-NEXT:    ; implicit-def: $vgpr64
+  ; GCN-NEXT:    v_add_u32_e32 v64, v132, v96
   ; GCN-NEXT:    s_waitcnt lgkmcnt(0)
-  ; GCN-NEXT:    buffer_load_dwordx2 v[156:157], v68, s[0:3], 0 offen sc0 sc1
+  ; GCN-NEXT:    buffer_load_dwordx2 v[160:161], v64, s[0:3], 0 offen sc0 sc1
   ; GCN-NEXT:    s_waitcnt vmcnt(0)
   ; GCN-NEXT:    buffer_inv sc0 sc1
-  ; GCN-NEXT:    v_add_u32_e32 v64, v132, v64
-  ; GCN-NEXT:    buffer_load_dwordx2 v[158:159], v64, s[0:3], 0 offen sc0 sc1
+  ; GCN-NEXT:    v_add_u32_e32 v64, v132, v112
+  ; GCN-NEXT:    buffer_load_dwordx2 v[162:163], v64, s[0:3], 0 offen sc0 sc1
   ; GCN-NEXT:    s_waitcnt vmcnt(0)
   ; GCN-NEXT:    buffer_inv sc0 sc1
-  ; GCN-NEXT:    ; implicit-def: $vgpr66
-  ; GCN-NEXT:    v_add_u32_e32 v64, v132, v66
+  ; GCN-NEXT:    v_add_u32_e32 v64, v132, v113
   ; GCN-NEXT:    buffer_load_dwordx2 v[128:129], v64, s[0:3], 0 offen sc0 sc1
   ; GCN-NEXT:    s_waitcnt vmcnt(0)
   ; GCN-NEXT:    buffer_inv sc0 sc1
-  ; GCN-NEXT:    v_add_u32_e32 v64, v132, v67
+  ; GCN-NEXT:    v_add_u32_e32 v64, v132, v114
   ; GCN-NEXT:    buffer_load_dwordx2 v[130:131], v64, s[0:3], 0 offen sc0 sc1
   ; GCN-NEXT:    s_waitcnt vmcnt(0)
   ; GCN-NEXT:    buffer_inv sc0 sc1
-  ; GCN-NEXT:    v_fma_f32 v57, s4, v57, -v134
   ; GCN-NEXT:    v_fma_f32 v48, s4, v48, -v134
-  ; GCN-NEXT:    v_fma_f32 v96, s4, v58, -v134
-  ; GCN-NEXT:    v_mul_f32_e32 v57, 0x3fb8aa3b, v57
+  ; GCN-NEXT:    v_fma_f32 v57, s4, v57, -v134
   ; GCN-NEXT:    v_mul_f32_e32 v48, 0x3fb8aa3b, v48
   ; GCN-NEXT:    v_fma_f32 v64, s4, v49, -v134
-  ; GCN-NEXT:    v_exp_f32_e32 v163, v57
-  ; GCN-NEXT:    v_mul_f32_e32 v57, 0x3fb8aa3b, v96
+  ; GCN-NEXT:    v_mul_f32_e32 v57, 0x3fb8aa3b, v57
   ; GCN-NEXT:    v_fma_f32 v66, s4, v50, -v134
-  ; GCN-NEXT:    v_exp_f32_e32 v164, v57
+  ; GCN-NEXT:    v_exp_f32_e32 v165, v57
   ; GCN-NEXT:    v_exp_f32_e32 v49, v48
   ; GCN-NEXT:    v_mul_f32_e32 v48, 0x3fb8aa3b, v64
   ; GCN-NEXT:    v_fma_f32 v67, s4, v51, -v134
@@ -499,31 +495,30 @@
   ; GCN-NEXT:    v_mul_f32_e32 v48, 0x3fb8aa3b, v70
   ; GCN-NEXT:    v_exp_f32_e32 v55, v48
   ; GCN-NEXT:    v_mul_f32_e32 v48, 0x3fb8aa3b, v71
-  ; GCN-NEXT:    ds_read_b128 v[144:147], v139 offset:576
-  ; GCN-NEXT:    s_waitcnt lgkmcnt(0)
-  ; GCN-NEXT:    buffer_inv sc0 sc1
   ; GCN-NEXT:    v_fma_f32 v66, s4, v56, -v134
   ; GCN-NEXT:    v_exp_f32_e32 v56, v48
   ; GCN-NEXT:    v_sub_f32_e32 v48, v65, v134
+  ; GCN-NEXT:    ds_read_b128 v[144:147], v139 offset:576
+  ; GCN-NEXT:    s_waitcnt lgkmcnt(0)
+  ; GCN-NEXT:    buffer_inv sc0 sc1
   ; GCN-NEXT:    v_cvt_f16_f32_e32 v64, v49
   ; GCN-NEXT:    v_cvt_f16_f32_e32 v67, v50
   ; GCN-NEXT:    v_cvt_f16_f32_e32 v68, v51
+  ; GCN-NEXT:    v_fma_f32 v96, s4, v58, -v134
   ; GCN-NEXT:    v_cvt_f16_f32_e32 v58, v52
   ; GCN-NEXT:    v_mul_f32_e32 v48, 0x3fb8aa3b, v48
   ; GCN-NEXT:    ds_read_b128 v[148:151], v139 offset:1152
   ; GCN-NEXT:    s_waitcnt lgkmcnt(0)
   ; GCN-NEXT:    buffer_inv sc0 sc1
   ; GCN-NEXT:    v_exp_f32_e32 v48, v48
-  ; GCN-NEXT:    v_pack_b32_f16 v161, v68, v58
-  ; GCN-NEXT:    v_pack_b32_f16 v160, v64, v67
-  ; GCN-NEXT:    v_mul_f32_e32 v58, 0x3fb8aa3b, v66
+  ; GCN-NEXT:    v_fma_f32 v156, s4, v59, -v134
+  ; GCN-NEXT:    v_pack_b32_f16 v59, v68, v58
+  ; GCN-NEXT:    v_pack_b32_f16 v58, v64, v67
+  ; GCN-NEXT:    v_mul_f32_e32 v80, 0x3fb8aa3b, v66
   ; GCN-NEXT:    ; implicit-def: $vgpr64_vgpr65_vgpr66_vgpr67_vgpr68_vgpr69_vgpr70_vgpr71_vgpr72_vgpr73_vgpr74_vgpr75_vgpr76_vgpr77_vgpr78_vgpr79
   ; GCN-NEXT:    ds_read_b128 v[152:155], v139 offset:1728
   ; GCN-NEXT:    s_waitcnt lgkmcnt(0)
   ; GCN-NEXT:    buffer_inv sc0 sc1
-  ; GCN-NEXT:    v_fma_f32 v162, s4, v61, -v134
-  ; GCN-NEXT:    v_cvt_f16_f32_e32 v61, v55
-  ; GCN-NEXT:    v_cvt_f16_f32_e32 v57, v56
   ; GCN-NEXT:    v_pk_mul_f32 v[64:65], v[64:65], v[48:49] op_sel_hi:[1,0]
   ; GCN-NEXT:    v_pk_mul_f32 v[66:67], v[66:67], v[48:49] op_sel_hi:[1,0]
   ; GCN-NEXT:    v_pk_mul_f32 v[68:69], v[68:69], v[48:49] op_sel_hi:[1,0]
@@ -532,10 +527,14 @@
   ; GCN-NEXT:    v_pk_mul_f32 v[74:75], v[74:75], v[48:49] op_sel_hi:[1,0]
   ; GCN-NEXT:    v_pk_mul_f32 v[76:77], v[76:77], v[48:49] op_sel_hi:[1,0]
   ; GCN-NEXT:    v_pk_mul_f32 v[78:79], v[78:79], v[48:49] op_sel_hi:[1,0]
+  ; GCN-NEXT:    v_mul_f32_e32 v57, 0x3fb8aa3b, v96
+  ; GCN-NEXT:    ; implicit-def: $vgpr96_vgpr97_vgpr98_vgpr99_vgpr100_vgpr101_vgpr102_vgpr103_vgpr104_vgpr105_vgpr106_vgpr107_vgpr108_vgpr109_vgpr110_vgpr111
+  ; GCN-NEXT:    v_fma_f32 v157, s4, v60, -v134
+  ; GCN-NEXT:    v_mfma_f32_32x32x8_f16 v[64:79], v[140:141], v[58:59], v[64:79]
+  ; GCN-NEXT:    v_exp_f32_e32 v141, v80
   ; GCN-NEXT:    ; implicit-def: $vgpr80_vgpr81_vgpr82_vgpr83_vgpr84_vgpr85_vgpr86_vgpr87_vgpr88_vgpr89_vgpr90_vgpr91_vgpr92_vgpr93_vgpr94_vgpr95
-  ; GCN-NEXT:    v_fma_f32 v59, s4, v59, -v134
+  ; GCN-NEXT:    v_pk_mul_f32 v[96:97], v[96:97], v[48:49] op_sel_hi:[1,0]
   ; GCN-NEXT:    v_pk_mul_f32 v[80:81], v[80:81], v[48:49] op_sel_hi:[1,0]
-  ; GCN-NEXT:    v_mfma_f32_32x32x8_f16 v[64:79], v[140:141], v[160:161], v[64:79]
   ; GCN-NEXT:    v_pk_mul_f32 v[82:83], v[82:83], v[48:49] op_sel_hi:[1,0]
   ; GCN-NEXT:    v_pk_mul_f32 v[84:85], v[84:85], v[48:49] op_sel_hi:[1,0]
   ; GCN-NEXT:    v_pk_mul_f32 v[86:87], v[86:87], v[48:49] op_sel_hi:[1,0]
@@ -543,10 +542,6 @@
   ; GCN-NEXT:    v_pk_mul_f32 v[90:91], v[90:91], v[48:49] op_sel_hi:[1,0]
   ; GCN-NEXT:    v_pk_mul_f32 v[92:93], v[92:93], v[48:49] op_sel_hi:[1,0]
   ; GCN-NEXT:    v_pk_mul_f32 v[94:95], v[94:95], v[48:49] op_sel_hi:[1,0]
-  ; GCN-NEXT:    ; implicit-def: $vgpr96_vgpr97_vgpr98_vgpr99_vgpr100_vgpr101_vgpr102_vgpr103_vgpr104_vgpr105_vgpr106_vgpr107_vgpr108_vgpr109_vgpr110_vgpr111
-  ; GCN-NEXT:    v_exp_f32_e32 v58, v58
-  ; GCN-NEXT:    v_pk_mul_f32 v[96:97], v[96:97], v[48:49] op_sel_hi:[1,0]
-  ; GCN-NEXT:    v_mfma_f32_32x32x8_f16 v[80:95], v[144:145], v[160:161], v[80:95]
   ; GCN-NEXT:    v_pk_mul_f32 v[98:99], v[98:99], v[48:49] op_sel_hi:[1,0]
   ; GCN-NEXT:    v_pk_mul_f32 v[100:101], v[100:101], v[48:49] op_sel_hi:[1,0]
   ; GCN-NEXT:    v_pk_mul_f32 v[102:103], v[102:103], v[48:49] op_sel_hi:[1,0]
@@ -554,258 +549,263 @@
   ; GCN-NEXT:    v_pk_mul_f32 v[106:107], v[106:107], v[48:49] op_sel_hi:[1,0]
   ; GCN-NEXT:    v_pk_mul_f32 v[108:109], v[108:109], v[48:49] op_sel_hi:[1,0]
   ; GCN-NEXT:    v_pk_mul_f32 v[110:111], v[110:111], v[48:49] op_sel_hi:[1,0]
-  ; GCN-NEXT:    v_pack_b32_f16 v145, v61, v57
-  ; GCN-NEXT:    v_mul_f32_e32 v57, 0x3fb8aa3b, v59
   ; GCN-NEXT:    v_cvt_f16_f32_e32 v140, v53
-  ; GCN-NEXT:    v_cvt_f16_f32_e32 v141, v54
-  ; GCN-NEXT:    v_exp_f32_e32 v59, v57
-  ; GCN-NEXT:    v_mfma_f32_32x32x8_f16 v[96:111], v[148:149], v[160:161], v[96:111]
-  ; GCN-NEXT:    v_fma_f32 v60, s4, v60, -v134
+  ; GCN-NEXT:    v_mfma_f32_32x32x8_f16 v[80:95], v[144:145], v[58:59], v[80:95]
+  ; GCN-NEXT:    v_cvt_f16_f32_e32 v144, v54
+  ; GCN-NEXT:    v_cvt_f16_f32_e32 v145, v55
+  ; GCN-NEXT:    v_exp_f32_e32 v167, v57
+  ; GCN-NEXT:    ; implicit-def: $vgpr112_vgpr113_vgpr114_vgpr115_vgpr116_vgpr117_vgpr118_vgpr119_vgpr120_vgpr121_vgpr122_vgpr123_vgpr124_vgpr125_vgpr126_vgpr127
+  ; GCN-NEXT:    v_mul_f32_e32 v168, 0x3fb8aa3b, v157
   ; GCN-NEXT:    v_pk_mul_f32 v[112:113], v[112:113], v[48:49] op_sel_hi:[1,0]
   ; GCN-NEXT:    v_pk_mul_f32 v[114:115], v[114:115], v[48:49] op_sel_hi:[1,0]
   ; GCN-NEX...
[truncated]

llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h

arsenm · 2025-09-04T23:58:57Z

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.mfma.hint.hazard.barrier.gfx942.mir

+    %213:vreg_128_align2 = contract V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64 %206.sub0_sub1:vreg_128_align2, %74.sub0_sub1:vreg_128_align2, %174:vreg_128_align2, 0, 0, 0, implicit $mode, implicit $exec
+    %948:vreg_128_align2 = contract V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64 %206.sub2_sub3:vreg_128_align2, %74.sub2_sub3:vreg_128_align2, %213:vreg_128_align2, 0, 0, 0, implicit $mode, implicit $exec
+    %931:vreg_128_align2 = DS_READ_B128_gfx9 %4:vgpr_32, 0, 0, implicit $exec :: (load (s128), addrspace 3)
+    SCHED_GROUP_BARRIER 8, 1, 0


What are all these sched_group_barriers doing at the end?

Updated the test. Please check.

arsenm · 2025-09-04T23:59:32Z

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.mfma.hint.hazard.barrier.gfx942.mir

+  attributes #0 = {nounwind "amdgpu-waves-per-eu"="2"  "amdgpu-agpr-alloc"="0" "frame-pointer"="none"}
+
+...
+---


This is much too big of a function for this test

Updated the test. Please check.

github-actions · 2025-09-23T00:07:55Z

✅ With the latest revision this PR passed the C/C++ code formatter.

srpande

Since this change is actually introducing anti-hints in general in RA, I would rephrase the original summary to include anti-hint.
The key part of this work is introduction of anti-hint and using that in allocation order to improve RA.
Can you please reword that?

mssefat · 2025-09-23T16:35:03Z

Updated the original summary to include anti-hints.

arsenm

This should split the MIR changes into a separate PR from the amdgpu changes

arsenm · 2025-09-24T03:45:46Z

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.mfma.anti-hints-parse.gfx942.mir

+  ; ModuleID = '/work/mdssefat/FullTimeWork/MLSCHED/composable_kernel/noopexample/llvm.amdgcn.mfma.hint.haard.barrier.gfx942_short.mir'
+  source_filename = "/work/mdssefat/FullTimeWork/MLSCHED/composable_kernel/noopexample/llvm.amdgcn.mfma.hint.haard.barrier.gfx942_short.mir"
+  target datalayout = "e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-p7:160:256:256:32-p8:128:128:128:48-p9:192:256:256:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-G1-ni:7:8:9"
+  target triple = "amdgcn-amd-amdhsa"
+
+  ; Function Attrs: nounwind


Suggested change

; ModuleID = '/work/mdssefat/FullTimeWork/MLSCHED/composable_kernel/noopexample/llvm.amdgcn.mfma.hint.haard.barrier.gfx942_short.mir'

source_filename = "/work/mdssefat/FullTimeWork/MLSCHED/composable_kernel/noopexample/llvm.amdgcn.mfma.hint.haard.barrier.gfx942_short.mir"

target datalayout = "e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-p7:160:256:256:32-p8:128:128:128:48-p9:192:256:256:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-G1-ni:7:8:9"

target triple = "amdgcn-amd-amdhsa"

; Function Attrs: nounwind

I have removed the changes related to MIR print and parse and will create a separate PR once this PR gets approved.

arsenm · 2025-09-24T03:46:10Z

llvm/include/llvm/CodeGen/MIRYamlMapping.h

@@ -209,6 +210,11 @@ template <> struct MappingTraits<VirtualRegisterDefinition> {
                       StringValue()); // Don't print out when it's empty.
    YamlIO.mapOptional("flags", Reg.RegisterFlags,
                       std::vector<FlowStringValue>());
+    if (!YamlIO.outputting() ||


This all needs dedicated MIR parsing tests in test/CodeGen/MIR, including error cases

Will update on the MIR related PR.

arsenm · 2025-09-24T03:46:54Z

llvm/include/llvm/CodeGen/MachineRegisterInfo.h

+    assert(VReg.isVirtual() && AntiHintVReg.isVirtual());
+    if (!AntiHintRegs.inBounds(VReg))
+      return false;
+    const auto &AntiHints = AntiHintRegs[VReg];


Fixed it. Please check.

arsenm · 2025-09-24T03:47:12Z

llvm/lib/CodeGen/MachineRegisterInfo.cpp

+  // Remove duplicates
+  llvm::sort(PhysAntiHints);
+  PhysAntiHints.erase(llvm::unique(PhysAntiHints), PhysAntiHints.end());
+}


Missing newline at end of file

Fixed it. Please check.

arsenm · 2025-09-24T03:47:48Z

llvm/lib/CodeGen/MachineRegisterInfo.cpp

+    const VirtRegMap *VRM) const {
+  assert(VReg.isVirtual());
+  if (!AntiHintRegs.inBounds(VReg) || !VRM)
+    return;


Just make VRM mandatory?

Fixed it. Please check.

arsenm · 2025-09-24T03:48:46Z

llvm/lib/CodeGen/MachineRegisterInfo.cpp

+    // Check if the anti-hinted register has been allocated
+    if (VRM->hasPhys(AntiHintVReg)) {
+      MCPhysReg PhysReg = VRM->getPhys(AntiHintVReg);
+      // Add the physical register and all its aliases
+      for (MCRegAliasIterator AI(PhysReg, TRI, true); AI.isValid(); ++AI) {
+        PhysAntiHints.push_back(*AI);
+      }
+    }


This code should not be trying to handle register aliases, just return the raw value. How to treat aliases is context dependent

Fixed it. Please check.

rovka · 2025-09-24T09:23:55Z

llvm/lib/CodeGen/AllocationOrder.cpp

+                                     const TargetRegisterInfo *TRI) {
+  // Create filtered order
+  FilteredOrderStorage.clear();
+  FilteredOrderStorage.reserve(Order.size());


Can FilteredOrderStorage be local to this method? Also, would it work to use std::stable_partition or something instead of looping twice?

Fixed it. Please check. I think FilteredOrderStorage should not be local as it needs to provide persistent storage.

rovka · 2025-09-24T09:33:10Z

llvm/lib/Target/AMDGPU/GCNPreRAOptimizations.cpp

+          // Only collect Matrix C (operand 3) and destination (operand 0)
+          // registers
+          collectMFMARegister(0);
+          collectMFMARegister(3);


I haven't looked at all the MFMA instruction definitions, but if they all have consistent names for their operands you could use getNamedOperand instead of hardcoding the operand indices.

Fixed it. Please check.

rovka · 2025-10-06T08:04:49Z

llvm/include/llvm/CodeGen/MachineRegisterInfo.h

+  void setRegAllocationAntiHint(Register VReg, Register AntiHintVReg) {
+    assert(VReg.isVirtual() && "Anti-hints are only for virtual registers");
+    assert(AntiHintVReg.isVirtual() && "Anti-hint target must be virtual");
+    AntiHintRegs.grow(Register::index2VirtReg(getNumVirtRegs()));


Do you expect to have anti-hints for most virtual registers? If not, it's probably better to grow just enough to hold VReg.

Fixed it. Please check.

rovka · 2025-10-06T08:11:32Z

llvm/include/llvm/CodeGen/MachineRegisterInfo.h

+  /// setRegAllocationAntiHint - Add a register allocation anti-hint for the
+  /// specified virtual register. This tells the allocator to avoid allocating
+  /// VReg to the same physical register as AntiHintVReg (or overlapping ones).
+  void setRegAllocationAntiHint(Register VReg, Register AntiHintVReg) {


Suggested change

void setRegAllocationAntiHint(Register VReg, Register AntiHintVReg) {

void addRegAllocAntiHint(Register VReg, Register AntiHintVReg) {

set makes it sound like you can only have one.

Fixed it. Please check.

rovka · 2025-10-06T08:31:40Z

llvm/lib/CodeGen/AllocationOrder.h


 class LLVM_LIBRARY_VISIBILITY AllocationOrder {
  const SmallVector<MCPhysReg, 16> Hints;
+  SmallVector<MCPhysReg, 16> FilteredOrderStorage;


Suggested change

SmallVector<MCPhysReg, 16> FilteredOrderStorage;

// Used as storage if the Order received in the constructor needs to be altered.

SmallVector<MCPhysReg, 16> FilteredOrderStorage;

Fixed it. Please check.

rovka · 2025-10-06T08:37:32Z

llvm/lib/Target/AMDGPU/GCNPreRAOptimizations.cpp


  bool Changed = false;

+  // Single pass implementation


Suggested change

// Single pass implementation

Single pass implementation of what? This comment isn't really helping :) If you want to have a comment here, it should describe why we need to process MFMA instructions.

Fixed it. Please check.

rovka · 2025-10-06T08:38:02Z

llvm/lib/Target/AMDGPU/GCNPreRAOptimizations.cpp

+    // Max lookback window for RAW or WAW hazard
+    constexpr unsigned MaxLookbackWindow = 19;
+    for (const MachineBasicBlock &MBB : MF) {
+


Suggested change

Fixed it. Please check.

rovka · 2025-10-06T08:45:46Z

llvm/lib/Target/AMDGPU/GCNPreRAOptimizations.cpp

+                                << " not found\n");
+              return;
+            }
+            const MachineOperand &MO = MI.getOperand(OpIdx);


Why not TII->getNamedOperand() instead?

Fixed it. Please check.

rovka · 2025-10-06T10:11:05Z

llvm/lib/Target/AMDGPU/GCNPreRAOptimizations.cpp

+            for (Register MFMAReg : MFMARegs) {
+              // Verify register class compatibility
+              const TargetRegisterClass *MFMARC = MRI->getRegClass(MFMAReg);
+              if (!TRI->hasVGPRs(MFMARC))


Maybe don't add them to MFMARegs to begin with? Or are you planning to do something with non-VGPR regs soon?

Fixed it. Please check.

rovka · 2025-10-06T10:30:10Z

llvm/lib/Target/AMDGPU/GCNPreRAOptimizations.cpp

+
+              // Check if MFMA register is dead at current instruction
+              const LiveInterval &MFMAInterval = LIS->getInterval(MFMAReg);
+              if (!MFMAInterval.liveAt(CurrentSlot)) {


Why are you storing the SlotIndex for all the MFMAs, if you're only interested in the CurrentSlot?

Fixed it. Please check.

rebased

…e PR (cherry picked from commit 7732ae8ae1080ab030db1939141350abc7aa265d)

(cherry picked from commit ee6d876fcc3d84d6ea3a68b3eee1ce97e714b6e6)

llvmbot added the backend:AMDGPU label Sep 4, 2025

bcahoon requested review from arsenm, srpande and jrbyrnes and removed request for srpande September 4, 2025 20:11

arsenm reviewed Sep 5, 2025

View reviewed changes

mssefat force-pushed the reg-hint-avoid-hazard branch from ee1ade0 to 0b0597a Compare September 23, 2025 00:04

mssefat force-pushed the reg-hint-avoid-hazard branch from 0b0597a to 5122167 Compare September 23, 2025 00:24

srpande reviewed Sep 23, 2025

View reviewed changes

mssefat changed the title ~~[AMDGPU] Improve register allocation to reduce MFMA hazard NOPs~~ [AMDGPU] Register allocation anti-hints to reduce MFMA hazard NOPs Sep 23, 2025

arsenm reviewed Sep 24, 2025

View reviewed changes

rovka reviewed Sep 24, 2025

View reviewed changes

srpande requested a review from rovka September 24, 2025 17:22

mssefat force-pushed the reg-hint-avoid-hazard branch from fc6b1f6 to 6963db4 Compare September 26, 2025 17:36

rovka reviewed Oct 6, 2025

View reviewed changes

mssefat added 11 commits October 7, 2025 18:47

[AMDGPU] Improve register allocation to reduce MFMA hazard NOPs

7ef2da8

rebased

Rebase

7d3f015

rebase test files

f8df624

Resotred SIRegisterInfo files

2d8ef8b

Resotred SIMachineFunctionInfo files

f0f214b

Updated sources to support anti-hint mechanism

9b0ca6f

Made anti-hints map conditional in MIRYamlMapping

e34e16b

Updated tests

887b556

Updated mir test

19d7119

Renamed test file

cd0f88d

Added print and parse tests

7a8d17e

mssefat added 9 commits October 7, 2025 18:52

Fixed typo

3a84b9b

Fixed typo

b893331

Fixed test!

f04eb48

Fixed test

e82a0e2

Fixed typo

a9b05ec

[AMDGPU] Anti-hints in register allocation

c7a4140

Modified flag name to reflect anti-hints

893d522

[NFC] Restore and remove to move MIR serialization changes to separat…

9fe1c13

…e PR (cherry picked from commit 7732ae8ae1080ab030db1939141350abc7aa265d)

Named operand and stable partition applied

6d8e044

(cherry picked from commit ee6d876fcc3d84d6ea3a68b3eee1ce97e714b6e6)

mssefat force-pushed the reg-hint-avoid-hazard branch from 83d34f8 to 6d8e044 Compare October 8, 2025 00:59

	void setRegAllocationAntiHint(Register VReg, Register AntiHintVReg) {
	void addRegAllocAntiHint(Register VReg, Register AntiHintVReg) {

	SmallVector<MCPhysReg, 16> FilteredOrderStorage;
	// Used as storage if the Order received in the constructor needs to be altered.
	SmallVector<MCPhysReg, 16> FilteredOrderStorage;

[AMDGPU] Register allocation anti-hints to reduce MFMA hazard NOPs #156943

Are you sure you want to change the base?

[AMDGPU] Register allocation anti-hints to reduce MFMA hazard NOPs #156943

Uh oh!

Conversation

mssefat commented Sep 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Sep 4, 2025

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

github-actions bot commented Sep 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

srpande left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mssefat commented Sep 23, 2025

Uh oh!

arsenm left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

mssefat commented Sep 4, 2025 •

edited

Loading

github-actions bot commented Sep 23, 2025 •

edited

Loading

srpande left a comment •

edited

Loading