Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AMDGPU] Prefer lower total register usage in regions with spilling #71882

Closed
wants to merge 1 commit into from

Conversation

jrbyrnes
Copy link
Contributor

In general, we say RP is less if it:

  1. has better occupancy
  2. has same occupancy, but better tuple register pressure
  3. has same occupancy, and same tuple register pressure, but better register pressure.

This is good for the general case, but is not good for the case when there is spilling. In cases with spilling, we should prefer the schedule with less RP rather than giving preference to tuple RP. Otherwise, we may increase RP by hundreds in order to save a few tuples.

A note on the test:
For the OccInitialSchedule stage we have
PressureAfter: VGPRs: 67 AGPRs: 0, SGPRs: 8, LVGPR WT: 0, LSGPR WT: 8
PressureBefore: VGPRs: 72 AGPRs: 0, SGPRs: 4, LVGPR WT: 0, LSGPR WT: 4

We have improved VGPR RP at expense of SGPR Tuple pressure. The scheduler decided to define and use the second sgpr_128 before killing the first. We should not prefer the original order.

@llvmbot
Copy link
Collaborator

llvmbot commented Nov 10, 2023

@llvm/pr-subscribers-backend-amdgpu

Author: Jeffrey Byrnes (jrbyrnes)

Changes

In general, we say RP is less if it:

  1. has better occupancy
  2. has same occupancy, but better tuple register pressure
  3. has same occupancy, and same tuple register pressure, but better register pressure.

This is good for the general case, but is not good for the case when there is spilling. In cases with spilling, we should prefer the schedule with less RP rather than giving preference to tuple RP. Otherwise, we may increase RP by hundreds in order to save a few tuples.

A note on the test:
For the OccInitialSchedule stage we have
PressureAfter: VGPRs: 67 AGPRs: 0, SGPRs: 8, LVGPR WT: 0, LSGPR WT: 8
PressureBefore: VGPRs: 72 AGPRs: 0, SGPRs: 4, LVGPR WT: 0, LSGPR WT: 4

We have improved VGPR RP at expense of SGPR Tuple pressure. The scheduler decided to define and use the second sgpr_128 before killing the first. We should not prefer the original order.


Patch is 20.40 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/71882.diff

5 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/GCNIterativeScheduler.cpp (+8-6)
  • (modified) llvm/lib/Target/AMDGPU/GCNRegPressure.cpp (+20-2)
  • (modified) llvm/lib/Target/AMDGPU/GCNRegPressure.h (+3-2)
  • (modified) llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp (+1-2)
  • (added) llvm/test/CodeGen/AMDGPU/spill-regpressure-less.mir (+353)
diff --git a/llvm/lib/Target/AMDGPU/GCNIterativeScheduler.cpp b/llvm/lib/Target/AMDGPU/GCNIterativeScheduler.cpp
index d89c9b1febded0f..62a03eec141671e 100644
--- a/llvm/lib/Target/AMDGPU/GCNIterativeScheduler.cpp
+++ b/llvm/lib/Target/AMDGPU/GCNIterativeScheduler.cpp
@@ -410,9 +410,11 @@ void GCNIterativeScheduler::scheduleRegion(Region &R, Range &&Schedule,
 // Sort recorded regions by pressure - highest at the front
 void GCNIterativeScheduler::sortRegionsByPressure(unsigned TargetOcc) {
   const auto &ST = MF.getSubtarget<GCNSubtarget>();
-  llvm::sort(Regions, [&ST, TargetOcc](const Region *R1, const Region *R2) {
-    return R2->MaxPressure.less(ST, R1->MaxPressure, TargetOcc);
-  });
+  const SIMachineFunctionInfo *MFI = MF.getInfo<SIMachineFunctionInfo>();
+  llvm::sort(
+      Regions, [&ST, TargetOcc, &MFI](const Region *R1, const Region *R2) {
+        return R2->MaxPressure.less(ST, *MFI, R1->MaxPressure, TargetOcc);
+      });
 }
 
 ///////////////////////////////////////////////////////////////////////////////
@@ -524,19 +526,19 @@ void GCNIterativeScheduler::scheduleMinReg(bool force) {
 
   auto MaxPressure = Regions.front()->MaxPressure;
   for (auto *R : Regions) {
-    if (!force && R->MaxPressure.less(ST, MaxPressure, TgtOcc))
+    if (!force && R->MaxPressure.less(ST, *MFI, MaxPressure, TgtOcc))
       break;
 
     BuildDAG DAG(*R, *this);
     const auto MinSchedule = makeMinRegSchedule(DAG.getTopRoots(), *this);
 
     const auto RP = getSchedulePressure(*R, MinSchedule);
-    LLVM_DEBUG(if (R->MaxPressure.less(ST, RP, TgtOcc)) {
+    LLVM_DEBUG(if (R->MaxPressure.less(ST, *MFI, RP, TgtOcc)) {
       dbgs() << "\nWarning: Pressure becomes worse after minreg!";
       printSchedRP(dbgs(), R->MaxPressure, RP);
     });
 
-    if (!force && MaxPressure.less(ST, RP, TgtOcc))
+    if (!force && MaxPressure.less(ST, *MFI, RP, TgtOcc))
       break;
 
     scheduleRegion(*R, MinSchedule, RP);
diff --git a/llvm/lib/Target/AMDGPU/GCNRegPressure.cpp b/llvm/lib/Target/AMDGPU/GCNRegPressure.cpp
index a04c470b7b9762f..9f6ccd3974a3842 100644
--- a/llvm/lib/Target/AMDGPU/GCNRegPressure.cpp
+++ b/llvm/lib/Target/AMDGPU/GCNRegPressure.cpp
@@ -13,6 +13,7 @@
 
 #include "GCNRegPressure.h"
 #include "AMDGPU.h"
+#include "SIMachineFunctionInfo.h"
 #include "llvm/CodeGen/RegisterPressure.h"
 
 using namespace llvm;
@@ -89,7 +90,8 @@ void GCNRegPressure::inc(unsigned Reg,
 }
 
 bool GCNRegPressure::less(const GCNSubtarget &ST,
-                          const GCNRegPressure& O,
+                          const SIMachineFunctionInfo &MFI,
+                          const GCNRegPressure &O,
                           unsigned MaxOccupancy) const {
   const auto SGPROcc = std::min(MaxOccupancy,
                                 ST.getOccupancyWithNumSGPRs(getSGPRNum()));
@@ -104,6 +106,8 @@ bool GCNRegPressure::less(const GCNSubtarget &ST,
 
   const auto Occ = std::min(SGPROcc, VGPROcc);
   const auto OtherOcc = std::min(OtherSGPROcc, OtherVGPROcc);
+
+  // Give first prefernce to the better occupancy
   if (Occ != OtherOcc)
     return Occ > OtherOcc;
 
@@ -115,7 +119,19 @@ bool GCNRegPressure::less(const GCNSubtarget &ST,
     SGPRImportant = false;
   }
 
-  // compare large regs pressure
+  // In regions with spilling, we should give prefernce to the schedule with
+  // less general RP.
+  if (Occ <= MFI.getMinWavesPerEU()) {
+    unsigned GPRPressure =
+        SGPRImportant ? getSGPRNum() : getVGPRNum(ST.hasGFX90AInsts());
+    unsigned OtherGPRPressure =
+        SGPRImportant ? O.getSGPRNum() : O.getVGPRNum(ST.hasGFX90AInsts());
+
+    if (GPRPressure != OtherGPRPressure)
+      return GPRPressure < OtherGPRPressure;
+  }
+
+  // Give second prefernce to less register tuple pressure
   bool SGPRFirst = SGPRImportant;
   for (int I = 2; I > 0; --I, SGPRFirst = !SGPRFirst) {
     if (SGPRFirst) {
@@ -130,6 +146,8 @@ bool GCNRegPressure::less(const GCNSubtarget &ST,
         return VW < OtherVW;
     }
   }
+
+  // Give final prefernce to less general RP
   return SGPRImportant ? (getSGPRNum() < O.getSGPRNum()):
                          (getVGPRNum(ST.hasGFX90AInsts()) <
                           O.getVGPRNum(ST.hasGFX90AInsts()));
diff --git a/llvm/lib/Target/AMDGPU/GCNRegPressure.h b/llvm/lib/Target/AMDGPU/GCNRegPressure.h
index c750fe74749e2b3..2b428c44e46bb00 100644
--- a/llvm/lib/Target/AMDGPU/GCNRegPressure.h
+++ b/llvm/lib/Target/AMDGPU/GCNRegPressure.h
@@ -74,8 +74,9 @@ struct GCNRegPressure {
     return getOccupancy(ST) > O.getOccupancy(ST);
   }
 
-  bool less(const GCNSubtarget &ST, const GCNRegPressure& O,
-    unsigned MaxOccupancy = std::numeric_limits<unsigned>::max()) const;
+  bool less(const GCNSubtarget &ST, const SIMachineFunctionInfo &MFI,
+            const GCNRegPressure &O,
+            unsigned MaxOccupancy = std::numeric_limits<unsigned>::max()) const;
 
   bool operator==(const GCNRegPressure &O) const {
     return std::equal(&Value[0], &Value[TOTAL_KINDS], O.Value);
diff --git a/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp b/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
index c3d60b635d3240a..975ba050e6be01f 100644
--- a/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
+++ b/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
@@ -1184,8 +1184,7 @@ bool ILPInitialScheduleStage::shouldRevertScheduling(unsigned WavesAfter) {
 
 bool GCNSchedStage::mayCauseSpilling(unsigned WavesAfter) {
   if (WavesAfter <= MFI.getMinWavesPerEU() &&
-      !PressureAfter.less(ST, PressureBefore) &&
-      isRegionWithExcessRP()) {
+      !PressureAfter.less(ST, MFI, PressureBefore) && isRegionWithExcessRP()) {
     LLVM_DEBUG(dbgs() << "New pressure will result in more spilling.\n");
     return true;
   }
diff --git a/llvm/test/CodeGen/AMDGPU/spill-regpressure-less.mir b/llvm/test/CodeGen/AMDGPU/spill-regpressure-less.mir
new file mode 100644
index 000000000000000..567895c5fb434b2
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/spill-regpressure-less.mir
@@ -0,0 +1,353 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 3
+# RUN: llc -march=amdgcn -mcpu=gfx90a -start-before=machine-scheduler -stop-after=machine-scheduler -verify-machineinstrs -o - %s | FileCheck -check-prefix=GCN %s
+
+--- |
+  define amdgpu_kernel void @spill_regpressure_less() #0 {
+    ret void
+  }
+
+  attributes #0 = { "amdgpu-waves-per-eu"="8,8" }
+...
+
+---
+name:            spill_regpressure_less
+tracksRegLiveness: true
+machineFunctionInfo:
+  stackPtrOffsetReg: '$sgpr32'
+  occupancy:       8
+body:             |
+  bb.0:
+    ; GCN-LABEL: name: spill_regpressure_less
+    ; GCN: [[DEF:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[DEF1:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[DEF2:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[DEF3:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[DEF4:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[DEF5:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[DEF6:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[DEF7:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[DEF8:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[DEF9:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[DEF10:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[DEF11:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[DEF12:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[DEF13:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[DEF14:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[DEF15:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[DEF16:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[DEF17:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[DEF18:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[DEF19:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[DEF20:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[DEF21:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[DEF22:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[DEF23:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[DEF24:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[DEF25:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[DEF26:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[DEF27:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[DEF28:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[DEF29:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[DEF30:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[DEF31:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[DEF32:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[DEF33:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[DEF34:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[DEF35:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[DEF36:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[DEF37:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[DEF38:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[DEF39:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[DEF40:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[DEF41:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[DEF42:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[DEF43:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[DEF44:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[DEF45:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[DEF46:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[DEF47:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[DEF48:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[DEF49:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[DEF50:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[DEF51:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[DEF52:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[DEF53:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[DEF54:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[DEF55:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[DEF56:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[DEF57:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[DEF58:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[DEF59:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[DEF60:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[DEF61:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[DEF62:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[DEF63:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[DEF64:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[DEF65:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[DEF66:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: INLINEASM &"", 1 /* sideeffect attdialect */, implicit [[DEF]], implicit [[DEF1]], implicit [[DEF2]], implicit [[DEF3]], implicit [[DEF4]], implicit [[DEF5]], implicit [[DEF6]], implicit [[DEF7]], implicit [[DEF8]], implicit [[DEF9]], implicit [[DEF10]], implicit [[DEF11]], implicit [[DEF12]], implicit [[DEF13]], implicit [[DEF14]], implicit [[DEF15]], implicit [[DEF16]], implicit [[DEF17]], implicit [[DEF18]], implicit [[DEF19]], implicit [[DEF20]], implicit [[DEF21]], implicit [[DEF22]], implicit [[DEF23]], implicit [[DEF24]], implicit [[DEF25]], implicit [[DEF26]], implicit [[DEF27]], implicit [[DEF28]], implicit [[DEF29]], implicit [[DEF30]], implicit [[DEF31]], implicit [[DEF32]], implicit [[DEF33]], implicit [[DEF34]], implicit [[DEF35]], implicit [[DEF36]], implicit [[DEF37]], implicit [[DEF38]], implicit [[DEF39]], implicit [[DEF40]], implicit [[DEF41]], implicit [[DEF42]], implicit [[DEF43]], implicit [[DEF44]], implicit [[DEF45]], implicit [[DEF46]], implicit [[DEF47]], implicit [[DEF48]], implicit [[DEF49]], implicit [[DEF50]], implicit [[DEF51]], implicit [[DEF52]], implicit [[DEF53]], implicit [[DEF54]], implicit [[DEF55]], implicit [[DEF56]], implicit [[DEF57]], implicit [[DEF58]], implicit [[DEF59]], implicit [[DEF60]], implicit [[DEF61]], implicit [[DEF62]], implicit [[DEF63]], implicit [[DEF64]], implicit [[DEF65]], implicit [[DEF66]]
+    ; GCN-NEXT: KILL [[DEF]]
+    ; GCN-NEXT: KILL [[DEF1]]
+    ; GCN-NEXT: KILL [[DEF10]]
+    ; GCN-NEXT: KILL [[DEF12]]
+    ; GCN-NEXT: KILL [[DEF13]]
+    ; GCN-NEXT: KILL [[DEF14]]
+    ; GCN-NEXT: KILL [[DEF15]]
+    ; GCN-NEXT: KILL [[DEF16]]
+    ; GCN-NEXT: [[DEF67:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: KILL [[DEF17]]
+    ; GCN-NEXT: [[DEF68:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[DEF69:%[0-9]+]]:sgpr_128 = IMPLICIT_DEF
+    ; GCN-NEXT: INLINEASM &"", 1 /* sideeffect attdialect */, implicit [[DEF69]], implicit [[DEF23]], implicit [[DEF24]], implicit [[DEF25]], implicit [[DEF26]], implicit [[DEF27]], implicit [[DEF28]]
+    ; GCN-NEXT: KILL [[DEF2]]
+    ; GCN-NEXT: KILL [[DEF3]]
+    ; GCN-NEXT: KILL [[DEF4]]
+    ; GCN-NEXT: KILL [[DEF5]]
+    ; GCN-NEXT: KILL [[DEF6]]
+    ; GCN-NEXT: KILL [[DEF7]]
+    ; GCN-NEXT: KILL [[DEF8]]
+    ; GCN-NEXT: KILL [[DEF9]]
+    ; GCN-NEXT: KILL [[DEF18]]
+    ; GCN-NEXT: KILL [[DEF19]]
+    ; GCN-NEXT: [[DEF70:%[0-9]+]]:sgpr_128 = IMPLICIT_DEF
+    ; GCN-NEXT: INLINEASM &"", 1 /* sideeffect attdialect */, implicit [[DEF70]], implicit [[DEF2]], implicit [[DEF3]], implicit [[DEF4]], implicit [[DEF5]], implicit [[DEF6]], implicit [[DEF7]], implicit [[DEF8]], implicit [[DEF9]]
+    ; GCN-NEXT: KILL [[DEF69]], implicit-def %70, implicit-def %71, implicit-def %72, implicit-def %73, implicit-def %74, implicit-def %75, implicit-def %76, implicit-def %77
+    ; GCN-NEXT: [[DEF71:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: [[DEF72:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: KILL [[DEF20]]
+    ; GCN-NEXT: [[DEF73:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: KILL [[DEF11]]
+    ; GCN-NEXT: [[DEF74:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: KILL [[DEF21]]
+    ; GCN-NEXT: [[DEF75:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: KILL [[DEF22]]
+    ; GCN-NEXT: [[DEF76:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; GCN-NEXT: KILL [[DEF23]]
+    ; GCN-NEXT: KILL [[DEF24]]
+    ; GCN-NEXT: KILL [[DEF25]]
+    ; GCN-NEXT: KILL [[DEF26]]
+    ; GCN-NEXT: KILL [[DEF27]]
+    ; GCN-NEXT: KILL [[DEF28]]
+    ; GCN-NEXT: KILL [[DEF29]]
+    ; GCN-NEXT: KILL [[DEF30]]
+    ; GCN-NEXT: KILL [[DEF31]]
+    ; GCN-NEXT: KILL [[DEF32]]
+    ; GCN-NEXT: KILL [[DEF33]]
+    ; GCN-NEXT: KILL [[DEF34]]
+    ; GCN-NEXT: KILL [[DEF35]]
+    ; GCN-NEXT: KILL [[DEF36]]
+    ; GCN-NEXT: KILL [[DEF37]]
+    ; GCN-NEXT: KILL [[DEF38]]
+    ; GCN-NEXT: KILL [[DEF39]]
+    ; GCN-NEXT: KILL [[DEF40]]
+    ; GCN-NEXT: KILL [[DEF41]]
+    ; GCN-NEXT: KILL [[DEF42]]
+    ; GCN-NEXT: KILL [[DEF43]]
+    ; GCN-NEXT: KILL [[DEF44]]
+    ; GCN-NEXT: KILL [[DEF45]]
+    ; GCN-NEXT: KILL [[DEF46]]
+    ; GCN-NEXT: KILL [[DEF47]]
+    ; GCN-NEXT: KILL [[DEF48]]
+    ; GCN-NEXT: KILL [[DEF49]]
+    ; GCN-NEXT: KILL [[DEF50]]
+    ; GCN-NEXT: KILL [[DEF51]]
+    ; GCN-NEXT: KILL [[DEF52]]
+    ; GCN-NEXT: KILL [[DEF53]]
+    ; GCN-NEXT: KILL [[DEF54]]
+    ; GCN-NEXT: KILL [[DEF55]]
+    ; GCN-NEXT: KILL [[DEF56]]
+    ; GCN-NEXT: KILL [[DEF57]]
+    ; GCN-NEXT: KILL [[DEF58]]
+    ; GCN-NEXT: KILL [[DEF59]]
+    ; GCN-NEXT: KILL [[DEF60]]
+    ; GCN-NEXT: KILL [[DEF61]]
+    ; GCN-NEXT: KILL [[DEF62]]
+    ; GCN-NEXT: KILL [[DEF63]]
+    ; GCN-NEXT: KILL [[DEF64]]
+    ; GCN-NEXT: KILL [[DEF65]]
+    ; GCN-NEXT: KILL [[DEF66]]
+    ; GCN-NEXT: KILL [[DEF67]]
+    ; GCN-NEXT: KILL [[DEF68]]
+    ; GCN-NEXT: KILL [[DEF71]]
+    ; GCN-NEXT: KILL [[DEF72]]
+    ; GCN-NEXT: KILL [[DEF73]]
+    ; GCN-NEXT: KILL [[DEF74]]
+    ; GCN-NEXT: KILL [[DEF75]]
+    ; GCN-NEXT: KILL [[DEF76]]
+    ; GCN-NEXT: KILL [[DEF70]]
+    ; GCN-NEXT: KILL %70
+    ; GCN-NEXT: KILL %71
+    ; GCN-NEXT: KILL %72
+    ; GCN-NEXT: KILL %73
+    ; GCN-NEXT: KILL %74
+    ; GCN-NEXT: KILL %75
+    ; GCN-NEXT: KILL %76
+    ; GCN-NEXT: KILL %77
+    %0:vgpr_32 = IMPLICIT_DEF
+    %1:vgpr_32 = IMPLICIT_DEF
+    %2:vgpr_32 = IMPLICIT_DEF
+    %3:vgpr_32 = IMPLICIT_DEF
+    %4:vgpr_32 = IMPLICIT_DEF
+    %5:vgpr_32 = IMPLICIT_DEF
+    %6:vgpr_32 = IMPLICIT_DEF
+    %7:vgpr_32 = IMPLICIT_DEF
+    %8:vgpr_32 = IMPLICIT_DEF
+    %9:vgpr_32 = IMPLICIT_DEF
+    %10:vgpr_32 = IMPLICIT_DEF
+    %11:vgpr_32 = IMPLICIT_DEF
+    %12:vgpr_32 = IMPLICIT_DEF
+    %13:vgpr_32 = IMPLICIT_DEF
+    %14:vgpr_32 = IMPLICIT_DEF
+    %15:vgpr_32 = IMPLICIT_DEF
+    %16:vgpr_32 = IMPLICIT_DEF
+    %17:vgpr_32 = IMPLICIT_DEF
+    %18:vgpr_32 = IMPLICIT_DEF
+    %19:vgpr_32 = IMPLICIT_DEF
+    %20:vgpr_32 = IMPLICIT_DEF
+    %21:vgpr_32 = IMPLICIT_DEF
+    %22:vgpr_32 = IMPLICIT_DEF
+    %23:vgpr_32 = IMPLICIT_DEF
+    %24:vgpr_32 = IMPLICIT_DEF
+    %25:vgpr_32 = IMPLICIT_DEF
+    %26:vgpr_32 = IMPLICIT_DEF
+    %27:vgpr_32 = IMPLICIT_DEF
+    %28:vgpr_32 = IMPLICIT_DEF
+    %29:vgpr_32 = IMPLICIT_DEF
+    %30:vgpr_32 = IMPLICIT_DEF
+    %31:vgpr_32 = IMPLICIT_DEF
+    %32:vgpr_32 = IMPLICIT_DEF
+    %33:vgpr_32 = IMPLICIT_DEF
+    %34:vgpr_32 = IMPLICIT_DEF
+    %35:vgpr_32 = IMPLICIT_DEF
+    %36:vgpr_32 = IMPLICIT_DEF
+    %37:vgpr_32 = IMPLICIT_DEF
+    %38:vgpr_32 = IMPLICIT_DEF
+    %39:vgpr_32 = IMPLICIT_DEF
+    %40:vgpr_32 = IMPLICIT_DEF
+    %41:vgpr_32 = IMPLICIT_DEF
+    %42:vgpr_32 = IMPLICIT_DEF
+    %43:vgpr_32 = IMPLICIT_DEF
+    %44:vgpr_32 = IMPLICIT_DEF
+    %45:vgpr_32 = IMPLICIT_DEF
+    %46:vgpr_32 = IMPLICIT_DEF
+    %47:vgpr_32 = IMPLICIT_DEF
+    %48:vgpr_32 = IMPLICIT_DEF
+    %49:vgpr_32 = IMPLICIT_DEF
+    %50:vgpr_32 = IMPLICIT_DEF
+    %51:vgpr_32 = IMPLICIT_DEF
+    %52:vgpr_32 = IMPLICIT_DEF
+    %53:vgpr_32 = IMPLICIT_DEF
+    %54:vgpr_32 = IMPLICIT_DEF
+    %55:vgpr_32 = IMPLICIT_DEF
+    %56:vgpr_32 = IMPLICIT_DEF
+    %57:vgpr_32 = IMPLICIT_DEF
+    %58:vgpr_32 = IMPLICIT_DEF
+    %59:vgpr_32 = IMPLICIT_DEF
+    %60:vgpr_32 = IMPLICIT_DEF
+    %61:vgpr_32 = IMPLICIT_DEF
+    %62:vgpr_32 = IMPLICIT_DEF
+    %63:vgpr_32 = IMPLICIT_DEF
+    %64:vgpr_32 = IMPLICIT_DEF
+    %65:vgpr_32 = IMPLICIT_DEF
+    %66:vgpr_32 = IMPLICIT_DEF
+    %67:vgpr_32 = IMPLICIT_DEF
+    %68:vgpr_32 = IMPLICIT_DEF
+    INLINEASM &"", 1, implicit %0, implicit %1, implicit %2, implicit %3, implicit %4, implicit %5, implicit %6, implicit %7, implicit %8, implicit %9, implicit %10, implicit %11, implicit %12, implicit %13, implicit %14, implicit %15, implicit %16, implicit %17, implicit %18, implicit %19, implicit %20, implicit %21, implicit %22, implicit %23, implicit %24, implicit %25, implicit %26, implicit %27, implicit %28, implicit %29, implicit %30, implicit %31, implicit %32, implicit %33, implicit %34, implicit %35, implicit %36, implicit %37, implicit %38, implicit %39, implicit %40, implicit %41, implicit %42, implicit %43, implicit %44, implicit %45, implicit %46, implicit %47, implicit %48, implicit %49, implicit %50, implicit %51, implicit %52, implicit %53, implicit %54, implicit %55, implicit %56, implicit %57, implicit %58, implicit %59, implicit %60, implicit %61, implicit %62, implicit %63, implicit %64, implicit %65, implicit %66
+    %69:sgpr_128 = IMPLICIT_DEF
+    INLINEASM &"", 1, implicit %69, implicit %23, implicit %24, implicit %25, implicit %26, implicit %27, implicit %28
+    KILL %0
+    KILL %1
+    KILL %2
+    KILL %3
+    KILL %4
+    KILL %5
+    KILL %6
+    KILL %7
+    KILL %8
+    KILL %9
+    KILL %10
+    KILL %12
+    KILL %13
+    KILL %14
+    KILL %15
+    KILL %16
+    KILL %17
+    KILL %18
+    KILL %19
+    KILL %69:sgpr_128, implicit-def %77:vgpr_32, implicit-def %78:vgpr_32, implicit-def %79:vgpr_32, implicit-def %80:vgpr_32, implicit-def %81:vgpr_32, implicit-def %82:vgpr_32, implicit-def %83:vgpr_32, implicit-def %84:vgpr_32
+    %70:vgpr_32 = IMPLICIT_DEF
+    %71:vgpr_32 = IMPLICIT_DEF
+    %72:vgpr_32 = IMPLICIT_DEF
+    %73:vgpr_32 = IMPLICIT_DEF
+    %74:vgpr_32 = IMPLICIT_DEF
+    %75:vgpr_32 = IMPLICIT_DEF
+    %76:sgpr_128 = IMPLICIT_DEF
+    INLINEASM &"", 1, implicit %76, implicit %2, implicit %3, implicit %4, implicit %5, implicit %6, implicit %7, implicit %8, implicit %9
+    KILL %20
+    KILL %11
+    KILL %21
+    KILL %22
+    KILL %23
+    KILL %24
+    KILL %25
+    KILL %26
+    KILL %27
+    KILL %28
+    KILL %29
+    KILL %30
+    KILL %31
+    KILL %32
+    KILL %33
+    KILL %34
+    KILL %35
+    KILL %36
+    KILL %37
+    KILL %38
+    KILL %39
+    KILL %40
+    KILL %41
+    KILL %42
+    KILL %43
+    KILL %44
+    KILL %45
+    KILL %46
+    KILL %47
+    KILL %48
+    KILL %49
+    KILL %50
+    KILL %51
+    KILL %52
+    KILL %53
+    KILL %54
+    KILL %55
+    KILL %56
+    KILL %57
+    KILL %58
+    KILL %59
+    KILL %60
+    KILL %61
+    KILL %62
+    KIL...
[truncated]

llvm/lib/Target/AMDGPU/GCNRegPressure.cpp Outdated Show resolved Hide resolved
llvm/lib/Target/AMDGPU/GCNRegPressure.cpp Outdated Show resolved Hide resolved
llvm/lib/Target/AMDGPU/GCNRegPressure.cpp Outdated Show resolved Hide resolved
llvm/test/CodeGen/AMDGPU/spill-regpressure-less.mir Outdated Show resolved Hide resolved
llvm/lib/Target/AMDGPU/GCNRegPressure.cpp Outdated Show resolved Hide resolved
llvm/lib/Target/AMDGPU/GCNRegPressure.cpp Outdated Show resolved Hide resolved
llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp Outdated Show resolved Hide resolved
Copy link

github-actions bot commented Nov 15, 2023

✅ With the latest revision this PR passed the C/C++ code formatter.

@jrbyrnes
Copy link
Contributor Author

This work now builds on top of another patch "Improve ExcessRP flagging". That patch alone has functional changes to the scheduler so I will begin to work on putting out a PR for that work, unless there is reason to not do so.

@arsenm
Copy link
Contributor

arsenm commented Nov 29, 2023

This work now builds on top of another patch "Improve ExcessRP flagging". That patch alone has functional changes to the scheduler so I will begin to work on putting out a PR for that work, unless there is reason to not do so.

That's what you're supposed to do

@jrbyrnes
Copy link
Contributor Author

Directly calculate excess pressure in GCNPressure.less() to remove patch dependence and decouple.

We should still separate the RegionsWithExcessRP flag in scheduler as it reduces compile time and improves scheduling in regions with spilling, but decoupling is preferred.

llvm/lib/Target/AMDGPU/GCNRegPressure.cpp Outdated Show resolved Hide resolved
llvm/lib/Target/AMDGPU/GCNRegPressure.cpp Outdated Show resolved Hide resolved
bool ExcessVGPR = getVGPRNum(false) > MaxVGPRs || getAGPRNum() > MaxVGPRs;
bool ExcessSGPR = getSGPRNum() > MaxSGPRs;
bool OtherExcessVGPR =
O.getVGPRNum(false) > MaxVGPRs || O.getAGPRNum() > MaxVGPRs;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should getVGPRNum has ST.hasGFX90AInsts() here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is comparing getAGPRNum to MaxVGPRs correct? They'd be different for gfx90a

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing this out --

These conditions were reformulated from existing code in checkScheduling

  unsigned MaxVGPRs = ST.getMaxNumVGPRs(MF);
  unsigned MaxSGPRs = ST.getMaxNumSGPRs(MF);

  if (PressureAfter.getVGPRNum(false) > MaxVGPRs ||
      PressureAfter.getAGPRNum() > MaxVGPRs ||
      PressureAfter.getSGPRNum() > MaxSGPRs) {
    DAG.RescheduleRegions[RegionIdx] = true;
    DAG.RegionsWithHighRP[RegionIdx] = true;
    DAG.RegionsWithExcessRP[RegionIdx] = true;
  }
 

This appears to be wrong in unified register file case. A scheduling region with pressure - 276 VGPR, 0 AGPR is not flagged as having excess with min-waves-per-eu of 1 even though the ISA states "A wave may have up to 512 total VGPRs, 256 of each type". As it stands, it will allow up to 512 VGPR before flagging. I'll address this in a separate patch.

@jrbyrnes
Copy link
Contributor Author

jrbyrnes commented Dec 1, 2023

Weighted cost + addressable register limits

llvm/lib/Target/AMDGPU/GCNRegPressure.cpp Outdated Show resolved Hide resolved
llvm/lib/Target/AMDGPU/GCNRegPressure.cpp Outdated Show resolved Hide resolved
llvm/lib/Target/AMDGPU/GCNRegPressure.cpp Outdated Show resolved Hide resolved
llvm/lib/Target/AMDGPU/GCNRegPressure.cpp Outdated Show resolved Hide resolved
llvm/lib/Target/AMDGPU/GCNRegPressure.cpp Outdated Show resolved Hide resolved
@dmikushin
Copy link

@jrbyrnes , is it possible to make the patch to choose the default value of ExcessVGPRWeight such that the original behavior of unpatched code could be preserved? It should initially minimize unexpected effects in the high-level code and the chance of this code being blamed. Then, you could tune the parameter in a separate update.

@jrbyrnes
Copy link
Contributor Author

jrbyrnes commented Dec 6, 2023

is it possible to make the patch to choose the default value of ExcessVGPRWeight such that the original behavior of unpatched code could be preserved?

Unfortunately, it is not possible in the current formulation. The new weighted-cost logic is inconsistent with the unpatched tiered-comparisons logic; mainly because the weight-cost formula does not include all of the terms (e.g. tuple pressure).

@dmikushin
Copy link

@jrbyrnes , since SGPR spills to VGPR, the value of 32 makes sense, as well as 64 for wave64. But why the max value of ExcessVGPRWeight is 2048?

@jrbyrnes
Copy link
Contributor Author

jrbyrnes commented Dec 14, 2023

Updated review to reflect some ideas contained in offline discussion.

The primary goal of the weighted cost calculation is to determine which RP results in less excess VGPR since these will need to be spilled to memory. Since SGPR can be spilled into VGPR, we can reformulate the excess SGPR in terms of VGPR pressure as so: 1 VGPR = WavefrontSize Excess SGPR.

There are some subtlties to consider, however. 1. [1:Wavefrontsize] Excess SGPR are all the same in terms VGPRs reguired (1) -- in other words 1 ExcessSGPR uses the same number of VGPR as 31 ExcessSGPR, 2. N excess SGPR is worse than N - 1 excess SGPR even if they require the same number of VGPRs due to spill code inserted (write/readlane), 3. (2 Excess VGPR, 0 Excess SGPR) is preferable to (1 Excess VGPR, 1 Excess SGPR) since the latter needs to insert more spill code.

Subtlties 2 and 3 can also be thought of as one RP being less than the other.

We should have the following behavior (ExcessVGPR, ExcessSGPR) (WavefrontSize is 64):

Case Pressure 1 Pressure 2 Winner
0 (2,0) (1,1) Pressure 1
1 (1,1) (1,63) Pressure 1
2 (2,0) (1,63) Pressure 1
3 (2,0) (1,64) Pressure 1
4 (3,0) (1,64) Pressure 2
5 (3,0) (1,128) Pressure 1
6 (3,0) (0,1280)* Pressure 2

*VGPR usage is not excess even with VGPR spills from SGPRs.

It is a bit hard to write lit tests for all of these conditions, but I have unit tested the logic and validated the above cases.

Also,Removed the flag to set the relative weighting of excess SGPR and VGPR since the current formulation only really makes sense if using the WavefrontSize.

@jrbyrnes
Copy link
Contributor Author

@jrbyrnes , since SGPR spills to VGPR, the value of 32 makes sense, as well as 64 for wave64. But why the max value of ExcessVGPRWeight is 2048?

I set that to the upper limit to avoid overflow concerns, and I couldn't imagine any reason why we would want to set it any higher than that. But, in the latest iteration, I have removed the option altogether.

@vpykhtin
Copy link
Contributor

vpykhtin commented Dec 18, 2023

It is a bit hard to write lit tests for all of these conditions, but I have unit tested the logic and validated the above cases.
Can you please add the unit test to this PR?

This is mostly LGTM. I would like this to be reviewed by other engineers, especially the code that calculates excess of register, I don't really know the details about Arch and AGPR registers.

llvm/lib/Target/AMDGPU/GCNRegPressure.cpp Outdated Show resolved Hide resolved
llvm/lib/Target/AMDGPU/GCNRegPressure.cpp Outdated Show resolved Hide resolved
@jrbyrnes
Copy link
Contributor Author

jrbyrnes commented Jan 2, 2024

Hi @arsenm , does the latest iteration of this PR resolve your change request? Also, do you have a comment on the excess RP calculation for the unified register file case (see Valery's latest comment). Thanks.

@arsenm
Copy link
Contributor

arsenm commented Jan 3, 2024

  1. (2 Excess VGPR, 0 Excess SGPR) is preferable to (1 Excess VGPR, 1 Excess SGPR) since the latter needs to insert more spill code.

This part doesn't entirely make sense to me. The SGPR spill code may end up larger, but could still be faster. Unless the spill-to-vgpr handling ends up inducing a VGPR spill, I'd spill expect to prefer an excess SGPR to excess VGPR

if (SGPRImportant != OtherSGPRImportant) {
SGPRImportant = false;
}

// compare large regs pressure
// Give thid precedence to lower register tuple pressure.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo 'thid'

@jrbyrnes
Copy link
Contributor Author

This part doesn't entirely make sense to me. The SGPR spill code may end up larger, but could still be faster. Unless the spill-to-vgpr handling ends up inducing a VGPR spill, I'd spill expect to prefer an excess SGPR to excess VGPR

Okay, I have modified it to reflect the principle: prefer the register pressure with SGPR-spills-to-VGPR when there is a tie in Excess VGPR pressure.

That the following behavior is observed:

Case Pressure 1 Pressure 2 Winner
0 (2,0) (1,1) Pressure 2
1 (1,1) (1,63) Pressure 1
2 (2,0) (1,63) Pressure 2
3 (2,0) (1,64) Pressure 2
4 (3,0) (1,64) Pressure 2
5 (3,0) (1,128) Pressure 2
6 (3,0) (0,1280)* Pressure 2

It also has the following behvaior:

Case Pressure 1 Pressure 2 Winner
7 (0,0) (0,1)* Pressure 1

*VGPR usage is not excess even with VGPR usage from SGPR spills.

llvm/lib/Target/AMDGPU/GCNRegPressure.cpp Outdated Show resolved Hide resolved
Change-Id: I0e4a631a22f5501b1586d1b8fc999eabd572268c
@jrbyrnes
Copy link
Contributor Author

113052b

@jrbyrnes jrbyrnes closed this Feb 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants