[AMDGPU][MachineScheduler] Alternative way to control excess RP. #68004

alex-t · 2023-10-02T16:46:58Z

This pull request was created as a place to discuss and illustrate an idea: decide in finalizeGCNRegion whether the region has excess RP, based on DAG.getRealRegPressure(RegionIdx). Then decide whether to keep or revert the
result of the UnclusteredHighRP stage based on the RP after the stage: if the
RP is not lower than before, revert.
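
To make the idea concrete outside of LLVM, here is a minimal sketch of the two checks. The `Pressure` and `Limits` structs and both function names are hypothetical stand-ins, not the actual `GCNRegPressure` or `GCNSchedStrategy` types, and the limit values in the usage note are arbitrary.

```cpp
#include <cassert>

// Hypothetical stand-ins for the scheduler state; not the LLVM types.
struct Pressure {
  unsigned VGPR, AGPR, SGPR;
};

struct Limits {
  unsigned VGPRExcess, SGPRExcess; // excess limits per register class
  unsigned VGPRMargin, SGPRMargin; // safety margins subtracted from the limits
};

// A region is flagged as soon as any register class reaches its excess
// limit minus the margin. AGPRs are compared against the VGPR limit.
bool hasExcessRP(const Pressure &P, const Limits &L) {
  return P.VGPR >= L.VGPRExcess - L.VGPRMargin ||
         P.AGPR >= L.VGPRExcess - L.VGPRMargin ||
         P.SGPR >= L.SGPRExcess - L.SGPRMargin;
}

// Revert when some register class is still above its excess limit and
// its pressure did not go down compared to the previous schedule.
bool shouldRevert(const Pressure &Before, const Pressure &After,
                  const Limits &L) {
  if (After.VGPR > L.VGPRExcess && After.VGPR >= Before.VGPR)
    return true;
  if (After.AGPR > L.VGPRExcess && After.AGPR >= Before.AGPR)
    return true;
  if (After.SGPR > L.SGPRExcess && After.SGPR >= Before.SGPR)
    return true;
  return false;
}
```

With `Limits{256, 104, 1, 0}`, a region at 255 VGPRs is already flagged because of the margin, and a schedule that stays at 260 VGPRs is reverted while one that drops from 270 to 260 is kept.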

llvmbot · 2023-10-02T16:48:10Z

@llvm/pr-subscribers-llvm-globalisel

@llvm/pr-subscribers-backend-amdgpu

Full diff: https://github.com/llvm/llvm-project/pull/68004.diff

2 Files Affected:

(modified) llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp (+31-21)
(modified) llvm/lib/Target/AMDGPU/GCNSchedStrategy.h (+7-4)

diff --git a/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp b/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
index ce481e1f1a8bc48..793bbe90307efce 100644
--- a/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
+++ b/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
@@ -894,10 +894,24 @@ void GCNSchedStage::setupNewBlock() {
 
 void GCNSchedStage::finalizeGCNRegion() {
   DAG.Regions[RegionIdx] = std::pair(DAG.RegionBegin, DAG.RegionEnd);
-  DAG.RescheduleRegions[RegionIdx] = false;
-  if (S.HasHighPressure)
+  PressureAfter = DAG.getRealRegPressure(RegionIdx);
+
+  unsigned NewVGPRRP = PressureAfter.getVGPRNum(false);
+  unsigned NewAGPRRP = PressureAfter.getAGPRNum();
+  unsigned NewSGPRRP = PressureAfter.getSGPRNum();
+
+  if ((NewVGPRRP >= S.VGPRCriticalLimit - S.VGPRExcessMargin) ||
+      (NewAGPRRP >= S.VGPRCriticalLimit - S.VGPRExcessMargin) ||
+      (NewSGPRRP >= S.SGPRCriticalLimit - S.SGPRExcessMargin))
     DAG.RegionsWithHighRP[RegionIdx] = true;
 
+  if ((NewVGPRRP >= S.VGPRExcessLimit - S.VGPRExcessMargin) ||
+      (NewAGPRRP >= S.VGPRExcessLimit - S.VGPRExcessMargin) ||
+      (NewSGPRRP >= S.SGPRExcessLimit - S.SGPRExcessMargin)) {
+    DAG.RegionsWithExcessRP[RegionIdx] = true;
+    DAG.RescheduleRegions[RegionIdx] = true;
+  }
+
   // Revert scheduling if we have dropped occupancy or there is some other
   // reason that the original schedule is better.
   checkScheduling();
@@ -912,7 +926,6 @@ void GCNSchedStage::finalizeGCNRegion() {
 
 void GCNSchedStage::checkScheduling() {
   // Check the results of scheduling.
-  PressureAfter = DAG.getRealRegPressure(RegionIdx);
   LLVM_DEBUG(dbgs() << "Pressure after scheduling: " << print(PressureAfter));
   LLVM_DEBUG(dbgs() << "Region: " << RegionIdx << ".\n");
 
@@ -959,16 +972,6 @@ void GCNSchedStage::checkScheduling() {
                       << DAG.MinOccupancy << ".\n");
   }
 
-  unsigned MaxVGPRs = ST.getMaxNumVGPRs(MF);
-  unsigned MaxSGPRs = ST.getMaxNumSGPRs(MF);
-  if (PressureAfter.getVGPRNum(false) > MaxVGPRs ||
-      PressureAfter.getAGPRNum() > MaxVGPRs ||
-      PressureAfter.getSGPRNum() > MaxSGPRs) {
-    DAG.RescheduleRegions[RegionIdx] = true;
-    DAG.RegionsWithHighRP[RegionIdx] = true;
-    DAG.RegionsWithExcessRP[RegionIdx] = true;
-  }
-
   // Revert if this region's schedule would cause a drop in occupancy or
   // spilling.
   if (shouldRevertScheduling(WavesAfter)) {
@@ -1117,16 +1120,23 @@ bool OccInitialScheduleStage::shouldRevertScheduling(unsigned WavesAfter) {
 bool UnclusteredHighRPStage::shouldRevertScheduling(unsigned WavesAfter) {
   // If RP is not reduced in the unclustered reschedule stage, revert to the
   // old schedule.
-  if ((WavesAfter <= PressureBefore.getOccupancy(ST) &&
-       mayCauseSpilling(WavesAfter)) ||
-      GCNSchedStage::shouldRevertScheduling(WavesAfter)) {
-    LLVM_DEBUG(dbgs() << "Unclustered reschedule did not help.\n");
-    return true;
-  }
+  if (DAG.RegionsWithExcessRP[RegionIdx]) {
+    unsigned NewVGPRRP = PressureAfter.getVGPRNum(false);
+    unsigned NewAGPRRP = PressureAfter.getAGPRNum();
+    unsigned NewSGPRRP = PressureAfter.getSGPRNum();
 
-  // Do not attempt to relax schedule even more if we are already spilling.
-  if (isRegionWithExcessRP())
+    unsigned OldVGPRRP = PressureBefore.getVGPRNum(false);
+    unsigned OldAGPRRP = PressureBefore.getAGPRNum();
+    unsigned OldSGPRRP = PressureBefore.getSGPRNum();
+
+    if (NewVGPRRP > S.VGPRExcessLimit && NewVGPRRP >= OldVGPRRP)
+      return true;
+    if (NewAGPRRP > S.VGPRExcessLimit && NewAGPRRP >= OldAGPRRP)
+      return true;
+    if (NewSGPRRP > S.SGPRExcessLimit && NewSGPRRP >= OldSGPRRP)
+      return true;
     return false;
+  }
 
   LLVM_DEBUG(
       dbgs()
diff --git a/llvm/lib/Target/AMDGPU/GCNSchedStrategy.h b/llvm/lib/Target/AMDGPU/GCNSchedStrategy.h
index 7862ec1e894b62e..2119a6f3109bca8 100644
--- a/llvm/lib/Target/AMDGPU/GCNSchedStrategy.h
+++ b/llvm/lib/Target/AMDGPU/GCNSchedStrategy.h
@@ -56,10 +56,6 @@ class GCNSchedStrategy : public GenericScheduler {
 
   std::vector<unsigned> MaxPressure;
 
-  unsigned SGPRExcessLimit;
-
-  unsigned VGPRExcessLimit;
-
   unsigned TargetOccupancy;
 
   MachineFunction *MF;
@@ -94,10 +90,17 @@ class GCNSchedStrategy : public GenericScheduler {
 
   unsigned VGPRCriticalLimit;
 
+  unsigned SGPRExcessLimit;
+
+  unsigned VGPRExcessLimit;
+
   unsigned SGPRLimitBias = 0;
 
   unsigned VGPRLimitBias = 0;
 
+  unsigned VGPRExcessMargin = 1;
+  unsigned SGPRExcessMargin = 0;
+
   GCNSchedStrategy(const MachineSchedContext *C);
 
   SUnit *pickNode(bool &IsTopNode) override;

kerbowa · 2023-10-20T15:27:13Z

Continuing the discussion here as opposed to email.

> The UnclusteredHighRP is intended for those regions which have high RP after the scheduling is done.
> I think that we should run the UnclusteredHighRP only for regions which have excess RP after the scheduling is done.

The original intent of unclustered scheduling was to increase occupancy in the kernel when it was possible to do so if we tried scheduling without mutations. The extra checks for excess RP and spilling were added later. There were concrete cases that motivated both of these changes.

That's not to say I don't approve of the new approach, any simplification of the current logic would be welcome, but I think it needs to be supported by performance numbers both on compute and graphics.

jrbyrnes

I just have a few questions about implementation details. At a higher level, it seems we are trading one heuristic for another w.r.t. flagging regions as ExcessRP, so I'm curious about the relative performance.

jrbyrnes · 2023-10-18T19:17:01Z

llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp

@@ -894,10 +894,22 @@ void GCNSchedStage::setupNewBlock() {

 void GCNSchedStage::finalizeGCNRegion() {
  DAG.Regions[RegionIdx] = std::pair(DAG.RegionBegin, DAG.RegionEnd);
-  DAG.RescheduleRegions[RegionIdx] = false;


Why was this removed?

alex-t · 2023-10-24

Shouldn't we mark only the "excess RP" regions for rescheduling?

  if ((NewVGPRRP >= S.VGPRExcessLimit - S.VGPRExcessMargin) ||
      (NewAGPRRP >= S.VGPRExcessLimit - S.VGPRExcessMargin) ||
      (NewSGPRRP >= S.SGPRExcessLimit - S.SGPRExcessMargin)) {
    DAG.RegionsWithExcessRP[RegionIdx] = true;
    DAG.RescheduleRegions[RegionIdx] = true;
  }

jrbyrnes · 2024-01-10

The removed line sets the flag to false -- the intent seems to be that we don't carry over the flag from previous scheduling stages, and we only set it if RP is still not good after the current stage.
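
The difference discussed in this thread can be sketched in isolation. The `RegionFlags` struct and the two function names below are hypothetical and only model a per-region reschedule flag; they are not the scheduler's data structures.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of the per-region reschedule flags.
struct RegionFlags {
  std::vector<bool> Reschedule;
};

// Behaviour with the reset: the flag from an earlier stage is cleared,
// then set again only if the region still has excess RP.
void finalizeWithReset(RegionFlags &F, unsigned Idx, bool StillExcess) {
  F.Reschedule[Idx] = false;
  if (StillExcess)
    F.Reschedule[Idx] = true;
}

// Behaviour without the reset: a flag set by an earlier stage persists
// even if the current stage brought the pressure back under the limit.
void finalizeSticky(RegionFlags &F, unsigned Idx, bool StillExcess) {
  if (StillExcess)
    F.Reschedule[Idx] = true;
}
```

Starting from a region whose flag is already set and whose pressure is now fine, `finalizeWithReset` leaves the flag cleared while `finalizeSticky` leaves it set, which is the carry-over the question is about.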

alex-t · 2023-12-27T17:25:49Z

> Continuing the discussion here as opposed to email.
>
> > The UnclusteredHighRP is intended for those regions which have high RP after the scheduling is done.
> > I think that we should run the UnclusteredHighRP only for regions which have excess RP after the scheduling is done.
>
> The original intent of unclustered scheduling was to increase occupancy in the kernel when it was possible to do so if we tried scheduling without mutations. The extra checks for excess RP and spilling were added later. There were concrete cases that motivated both of these changes.
>
> That's not to say I don't approve of the new approach, any simplification of the current logic would be welcome, but I think it needs to be supported by performance numbers both on compute and graphics.

I have asked CQE to run the extended testing cycle. Here is their response: "We have completed the staging testing (daily cycle coverage) with your patch. Its Conditional-Go, we didn’t observe any new failures in this cycle (except the staging branch’s known issues)."
Since this change was not expected to bring any improvements, only to make the handling of excess-RP regions clearer, I consider it ready for upstream.

alex-t · 2024-01-03T18:18:38Z

Ping! Does anybody have further objections? Otherwise I am going to land this soon.

alex-t · 2024-01-19T14:41:01Z

Heads up. Any other objections? IMO this could be upstreamed. We can revert it at any time if needed.

jrbyrnes · 2024-02-26T20:42:03Z

llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp

-    LLVM_DEBUG(dbgs() << "Unclustered reschedule did not help.\n");
-    return true;
-  }
+  if (DAG.RegionsWithExcessRP[RegionIdx]) {


After 113052b, RP.less has excess-RP comparisons that this check should be consistent with:

  return (NewVGPRRP > S.VGPRExcessLimit ||
          NewAGPRRP > S.AGPRExcessLimit ||
          NewSGPRRP > S.SGPRExcessLimit ||
          /* Unified VGPR excess case */) &&
         !PressureAfter.less(ST, PressureBefore);

arsenm

Requires merge to main

alex-t requested review from kerbowa and jrbyrnes October 2, 2023 16:46

llvmbot added the backend:AMDGPU label Oct 2, 2023

piotrAMD reviewed Oct 3, 2023

jrbyrnes reviewed Oct 20, 2023

alex-t force-pushed the misched_regexcess branch from e14bc90 to 7b76907 Compare December 7, 2023 17:46

[AMDGPU][MachineScheduler] Alternative way to control excess RP.

d149d37

alex-t force-pushed the misched_regexcess branch from 7b76907 to d149d37 Compare December 7, 2023 20:49

llvmbot added the llvm:globalisel label Dec 7, 2023

[AMDGPU][MachineScheduler] Alternative way to control excess RP.

67f80e8

arsenm reviewed Jan 12, 2024

[AMDGPU][MachineScheduler] Alternative way to control excess RP.

d02af65

jrbyrnes reviewed Feb 26, 2024

arsenm requested changes Feb 29, 2024
