Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AMDGPU][MachineScheduler] Alternative way to control excess RP. #68004

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

alex-t
Copy link
Contributor

@alex-t alex-t commented Oct 2, 2023

This pull request was created as a discussion place to illustrate the idea: to decide if the region has excess RP in finalizeGCNRegion based on the DAG.getRealRegPressure(RegionIdx). Decide if we should keep or revert the
result of the UnclusteredHighRP stage based on the RP after the stage: if the
RP is not less than before - revert.

@llvmbot
Copy link
Collaborator

llvmbot commented Oct 2, 2023

@llvm/pr-subscribers-llvm-globalisel

@llvm/pr-subscribers-backend-amdgpu

Changes

This pull request was created as a discussion place to illustrate the idea: to decide if the region has excess RP in finalizeGCNRegion based on the DAG.getRealRegPressure(RegionIdx). Decide if we should keep or revert the
result of the UnclusteredHighRP stage based on the RP after the stage: if the
RP is not less than before - revert.


Full diff: https://github.com/llvm/llvm-project/pull/68004.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp (+31-21)
  • (modified) llvm/lib/Target/AMDGPU/GCNSchedStrategy.h (+7-4)
diff --git a/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp b/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
index ce481e1f1a8bc48..793bbe90307efce 100644
--- a/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
+++ b/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
@@ -894,10 +894,24 @@ void GCNSchedStage::setupNewBlock() {
 
 void GCNSchedStage::finalizeGCNRegion() {
   DAG.Regions[RegionIdx] = std::pair(DAG.RegionBegin, DAG.RegionEnd);
-  DAG.RescheduleRegions[RegionIdx] = false;
-  if (S.HasHighPressure)
+  PressureAfter = DAG.getRealRegPressure(RegionIdx);
+
+  unsigned NewVGPRRP = PressureAfter.getVGPRNum(false);
+  unsigned NewAGPRRP = PressureAfter.getAGPRNum();
+  unsigned NewSGPRRP = PressureAfter.getSGPRNum();
+
+  if ((NewVGPRRP >= S.VGPRCriticalLimit - S.VGPRExcessMargin) ||
+      (NewAGPRRP >= S.VGPRCriticalLimit - S.VGPRExcessMargin) ||
+      (NewSGPRRP >= S.SGPRCriticalLimit - S.SGPRExcessMargin))
     DAG.RegionsWithHighRP[RegionIdx] = true;
 
+  if ((NewVGPRRP >= S.VGPRExcessLimit - S.VGPRExcessMargin) ||
+      (NewAGPRRP >= S.VGPRExcessLimit - S.SGPRExcessMargin) ||
+      (NewSGPRRP >= S.SGPRExcessLimit - S.VGPRExcessMargin)) {
+    DAG.RegionsWithExcessRP[RegionIdx] = true;
+    DAG.RescheduleRegions[RegionIdx] = true;
+  }
+
   // Revert scheduling if we have dropped occupancy or there is some other
   // reason that the original schedule is better.
   checkScheduling();
@@ -912,7 +926,6 @@ void GCNSchedStage::finalizeGCNRegion() {
 
 void GCNSchedStage::checkScheduling() {
   // Check the results of scheduling.
-  PressureAfter = DAG.getRealRegPressure(RegionIdx);
   LLVM_DEBUG(dbgs() << "Pressure after scheduling: " << print(PressureAfter));
   LLVM_DEBUG(dbgs() << "Region: " << RegionIdx << ".\n");
 
@@ -959,16 +972,6 @@ void GCNSchedStage::checkScheduling() {
                       << DAG.MinOccupancy << ".\n");
   }
 
-  unsigned MaxVGPRs = ST.getMaxNumVGPRs(MF);
-  unsigned MaxSGPRs = ST.getMaxNumSGPRs(MF);
-  if (PressureAfter.getVGPRNum(false) > MaxVGPRs ||
-      PressureAfter.getAGPRNum() > MaxVGPRs ||
-      PressureAfter.getSGPRNum() > MaxSGPRs) {
-    DAG.RescheduleRegions[RegionIdx] = true;
-    DAG.RegionsWithHighRP[RegionIdx] = true;
-    DAG.RegionsWithExcessRP[RegionIdx] = true;
-  }
-
   // Revert if this region's schedule would cause a drop in occupancy or
   // spilling.
   if (shouldRevertScheduling(WavesAfter)) {
@@ -1117,16 +1120,23 @@ bool OccInitialScheduleStage::shouldRevertScheduling(unsigned WavesAfter) {
 bool UnclusteredHighRPStage::shouldRevertScheduling(unsigned WavesAfter) {
   // If RP is not reduced in the unclustered reschedule stage, revert to the
   // old schedule.
-  if ((WavesAfter <= PressureBefore.getOccupancy(ST) &&
-       mayCauseSpilling(WavesAfter)) ||
-      GCNSchedStage::shouldRevertScheduling(WavesAfter)) {
-    LLVM_DEBUG(dbgs() << "Unclustered reschedule did not help.\n");
-    return true;
-  }
+  if (DAG.RegionsWithExcessRP[RegionIdx]) {
+    unsigned NewVGPRRP = PressureAfter.getVGPRNum(false);
+    unsigned NewAGPRRP = PressureAfter.getAGPRNum();
+    unsigned NewSGPRRP = PressureAfter.getSGPRNum();
 
-  // Do not attempt to relax schedule even more if we are already spilling.
-  if (isRegionWithExcessRP())
+    unsigned OldVGPRRP = PressureBefore.getVGPRNum(false);
+    unsigned OldAGPRRP = PressureBefore.getAGPRNum();
+    unsigned OldSGPRRP = PressureBefore.getSGPRNum();
+
+    if (NewVGPRRP > S.VGPRExcessLimit && NewVGPRRP >= OldVGPRRP)
+      return true;
+    if (NewAGPRRP > S.VGPRExcessLimit && NewAGPRRP >= OldAGPRRP)
+      return true;
+    if (NewSGPRRP > S.SGPRExcessLimit && NewSGPRRP >= OldSGPRRP)
+      return true;
     return false;
+  }
 
   LLVM_DEBUG(
       dbgs()
diff --git a/llvm/lib/Target/AMDGPU/GCNSchedStrategy.h b/llvm/lib/Target/AMDGPU/GCNSchedStrategy.h
index 7862ec1e894b62e..2119a6f3109bca8 100644
--- a/llvm/lib/Target/AMDGPU/GCNSchedStrategy.h
+++ b/llvm/lib/Target/AMDGPU/GCNSchedStrategy.h
@@ -56,10 +56,6 @@ class GCNSchedStrategy : public GenericScheduler {
 
   std::vector<unsigned> MaxPressure;
 
-  unsigned SGPRExcessLimit;
-
-  unsigned VGPRExcessLimit;
-
   unsigned TargetOccupancy;
 
   MachineFunction *MF;
@@ -94,10 +90,17 @@ class GCNSchedStrategy : public GenericScheduler {
 
   unsigned VGPRCriticalLimit;
 
+  unsigned SGPRExcessLimit;
+
+  unsigned VGPRExcessLimit;
+
   unsigned SGPRLimitBias = 0;
 
   unsigned VGPRLimitBias = 0;
 
+  unsigned VGPRExcessMargin = 1;
+  unsigned SGPRExcessMargin = 0;
+
   GCNSchedStrategy(const MachineSchedContext *C);
 
   SUnit *pickNode(bool &IsTopNode) override;

@kerbowa
Copy link
Member

kerbowa commented Oct 20, 2023

Continuing the discussion here as opposed to email.

The UnclusteredHighRP is intended for those regions which have high RP after the scheduling is done.
I think that we should run the UnclusteredHighRP only for regions which have excess Rp after the scheduling is done.

The original intent of unclustered scheduling was to increase occupancy in the kernel when it was possible to do so if we tried scheduling without mutations. The extra checks for excess RP and spilling were added later. There were concrete cases that motivated both of these changes.

That's not to say I don't approve of the new approach, any simplification of the current logic would be welcome, but I think it needs to be supported by performance numbers both on compute and graphics.

Copy link
Contributor

@jrbyrnes jrbyrnes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just have a few questions about implementation details -- at a higher level, seems like we are trading one heuristic for another w.r.t flagging regions as ExcessRP -- so I'm curious about the relative performance.

llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp Show resolved Hide resolved
@@ -894,10 +894,22 @@ void GCNSchedStage::setupNewBlock() {

void GCNSchedStage::finalizeGCNRegion() {
DAG.Regions[RegionIdx] = std::pair(DAG.RegionBegin, DAG.RegionEnd);
DAG.RescheduleRegions[RegionIdx] = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should not we mark for rescheduling the "excess RP" regions only?

if ((NewVGPRRP >= S.VGPRExcessLimit - S.VGPRExcessMargin) || (NewAGPRRP >= S.VGPRExcessLimit - S.VGPRExcessMargin) || (NewSGPRRP >= S.SGPRExcessLimit - S.SGPRExcessMargin)) { DAG.RegionsWithExcessRP[RegionIdx] = true; DAG.RescheduleRegions[RegionIdx] = true; }

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is setting it to false -- the intent seems to be that we don't carry over the flag from previous scheduling stages, and we only set it if RP is still not good after the current stage.

llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp Show resolved Hide resolved
llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp Show resolved Hide resolved
@alex-t
Copy link
Contributor Author

alex-t commented Dec 27, 2023

Continuing the discussion here as opposed to email.

The UnclusteredHighRP is intended for those regions which have high RP after the scheduling is done.
I think that we should run the UnclusteredHighRP only for regions which have excess Rp after the scheduling is done.

The original intent of unclustered scheduling was to increase occupancy in the kernel when it was possible to do so if we tried scheduling without mutations. The extra checks for excess RP and spilling were added later. There were concrete cases that motivated both of these changes.

That's not to say I don't approve of the new approach, any simplification of the current logic would be welcome, but I think it needs to be supported by performance numbers both on compute and graphics.

I have asked CQE to run the extended testing cycle. Here is their response: "We have completed the staging testing (daily cycle coverage) with your patch. Its Conditional-Go, we didn’t observe any new failures in this cycle (except the staging branch’s known issues)."
So, as this change was not assumed to introduce any improvements - just makes the excess regions handling more clear, I would consider it as ready for upstream.

@alex-t
Copy link
Contributor Author

alex-t commented Jan 3, 2024

Ping! Does anybody have further objections? This is going to be landed soon otherwise.

@alex-t
Copy link
Contributor Author

alex-t commented Jan 19, 2024

Heads up. Any other objections? IMO this could be upstreamed. We may revert it at any moment.

LLVM_DEBUG(dbgs() << "Unclustered reschedule did not help.\n");
return true;
}
if (DAG.RegionsWithExcessRP[RegionIdx]) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After 113052b , RP.less has excess RP comparisons that this should be consistent with:

return (NewVGPRRP > S.VGPRExcessLimit || NewAGPRRP > S.AGPRExcessLimit 
			|| NewSGPRRP > S.SGPRExcessLimit 
			|| /* Unified VGPR excess case */ ) 
			&&  !PressureAfter.less(ST, PressureBefore);

Copy link
Contributor

@arsenm arsenm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requires merge to main

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants