[AMDGPU][MachineScheduler] Alternative way to control excess RP. #68004
base: main
Conversation
This pull request was created as a discussion place to illustrate the idea: decide if the region has excess RP in finalizeGCNRegion based on DAG.getRealRegPressure(RegionIdx), and decide whether to keep or revert the result of the UnclusteredHighRP stage based on the RP after the stage.

Full diff: https://github.com/llvm/llvm-project/pull/68004.diff (2 files affected)
diff --git a/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp b/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
index ce481e1f1a8bc48..793bbe90307efce 100644
--- a/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
+++ b/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
@@ -894,10 +894,24 @@ void GCNSchedStage::setupNewBlock() {
void GCNSchedStage::finalizeGCNRegion() {
DAG.Regions[RegionIdx] = std::pair(DAG.RegionBegin, DAG.RegionEnd);
- DAG.RescheduleRegions[RegionIdx] = false;
- if (S.HasHighPressure)
+ PressureAfter = DAG.getRealRegPressure(RegionIdx);
+
+ unsigned NewVGPRRP = PressureAfter.getVGPRNum(false);
+ unsigned NewAGPRRP = PressureAfter.getAGPRNum();
+ unsigned NewSGPRRP = PressureAfter.getSGPRNum();
+
+ if ((NewVGPRRP >= S.VGPRCriticalLimit - S.VGPRExcessMargin) ||
+ (NewAGPRRP >= S.VGPRCriticalLimit - S.VGPRExcessMargin) ||
+ (NewSGPRRP >= S.SGPRCriticalLimit - S.SGPRExcessMargin))
DAG.RegionsWithHighRP[RegionIdx] = true;
+ if ((NewVGPRRP >= S.VGPRExcessLimit - S.VGPRExcessMargin) ||
+ (NewAGPRRP >= S.VGPRExcessLimit - S.VGPRExcessMargin) ||
+ (NewSGPRRP >= S.SGPRExcessLimit - S.SGPRExcessMargin)) {
+ DAG.RegionsWithExcessRP[RegionIdx] = true;
+ DAG.RescheduleRegions[RegionIdx] = true;
+ }
+
// Revert scheduling if we have dropped occupancy or there is some other
// reason that the original schedule is better.
checkScheduling();
@@ -912,7 +926,6 @@ void GCNSchedStage::finalizeGCNRegion() {
void GCNSchedStage::checkScheduling() {
// Check the results of scheduling.
- PressureAfter = DAG.getRealRegPressure(RegionIdx);
LLVM_DEBUG(dbgs() << "Pressure after scheduling: " << print(PressureAfter));
LLVM_DEBUG(dbgs() << "Region: " << RegionIdx << ".\n");
@@ -959,16 +972,6 @@ void GCNSchedStage::checkScheduling() {
<< DAG.MinOccupancy << ".\n");
}
- unsigned MaxVGPRs = ST.getMaxNumVGPRs(MF);
- unsigned MaxSGPRs = ST.getMaxNumSGPRs(MF);
- if (PressureAfter.getVGPRNum(false) > MaxVGPRs ||
- PressureAfter.getAGPRNum() > MaxVGPRs ||
- PressureAfter.getSGPRNum() > MaxSGPRs) {
- DAG.RescheduleRegions[RegionIdx] = true;
- DAG.RegionsWithHighRP[RegionIdx] = true;
- DAG.RegionsWithExcessRP[RegionIdx] = true;
- }
-
// Revert if this region's schedule would cause a drop in occupancy or
// spilling.
if (shouldRevertScheduling(WavesAfter)) {
@@ -1117,16 +1120,23 @@ bool OccInitialScheduleStage::shouldRevertScheduling(unsigned WavesAfter) {
bool UnclusteredHighRPStage::shouldRevertScheduling(unsigned WavesAfter) {
// If RP is not reduced in the unclustered reschedule stage, revert to the
// old schedule.
- if ((WavesAfter <= PressureBefore.getOccupancy(ST) &&
- mayCauseSpilling(WavesAfter)) ||
- GCNSchedStage::shouldRevertScheduling(WavesAfter)) {
- LLVM_DEBUG(dbgs() << "Unclustered reschedule did not help.\n");
- return true;
- }
+ if (DAG.RegionsWithExcessRP[RegionIdx]) {
+ unsigned NewVGPRRP = PressureAfter.getVGPRNum(false);
+ unsigned NewAGPRRP = PressureAfter.getAGPRNum();
+ unsigned NewSGPRRP = PressureAfter.getSGPRNum();
- // Do not attempt to relax schedule even more if we are already spilling.
- if (isRegionWithExcessRP())
+ unsigned OldVGPRRP = PressureBefore.getVGPRNum(false);
+ unsigned OldAGPRRP = PressureBefore.getAGPRNum();
+ unsigned OldSGPRRP = PressureBefore.getSGPRNum();
+
+ if (NewVGPRRP > S.VGPRExcessLimit && NewVGPRRP >= OldVGPRRP)
+ return true;
+ if (NewAGPRRP > S.VGPRExcessLimit && NewAGPRRP >= OldAGPRRP)
+ return true;
+ if (NewSGPRRP > S.SGPRExcessLimit && NewSGPRRP >= OldSGPRRP)
+ return true;
return false;
+ }
LLVM_DEBUG(
dbgs()
diff --git a/llvm/lib/Target/AMDGPU/GCNSchedStrategy.h b/llvm/lib/Target/AMDGPU/GCNSchedStrategy.h
index 7862ec1e894b62e..2119a6f3109bca8 100644
--- a/llvm/lib/Target/AMDGPU/GCNSchedStrategy.h
+++ b/llvm/lib/Target/AMDGPU/GCNSchedStrategy.h
@@ -56,10 +56,6 @@ class GCNSchedStrategy : public GenericScheduler {
std::vector<unsigned> MaxPressure;
- unsigned SGPRExcessLimit;
-
- unsigned VGPRExcessLimit;
-
unsigned TargetOccupancy;
MachineFunction *MF;
@@ -94,10 +90,17 @@ class GCNSchedStrategy : public GenericScheduler {
unsigned VGPRCriticalLimit;
+ unsigned SGPRExcessLimit;
+
+ unsigned VGPRExcessLimit;
+
unsigned SGPRLimitBias = 0;
unsigned VGPRLimitBias = 0;
+ unsigned VGPRExcessMargin = 1;
+ unsigned SGPRExcessMargin = 0;
+
GCNSchedStrategy(const MachineSchedContext *C);
SUnit *pickNode(bool &IsTopNode) override;
Continuing the discussion here as opposed to email.
The original intent of unclustered scheduling was to increase occupancy in the kernel when it was possible to do so by trying scheduling without mutations. The extra checks for excess RP and spilling were added later, and there were concrete cases that motivated both of those changes. That's not to say I don't approve of the new approach: any simplification of the current logic would be welcome, but I think it needs to be supported by performance numbers on both compute and graphics.
Just have a few questions about implementation details -- at a higher level, seems like we are trading one heuristic for another w.r.t flagging regions as ExcessRP -- so I'm curious about the relative performance.
@@ -894,10 +894,22 @@ void GCNSchedStage::setupNewBlock() {

void GCNSchedStage::finalizeGCNRegion() {
  DAG.Regions[RegionIdx] = std::pair(DAG.RegionBegin, DAG.RegionEnd);
  DAG.RescheduleRegions[RegionIdx] = false;
Why was this removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should not we mark for rescheduling the "excess RP" regions only?
if ((NewVGPRRP >= S.VGPRExcessLimit - S.VGPRExcessMargin) ||
    (NewAGPRRP >= S.VGPRExcessLimit - S.VGPRExcessMargin) ||
    (NewSGPRRP >= S.SGPRExcessLimit - S.SGPRExcessMargin)) {
  DAG.RegionsWithExcessRP[RegionIdx] = true;
  DAG.RescheduleRegions[RegionIdx] = true;
}
This is setting it to false -- the intent seems to be that we don't carry over the flag from previous scheduling stages, and we only set it if RP is still not good after the current stage.
I have asked CQE to run the extended testing cycle. Here is their response: "We have completed the staging testing (daily cycle coverage) with your patch. It's Conditional-Go; we didn't observe any new failures in this cycle (except the staging branch's known issues)."
Ping! Does anybody have further objections? Otherwise, this is going to be landed soon.
Heads up. Any other objections? IMO this could be upstreamed; we may revert it at any moment.
    LLVM_DEBUG(dbgs() << "Unclustered reschedule did not help.\n");
    return true;
  }
if (DAG.RegionsWithExcessRP[RegionIdx]) { |
After 113052b, RP.less has excess-RP comparisons that this check should be consistent with:

return (NewVGPRRP > S.VGPRExcessLimit || NewAGPRRP > S.AGPRExcessLimit ||
        NewSGPRRP > S.SGPRExcessLimit ||
        /* Unified VGPR excess case */) &&
       !PressureAfter.less(ST, PressureBefore);
Requires merge to main
This pull request was created as a discussion place to illustrate the idea: decide if the region has excess RP in finalizeGCNRegion based on DAG.getRealRegPressure(RegionIdx), and decide whether to keep or revert the result of the UnclusteredHighRP stage based on the RP after the stage: if the RP is not less than before, revert.