Skip to content

[AMDGPU] Fix inconsistencies in RP tracking advance/reset behavior#196099

Closed
lucas-rami wants to merge 2 commits into
llvm:users/lucas-rami/pre-commit-test-gcnrp-semanticsfrom
lucas-rami:users/lucas-rami/gcnrp-semantics
Closed

[AMDGPU] Fix inconsistencies in RP tracking advance/reset behavior#196099
lucas-rami wants to merge 2 commits into
llvm:users/lucas-rami/pre-commit-test-gcnrp-semanticsfrom
lucas-rami:users/lucas-rami/gcnrp-semantics

Conversation

@lucas-rami
Copy link
Copy Markdown
Contributor

Some of the variants of advance and reset in the GCNRPTracker and GCNDownwardRPTracker had unclear/inconsistent semantics on their return value. This aims to clarify that through improved documentation and light functional changes.

These inconsistencies ultimately triggered an assert in GCNRPTaget::saveRP on a complex kernel during scheduling. GCNScheduleDAGMILive::getRealRegPressure would incorrectly return a null pressure for a non-empty region which only had debug values. Such regions can arise if the PreRARematStage rematerializes all non-debug instructions out of their original region, leaving only debug values. Attempting to rematerialize registers across that same region afterwards would trigger the assert.

@llvmorg-github-actions
Copy link
Copy Markdown

@llvm/pr-subscribers-backend-amdgpu

Author: Lucas Ramirez (lucas-rami)

Changes

Some of the variants of advance and reset in the GCNRPTracker and GCNDownwardRPTracker had unclear/inconsistent semantics on their return value. This aims to clarify that through improved documentation and light functional changes.

These inconsistencies ultimately triggered an assert in GCNRPTaget::saveRP on a complex kernel during scheduling. GCNScheduleDAGMILive::getRealRegPressure would incorrectly return a null pressure for a non-empty region which only had debug values. Such regions can arise if the PreRARematStage rematerializes all non-debug instructions out of their original region, leaving only debug values. Attempting to rematerialize registers across that same region afterwards would trigger the assert.


Full diff: https://github.com/llvm/llvm-project/pull/196099.diff

4 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/GCNRegPressure.cpp (+6-4)
  • (modified) llvm/lib/Target/AMDGPU/GCNRegPressure.h (+22-6)
  • (modified) llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp (+8-4)
  • (modified) llvm/unittests/Target/AMDGPU/GCNRegPressureTest.cpp (+13-28)
diff --git a/llvm/lib/Target/AMDGPU/GCNRegPressure.cpp b/llvm/lib/Target/AMDGPU/GCNRegPressure.cpp
index 683e658aa4fb4..9b8e63a39fab8 100644
--- a/llvm/lib/Target/AMDGPU/GCNRegPressure.cpp
+++ b/llvm/lib/Target/AMDGPU/GCNRegPressure.cpp
@@ -621,13 +621,14 @@ void GCNUpwardRPTracker::recede(const MachineInstr &MI) {
 // GCNDownwardRPTracker
 
 bool GCNDownwardRPTracker::reset(const MachineInstr &MI,
-                                 const LiveRegSet *LiveRegsCopy) {
+                                 MachineBasicBlock::const_iterator End,
+                                 const LiveRegSet *LiveRegsCopy) {                                 
   MRI = &MI.getMF()->getRegInfo();
   LastTrackedMI = nullptr;
   MBBEnd = MI.getParent()->end();
   NextMI = &MI;
-  NextMI = skipDebugInstructionsForward(NextMI, MBBEnd);
-  if (NextMI == MBBEnd)
+  NextMI = skipDebugInstructionsForward(NextMI, End);
+  if (NextMI == End)
     return false;
   GCNRPTracker::reset(*NextMI, LiveRegsCopy, false);
   return true;
@@ -746,7 +747,8 @@ bool GCNDownwardRPTracker::advance(MachineBasicBlock::const_iterator End) {
 bool GCNDownwardRPTracker::advance(MachineBasicBlock::const_iterator Begin,
                                    MachineBasicBlock::const_iterator End,
                                    const LiveRegSet *LiveRegsCopy) {
-  reset(*Begin, LiveRegsCopy);
+  if (!reset(*Begin, End, LiveRegsCopy))
+    return false;
   return advance(End);
 }
 
diff --git a/llvm/lib/Target/AMDGPU/GCNRegPressure.h b/llvm/lib/Target/AMDGPU/GCNRegPressure.h
index 00cb617a55fa7..97d2f5ad59278 100644
--- a/llvm/lib/Target/AMDGPU/GCNRegPressure.h
+++ b/llvm/lib/Target/AMDGPU/GCNRegPressure.h
@@ -428,10 +428,20 @@ class GCNDownwardRPTracker : public GCNRPTracker {
     return Res;
   }
 
-  /// Reset tracker to the point before the \p MI
-  /// filling \p LiveRegs upon this point using LIS.
-  /// \p returns false if block is empty except debug values.
-  bool reset(const MachineInstr &MI, const LiveRegSet *LiveRegs = nullptr);
+  /// Reset tracker to the point before the \p MI filling \p LiveRegs upon this
+  /// point using LIS. \p End must be between the MI and the end of its parent
+  /// block (inclusive). \p returns false if the range [MI, End) is empty except
+  /// debug values, in which case the current/maximum pressure are not changed.
+  bool reset(const MachineInstr &MI, MachineBasicBlock::const_iterator End,
+             const LiveRegSet *LiveRegs = nullptr);
+
+  /// Reset tracker to the point before the \p MI filling \p LiveRegs upon this
+  /// point using LIS. \p returns false if there are only debug values between
+  /// \p MI (inclusive) and end of its parent block, in which case the
+  /// current/maximum pressure are not changed.
+  bool reset(const MachineInstr &MI, const LiveRegSet *LiveRegs = nullptr) {
+    return reset(MI, MI.getParent()->end(), LiveRegs);
+  }
 
   /// Move to the state right before the next MI or after the end of MBB.
   /// \p returns false if reached end of the block.
@@ -462,10 +472,16 @@ class GCNDownwardRPTracker : public GCNRPTracker {
   /// \p MI and use LIS for RP calculations.
   bool advance(MachineInstr *MI = nullptr, bool UseInternalIterator = true);
 
-  /// Advance instructions until before \p End.
+  /// Advance instructions until before \p End using internal iterators to
+  /// process instructions in program order. Returns whether iterators actually
+  /// had to advance to reach \p End.
   bool advance(MachineBasicBlock::const_iterator End);
 
-  /// Reset to \p Begin and advance to \p End.
+  /// Reset tracker to \p Begin (filling \p LiveRegs upon this point using LIS)
+  /// and advance to \p End, which must be between \p Begin and the end of its
+  /// parent block (inclusive). \p returns false if the range [Begin, End) is
+  /// empty except debug values, in which case the current/maximum pressure are
+  /// not changed.
   bool advance(MachineBasicBlock::const_iterator Begin,
                MachineBasicBlock::const_iterator End,
                const LiveRegSet *LiveRegsCopy = nullptr);
diff --git a/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp b/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
index 5d20d1e10a0da..7a4774336c5e7 100644
--- a/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
+++ b/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
@@ -1029,8 +1029,13 @@ GCNScheduleDAGMILive::getRealRegPressure(unsigned RegionIdx) const {
   if (Regions[RegionIdx].first == Regions[RegionIdx].second)
     return llvm::getRegPressure(MRI, LiveIns[RegionIdx]);
   GCNDownwardRPTracker RPTracker(*LIS);
-  RPTracker.advance(Regions[RegionIdx].first, Regions[RegionIdx].second,
-                    &LiveIns[RegionIdx]);
+  if (!RPTracker.advance(Regions[RegionIdx].first, Regions[RegionIdx].second,
+                         &LiveIns[RegionIdx])) {
+    // Advance can produce false on a non-empty region if all MIs in the region
+    // are debug values; in such cases the maintained max pressure is invalid
+    // and the only source of pressure are the region's live-ins.
+    return llvm::getRegPressure(MRI, LiveIns[RegionIdx]);
+  }
   return RPTracker.moveMaxPressure();
 }
 
@@ -2025,8 +2030,7 @@ GCNSchedStage::getScheduleMetrics(const std::vector<SUnit> &InputSchedule) {
 #ifndef NDEBUG
   LLVM_DEBUG(
       printScheduleModel(ReadyCyclesSorted);
-      dbgs() << "\n\t"
-             << "Metric: "
+      dbgs() << "\n\t" << "Metric: "
              << (SumBubbles
                      ? (SumBubbles * ScheduleMetrics::ScaleFactor) / CurrCycle
                      : 1)
diff --git a/llvm/unittests/Target/AMDGPU/GCNRegPressureTest.cpp b/llvm/unittests/Target/AMDGPU/GCNRegPressureTest.cpp
index 8febf44fd6f5f..8a49250026066 100644
--- a/llvm/unittests/Target/AMDGPU/GCNRegPressureTest.cpp
+++ b/llvm/unittests/Target/AMDGPU/GCNRegPressureTest.cpp
@@ -119,25 +119,15 @@ body:             |
     // which would return false in this case.
     //
     // There aren't any non-debug instruction between the beginning of bb1 and
-    // Dbg1 (exclusive). However, the call to reset takes the end of the MBB as
-    // the limit, so it pushes the beginning of the block up to %2's def and
-    // considers the reset successful.
-    EXPECT_TRUE(RPTracker.reset(*MBB1.begin(), &MBB1LiveIns));
-    EXPECT_TRUE(RPTrackerNoLiveIns.reset(*MBB1.begin(), nullptr));
-    // advance then unnecessarily processes instructions in order until the end
-    // of the block, even though it is already past Dbg1. It still returns false
-    // because it is stopped by the end of block delimiter, not the end
-    // iterator.
-    EXPECT_FALSE(RPTracker.advance(Dbg1));
-    EXPECT_FALSE(RPTrackerNoLiveIns.advance(Dbg1));
-
-    // In that case, the maximum pressure is also the pressure induced by the
-    // block's live-ins plus %2's def i.e., 3 VGPRs. This is confusing because
-    // %2's def is outside the [Begin,End) range we passed to advance, and there
-    // is no indication that a false return value should make the tracked
-    // pressure invalid.
-    EXPECT_EQ(RPTracker.moveMaxPressure().getVGPRNum(false), 3U);
-    EXPECT_EQ(RPTrackerNoLiveIns.moveMaxPressure().getVGPRNum(false), 3U);
+    // Dbg1 (exclusive), the reset is therefore unsuccessful. The advance caller
+    // returns early on a failure to reset.
+    EXPECT_FALSE(RPTracker.reset(*MBB1.begin(), Dbg1, &MBB1LiveIns));
+    EXPECT_FALSE(RPTrackerNoLiveIns.reset(*MBB1.begin(), Dbg1, nullptr));
+
+    // In that case, the maximum pressure is unchanged from the beginning since
+    // reset was unsuccessful.
+    EXPECT_EQ(RPTracker.moveMaxPressure().getVGPRNum(false), 0U);
+    EXPECT_EQ(RPTrackerNoLiveIns.moveMaxPressure().getVGPRNum(false), 0U);
   }
 }
 
@@ -179,17 +169,12 @@ body:             |
   // which would return true in this case.
   //
   // There aren't any non-debug instruction in bb.2, the reset is therefore
-  // unsuccessful. However the advance caller discards that return value and
-  // proceeds to calling its override.
-  EXPECT_FALSE(RPTracker.reset(*MBB1.begin(), &MBB1LiveIns));
-  EXPECT_FALSE(RPTrackerNoLiveIns.reset(*MBB1.begin(), nullptr));
-  // advance then produces true even though no advancement actually happened.
-  EXPECT_TRUE(RPTracker.advance(MBB1.end()));
-  EXPECT_TRUE(RPTrackerNoLiveIns.advance(MBB1.end()));
+  // unsuccessful. The advance caller returns early on a failure to reset.
+  EXPECT_FALSE(RPTracker.reset(*MBB1.begin(), MBB1.end(), &MBB1LiveIns));
+  EXPECT_FALSE(RPTrackerNoLiveIns.reset(*MBB1.begin(), MBB1.end(), nullptr));
 
   // In that case, the maximum pressure is unchanged from the beginning since
-  // reset was unsuccessful. This is confusing because the top-level advance
-  // call produced true, yet the block's live-in pressure was not considered.
+  // reset was unsuccessful.
   EXPECT_EQ(RPTracker.moveMaxPressure().getVGPRNum(false), 0U);
   EXPECT_EQ(RPTrackerNoLiveIns.moveMaxPressure().getVGPRNum(false), 0U);
 }

@lucas-rami
Copy link
Copy Markdown
Contributor Author

lucas-rami commented May 6, 2026

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 6, 2026

✅ With the latest revision this PR passed the C/C++ code formatter.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 6, 2026

🪟 Windows x64 Test Results

  • 134470 tests passed
  • 3256 tests skipped

✅ The build succeeded and all tests passed.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 6, 2026

🐧 Linux x64 Test Results

  • 174001 tests passed
  • 3320 tests skipped
  • 1 test failed

Failed Tests

(click on a test name to see its output)

MLIR

MLIR.Pass/ir-printing-file-tree.mlir
Exit Code: -9
Timeout: Reached timeout of 1200 seconds

Command Output (stdout):
--
# RUN: at line 2
rm -rf /home/gha/actions-runner/_work/llvm-project/llvm-project/build/tools/mlir/test/Pass/Output/ir-printing-file-tree.mlir.tmp || true
# executed command: rm -rf /home/gha/actions-runner/_work/llvm-project/llvm-project/build/tools/mlir/test/Pass/Output/ir-printing-file-tree.mlir.tmp
# note: command had no output on stdout or stderr
# RUN: at line 3
/home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/mlir-opt /home/gha/actions-runner/_work/llvm-project/llvm-project/mlir/test/Pass/ir-printing-file-tree.mlir -mlir-print-ir-tree-dir=/home/gha/actions-runner/_work/llvm-project/llvm-project/build/tools/mlir/test/Pass/Output/ir-printing-file-tree.mlir.tmp    -pass-pipeline='builtin.module(builtin.module(func.func(cse,canonicalize)))'    -mlir-print-ir-before=cse
# executed command: /home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/mlir-opt /home/gha/actions-runner/_work/llvm-project/llvm-project/mlir/test/Pass/ir-printing-file-tree.mlir -mlir-print-ir-tree-dir=/home/gha/actions-runner/_work/llvm-project/llvm-project/build/tools/mlir/test/Pass/Output/ir-printing-file-tree.mlir.tmp '-pass-pipeline=builtin.module(builtin.module(func.func(cse,canonicalize)))' -mlir-print-ir-before=cse
# .---command stdout------------
# | module @outer {
# |   func.func @"sym/A"() {
# |     return
# |   }
# |   func.func @"sym\\backslash"() {
# |     return
# |   }
# |   module @inner {
# |     func.func @symB() {
# |       return
# |     }
# |     func.func @symC() {
# |       return
# |     }
# |   }
# | }
# | 
# `-----------------------------
# RUN: at line 6
test -f /home/gha/actions-runner/_work/llvm-project/llvm-project/build/tools/mlir/test/Pass/Output/ir-printing-file-tree.mlir.tmp/builtin_module_outer/builtin_module_inner/func_func_symB/0_0_0_cse.mlir
# executed command: test -f /home/gha/actions-runner/_work/llvm-project/llvm-project/build/tools/mlir/test/Pass/Output/ir-printing-file-tree.mlir.tmp/builtin_module_outer/builtin_module_inner/func_func_symB/0_0_0_cse.mlir
# note: command had no output on stdout or stderr
# RUN: at line 7
test ! -f /home/gha/actions-runner/_work/llvm-project/llvm-project/build/tools/mlir/test/Pass/Output/ir-printing-file-tree.mlir.tmp/builtin_module_outer/builtin_module_inner/func_func_symB/0_0_1_canonicalize.mlir
# executed command: test '!' -f /home/gha/actions-runner/_work/llvm-project/llvm-project/build/tools/mlir/test/Pass/Output/ir-printing-file-tree.mlir.tmp/builtin_module_outer/builtin_module_inner/func_func_symB/0_0_1_canonicalize.mlir
# note: command had no output on stdout or stderr
# RUN: at line 8
test -f /home/gha/actions-runner/_work/llvm-project/llvm-project/build/tools/mlir/test/Pass/Output/ir-printing-file-tree.mlir.tmp/builtin_module_outer/builtin_module_inner/func_func_symC/0_0_0_cse.mlir
# executed command: test -f /home/gha/actions-runner/_work/llvm-project/llvm-project/build/tools/mlir/test/Pass/Output/ir-printing-file-tree.mlir.tmp/builtin_module_outer/builtin_module_inner/func_func_symC/0_0_0_cse.mlir
# note: command had no output on stdout or stderr
# RUN: at line 9
test ! -f /home/gha/actions-runner/_work/llvm-project/llvm-project/build/tools/mlir/test/Pass/Output/ir-printing-file-tree.mlir.tmp/builtin_module_outer/builtin_module_inner/func_func_symC/0_0_1_canonicalize.mlir
# executed command: test '!' -f /home/gha/actions-runner/_work/llvm-project/llvm-project/build/tools/mlir/test/Pass/Output/ir-printing-file-tree.mlir.tmp/builtin_module_outer/builtin_module_inner/func_func_symC/0_0_1_canonicalize.mlir
# note: command had no output on stdout or stderr
# RUN: at line 12
rm -rf /home/gha/actions-runner/_work/llvm-project/llvm-project/build/tools/mlir/test/Pass/Output/ir-printing-file-tree.mlir.tmp || true
# executed command: rm -rf /home/gha/actions-runner/_work/llvm-project/llvm-project/build/tools/mlir/test/Pass/Output/ir-printing-file-tree.mlir.tmp
# note: command had no output on stdout or stderr
# RUN: at line 13
/home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/mlir-opt /home/gha/actions-runner/_work/llvm-project/llvm-project/mlir/test/Pass/ir-printing-file-tree.mlir -mlir-print-ir-tree-dir=/home/gha/actions-runner/_work/llvm-project/llvm-project/build/tools/mlir/test/Pass/Output/ir-printing-file-tree.mlir.tmp    -pass-pipeline='builtin.module(canonicalize,canonicalize,func.func(cse),builtin.module(canonicalize,func.func(cse,canonicalize),cse),cse)'    -mlir-print-ir-after-all
# executed command: /home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/mlir-opt /home/gha/actions-runner/_work/llvm-project/llvm-project/mlir/test/Pass/ir-printing-file-tree.mlir -mlir-print-ir-tree-dir=/home/gha/actions-runner/_work/llvm-project/llvm-project/build/tools/mlir/test/Pass/Output/ir-printing-file-tree.mlir.tmp '-pass-pipeline=builtin.module(canonicalize,canonicalize,func.func(cse),builtin.module(canonicalize,func.func(cse,canonicalize),cse),cse)' -mlir-print-ir-after-all
# note: command had no output on stdout or stderr
# error: command failed with exit status: -9
# error: command reached timeout: True

--

If these failures are unrelated to your changes (for example tests are broken or flaky at HEAD), please open an issue at https://github.com/llvm/llvm-project/issues and add the infrastructure label.

printScheduleModel(ReadyCyclesSorted);
dbgs() << "\n\t"
<< "Metric: "
dbgs() << "\n\t" << "Metric: "
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
dbgs() << "\n\t" << "Metric: "
dbgs() << "\n\tMetric: "

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Or just remove this change completely, since it's unrelated)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the catch, yes this is a spurious whole file clang-format :)

&LiveIns[RegionIdx])) {
// Advance can produce false on a non-empty region if all MIs in the region
// are debug values; in such cases the maintained max pressure is invalid
// and the only source of pressure are the region's live-ins.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we maybe teach it to return the correct pressure instead? Or would that be too complicated?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's possible for the RP tracker to compute the correct pressure when a LiveRegSet is provided to advance/reset, since it can just call llvm::getRegPressure. When a LiveRegSet is not provided however, the tracker must look for a SlotIndex at which to estimate pressure. When there are no non-debug MIs in the block, I don't see a way to derive such an index and therefore to compute the correct pressure. It felt weird to me to have this dependency on LiveRegSet to be able to compute the correct pressure in such cases.

// of the block, even though it is already past Dbg1. It still returns false
// because it is stopped by the end of block delimiter, not the end
// iterator.
EXPECT_FALSE(RPTracker.advance(Dbg1));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should still test advance (just change the comment).

@lucas-rami lucas-rami force-pushed the users/lucas-rami/pre-commit-test-gcnrp-semantics branch 2 times, most recently from c35c449 to d4712be Compare May 8, 2026 16:31
lucas-rami added 2 commits May 8, 2026 16:33
Some of the variants of `advance` and `reset` in the `GCNRPTracker` and
`GCNDownwardRPTracker` had unclear/inconsistent semantics on their
return value. This aims to clarify that through improved documentation
and light functional changes.

These inconsistencies ultimately triggered an assert in
`GCNRPTaget::saveRP` on a complex kernel during scheduling.
`GCNScheduleDAGMILive::getRealRegPressure` would incorrectly return a
null pressure for a non-empty region which only had debug values. Such
regions can arise if the `PreRARematStage` rematerializes all non-debug
instructions out of their original region, leaving only debug values.
Attempting to rematerialize registers across that same region afterwards
would trigger the assert.
@lucas-rami lucas-rami force-pushed the users/lucas-rami/gcnrp-semantics branch from 066f9c3 to 895b992 Compare May 8, 2026 16:33
@lucas-rami lucas-rami deleted the branch llvm:users/lucas-rami/pre-commit-test-gcnrp-semantics May 8, 2026 17:35
@lucas-rami lucas-rami closed this May 8, 2026
@lucas-rami
Copy link
Copy Markdown
Contributor Author

lucas-rami commented May 8, 2026

I have no idea why but this got closed automatically when I landed #196098 (the previous PR in the stack).

Edit: looks like I pushed the branch for this PR on my LLVM fork and that prevents GitHub from automatically switching its base to main when the previous PR landed. GitHub won't let me re-open it so I'll have to re-created a PR. Sorry for the inconvenience :/

Edit 2: recreated at #196595

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants