-
Notifications
You must be signed in to change notification settings - Fork 15.2k
[AMDGPU] Use subtarget call to determine number of VGPRs #157927
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[AMDGPU] Use subtarget call to determine number of VGPRs #157927
Conversation
Since the register file was increased that is no longer valid to call VGPR_32RegClass.getNumregs() to get a total number of arch registers available on a subtarget. Fixes: SWDEV-554472
@llvm/pr-subscribers-backend-amdgpu Author: Stanislav Mekhanoshin (rampitec) ChangesSince the register file was increased that is no longer valid to Fixes: SWDEV-554472 Full diff: https://github.com/llvm/llvm-project/pull/157927.diff 3 Files Affected:
diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index 2a977247bc2cb..edce4856f77b0 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -3452,7 +3452,7 @@ bool SITargetLowering::CanLowerReturn(
// We must use the stack if return would require unavailable registers.
unsigned MaxNumVGPRs = Subtarget->getMaxNumVGPRs(MF);
- unsigned TotalNumVGPRs = AMDGPU::VGPR_32RegClass.getNumRegs();
+ unsigned TotalNumVGPRs = Subtarget->getAddressableNumArchVGPRs();
for (unsigned i = MaxNumVGPRs; i < TotalNumVGPRs; ++i)
if (CCInfo.isAllocated(AMDGPU::VGPR_32RegClass.getRegister(i)))
return false;
diff --git a/llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp b/llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
index 8a1120321af9f..54426d33d3473 100644
--- a/llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
@@ -86,7 +86,7 @@ SIMachineFunctionInfo::SIMachineFunctionInfo(const Function &F,
// FIXME: MayNeedAGPRs is a misnomer for how this is used. MFMA selection
// should be separated from availability of AGPRs
if (MFMAVGPRForm ||
- (ST.getMaxNumVGPRs(F) <= AMDGPU::VGPR_32RegClass.getNumRegs() &&
+ (ST.getMaxNumVGPRs(F) <= ST.getAddressableNumArchVGPRs() &&
!mayUseAGPRs(F)))
MayNeedAGPRs = false; // We will select all MAI with VGPR operands.
}
diff --git a/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp b/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
index 9f4f42185d9a0..40da4f96aefdb 100644
--- a/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
@@ -1399,13 +1399,16 @@ unsigned getTotalNumVGPRs(const MCSubtargetInfo *STI) {
return IsWave32 ? 1024 : 512;
}
-unsigned getAddressableNumArchVGPRs(const MCSubtargetInfo *STI) { return 256; }
+unsigned getAddressableNumArchVGPRs(const MCSubtargetInfo *STI) {
+ const auto &Features = STI->getFeatureBits();
+ if (Features.test(Feature1024AddressableVGPRs))
+ return Features.test(FeatureWavefrontSize32) ? 1024 : 512;
+ return 256;
+}
unsigned getAddressableNumVGPRs(const MCSubtargetInfo *STI,
unsigned DynamicVGPRBlockSize) {
const auto &Features = STI->getFeatureBits();
- if (Features.test(FeatureGFX1250Insts))
- return Features.test(FeatureWavefrontSize32) ? 1024 : 512;
if (Features.test(FeatureGFX90AInsts))
return 512;
|
Is the ticket number right? Also, test? |
Ugh, no, ticket number was wrong, fixed. That was from another tab in the browser :) The test is problematic, it is a scheduling difference in a huge kernel. All unavailable registers are reserved anyway, so it was not incorrect, just some heuristics shift under a very high pressure. This test will be slow and will not tell anything to anyone, including me and reporter a month later. |
But then even generally speaking |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Testcase?
I have to repeat myself: The test is problematic, it is a scheduling difference in a huge kernel. All unavailable registers are reserved anyway, so it was not incorrect, just some heuristics shift under a very high pressure. This test will be slow and will not tell anything to anyone, including me and reporter a month later. |
Like we do not have tests for most of the scheduling. |
// We must use the stack if return would require unavailable registers. | ||
unsigned MaxNumVGPRs = Subtarget->getMaxNumVGPRs(MF); | ||
unsigned TotalNumVGPRs = AMDGPU::VGPR_32RegClass.getNumRegs(); | ||
unsigned TotalNumVGPRs = Subtarget->getAddressableNumArchVGPRs(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This use is curious, we should only be looking at the number of registers used for return in the calling convention
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not even saying it was correct. But using registers which do not exist is not correct for sure. It has just... floated to the top.
// should be separated from availability of AGPRs | ||
if (MFMAVGPRForm || | ||
(ST.getMaxNumVGPRs(F) <= AMDGPU::VGPR_32RegClass.getNumRegs() && | ||
(ST.getMaxNumVGPRs(F) <= ST.getAddressableNumArchVGPRs() && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm working on deleting this part
This is well correlated with the scheduler being as buggy as it is |
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/66/builds/19115 Here is the relevant piece of the build log for the reference
|
Since the register file was increased that is no longer valid to
call VGPR_32RegClass.getNumregs() to get a total number of arch
registers available on a subtarget.
Fixes: SWDEV-550425