-
Notifications
You must be signed in to change notification settings - Fork 10.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update amdgpu_gfx functions to use s0-s3 for inreg SGPR arguments on targets using scratch instructions for stack #78226 #81394
Conversation
Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write If you have received no comments on your PR for a week, you can request a review If you have further questions, they may be answered by the LLVM GitHub User Guide. You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums. |
@llvm/pr-subscribers-llvm-globalisel @llvm/pr-subscribers-backend-amdgpu Author: None (SahilPatidar) ChangesResolve #78226 Full diff: https://github.com/llvm/llvm-project/pull/81394.diff 3 Files Affected:
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp b/llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp
index d3b2cb1936b53e..292d7ed74dfb1c 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp
@@ -715,10 +715,6 @@ bool AMDGPUCallLowering::lowerFormalArguments(
if (!IsEntryFunc && !IsGraphics) {
// For the fixed ABI, pass workitem IDs in the last argument register.
TLI.allocateSpecialInputVGPRsFixed(CCInfo, MF, *TRI, *Info);
-
- if (!Subtarget.enableFlatScratch())
- CCInfo.AllocateReg(Info->getScratchRSrcReg());
- TLI.allocateSpecialInputSGPRs(CCInfo, MF, *TRI, *Info);
}
IncomingValueAssigner Assigner(AssignFn);
@@ -732,9 +728,14 @@ bool AMDGPUCallLowering::lowerFormalArguments(
uint64_t StackSize = Assigner.StackSize;
// Start adding system SGPRs.
- if (IsEntryFunc)
+ if (IsEntryFunc) {
TLI.allocateSystemSGPRs(CCInfo, MF, *Info, CC, IsGraphics);
-
+ } else {
+ if (!Subtarget.enableFlatScratch())
+ CCInfo.AllocateReg(Info->getScratchRSrcReg());
+ if (!IsGraphics)
+ TLI.allocateSpecialInputSGPRs(CCInfo, MF, *TRI, *Info);
+ }
// When we tail call, we need to check if the callee's arguments will fit on
// the caller's stack. So, whenever we lower formal arguments, we should keep
// track of this information, since we might lower a tail call in this
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUCallingConv.td b/llvm/lib/Target/AMDGPU/AMDGPUCallingConv.td
index c5207228dc913f..4c922a81c02efd 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUCallingConv.td
+++ b/llvm/lib/Target/AMDGPU/AMDGPUCallingConv.td
@@ -23,6 +23,7 @@ def CC_SI_Gfx : CallingConv<[
// 33 is reserved for the frame pointer
// 34 is reserved for the base pointer
CCIfInReg<CCIfType<[f32, i32, f16, i16, v2i16, v2f16, bf16, v2bf16] , CCAssignToReg<[
+ SGPR0, SGPR1, SGPR2, SGPR3,
SGPR4, SGPR5, SGPR6, SGPR7,
SGPR8, SGPR9, SGPR10, SGPR11, SGPR12, SGPR13, SGPR14, SGPR15,
SGPR16, SGPR17, SGPR18, SGPR19, SGPR20, SGPR21, SGPR22, SGPR23,
diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index cc0c4d4e36eaa8..8eb0d05615c187 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -2784,12 +2784,6 @@ SDValue SITargetLowering::LowerFormalArguments(
} else if (!IsGraphics) {
// For the fixed ABI, pass workitem IDs in the last argument register.
allocateSpecialInputVGPRsFixed(CCInfo, MF, *TRI, *Info);
-
- // FIXME: Sink this into allocateSpecialInputSGPRs
- if (!Subtarget->enableFlatScratch())
- CCInfo.AllocateReg(Info->getScratchRSrcReg());
-
- allocateSpecialInputSGPRs(CCInfo, MF, *TRI, *Info);
}
if (!IsKernel) {
@@ -2993,8 +2987,14 @@ SDValue SITargetLowering::LowerFormalArguments(
}
// Start adding system SGPRs.
- if (IsEntryFunc)
+ if (IsEntryFunc) {
allocateSystemSGPRs(CCInfo, MF, *Info, CallConv, IsGraphics);
+ } else {
+ if (!Subtarget->enableFlatScratch())
+ CCInfo.AllocateReg(Info->getScratchRSrcReg());
+ if (!IsGraphics)
+ allocateSpecialInputSGPRs(CCInfo, MF, *TRI, *Info);
+ }
auto &ArgUsageInfo =
DAG.getPass()->getAnalysis<AMDGPUArgumentUsageInfo>();
|
@arsenm , |
…gets using scratch instructions for stack llvm#78226
7f25f52
to
8f60400
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this is correct but needs the test updates
Duplicate #78553 |
@arsenm, |
@arsenm Testing Time: 3196.43s Total Discovered Tests: 3484 Total of 20 cases failed, but when I updated, around 5 to 6 tests crashed. |
That indicates a free register wasn't found available for an input. What do the function signatures look like for the cases that hit this? |
@arsenm, |
And another crash is in this file: llvm/test/CodeGen/AMDGPU/amdgpu-simplify-libcall-pow-codegen.ll. Assertion failed: (castIsValid(op, S, Ty) && "Invalid cast!"), function Create, file /Users/sahilpatidar/Desktop/llvm-project-17.0.3.src/llvm/lib/IR/Instructions.cpp, line 3427. |
I don't see how you could have run into this with this patch. It's not even running codegen |
Are you running something else in a release? |
I'm running this command - |
@arsenm, |
update_llc_test_checks doesn't know how to run opt. For combined tests, I usually hack around it by deleting the opt/llc run lines and then running the appropriate scripts, before restoring the run lines. You can also try to use update_any_test_checks, but I haven't used that successfully before |
for this test function |
@arsenm,
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, thanks
@SahilPatidar Congratulations on having your first Pull Request (PR) merged into the LLVM Project! Your changes will be combined with recent changes from other authors, then tested Please check whether problems have been caused by your change specifically, as How to do this, and the rest of the post-merge process, is covered in detail here. If your change does cause a problem, it may be reverted, or you can revert it yourself. If you don't get any reports, no action is required from you. Your changes are working as expected, well done! |
In llvm#81394, calls to amdgpu_gfx functions are now allowed to use s0-s3 for inreg arguments. This causes a regression in an offline lit test, where we call an external compute library function from a compute shader. The changes leave registers from the s0-s3 range to not be live-in in the MBB containing the SI_CALL instruction. This seems to be caused by a missing Gfx CC check in `SITargetLowering::LowerCall`, where we insert a `CopyFromReg` from a call chain to either s48-s51 or s0-s3. Since by the now missing copy at the beginning of the MBB, SGPR0-SGPR3 are not implicitly live anymore, the lowering of the call still using SGPR0-SGPR3 will also fail, so we should not insert the `CopyFromReg` into the chain as well.
…ents on targets using scratch instructions for stack llvm#78226 (llvm#81394)" This reverts commit 3ac243b. It is not handling RSrc registers s0-s3 correctly. This leads to a broken test, where it expects s0-s3 as function argument and uses it as RSrc register as well. We need to re-visit the patch, but we only want to have s0-s3 as argument registers if we don't need them as RSrc registers.
…targets using scratch instructions for stack llvm#78226 (llvm#81394) Resolve llvm#78226
Hi @arsenm @SahilPatidar Please take a look at the referenced revert PR #86273, since the change introduced here breaks some tests. |
…ents on targets using scratch instructions for stack #78226" (#86273) Reverts #81394 This reverts commit 3ac243b. It is not handling RSrc registers s0-s3 correctly. This leads to a broken test, where it expects s0-s3 as function argument and uses it as RSrc register as well. We need to re-visit the patch, but apparently we only want to have s0-s3 as argument registers if we don't need them as RSrc registers.
Resolve #78226