Update amdgpu_gfx functions to use s0-s3 for inreg SGPR arguments on targets using scratch instructions for stack #78226 #81394

Merged
arsenm merged 3 commits into llvm:main on Mar 21, 2024

Conversation

SahilPatidar
Contributor

Resolve #78226


Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be
notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write
permissions for the repository. In which case you can instead tag reviewers by
name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review
by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate
is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot
Collaborator

llvmbot commented Feb 11, 2024

@llvm/pr-subscribers-llvm-globalisel

@llvm/pr-subscribers-backend-amdgpu

Author: None (SahilPatidar)

Changes

Resolve #78226


Full diff: https://github.com/llvm/llvm-project/pull/81394.diff

3 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp (+7-6)
  • (modified) llvm/lib/Target/AMDGPU/AMDGPUCallingConv.td (+1)
  • (modified) llvm/lib/Target/AMDGPU/SIISelLowering.cpp (+7-7)
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp b/llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp
index d3b2cb1936b53e..292d7ed74dfb1c 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp
@@ -715,10 +715,6 @@ bool AMDGPUCallLowering::lowerFormalArguments(
   if (!IsEntryFunc && !IsGraphics) {
     // For the fixed ABI, pass workitem IDs in the last argument register.
     TLI.allocateSpecialInputVGPRsFixed(CCInfo, MF, *TRI, *Info);
-
-    if (!Subtarget.enableFlatScratch())
-      CCInfo.AllocateReg(Info->getScratchRSrcReg());
-    TLI.allocateSpecialInputSGPRs(CCInfo, MF, *TRI, *Info);
   }
 
   IncomingValueAssigner Assigner(AssignFn);
@@ -732,9 +728,14 @@ bool AMDGPUCallLowering::lowerFormalArguments(
   uint64_t StackSize = Assigner.StackSize;
 
   // Start adding system SGPRs.
-  if (IsEntryFunc)
+  if (IsEntryFunc) {
     TLI.allocateSystemSGPRs(CCInfo, MF, *Info, CC, IsGraphics);
-
+  } else {
+    if (!Subtarget.enableFlatScratch())
+      CCInfo.AllocateReg(Info->getScratchRSrcReg());
+    if (!IsGraphics)
+      TLI.allocateSpecialInputSGPRs(CCInfo, MF, *TRI, *Info);
+  }
   // When we tail call, we need to check if the callee's arguments will fit on
   // the caller's stack. So, whenever we lower formal arguments, we should keep
   // track of this information, since we might lower a tail call in this
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUCallingConv.td b/llvm/lib/Target/AMDGPU/AMDGPUCallingConv.td
index c5207228dc913f..4c922a81c02efd 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUCallingConv.td
+++ b/llvm/lib/Target/AMDGPU/AMDGPUCallingConv.td
@@ -23,6 +23,7 @@ def CC_SI_Gfx : CallingConv<[
   // 33 is reserved for the frame pointer
   // 34 is reserved for the base pointer
   CCIfInReg<CCIfType<[f32, i32, f16, i16, v2i16, v2f16, bf16, v2bf16] , CCAssignToReg<[
+    SGPR0, SGPR1, SGPR2, SGPR3,
     SGPR4, SGPR5, SGPR6, SGPR7,
     SGPR8, SGPR9, SGPR10, SGPR11, SGPR12, SGPR13, SGPR14, SGPR15,
     SGPR16, SGPR17, SGPR18, SGPR19, SGPR20, SGPR21, SGPR22, SGPR23,
diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index cc0c4d4e36eaa8..8eb0d05615c187 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -2784,12 +2784,6 @@ SDValue SITargetLowering::LowerFormalArguments(
   } else if (!IsGraphics) {
     // For the fixed ABI, pass workitem IDs in the last argument register.
     allocateSpecialInputVGPRsFixed(CCInfo, MF, *TRI, *Info);
-
-    // FIXME: Sink this into allocateSpecialInputSGPRs
-    if (!Subtarget->enableFlatScratch())
-      CCInfo.AllocateReg(Info->getScratchRSrcReg());
-
-    allocateSpecialInputSGPRs(CCInfo, MF, *TRI, *Info);
   }
 
   if (!IsKernel) {
@@ -2993,8 +2987,14 @@ SDValue SITargetLowering::LowerFormalArguments(
   }
 
   // Start adding system SGPRs.
-  if (IsEntryFunc)
+  if (IsEntryFunc) {
     allocateSystemSGPRs(CCInfo, MF, *Info, CallConv, IsGraphics);
+  } else {
+    if (!Subtarget->enableFlatScratch())
+      CCInfo.AllocateReg(Info->getScratchRSrcReg());
+    if (!IsGraphics)
+      allocateSpecialInputSGPRs(CCInfo, MF, *TRI, *Info);
+  }
 
   auto &ArgUsageInfo =
     DAG.getPass()->getAnalysis<AMDGPUArgumentUsageInfo>();
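
The diff above moves the scratch RSrc and special-input SGPR reservations from before argument assignment to after it, and adds SGPR0-SGPR3 to the front of the `CC_SI_Gfx` inreg list. As a rough illustration of why that ordering matters, here is a minimal toy model. `ToyCCState`, `assignToyInregArgs`, and the s0..s29 candidate range are hypothetical names and numbers chosen for the sketch; this is not the LLVM `CCState` API and does not reproduce either lowering function.

```cpp
#include <cassert>
#include <optional>
#include <set>
#include <vector>

// Hypothetical stand-in for the calling-convention state; not LLVM's CCState.
struct ToyCCState {
  std::set<unsigned> Taken; // SGPR indices already handed out

  // Mark one specific register as used, in the spirit of CCState::AllocateReg.
  void reserve(unsigned Reg) { Taken.insert(Reg); }

  // Hand out the first candidate that is still free, if any.
  std::optional<unsigned>
  allocateFirstFree(const std::vector<unsigned> &Candidates) {
    for (unsigned Reg : Candidates)
      if (Taken.insert(Reg).second)
        return Reg;
    return std::nullopt;
  }
};

// Assign NumArgs inreg arguments from the candidate list s0..s29 (toy bound).
// ReserveFirst selects whether s0-s3 are reserved before or after assignment.
std::vector<unsigned> assignToyInregArgs(unsigned NumArgs, bool ReserveFirst) {
  ToyCCState CC;
  std::vector<unsigned> Candidates;
  for (unsigned Reg = 0; Reg < 30; ++Reg)
    Candidates.push_back(Reg);

  if (ReserveFirst)
    for (unsigned Reg = 0; Reg < 4; ++Reg)
      CC.reserve(Reg);

  std::vector<unsigned> Assigned;
  for (unsigned I = 0; I < NumArgs; ++I) {
    std::optional<unsigned> Reg = CC.allocateFirstFree(Candidates);
    assert(Reg && "toy model: ran out of candidate registers");
    Assigned.push_back(*Reg);
  }

  if (!ReserveFirst)
    for (unsigned Reg = 0; Reg < 4; ++Reg)
      CC.reserve(Reg); // no effect on arguments that already took s0-s3

  return Assigned;
}
```

In this toy, `assignToyInregArgs(2, true)` places the two arguments in s4 and s5, while `assignToyInregArgs(2, false)` places them in s0 and s1. A reservation made after assignment cannot keep arguments out of a register, which is the kind of overlap a reviewer would want the updated tests to cover.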

@SahilPatidar SahilPatidar changed the title amdgpu_gfx functions do not use s0-s3 for inreg SGPR arguments on targets using scratch instructions for stack #78226 Update amdgpu_gfx functions to use s0-s3 for inreg SGPR arguments on targets using scratch instructions for stack #78226 Feb 11, 2024
@SahilPatidar
Contributor Author

SahilPatidar commented Feb 15, 2024

@arsenm,
please let me know if anything is incorrect.

Contributor

@arsenm arsenm left a comment

I believe this is correct, but it needs the test updates.

@arsenm
Contributor

arsenm commented Feb 29, 2024

Duplicate #78553

@SahilPatidar
Contributor Author

@arsenm,
why are some test updates crashing with this output:
Assertion failed: (Reg != AMDGPU::NoRegister), function allocateFixedSGPRInputImpl, file /Users/sahilpatidar/Desktop/llvm/llvm-project/llvm/lib/Target/AMDGPU/SIISelLowering.cpp, line 2292.

@SahilPatidar
Contributor Author

SahilPatidar commented Feb 29, 2024

@arsenm
Failed Tests (20):
LLVM :: CodeGen/AMDGPU/GlobalISel/irtranslator-call-non-fixed.ll
LLVM :: CodeGen/AMDGPU/GlobalISel/irtranslator-call.ll
LLVM :: CodeGen/AMDGPU/GlobalISel/irtranslator-function-args.ll
LLVM :: CodeGen/AMDGPU/amdgpu-simplify-libcall-pow-codegen.ll
LLVM :: CodeGen/AMDGPU/bf16.ll
LLVM :: CodeGen/AMDGPU/call-args-inreg.ll
LLVM :: CodeGen/AMDGPU/combine_andor_with_cmps.ll
LLVM :: CodeGen/AMDGPU/flat_atomics_i32_system.ll
LLVM :: CodeGen/AMDGPU/flat_atomics_i64_system.ll
LLVM :: CodeGen/AMDGPU/fsub-as-fneg-src-modifier.ll
LLVM :: CodeGen/AMDGPU/function-args-inreg.ll
LLVM :: CodeGen/AMDGPU/gfx-callable-argument-types.ll
LLVM :: CodeGen/AMDGPU/global_atomics_i32_system.ll
LLVM :: CodeGen/AMDGPU/global_atomics_i64_system.ll
LLVM :: CodeGen/AMDGPU/indirect-call.ll
LLVM :: CodeGen/AMDGPU/preserve-wwm-copy-dst-reg.ll
LLVM :: CodeGen/AMDGPU/schedule-addrspaces.ll
LLVM :: CodeGen/AMDGPU/scratch-pointer-sink.ll
LLVM :: CodeGen/AMDGPU/stacksave_stackrestore.ll
LLVM :: CodeGen/AMDGPU/wwm-reserved-spill.ll

Testing Time: 3196.43s

Total Discovered Tests: 3484
Unsupported : 1 (0.03%)
Passed : 3455 (99.17%)
Expectedly Failed: 8 (0.23%)
Failed : 20 (0.57%)

A total of 20 tests failed, and when I tried to update them, around 5 to 6 of the updates crashed.

@arsenm
Contributor

arsenm commented Feb 29, 2024

@arsenm, why are some test updates crashing with this output: Assertion failed: (Reg != AMDGPU::NoRegister), function allocateFixedSGPRInputImpl, file /Users/sahilpatidar/Desktop/llvm/llvm-project/llvm/lib/Target/AMDGPU/SIISelLowering.cpp, line 2292.

That indicates a free register wasn't found available for an input. What do the function signatures look like for the cases that hit this?
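
To make the failure mode above concrete, here is a minimal sketch of a register pool that is exhausted by inreg arguments before the fixed special inputs get their turn. `ToyRegPool`, `toySpecialInputsFit`, and every size passed to them are hypothetical; they are not the real AMDGPU register counts, and the real `allocateFixedSGPRInputImpl` is not reproduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical register pool; the sizes used with it are made up.
struct ToyRegPool {
  std::vector<bool> Taken;
  explicit ToyRegPool(unsigned NumRegs) : Taken(NumRegs, false) {}

  // Marks the first free register and returns true, or returns false if
  // every register is already taken.
  bool allocate() {
    for (std::size_t I = 0; I < Taken.size(); ++I) {
      if (!Taken[I]) {
        Taken[I] = true;
        return true;
      }
    }
    return false;
  }
};

// Returns true if all special inputs still find a register after the inreg
// arguments have been assigned from the same pool.
bool toySpecialInputsFit(unsigned PoolSize, unsigned NumInregArgs,
                         unsigned NumSpecialInputs) {
  assert(PoolSize > 0 && "toy pool needs at least one register");
  ToyRegPool Pool(PoolSize);
  for (unsigned I = 0; I < NumInregArgs; ++I)
    Pool.allocate(); // arguments that do not fit are simply dropped here
  for (unsigned I = 0; I < NumSpecialInputs; ++I)
    if (!Pool.allocate())
      return false; // roughly where the reported assertion would fire
  return true;
}
```

With a made-up pool of 16 registers, 13 inreg arguments followed by 4 special inputs do not fit (`toySpecialInputsFit(16, 13, 4)` is false), whereas 4 arguments followed by 4 special inputs do. Whether this matches the real register budget for the failing signature is an assumption of the sketch.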

@SahilPatidar
Contributor Author

@arsenm,
for this test function define void @void_func_a13i32_inreg([13 x i32] inreg %arg0, ptr addrspace(1) %ptr)

@SahilPatidar
Contributor Author

And another crash is in this file: llvm/test/CodeGen/AMDGPU/amdgpu-simplify-libcall-pow-codegen.ll.

Assertion failed: (castIsValid(op, S, Ty) && "Invalid cast!"), function Create, file /Users/sahilpatidar/Desktop/llvm-project-17.0.3.src/llvm/lib/IR/Instructions.cpp, line 3427.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0. Program arguments: opt -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -passes=amdgpu-simplifylib,instcombine -amdgpu-prelink
#0 0x0000000105ee7968 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/usr/local/bin/opt+0x105ba3968)
#1 0x0000000105ee7ebc PrintStackTraceSignalHandler(void*) (/usr/local/bin/opt+0x105ba3ebc)
#2 0x0000000105ee5cb8 llvm::sys::RunSignalHandlers() (/usr/local/bin/opt+0x105ba1cb8)
#3 0x0000000105ee9af4 SignalHandler(int) (/usr/local/bin/opt+0x105ba5af4)
#4 0x0000000186cbda24 (/usr/lib/system/libsystem_platform.dylib+0x18046da24)
#5 0x0000000186c8dcc0 (/usr/lib/system/libsystem_pthread.dylib+0x18043dcc0)
#6 0x0000000186b99a40 (/usr/lib/system/libsystem_c.dylib+0x180349a40)
#7 0x0000000186b98d30 (/usr/lib/system/libsystem_c.dylib+0x180348d30)
#8 0x00000001047e7138 llvm::CastInst::Create(llvm::Instruction::CastOps, llvm::Value*, llvm::Type*, llvm::Twine const&, llvm::Instruction*) (/usr/local/bin/opt+0x1044a3138)
#9 0x0000000105bb2c60 llvm::IRBuilderBase::CreateCast(llvm::Instruction::CastOps, llvm::Value*, llvm::Type*, llvm::Twine const&) (/usr/local/bin/opt+0x10586ec60)
#10 0x0000000105bb2e1c llvm::IRBuilderBase::CreateBitCast(llvm::Value*, llvm::Type*, llvm::Twine const&) (/usr/local/bin/opt+0x10586ee1c)
#11 0x0000000100ee816c llvm::AMDGPULibCalls::fold_pow(llvm::CallInst*, llvm::IRBuilder<llvm::ConstantFolder, llvm::IRBuilderDefaultInserter>&, llvm::AMDGPULibFunc const&) (/usr/local/bin/opt+0x100ba416c)
#12 0x0000000100ee5a70 llvm::AMDGPULibCalls::fold(llvm::CallInst*, llvm::AAResults*) (/usr/local/bin/opt+0x100ba1a70)
#13 0x0000000100eeb4b4 llvm::AMDGPUSimplifyLibCallsPass::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (/usr/local/bin/opt+0x100ba74b4)
#14 0x0000000101092ee8 llvm::detail::PassModel<llvm::Function, llvm::AMDGPUSimplifyLibCallsPass, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (/usr/local/bin/opt+0x100d4eee8)
#15 0x00000001048f9574 llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (/usr/local/bin/opt+0x1045b5574)
#16 0x000000010109d3d8 llvm::detail::PassModel<llvm::Function, llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (/usr/local/bin/opt+0x100d593d8)
#17 0x0000000104900870 llvm::ModuleToFunctionPassAdaptor::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/usr/local/bin/opt+0x1045bc870)
#18 0x000000010109cf24 llvm::detail::PassModel<llvm::Module, llvm::ModuleToFunctionPassAdaptor, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/usr/local/bin/opt+0x100d58f24)
#19 0x00000001048f8720 llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/usr/local/bin/opt+0x1045b4720)
#20 0x000000010034ba64 llvm::runPassPipeline(llvm::StringRef, llvm::Module&, llvm::TargetMachine*, llvm::TargetLibraryInfoImpl*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::StringRef, llvm::ArrayRef<llvm::PassPlugin>, llvm::opt_tool::OutputKind, llvm::opt_tool::VerifierKind, bool, bool, bool, bool, bool, bool, bool) (/usr/local/bin/opt+0x100007a64)
#21 0x000000010039e3dc main (/usr/local/bin/opt+0x10005a3dc)
#22 0x000000018690d0e0
/bin/sh: line 1: 3274 Abort trap: 6 opt -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -passes=amdgpu-simplifylib,instcombine -amdgpu-prelink < llvm/test/CodeGen/AMDGPU/amdgpu-simplify-libcall-pow-codegen.ll

@arsenm
Contributor

arsenm commented Mar 14, 2024

And another crash is in this file: llvm/test/CodeGen/AMDGPU/amdgpu-simplify-libcall-pow-codegen.ll.

I don't see how you could have run into this with this patch. It's not even running codegen

@arsenm
Contributor

arsenm commented Mar 14, 2024

And another crash is in this file: llvm/test/CodeGen/AMDGPU/amdgpu-simplify-libcall-pow-codegen.ll.

I don't see how you could have run into this with this patch. It's not even running codegen

/Users/sahilpatidar/Desktop/llvm-project-17.0.3.src

Are you running something else in a release?

@SahilPatidar
Contributor Author

I'm running this command - python3 llvm/utils/update_llc_test_checks.py --llc-binary=build/bin/llc llvm/test/CodeGen/AMDGPU/amdgpu-simplify-libcall-pow-codegen.ll

@SahilPatidar
Contributor Author

@arsenm,
The issue might be with the system-installed opt binary. Is there a way to include the opt binary from the build?

@arsenm
Contributor

arsenm commented Mar 15, 2024

@arsenm, The issue might be with the system-installed opt binary. Is there a way to include the opt binary from the build?

update_llc_test_checks doesn't know how to run opt. For combined tests, I usually hack around it by deleting the opt/llc run lines and then running the appropriate scripts, before restoring the run lines.

You can also try to use update_any_test_checks, but I haven't used that successfully before

@SahilPatidar
Contributor Author

@arsenm, why are some test updates crashing with this output: Assertion failed: (Reg != AMDGPU::NoRegister), function allocateFixedSGPRInputImpl, file /Users/sahilpatidar/Desktop/llvm/llvm-project/llvm/lib/Target/AMDGPU/SIISelLowering.cpp, line 2292.

That indicates a free register wasn't found available for an input. What do the function signatures look like for the cases that hit this?

for this test function define void @void_func_a13i32_inreg([13 x i32] inreg %arg0, ptr addrspace(1) %ptr)

@SahilPatidar
Contributor Author

@arsenm,
here are the results after the updates; a few cases are expected failures. Let me know if they need to be updated.

Total Discovered Tests: 3484
  Unsupported      :   49 (1.41%)
  Passed           : 3429 (98.42%)
  Expectedly Failed:    6 (0.17%)

Contributor

@arsenm arsenm left a comment

lgtm, thanks

@arsenm arsenm merged commit 3ac243b into llvm:main Mar 21, 2024
4 of 5 checks passed

@SahilPatidar Congratulations on having your first Pull Request (PR) merged into the LLVM Project!

Your changes will be combined with recent changes from other authors, then tested
by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR.

Please check whether problems have been caused by your change specifically, as
the builds can include changes from many authors. It is not uncommon for your
change to be included in a build that fails due to someone else's changes, or
infrastructure issues.

How to do this, and the rest of the post-merge process, is covered in detail here.

If your change does cause a problem, it may be reverted, or you can revert it yourself.
This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again.

If you don't get any reports, no action is required from you. Your changes are working as expected, well done!

tsymalla pushed a commit to tsymalla/llvm-project that referenced this pull request Mar 22, 2024
Since llvm#81394, calls to amdgpu_gfx
functions are allowed to use s0-s3 for inreg arguments.
This causes a regression in an offline lit test, where we call an
external compute library function from a compute shader.
The change leaves registers in the s0-s3 range not live-in to
the MBB containing the SI_CALL instruction.
This seems to be caused by a missing Gfx CC check in
`SITargetLowering::LowerCall`, where we insert a `CopyFromReg` from a
call chain to either s48-s51 or s0-s3.
Because the copy at the beginning of the MBB is now missing, SGPR0-SGPR3
are no longer implicitly live, so lowering a call that still uses
SGPR0-SGPR3 will also fail. We therefore should not insert the
`CopyFromReg` into the chain either.
tsymalla pushed a commit to tsymalla/llvm-project that referenced this pull request Mar 22, 2024
…ents on targets using scratch instructions for stack llvm#78226 (llvm#81394)"

This reverts commit 3ac243b.
The commit does not handle the RSrc registers s0-s3 correctly. This breaks
a test that expects s0-s3 as function arguments and also uses them as the
RSrc register.
We need to revisit the patch, but we only want to have s0-s3 as
argument registers if we don't need them as RSrc registers.
chencha3 pushed a commit to chencha3/llvm-project that referenced this pull request Mar 23, 2024
@tsymalla
Contributor

Hi @arsenm @SahilPatidar, please take a look at the referenced revert PR #86273, since the change introduced here breaks some tests.

tsymalla added a commit that referenced this pull request Mar 26, 2024
…ents on targets using scratch instructions for stack #78226" (#86273)

Reverts #81394

This reverts commit 3ac243b.
The commit does not handle the RSrc registers s0-s3 correctly. This breaks
a test that expects s0-s3 as function arguments and also uses them as the
RSrc register.
We need to revisit the patch, but apparently we only want to have s0-s3
as argument registers if we don't need them as RSrc registers.
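
One possible reading of the constraint stated in the revert message, sketched as a toy helper: s0-s3 join the inreg candidate list only when they are not needed for the scratch RSrc. `toyInregCandidates`, its `NeedsScratchRSrc` parameter, and the default upper bound of 29 are hypothetical and chosen for illustration; this is not the fix that was eventually adopted, and it is not how the TableGen calling convention is expressed.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: build the ordered list of SGPR indices that inreg
// arguments may use. LastReg is an arbitrary illustrative upper bound.
std::vector<unsigned> toyInregCandidates(bool NeedsScratchRSrc,
                                         unsigned LastReg = 29) {
  assert(LastReg >= 4 && "toy model expects at least s0-s4 to exist");
  // Skip s0-s3 when they must stay available as the scratch RSrc.
  unsigned First = NeedsScratchRSrc ? 4 : 0;
  std::vector<unsigned> Regs;
  for (unsigned Reg = First; Reg <= LastReg; ++Reg)
    Regs.push_back(Reg);
  return Regs;
}
```

Under these assumptions, `toyInregCandidates(true)` starts at s4 and `toyInregCandidates(false)` starts at s0, so the four extra argument registers appear only when no RSrc reservation competes for them.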

Successfully merging this pull request may close these issues.

amdgpu_gfx functions do not use s0-s3 for inreg SGPR arguments on targets using scratch instructions for stack
4 participants