Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AMDGPU] Add calling convention check in call lowering. #86267

Closed
wants to merge 1 commit into from

Conversation

tsymalla
Copy link
Contributor

In #81394, calls to amdgpu_gfx functions are now allowed to use s0-s3 for inreg arguments. This causes a regression in an offline lit test, where we call an external compute library function from a compute shader. The changes leave registers from the s0-s3 range to not be live-in in the MBB containing the SI_CALL instruction.
This seems to be caused by a missing Gfx CC check in SITargetLowering::LowerCall, where we insert a CopyFromReg from a call chain to either s48-s51 or s0-s3.
Since by the now missing copy at the beginning of the MBB, SGPR0-SGPR3 are not implicitly live anymore, the lowering of the call still using SGPR0-SGPR3 will also fail, so we should not insert the CopyFromReg into the chain as well.

I hope that makes sense somehow.

In llvm#81394, calls to amdgpu_gfx
functions are now allowed to use s0-s3 for inreg arguments.
This causes a regression in an offline lit test, where we call an
external compute library function from a compute shader.
The changes leave registers from the s0-s3 range to not be live-in in
the MBB containing the SI_CALL instruction.
This seems to be caused by a missing Gfx CC check in
`SITargetLowering::LowerCall`, where we insert a `CopyFromReg` from a
call chain to either s48-s51 or s0-s3.
Since by the now missing copy at the beginning of the MBB, SGPR0-SGPR3
are not implicitly live anymore, the lowering of the call still using
SGPR0-SGPR3 will also fail, so we should not insert the `CopyFromReg`
into the chain as well.
@llvmbot
Copy link
Collaborator

llvmbot commented Mar 22, 2024

@llvm/pr-subscribers-backend-amdgpu

Author: Thomas Symalla (tsymalla)

Changes

In #81394, calls to amdgpu_gfx functions are now allowed to use s0-s3 for inreg arguments. This causes a regression in an offline lit test, where we call an external compute library function from a compute shader. The changes leave registers from the s0-s3 range to not be live-in in the MBB containing the SI_CALL instruction.
This seems to be caused by a missing Gfx CC check in SITargetLowering::LowerCall, where we insert a CopyFromReg from a call chain to either s48-s51 or s0-s3.
Since by the now missing copy at the beginning of the MBB, SGPR0-SGPR3 are not implicitly live anymore, the lowering of the call still using SGPR0-SGPR3 will also fail, so we should not insert the CopyFromReg into the chain as well.

I hope that makes sense somehow.


Full diff: https://github.com/llvm/llvm-project/pull/86267.diff

1 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/SIISelLowering.cpp (+1-1)
diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index 7f0cff72c18661..d82038bb6c51c5 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -3684,7 +3684,7 @@ SDValue SITargetLowering::LowerCall(CallLoweringInfo &CLI,
   if (!IsSibCall)
     Chain = DAG.getCALLSEQ_START(Chain, 0, 0, DL);
 
-  if (!IsSibCall || IsChainCallConv) {
+  if (!AMDGPU::isGraphics(CallConv) && (!IsSibCall || IsChainCallConv)) {
     if (!Subtarget->enableFlatScratch()) {
       SmallVector<SDValue, 4> CopyFromChains;
 

Copy link
Contributor

@arsenm arsenm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing test. I'd expect us to always emit the necessary parameter copies, which would often be identity copies to fold later

@tsymalla
Copy link
Contributor Author

Missing test.

Yeah, I am working towards a minimal example.

I'd expect us to always emit the necessary parameter copies, which would often be identity copies to fold later

I'd expect that as well, but it seems mentioned change causes the removal of these copies:

Working:

1392B	  ADJCALLSTACKUP 0, 0, implicit-def dead $scc, implicit-def $sgpr32, implicit $sgpr32
1408B	  %78:sgpr_128 = COPY $sgpr100_sgpr101_sgpr102_sgpr103
1424B	  $sgpr0_sgpr1_sgpr2_sgpr3 = COPY %78:sgpr_128
1840B	  $sgpr30_sgpr31 = SI_CALL killed %109:sreg_64, 0, <regmask $sgpr_null $sgpr_null_hi $src_private_base $src_private_base_hi $src_private_base_lo $src_private_limit $src_private_limit_hi $src_private_limit_lo $src_shared_base $src_shared_base_hi $src_shared_base_lo $src_shared_limit $src_shared_limit_hi $src_shared_limit_lo $sgpr4 $sgpr5 $sgpr6 $sgpr7 $sgpr8 $sgpr9 $sgpr10 $sgpr11 $sgpr12 $sgpr13 $sgpr14 $sgpr15 $sgpr16 $sgpr17 $sgpr18 $sgpr19 $sgpr20 $sgpr21 $sgpr22 and 1139 more...>, implicit $sgpr0_sgpr1_sgpr2_sgpr3, 

Broken:

1520B	  ADJCALLSTACKUP 0, 0, implicit-def dead $scc, implicit-def $sgpr32, implicit $sgpr32
1536B	  $sgpr0 = COPY %0:sgpr_32
1552B	  $sgpr1 = COPY %1:sgpr_32
1568B	  $sgpr2 = COPY %2:sgpr_32
1584B	  $sgpr3 = COPY %3:sgpr_32, implicit-def $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr0, implicit $sgpr1, implicit $sgpr2, implicit $sgpr0_sgpr1, implicit $sgpr0_sgpr1_sgpr2, implicit $sgpr2_sgpr3

=> sgpr2_sgpr3 is not in the live-ins.

@tsymalla
Copy link
Contributor Author

Note: The implicits and implicit-defs disappear with the change, preventing the live register analysis from complaining

@tsymalla
Copy link
Contributor Author

Will revert original change

@tsymalla tsymalla closed this Mar 22, 2024
@arsenm
Copy link
Contributor

arsenm commented Mar 22, 2024

Yeah, I am working towards a minimal example.

Shouldn't it be any amdgpu_gfx function calling ccc with inreg/SGPR arguments?

Will revert original change

Better to just finish the fix here?

@tsymalla
Copy link
Contributor Author

Better to just finish the fix here?

Don't have time for that, unfortunately.

@arsenm
Copy link
Contributor

arsenm commented Mar 27, 2024

Better to just finish the fix here?

Don't have time for that, unfortunately.

I insist you at least push the test which failed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants