
[AMDGPU] Don't DEALLOC_VGPRS from callable functions #72245

Closed
wants to merge 3 commits

Conversation

@rovka (Collaborator) commented Nov 14, 2023

Callable functions should not send the DEALLOC_VGPRS message, because that might release the VGPRs and scratch allocation before potential scratch stores in the caller have completed.

@llvmbot (Collaborator) commented Nov 14, 2023

@llvm/pr-subscribers-backend-amdgpu

Author: Diana (rovka)

Changes

Callable functions should not send the DEALLOC_VGPRS message, because that might release the VGPRs and scratch allocation before the caller is done with them.


Full diff: https://github.com/llvm/llvm-project/pull/72245.diff

4 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp (+7-2)
  • (modified) llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp (+4)
  • (modified) llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h (+5)
  • (modified) llvm/test/CodeGen/AMDGPU/release-vgprs.mir (+26)
diff --git a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
index ede4841b8a5fd7d..d862b37443aec83 100644
--- a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
@@ -1027,6 +1027,8 @@ bool SIInsertWaitcnts::generateWaitcntInstBefore(MachineInstr &MI,
     Wait.VmCnt = 0;
   }
 
+  CallingConv::ID CC = MI.getMF()->getFunction().getCallingConv();
+
   // All waits must be resolved at call return.
   // NOTE: this could be improved with knowledge of all call sites or
   //   with knowledge of the called routines.
@@ -1039,10 +1041,13 @@ bool SIInsertWaitcnts::generateWaitcntInstBefore(MachineInstr &MI,
   // Identify S_ENDPGM instructions which may have to wait for outstanding VMEM
   // stores. In this case it can be useful to send a message to explicitly
   // release all VGPRs before the stores have completed, but it is only safe to
-  // do this if there are no outstanding scratch stores.
+  // do this if:
+  // * there are no outstanding scratch stores
+  // * this is not a callable function
   else if (MI.getOpcode() == AMDGPU::S_ENDPGM ||
            MI.getOpcode() == AMDGPU::S_ENDPGM_SAVED) {
-    if (ST->getGeneration() >= AMDGPUSubtarget::GFX11 && !OptNone &&
+    if (ST->getGeneration() >= AMDGPUSubtarget::GFX11 &&
+        !AMDGPU::isCallableCC(CC) && !OptNone &&
         ScoreBrackets.getScoreRange(VS_CNT) != 0 &&
         !ScoreBrackets.hasPendingEvent(SCRATCH_WRITE_ACCESS))
       ReleaseVGPRInsts.insert(&MI);
diff --git a/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp b/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
index a09abc639d7590f..5ea135c7b90dd62 100644
--- a/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
@@ -1924,6 +1924,10 @@ bool isChainCC(CallingConv::ID CC) {
   }
 }
 
+bool isCallableCC(CallingConv::ID CC) {
+  return !isEntryFunctionCC(CC) && !isChainCC(CC);
+}
+
 bool isKernelCC(const Function *Func) {
   return AMDGPU::isModuleEntryFunctionCC(Func->getCallingConv());
 }
diff --git a/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h b/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h
index 1e0994d0862cf5d..965414019263448 100644
--- a/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h
+++ b/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h
@@ -1114,6 +1114,11 @@ bool isModuleEntryFunctionCC(CallingConv::ID CC);
 LLVM_READNONE
 bool isChainCC(CallingConv::ID CC);
 
+// Functions that are called via the 'call' instruction, rather than launched
+// by the hardware or via the 'llvm.amdgcn.cs.chain' intrinsic.
+LLVM_READNONE
+bool isCallableCC(CallingConv::ID CC);
+
 bool isKernelCC(const Function *Func);
 
 // FIXME: Remove this when calling conventions cleaned up
diff --git a/llvm/test/CodeGen/AMDGPU/release-vgprs.mir b/llvm/test/CodeGen/AMDGPU/release-vgprs.mir
index 3a879e818af797b..39ced04253b571a 100644
--- a/llvm/test/CodeGen/AMDGPU/release-vgprs.mir
+++ b/llvm/test/CodeGen/AMDGPU/release-vgprs.mir
@@ -22,6 +22,8 @@
   define amdgpu_ps void @global_atomic() { ret void }
   define amdgpu_ps void @image_atomic() { ret void }
   define amdgpu_ps void @global_store_optnone() noinline optnone { ret void }
+  define amdgpu_gfx void @gfx_function() { ret void }
+  define void @ccc_function() { ret void }
 ...
 
 ---
@@ -556,3 +558,27 @@ body:             |
     S_WAITCNT_VSCNT undef $sgpr_null, 0
     S_ENDPGM 0
 ...
+
+---
+name:            gfx_function
+body:             |
+  bb.0:
+    ; CHECK-LABEL: name: gfx_function
+    ; CHECK-NOT: S_SENDMSG 3
+    ; CHECK: S_ENDPGM 0
+    GLOBAL_STORE_DWORD undef renamable $vgpr0_vgpr1, killed renamable $vgpr1, 0, 4, implicit $exec
+    S_WAITCNT_VSCNT undef $sgpr_null, 0
+    S_ENDPGM 0
+...
+
+---
+name:            ccc_function
+body:             |
+  bb.0:
+    ; CHECK-LABEL: name: ccc_function
+    ; CHECK-NOT: S_SENDMSG 3
+    ; CHECK: S_ENDPGM 0
+    GLOBAL_STORE_DWORD undef renamable $vgpr0_vgpr1, killed renamable $vgpr1, 0, 4, implicit $exec
+    S_WAITCNT_VSCNT undef $sgpr_null, 0
+    S_ENDPGM 0
+...

@jasilvanus (Contributor) commented:
Wasn't there also the issue that we shouldn't send the message if the current function contains calls, because a callee might have accessed scratch? Is this handled elsewhere?

@rovka (Collaborator, Author) commented Nov 14, 2023

Wasn't there also the issue that we shouldn't send the message if the current function contains calls, because a callee might have accessed scratch? Is this handled elsewhere?

It will be handled in a different patch. "Coming soon"

; CHECK-NOT: S_SENDMSG 3
; CHECK: S_ENDPGM 0
GLOBAL_STORE_DWORD undef renamable $vgpr0_vgpr1, killed renamable $vgpr1, 0, 4, implicit $exec
S_WAITCNT_VSCNT undef $sgpr_null, 0
@rovka (Collaborator, Author) commented on the lines above:
I guess it's a bit misleading to have an S_WAITCNT_VSCNT here; I just copied it from the test snippet above. I'll brush this up tomorrow.

GLOBAL_STORE_DWORD undef renamable $vgpr0_vgpr1, killed renamable $vgpr1, 0, 4, implicit $exec
S_WAITCNT_VSCNT undef $sgpr_null, 0
S_ENDPGM 0
...
A Contributor commented on the lines above:

Tangentially related but do we avoid inserting waits before unreachable in noreturn functions?

A Contributor replied:

I don't think unreachable exists in MIR does it? I would expect to see an MBB ending with a tail call, and SIInsertWaitcnts certainly should not insert any waits after the tail call.

A Contributor replied:

I believe you end up with an MBB ending in nothing, which may or may not also happen to be at the end of the function

@rovka (Collaborator, Author) replied:

I'm not sure if we have any special handling for this. How much do we care about it? Are you concerned about the code size? I'm guessing if we reach an unreachable then waiting a little extra isn't the biggest problem...

@rovka (Collaborator, Author) commented Nov 21, 2023

Ping.

@jasilvanus (Contributor) left a review comment:

All my comments have been addressed, looks reasonable to me.

Leaving approval to others who know this code better.

@jayfoad (Contributor) commented Nov 21, 2023

Callable functions should not send the DEALLOC_VGPRS message, because that might release the VGPRs and scratch allocation before potential scratch stores in the caller have completed.

I am still confused about this. I thought callable functions waited for all counters to be zero on entry (see "Wait for any outstanding memory operations that the input registers may depend on" in SIInsertWaitcnts::runOnMachineFunction) so how can there still be any outstanding scratch stores from a caller?

@nhaehnle (Collaborator) commented:
Taking a glimpse, this doesn't wait for vscnt because stores don't write to registers. If we can agree on that, then I think this is good to go.
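The hazard being discussed can be pictured as follows; this MIR-style pseudocode is invented for exposition and does not appear in the patch:

```
; Caller:
  SCRATCH_STORE_DWORD ...   ; pending store, tracked by VS_CNT
  S_SWAPPC_B64 ...          ; call into a callable function

; Callee: the waits inserted on entry cover counters its input registers
; depend on, but not VS_CNT, since stores write no registers.
  ...
  S_SENDMSG 3               ; DEALLOC_VGPRS: would free VGPRs and scratch
  S_ENDPGM 0                ; caller's scratch store may still be in flight
```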

@jayfoad (Contributor) commented Nov 21, 2023

Taking a glimpse, this doesn't wait for vscnt because stores don't write to registers. If we can agree on that, then I think this is good to go.

Good point. I implemented that! https://reviews.llvm.org/D153537

@jayfoad (Contributor) commented Dec 14, 2023

I think this will be superseded by #75436.

@rovka (Collaborator, Author) commented Jan 9, 2024

Wasn't there also the issue that we shouldn't send the message if the current function contains calls, because a callee might have accessed scratch? Is this handled elsewhere?

It will be handled in a different patch. "Coming soon"

And finally: #77439

@rovka rovka closed this Jan 9, 2024
@rovka rovka deleted the dealloc-vgprs-funcs branch January 9, 2024 10:38