
[AMDGPU] Don't DEALLOC_VGPRS from callable functions #72245

Closed
wants to merge 3 commits

Conversation

@rovka (Collaborator) commented Nov 14, 2023

Callable functions should not send the DEALLOC_VGPRS message, because that might release the VGPRs and scratch allocation before potential scratch stores in the caller have completed.

@llvmbot (Collaborator) commented Nov 14, 2023

@llvm/pr-subscribers-backend-amdgpu

Author: Diana (rovka)

Changes

Callable functions should not send the DEALLOC_VGPRS message, because that might release the VGPRs and scratch allocation before the caller is done with them.


Full diff: https://github.com/llvm/llvm-project/pull/72245.diff

4 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp (+7-2)
  • (modified) llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp (+4)
  • (modified) llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h (+5)
  • (modified) llvm/test/CodeGen/AMDGPU/release-vgprs.mir (+26)
diff --git a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
index ede4841b8a5fd7d..d862b37443aec83 100644
--- a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
@@ -1027,6 +1027,8 @@ bool SIInsertWaitcnts::generateWaitcntInstBefore(MachineInstr &MI,
     Wait.VmCnt = 0;
   }
 
+  CallingConv::ID CC = MI.getMF()->getFunction().getCallingConv();
+
   // All waits must be resolved at call return.
   // NOTE: this could be improved with knowledge of all call sites or
   //   with knowledge of the called routines.
@@ -1039,10 +1041,13 @@ bool SIInsertWaitcnts::generateWaitcntInstBefore(MachineInstr &MI,
   // Identify S_ENDPGM instructions which may have to wait for outstanding VMEM
   // stores. In this case it can be useful to send a message to explicitly
   // release all VGPRs before the stores have completed, but it is only safe to
-  // do this if there are no outstanding scratch stores.
+  // do this if:
+  // * there are no outstanding scratch stores
+  // * this is not a callable function
   else if (MI.getOpcode() == AMDGPU::S_ENDPGM ||
            MI.getOpcode() == AMDGPU::S_ENDPGM_SAVED) {
-    if (ST->getGeneration() >= AMDGPUSubtarget::GFX11 && !OptNone &&
+    if (ST->getGeneration() >= AMDGPUSubtarget::GFX11 &&
+        !AMDGPU::isCallableCC(CC) && !OptNone &&
         ScoreBrackets.getScoreRange(VS_CNT) != 0 &&
         !ScoreBrackets.hasPendingEvent(SCRATCH_WRITE_ACCESS))
       ReleaseVGPRInsts.insert(&MI);
diff --git a/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp b/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
index a09abc639d7590f..5ea135c7b90dd62 100644
--- a/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
@@ -1924,6 +1924,10 @@ bool isChainCC(CallingConv::ID CC) {
   }
 }
 
+bool isCallableCC(CallingConv::ID CC) {
+  return !isEntryFunctionCC(CC) && !isChainCC(CC);
+}
+
 bool isKernelCC(const Function *Func) {
   return AMDGPU::isModuleEntryFunctionCC(Func->getCallingConv());
 }
diff --git a/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h b/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h
index 1e0994d0862cf5d..965414019263448 100644
--- a/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h
+++ b/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h
@@ -1114,6 +1114,11 @@ bool isModuleEntryFunctionCC(CallingConv::ID CC);
 LLVM_READNONE
 bool isChainCC(CallingConv::ID CC);
 
+// Functions that are called via the 'call' instruction, rather than launched
+// by the hardware or via the 'llvm.amdgcn.cs.chain' intrinsic.
+LLVM_READNONE
+bool isCallableCC(CallingConv::ID CC);
+
 bool isKernelCC(const Function *Func);
 
 // FIXME: Remove this when calling conventions cleaned up
diff --git a/llvm/test/CodeGen/AMDGPU/release-vgprs.mir b/llvm/test/CodeGen/AMDGPU/release-vgprs.mir
index 3a879e818af797b..39ced04253b571a 100644
--- a/llvm/test/CodeGen/AMDGPU/release-vgprs.mir
+++ b/llvm/test/CodeGen/AMDGPU/release-vgprs.mir
@@ -22,6 +22,8 @@
   define amdgpu_ps void @global_atomic() { ret void }
   define amdgpu_ps void @image_atomic() { ret void }
   define amdgpu_ps void @global_store_optnone() noinline optnone { ret void }
+  define amdgpu_gfx void @gfx_function() { ret void }
+  define void @ccc_function() { ret void }
 ...
 
 ---
@@ -556,3 +558,27 @@ body:             |
     S_WAITCNT_VSCNT undef $sgpr_null, 0
     S_ENDPGM 0
 ...
+
+---
+name:            gfx_function
+body:             |
+  bb.0:
+    ; CHECK-LABEL: name: gfx_function
+    ; CHECK-NOT: S_SENDMSG 3
+    ; CHECK: S_ENDPGM 0
+    GLOBAL_STORE_DWORD undef renamable $vgpr0_vgpr1, killed renamable $vgpr1, 0, 4, implicit $exec
+    S_WAITCNT_VSCNT undef $sgpr_null, 0
+    S_ENDPGM 0
+...
+
+---
+name:            ccc_function
+body:             |
+  bb.0:
+    ; CHECK-LABEL: name: ccc_function
+    ; CHECK-NOT: S_SENDMSG 3
+    ; CHECK: S_ENDPGM 0
+    GLOBAL_STORE_DWORD undef renamable $vgpr0_vgpr1, killed renamable $vgpr1, 0, 4, implicit $exec
+    S_WAITCNT_VSCNT undef $sgpr_null, 0
+    S_ENDPGM 0
+...

@jasilvanus (Contributor) commented:
Wasn't there also the issue that we shouldn't send the message if the current function contains calls, because a callee might have accessed scratch? Is this handled elsewhere?

@rovka (Collaborator, Author) commented Nov 14, 2023

Wasn't there also the issue that we shouldn't send the message if the current function contains calls, because a callee might have accessed scratch? Is this handled elsewhere?

It will be handled in a different patch. "Coming soon"

; CHECK-NOT: S_SENDMSG 3
; CHECK: S_ENDPGM 0
GLOBAL_STORE_DWORD undef renamable $vgpr0_vgpr1, killed renamable $vgpr1, 0, 4, implicit $exec
S_WAITCNT_VSCNT undef $sgpr_null, 0
@rovka (Collaborator, Author) commented on the lines above:
I guess it's a bit misleading to have an S_WAITCNT_VSCNT here; I just copied it from the test snippet above. I'll brush this up tomorrow.

GLOBAL_STORE_DWORD undef renamable $vgpr0_vgpr1, killed renamable $vgpr1, 0, 4, implicit $exec
S_WAITCNT_VSCNT undef $sgpr_null, 0
S_ENDPGM 0
...
A Contributor commented on the lines above:

Tangentially related but do we avoid inserting waits before unreachable in noreturn functions?

A Contributor replied:

I don't think unreachable exists in MIR does it? I would expect to see an MBB ending with a tail call, and SIInsertWaitcnts certainly should not insert any waits after the tail call.

A Contributor replied:

I believe you end up with an MBB ending in nothing, which may or may not also happen to be at the end of the function

@rovka (Collaborator, Author) replied:

I'm not sure if we have any special handling for this. How much do we care about it? Are you concerned about the code size? I'm guessing if we reach an unreachable then waiting a little extra isn't the biggest problem...

@rovka (Collaborator, Author) commented Nov 21, 2023

Ping.

@jasilvanus (Contributor) left a review comment:

All my comments have been addressed, looks reasonable to me.

Leaving approval to others who know this code better.

@jayfoad (Contributor) commented Nov 21, 2023

Callable functions should not send the DEALLOC_VGPRS message, because that might release the VGPRs and scratch allocation before potential scratch stores in the caller have completed.

I am still confused about this. I thought callable functions waited for all counters to be zero on entry (see "Wait for any outstanding memory operations that the input registers may depend on" in SIInsertWaitcnts::runOnMachineFunction) so how can there still be any outstanding scratch stores from a caller?

@nhaehnle (Collaborator) commented:
Taking a glimpse, this doesn't wait for vscnt because stores don't write to registers. If we can agree on that, then I think this is good to go.
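The hazard being discussed can be pictured as follows; this MIR-style pseudocode is invented for exposition and does not appear in the patch:

```
; Caller:
  SCRATCH_STORE_DWORD ...   ; pending store, tracked by VS_CNT
  S_SWAPPC_B64 ...          ; call into a callable function

; Callee: the waits inserted on entry cover counters its input registers
; depend on, but not VS_CNT, since stores write no registers.
  ...
  S_SENDMSG 3               ; DEALLOC_VGPRS: would free VGPRs and scratch
  S_ENDPGM 0                ; caller's scratch store may still be in flight
```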

@jayfoad (Contributor) commented Nov 21, 2023

Taking a glimpse, this doesn't wait for vscnt because stores don't write to registers. If we can agree on that, then I think this is good to go.

Good point. I implemented that! https://reviews.llvm.org/D153537

@jayfoad (Contributor) commented Dec 14, 2023

I think this will be superseded by #75436.

@rovka (Collaborator, Author) commented Jan 9, 2024

Wasn't there also the issue that we shouldn't send the message if the current function contains calls, because a callee might have accessed scratch? Is this handled elsewhere?

It will be handled in a different patch. "Coming soon"

And finally: #77439

@rovka rovka closed this Jan 9, 2024
@rovka rovka deleted the dealloc-vgprs-funcs branch January 9, 2024 10:38