Conversation

shiltian
Contributor

No description provided.

@shiltian shiltian marked this pull request as ready for review August 14, 2024 00:41
@shiltian shiltian requested review from arsenm and changpeng and removed request for arsenm August 14, 2024 00:42

@shiltian shiltian requested a review from arsenm August 14, 2024 00:42
@shiltian shiltian requested a review from kerbowa August 14, 2024 00:42
@llvmbot
Member

llvmbot commented Aug 14, 2024

@llvm/pr-subscribers-llvm-transforms
@llvm/pr-subscribers-llvm-globalisel

@llvm/pr-subscribers-backend-amdgpu

Author: Shilei Tian (shiltian)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/103490.diff

4 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/SIISelLowering.cpp (+6-1)
  • (modified) llvm/test/CodeGen/AMDGPU/abi-attribute-hints-undefined-behavior.ll (+72-32)
  • (modified) llvm/test/CodeGen/AMDGPU/callee-special-input-sgprs-fixed-abi.ll (+2-1)
  • (modified) llvm/test/CodeGen/AMDGPU/implicit-kernarg-backend-usage.ll (+14-8)
diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index 86fc100f1c2da0..52f93472eac206 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -8414,8 +8414,13 @@ SDValue SITargetLowering::LowerINTRINSIC_WO_CHAIN(SDValue Op,
     return getPreloadedValue(DAG, *MFI, VT,
                              AMDGPUFunctionArgInfo::IMPLICIT_BUFFER_PTR);
   }
-  case Intrinsic::amdgcn_dispatch_ptr:
   case Intrinsic::amdgcn_queue_ptr: {
+    const Module *M = DAG.getMachineFunction().getFunction().getParent();
+    if (AMDGPU::getAMDHSACodeObjectVersion(*M) >= AMDGPU::AMDHSA_COV5)
+      return loadImplicitKernelArgument(DAG, MVT::i64, DL, Align(8), QUEUE_PTR);
+    [[fallthrough]];
+  }
+  case Intrinsic::amdgcn_dispatch_ptr: {
     if (!Subtarget->isAmdHsaOrMesa(MF.getFunction())) {
       DiagnosticInfoUnsupported BadIntrin(
           MF.getFunction(), "unsupported hsa intrinsic without hsa target",
diff --git a/llvm/test/CodeGen/AMDGPU/abi-attribute-hints-undefined-behavior.ll b/llvm/test/CodeGen/AMDGPU/abi-attribute-hints-undefined-behavior.ll
index e53653408feb40..dcbe3363f5874a 100644
--- a/llvm/test/CodeGen/AMDGPU/abi-attribute-hints-undefined-behavior.ll
+++ b/llvm/test/CodeGen/AMDGPU/abi-attribute-hints-undefined-behavior.ll
@@ -204,26 +204,50 @@ define amdgpu_kernel void @marked_kernel_use_workgroup_id(ptr addrspace(1) %ptr)
 }
 
 define void @marked_func_use_other_sgpr(ptr addrspace(1) %ptr) #0 {
-; FIXEDABI-LABEL: marked_func_use_other_sgpr:
-; FIXEDABI:       ; %bb.0:
-; FIXEDABI-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; FIXEDABI-NEXT:    v_mov_b32_e32 v2, s6
-; FIXEDABI-NEXT:    v_mov_b32_e32 v3, s7
-; FIXEDABI-NEXT:    flat_load_ubyte v2, v[2:3] glc
-; FIXEDABI-NEXT:    s_waitcnt vmcnt(0)
-; FIXEDABI-NEXT:    v_mov_b32_e32 v2, s8
-; FIXEDABI-NEXT:    v_mov_b32_e32 v3, s9
-; FIXEDABI-NEXT:    flat_load_ubyte v2, v[2:3] glc
-; FIXEDABI-NEXT:    s_waitcnt vmcnt(0)
-; FIXEDABI-NEXT:    v_mov_b32_e32 v2, s4
-; FIXEDABI-NEXT:    v_mov_b32_e32 v3, s5
-; FIXEDABI-NEXT:    flat_load_ubyte v2, v[2:3] glc
-; FIXEDABI-NEXT:    s_waitcnt vmcnt(0)
-; FIXEDABI-NEXT:    v_mov_b32_e32 v2, s10
-; FIXEDABI-NEXT:    v_mov_b32_e32 v3, s11
-; FIXEDABI-NEXT:    flat_store_dwordx2 v[0:1], v[2:3]
-; FIXEDABI-NEXT:    s_waitcnt vmcnt(0)
-; FIXEDABI-NEXT:    s_setpc_b64 s[30:31]
+; FIXEDABI-SDAG-LABEL: marked_func_use_other_sgpr:
+; FIXEDABI-SDAG:       ; %bb.0:
+; FIXEDABI-SDAG-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; FIXEDABI-SDAG-NEXT:    s_mov_b64 s[6:7], 0xc8
+; FIXEDABI-SDAG-NEXT:    s_load_dwordx2 s[6:7], s[6:7], 0x0
+; FIXEDABI-SDAG-NEXT:    s_waitcnt lgkmcnt(0)
+; FIXEDABI-SDAG-NEXT:    v_mov_b32_e32 v2, s6
+; FIXEDABI-SDAG-NEXT:    v_mov_b32_e32 v3, s7
+; FIXEDABI-SDAG-NEXT:    flat_load_ubyte v2, v[2:3] glc
+; FIXEDABI-SDAG-NEXT:    s_waitcnt vmcnt(0)
+; FIXEDABI-SDAG-NEXT:    v_mov_b32_e32 v2, s8
+; FIXEDABI-SDAG-NEXT:    v_mov_b32_e32 v3, s9
+; FIXEDABI-SDAG-NEXT:    flat_load_ubyte v2, v[2:3] glc
+; FIXEDABI-SDAG-NEXT:    s_waitcnt vmcnt(0)
+; FIXEDABI-SDAG-NEXT:    v_mov_b32_e32 v2, s4
+; FIXEDABI-SDAG-NEXT:    v_mov_b32_e32 v3, s5
+; FIXEDABI-SDAG-NEXT:    flat_load_ubyte v2, v[2:3] glc
+; FIXEDABI-SDAG-NEXT:    s_waitcnt vmcnt(0)
+; FIXEDABI-SDAG-NEXT:    v_mov_b32_e32 v2, s10
+; FIXEDABI-SDAG-NEXT:    v_mov_b32_e32 v3, s11
+; FIXEDABI-SDAG-NEXT:    flat_store_dwordx2 v[0:1], v[2:3]
+; FIXEDABI-SDAG-NEXT:    s_waitcnt vmcnt(0)
+; FIXEDABI-SDAG-NEXT:    s_setpc_b64 s[30:31]
+;
+; FIXEDABI-GISEL-LABEL: marked_func_use_other_sgpr:
+; FIXEDABI-GISEL:       ; %bb.0:
+; FIXEDABI-GISEL-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; FIXEDABI-GISEL-NEXT:    v_mov_b32_e32 v2, s6
+; FIXEDABI-GISEL-NEXT:    v_mov_b32_e32 v3, s7
+; FIXEDABI-GISEL-NEXT:    flat_load_ubyte v2, v[2:3] glc
+; FIXEDABI-GISEL-NEXT:    s_waitcnt vmcnt(0)
+; FIXEDABI-GISEL-NEXT:    v_mov_b32_e32 v2, s8
+; FIXEDABI-GISEL-NEXT:    v_mov_b32_e32 v3, s9
+; FIXEDABI-GISEL-NEXT:    flat_load_ubyte v2, v[2:3] glc
+; FIXEDABI-GISEL-NEXT:    s_waitcnt vmcnt(0)
+; FIXEDABI-GISEL-NEXT:    v_mov_b32_e32 v2, s4
+; FIXEDABI-GISEL-NEXT:    v_mov_b32_e32 v3, s5
+; FIXEDABI-GISEL-NEXT:    flat_load_ubyte v2, v[2:3] glc
+; FIXEDABI-GISEL-NEXT:    s_waitcnt vmcnt(0)
+; FIXEDABI-GISEL-NEXT:    v_mov_b32_e32 v2, s10
+; FIXEDABI-GISEL-NEXT:    v_mov_b32_e32 v3, s11
+; FIXEDABI-GISEL-NEXT:    flat_store_dwordx2 v[0:1], v[2:3]
+; FIXEDABI-GISEL-NEXT:    s_waitcnt vmcnt(0)
+; FIXEDABI-GISEL-NEXT:    s_setpc_b64 s[30:31]
   %queue.ptr = call ptr addrspace(4) @llvm.amdgcn.queue.ptr()
   %implicitarg.ptr = call ptr addrspace(4) @llvm.amdgcn.implicitarg.ptr()
   %dispatch.ptr = call ptr addrspace(4) @llvm.amdgcn.dispatch.ptr()
@@ -236,18 +260,34 @@ define void @marked_func_use_other_sgpr(ptr addrspace(1) %ptr) #0 {
 }
 
 define amdgpu_kernel void @marked_kernel_use_other_sgpr(ptr addrspace(1) %ptr) #0 {
-; FIXEDABI-LABEL: marked_kernel_use_other_sgpr:
-; FIXEDABI:       ; %bb.0:
-; FIXEDABI-NEXT:    s_add_u32 s0, s4, 8
-; FIXEDABI-NEXT:    flat_load_ubyte v0, v[0:1] glc
-; FIXEDABI-NEXT:    s_addc_u32 s1, s5, 0
-; FIXEDABI-NEXT:    s_waitcnt vmcnt(0)
-; FIXEDABI-NEXT:    v_mov_b32_e32 v0, s0
-; FIXEDABI-NEXT:    v_mov_b32_e32 v1, s1
-; FIXEDABI-NEXT:    flat_load_ubyte v0, v[0:1] glc
-; FIXEDABI-NEXT:    s_waitcnt vmcnt(0)
-; FIXEDABI-NEXT:    flat_load_ubyte v0, v[0:1] glc
-; FIXEDABI-NEXT:    s_endpgm
+; FIXEDABI-SDAG-LABEL: marked_kernel_use_other_sgpr:
+; FIXEDABI-SDAG:       ; %bb.0:
+; FIXEDABI-SDAG-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0xd0
+; FIXEDABI-SDAG-NEXT:    s_add_u32 s2, s4, 8
+; FIXEDABI-SDAG-NEXT:    s_addc_u32 s3, s5, 0
+; FIXEDABI-SDAG-NEXT:    v_mov_b32_e32 v0, s2
+; FIXEDABI-SDAG-NEXT:    v_mov_b32_e32 v1, s3
+; FIXEDABI-SDAG-NEXT:    s_waitcnt lgkmcnt(0)
+; FIXEDABI-SDAG-NEXT:    v_mov_b32_e32 v3, s1
+; FIXEDABI-SDAG-NEXT:    v_mov_b32_e32 v2, s0
+; FIXEDABI-SDAG-NEXT:    flat_load_ubyte v2, v[2:3] glc
+; FIXEDABI-SDAG-NEXT:    flat_load_ubyte v0, v[0:1] glc
+; FIXEDABI-SDAG-NEXT:    s_waitcnt vmcnt(0)
+; FIXEDABI-SDAG-NEXT:    flat_load_ubyte v0, v[0:1] glc
+; FIXEDABI-SDAG-NEXT:    s_endpgm
+;
+; FIXEDABI-GISEL-LABEL: marked_kernel_use_other_sgpr:
+; FIXEDABI-GISEL:       ; %bb.0:
+; FIXEDABI-GISEL-NEXT:    s_add_u32 s0, s4, 8
+; FIXEDABI-GISEL-NEXT:    flat_load_ubyte v0, v[0:1] glc
+; FIXEDABI-GISEL-NEXT:    s_addc_u32 s1, s5, 0
+; FIXEDABI-GISEL-NEXT:    s_waitcnt vmcnt(0)
+; FIXEDABI-GISEL-NEXT:    v_mov_b32_e32 v0, s0
+; FIXEDABI-GISEL-NEXT:    v_mov_b32_e32 v1, s1
+; FIXEDABI-GISEL-NEXT:    flat_load_ubyte v0, v[0:1] glc
+; FIXEDABI-GISEL-NEXT:    s_waitcnt vmcnt(0)
+; FIXEDABI-GISEL-NEXT:    flat_load_ubyte v0, v[0:1] glc
+; FIXEDABI-GISEL-NEXT:    s_endpgm
   %queue.ptr = call ptr addrspace(4) @llvm.amdgcn.queue.ptr()
   %implicitarg.ptr = call ptr addrspace(4) @llvm.amdgcn.implicitarg.ptr()
   %dispatch.ptr = call ptr addrspace(4) @llvm.amdgcn.dispatch.ptr()
diff --git a/llvm/test/CodeGen/AMDGPU/callee-special-input-sgprs-fixed-abi.ll b/llvm/test/CodeGen/AMDGPU/callee-special-input-sgprs-fixed-abi.ll
index 032ec65fa85133..c2c6670c3ecacd 100644
--- a/llvm/test/CodeGen/AMDGPU/callee-special-input-sgprs-fixed-abi.ll
+++ b/llvm/test/CodeGen/AMDGPU/callee-special-input-sgprs-fixed-abi.ll
@@ -22,7 +22,8 @@ define amdgpu_kernel void @kern_indirect_use_dispatch_ptr(i32) #1 {
 }
 
 ; GCN-LABEL: {{^}}use_queue_ptr:
-; GCN: s_load_dword s{{[0-9]+}}, s[6:7]
+; GCN: s_mov_b64 s[4:5], 0xc8
+; GCN-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
 define hidden void @use_queue_ptr() #1 {
   %queue_ptr = call noalias ptr addrspace(4) @llvm.amdgcn.queue.ptr() #0
   %value = load volatile i32, ptr addrspace(4) %queue_ptr
diff --git a/llvm/test/CodeGen/AMDGPU/implicit-kernarg-backend-usage.ll b/llvm/test/CodeGen/AMDGPU/implicit-kernarg-backend-usage.ll
index b89dbd42e0466f..8999b8f0d07678 100644
--- a/llvm/test/CodeGen/AMDGPU/implicit-kernarg-backend-usage.ll
+++ b/llvm/test/CodeGen/AMDGPU/implicit-kernarg-backend-usage.ll
@@ -287,20 +287,24 @@ define amdgpu_kernel void @llvm_amdgcn_queue_ptr(ptr addrspace(1) %ptr)  {
 ;
 ; GFX8V5-LABEL: llvm_amdgcn_queue_ptr:
 ; GFX8V5:       ; %bb.0:
-; GFX8V5-NEXT:    s_add_u32 s0, s6, 8
-; GFX8V5-NEXT:    flat_load_ubyte v0, v[0:1] glc
-; GFX8V5-NEXT:    s_addc_u32 s1, s7, 0
-; GFX8V5-NEXT:    s_waitcnt vmcnt(0)
+; GFX8V5-NEXT:    s_load_dwordx2 s[0:1], s[6:7], 0xd0
+; GFX8V5-NEXT:    s_add_u32 s2, s6, 8
+; GFX8V5-NEXT:    s_addc_u32 s3, s7, 0
+; GFX8V5-NEXT:    v_mov_b32_e32 v2, s8
+; GFX8V5-NEXT:    v_mov_b32_e32 v3, s9
+; GFX8V5-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX8V5-NEXT:    v_mov_b32_e32 v0, s0
 ; GFX8V5-NEXT:    v_mov_b32_e32 v1, s1
 ; GFX8V5-NEXT:    flat_load_ubyte v0, v[0:1] glc
 ; GFX8V5-NEXT:    s_waitcnt vmcnt(0)
+; GFX8V5-NEXT:    v_mov_b32_e32 v0, s2
+; GFX8V5-NEXT:    v_mov_b32_e32 v1, s3
+; GFX8V5-NEXT:    flat_load_ubyte v0, v[0:1] glc
+; GFX8V5-NEXT:    s_waitcnt vmcnt(0)
 ; GFX8V5-NEXT:    v_mov_b32_e32 v0, s4
 ; GFX8V5-NEXT:    v_mov_b32_e32 v1, s5
 ; GFX8V5-NEXT:    flat_load_ubyte v0, v[0:1] glc
 ; GFX8V5-NEXT:    s_load_dwordx2 s[0:1], s[6:7], 0x0
-; GFX8V5-NEXT:    v_mov_b32_e32 v2, s8
-; GFX8V5-NEXT:    v_mov_b32_e32 v3, s9
 ; GFX8V5-NEXT:    s_waitcnt vmcnt(0) lgkmcnt(0)
 ; GFX8V5-NEXT:    v_mov_b32_e32 v0, s0
 ; GFX8V5-NEXT:    v_mov_b32_e32 v1, s1
@@ -327,16 +331,18 @@ define amdgpu_kernel void @llvm_amdgcn_queue_ptr(ptr addrspace(1) %ptr)  {
 ;
 ; GFX9V5-LABEL: llvm_amdgcn_queue_ptr:
 ; GFX9V5:       ; %bb.0:
+; GFX9V5-NEXT:    s_load_dwordx2 s[0:1], s[6:7], 0xd0
 ; GFX9V5-NEXT:    v_mov_b32_e32 v2, 0
+; GFX9V5-NEXT:    ; kill: killed $sgpr0_sgpr1
+; GFX9V5-NEXT:    ; kill: killed $sgpr4_sgpr5
+; GFX9V5-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX9V5-NEXT:    global_load_ubyte v0, v2, s[0:1] glc
 ; GFX9V5-NEXT:    global_load_ubyte v0, v2, s[6:7] offset:8 glc
 ; GFX9V5-NEXT:    global_load_ubyte v0, v2, s[4:5] glc
-; GFX9V5-NEXT:    ; kill: killed $sgpr0_sgpr1
 ; GFX9V5-NEXT:    s_load_dwordx2 s[0:1], s[6:7], 0x0
 ; GFX9V5-NEXT:    s_waitcnt vmcnt(0)
 ; GFX9V5-NEXT:    v_mov_b32_e32 v0, s8
 ; GFX9V5-NEXT:    v_mov_b32_e32 v1, s9
-; GFX9V5-NEXT:    ; kill: killed $sgpr4_sgpr5
 ; GFX9V5-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX9V5-NEXT:    global_store_dwordx2 v2, v[0:1], s[0:1]
 ; GFX9V5-NEXT:    s_waitcnt vmcnt(0)

@shiltian shiltian force-pushed the users/shiltian/lower-queue-ptr-to-implicit-kernel-arg branch from be06d03 to f3ca3ef Compare August 14, 2024 03:45
Contributor

@arsenm arsenm left a comment

I think the intrinsic should be interpreted as the user SGPR value. There is already a representation for load from implicit argument pointer, and we don't need 2 ways of doing that.

@shiltian
Contributor Author

There is already a representation for load from implicit argument pointer, and we don't need 2 ways of doing that.

Which is the representation for loading queue ptr from implicit kernel argument?

@arsenm
Contributor

arsenm commented Aug 14, 2024

There is already a representation for load from implicit argument pointer, and we don't need 2 ways of doing that.

Which is the representation for loading queue ptr from implicit kernel argument?

Call the implicitarg ptr intrinsic and load from that offset
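[Editor's note] The pattern described here can be written directly in IR. The following is a sketch, not code from this PR; the 200-byte (0xc8) offset is taken from the addresses appearing in the updated test checks below and is an assumption about where hidden_queue_ptr lives among the COV5 implicit arguments:

```llvm
; Sketch: obtain the queue pointer via the implicit kernel arguments
; instead of the preloaded user SGPR pair. The 200-byte offset is
; illustrative, matching the 0xc8 seen in the updated tests.
define ptr addrspace(4) @load_queue_ptr_from_implicitarg() {
  %implicitarg = call ptr addrspace(4) @llvm.amdgcn.implicitarg.ptr()
  %gep = getelementptr inbounds i8, ptr addrspace(4) %implicitarg, i64 200
  %queue.ptr = load ptr addrspace(4), ptr addrspace(4) %gep, align 8
  ret ptr addrspace(4) %queue.ptr
}

declare ptr addrspace(4) @llvm.amdgcn.implicitarg.ptr()
```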

@shiltian
Contributor Author

That is an indirect way of doing that.

What this patch does allows us not to set .hidden_queue_ptr unnecessarily. Currently we set it whenever the function attribute amdgpu-no-queue-ptr is absent, even for GFX9+ with COV5+. Since we already handle the uses of the queue pointer for the aperture bases and trap handling correctly based on the COV, it should be safe not to set it. However, we still do, probably based on the assumption that a function can call the intrinsic, since we lower it to read the SGPRs in any case. If we can lower it to an implicit kernel argument load for COV5+, we can safely drop .hidden_queue_ptr for COV5+.
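[Editor's note] .hidden_queue_ptr refers to an entry in the kernel's .args list in the COV5 code-object metadata. A hypothetical excerpt (field names follow the AMDHSA metadata scheme; the offset is illustrative, as it depends on the other hidden arguments emitted before it):

```yaml
# Hypothetical excerpt of a kernel's .args metadata under code object v5.
# .value_kind: hidden_queue_ptr is the field being discussed here.
- .address_space: global
  .offset: 200
  .size: 8
  .value_kind: hidden_queue_ptr
```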

@kerbowa
Member

kerbowa commented Aug 15, 2024

That is an indirect way of doing that.

What this patch does allows us not to set .hidden_queue_ptr unnecessarily. Currently we set it whenever the function attribute amdgpu-no-queue-ptr is absent, even for GFX9+ with COV5+. Since we already handle the uses of the queue pointer for the aperture bases and trap handling correctly based on the COV, it should be safe not to set it. However, we still do, probably based on the assumption that a function can call the intrinsic, since we lower it to read the SGPRs in any case. If we can lower it to an implicit kernel argument load for COV5+, we can safely drop .hidden_queue_ptr for COV5+.

What do you mean by .hidden_queue_ptr, the preloaded SGPR pair? That should be set with the KD field ENABLE_SGPR_QUEUE_PTR.

Currently the lowering for this is totally broken even without this patch. There is an explicit check for COV5 for whether to allocate SGPRs for the queue_ptr and we are not allocating anything. So we load from some random incorrect SGPRs as far as I can tell.

Member

@kerbowa kerbowa left a comment

It looks like the intention of @changpeng's COV5 patches was that we should never allocate or access the user SGPR queue_ptr.

AMDGPU::getAMDHSACodeObjectVersion(*M) < AMDGPU::AMDHSA_COV5)

Otherwise if we go with Matt's suggestion to make the intrinsic mean the preloaded SGPR then the allocation needs to be fixed.

@changpeng
Contributor

It looks like the intention of @changpeng's COV5 patches was that we should never allocate or access the user SGPR queue_ptr.

AMDGPU::getAMDHSACodeObjectVersion(*M) < AMDGPU::AMDHSA_COV5)

Otherwise if we go with Matt's suggestion to make the intrinsic mean the preloaded SGPR then the allocation needs to be fixed.

Right. We should only use the implicitarg field for queue_ptr for COV5. I remember I had to update many LIT tests due to reduced SGPR usage.

@arsenm
Contributor

arsenm commented Aug 16, 2024

That is an indirect way of doing that.

Yes, that's my point, it's redundant.

What this patch does allows us not to set .hidden_queue_ptr unnecessarily. Currently we set it whenever the function attribute amdgpu-no-queue-ptr is absent, even for GFX9+ with COV5+.

Right. We should only use the implicitarg field for queue_ptr for COV5. I remember I had to update many LIT tests due to reduced SGPR usage.

But you still can use the queue pointer; it's not gone. The logic should be able to differentiate between "should not use the queue pointer" and "the queue pointer does not exist".

@shiltian
Contributor Author

shiltian commented Aug 16, 2024

But you still can use the queue pointer; it's not gone. The logic should be able to differentiate between "should not use the queue pointer" and "the queue pointer does not exist".

But the thing is, the two SGPRs are never set in COV5. If you still want to use them, you basically have to revert the patches from @changpeng that completely disabled allocation of the two SGPRs in COV5.

@changpeng
Contributor

changpeng commented Aug 16, 2024

But the thing is, the two SGPRs are never set in COV5. If you still want to use them, you basically have to revert the patches from @changpeng that completely disabled allocation of the two SGPRs in COV5.

COV5 introduced a queue_ptr field in the implicit kernargs, so there is no reason to allocate SGPRs. For gfx9+, the doc implies that the queue_ptr field does not exist. However, the intrinsic has to return the queue_ptr. (My first impression is that the intrinsic is no longer needed and should never be invoked.) Assuming we have to keep this intrinsic, I would suggest we update the doc to say that the queue_ptr field is still needed when a call to the intrinsic exists.

@arsenm
Contributor

arsenm commented Aug 16, 2024

But the thing is, the two SGPRs are never set in COV5. If you still want to use them, you basically have to revert the patches from @changpeng that completely disabled allocation of the two SGPRs in COV5.

Yes. It is still possible to use the queue pointer. You could still want to use the queue pointer as a preloaded argument.

@shiltian
Contributor Author

Let me try to make things clear here. The current situation (without this patch) is:

  • The intrinsic is lowered to read s[6:7], regardless of GFX target and COV. This assumes s[6:7] holds the value, which means the kernel descriptor needs to be set accordingly, regardless of GFX target and COV.
  • @changpeng's previous patches removed the setup of the kernel descriptor for COV5+. This means that if the builtin/intrinsic is used, it is completely broken right now.

Given this, there are two approaches to fix this.

  1. Change the lowering of the intrinsic to align with @changpeng's patches, i.e. lower the intrinsic to a read from the implicit kernel arguments. This is what this patch is trying to do.
  2. Set up s[6:7] regardless of the COV, as long as the builtin/intrinsic is used (explicitly or implicitly). This is what @arsenm is proposing.

Let's compare the two approaches.

For 1, we can safely assume s[6:7] never needs to be set up for the queue pointer for COV5+, regardless of whether the function attribute amdgpu-no-queue-ptr is present. This means that no matter whether the optimization is on or off, and no matter whether the optimization can add the function attribute (e.g. for an indirect function call that can't be resolved at compile time), we always have the two SGPRs available for other uses.

For 2, we need to revert some of @changpeng's changes so that the kernel descriptor is set up if the intrinsic is used explicitly or implicitly. By "implicitly" I mean the case where we can't resolve an indirect function call at compile time and therefore have to assume the intrinsic might be used. This means that even for COV5+, if the optimization is off or can't work as expected, we lose the two SGPRs, whether or not they are really used.

IMHO, I don't see the win of 2. It might follow the "convention" that an intrinsic reads from registers, but we have many intrinsics that don't read from registers. The reason it read from registers in the past is that there was no other way.

Did I misunderstand anything here? @arsenm @kerbowa @changpeng

@arsenm
Contributor

arsenm commented Aug 19, 2024

My expectation is that for COV5, llvm.amdgcn.queue.ptr would not be used in the first place. If it were used, I would expect it to enable the same user SGPR input as it always did. Lowering it to a load from the implicit arguments is a strange midpoint that should not occur in real code.

IMHO, I don't see the win of 2

Multiple representations of the same thing are bad. When we detect uses of implicit argument pointers, now we have an extra special case to consider.

So I would just revert the changes that broke the queue pointer intrinsic.

@shiltian shiltian force-pushed the users/shiltian/lower-queue-ptr-to-implicit-kernel-arg branch from f3ca3ef to 4f5ef5f Compare September 3, 2024 03:50
@shiltian shiltian changed the title [AMDGPU] Lower llvm.amdgcn.queue.ptr intrinsic to using implicit kernel argument if feasible [AMDGPU] Still set up s[6:7] for COV5 if queue ptr is needed Sep 3, 2024
@shiltian
Contributor Author

shiltian commented Sep 3, 2024

I'm still working on updating test cases. This is WIP.

@shiltian shiltian changed the title [AMDGPU] Still set up s[6:7] for COV5 if queue ptr is needed [AMDGPU] Still set up the two SGPRs for queue ptr even if it is COV5 Sep 3, 2024
Contributor

@arsenm arsenm left a comment

lgtm with the tests all updated

@shiltian
Contributor Author

Closing this one. Will do it in #112403.

@shiltian shiltian closed this Oct 15, 2024
@shiltian shiltian deleted the users/shiltian/lower-queue-ptr-to-implicit-kernel-arg branch October 15, 2024 17:01