-
Notifications
You must be signed in to change notification settings - Fork 10.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[AMDGPU] Add dynamic LDS size implicit kernel argument to CO-v5 #65273
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please add this kernarg in OpenMP AMDPGU plugin as well?
See assignment of kernargs before kernel launch, groupsegmentsize might refer to the LDS size and ImplictArgs struct definition
I thought the plan was to not bother with metadata |
Made the changes. please review : 65325 |
db5f43b
to
0c27434
Compare
@arsenm Currently HIP runtime requires kernel arg metadata to be present for it to fill the dynamic LDS size value in the argument. Please review. Thanks. |
0c27434
to
6a34b8e
Compare
@llvm/pr-subscribers-llvm-binary-utilities @llvm/pr-subscribers-backend-amdgpu Changes"hidden_dynamic_lds_size" argument will be added in the reserved section at offset 120 of the implicit argument layout. Full diff: https://github.com/llvm/llvm-project/pull/65273.diff 4 Files Affected:
diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index 8022816d7e616d3..a48f25d6c8070b1 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -4024,6 +4024,9 @@ Code object V5 metadata is the same as
buffer that conforms to the requirements of the malloc/free
device library V1 version implementation.
+ "hidden_dynamic_lds_size"
+ Size of the dynamically allocated LDS memory is passed in the kernarg.
+
"hidden_private_base"
The high 32 bits of the flat addressing private aperture base.
Only used by GFX8 to allow conversion between private segment
diff --git a/llvm/lib/BinaryFormat/AMDGPUMetadataVerifier.cpp b/llvm/lib/BinaryFormat/AMDGPUMetadataVerifier.cpp
index 35a79ec04b6e767..f94940eecae20d9 100644
--- a/llvm/lib/BinaryFormat/AMDGPUMetadataVerifier.cpp
+++ b/llvm/lib/BinaryFormat/AMDGPUMetadataVerifier.cpp
@@ -135,6 +135,7 @@ bool MetadataVerifier::verifyKernelArgs(msgpack::DocNode &Node) {
.Case("hidden_default_queue", true)
.Case("hidden_completion_action", true)
.Case("hidden_multigrid_sync_arg", true)
+ .Case("hidden_dynamic_lds_size", true)
.Case("hidden_private_base", true)
.Case("hidden_shared_base", true)
.Case("hidden_queue_ptr", true)
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUHSAMetadataStreamer.cpp b/llvm/lib/Target/AMDGPU/AMDGPUHSAMetadataStreamer.cpp
index 5060cd3aec581ce..fe27774776d450e 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUHSAMetadataStreamer.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUHSAMetadataStreamer.cpp
@@ -663,7 +663,10 @@ void MetadataStreamerMsgPackV5::emitHiddenKernelArgs(
Offset += 8; // Skipped.
}
- Offset += 72; // Reserved.
+ // emit argument for hidden dynamic lds size
+ emitKernelArg(DL, Int32Ty, Align(4), "hidden_dynamic_lds_size", Offset, Args);
+
+ Offset += 68; // Reserved.
// hidden_private_base and hidden_shared_base are only when the subtarget has
// ApertureRegs.
diff --git a/llvm/test/CodeGen/AMDGPU/hsa-metadata-hidden-args-v5.ll b/llvm/test/CodeGen/AMDGPU/hsa-metadata-hidden-args-v5.ll
index cb30d668674c316..690a0b02a5ff754 100644
--- a/llvm/test/CodeGen/AMDGPU/hsa-metadata-hidden-args-v5.ll
+++ b/llvm/test/CodeGen/AMDGPU/hsa-metadata-hidden-args-v5.ll
@@ -81,13 +81,16 @@
; CHECK-NEXT: - .offset: 136
; CHECK-NEXT: .size: 8
; CHECK-NEXT: .value_kind: hidden_completion_action
+; CHECK: - .offset: 144
+; CHECK-NEXT: .size: 4
+; CHECK-NEXT: .value_kind: hidden_dynamic_lds_size
; GFX8-NEXT: - .offset: 216
; GFX8-NEXT: .size: 4
; GFX8-NEXT: .value_kind: hidden_private_base
; GFX8-NEXT: - .offset: 220
; GFX8-NEXT: .size: 4
; GFX8-NEXT: .value_kind: hidden_shared_base
-; CHECK: - .offset: 224
+; CHECK: - .offset: 224
; CHECK-NEXT: .size: 8
; CHECK-NEXT: .value_kind: hidden_queue_ptr
|
@@ -663,7 +663,10 @@ void MetadataStreamerMsgPackV5::emitHiddenKernelArgs( | |||
Offset += 8; // Skipped. | |||
} | |||
|
|||
Offset += 72; // Reserved. | |||
// emit argument for hidden dynamic lds size |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should skip if dynamic LDS isn't used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Runtime will always have the dynamic lds size passed in via hipLaunchKernelGGL(zero by default). So, it can fill this argument always. Also, there is no attribute present which I can find, which can be used to check if dynamic LDS is used in the function or not. Please let me know if we really need to skip adding this argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Iterate over all the globals used by function and check that AMDGPU::isDynamicLDS(GV) is true for any global. This could be one way to check if "hidden_dynamic_lds_size" kernel arg needs to be emitted or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed, extended AMDGPUMachineFunction to enable a "UsesDynamicLDS" boolean flag. This is used to check if kernel uses dynamic LDS and also to decide to emit "dynamic_lds_size" hidden kernel arg. Please review.
5e2487d
to
a443ef3
Compare
@@ -663,7 +664,15 @@ void MetadataStreamerMsgPackV5::emitHiddenKernelArgs( | |||
Offset += 8; // Skipped. | |||
} | |||
|
|||
Offset += 72; // Reserved. | |||
// emit argument for hidden dynamic lds size |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Capitalize. Also doesn't seem worth iterating over all globals
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could find only this approach to check if dynamic LDS is used by kernel. Please let me know if there is any other way?
7256803
to
62d1160
Compare
✅ With the latest revision this PR passed the C/C++ code formatter. |
77eb062
to
eb141cf
Compare
Can we read this from the LDS_ALLOC register with getreg, rather than adding this? |
@arsenm I don't think so. We need the exact application provided size in bytes. |
@arsenm Please let me know if any other changes are required? |
Main thing I'm not sure of is the handling of dynamic LDS uses outside of kernels - we need the use to be visible in the parent kernels. I'd need to check how those are currently handled |
@@ -6771,6 +6771,7 @@ SDValue SITargetLowering::LowerGlobalAddress(AMDGPUMachineFunction *MFI, | |||
// Adjust alignment for that dynamic shared memory array. | |||
Function &F = DAG.getMachineFunction().getFunction(); | |||
MFI->setDynLDSAlign(F, *cast<GlobalVariable>(GV)); | |||
MFI->setUsesDynamicLDS(true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This won't work if the dynamic LDS reference only appears in a called function and not in the parent kernel. Is that possible with the current module LDS lowering?
You also should set this if a kernel argument is an LDS pointer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for suggestions @arsenm. Have updated the patch to identify below cases:
- Kernel uses dynamic global -> UsesDynamicLDS flag will be set in SITargetLowering::LowerGlobalAddress.
- Dyn lds globals (used in function and not present in parent kernel) are replaced with "llvm.amdgcn.kernel-name.dynlds" after module-lds-lowering pass. So, we can use the existing helper function "getKernelDynLDSGlobalFromFunction" from AMDGPUMachineFunction to identify if kernel uses dynamic LDS.
- LDS passed as pointer argument to kernel -> Will be identified using utility hasLDSKernelArgument.
eb141cf
to
2f46c67
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm with nit
static const GlobalVariable * | ||
getKernelDynLDSGlobalFromFunction(const Function &F) { | ||
const Module *M = F.getParent(); | ||
std::string KernelDynLDSName = "llvm.amdgcn."; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SmallString
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to SmallString in latest commit.
6e38950
to
11e9fe4
Compare
…65273) "hidden_dynamic_lds_size" argument will be added in the reserved section at offset 120 of the implicit argument layout. Add "isDynamicLDSUsed" flag to AMDGPUMachineFunction to identify if a function uses dynamic LDS. hidden argument will be added in below cases: *LDS global is used in the kernel. *Kernel calls a function which uses LDS global. *LDS pointer is passed as argument to kernel itself.
11e9fe4
to
7ed197b
Compare
#65273 "hidden_dynamic_lds_size" argument will be added in the reserved section at offset 120 of the implicit argument layout Add DynamicLdsSize to AMDGPUImplicitArgsTy struct at offset 120 and fill the dynamic LDS size before kernel launch.
llvm#65273 "hidden_dynamic_lds_size" argument will be added in the reserved section at offset 120 of the implicit argument layout Add DynamicLdsSize to AMDGPUImplicitArgsTy struct at offset 120 and fill the dynamic LDS size before kernel launch.
…#65273) "hidden_dynamic_lds_size" argument will be added in the reserved section at offset 120 of the implicit argument layout. Add "isDynamicLDSUsed" flag to AMDGPUMachineFunction to identify if a function uses dynamic LDS. hidden argument will be added in below cases: - LDS global is used in the kernel. - Kernel calls a function which uses LDS global. - LDS pointer is passed as argument to kernel itself. Change-Id: Idfa96ab8941b56277aaf35b416725a2522c85e61
"hidden_dynamic_lds_size" argument will be added in the reserved section at offset 120 of the implicit argument layout.
Add "isDynamicLDSUsed" flag to AMDGPUMachineFunction to identify if a function uses dynamic LDS.
hidden argument will be added in below cases: