Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AMDGPU] Add dynamic LDS size implicit kernel argument to CO-v5 #65273

Merged
merged 1 commit into from
Jan 4, 2024

Conversation

skc7
Copy link
Contributor

@skc7 skc7 commented Sep 4, 2023

"hidden_dynamic_lds_size" argument will be added in the reserved section at offset 120 of the implicit argument layout.
Add "isDynamicLDSUsed" flag to AMDGPUMachineFunction to identify if a function uses dynamic LDS.

hidden argument will be added in below cases:

  • LDS global is used in the kernel.
  • Kernel calls a function which uses LDS global.
  • LDS pointer is passed as argument to kernel itself.

@skc7 skc7 requested review from arsenm and saiislam September 4, 2023 16:06
Copy link
Contributor

@saiislam saiislam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@arsenm
Copy link
Contributor

arsenm commented Sep 4, 2023

I thought the plan was to not bother with metadata

@skc7
Copy link
Contributor Author

skc7 commented Sep 5, 2023

Can you please add this kernarg in OpenMP AMDPGU plugin as well?

See assignment of kernargs before kernel launch, groupsegmentsize might refer to the LDS size and ImplictArgs struct definition

Made the changes. please review : 65325

@skc7
Copy link
Contributor Author

skc7 commented Sep 12, 2023

I thought the plan was to not bother with metadata

@arsenm Currently HIP runtime requires kernel arg metadata to be present for it to fill the dynamic LDS size value in the argument. Please review. Thanks.

@llvmbot
Copy link
Collaborator

llvmbot commented Sep 25, 2023

@llvm/pr-subscribers-llvm-binary-utilities

@llvm/pr-subscribers-backend-amdgpu

Changes

"hidden_dynamic_lds_size" argument will be added in the reserved section at offset 120 of the implicit argument layout.


Full diff: https://github.com/llvm/llvm-project/pull/65273.diff

4 Files Affected:

  • (modified) llvm/docs/AMDGPUUsage.rst (+3)
  • (modified) llvm/lib/BinaryFormat/AMDGPUMetadataVerifier.cpp (+1)
  • (modified) llvm/lib/Target/AMDGPU/AMDGPUHSAMetadataStreamer.cpp (+4-1)
  • (modified) llvm/test/CodeGen/AMDGPU/hsa-metadata-hidden-args-v5.ll (+4-1)
diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index 8022816d7e616d3..a48f25d6c8070b1 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -4024,6 +4024,9 @@ Code object V5 metadata is the same as
                                                        buffer that conforms to the requirements of the malloc/free
                                                        device library V1 version implementation.
 
+                                                     "hidden_dynamic_lds_size"
+                                                       Size of the dynamically allocated LDS memory is passed in the kernarg.
+
                                                      "hidden_private_base"
                                                        The high 32 bits of the flat addressing private aperture base.
                                                        Only used by GFX8 to allow conversion between private segment
diff --git a/llvm/lib/BinaryFormat/AMDGPUMetadataVerifier.cpp b/llvm/lib/BinaryFormat/AMDGPUMetadataVerifier.cpp
index 35a79ec04b6e767..f94940eecae20d9 100644
--- a/llvm/lib/BinaryFormat/AMDGPUMetadataVerifier.cpp
+++ b/llvm/lib/BinaryFormat/AMDGPUMetadataVerifier.cpp
@@ -135,6 +135,7 @@ bool MetadataVerifier::verifyKernelArgs(msgpack::DocNode &Node) {
                                .Case("hidden_default_queue", true)
                                .Case("hidden_completion_action", true)
                                .Case("hidden_multigrid_sync_arg", true)
+                               .Case("hidden_dynamic_lds_size", true)
                                .Case("hidden_private_base", true)
                                .Case("hidden_shared_base", true)
                                .Case("hidden_queue_ptr", true)
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUHSAMetadataStreamer.cpp b/llvm/lib/Target/AMDGPU/AMDGPUHSAMetadataStreamer.cpp
index 5060cd3aec581ce..fe27774776d450e 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUHSAMetadataStreamer.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUHSAMetadataStreamer.cpp
@@ -663,7 +663,10 @@ void MetadataStreamerMsgPackV5::emitHiddenKernelArgs(
     Offset += 8; // Skipped.
   }
 
-  Offset += 72; // Reserved.
+  // emit argument for hidden dynamic lds size
+  emitKernelArg(DL, Int32Ty, Align(4), "hidden_dynamic_lds_size", Offset, Args);
+
+  Offset += 68; // Reserved.
 
   // hidden_private_base and hidden_shared_base are only when the subtarget has
   // ApertureRegs.
diff --git a/llvm/test/CodeGen/AMDGPU/hsa-metadata-hidden-args-v5.ll b/llvm/test/CodeGen/AMDGPU/hsa-metadata-hidden-args-v5.ll
index cb30d668674c316..690a0b02a5ff754 100644
--- a/llvm/test/CodeGen/AMDGPU/hsa-metadata-hidden-args-v5.ll
+++ b/llvm/test/CodeGen/AMDGPU/hsa-metadata-hidden-args-v5.ll
@@ -81,13 +81,16 @@
 ; CHECK-NEXT:      - .offset:         136
 ; CHECK-NEXT:        .size:           8
 ; CHECK-NEXT:        .value_kind:     hidden_completion_action
+; CHECK:          - .offset:          144
+; CHECK-NEXT:        .size:           4
+; CHECK-NEXT:        .value_kind:     hidden_dynamic_lds_size
 ; GFX8-NEXT:      - .offset:         216
 ; GFX8-NEXT:        .size:           4
 ; GFX8-NEXT:        .value_kind:     hidden_private_base
 ; GFX8-NEXT:      - .offset:         220
 ; GFX8-NEXT:        .size:           4
 ; GFX8-NEXT:        .value_kind:     hidden_shared_base
-; CHECK:      - .offset:         224
+; CHECK:          - .offset:          224
 ; CHECK-NEXT:        .size:           8
 ; CHECK-NEXT:        .value_kind:     hidden_queue_ptr
 

@@ -663,7 +663,10 @@ void MetadataStreamerMsgPackV5::emitHiddenKernelArgs(
Offset += 8; // Skipped.
}

Offset += 72; // Reserved.
// emit argument for hidden dynamic lds size
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should skip if dynamic LDS isn't used?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Runtime will always have the dynamic lds size passed in via hipLaunchKernelGGL(zero by default). So, it can fill this argument always. Also, there is no attribute present which I can find, which can be used to check if dynamic LDS is used in the function or not. Please let me know if we really need to skip adding this argument.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Iterate over all the globals used by function and check that AMDGPU::isDynamicLDS(GV) is true for any global. This could be one way to check if "hidden_dynamic_lds_size" kernel arg needs to be emitted or not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed, extended AMDGPUMachineFunction to enable a "UsesDynamicLDS" boolean flag. This is used to check if kernel uses dynamic LDS and also to decide to emit "dynamic_lds_size" hidden kernel arg. Please review.

@skc7 skc7 force-pushed the skc7/cov5_dyn_lds branch 5 times, most recently from 5e2487d to a443ef3 Compare October 23, 2023 06:43
@@ -663,7 +664,15 @@ void MetadataStreamerMsgPackV5::emitHiddenKernelArgs(
Offset += 8; // Skipped.
}

Offset += 72; // Reserved.
// emit argument for hidden dynamic lds size
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Capitalize. Also doesn't seem worth iterating over all globals

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could find only this approach to check if dynamic LDS is used by kernel. Please let me know if there is any other way?

@skc7 skc7 force-pushed the skc7/cov5_dyn_lds branch 2 times, most recently from 7256803 to 62d1160 Compare October 30, 2023 10:52
@github-actions
Copy link

github-actions bot commented Oct 30, 2023

✅ With the latest revision this PR passed the C/C++ code formatter.

@skc7 skc7 force-pushed the skc7/cov5_dyn_lds branch 2 times, most recently from 77eb062 to eb141cf Compare October 30, 2023 11:08
@arsenm
Copy link
Contributor

arsenm commented Nov 3, 2023

Can we read this from the LDS_ALLOC register with getreg, rather than adding this?

@b-sumner
Copy link

b-sumner commented Nov 3, 2023

@arsenm I don't think so. We need the exact application provided size in bytes.

@skc7
Copy link
Contributor Author

skc7 commented Nov 8, 2023

@arsenm Please let me know if any other changes are required?

@arsenm
Copy link
Contributor

arsenm commented Nov 15, 2023

@arsenm Please let me know if any other changes are required?

Main thing I'm not sure of is the handling of dynamic LDS uses outside of kernels - we need the use to be visible in the parent kernels. I'd need to check how those are currently handled

@@ -6771,6 +6771,7 @@ SDValue SITargetLowering::LowerGlobalAddress(AMDGPUMachineFunction *MFI,
// Adjust alignment for that dynamic shared memory array.
Function &F = DAG.getMachineFunction().getFunction();
MFI->setDynLDSAlign(F, *cast<GlobalVariable>(GV));
MFI->setUsesDynamicLDS(true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't work if the dynamic LDS reference only appears in a called function and not in the parent kernel. Is that possible with the current module LDS lowering?

You also should set this if a kernel argument is an LDS pointer

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for suggestions @arsenm. Have updated the patch to identify below cases:

  • Kernel uses dynamic global -> UsesDynamicLDS flag will be set in SITargetLowering::LowerGlobalAddress.
  • Dyn lds globals (used in function and not present in parent kernel) are replaced with "llvm.amdgcn.kernel-name.dynlds" after module-lds-lowering pass. So, we can use the existing helper function "getKernelDynLDSGlobalFromFunction" from AMDGPUMachineFunction to identify if kernel uses dynamic LDS.
  • LDS passed as pointer argument to kernel -> Will be identified using utility hasLDSKernelArgument.

Copy link
Contributor

@arsenm arsenm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm with nit

static const GlobalVariable *
getKernelDynLDSGlobalFromFunction(const Function &F) {
const Module *M = F.getParent();
std::string KernelDynLDSName = "llvm.amdgcn.";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SmallString

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to SmallString in latest commit.

…65273)

"hidden_dynamic_lds_size" argument will be added in the reserved section
at offset 120 of the implicit argument layout.
Add "isDynamicLDSUsed" flag to AMDGPUMachineFunction
to identify if a function uses dynamic LDS.

hidden argument will be added in below cases:
 *LDS global is used in the kernel.
 *Kernel calls a function which uses LDS global.
 *LDS pointer is passed as argument to kernel itself.
@skc7 skc7 merged commit 9803de0 into llvm:main Jan 4, 2024
4 of 5 checks passed
skc7 added a commit that referenced this pull request Jan 6, 2024
#65273 "hidden_dynamic_lds_size" argument will be added in the reserved
section at offset 120 of the implicit argument layout
Add DynamicLdsSize to AMDGPUImplicitArgsTy struct at offset 120 and fill
the dynamic LDS size before kernel launch.
justinfargnoli pushed a commit to justinfargnoli/llvm-project that referenced this pull request Jan 28, 2024
llvm#65273 "hidden_dynamic_lds_size" argument will be added in the reserved
section at offset 120 of the implicit argument layout
Add DynamicLdsSize to AMDGPUImplicitArgsTy struct at offset 120 and fill
the dynamic LDS size before kernel launch.
searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Feb 21, 2024
…#65273)

"hidden_dynamic_lds_size" argument will be added in the reserved section
at offset 120 of the implicit argument layout.
Add "isDynamicLDSUsed" flag to AMDGPUMachineFunction to identify if a
function uses dynamic LDS.

hidden argument will be added in below cases:

- LDS global is used in the kernel.
- Kernel calls a function which uses LDS global.
- LDS pointer is passed as argument to kernel itself.

Change-Id: Idfa96ab8941b56277aaf35b416725a2522c85e61
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants