[OpenMP] Remove 'minncta' attributes from NVPTX kernels #88398

jhuber6 · 2024-04-11T15:17:04Z

Summary:
Currently we treat the 'minctasm' attribute as the minimum number of
blocks scheduled for the kernel. However, the documentation states that
it applies to the CTAs mapped onto a single SM. We currently set it to
the total number of blocks, which almost always results in a warning
that the value is out of range and will be ignored. We don't have a good
way to know automatically how many CTAs can be placed on a single SM,
or whether we should set this at all, so it is best left to users to
add manually.

https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#performance-tuning-directives-minnctapersm
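
As a rough illustration of why the total block count is out of range, here is a hypothetical C++ sketch of the upper bound a per-SM CTA count has to respect. It assumes a simplified occupancy model in which only the per-SM resident-CTA and resident-thread limits matter. The limits it uses (32 CTAs, 2048 threads, 32-thread warps) are assumed values that vary by architecture and are not taken from this PR, and the names 'SMLimits', 'maxResidentCtasPerSM' and 'minCtasPerSMInRange' are illustrative only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical per-SM limits; the real values depend on the GPU architecture.
struct SMLimits {
  int32_t MaxCtasPerSM = 32;      // assumed resident-CTA limit per SM
  int32_t MaxThreadsPerSM = 2048; // assumed resident-thread limit per SM
  int32_t WarpSize = 32;
};

// Upper bound on CTAs that can be co-resident on one SM, considering only
// the CTA and thread limits (registers and shared memory are ignored).
int32_t maxResidentCtasPerSM(int32_t ThreadsPerCta, const SMLimits &L) {
  assert(ThreadsPerCta > 0 && "a CTA needs at least one thread");
  // Threads are allocated in whole warps, so round up to a warp multiple.
  int32_t Warps = (ThreadsPerCta + L.WarpSize - 1) / L.WarpSize;
  int32_t ThreadsPerCtaRounded = Warps * L.WarpSize;
  return std::min(L.MaxCtasPerSM, L.MaxThreadsPerSM / ThreadsPerCtaRounded);
}

// A minimum-CTAs-per-SM request can only be honored if it is within that bound.
bool minCtasPerSMInRange(int32_t MinCtas, int32_t ThreadsPerCta,
                         const SMLimits &L) {
  return MinCtas >= 1 && MinCtas <= maxResidentCtasPerSM(ThreadsPerCta, L);
}
```

Under these assumptions, a kernel with 45 threads per CTA rounds up to 64 threads and can have at most min(32, 2048 / 64) = 32 CTAs resident per SM. A per-SM value of 90 (the 'minctasm' value the old test expected for the l18 kernel) can therefore never be honored, while a small user-chosen value such as 4 could be. A realistic bound would also need register and shared-memory usage, which this sketch does not model.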

llvmbot · 2024-04-11T15:17:35Z

@llvm/pr-subscribers-flang-openmp

@llvm/pr-subscribers-clang

Author: Joseph Huber (jhuber6)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/88398.diff

2 Files Affected:

(modified) clang/test/OpenMP/ompx_attributes_codegen.cpp (+1-2)
(modified) llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp (+1-3)

diff --git a/clang/test/OpenMP/ompx_attributes_codegen.cpp b/clang/test/OpenMP/ompx_attributes_codegen.cpp
index 6735972c6b1070..87eb2913537ba5 100644
--- a/clang/test/OpenMP/ompx_attributes_codegen.cpp
+++ b/clang/test/OpenMP/ompx_attributes_codegen.cpp
@@ -36,6 +36,5 @@ void func() {
 // NVIDIA: "omp_target_thread_limit"="45"
 // NVIDIA: "omp_target_thread_limit"="17"
 // NVIDIA: !{ptr @__omp_offloading[[HASH1:.*]]_l16, !"maxntidx", i32 20}
-// NVIDIA: !{ptr @__omp_offloading[[HASH2:.*]]_l18, !"minctasm", i32 90}
-// NVIDIA: !{ptr @__omp_offloading[[HASH2]]_l18, !"maxntidx", i32 45}
+// NVIDIA: !{ptr @__omp_offloading[[HASH2:.*]]_l18, !"maxntidx", i32 45}
 // NVIDIA: !{ptr @__omp_offloading[[HASH3:.*]]_l20, !"maxntidx", i32 17}
diff --git a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
index 7fd8474c2ec890..4d2d352f7520b2 100644
--- a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+++ b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
@@ -4786,11 +4786,9 @@ OpenMPIRBuilder::readTeamBoundsForKernel(const Triple &, Function &Kernel) {
 
 void OpenMPIRBuilder::writeTeamsForKernel(const Triple &T, Function &Kernel,
                                           int32_t LB, int32_t UB) {
-  if (T.isNVPTX()) {
+  if (T.isNVPTX())
     if (UB > 0)
       updateNVPTXMetadata(Kernel, "maxclusterrank", UB, true);
-    updateNVPTXMetadata(Kernel, "minctasm", LB, false);
-  }
   if (T.isAMDGPU())
     Kernel.addFnAttr("amdgpu-max-num-workgroups", llvm::utostr(LB) + ",1,1");
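
The post-patch control flow is easier to read with the braces written out. After this change the brace-less 'if (T.isNVPTX())' guards only the 'UB > 0' check, and the AMDGPU branch remains a separate statement. The C++ sketch below is a hypothetical standalone model of only the lines visible in this hunk, not the OMPIRBuilder code. 'Target' and 'KernelModel' are made-up stand-ins for llvm::Triple and llvm::Function. The model also simply overwrites annotations instead of merging them with existing values the way updateNVPTXMetadata may.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-ins for llvm::Triple and llvm::Function.
enum class Target { NVPTX, AMDGPU, Other };

struct KernelModel {
  std::map<std::string, int32_t> NVVMAnnotations; // e.g. "maxclusterrank"
  std::map<std::string, std::string> FnAttrs;     // e.g. "amdgpu-max-num-workgroups"
};

// Mirrors the post-patch control flow with explicit braces: the UB check is
// nested under the NVPTX check, and the AMDGPU check is a separate statement.
void writeTeamsForKernelModel(Target T, KernelModel &K, int32_t LB,
                              int32_t UB) {
  if (T == Target::NVPTX) {
    if (UB > 0)
      K.NVVMAnnotations["maxclusterrank"] = UB; // simplified: no merging
    // No "minctasm" annotation is emitted any more.
  }
  if (T == Target::AMDGPU)
    K.FnAttrs["amdgpu-max-num-workgroups"] = std::to_string(LB) + ",1,1";
}
```

In this model, calling it with Target::NVPTX, LB = 90 and UB = 0 leaves both maps empty. With UB = 128, only 'maxclusterrank' is recorded. No 'minctasm' entry is ever produced.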

jdoerfert · 2024-04-15T20:27:01Z

LG, I misread the existing code in clang.

jdoerfert

LG

jhuber6 requested review from arsenm, Artem-B, jdoerfert, Meinersbur and shiltian April 11, 2024 15:17

llvmbot added the clang, flang:openmp and clang:openmp labels Apr 11, 2024

arsenm added the backend:NVPTX label Apr 12, 2024

jdoerfert approved these changes Apr 15, 2024

jhuber6 merged commit 0287a5c into llvm:main Apr 15, 2024

jhuber6 deleted the RemoveCTA branch April 15, 2024 20:28
