[OpenMP] Add amdgpu-num-work-groups attribute to OpenMP kernels #87695

jhuber6 · 2024-04-04T20:12:21Z

Summary:
The `amdgpu-max-num-workgroups` attribute was introduced recently. We already
do the equivalent for NVPTX kernels, so we should apply it for AMDGPU as well.
This patch simply applies this metadata in cases where a lower bound is known.

llvmbot · 2024-04-04T20:12:52Z

@llvm/pr-subscribers-flang-openmp
@llvm/pr-subscribers-backend-amdgpu

@llvm/pr-subscribers-clang

Author: Joseph Huber (jhuber6)

Changes

Summary:
The `amdgpu-max-num-workgroups` attribute was introduced recently. We already
do the equivalent for NVPTX kernels, so we should apply it for AMDGPU as well.
This patch simply applies this metadata in cases where a lower bound is known.

Full diff: https://github.com/llvm/llvm-project/pull/87695.diff

2 Files Affected:

(added) clang/test/OpenMP/thread_limit_amdgpu.c (+34)
(modified) llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp (+3)

diff --git a/clang/test/OpenMP/thread_limit_amdgpu.c b/clang/test/OpenMP/thread_limit_amdgpu.c
new file mode 100644
index 00000000000000..f884eeb73c3ff1
--- /dev/null
+++ b/clang/test/OpenMP/thread_limit_amdgpu.c
@@ -0,0 +1,34 @@
+// Test target codegen - host bc file has to be created first.
+// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple x86_64-unknown-linux-gnu -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-x86-host.bc
+// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-x86-host.bc -o - | FileCheck %s
+// expected-no-diagnostics
+
+#ifndef HEADER
+#define HEADER
+
+void foo(int N) {
+#pragma omp target teams distribute parallel for simd
+  for (int i = 0; i < N; ++i)
+    ;
+#pragma omp target teams distribute parallel for simd thread_limit(4)
+  for (int i = 0; i < N; ++i)
+    ;
+#pragma omp target teams distribute parallel for simd ompx_attribute(__attribute__((launch_bounds(42, 42))))
+  for (int i = 0; i < N; ++i)
+    ;
+#pragma omp target teams distribute parallel for simd ompx_attribute(__attribute__((launch_bounds(42, 42)))) num_threads(22)
+  for (int i = 0; i < N; ++i)
+    ;
+}
+
+#endif
+
+// CHECK: define weak_odr protected amdgpu_kernel void @{{__omp_offloading_[0-9a-z]+_[0-9a-z]+__Z3fooi_}}l10({{.*}}) #[[ATTR1:.+]] {
+// CHECK: define weak_odr protected amdgpu_kernel void @{{__omp_offloading_[0-9a-z]+_[0-9a-z]+__Z3fooi_}}l13({{.*}}) #[[ATTR2:.+]] {
+// CHECK: define weak_odr protected amdgpu_kernel void @{{__omp_offloading_[0-9a-z]+_[0-9a-z]+__Z3fooi_}}l16({{.*}}) #[[ATTR3:.+]] {
+// CHECK: define weak_odr protected amdgpu_kernel void @{{__omp_offloading_[0-9a-z]+_[0-9a-z]+__Z3fooi_}}l19({{.*}}) #[[ATTR4:.+]] {
+
+// CHECK: attributes #[[ATTR1]] = { {{.*}} "amdgpu-flat-work-group-size"="1,256" {{.*}} }
+// CHECK: attributes #[[ATTR2]] = { {{.*}} "amdgpu-flat-work-group-size"="1,4" {{.*}} }
+// CHECK: attributes #[[ATTR3]] = { {{.*}} "amdgpu-flat-work-group-size"="1,42" "amdgpu-max-num-workgroups"="42,1,1"{{.*}} }
+// CHECK: attributes #[[ATTR4]] = { {{.*}} "amdgpu-flat-work-group-size"="1,22" "amdgpu-max-num-workgroups"="42,1,1"{{.*}} }
diff --git a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
index 16507a69ea8502..4fe44b10d1bd0e 100644
--- a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+++ b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
@@ -4791,6 +4791,9 @@ void OpenMPIRBuilder::writeTeamsForKernel(const Triple &T, Function &Kernel,
       updateNVPTXMetadata(Kernel, "maxclusterrank", UB, true);
     updateNVPTXMetadata(Kernel, "minctasm", LB, false);
   }
+  if (T.isAMDGPU()) {
+    Kernel.addFnAttr("amdgpu-max-num-workgroups", llvm::utostr(LB) + ",1,1");
+  }
   Kernel.addFnAttr("omp_target_num_teams", std::to_string(LB));
 }
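
For readers skimming the hunk above, the attribute value is just the lower bound on the team count followed by `,1,1` for the remaining two dimensions. The following is a minimal standalone sketch of that string construction, not the code in `OMPIRBuilder.cpp`: the helper name `formatMaxNumWorkgroups` is hypothetical, and `std::to_string` stands in for the `llvm::utostr` call in the patch so the snippet compiles without LLVM headers.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper (not part of LLVM): builds the value string that the
// patch attaches as "amdgpu-max-num-workgroups". Only the X dimension is
// taken from the known lower bound; Y and Z are fixed to 1, as in the diff.
std::string formatMaxNumWorkgroups(std::uint32_t LB) {
  return std::to_string(LB) + ",1,1";
}
```

With the `launch_bounds(42, 42)` case from the test, `formatMaxNumWorkgroups(42)` yields `42,1,1`, which matches the value in the `ATTR3` and `ATTR4` CHECK lines.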

llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp

shiltian

LG

arsenm · 2024-04-06T19:23:51Z

Note this attribute doesn't actually do anything yet. @jwanggit86 are you working on implementing the propagation and optimizations with this?

jhuber6 requested review from arsenm, jdoerfert, JonChesterfield and shiltian April 4, 2024 20:12

llvmbot added the clang, backend:AMDGPU, flang:openmp, and clang:openmp labels Apr 4, 2024

shiltian reviewed Apr 4, 2024

shiltian approved these changes Apr 4, 2024

[OpenMP] Add amdgpu-num-work-groups attribute to OpenMP kernels

1738c7f

jhuber6 force-pushed the MaxThreads branch from f4710ba to 1738c7f Compare April 4, 2024 20:55

jhuber6 merged commit 2650375 into llvm:main Apr 5, 2024
4 checks passed
