Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[OMPIRBuilder] Do not call __kmpc_push_num_threads for device parallel #71934

Merged

Conversation

DominikAdamski
Copy link
Contributor

Function __kmpc_push_num_threads should be called only if we specify number of threads for host parallel region.

Number of threads specified by the user should be passed as one of arguments of __kmpc_parallel_51 function.

Function __kmpc_push_num_threads should be called only if
we specify number of threads for host parallel region.
@llvmbot
Copy link
Collaborator

llvmbot commented Nov 10, 2023

@llvm/pr-subscribers-mlir-llvm

@llvm/pr-subscribers-flang-openmp

Author: Dominik Adamski (DominikAdamski)

Changes

Function __kmpc_push_num_threads should be called only if we specify number of threads for host parallel region.

Number of threads specified by the user should be passed as one of arguments of __kmpc_parallel_51 function.


Full diff: https://github.com/llvm/llvm-project/pull/71934.diff

2 Files Affected:

  • (modified) llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp (+3-2)
  • (modified) mlir/test/Target/LLVMIR/omptarget-parallel-llvm.mlir (+21)
diff --git a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
index eb6c120b2a6dcbd..6be58d70648f4db 100644
--- a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+++ b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
@@ -1305,8 +1305,9 @@ IRBuilder<>::InsertPoint OpenMPIRBuilder::createParallel(
   // function arguments are declared in zero address space
   bool ArgsInZeroAddressSpace = Config.isTargetDevice();
 
-  if (NumThreads) {
-    // Build call __kmpc_push_num_threads(&Ident, global_tid, num_threads)
+  // Build call __kmpc_push_num_threads(&Ident, global_tid, num_threads)
+  // only if we compile for host side.
+  if (NumThreads && !Config.isTargetDevice()) {
     Value *Args[] = {
         Ident, ThreadID,
         Builder.CreateIntCast(NumThreads, Int32, /*isSigned*/ false)};
diff --git a/mlir/test/Target/LLVMIR/omptarget-parallel-llvm.mlir b/mlir/test/Target/LLVMIR/omptarget-parallel-llvm.mlir
index 4457d7f4275260d..2628e42d533b50e 100644
--- a/mlir/test/Target/LLVMIR/omptarget-parallel-llvm.mlir
+++ b/mlir/test/Target/LLVMIR/omptarget-parallel-llvm.mlir
@@ -17,6 +17,21 @@ module attributes {dlti.dl_spec = #dlti.dl_spec<#dlti.dl_entry<"dlti.alloca_memo
     }
   llvm.return
   }
+
+  llvm.func @_test_num_threads(%arg0: !llvm.ptr) attributes {omp.declare_target = #omp.declaretarget<device_type = (host), capture_clause = (to)>, omp.outline_parent_name = "_QQmain"} {
+    %0 = omp.map_info var_ptr(%arg0 : !llvm.ptr, i32) map_clauses(from) capture(ByRef) -> !llvm.ptr {name = "d"}
+    omp.target map_entries(%0 -> %arg2 : !llvm.ptr) {
+    ^bb0(%arg2: !llvm.ptr):
+      %1 = llvm.mlir.constant(156 : i32) : i32
+      omp.parallel num_threads(%1 : i32) {
+        %2 = llvm.mlir.constant(1 : i32) : i32
+        llvm.store %2, %arg2 : i32, !llvm.ptr
+        omp.terminator
+      }
+    omp.terminator
+    }
+  llvm.return
+  }
 }
 
 // CHECK: define weak_odr protected amdgpu_kernel void [[FUNC0:@.*]](
@@ -43,3 +58,9 @@ module attributes {dlti.dl_spec = #dlti.dl_spec<#dlti.dl_entry<"dlti.alloca_memo
 // CHECK: define internal void [[FUNC1]](
 // CHECK-SAME: ptr noalias noundef [[TID_ADDR_ASCAST:%.*]], ptr noalias noundef [[ZERO_ADDR_ASCAST:%.*]], ptr [[TMP0:%.*]]) #[[ATTR0:[0-9]+]] {
 
+// CHECK: define weak_odr protected amdgpu_kernel void [[FUNC_NUM_THREADS0:@.*]](
+// CHECK-NOT:     call void @__kmpc_push_num_threads(
+// CHECK:         call void @__kmpc_parallel_51(ptr addrspacecast (
+// CHECK-SAME:  ptr addrspace(1) @[[NUM_THREADS_GLOB:[0-9]+]] to ptr),
+// CHECK-SAME:  i32 [[NUM_THREADS_TMP0:%.*]], i32 1, i32 156,
+// CHECK-SAME:  i32 -1,  ptr [[FUNC_NUM_THREADS1:@.*]], ptr null, ptr [[NUM_THREADS_TMP1:%.*]], i64 1)

Copy link
Contributor

@skatrak skatrak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Dominik! LGTM.

@DominikAdamski DominikAdamski merged commit f2f5f1b into llvm:main Nov 10, 2023
7 checks passed
@DominikAdamski DominikAdamski deleted the omp_target_parallel_num_threads branch November 10, 2023 19:39
zahiraam pushed a commit to zahiraam/llvm-project that referenced this pull request Nov 20, 2023
llvm#71934)

Function __kmpc_push_num_threads should be called only if we specify
number of threads for host parallel region.

Number of threads specified by the user should be passed as one of
arguments of __kmpc_parallel_51 function.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants