[OpenMP] Support 'omp_get_num_procs' on the device #65501

jhuber6 · 2023-09-06T16:59:32Z

Summary:
The omp_get_num_procs() function should return the amount of
parallelism available. On the GPU, this was previously not defined. We have
elected to define this function as the maximum number of wavefronts / warps
that can be simultaneously resident on the device. For AMDGPU this is the
number of CUs multiplied by the number of waves per CU. For NVPTX this is the
maximum threads per SM divided by the warp size and multiplied by the
number of SMs.
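Both definitions reduce to simple arithmetic over a few device attributes. The following is a minimal C++ sketch, not the plugin code: the function names are hypothetical, the attribute values are passed in as arguments rather than queried from the CUDA or HSA runtime, and the numbers in the `static_assert` lines are illustrative only.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers illustrating the definitions in the summary.
// The real plugins obtain these attributes from the vendor runtime.

// NVPTX: (max threads per SM / warp size) whole warps fit on each SM,
// multiplied by the number of SMs on the device.
constexpr std::uint64_t nvptxHardwareParallelism(std::uint64_t NumSMs,
                                                 std::uint64_t MaxThreadsPerSM,
                                                 std::uint64_t WarpSize) {
  return NumSMs * (MaxThreadsPerSM / WarpSize);
}

// AMDGPU: number of compute units multiplied by the number of wavefronts
// that can be resident on each CU.
constexpr std::uint64_t amdgpuHardwareParallelism(std::uint64_t NumCUs,
                                                  std::uint64_t WavesPerCU) {
  return NumCUs * WavesPerCU;
}

// Illustrative values only: 2048 / 32 = 64 warps per SM, times 108 SMs.
static_assert(nvptxHardwareParallelism(108, 2048, 32) == 6912,
              "64 warps per SM * 108 SMs");
// Illustrative values only: 104 CUs * 32 waves per CU.
static_assert(amdgpuHardwareParallelism(104, 32) == 3328,
              "104 CUs * 32 waves per CU");
```

The NVPTX formula uses integer division, so only whole warps are counted: a per-SM thread limit that is not a multiple of the warp size rounds down.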


shiltian · 2023-09-06T18:19:01Z

openmp/libomptarget/plugins-nextgen/cuda/src/rtl.cpp

+  /// NVIDIA returns the product of the SM count and the number of warps that
+  /// fit if the maximum number of threads were scheduled on each SM.
+  uint64_t getHardwareParallelism() const override {
+    return HardwareParallelism;


Where is this value set?

This reuses the value from a previous patch that added RPC support. It's currently set at line 309.
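In other words, the getter does no work itself; it returns a value computed earlier, during device setup. A hypothetical sketch of that compute-once, return-cached pattern is below. `ExampleCUDADevice` and `initHardwareParallelism` are invented names, the generic plugin base class that the real `override` refers to is omitted, and only `getHardwareParallelism` and the `HardwareParallelism` member come from the diff.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a plugin device object (base class omitted).
class ExampleCUDADevice {
public:
  // Computed once, e.g. while the device is initialized, from attribute
  // values the plugin has already queried (passed in here for illustration).
  void initHardwareParallelism(std::uint64_t NumSMs,
                               std::uint64_t MaxThreadsPerSM,
                               std::uint64_t WarpSize) {
    HardwareParallelism = NumSMs * (MaxThreadsPerSM / WarpSize);
  }

  // The getter only returns the cached value, so it is const and cheap.
  std::uint64_t getHardwareParallelism() const { return HardwareParallelism; }

private:
  std::uint64_t HardwareParallelism = 0; // stays 0 until initialized
};
```

Caching keeps repeated queries from going back to the driver each time.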

shiltian · 2023-09-06T18:30:48Z

openmp/libomptarget/test/api/omp_get_num_procs.c

@@ -0,0 +1,15 @@
+// RUN: %libomptarget-compile-run-and-check-generic


You might want to require certain targets; otherwise the test will fail, since by default it returns 0.

For x86 offloading, this should use libomp.so's implementation, which should be supported.

shiltian

LG


jhuber6 requested a review from a team as a code owner September 6, 2023 16:59

jhuber6 requested review from jdoerfert and shiltian September 6, 2023 17:00

shiltian reviewed Sep 6, 2023

shiltian approved these changes Sep 6, 2023

jhuber6 merged commit 460840c into llvm:main Sep 6, 2023
2 checks passed

jhuber6 deleted the NumProcs branch September 7, 2023 17:40
