[OpenMP] Support 'omp_get_num_procs' on the device #65501
Summary: The `omp_get_num_procs()` function should return the amount of parallelism available. On the GPU, this was not defined. We have elected to define this function as the maximum number of wavefronts / warps that can be simultaneously resident on the device. For AMDGPU this is the number of CUs multiplied by the number of waves per CU. For NVPTX this is the maximum threads per SM divided by the warp size and multiplied by the number of SMs.
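To make the two definitions in the summary concrete, here is a minimal sketch of the arithmetic. The struct and function names (`NvptxProps`, `AmdgpuProps`, `nvptxParallelism`, `amdgpuParallelism`) are hypothetical and are not the plugin's actual API; it is also an assumption that the per-SM quotient uses integer division.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical device properties, for illustration only. These are not
// the names used by the libomptarget plugins.
struct NvptxProps {
  uint32_t NumSMs;          // number of streaming multiprocessors
  uint32_t MaxThreadsPerSM; // maximum resident threads per SM
  uint32_t WarpSize;        // threads per warp
};

struct AmdgpuProps {
  uint32_t NumCUs;        // number of compute units
  uint32_t MaxWavesPerCU; // maximum resident wavefronts per CU
};

// NVPTX: warps that fit on one SM, multiplied by the number of SMs.
uint64_t nvptxParallelism(const NvptxProps &P) {
  assert(P.WarpSize != 0 && "warp size must be non-zero");
  return uint64_t(P.NumSMs) * (P.MaxThreadsPerSM / P.WarpSize);
}

// AMDGPU: wavefronts that fit on one CU, multiplied by the number of CUs.
uint64_t amdgpuParallelism(const AmdgpuProps &P) {
  return uint64_t(P.NumCUs) * P.MaxWavesPerCU;
}
```

With made-up values of 80 SMs, 2048 threads per SM, and a warp size of 32, `nvptxParallelism` gives 80 * (2048 / 32) = 5120. With 104 CUs and 40 waves per CU, `amdgpuParallelism` gives 4160. These numbers are illustrative and not tied to any particular device.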
/// NVIDIA returns the product of the SM count and the number of warps that
/// fit if the maximum number of threads were scheduled on each SM.
uint64_t getHardwareParallelism() const override {
  return HardwareParallelism;
}
Where is this value set?
This is borrowing it from a previous patch that was added for the RPC support. It's currently set at line 309.
@@ -0,0 +1,15 @@
// RUN: %libomptarget-compile-run-and-check-generic
You might want to require certain targets; otherwise the test will fail, since by default it returns 0.
For x86 offloading this should use libomp.so's implementation, which should be supported.
LG