[OpenMP] Support 'omp_get_num_procs' on the device #65501

Merged: 1 commit, Sep 6, 2023

@jhuber6 jhuber6 commented Sep 6, 2023

Summary:
The omp_get_num_procs() function should return the amount of
parallelism available. On the GPU, this was not defined. We have elected
to define this function as the maximum number of wavefronts / warps that
can be simultaneously resident on the device. For AMDGPU this is the
number of CUs multiplied by the number of waves per CU. For NVPTX this is
the maximum threads per SM divided by the warp size and multiplied by the
number of SMs.
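
The two formulas in the summary can be sketched as plain arithmetic. This is only an illustration of the definition, not the plugin code from the patch: the struct names, field names, and the `hardwareParallelism` helper are hypothetical, and real values would come from the device queries each plugin already performs.

```cpp
#include <cstdint>

// Hypothetical device descriptions. The field names are illustrative
// assumptions and do not mirror the actual libomptarget plugin classes.
struct AMDGPUInfo {
  uint64_t NumComputeUnits; // number of CUs on the device
  uint64_t WavesPerCU;      // wavefronts that can be resident on one CU
};

struct NVPTXInfo {
  uint64_t NumSMs;          // number of streaming multiprocessors
  uint64_t MaxThreadsPerSM; // maximum resident threads per SM
  uint64_t WarpSize;        // threads per warp
};

// AMDGPU: CUs multiplied by the waves that fit on each CU.
constexpr uint64_t hardwareParallelism(const AMDGPUInfo &D) {
  return D.NumComputeUnits * D.WavesPerCU;
}

// NVPTX: warps per SM (max threads / warp size) multiplied by the SM count.
constexpr uint64_t hardwareParallelism(const NVPTXInfo &D) {
  return (D.MaxThreadsPerSM / D.WarpSize) * D.NumSMs;
}
```

With made-up inputs (not tied to any particular device), 104 CUs at 40 waves per CU give 4160, and 108 SMs at 2048 threads per SM with a warp size of 32 give 64 * 108 = 6912.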

@jhuber6 jhuber6 requested a review from a team as a code owner September 6, 2023 16:59
/// NVIDIA returns the product of the SM count and the number of warps that
/// fit if the maximum number of threads were scheduled on each SM.
uint64_t getHardwareParallelism() const override {
  return HardwareParallelism;
}
Contributor

Where is this value set?

Contributor Author

This is borrowing it from a previous patch that was added for the RPC support. It's currently set at line 309.

@@ -0,0 +1,15 @@
// RUN: %libomptarget-compile-run-and-check-generic
Contributor

You might want to require certain targets; otherwise the test will fail, since by default it returns 0.

Contributor Author

For x86 offloading this should use libomp.so's implementation, which should be supported.
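
For context on this exchange, here is a rough sketch of the kind of check being discussed: query `omp_get_num_procs()` inside a target region and treat any positive result as success. It is an assumption about the shape of such a test, not the file added by the patch. The helper names `queryDeviceNumProcs` and `isValidNumProcs` are hypothetical, and the non-OpenMP branch exists only so the snippet still compiles without `-fopenmp`.

```cpp
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: ask the offload device (or the host fallback) how
// much parallelism it reports.
int queryDeviceNumProcs() {
  int NumProcs = 0;
#ifdef _OPENMP
  // The value is computed inside the target region and mapped back.
#pragma omp target map(from : NumProcs)
  { NumProcs = omp_get_num_procs(); }
#else
  // Assumption for illustration only: stand-in value when OpenMP is disabled.
  NumProcs = 1;
#endif
  return NumProcs;
}

// A result of 0 is what the reviewer warns about for targets that do not
// define the function; anything positive counts as a pass.
bool isValidNumProcs(int NumProcs) { return NumProcs > 0; }
```

A driver would print a pass marker only when `isValidNumProcs(queryDeviceNumProcs())` holds, which is why a target that returns 0 by default would fail the check.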

@shiltian shiltian left a comment

LG

@jhuber6 jhuber6 merged commit 460840c into llvm:main Sep 6, 2023
2 checks passed
@jhuber6 jhuber6 deleted the NumProcs branch September 7, 2023 17:40
avillega pushed a commit to avillega/llvm-project that referenced this pull request Sep 11, 2023