OpenMPTarget hierarchical #3411
…al parallel_reduce for + operator, except for ThreadVectorRange.
test.run(8, 16);
test.run(11, 13);

// OpenMPTarget backend only accepts >= 32 threads per team
Does it need to be a multiple of 32 though?
No, it does not need to be a multiple of 32.
I just realized that the clang compiler actually launches (32 + number of threads requested per team) threads.
So even if I set team_size = 32, the block generated on the NVIDIA GPU has 64 threads per team.
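The arithmetic in the reply above can be written out as a small sketch. The function name and the constant are hypothetical, and the +32 offset is taken only from that comment about clang's behavior; it is not a documented guarantee.

```cpp
// Hypothetical helper illustrating the observation in the comment above.
// Assumption: clang launches 32 extra threads on top of the number of
// threads requested per team (as reported in this review thread only).
constexpr int assumed_clang_extra_threads = 32;

constexpr int launched_threads_per_team(int requested_threads_per_team) {
  return assumed_clang_extra_threads + requested_threads_per_team;
}
```

Under that assumption, a request of 32 threads per team gives launched_threads_per_team(32) == 64, which matches the 64-thread block mentioned above.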
@@ -50,6 +50,442 @@

#include <Kokkos_Atomic.hpp>

//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
Please say so in the PR description or the commit message when you are moving code without editing it.
I had to diff it against the code removed from core/src/OpenMPTarget/Kokkos_OpenMPTarget_Parallel.hpp
to find that out.
Yes, sorry about this; @crtrott pointed it out as well. I will be more careful next time.
// Minimum team size should be 32 for OpenMPTarget backend.
if (team_size_request < 32) {
printf(
"OpenMPTarget backend requires a minimum of 32 threads per team.\n");
exit(EXIT_FAILURE);
} else
m_team_size = team_size_request;
This is the bit that matters.
Consider using one of the more standard ways to raise an error in Kokkos: Impl::throw_runtime_exception
or abort.
It should be abort.
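A minimal sketch of what the suggested change could look like, assuming the check is factored into free functions. The names min_team_size, team_size_is_valid and checked_team_size are hypothetical, and std::abort stands in for the Kokkos abort routine so that the example compiles without Kokkos headers; this is not the code that was merged.

```cpp
#include <cstdio>
#include <cstdlib>

// Hypothetical constant: the minimum team size stated in the diff above.
constexpr int min_team_size = 32;

// Returns true when the requested team size satisfies the backend minimum.
bool team_size_is_valid(int team_size_request) {
  return team_size_request >= min_team_size;
}

// Returns the requested team size, or aborts with a message if it is too
// small. std::abort is a stand-in here for the Kokkos abort call that the
// reviewers recommend over printf + exit.
int checked_team_size(int team_size_request) {
  if (!team_size_is_valid(team_size_request)) {
    std::fprintf(
        stderr,
        "OpenMPTarget backend requires a minimum of 32 threads per team.\n");
    std::abort();
  }
  return team_size_request;
}
```

Returning the validated value, rather than assigning in an else branch, keeps the failure path and the assignment separate, so the caller can write m_team_size = checked_team_size(team_size_request).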
…tests to pass on Gen9. Removed TeamReductionScan test from unit tests.
Merge branch 'OpenMPTarget_Hierarchial' of github.com:rgayatri23/kokkos into OpenMPTarget_Hierarchial
…or OpenMPTarget backend.
…eam size of 32 for only the OpenMPTarget backend. Use long long int instead of int64_t in incremental test11* to be in sync with the SYCL backend.
…sed in case the team size is less than 32.
A newer PR for hierarchical parallelism in the OpenMPTarget backend.
Passes the first 11 incremental tests.
Kokkos league level maps to teams distribute,
thread level maps to parallel for, and
vector level maps to simd.
Reduction only works for the "+" operator.
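The three-level mapping described above can be illustrated with plain OpenMP directives. This is a sketch under stated assumptions, not the backend's implementation: hierarchical_sum is a hypothetical function, it uses no Kokkos API, and it applies a "+" reduction at every level only to keep the example short. Without an offloading compiler the target region falls back to the host, and without OpenMP enabled the pragmas are ignored; the result is the same in both cases.

```cpp
#include <cassert>

// Hypothetical helper: sums the flat index of every (league, thread, lane)
// triple, using one OpenMP construct per Kokkos hierarchy level.
long long hierarchical_sum(int league_size, int team_size, int vector_length) {
  assert(league_size > 0 && team_size > 0 && vector_length > 0);
  long long sum = 0;

  // League level: one loop iteration per team, distributed across teams.
#pragma omp target teams distribute num_teams(league_size) \
    thread_limit(team_size) map(tofrom : sum) reduction(+ : sum)
  for (int league = 0; league < league_size; ++league) {
    // Thread level: the threads of a team share this loop.
#pragma omp parallel for reduction(+ : sum)
    for (int thread = 0; thread < team_size; ++thread) {
      long long lane_sum = 0;
      // Vector level: lanes are expressed as a simd loop.
#pragma omp simd reduction(+ : lane_sum)
      for (int lane = 0; lane < vector_length; ++lane) {
        const long long flat =
            (static_cast<long long>(league) * team_size + thread) *
                vector_length + lane;
        lane_sum += flat;
      }
      sum += lane_sum;
    }
  }
  return sum;
}
```

For a league of 2 teams, 32 threads per team and a vector length of 4, the flat indices run from 0 to 255, so hierarchical_sum(2, 32, 4) returns 255 * 256 / 2 = 32640.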