-
Notifications
You must be signed in to change notification settings - Fork 407
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unique token improvement #4741
Unique token improvement #4741
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a good reason to not also do that for SYCL
?
bf3b861
to
e96ca20
Compare
@@ -45,9 +45,12 @@ | |||
#ifndef KOKKOS_HIP_UNIQUE_TOKEN_HPP | |||
#define KOKKOS_HIP_UNIQUE_TOKEN_HPP | |||
|
|||
#include <impl/Kokkos_ConcurrentBitset.hpp> | |||
#include <Kokkos_Macros.hpp> | |||
#ifdef KOKKOS_ENABLE_HIP |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not needed. If HIP is not enabled, CMake will ignore the file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
@@ -59,8 +59,8 @@ enum class UniqueTokenScope : int { Instance, Global }; | |||
/// | |||
/// This object should behave like a ref-counted object, so that when the last | |||
/// instance is destroy resources are free if needed | |||
template <typename ExecutionSpace, | |||
UniqueTokenScope = UniqueTokenScope::Instance> | |||
template <typename ExecutionSpace = Kokkos::DefaultExecutionSpace, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you change? This is not in the description of the PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is now ...
Everything else in Kokkos defaults the execution space. I think this should too.
77c4f85
to
4cd03d6
Compare
a2c6f31
to
1d1c395
Compare
5b8f587
to
8c53575
Compare
uses a int based lock array instead of the unordered_bitset
8c53575
to
b28d48e
Compare
@dalg24 @masterleinad finally passed!! |
UniqueToken(size_type max_size) { | ||
m_locks = Kokkos::View<uint32_t*, Kokkos::CudaSpace>( | ||
"Kokkos::UniqueToken::m_locks", max_size); | ||
} | ||
UniqueToken(size_type max_size, execution_space const& exec) { | ||
m_locks = Kokkos::View<uint32_t*, Kokkos::CudaSpace>( | ||
Kokkos::view_alloc(exec, "Kokkos::UniqueToken::m_locks"), max_size); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I slightly prefer to define and call protected constructors here to avoid allocating memory for m_locks
twice like in https://github.com/kokkos/kokkos/pull/4748/files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what do you mean allocating twice?
Co-authored-by: Damien L-G <dalg24+github@gmail.com>
It looks like |
Uses static lock array for global variant, and locally owned lock array for instance variant.
e76554b
to
6f13b1e
Compare
We should still be able to get rid of the now unused |
Retest this please |
|
I don't get how either of these guys is failing (OpenMPTarget and OpenMP for NVHPC), I didn't really touch that stuff at all did I? |
Retest this please |
We have seen that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why should we keep m_scratchConcurrentBitset
?
Another PR failing UNiqueTOken in OpenMP here: https://cloud.cees.ornl.gov/jenkins-ci/blue/organizations/jenkins/Kokkos/detail/Kokkos/8028/pipeline/52
|
Failure is unrelated (Clang crashing when compiling Kokkos::complex unit test, sigh) |
Align interface with other Kokkos interfaces which default the execution space.
This also removes the usage of bitset in favor of using straight up int arrays as locks for UniqueToken in HIP and CUDA.
As a consequence atomic conflicts are vastly reduced, and also by default you are getting much better aligned indicies within a warp, improving data access patterns in common usecases.
Here is a performance example:
Prior to the change:
A100:
MI100:
Post change:
A100
MI100