-
Notifications
You must be signed in to change notification settings - Fork 407
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update SYCL UniqueToken avoiding bitset #4748
Update SYCL UniqueToken avoiding bitset #4748
Conversation
Hi @masterleinad -- A couple of questions: 1) will this PR be in the 3.6 release? 2) if it will be in 3.6, is it still a blocker? |
@ajpowelsnl Is there a reason to remove the |
I think we should treat it the same as #4741. |
Yes, I just spoke with Christian about it, and the |
067857a
to
cab7f79
Compare
Retest this please. |
1 similar comment
Retest this please. |
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will approve once we clarify the question about the UniqueToken<SYCL, UniqueTokenScope::Instance>
constructors.
int idx = (blockIdx[0] * (blockDim[0] * blockDim[1]) + | ||
threadIdx[1] * blockDim[0] + threadIdx[0]) % | ||
size(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer if you split the expression in two and had idx = idx % size();
like we do in CUDA
UniqueToken(size_type max_size, | ||
execution_space const& arg = execution_space()) | ||
: UniqueToken<SYCL, UniqueTokenScope::Global>(max_size, arg) {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this one be declared explicit
too? ( I know we didn't in CUDA)
Also why is it OK to only have two constructors when we had 4 in CUDA?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cuda
has
explicit UniqueToken()
: UniqueToken<Cuda, UniqueTokenScope::Global>(
Kokkos::Cuda().concurrency()) {}
explicit UniqueToken(execution_space const& arg)
: UniqueToken<Cuda, UniqueTokenScope::Global>(
Kokkos::Cuda().concurrency(), arg) {}
UniqueToken(size_type max_size)
: UniqueToken<Cuda, UniqueTokenScope::Global>(max_size) {}
UniqueToken(size_type max_size,
execution_space const& arg = execution_space())
: UniqueToken<Cuda, UniqueTokenScope::Global>(max_size, arg) {}
In the current form, the first two Cuda constructors are the same as the first one here and the last two Cuda
constructors are the same as the second one here. In fact, the third constructor in Cuda
(and HIP
) should just be removed since trying to use is would be ambiguous taking the last constructor into account.
Of course, we could argue about not using default arguments to be able to only mark the constructors taking one argument as explicit
. In the end, that only matters for copy initialization with an initializer list. I agree, though, that we should make all specializations for UniqueToken
have the same interface.
bd82bee
to
38a2deb
Compare
Again, only OpenMPTarget is failing with an internal compiler error. |
Retest this please. |
This corresponds to #4741. I see similar speed-ups.