SYCL TeamPolicy::team_scan #3815
Retest this please.
5e4aed6 to 2fb84b1
Retest this please.
2fb84b1 to 36c0ab2
There is a problem with |
core/src/SYCL/Kokkos_SYCL_Team.hpp
Outdated
// FIXME_SYCL move somewhere else and combine with other places that do
// parallel_scan
// Exclusive scan returning the total sum, compare
// https://developer.nvidia.com/gpugems/gpugems3/part-vi-gpu-computing/chapter-39-parallel-prefix-sum-scan-cuda
We need to figure out what the license of this is.
Oh, I can easily rewrite this to look more similar to what we have elsewhere if that's a concern.
I rewrote it to look more similar to RangePolicy::parallel_scan.
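For readers following the thread, here is a minimal serial sketch of what "exclusive scan returning the total sum" means. It is only a reference for the semantics, not the code in this PR and not the GPU Gems algorithm; the function name `exclusive_scan_with_total` is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical reference helper, not part of Kokkos.
// Replaces each element with the sum of all elements before it
// (exclusive prefix sum) and returns the sum of all original elements.
template <class T>
T exclusive_scan_with_total(std::vector<T>& values) {
  T running = T{};  // sum of the elements visited so far
  for (std::size_t i = 0; i < values.size(); ++i) {
    const T current = values[i];
    values[i] = running;  // element i sees only elements 0..i-1
    running += current;
  }
  return running;  // total over the whole input
}
```

For the input {3, 1, 4, 1, 5} this leaves {0, 3, 4, 8, 9} in place and returns 14. A team-level implementation would compute the same result cooperatively across the threads of a team.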
36c0ab2 to 83b77da
Fixed the problem with |
638cf5b to 5c01770
Minor comments
ed1dadc to 4de5c08
Co-authored-by: Damien L-G <dalg24+github@gmail.com>
4de5c08 to 6b77972
Rebased.
m_shmem_begin = (sizeof(double) * (m_team_size + 2));
m_shmem_size =
    (m_policy.scratch_size(0, m_team_size) +
I know we haven't been reusing code, but if we're going to have something that's duplicated but very slightly different like this (from, e.g., Kokkos_Cuda_Parallel.hpp:790-795), can we at least add comments talking about what's different and why?
Are you referring to m_shmem_size or something else? AFAICT, its initialization looks the same for SYCL and CUDA.
Ah, you mean m_scratch_size[0], which is slightly different. It turns out that this variable is not used for Cuda; m_shmem_size is used instead. For SYCL we actually use m_scratch_size[0] instead of m_shmem_size in some places (of course they still have the same value). We could probably just replace the m_scratch_size C-array with a simple int having the value of the current m_scratch_size[1] in the related backends.
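To make the suggestion concrete, here is a rough sketch of what holding a single level-1 value next to the shared-memory sizes could look like. Everything here is hypothetical: `TeamScratchLayout`, `make_layout`, and the parameter names are not Kokkos code, and the level-0 size is taken as an input because the full expression is not shown in the snippet above. Only the `sizeof(double) * (team_size + 2)` term comes from the diff.

```cpp
#include <cstddef>

// Hypothetical stand-in for the relevant members; not the Kokkos class.
struct TeamScratchLayout {
  std::size_t shmem_begin;          // bytes reserved ahead of user scratch
  std::size_t shmem_size;           // level-0 scratch bytes per team
  std::size_t scratch_size_level1;  // single value instead of a 2-element array
};

// Assumption: the caller passes both scratch sizes already computed.
inline TeamScratchLayout make_layout(int team_size,
                                     std::size_t level0_bytes,
                                     std::size_t level1_bytes) {
  TeamScratchLayout layout{};
  // Same expression as m_shmem_begin in the diff above.
  layout.shmem_begin =
      sizeof(double) * (static_cast<std::size_t>(team_size) + 2);
  layout.shmem_size = level0_bytes;
  layout.scratch_size_level1 = level1_bytes;
  return layout;
}
```

With this shape there is no second copy of the level-0 size that has to be kept equal to m_shmem_size.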
LGTM
396be41 to 917b22a
Based on #3783.