Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SYCL TeamPolicy::team_scan #3815

Merged
merged 5 commits into from
Mar 18, 2021
Merged

Conversation

masterleinad
Copy link
Contributor

Based on #3783.

@masterleinad
Copy link
Contributor Author

Retest this please.

1 similar comment
@masterleinad
Copy link
Contributor Author

Retest this please.

@masterleinad
Copy link
Contributor Author

Retest this please.

@masterleinad masterleinad added this to the Tentative 3.4 Release milestone Mar 9, 2021
core/src/SYCL/Kokkos_SYCL_Team.hpp Outdated Show resolved Hide resolved
core/src/SYCL/Kokkos_SYCL_Team.hpp Show resolved Hide resolved
@nliber nliber self-requested a review March 9, 2021 20:15
@masterleinad
Copy link
Contributor Author

There is a problem with global_accum that I will fix ASAP.

// FIXME_SYCL move somewhere else and combine with other places that do
// parallel_scan
// Exclusive scan returning the total sum, compare
// https://developer.nvidia.com/gpugems/gpugems3/part-vi-gpu-computing/chapter-39-parallel-prefix-sum-scan-cuda
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to figure out what the license of this is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I can easily rewrite this to look more similar to what we have elsewhere if that's a concern.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I rewrote it to look more similar to RangePolicy::parallel_scan.

@masterleinad
Copy link
Contributor Author

Fixed the problem with global_accum, enabled team_reduction_scan, and rebased on top of #3818.

@masterleinad masterleinad added the Blocks Promotion Overview issue for release-blocking bugs label Mar 11, 2021
Copy link
Member

@dalg24 dalg24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments

core/src/SYCL/Kokkos_SYCL_Team.hpp Outdated Show resolved Hide resolved
example/tutorial/Hierarchical_Parallelism/CMakeLists.txt Outdated Show resolved Hide resolved
@masterleinad
Copy link
Contributor Author

Rebased.

@dhollman dhollman self-requested a review March 17, 2021 20:15
Comment on lines 444 to 446
m_shmem_begin = (sizeof(double) * (m_team_size + 2));
m_shmem_size =
(m_policy.scratch_size(0, m_team_size) +

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know we haven't been reusing code, but if we're going to have something that's duplicated but very slightly different like this (from, e.g., Kokkos_Cuda_Parallel.hpp:790-795), can we at least add comments talking about what's different and why?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you referring to m_shmem_sizeor something else? AFAICT, its initialization looks the same for SYCL and CUDA.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, you mean m_scratch_size[0] which is slightly different. It turns out that this variable is not used for Cuda but m_shmem_size instead. For SYCL we actually use m_scratch_size[0] instead of m_shmem_size in some places (of course they still have the same value). We could probably just replace the m_scratch_size C-array by a simple int having the value of the current m_scratch_size[1] in the related backends.

Copy link

@dhollman dhollman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

core/src/SYCL/Kokkos_SYCL_Team.hpp Outdated Show resolved Hide resolved
core/src/SYCL/Kokkos_SYCL_Team.hpp Show resolved Hide resolved
core/src/SYCL/Kokkos_SYCL_Team.hpp Show resolved Hide resolved
core/src/SYCL/Kokkos_SYCL_Team.hpp Outdated Show resolved Hide resolved
@crtrott crtrott merged commit 574f907 into kokkos:develop Mar 18, 2021
@masterleinad masterleinad deleted the sycl_team_scan branch March 18, 2021 16:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Blocks Promotion Overview issue for release-blocking bugs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants