Fix running parallel_reduce with TeamPolicy for large ranges #4532

Merged
merged 7 commits into from
Nov 30, 2021

masterleinad

Fixes #4531. Specifically, the lines

const auto nwork = static_cast<typename Policy::index_type>(m_league_size) * m_team_size;

fix the issue (for CUDA and HIP). The other changes to the types of nwork are just for consistency.
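As a rough illustration of why the cast matters, here is a minimal standalone sketch, not the Kokkos code itself. It assumes that the league size and team size are stored as 32-bit `int` and that `Policy::index_type` is a 64-bit type; `index_type`, `total_work`, and `overflows_int` are hypothetical names used only for this example. Widening one operand before the multiplication means the product is computed in the wider type, so it is not limited to the range of `int`.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical stand-in for Policy::index_type; assumed to be 64-bit here.
using index_type = std::int64_t;

// Widen the first operand so the multiplication is carried out in index_type
// rather than in int (the second operand is converted implicitly).
inline index_type total_work(int league_size, int team_size) {
  return static_cast<index_type>(league_size) * team_size;
}

// Returns true if league_size * team_size would not fit in a 32-bit int,
// i.e. the case where a plain int multiplication would overflow.
inline bool overflows_int(int league_size, int team_size) {
  return total_work(league_size, team_size) >
         std::numeric_limits<int>::max();
}
```

With these assumptions, a league size of `1 << 20` and a team size of `1 << 12` give a product of 2^32, which is representable in `index_type` but not in `int`.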

@masterleinad

Retest this please.

@masterleinad

This seems to work on all backends now. Note that I had to change the argument type of scratch_space for SYCL to std::size_t to make it work. We should discuss (maybe as a follow-up) whether to make this consistent across all backends by switching to std::size_t, or to find a different solution.

@masterleinad masterleinad linked an issue Nov 23, 2021 that may be closed by this pull request
@masterleinad

The changes for SYCL are also discussed in #4551.


@janciesko janciesko left a comment

LGTM

@crtrott crtrott merged commit 7428181 into kokkos:develop Nov 30, 2021
Successfully merging this pull request may close these issues.

Kernels using TeamPolicy with certain dimensions won't execute