Fix running parallel_reduce with TeamPolicy for large ranges #4532

masterleinad · 2021-11-16T16:57:08Z

Fixes #4531. Specifically, the lines

const auto nwork = static_cast<typename Policy::index_type>(m_league_size) * m_team_size;

fix the issue (for CUDA and HIP). The other changes to the types of nwork are just for consistency.
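
To illustrate why the cast matters, here is a minimal standalone sketch (not Kokkos code). The function names and the choice of `std::int64_t` as the wide type are assumptions made for the example; in the actual change the wide type is `Policy::index_type`. If both operands are `int`, the multiplication is carried out in `int` and can overflow before the result is stored. Widening one operand first makes the whole product use the wider type.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical stand-in for the nwork computation: widen one operand
// before multiplying so the product is evaluated in 64-bit arithmetic.
std::int64_t total_work_widened(int league_size, int team_size) {
  return static_cast<std::int64_t>(league_size) * team_size;
}

// Hypothetical helper: reports whether league_size * team_size would
// exceed the range of a 32-bit int (assumes non-negative inputs).
bool product_overflows_int(int league_size, int team_size) {
  return total_work_widened(league_size, team_size) >
         std::numeric_limits<int>::max();
}
```

For example, a league size of 2^20 with a team size of 2^12 gives a product of 2^32, which does not fit in a 32-bit `int` but is represented exactly once one operand has been widened.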

masterleinad · 2021-11-17T14:29:41Z

Retest this please.

masterleinad · 2021-11-18T16:56:22Z

This seems to work on all backends now. Note that I had to change the argument type of scratch_space for SYCL to std::size_t to make it work. We should discuss (maybe as a follow-up) whether we want to make this consistent for all backends and switch to std::size_t, or find a different solution.

core/src/Cuda/Kokkos_Cuda_Parallel.hpp

masterleinad · 2021-11-23T17:49:35Z

The changes for SYCL are also discussed in #4551.

janciesko

LGTM

masterleinad added 2 commits November 16, 2021 11:51

Fix running parallel_reduce with TeamPolicy for large ranges

032f6ed

Add test for parallel_reduce with TeamPolicy and large ranges

a42b70a

masterleinad mentioned this pull request Nov 16, 2021

Kernels using TeamPolicy with certain dimensions won't execute #4531

Closed

masterleinad added 2 commits November 16, 2021 12:08

remove unused variable

c13dd57

Avoid multiplying m_league_size and m_team_size altogether

f3d7c24

masterleinad force-pushed the fix_large_team_reduce_cuda_hip branch from 0161f58 to f3d7c24 Compare November 16, 2021 20:30

masterleinad added 2 commits November 17, 2021 17:22

Add another test case and fix OpenMP

0b488b0

Fix SYCL implementation

c411444

masterleinad force-pushed the fix_large_team_reduce_cuda_hip branch from 9872d3a to c411444 Compare November 18, 2021 15:34

masterleinad linked an issue Nov 23, 2021 that may be closed by this pull request

Kernels using TeamPolicy with certain dimensions won't execute #4531

Closed

dalg24 reviewed Nov 23, 2021

core/src/Cuda/Kokkos_Cuda_Parallel.hpp (outdated, resolved)

masterleinad mentioned this pull request Nov 23, 2021

Use std::size_t for requested scratch allocations in GPU backends #4551

Merged

do_work -> is_empty_range

95961f5

masterleinad force-pushed the fix_large_team_reduce_cuda_hip branch from 47d559c to 95961f5 Compare November 23, 2021 19:16

janciesko self-requested a review November 24, 2021 18:30

janciesko reviewed Nov 24, 2021

rgayatri23 approved these changes Nov 24, 2021

crtrott approved these changes Nov 30, 2021

crtrott merged commit 7428181 into kokkos:develop Nov 30, 2021
