Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix bug with CUDA's team reduction for empty ranges #5079

Merged
merged 2 commits into from
Jun 7, 2022

Conversation

masterleinad
Copy link
Contributor

Fixes #5075. This was only triggered when scratch memory was requested.

@JBludau
Copy link
Contributor

JBludau commented Jun 2, 2022

looks good to me

core/unit_test/TestTeamReductionScan.hpp Outdated Show resolved Hide resolved
using ReducerType = Kokkos::Sum<double>;
double result = 10.;
ReducerType reducer(result);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you bother with a reducer?
Why do you initialize result at all?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I initialize it so that I can see that it's actually set to zero in the reduction (and not by chance uninitialized zero).
I need the reducer to trigger the bug. Otherwise needs_device_set would not be true.

Copy link
Member

@dalg24 dalg24 Jun 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But that is what it would default to if you passed a scalar isn't it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const bool is_empty_range = m_league_size == 0 || m_team_size == 0;
const bool need_device_set = Analysis::has_init_member_function ||
Analysis::has_final_member_function ||
!m_result_ptr_host_accessible ||
#ifdef KOKKOS_CUDA_ENABLE_GRAPHS
Policy::is_graph_kernel::value ||
#endif
!std::is_same<ReducerType, InvalidType>::value;
if (!is_empty_range || need_device_set) {

Using a reducer I get both has_init_member_function and has_reducer. If I only provide a scalar need_device_set is false.

init: 1
final: 0
host accessible: 1
has reducer: 1

for the reducer case for the different parts of need_device_set

init: 0
final: 0
host accessible: 1
has reducer: 0 

for the scalar case.

core/unit_test/TestTeamReductionScan.hpp Outdated Show resolved Hide resolved
@jhux2
Copy link

jhux2 commented Jun 2, 2022

Is the file packages/kokkos/core/src/Cuda/Kokkos_Cuda_Parallel_Team.hpp new since the last Kokkos promotion in Trilinos? Is there a possibility of getting a "hot-fix" for the version of Kokkos in Trilinos that I could test?

@dalg24
Copy link
Member

dalg24 commented Jun 2, 2022

Is the file packages/kokkos/core/src/Cuda/Kokkos_Cuda_Parallel_Team.hpp new since the last Kokkos promotion in Trilinos?

It is not new. Note that it is a private header so it does not exist as far as users are concerned.

Is there a possibility of getting a "hot-fix" for the version of Kokkos in Trilinos that I could test?

You will have to wait for Kokkos 3.7 (we expect to create a release candidate branch and start testing in two weeks) or try snapshotting on your own.

@jhux2
Copy link

jhux2 commented Jun 2, 2022

@dalg24 Since it's a private header, I'm unsure how the snapshotting works. @ndellingwood, would you be able to help me with a hotfix in Trilinos? I think I can wait the two weeks.

Copy link
Contributor

@cz4rs cz4rs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

@masterleinad masterleinad added this to the Tentative 3.7 Release milestone Jun 6, 2022
@masterleinad masterleinad force-pushed the fix_empty_cuda_team_reduction branch from 5a2b7f4 to 1d67306 Compare June 7, 2022 14:24
@masterleinad masterleinad requested a review from dalg24 June 7, 2022 16:22
@rgayatri23
Copy link
Contributor

Retest this please

@dalg24
Copy link
Member

dalg24 commented Jun 7, 2022

Not sure why the status wasn't updated but the tests did pass
https://cloud.cees.ornl.gov/jenkins-ci/blue/organizations/jenkins/Kokkos/detail/Kokkos/8929/pipeline

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants