Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix TeamThreadMDRange parallel_reduce #6511

Merged
merged 15 commits into from
Jan 11, 2024

Conversation

masterleinad
Copy link
Contributor

@masterleinad masterleinad commented Oct 13, 2023

Fixes #6513, fixes #6530. I decided both in the same pull request to minimize the number of pull request/load on the CI.

It came up in https://kokkosteam.slack.com/archives/C5BGU5NDQ/p1697207384819959?thread_ts=1697194377.868819&cid=C5BGU5NDQ that the behavior of TeamThreadMDRange parallel_reduce is surprising. It just does the reduction for a single thread and is missing the reduction across multiple team members. Also, we don't initialize the result variable.
This pull request fixes that. It appears that the same holds true for ThreadVectorMDRange and TeamVectorMDRange but we are missing a uniform vector_reduce implementation so that is commented at the moment.

@masterleinad masterleinad marked this pull request as ready for review October 13, 2023 20:50
@etiennemlb
Copy link
Contributor

The fix resolved the issue I posted in the above mentioned Slack discussion.

scripts/diff_files Outdated Show resolved Hide resolved
@@ -983,7 +983,10 @@ template <typename Rank, typename TeamHandle, typename Lambda,
KOKKOS_INLINE_FUNCTION void parallel_reduce(
TeamThreadMDRange<Rank, TeamHandle> const& policy, Lambda const& lambda,
ReducerValueType& val) {
val = ReducerValueType{};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a very vague memory that we intentionally did not default initialize the result variable here, expecting it to be handled further down the line when eventually non-md team policies are called. But seeing that the result does not reflect that thought, I think initializing the result value here works.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we exclusively support sum reductions?
Otherwise initializing this way does not look right to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we currently only allow sum reductions for this interface.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a static assert to detect if a non-sum reducer or if a functor with a init/join is passed?

Copy link
Contributor Author

@masterleinad masterleinad Jan 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some static_asserts for the value_type in 1c85dd0.
It shouldn't be that difficult to extend the interface to support the full reduction interface but I would rather do that in a follow-up pull request.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, note that the nested reduction interface doesn't allow functors as reducers yet, see #6317.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, note that the current interface doesn't even allow passing in values as we oftentimes use for reducers.

Comment on lines +1005 to +1016
if constexpr (false
#ifdef KOKKOS_ENABLE_CUDA
|| std::is_same_v<typename TeamHandle::execution_space,
Kokkos::Cuda>
#elif defined(KOKKOS_ENABLE_HIP)
|| std::is_same_v<typename TeamHandle::execution_space,
Kokkos::HIP>
#elif defined(KOKKOS_ENABLE_SYCL)
|| std::is_same_v<typename TeamHandle::execution_space,
Kokkos::Experimental::SYCL>
#endif
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we prefer this over if constexpr(!std::is_same_v<typename TeamHandle::execution_space, Kokkos::Serial>)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's see for which of the backends we need this in the end for the test to pass. We can discuss separately if we need a uniform vector_reduce implementation for all backends or not.

core/unit_test/CMakeLists.txt Outdated Show resolved Hide resolved
Comment on lines 172 to 178
Kokkos::TeamPolicy<ExecSpace>(
leagueSize, Kokkos::AUTO
#ifndef KOKKOS_ENABLE_OPENMPTARGET
,
Kokkos::TeamPolicy<ExecSpace>::vector_length_max()
#endif
),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need at least 32 threads in a team and setting vector_length_max would give us 1 thread with vector_length 32.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does that happen even when you manually specify a number for team_size, or is it only when Kokkos::AUTO is used along with vector_length_max?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the combination the compiler complains. I didn't try too many things here and would rather leave it to @rgayatri23 to clean this up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are now using vector_length==2 if OpenMPTarget was enabled which seems to work fine as well.

@@ -149,8 +155,9 @@ class OpenMPTargetExecTeamMember {
}
#pragma omp barrier
}
return team_scratch[0];
value = team_scratch[0];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldnt this have given a warning at-least since it is now a void function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed offline that an earlier change in this pull request missed writing back to value when removing the return statement.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nvm wrong question

Copy link
Member

@dalg24 dalg24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the OpenMPTarget not proposed as its own PR?
Are they not more unit tests that can be enabled with that fix?

@masterleinad
Copy link
Contributor Author

Why is the OpenMPTarget not proposed as its own PR?
Are they not more unit tests that can be enabled with that fix?

My motivation was to minimize the number of places where we would disable/modify a test (that wasn't using the respective functionality in the intended way) and this fix in the OpenMPTarget was the only one to make the adjusted tests pass on all backends (the test was enabled for in the first place). I would expect there to be some more tests that could be enabled after fixing the team_reduce interface to be in line with the other backends but don't have a strong opinion if @rgayatri23 (or someone else) does that before or after this pull request.

core/unit_test/TestTeamMDRange.hpp Show resolved Hide resolved
@@ -983,7 +983,16 @@ template <typename Rank, typename TeamHandle, typename Lambda,
KOKKOS_INLINE_FUNCTION void parallel_reduce(
TeamThreadMDRange<Rank, TeamHandle> const& policy, Lambda const& lambda,
ReducerValueType& val) {
static_assert(/*!Kokkos::is_view_v<ReducerValueType> &&*/
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is_view_v requires pulling in Kokkos_View.hpp which in turn pulls in this header. To avoid the circular dependency, I just commented this particular check.

@masterleinad
Copy link
Contributor Author

Why is the OpenMPTarget not proposed as its own PR?
Are they not more unit tests that can be enabled with that fix?

Not quite sure. The motivation here was to not disable more tests. I wanted to let @rgayatri23 explore (in a different pull request) if more unit tests could be enabled when providing the correct interface but the one enabled additionally here is the only one that seemed obvious.

@crtrott crtrott merged commit ee5cbfc into kokkos:develop Jan 11, 2024
30 of 31 checks passed
@crtrott crtrott mentioned this pull request Jan 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants