Fix TeamThreadMDRange parallel_reduce #6511
Conversation
The fix resolved the issue I posted in the above-mentioned Slack discussion.
Force-pushed from 960890e to dadb200
Force-pushed from bcadb77 to 8b01efd
@@ -983,7 +983,10 @@ template <typename Rank, typename TeamHandle, typename Lambda,
 KOKKOS_INLINE_FUNCTION void parallel_reduce(
     TeamThreadMDRange<Rank, TeamHandle> const& policy, Lambda const& lambda,
     ReducerValueType& val) {
+  val = ReducerValueType{};
I have a very vague memory that we intentionally did not default-initialize the result variable here, expecting it to be handled further down the line when the non-MD team policies are eventually called. But since the code does not reflect that thought, I think initializing the result value here works.
Do we exclusively support sum reductions?
Otherwise initializing this way does not look right to me.
Yes, we currently only allow sum reductions for this interface.
Do we have a static_assert to detect if a non-sum reducer or a functor with an init/join is passed?
I added some static_asserts for the value_type in 1c85dd0.
It shouldn't be that difficult to extend the interface to support the full reduction interface but I would rather do that in a follow-up pull request.
Also, note that the nested reduction interface doesn't allow functors as reducers yet, see #6317.
Also, note that the current interface doesn't even allow passing in values the way we often do for reducers.
if constexpr (false
#ifdef KOKKOS_ENABLE_CUDA
              || std::is_same_v<typename TeamHandle::execution_space,
                                Kokkos::Cuda>
#elif defined(KOKKOS_ENABLE_HIP)
              || std::is_same_v<typename TeamHandle::execution_space,
                                Kokkos::HIP>
#elif defined(KOKKOS_ENABLE_SYCL)
              || std::is_same_v<typename TeamHandle::execution_space,
                                Kokkos::Experimental::SYCL>
#endif
)
Do we prefer this over if constexpr (!std::is_same_v<typename TeamHandle::execution_space, Kokkos::Serial>)?
Let's see for which of the backends we need this in the end for the test to pass. We can discuss separately whether we need a uniform vector_reduce implementation for all backends or not.
Force-pushed from f93d91c to 7435619
core/unit_test/TestTeamMDRange.hpp
Outdated
Kokkos::TeamPolicy<ExecSpace>(
    leagueSize, Kokkos::AUTO
#ifndef KOKKOS_ENABLE_OPENMPTARGET
    ,
    Kokkos::TeamPolicy<ExecSpace>::vector_length_max()
#endif
),
We need at least 32 threads in a team, and setting vector_length_max would give us 1 thread with vector_length 32.
Does that happen even when you manually specify a number for team_size, or only when Kokkos::AUTO is used along with vector_length_max?
With the combination the compiler complains. I didn't try too many things here and would rather leave it to @rgayatri23 to clean this up.
We are now using vector_length==2 if OpenMPTarget is enabled, which seems to work fine as well.
@@ -149,8 +155,9 @@ class OpenMPTargetExecTeamMember {
   }
 #pragma omp barrier
 }
-  return team_scratch[0];
+  value = team_scratch[0];
Shouldn't this have given a warning at least, since it is now a void function?
Discussed offline that an earlier change in this pull request missed writing back to value when removing the return statement.
nvm wrong question
Force-pushed from 46810ef to 44fa290
Why is the OpenMPTarget fix not proposed as its own PR?
Are there not more unit tests that can be enabled with that fix?
My motivation was to minimize the number of places where we would disable/modify a test (one that wasn't using the respective functionality in the intended way) and this fix in the
Force-pushed from 1c85dd0 to 1c15e18
@@ -983,7 +983,16 @@ template <typename Rank, typename TeamHandle, typename Lambda,
 KOKKOS_INLINE_FUNCTION void parallel_reduce(
     TeamThreadMDRange<Rank, TeamHandle> const& policy, Lambda const& lambda,
     ReducerValueType& val) {
+  static_assert(/*!Kokkos::is_view_v<ReducerValueType> &&*/
is_view_v requires pulling in Kokkos_View.hpp, which in turn pulls in this header. To avoid the circular dependency, I just commented out this particular check.
Not quite sure. The motivation here was to not disable more tests. I wanted to let @rgayatri23 explore (in a different pull request) whether more unit tests could be enabled when providing the correct interface, but the one enabled additionally here is the only one that seemed obvious.
Fixes #6513, fixes #6530. I decided to do both in the same pull request to minimize the number of pull requests and the load on the CI.
It came up in https://kokkosteam.slack.com/archives/C5BGU5NDQ/p1697207384819959?thread_ts=1697194377.868819&cid=C5BGU5NDQ that the behavior of TeamThreadMDRange parallel_reduce is surprising: it just does the reduction for a single thread and is missing the reduction across multiple team members. Also, we don't initialize the result variable. This pull request fixes that. It appears that the same holds true for ThreadVectorMDRange and TeamVectorMDRange, but we are missing a uniform vector_reduce implementation, so that is commented out at the moment.