Provide valid default team size for SYCL #4481

masterleinad · 2021-10-28T21:50:29Z

Looking at the KokkosKernels tests, I ran across a case where the default team_size of 32 would exceed the maximum workgroup size allowed for the kernel. This pull request uses team_size_recommended instead, which in turn is computed as the largest power of two not exceeding team_size_max.
The computation of both team_size_recommended and team_size_max is probably still not ideal, but it is certainly an improvement. Any help or suggestions are of course welcome.
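A minimal sketch of the rule described above, assuming only that team_size_max is a positive int. The function name `team_size_recommended_sketch` is hypothetical and is not the Kokkos implementation. It starts from half the maximum, mirroring the `max_team_size_half` variable in the reviewed diff (whose loop body is not shown there), so the doubling can never overflow.

```cpp
// Hypothetical helper: largest power of two not exceeding team_size_max.
// Assumes team_size_max >= 1.
constexpr int team_size_recommended_sketch(int team_size_max) {
  const int max_team_size_half = team_size_max / 2;
  int power_of_two = 1;
  // Double while the next power of two still fits; comparing against
  // half the maximum avoids computing 2 * power_of_two, which could overflow.
  while (power_of_two <= max_team_size_half) power_of_two *= 2;
  return power_of_two;
}
```

For example, a kernel whose maximum team size is 24 gets a recommended size of 16, whereas the old fixed default of 32 would have been invalid.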

nliber · 2021-11-01T20:39:24Z

core/src/SYCL/Kokkos_SYCL_Parallel_Team.hpp

  }

  template <class FunctorType>
  int internal_team_size_recommended_reduce(const FunctorType& f) const {
    // FIXME_SYCL improve
-    return internal_team_size_max_reduce(f);
+    const int max_team_size_half = internal_team_size_max_reduce(f) / 2;
+    int power_of_two             = 1;


There is a trick that makes the power-of-two calculation O(1) and branch-free: https://graphics.stanford.edu/~seander/bithacks.html#RoundUpPowerOf2

I'm using Kokkos::log2 instead (which uses intrinsics where available).
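The linked bit hack rounds *up* to a power of two; rounding *down*, as needed here, uses the same bit-smearing idea but keeps only the highest set bit. A minimal sketch for 32-bit unsigned values follows; the name `floor_pow2` is hypothetical, and per the commit history the PR itself ended up using a log2-based computation instead.

```cpp
#include <cstdint>

// Hypothetical helper: largest power of two <= n, without loops or branches.
// Assumes 1 <= n < 2^32.
constexpr std::uint32_t floor_pow2(std::uint32_t n) {
  // Copy the highest set bit into every lower bit position...
  n |= n >> 1;
  n |= n >> 2;
  n |= n >> 4;
  n |= n >> 8;
  n |= n >> 16;
  // ...then subtract everything below it, leaving only the highest bit.
  return n - (n >> 1);
}
```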

masterleinad · 2021-11-23T19:18:03Z

The test failure is unrelated.

dalg24 · 2021-11-23T22:50:50Z

core/src/SYCL/Kokkos_SYCL_Parallel_Team.hpp

@@ -812,7 +816,7 @@ class ParallelReduce<FunctorType, Kokkos::TeamPolicy<Properties...>,
      Kokkos::Impl::throw_runtime_exception(out.str());
    }

-    if (m_team_size > m_policy.team_size_max(m_functor, ParallelForTag{}))
+    if (m_team_size > m_policy.team_size_max(m_functor, ParallelReduceTag{}))
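The tag change above matters because the admissible team size can depend on the kind of kernel. A toy sketch, assuming (hypothetically) that a reduction needs local memory for one value per work-item; this formula is an illustration, not the SYCL backend's actual computation, and the tag and policy types are stand-ins rather than Kokkos' own. It shows that a team size that passes the `ParallelForTag` check can still be too large for a reduction.

```cpp
#include <algorithm>

// Hypothetical tag types named after those in the diff (not Kokkos' own).
struct ParallelForTag {};
struct ParallelReduceTag {};

// Hypothetical toy policy used only to illustrate tag-dependent limits.
struct ToyPolicy {
  int max_workgroup_size;  // device limit on work-items per group
  int local_mem_bytes;     // local memory available to one group
  int value_bytes;         // size of one reduction value

  constexpr int team_size_max(ParallelForTag) const {
    return max_workgroup_size;
  }
  // Assumed: each work-item stores one reduction value in local memory.
  constexpr int team_size_max(ParallelReduceTag) const {
    return std::min(max_workgroup_size, local_mem_bytes / value_bytes);
  }
};

// A reduce kernel must be validated against the reduce limit.
constexpr bool valid_reduce_team_size(const ToyPolicy& p, int team_size) {
  return team_size <= p.team_size_max(ParallelReduceTag{});
}
```

With 4096 bytes of local memory and 8-byte values, a team of 1024 passes the `ParallelForTag` check but fails the reduce check.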



dalg24 · 2021-11-23T23:06:39Z

core/unit_test/TestLocalDeepCopy.hpp

+#if defined(KOKKOS_ENABLE_SYCL) && !defined(KOKKOS_ARCH_INTEL_GPU)
+  if (std::is_same<ExecSpace, Kokkos::Experimental::SYCL>::value)
+    policy = team_policy(N, 512);
+#endif


Are these failing because of the change of team_size from 32 to the recommended size?

Yes, I think this requests too many registers on V100, but it seems to work fine on Intel GPUs.

Why did you change this? Why doesn't the change you made above for the default team_size give you a valid, larger number (instead of 32) here too?

The calculations for the default and maximum team size don't take registers into account, since that information is generally not available through the SYCL interface. There are multiple places where we need to use a smaller workgroup size for the SYCL+CUDA CI, even though this is not an issue on Intel GPUs.

Shouldn't we then generally be more careful about the maximum size we return? For example, limit ourselves to 256 threads when building for NVIDIA GPUs? That way it would always fit.

OK, I'll try that.
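Why a cap of 256 always fits can be checked with back-of-the-envelope arithmetic, assuming commonly cited limits for V100-class NVIDIA GPUs (compute capability 7.0): 65536 32-bit registers per SM, at most 255 registers per thread, and at most 1024 threads per block. The helper below is hypothetical and ignores register allocation granularity, so treat it as an estimate only.

```cpp
// Assumed V100-class limits (compute capability 7.0); illustrative only.
constexpr int kRegistersPerSM = 65536;
constexpr int kMaxThreadsPerBlock = 1024;
constexpr int kWarpSize = 32;

// Hypothetical helper: largest block size (a multiple of the warp size)
// whose registers fit on one SM, ignoring allocation granularity.
constexpr int max_threads_for_registers(int registers_per_thread) {
  int threads = kRegistersPerSM / registers_per_thread;
  threads = (threads / kWarpSize) * kWarpSize;  // whole warps only
  return threads < kMaxThreadsPerBlock ? threads : kMaxThreadsPerBlock;
}
```

Under these assumptions a 1024-thread group only fits if the kernel uses at most 64 registers per thread, while 256 threads still fit at the 255-register ceiling.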

crtrott · 2021-12-01T15:00:25Z

Why did you change this? Why doesn't the change you made above for the default team_size give you a valid, larger number (instead of 32) here too?

dalg24 · 2021-12-10T19:14:01Z

core/src/SYCL/Kokkos_SYCL_Parallel_Team.hpp

+                max_threads_for_memory, 256}) /
+           impl_vector_length();
+
+#else
    return std::min<int>(
               m_space.impl_internal_space_instance()->m_maxWorkgroupSize,
               max_threads_for_memory) /
           impl_vector_length();


I would prefer something like

    return std::min<int>(
               {m_space.impl_internal_space_instance()->m_maxWorkgroupSize,
    // FIXME_SYCL Avoid requesting too many registers on NVIDIA GPUs.
    #if <ARCH IS NVIDIA GPU>
                256,
    #endif
                max_threads_for_memory}) /
           impl_vector_length();
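A self-contained sketch of the suggested pattern: `std::min` over an initializer list lets a preprocessor branch add or drop one candidate without duplicating the whole return statement. The function name `team_size_max_sketch` and the macro `SKETCH_NVIDIA_GPU` are hypothetical stand-ins for the Kokkos member function and its architecture check.

```cpp
#include <algorithm>

// SKETCH_NVIDIA_GPU is a hypothetical stand-in for an architecture macro;
// define it to enable the NVIDIA-specific cap.

// Hypothetical free-function version of team_size_max().
constexpr int team_size_max_sketch(int max_workgroup_size,
                                   int max_threads_for_memory,
                                   int vector_length) {
  return std::min<int>({max_workgroup_size,
#if defined(SKETCH_NVIDIA_GPU)
                        // Avoid requesting too many registers on NVIDIA GPUs.
                        256,
#endif
                        max_threads_for_memory}) /
         vector_length;
}
```

Without the macro, the result is the smaller of the device and memory limits divided by the vector length; with it, 256 becomes an additional upper bound.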


masterleinad · 2021-12-14T23:34:16Z

Retest this please.

masterleinad · 2021-12-15T04:01:23Z

Retest this please.

masterleinad marked this pull request as draft October 29, 2021 13:41

nliber reviewed Nov 1, 2021

masterleinad force-pushed the fix_default_team_size_sycl branch 2 times, most recently from 1e7f4ea to 09f0d8f November 10, 2021 18:37

masterleinad force-pushed the fix_default_team_size_sycl branch from 09f0d8f to e498087 November 23, 2021 17:20

masterleinad requested a review from nliber November 23, 2021 17:22

masterleinad marked this pull request as ready for review November 23, 2021 19:18

dalg24 reviewed Nov 23, 2021

crtrott requested changes Dec 1, 2021

crtrott approved these changes Dec 10, 2021

dalg24 reviewed Dec 10, 2021

core/src/SYCL/Kokkos_SYCL_Parallel_Team.hpp (outdated, resolved)

core/src/SYCL/Kokkos_SYCL_Parallel_Team.hpp (outdated, resolved)

masterleinad and others added 10 commits December 14, 2021 20:46

Provide valid default team size for SYCL

b29b878

Improve team_size_recommended

e1a0254

Workaround for CI

ce3c9a3

Restrict to Experimental::SYCL

f1c65c2

Fix typo

b57ce48

Specify the whole policy instead

349c063

Use Kokkos::log2 instead

62374c2

Restrict SYCL+CUDA team size to 256

913c781

Replace Kokkos::log2 with Kokkos::Impl::int_log2

616077f

Co-authored-by: Damien L-G <dalg24+github@gmail.com>

Make sure generic KOKKOS_ARCH_* for CUDA archs is defined whenever a CUDA arch is set

6d56088

masterleinad force-pushed the fix_default_team_size_sycl branch from a818af0 to 6673db9 December 14, 2021 22:26

Improve format of team size cap for SYCL+CUDA

befa4ca

masterleinad force-pushed the fix_default_team_size_sycl branch from 6673db9 to befa4ca December 14, 2021 22:27

nliber approved these changes Dec 15, 2021

Also restrict team size for reduction kernels

f0ef65f

crtrott approved these changes Dec 15, 2021

dalg24 approved these changes Dec 15, 2021

dalg24 merged commit 1dbc97a into kokkos:develop Dec 15, 2021
