Cap vector size to kernel maximum for SYCL #4704

masterleinad · 2022-01-20T23:13:17Z

Fixes #4573.
For SYCL, the maximum vector size depends on the kernel, and the maximum reported by the device might not be supported by every kernel. So far, we simply errored out when we detected that the vector size was too large. This pull request instead caps the vector size at the maximum supported by the kernel.
This is easy enough for parallel_for, since the kernel doesn't use the vector size explicitly (meaning that it's fine to retrieve it from within the kernel), but for parallel_reduce the allocations for the reduction depend on the total workgroup size.
The idea here is to first build a kernel with dummy pointers for the reduction results and temporary storage, use that kernel to query the maximum vector size, allocate temporary global and local space accordingly, and then create the final kernel to be launched with these pointers.

The actual changes are not that large if you ignore whitespace changes.
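
The two-phase approach described above can be sketched in plain C++ without any SYCL dependency. This is only an illustration under stated assumptions: `KernelSketch`, `build_kernel`, `cap_vector_size`, `plan_reduce`, and `LaunchPlan` are hypothetical names that do not appear in the pull request, and the kernel-specific limit is passed in as a plain integer instead of being queried from a SYCL runtime.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a compiled kernel. In real SYCL code the limit
// would come from a kernel-specific query; here it is just a stored field.
struct KernelSketch {
  int max_sub_group_size;  // limit assumed to be reported for this kernel
  double* result_ptr;      // reduction result (nullptr in the probe kernel)
  double* scratch_ptr;     // temporary storage (nullptr in the probe kernel)
};

// Hypothetical builder: the limit depends on the kernel, not on the pointers.
inline KernelSketch build_kernel(int kernel_limit, double* result,
                                 double* scratch) {
  return KernelSketch{kernel_limit, result, scratch};
}

// Cap the requested vector size at what the kernel supports.
inline int cap_vector_size(int requested, int kernel_max) {
  assert(requested > 0 && kernel_max > 0);
  return std::min(requested, kernel_max);
}

struct LaunchPlan {
  int final_vector_size;
  std::size_t scratch_elements;
};

inline LaunchPlan plan_reduce(int requested_vector_size, int team_size,
                              int kernel_limit, std::vector<double>& scratch,
                              double& result) {
  assert(team_size > 0);
  // Phase 1: build a probe kernel with dummy pointers and read its limit.
  KernelSketch probe = build_kernel(kernel_limit, nullptr, nullptr);
  int final_vs =
      cap_vector_size(requested_vector_size, probe.max_sub_group_size);

  // Phase 2: size the scratch space from the capped workgroup size,
  // then build the kernel that would actually be launched.
  std::size_t workgroup_size = static_cast<std::size_t>(team_size) * final_vs;
  scratch.assign(workgroup_size, 0.0);
  KernelSketch final_kernel =
      build_kernel(kernel_limit, &result, scratch.data());
  (void)final_kernel;  // a real implementation would submit this kernel

  return LaunchPlan{final_vs, workgroup_size};
}
```

With a requested vector size of 32, a team size of 4, and an assumed kernel limit of 16, this sketch caps the vector size at 16 and sizes the scratch space for 64 elements.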

brian-kelley · 2022-01-21T17:45:10Z

@masterleinad So from the discussion, this does work for all parallel_fors in KokkosKernels which are valid Kokkos.

dalg24

Drive-by comment

dalg24 · 2022-01-24T22:04:28Z

core/src/SYCL/Kokkos_SYCL_Parallel_Team.hpp

        std::stringstream out;
        out << "The maximum subgroup size (" << max_sg_size
            << ") for this kernel is not divisible by the vector_size ("
-            << m_vector_size << "). Choose a smaller vector_size!\n";
+            << final_vector_size << "). Choose a smaller vector_size!\n";


The error message needs to be revisited

Can you elaborate on why this needs to be revisited? Should we explain that the vector size might have been capped?

The error message will only be triggered if m_vector_size was smaller than max_sg_size in the first place (since you use the min). So one doesn't necessarily need to choose a smaller vector size, just something which is divisible (which could be larger). That said, how would this ever be triggered? Is max_sg_size not a power of two? We kinda have that requirement for vector length in team policy.

crtrott · 2022-01-25T16:13:49Z

core/src/SYCL/Kokkos_SYCL_Parallel_Team.hpp

          std::stringstream out;
          out << "The maximum subgroup size (" << max_sg_size
              << ") for this kernel is not divisible by the vector_size ("
-              << m_vector_size << "). Choose a smaller vector_size!\n";
+              << final_vector_size << "). Choose a smaller vector_size!\n";


same comment as above

Yes, the used vector length is the largest power-of-two not exceeding the input vector length. The SYCL standard doesn't specify that the possible subgroup sizes are powers-of-two but that at least seems to be true for NVIDIA and Intel GPUs.
I think we can remove the check.
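
The reasoning in this thread can be checked with a small standalone sketch. It assumes, as the reply notes is true in practice for NVIDIA and Intel GPUs but not guaranteed by the SYCL standard, that the maximum subgroup size is a power of two. The helper names `floor_pow2` and `capped_vector_size` are hypothetical and are not taken from the Kokkos sources.

```cpp
#include <algorithm>
#include <cassert>

// Largest power of two that does not exceed n (n must be positive).
inline int floor_pow2(int n) {
  assert(n > 0);
  int p = 1;
  while (p <= n / 2) p *= 2;  // stops before p * 2 would exceed n
  return p;
}

// Vector size actually used in this sketch: round the request down to a
// power of two, then cap it at the kernel's maximum subgroup size.
inline int capped_vector_size(int requested, int max_sg_size) {
  assert(max_sg_size > 0);
  return std::min(floor_pow2(requested), max_sg_size);
}
```

If `max_sg_size` is itself a power of two, the value returned by `capped_vector_size` always divides it, so a divisibility check on the capped value could never fail under that assumption.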

dalg24 · 2022-01-26T19:34:16Z

The failure (compiler crash in the OpenMPTarget build on an NVIDIA GPU) is clearly unrelated.

Cap vector_size to kernel maximum for SYCL

aae748b

masterleinad requested a review from nliber January 20, 2022 23:13

masterleinad mentioned this pull request Jan 20, 2022

ThreadVectorRange parallel_reduce (int, +) gives incorrect result on SYCL #4573

Closed

masterleinad marked this pull request as ready for review January 21, 2022 04:45

masterleinad added the "Blocks Promotion" label (overview issue for release-blocking bugs) Jan 21, 2022

nliber approved these changes Jan 24, 2022

dalg24 reviewed Jan 24, 2022

crtrott requested changes Jan 25, 2022

Remove check for incompatible vector_size

a34244f

crtrott approved these changes Jan 25, 2022

dalg24 merged commit 1ec0d1e into kokkos:develop Jan 26, 2022
