Deduce workgroup size for SYCL parallel_reduce RangePolicy #5227

masterleinad · 2022-07-15T20:04:15Z

The crucial new lines of code are:

        auto dummy_reduction_lambda =
            reduction_lambda_factory({1, cgh}, num_teams_done, nullptr);

        static sycl::kernel kernel = [&] {
          sycl::kernel_id functor_kernel_id =
              sycl::get_kernel_id<decltype(dummy_reduction_lambda)>();
          auto kernel_bundle =
              sycl::get_kernel_bundle<sycl::bundle_state::executable>(
                  q.get_context(), std::vector{functor_kernel_id});
          return kernel_bundle.get_kernel(functor_kernel_id);
        }();
        auto multiple = kernel.get_info<sycl::info::kernel_device_specific::
                                            preferred_work_group_size_multiple>(
            q.get_device());
        auto max =
            kernel
                .get_info<sycl::info::kernel_device_specific::work_group_size>(
                    q.get_device());
        const size_t wgroup_size =
            static_cast<size_t>(max / multiple) * multiple;

We have a similar approach already for TeamPolicy to find a suitable vector length/subgroup size.
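To illustrate the last step of the snippet above in isolation, here is a minimal sketch of the rounding arithmetic, with no SYCL dependency. The function name `round_down_to_multiple` is hypothetical and not part of Kokkos or this PR; it only mirrors the expression `(max / multiple) * multiple`.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not in Kokkos): returns the largest multiple of
// `multiple` that does not exceed `max_size`. This mirrors how wgroup_size
// is derived from the kernel's maximum work-group size and its preferred
// work-group size multiple.
std::size_t round_down_to_multiple(std::size_t max_size, std::size_t multiple) {
  assert(multiple > 0);  // guard against division by zero
  // Integer division truncates, so multiplying back rounds down.
  return (max_size / multiple) * multiple;
}
```

For example, `round_down_to_multiple(1000, 32)` yields 992, and a maximum that is already a multiple is returned unchanged. In this sketch a maximum smaller than the multiple rounds down to 0, so a caller would presumably have to handle that case separately.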

masterleinad · 2022-07-15T20:24:52Z

The diff is best viewed without whitespace changes.

Rombur

It looks fine but the code could use more comments.

Rombur · 2022-07-19T13:19:35Z

core/src/SYCL/Kokkos_SYCL_Parallel_Reduce.hpp

+                       sycl::access::target::local>
+            num_teams_done(1, cgh);
+
+        auto dummy_reduction_lambda =


Why is it called dummy?

Because it gets passed dummy memory pointers.

dalg24 · 2022-07-19T13:52:49Z

core/src/SYCL/Kokkos_SYCL_Parallel_Reduce.hpp

+              sycl::accessor<unsigned int, 1, sycl::access::mode::read_write,
+                             sycl::access::target::local>
+                  num_teams_done,
+              sycl::device_ptr<value_type> results_ptr) mutable {


Not getting -Wshadow warnings?

Apparently not. 🙂

why does this need to be mutable?

dalg24 · 2022-07-19T13:59:29Z

core/src/SYCL/Kokkos_SYCL_Parallel_Reduce.hpp

+        static sycl::kernel kernel = [&] {
+          sycl::kernel_id functor_kernel_id =
+              sycl::get_kernel_id<decltype(dummy_reduction_lambda)>();
+          auto kernel_bundle =
+              sycl::get_kernel_bundle<sycl::bundle_state::executable>(
+                  q.get_context(), std::vector{functor_kernel_id});
+          return kernel_bundle.get_kernel(functor_kernel_id);
+        }();


This code you have 2x in Kokkos_SYCL_Parallel_Team.hpp.
Did you consider defining some kind of helper function for it?

Not sure if it's worth it. In my opinion, this is pretty concise.

I will not approve unless you refactor

masterleinad · 2022-08-11T17:13:22Z

CUDA-9.2-NVCC failing is unrelated.

crtrott

Why is that thing mutable?

crtrott · 2022-08-17T19:07:37Z

core/src/SYCL/Kokkos_SYCL_Parallel_Reduce.hpp

+              sycl::accessor<unsigned int, 1, sycl::access::mode::read_write,
+                             sycl::access::target::local>
+                  num_teams_done,
+              sycl::device_ptr<value_type> results_ptr) mutable {


why does this need to be mutable?

masterleinad · 2022-08-17T19:53:42Z

@crtrott I dropped the mutable specifier for the lambdas.

Deduce workgroup size for SYCL parallel_reduce RangePolicy

e90fca1

Limit workgroup size to 512 when not using an Intel GPU

3297b04

masterleinad force-pushed the sycl_deduce_wgroup_size_reduce branch from 929cebe to 3297b04 on July 18, 2022 19:26

masterleinad marked this pull request as ready for review July 19, 2022 03:08

Rombur approved these changes Jul 19, 2022

dalg24 reviewed Jul 19, 2022

crtrott self-assigned this Aug 17, 2022

crtrott requested changes Aug 17, 2022

Drop mutable

b77fb8f

masterleinad requested a review from crtrott August 17, 2022 19:52

masterleinad requested a review from nliber August 17, 2022 20:17

crtrott approved these changes Aug 17, 2022

crtrott merged commit 82834e3 into kokkos:develop Aug 18, 2022

masterleinad mentioned this pull request Sep 7, 2022

CHANGELOG: 4.0 #5439

Closed
