Tune Occupancy Requests #4023

DavidPoliakoff · 2021-05-12T18:53:50Z

So this PR is a bit big; it turns out that tuning occupancy is different from tuning other things. The end goal is to let a user say "please modify the occupancy of this kernel until you find the optimal result." You use it exactly the way you use the other occupancy requests:

auto policy_with_tuning = Kokkos::Experimental::prefer(
    SomeRangePolicy, Kokkos::Experimental::DesiredOccupancy{Kokkos::AUTO});

When you build with KOKKOS_ENABLE_TUNING, run with --kokkos-tune-internals, and pass policy_with_tuning into a parallel_x (parallel_for, parallel_reduce, or parallel_scan), the autotuning system will tune it.

You'll notice that this PR is huge relative to recent tuning PRs, and it's worth discussing why. A tuned TeamPolicy has the same type before and after tuning, and so does a tuned MDRangePolicy. However, a RangePolicy with no occupancy set is a plain RangePolicy, while a tuned RangePolicy is a RangePolicy<Space, TagSayingImTuned>. Because a function that receives a policy by reference cannot change that policy's type, the tuning workflow has to change.

In the past, parallel patterns passed a policy to the Tools subsystem by reference, and the subsystem mutated it in place. Now, the Tools subsystem returns a response from begin_parallel_x; that response includes a policy, which is then used for the actual functor execution.
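A minimal sketch of why the workflow had to change, using hypothetical stand-ins (RangePolicy, OccupancyTuned, a simplified ToolResponse, old_begin_parallel_for, new_begin_parallel_for) rather than the real Kokkos declarations: a function taking the policy by reference can change its values but never its type, while a function that returns a response can hand back a policy of a different type.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-ins for illustration only; not Kokkos's declarations.
struct OccupancyTuned {};  // plays the role of "TagSayingImTuned"

template <class... Properties>
struct RangePolicy {
  int begin     = 0;
  int end       = 0;
  int occupancy = 100;
};

// Old model: the tool mutates the caller's policy in place. It may change
// values, but the policy's type is fixed by the parameter.
template <class... Properties>
void old_begin_parallel_for(RangePolicy<Properties...>& policy) {
  policy.occupancy = 50;
}

// New model: the tool reads the policy through a const reference and returns
// a response whose policy may have a different type.
template <class Policy>
struct ToolResponse {
  Policy policy;
};

template <class... Properties>
auto new_begin_parallel_for(const RangePolicy<Properties...>& policy) {
  RangePolicy<Properties..., OccupancyTuned> tuned{policy.begin, policy.end,
                                                   50};
  return ToolResponse<decltype(tuned)>{tuned};
}
```

The caller then launches the kernel with response.policy instead of the policy it passed in.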

I'll leave comments inline to clarify a few things.

Update comment: #4023 (comment)

DavidPoliakoff · 2021-05-12T18:58:07Z

core/src/Kokkos_Tuners.hpp

+
+  TunerType get_tuner() const { return tuner; }
+};
+namespace Impl {


Utilities that let us know the type double is associated with kokkos_value_double, and that retrieve the double from a Kokkos VariableValue.
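A minimal sketch of such a type-to-tag mapping, assuming simplified stand-ins for ValueType and VariableValue (the real Kokkos Tools structs differ). The trait name mirrors tuning_type_for from the diff, and its value and get members follow how the diff uses tuning_util; everything else is illustrative.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the Kokkos Tools value types.
enum class ValueType { kokkos_value_double, kokkos_value_int64 };

struct VariableValue {
  ValueType type;
  union {
    double double_value;
    std::int64_t int_value;
  } value;
};

// Primary template left undefined: unsupported types fail to compile.
template <class T>
struct tuning_type_for;

template <>
struct tuning_type_for<double> {
  static constexpr ValueType value = ValueType::kokkos_value_double;
  static double get(const VariableValue& v) { return v.value.double_value; }
};

template <>
struct tuning_type_for<std::int64_t> {
  static constexpr ValueType value = ValueType::kokkos_value_int64;
  static std::int64_t get(const VariableValue& v) { return v.value.int_value; }
};
```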

DavidPoliakoff · 2021-05-12T18:59:23Z

core/src/Kokkos_Tuners.hpp

+template <class Bound>
+class SingleDimensionalRangeTuner {
+  size_t id;
+  size_t context;
+  using tuning_util = Impl::tuning_type_for<Bound>;
+
+  Bound default_value;
+
+ public:
+  SingleDimensionalRangeTuner() = default;
+  SingleDimensionalRangeTuner(
+      const std::string& name,
+      Kokkos::Tools::Experimental::StatisticalCategory category,
+      Bound default_val, Bound lower, Bound upper, Bound step = (Bound)0) {
+    default_value = default_val;
+    Kokkos::Tools::Experimental::VariableInfo info;
+    info.category   = category;
+    info.candidates = make_candidate_range(
+        static_cast<Bound>(lower), static_cast<Bound>(upper),
+        static_cast<Bound>(step), false, false);
+    info.valueQuantity =
+        Kokkos::Tools::Experimental::CandidateValueType::kokkos_value_range;
+    info.type = tuning_util::value;
+    id        = Kokkos::Tools::Experimental::declare_output_type(name, info);
+  }
+
+  Bound begin() {
+    context = Kokkos::Tools::Experimental::get_new_context_id();
+    Kokkos::Tools::Experimental::begin_context(context);
+    auto tuned_value =
+        Kokkos::Tools::Experimental::make_variable_value(id, default_value);
+    Kokkos::Tools::Experimental::request_output_values(context, 1,
+                                                       &tuned_value);
+    return tuning_util::get(tuned_value);
+  }


A new general-purpose tuning utility that tunes a double or an int64 over a range. Honestly, it should be useful outside of Kokkos too, since it wraps some of the ugly parts of the tuning interface.
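A minimal sketch of the behavior the tuner wraps, under the assumption that a loaded tool can be modelled as a callback that picks a value from [lower, upper]. RangeTunerSketch and its Tool callback are hypothetical, and the clamp is an added defensive choice, not something the diff shows. Like the diff's begin(), which seeds the request with default_value, it returns the default when no tool answers.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical sketch; not the Kokkos Tools API.
template <class Bound>
class RangeTunerSketch {
 public:
  // Stand-in for a loaded tuning tool: given the bounds, pick a value.
  using Tool = std::function<Bound(Bound, Bound)>;

  RangeTunerSketch(Bound default_val, Bound lower, Bound upper,
                   Tool tool = nullptr)
      : default_value_(default_val),
        lower_(lower),
        upper_(upper),
        tool_(std::move(tool)) {}

  // With no tool loaded, the seeded default comes back unchanged.
  Bound begin() const {
    if (!tool_) return default_value_;
    Bound v = tool_(lower_, upper_);
    // Assumed defensive clamp in case a tool answers outside the range.
    if (v < lower_) v = lower_;
    if (v > upper_) v = upper_;
    return v;
  }

 private:
  Bound default_value_;
  Bound lower_;
  Bound upper_;
  Tool tool_;
};
```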

DavidPoliakoff · 2021-05-12T19:02:40Z

core/src/Kokkos_Tuners.hpp

+      : tuner(TunerType(name,
+                        Kokkos::Tools::Experimental::StatisticalCategory::
+                            kokkos_value_ratio,
+                        100, 1, 100, 1)) {}


Somebody should check my work here. The valid values for a DesiredOccupancy are integers between 1 and 100, no?

We actually also allow 0; see

kokkos/core/src/traits/Kokkos_OccupancyControlTrait.hpp

Lines 63 to 72 in 4b97a22

struct DesiredOccupancy {
  int m_occ = 100;
  explicit constexpr DesiredOccupancy(int occ) : m_occ(occ) {
    KOKKOS_EXPECTS(0 <= occ && occ <= 100);
  }
  explicit constexpr operator int() const { return m_occ; }
  constexpr int value() const { return m_occ; }
  DesiredOccupancy() = default;
  explicit DesiredOccupancy(MaximizeOccupancy const&) : DesiredOccupancy() {}
};

Not quite sure whether that makes sense, though.

Ah, thanks. I'll update for it and see what happens ;)
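A quick check of what widening the range changes, assuming the two false arguments to make_candidate_range mean neither bound is open (both inclusive). enumerate_candidates and valid_occupancy are hypothetical helpers; valid_occupancy copies the KOKKOS_EXPECTS condition in the DesiredOccupancy constructor quoted above.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: inclusive range [lower, upper] walked in steps of step.
std::vector<int> enumerate_candidates(int lower, int upper, int step) {
  assert(step > 0);  // a non-positive step would never reach upper
  std::vector<int> out;
  for (int v = lower; v <= upper; v += step) out.push_back(v);
  return out;
}

// Same condition as the KOKKOS_EXPECTS check on DesiredOccupancy.
bool valid_occupancy(int occ) { return 0 <= occ && occ <= 100; }
```

With lower = 1 the tool sees 100 candidates; with lower = 0 it sees 101, all of which still satisfy the DesiredOccupancy precondition.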

DavidPoliakoff · 2021-05-12T19:04:23Z

core/src/Kokkos_Tuners.hpp

+                        100, 1, 100, 1)) {}
+
+  template <typename... Properties>
+  auto tune(Kokkos::RangePolicy<Properties...>& policy) {


I'm really enjoying how this change means that we now have a function in our autotuning system whose signature is "auto tune."

DavidPoliakoff · 2021-05-12T19:06:22Z

core/src/impl/Kokkos_Profiling.hpp

+};
+template <typename PolicyType, typename Functor>
+struct ToolResponse {
+  typename TuningResult<PolicyType>::type policy;


The one-element struct here exists so that, when we need the tools subsystem to return more than one thing, we can extend it easily.

DavidPoliakoff · 2021-05-12T19:10:42Z

core/src/impl/Kokkos_Profiling.hpp

+template <typename Policy>
+auto default_tuned_version_of(const Policy& policy) {
+  return policy;
+}
+template <class... Properties>
+auto default_tuned_version_of(
+    const Kokkos::RangePolicy<Properties...>& policy) {
+  return Kokkos::Experimental::prefer(
+      policy, Kokkos::Experimental::DesiredOccupancy(100));
+}


Suppose tuning is enabled but no tuning tool is loaded. We still need to return a tuned policy. I think saying "maximize occupancy" is valid here?

DesiredOccupancy has a constructor taking MaximizeOccupancy, if that's what you are asking.

Oh cool, I'll use that

Turns out the type you get from MaximizeOccupancy is different from the one you get from DesiredOccupancy(100). For consistency, I'll use DesiredOccupancy(100).
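A sketch of how the two default_tuned_version_of overloads divide the work, with hypothetical empty RangePolicy and TeamPolicy templates and a DesiredOccupancyTrait tag standing in for whatever type Kokkos::Experimental::prefer actually produces. Partial ordering of function templates picks the RangePolicy overload whenever it matches, so only range policies come back with a changed type.

```cpp
#include <type_traits>

// Hypothetical stand-ins; not Kokkos's declarations.
template <class... Properties>
struct RangePolicy {};
template <class... Properties>
struct TeamPolicy {};
struct DesiredOccupancyTrait {};

// Generic fallback: policies whose tuned type equals their untuned type.
template <typename Policy>
auto default_tuned_version_of(const Policy& policy) {
  return policy;
}

// More specialized, so it wins for any RangePolicy: returns the type a tuned
// RangePolicy would have (the real code goes through prefer(...) instead).
template <class... Properties>
auto default_tuned_version_of(const RangePolicy<Properties...>&) {
  return RangePolicy<Properties..., DesiredOccupancyTrait>{};
}
```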

Rombur · 2021-05-19T17:55:47Z

core/src/Kokkos_Parallel.hpp

+  ExecPolicy policy_copy = policy;
+  /** Request a tuned policy from the tools subsystem */
+  const auto& response =
+      Kokkos::Tools::Impl::begin_parallel_for(policy_copy, functor, str, kpID);


Do I understand correctly that we need to copy policy because begin_parallel_for will change the ExecPolicy, but we do not care about those changes and use response instead? Can't we have begin_parallel_for take a const ExecPolicy and always use the response?

Right, it's a bit clunky. In the old model we mutated the policy, so I took a copy. Let me try your reorganization; if we can make it work, I think that would be ideal.

Oh, just saw your comment below (can't reply to it for some reason). I like that, I'll start with that as a model

…/tune-occupancy

@Rombur

…s per @Rombur 's comments

DavidPoliakoff · 2021-05-24T19:38:25Z

So, big change. I'm moving this to a new model, along the lines of what @Rombur suggested. This does introduce breaking changes, but not in any user-facing code that I believe anybody is using. It lets the interface be what it always should have been: the profiling system takes in a policy by const reference and returns a new policy that the interface then uses.

Conflicts: core/src/Kokkos_Parallel.hpp

Rombur · 2021-05-26T13:40:59Z

core/src/Kokkos_Parallel.hpp

+  /** Request a tuned policy from the tools subsystem */
+  const auto& response = Kokkos::Tools::Impl::begin_parallel_scan(
+      execution_policy, functor, str, kpID);
+  const auto& inner_policy = response.policy;


Do you ever use response? It looks like you only care about response.policy. If you could return response.policy directly, it would save some boilerplate code.

I actually don't want to do that. I think the tools subsystem might return more data later, so this is future-proofing.
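A sketch of the call-site pattern from the diff, with hypothetical Policy, ToolResponse, and begin_parallel_scan_sketch stand-ins. It illustrates two points: binding const auto& to the returned temporary is safe because C++ extends the temporary's lifetime to that of the reference, and adding a field to the response struct (the hypothetical note member) does not disturb call sites that only read .policy.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for illustration only.
struct Policy {
  int occupancy = 100;
};

struct ToolResponse {
  Policy policy;
  std::string note;  // hypothetical later addition; call sites ignore it
};

ToolResponse begin_parallel_scan_sketch(const Policy& in) {
  ToolResponse response{in, "picked by a pretend tool"};
  response.policy.occupancy = 75;  // pretend the tool chose 75
  return response;
}

int run_scan(const Policy& execution_policy) {
  // The temporary returned here lives as long as `response` does.
  const auto& response     = begin_parallel_scan_sketch(execution_policy);
  const auto& inner_policy = response.policy;
  return inner_policy.occupancy;
}
```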

…/tune-occupancy # Conflicts: # core/src/Kokkos_Tuners.hpp

crtrott

So if no tool is loaded, does that mean a new copy of the policy is created now? If so, is there a way to avoid that?

crtrott · 2021-06-02T16:09:22Z

core/src/impl/Kokkos_Profiling.hpp

@@ -295,12 +308,18 @@ struct ComplexReducerSizeCalculator {
  }
 };

+template <typename Policy>
+auto default_tuned_version_of(const Policy& policy) {
+  return policy;


I am not sure it's a good idea that the default path still creates a new policy. Doesn't it do that here, since it returns auto rather than auto&?
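The copy asked about here does happen: with a deduced auto return type, return policy; copy-constructs a new object, and the compiler cannot elide a copy out of a reference parameter. A sketch with a hypothetical copy-counting policy type makes this visible and shows the const auto& alternative. The trade-off is that returning a reference only works when the tuned type equals the input type; the RangePolicy overload must still return a new object because its result has a different type.

```cpp
#include <cassert>

// Hypothetical policy type that counts how often it is copied.
struct CountingPolicy {
  static int copies;
  CountingPolicy() = default;
  CountingPolicy(const CountingPolicy&) { ++copies; }
};
int CountingPolicy::copies = 0;

// Same shape as default_tuned_version_of: `auto` deduces a non-reference
// type, so the return statement copy-constructs from the caller's object.
template <typename Policy>
auto by_value(const Policy& policy) {
  return policy;
}

// `const auto&` deduces const Policy&, handing back the caller's object.
template <typename Policy>
const auto& by_reference(const Policy& policy) {
  return policy;
}
```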

ajpowelsnl · 2023-07-31T20:24:14Z

@crtrott, @dalg24 -- do we want to continue this PR?

cz4rs · 2023-08-16T19:10:35Z

@crtrott, @dalg24 -- do we want to continue this PR?

I guess we can start by converting this to a draft.

DavidPoliakoff added 4 commits May 11, 2021 15:39

First stab (cd682ed)
Initial, decent commit (530d170)
clang-format (9fcf03b)
Comments (6ca5482)