New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tune Occupancy Requests #4023
base: develop
Are you sure you want to change the base?
Tune Occupancy Requests #4023
Conversation
|
||
TunerType get_tuner() const { return tuner; } | ||
}; | ||
namespace Impl { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Utilities to allow us to know that type double
is associated with kokkos_value_double, and retrieve the double
from a Kokkos VariableValue
template <class Bound> | ||
class SingleDimensionalRangeTuner { | ||
size_t id; | ||
size_t context; | ||
using tuning_util = Impl::tuning_type_for<Bound>; | ||
|
||
Bound default_value; | ||
|
||
public: | ||
SingleDimensionalRangeTuner() = default; | ||
SingleDimensionalRangeTuner( | ||
const std::string& name, | ||
Kokkos::Tools::Experimental::StatisticalCategory category, | ||
Bound default_val, Bound lower, Bound upper, Bound step = (Bound)0) { | ||
default_value = default_val; | ||
Kokkos::Tools::Experimental::VariableInfo info; | ||
info.category = category; | ||
info.candidates = make_candidate_range( | ||
static_cast<Bound>(lower), static_cast<Bound>(upper), | ||
static_cast<Bound>(step), false, false); | ||
info.valueQuantity = | ||
Kokkos::Tools::Experimental::CandidateValueType::kokkos_value_range; | ||
info.type = tuning_util::value; | ||
id = Kokkos::Tools::Experimental::declare_output_type(name, info); | ||
} | ||
|
||
Bound begin() { | ||
context = Kokkos::Tools::Experimental::get_new_context_id(); | ||
Kokkos::Tools::Experimental::begin_context(context); | ||
auto tuned_value = | ||
Kokkos::Tools::Experimental::make_variable_value(id, default_value); | ||
Kokkos::Tools::Experimental::request_output_values(context, 1, | ||
&tuned_value); | ||
return tuning_util::get(tuned_value); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New general tuning utility to tune a double or int64 from a range. Should be useful outside of Kokkos, honestly, wraps some of the ugly parts of the tuning interface
: tuner(TunerType(name, | ||
Kokkos::Tools::Experimental::StatisticalCategory:: | ||
kokkos_value_ratio, | ||
100, 1, 100, 1)) {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Somebody should check my work here. The valid values for a DesiredOccupancy are integers between 1 and 100, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We actually also allow 0, see
kokkos/core/src/traits/Kokkos_OccupancyControlTrait.hpp
Lines 63 to 72 in 4b97a22
struct DesiredOccupancy { | |
int m_occ = 100; | |
explicit constexpr DesiredOccupancy(int occ) : m_occ(occ) { | |
KOKKOS_EXPECTS(0 <= occ && occ <= 100); | |
} | |
explicit constexpr operator int() const { return m_occ; } | |
constexpr int value() const { return m_occ; } | |
DesiredOccupancy() = default; | |
explicit DesiredOccupancy(MaximizeOccupancy const&) : DesiredOccupancy() {} | |
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, thanks. I'll update for it and see what happens ;)
core/src/Kokkos_Tuners.hpp
Outdated
100, 1, 100, 1)) {} | ||
|
||
template <typename... Properties> | ||
auto tune(Kokkos::RangePolicy<Properties...>& policy) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm really enjoying how this change means that we now have a function in our autotuning system whose signature is "auto tune."
core/src/impl/Kokkos_Profiling.hpp
Outdated
}; | ||
template <typename PolicyType, typename Functor> | ||
struct ToolResponse { | ||
typename TuningResult<PolicyType>::type policy; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The one element struct here is used so that when we need the tools subsystem to return more than one thing, we can extend it easily
core/src/impl/Kokkos_Profiling.hpp
Outdated
template <typename Policy> | ||
auto default_tuned_version_of(const Policy& policy) { | ||
return policy; | ||
} | ||
template <class... Properties> | ||
auto default_tuned_version_of( | ||
const Kokkos::RangePolicy<Properties...>& policy) { | ||
return Kokkos::Experimental::prefer( | ||
policy, Kokkos::Experimental::DesiredOccupancy(100)); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suppose tuning is enabled, but no tuning tool is loaded. We still need to return a tuned policy, though. I think saying "Maximize Occupancy" is valid, here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DesiredOccupcany
has a constructor taking MaximizeOccupancy
if that's what you are asking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh cool, I'll use that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Turns out, the type returned by that MaximizeOccupancy is different than the one from using DesiredOccupancy 100. For consistency, I'll use DesiredOccupancy 100
core/src/Kokkos_Parallel.hpp
Outdated
ExecPolicy policy_copy = policy; | ||
/** Request a tuned policy from the tools subsystem */ | ||
const auto& response = | ||
Kokkos::Tools::Impl::begin_parallel_for(policy_copy, functor, str, kpID); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do I understand correctly that we need to copy policy
because begin_parallel_for
will change the ExecPolicy
but we do not care about the changes. Instead we use response
. Can't we have begin_parallel_for
take a constant ExecPolicy
and always use the response
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, it's a bit clunky. In the old model, we mutated the policy, and so I took a copy. Let me try your reorganization, I think if we can make it work that would be ideal
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, just saw your comment below (can't reply to it for some reason). I like that, I'll start with that as a model
So, big change. I'm moving this to a new model, along the lines of what @Rombur was saying. This does introduce breaking changes, but not into any user-facing code I believe anybody is using. This allows the interface to be what it always should have been: the profiling system takes in a policy by cref, and returns a new policy that the interface uses |
Conflicts: core/src/Kokkos_Parallel.hpp
/** Request a tuned policy from the tools subsystem */ | ||
const auto& response = Kokkos::Tools::Impl::begin_parallel_scan( | ||
execution_policy, functor, str, kpID); | ||
const auto& inner_policy = response.policy; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you ever use response
? It looks like you only care about response.policy
. If you could return response.policy
directly, it would save some boilerplate code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually don't want to do that. I think the tools subsystem might return more data later, this is future-proofing
…/tune-occupancy # Conflicts: # core/src/Kokkos_Tuners.hpp
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So if no tool is loaded, does that mean a new copy of the policy is created now? If so is there a way to avoid that?
@@ -295,12 +308,18 @@ struct ComplexReducerSizeCalculator { | |||
} | |||
}; | |||
|
|||
template <typename Policy> | |||
auto default_tuned_version_of(const Policy& policy) { | |||
return policy; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not clear if its a good idea that the default thing still creates a new policy. Doesn't this do that since it not returns a auto&?
So this PR is a bit big, it turns out that tuning occupancy is different than tuning other things. The end goal we're achieving here is to allow a user to say "please modify the occupancy of this kernel until you find the optimal result." It's used exactly how you use the other Occupancy request things
When you build with
KOKKOS_ENABLE_TUNING
, run with--kokkos-tune-internals
, and passpolicy_with_tuning
into a parallel_x`, the autotuning system will tune it.You'll notice that this PR is huge relative to recent Tuning PR's, and it's worth discussing why. A tuned TeamPolicy has the same type before and after tuning. Ditto MDRangePolicy. However, a RangePolicy with no occupancy set is a RangePolicy, while a tuned RangePolicy is a RangePolicy<Space, TagSayingImTuned>. This means that the tuning workflow has to change.
In the past, parallel patterns passed a policy to the Tools subsystem by reference, which mutated the policy. Now, the tools subsystem returns a response on begin_parallel_x, which includes a policy that is then used in the actual functor execution.
I'll leave comments inline to clarify some things
Update comment: #4023 (comment)