MDRange Tile Size Tuning #3688

Merged 1 commit into kokkos:develop on Feb 2, 2021

DavidPoliakoff
Contributor

Implements #3156

If a user doesn't specify a tile size, this PR adds tuning hooks so that the tile size can be tuned.

Calling out a few oddities

  • Removal of an include here. This was necessary to avoid some loop of includes
  • Changing some of the infrastructure here. It turns out we changed the interface of init, which I missed.
  • This weird generic_tune_policy thing. Honestly this is a bit premature. But I've noticed that both policy tuning things I wanted to do had the same rough "shape" so I'll try to do it generically.

@crtrott
Member

crtrott commented Jan 13, 2021

Looks like you are trying to invoke a rank 3 thing with only two indices somewhere?

@DavidPoliakoff
Contributor Author

How so? The MDRange being pointed out is 2d, no?

@masterleinad
Contributor

Hmmm... I guess you don't have a configuration at hand that would reproduce the error?

@DavidPoliakoff
Contributor Author

I can build one. Is it about seeing the skipped contexts?

@masterleinad
Contributor

> I can build one. Is it about seeing the skipped contexts?

No, the changes here are small enough that it should not be too hard to figure out what exactly is breaking it by selectively disabling changes.

@DavidPoliakoff
Contributor Author

Also, since I made it, here's a build with -ftemplate-backtrace-limit=0

/ascldap/users/dzpolia/src/kokkos/core/src/impl/KokkosExp_IterateTileGPU.hpp(72): error: function "lambda [](int, int, Test::value_type &)->void" cannot be called with the given argument list
            argument types are: (const int, const int)
            object type is: const lambda [](int, int, Test::value_type &)->void
          detected during:
            instantiation of "std::enable_if_t<std::is_void<Tag>::value, void> Kokkos::Impl::_tag_invoke<Tag,Functor,Args...>(const Functor &, Args &&...) [with Tag=void, Functor=lambda [](int, int, Test::value_type &)->void, Args=<const int &, const int &>]"
(144): here
            instantiation of "void Kokkos::Impl::DeviceIterateTile<2, PolicyType, Functor, Tag>::exec_range() const [with PolicyType=Kokkos::MDRangePolicy<Kokkos::Cuda, Kokkos::Rank<2U, Kokkos::Iterate::Default, Kokkos::Iterate::Default>, Kokkos::IndexType<int>>, Functor=lambda [](int, int, Test::value_type &)->void, Tag=void]"
/ascldap/users/dzpolia/src/kokkos/core/src/Cuda/Kokkos_Cuda_Parallel.hpp(559): here
            instantiation of "void Kokkos::Impl::ParallelFor<FunctorType, Kokkos::MDRangePolicy<Traits...>, Kokkos::Cuda>::operator()() const [with FunctorType=lambda [](int, int, Test::value_type &)->void, Traits=<Kokkos::Cuda, Kokkos::Rank<2U, Kokkos::Iterate::Default, Kokkos::Iterate::Default>, Kokkos::IndexType<int>>]"
/ascldap/users/dzpolia/src/kokkos/core/src/Cuda/Kokkos_Cuda_KernelLaunch.hpp(121): here
            instantiation of "void Kokkos::Impl::cuda_parallel_launch_local_memory(DriverType) [with DriverType=Kokkos::Impl::ParallelFor<lambda [](int, int, Test::value_type &)->void, Kokkos::MDRangePolicy<Kokkos::Cuda, Kokkos::Rank<2U, Kokkos::Iterate::Default, Kokkos::Iterate::Default>, Kokkos::IndexType<int>>, Kokkos::CudaSpace::execution_space>]"
/ascldap/users/dzpolia/src/kokkos/core/src/Cuda/Kokkos_Cuda_KernelLaunch.hpp(322): here
            instantiation of "std::decay_t<decltype((<expression>))> Kokkos::Impl::CudaParallelLaunchKernelFunc<DriverType, Kokkos::LaunchBounds<0U, 0U>, Kokkos::Impl::Experimental::CudaLaunchMechanism::LocalMemory>::get_kernel_func() [with DriverType=Kokkos::Impl::ParallelFor<lambda [](int, int, Test::value_type &)->void, Kokkos::MDRangePolicy<Kokkos::Cuda, Kokkos::Rank<2U, Kokkos::Iterate::Default, Kokkos::Iterate::Default>, Kokkos::IndexType<int>>, Kokkos::CudaSpace::execution_space>]"
/ascldap/users/dzpolia/src/kokkos/core/src/Cuda/Kokkos_Cuda_KernelLaunch.hpp(653): here
            [ 6 instantiation contexts not shown ]
            instantiation of "void Kokkos::Tools::Impl::ReductionSwitcher<Kokkos::InvalidType>::tune(size_t, const std::__cxx11::string &, ExecPolicy &, const Functor &, const TagType &) [with Functor=lambda [](int, int, Test::value_type &)->void, TagType=Kokkos::ParallelReduceTag, ExecPolicy=Kokkos::MDRangePolicy<Kokkos::Cuda, Kokkos::Rank<2U, Kokkos::Iterate::Default, Kokkos::Iterate::Default>, Kokkos::IndexType<int>>]"
/ascldap/users/dzpolia/src/kokkos/core/src/impl/Kokkos_Profiling.hpp(529): here
            instantiation of "void Kokkos::Tools::Impl::begin_parallel_reduce<ReducerType,ExecPolicy,FunctorType>(ExecPolicy &, FunctorType &, const std::__cxx11::string &, uint64_t &) [with ReducerType=Kokkos::InvalidType, ExecPolicy=Kokkos::MDRangePolicy<Kokkos::Cuda, Kokkos::Rank<2U, Kokkos::Iterate::Default, Kokkos::Iterate::Default>, Kokkos::IndexType<int>>, FunctorType=const lambda [](int, int, Test::value_type &)->void]"
/ascldap/users/dzpolia/src/kokkos/core/src/Kokkos_Parallel_Reduce.hpp(862): here
            instantiation of "void Kokkos::Impl::ParallelReduceAdaptor<PolicyType, FunctorType, ReturnType>::execute(const std::__cxx11::string &, const PolicyType &, const FunctorType &, ReturnType &) [with PolicyType=Kokkos::MDRangePolicy<Kokkos::Cuda, Kokkos::Rank<2U, Kokkos::Iterate::Default, Kokkos::Iterate::Default>, Kokkos::IndexType<int>>, FunctorType=lambda [](int, int, Test::value_type &)->void, ReturnType=Test::value_type]"
/ascldap/users/dzpolia/src/kokkos/core/src/Kokkos_Parallel_Reduce.hpp(1018): here
            instantiation of "std::enable_if<Kokkos::is_execution_policy<PolicyType>::value, void>::type Kokkos::parallel_reduce(const PolicyType &, const FunctorType &, ReturnType &) [with PolicyType=Kokkos::MDRangePolicy<Kokkos::Cuda, Kokkos::Rank<2U, Kokkos::Iterate::Default, Kokkos::Iterate::Default>, Kokkos::IndexType<int>>, FunctorType=lambda [](int, int, Test::value_type &)->void, ReturnType=Test::value_type]"
/ascldap/users/dzpolia/src/kokkos/core/unit_test/incremental/Test14_MDRangeReduce.hpp(128): here
            instantiation of "void Test::TestMDRangeReduce<ExecSpace>::reduce_MDRange() [with ExecSpace=Kokkos::Cuda]"
/ascldap/users/dzpolia/src/kokkos/core/unit_test/incremental/Test14_MDRangeReduce.hpp(178): here

/ascldap/users/dzpolia/src/kokkos/core/src/impl/KokkosExp_IterateTileGPU.hpp(72): error: function "lambda [](int, int, Test::value_type &)->void" cannot be called with the given argument list
            argument types are: (const int, const int)
            object type is: const lambda [](int, int, Test::value_type &)->void
          detected during:
            instantiation of "std::enable_if_t<std::is_void<Tag>::value, void> Kokkos::Impl::_tag_invoke<Tag,Functor,Args...>(const Functor &, Args &&...) [with Tag=void, Functor=lambda [](int, int, Test::value_type &)->void, Args=<const int &, const int &>]"
(144): here
            instantiation of "void Kokkos::Impl::DeviceIterateTile<2, PolicyType, Functor, Tag>::exec_range() const [with PolicyType=Kokkos::MDRangePolicy<Kokkos::Cuda, Kokkos::Rank<2U, Kokkos::Iterate::Default, Kokkos::Iterate::Default>, Kokkos::IndexType<int>>, Functor=lambda [](int, int, Test::value_type &)->void, Tag=void]"
/ascldap/users/dzpolia/src/kokkos/core/src/Cuda/Kokkos_Cuda_Parallel.hpp(559): here
            instantiation of "void Kokkos::Impl::ParallelFor<FunctorType, Kokkos::MDRangePolicy<Traits...>, Kokkos::Cuda>::operator()() const [with FunctorType=lambda [](int, int, Test::value_type &)->void, Traits=<Kokkos::Cuda, Kokkos::Rank<2U, Kokkos::Iterate::Default, Kokkos::Iterate::Default>, Kokkos::IndexType<int>>]"
/ascldap/users/dzpolia/src/kokkos/core/src/Cuda/Kokkos_Cuda_KernelLaunch.hpp(121): here
            instantiation of "void Kokkos::Impl::cuda_parallel_launch_local_memory(DriverType) [with DriverType=Kokkos::Impl::ParallelFor<lambda [](int, int, Test::value_type &)->void, Kokkos::MDRangePolicy<Kokkos::Cuda, Kokkos::Rank<2U, Kokkos::Iterate::Default, Kokkos::Iterate::Default>, Kokkos::IndexType<int>>, Kokkos::CudaSpace::execution_space>]"
/ascldap/users/dzpolia/src/kokkos/core/src/Cuda/Kokkos_Cuda_KernelLaunch.hpp(322): here
            instantiation of "std::decay_t<decltype((<expression>))> Kokkos::Impl::CudaParallelLaunchKernelFunc<DriverType, Kokkos::LaunchBounds<0U, 0U>, Kokkos::Impl::Experimental::CudaLaunchMechanism::LocalMemory>::get_kernel_func() [with DriverType=Kokkos::Impl::ParallelFor<lambda [](int, int, Test::value_type &)->void, Kokkos::MDRangePolicy<Kokkos::Cuda, Kokkos::Rank<2U, Kokkos::Iterate::Default, Kokkos::Iterate::Default>, Kokkos::IndexType<int>>, Kokkos::CudaSpace::execution_space>]"
/ascldap/users/dzpolia/src/kokkos/core/src/Cuda/Kokkos_Cuda_KernelLaunch.hpp(653): here
            [ 6 instantiation contexts not shown ]
            instantiation of "void Kokkos::Tools::Impl::ReductionSwitcher<Kokkos::InvalidType>::tune(size_t, const std::__cxx11::string &, ExecPolicy &, const Functor &, const TagType &) [with Functor=lambda [](int, int, Test::value_type &)->void, TagType=Kokkos::ParallelReduceTag, ExecPolicy=Kokkos::MDRangePolicy<Kokkos::Cuda, Kokkos::Rank<2U, Kokkos::Iterate::Default, Kokkos::Iterate::Default>, Kokkos::IndexType<int>>]"
/ascldap/users/dzpolia/src/kokkos/core/src/impl/Kokkos_Profiling.hpp(529): here
            instantiation of "void Kokkos::Tools::Impl::begin_parallel_reduce<ReducerType,ExecPolicy,FunctorType>(ExecPolicy &, FunctorType &, const std::__cxx11::string &, uint64_t &) [with ReducerType=Kokkos::InvalidType, ExecPolicy=Kokkos::MDRangePolicy<Kokkos::Cuda, Kokkos::Rank<2U, Kokkos::Iterate::Default, Kokkos::Iterate::Default>, Kokkos::IndexType<int>>, FunctorType=const lambda [](int, int, Test::value_type &)->void]"
/ascldap/users/dzpolia/src/kokkos/core/src/Kokkos_Parallel_Reduce.hpp(862): here
            instantiation of "void Kokkos::Impl::ParallelReduceAdaptor<PolicyType, FunctorType, ReturnType>::execute(const std::__cxx11::string &, const PolicyType &, const FunctorType &, ReturnType &) [with PolicyType=Kokkos::MDRangePolicy<Kokkos::Cuda, Kokkos::Rank<2U, Kokkos::Iterate::Default, Kokkos::Iterate::Default>, Kokkos::IndexType<int>>, FunctorType=lambda [](int, int, Test::value_type &)->void, ReturnType=Kokkos::View<Test::value_type, Kokkos::Cuda>]"
/ascldap/users/dzpolia/src/kokkos/core/src/Kokkos_Parallel_Reduce.hpp(1018): here
            instantiation of "std::enable_if<Kokkos::is_execution_policy<PolicyType>::value, void>::type Kokkos::parallel_reduce(const PolicyType &, const FunctorType &, ReturnType &) [with PolicyType=Kokkos::MDRangePolicy<Kokkos::Cuda, Kokkos::Rank<2U, Kokkos::Iterate::Default, Kokkos::Iterate::Default>, Kokkos::IndexType<int>>, FunctorType=lambda [](int, int, Test::value_type &)->void, ReturnType=Kokkos::View<Test::value_type, Kokkos::Cuda>]"
/ascldap/users/dzpolia/src/kokkos/core/unit_test/incremental/Test14_MDRangeReduce.hpp(136): here
            instantiation of "void Test::TestMDRangeReduce<ExecSpace>::reduce_MDRange() [with ExecSpace=Kokkos::Cuda]"
/ascldap/users/dzpolia/src/kokkos/core/unit_test/incremental/Test14_MDRangeReduce.hpp(178): here
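
Both errors above appear to have the same shape: a reduction lambda taking `(int, int, Test::value_type &)` ends up being called from a `ParallelFor` instantiation that only supplies the two indices. The following is a minimal standalone sketch of that mismatch, not Kokkos code. It assumes C++17 for `std::is_invocable_v`; `value_type` is a stand-in for `Test::value_type`, and `make_reduce_body` and `call_as_reduce` are hypothetical helpers.

```cpp
#include <cassert>
#include <type_traits>

// Stand-in for Test::value_type from the log (assumption: the real type
// is defined in the unit test and may differ).
using value_type = double;

// Hypothetical reduction body with the same shape as the lambda in the
// error: two indices plus a reference to the running reduction value.
inline auto make_reduce_body() {
  return [](int i, int j, value_type& update) { update += i * j; };
}

using ReduceBody = decltype(make_reduce_body());

// A parallel_for-style call passes only the indices, so it cannot compile.
static_assert(
    !std::is_invocable_v<const ReduceBody&, const int&, const int&>,
    "reduce body cannot be called with indices only");

// A parallel_reduce-style call also passes the accumulator, so it can.
static_assert(
    std::is_invocable_v<const ReduceBody&, const int&, const int&,
                        value_type&>,
    "reduce body needs the reduction argument");

// Hypothetical helper that invokes the body the way a reduction would.
inline value_type call_as_reduce(int i, int j) {
  value_type sum = 0;
  make_reduce_body()(i, j, sum);
  assert(sum == static_cast<value_type>(i * j));
  return sum;
}
```

The first `static_assert` mirrors the "cannot be called with the given argument list" diagnostic: the functor is fine, but the call site belongs to the wrong kind of kernel.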

@masterleinad
Contributor

Hmmm.. now OpenMPTarget is complaining:

/var/jenkins/workspace/Kokkos/core/src/Kokkos_Tuners.hpp:533:30: error: no matching member function for call to 'get_max_mdrange_tile_size'
    int max_tile_size = calc.get_max_mdrange_tile_size(policy, functor, tag);
                        ~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~
/var/jenkins/workspace/Kokkos/core/src/impl/Kokkos_Profiling.hpp:339:24: note: in instantiation of function template specialization 'Kokkos::Tools::Experimental::MDRangeTuner<5>::MDRangeTuner<Test::(anonymous namespace)::TestMDRange_5D<Kokkos::Serial>, Kokkos::ParallelReduceTag, Kokkos::Tools::Impl::Impl::ComplexReducerSizeCalculator<Kokkos::Sum<double>>, Kokkos::Serial, Kokkos::Rank<5, Kokkos::Iterate::Default, Kokkos::Iterate::Default>, Kokkos::IndexType<int>>' requested here
                       Tuner(label, policy, functor, tag,

@DavidPoliakoff
Contributor Author

I can fix it, I probably forgot an overload

@DavidPoliakoff
Contributor Author

Yup, I forgot to change another form of bounds calculator to match the new interface, fixed
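
A rough illustration of why one missed calculator breaks the build: the tuner is a template, so every calculator type it is instantiated with has to offer the same `get_max_mdrange_tile_size(policy, functor, tag)` call. Only that member name and its three arguments come from the log above; the types, the helper `example_max_tile_size`, and the returned numbers are hypothetical placeholders.

```cpp
#include <cassert>

// Hypothetical placeholder types; the real policy, functor, and tag
// types in Kokkos are far richer.
struct ExamplePolicy {};
struct ExampleFunctor {};
struct ExampleTag {};

// One "form" of bounds calculator, already on the three-argument interface.
struct ExampleSimpleCalculator {
  int get_max_mdrange_tile_size(const ExamplePolicy&, const ExampleFunctor&,
                                const ExampleTag&) const {
    return 256;  // placeholder value, not a real hardware limit
  }
};

// A second form. If this member were left on an older signature, the
// template below would fail for this type with "no matching member
// function", which is the kind of error shown in the log.
struct ExampleReducerCalculator {
  int get_max_mdrange_tile_size(const ExamplePolicy&, const ExampleFunctor&,
                                const ExampleTag&) const {
    return 128;  // placeholder value
  }
};

// Generic caller: compiles only for calculators that provide the call.
template <class Calculator>
int example_max_tile_size(const Calculator& calc, const ExamplePolicy& policy,
                          const ExampleFunctor& functor,
                          const ExampleTag& tag) {
  return calc.get_max_mdrange_tile_size(policy, functor, tag);
}
```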

Contributor

@masterleinad masterleinad left a comment

Just a few comments. Otherwise, the changes look reasonable to me.

@DavidPoliakoff
Contributor Author

@masterleinad: done, thanks. Worked on my build; will see how CI treats it.

@masterleinad
Contributor

You need to fix indentation.

@DavidPoliakoff
Contributor Author

This apparently needs another round of fixes to adjust for #3626

@masterleinad
Contributor

> This apparently needs another round of fixes to adjust for #3626

Sorry 😕

dalg24
dalg24 previously requested changes Jan 19, 2021
Member

@dalg24 dalg24 left a comment

Please post a use example in the discussion

@DavidPoliakoff
Contributor Author

@dalg24, what do you mean by "Please post a use example in the discussion"? Any MDRangePolicy kernel without a specified tile size will exercise this.

Contributor

@jrmadsen jrmadsen left a comment

Do we have something like:

#if CXXSTD >= 17
# define KOKKOS_IF_CONSTEXPR if constexpr
#else
# define KOKKOS_IF_CONSTEXPR if
#endif

?

Because a few conditions such as if(should_tune(policy)) appear to be compile-time constants, and in that case we should use if constexpr when available.
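
For reference, a compilable variant of the macro idea could look like the sketch below. It keys on the standard `__cplusplus` value rather than `CXXSTD`, and the macro name `EXAMPLE_IF_CONSTEXPR` and the function `example_is_integral` are made up for illustration; nothing here is claimed to exist in Kokkos.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical macro: expands to `if constexpr` under C++17 or later and
// to a plain `if` otherwise. Some compilers report an old __cplusplus
// value unless configured otherwise, so this check is an assumption.
#if __cplusplus >= 201703L
#define EXAMPLE_IF_CONSTEXPR if constexpr
#else
#define EXAMPLE_IF_CONSTEXPR if
#endif

// The condition is a compile-time constant, so under C++17 the untaken
// branch is discarded. Both branches must still be valid code, because
// the plain-`if` fallback compiles both of them.
template <class T>
int example_is_integral() {
  EXAMPLE_IF_CONSTEXPR(std::is_integral<T>::value) {
    return 1;
  }
  else {
    return 0;
  }
}
```

Such a macro only helps when the condition really is a constant expression; a condition that reads runtime state would not compile under `if constexpr`.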

@dalg24 dalg24 dismissed their stale review February 2, 2021 13:38

Changes requested have been addressed

@dalg24
Member

dalg24 commented Feb 2, 2021

Please clean up the history and I will merge.

@DavidPoliakoff
Contributor Author

Hey @jrmadsen, those actually won't be compile-time, but it's a good thought if we can. "should_tune" just checks whether a given instance of a policy should be tuned, which depends on runtime parameters (in the TeamPolicy case, was either parameter declared "AUTO"? In the MDRange case, was the tile size unspecified?).

@dalg24, will do, thanks
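
The runtime nature of that check can be sketched in a few lines. Everything below is hypothetical: `ExampleMDPolicy` is a toy stand-in for an MDRange-like policy, a tile extent of 0 is assumed to mean "not specified", and `example_should_tune` / `example_tune_policy` only mimic the rough shape described in this thread, not the actual Kokkos implementation.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Toy policy: tile extents default to 0, which this sketch treats as
// "the user did not specify a tile size" (an assumption).
template <std::size_t Rank>
struct ExampleMDPolicy {
  std::array<int, Rank> tile{};  // value-initialized to all zeros
};

// Runtime check: tune only if no tile extent was set on this instance.
template <std::size_t Rank>
bool example_should_tune(const ExampleMDPolicy<Rank>& policy) {
  for (int extent : policy.tile) {
    if (extent != 0) return false;
  }
  return true;
}

// Generic "shape": ask the check, then let a tuner callback pick values.
// User-specified tiles are left untouched.
template <std::size_t Rank, class Tuner>
void example_tune_policy(ExampleMDPolicy<Rank>& policy, Tuner&& pick_tile) {
  if (example_should_tune(policy)) {
    policy.tile = pick_tile();
  }
}
```

Because the answer depends on how each policy object was constructed, two policies of the same type can disagree, which is why the check cannot be an `if constexpr` condition.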

@DavidPoliakoff
Contributor Author

@dalg24 cool, passed CI, this is ready to merge in my book

@dalg24 dalg24 merged commit de735ed into kokkos:develop Feb 2, 2021