Autotuning for TeamPolicy team sizes and vector lengths #3206

DavidPoliakoff · 2020-07-22T16:46:20Z

Apologies for what will be a large pull request. The end goal is to let us tune team sizes and vector lengths, as I mentioned in #3155. My ask of reviewers: look at this series of steps and tell me whether they're the right steps and whether I've done them correctly. So far I've only done Serial, CUDA, and OpenMP (HPX, Threads, HIP, and OMPTarget remain).

1. Reorganizing Tools functionality. It turns out that the best place to tune policies is in Kokkos_Parallel.hpp (and Parallel_Reduce.hpp), in our outermost parallel_x calls. I don't want to bloat these functions, so I've added begin/end functions for each pattern in Kokkos_Profiling.hpp that receive the arguments to that function (by non-const reference, so they can change the behavior of the call).
2. Reworking TeamPolicies. TeamPolicies need some functionality added to support this. First, there was actually no way to request an automatic vector length, so I added a set of constructor overloads. Second, tuning needs some privileged access to TeamPolicies, specifically the ability to alter a team size or vector length on an already-constructed policy.
3. Writing the actual thing that does the tuning (Kokkos_Tuners.hpp is holding this for the moment. Note: that code might be user-facing, so we need to really hammer it out).
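To make the first two steps concrete, here is a minimal, self-contained sketch of the pattern. It does not use Kokkos; `ToyTeamPolicy`, `toy_tools`, `toy_parallel_for`, the `-1` "automatic" marker, and the chosen values 32 and 4 are all hypothetical stand-ins picked for illustration. The point is only that the outermost call copies the policy and hands the copy to a begin hook by non-const reference, so the hook can fill in a team size or vector length the caller left automatic.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for a team policy; this is not the Kokkos class.
struct ToyTeamPolicy {
  int league_size;
  int team_size;      // -1 means "automatic, let the tuner choose"
  int vector_length;  // -1 means "automatic, let the tuner choose"
};

namespace toy_tools {

// Hypothetical begin hook. It takes the policy by non-const reference so it
// can replace automatic values before the kernel is launched.
template <class Policy, class Functor>
void begin_parallel_for(Policy& policy, const Functor& /*functor*/,
                        const std::string& /*label*/,
                        std::uint64_t& kernel_id) {
  kernel_id = 1;  // placeholder; a real tool would hand back its own id
  if (policy.team_size < 0) policy.team_size = 32;         // arbitrary choice
  if (policy.vector_length < 0) policy.vector_length = 4;  // arbitrary choice
}

// Hypothetical end hook; a tuner would record the measured runtime here.
template <class Policy, class Functor>
void end_parallel_for(Policy& /*policy*/, const Functor& /*functor*/,
                      const std::string& /*label*/,
                      std::uint64_t& /*kernel_id*/) {}

}  // namespace toy_tools

// Outermost dispatch: copy the policy, let the hook adjust the copy, run the
// functor once per league entry, then call the end hook.
template <class Functor>
ToyTeamPolicy toy_parallel_for(const std::string& label,
                               const ToyTeamPolicy& policy,
                               const Functor& functor) {
  ToyTeamPolicy inner_policy = policy;  // the caller's policy is untouched
  std::uint64_t kernel_id = 0;
  toy_tools::begin_parallel_for(inner_policy, functor, label, kernel_id);
  for (int league = 0; league < inner_policy.league_size; ++league) {
    functor(league);
  }
  toy_tools::end_parallel_for(inner_policy, functor, label, kernel_id);
  return inner_policy;  // returned only so the hook's effect is observable
}
```

Calling `toy_parallel_for("axpy", ToyTeamPolicy{4, -1, -1}, f)` in this sketch runs `f` four times and returns a policy whose automatic fields were filled in by the hook, while the policy the caller passed stays unchanged.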

There are a couple of details I'm still figuring out (such as how to make this all work when Tuning is turned off), but before I went ham on all of this I wanted to make sure this is an acceptable path forward.


jjwilke

My main objection is how we choose to do the tuning. I'm not convinced it's correct to call the configuration list a "ratio tuning variable".

The other concern is connecting different invocations of the "same" kernel. Getting metaphysical, how do we determine 'same'?


jjwilke · 2020-07-22T17:26:28Z

core/src/Kokkos_Parallel.hpp

-  }
+
+  ExecPolicy inner_policy = policy;
+  Kokkos::Tools::Experimental::begin_parallel_for(inner_policy, functor, str,


I haven't been following the profiling changes well enough. This seems like a major behavioral change that isn't intrinsic to the autotuning piece.

I'm not sure what the final plan for the PR is. Putting namespace 'Experimental' inside the most important Kokkos function seems problematic.

It's not intrinsic, but it is required (in my opinion). And if we're putting Tuning in the most important Kokkos function, and Tuning is Experimental, we're putting Experimental in that function in some sense. That's a question I'd like @crtrott to weigh in on, because if the answer is "this is a problem" I just need to close the PR.

This shouldn't be in the public namespace (actually, that is probably also true for beginParallelFor, come to think of it: what reason do we have to expose that call to users? [And yes, David, I know that is WELL before your time...]). Otherwise I generally don't have a problem with this use of Experimental here. But again, I don't see a reason to expose this new function to users, and would rather remove the other ones from the public namespace too.

@crtrott: Kokkos::Tools::Impl or Kokkos::Tools::Experimental::Impl? I'm in favor of the first


jjwilke · 2020-07-22T18:11:28Z

core/src/impl/Kokkos_Profiling.hpp

+        &kpID);
+  }
+#ifdef KOKKOS_ENABLE_TUNING
+  size_t context_id = Kokkos::Tools::Experimental::get_new_context_id();


Are we getting a new context ID every time we execute a parallel_for? Doesn't this mean that the "same" parallel will be a different context every time? How do we 'tune' if we can't connect parallel_for's to each other?

I might just need to redo #2422, since everybody seems to think that a context is some permanent thing. A context is just a way of saying when values go in and out of scope. You know to connect parallel_x's to each other if you are providing the same output types to the same set of input types.
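A small sketch may help separate the two ideas. Nothing below is the Kokkos Tools interface: `ToyTuner`, `ToyTuningRegistry`, and the candidate values are hypothetical, and the kernel label is used here as a stand-in for "the same set of inputs". The context id is fresh on every launch and only brackets a scope; the state that persists between launches lives in a map keyed by the inputs.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical per-kernel tuner state: it cycles through candidate team sizes.
struct ToyTuner {
  std::vector<int> candidates;
  std::size_t next;

  int suggest() {
    int value = candidates[next];
    next = (next + 1) % candidates.size();
    return value;
  }
};

class ToyTuningRegistry {
 public:
  // A context id only marks when values come into scope for one launch.
  // Every launch gets a new one.
  std::size_t begin_context() { return ++last_context_id_; }

  // Values declared under this context go out of scope here.
  void end_context(std::size_t /*context_id*/) {}

  // Launches are connected by their inputs (here: the label), not by the
  // context id. emplace creates the tuner on first use and finds it afterwards.
  int request_team_size(std::size_t /*context_id*/, const std::string& label) {
    auto it = tuners_.emplace(label, ToyTuner{{32, 64, 128}, 0}).first;
    return it->second.suggest();
  }

 private:
  std::size_t last_context_id_ = 0;
  std::map<std::string, ToyTuner> tuners_;
};
```

In this sketch, two launches labelled "axpy" receive different context ids but share one `ToyTuner`, so the second launch is offered the next candidate rather than starting over.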


DavidPoliakoff · 2020-07-22T19:47:51Z

Yikes. I used to just tune the team size, which is ratio data, and didn't change the category when I moved to tuning both. Good catch. I think the rest of your comments are good and I'll chase them, but thanks so much for catching that.
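For readers unfamiliar with the distinction, here is a hedged sketch of why the category matters. The types below (`ValueCategory`, `TeamConfig`, `TuningVariable`) are hypothetical and do not mirror the Kokkos definitions, and treating the combined variable as categorical is an assumption about the fix, since the thread does not say which category was chosen. A single team size is ratio data because 64 really is twice 32. An index into a list of (team size, vector length) pairs is only a label: index 4 is not "twice" index 2 in any useful sense.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical statistical categories a tool could be told about.
enum class ValueCategory { categorical, ordinal, interval, ratio };

struct TeamConfig {
  int team_size;
  int vector_length;
};

// Hypothetical description of a tuning variable handed to a tool.
struct TuningVariable {
  ValueCategory category;
  std::size_t num_candidates;
};

// Build every (team size, vector length) pair as one flat configuration list.
std::vector<TeamConfig> make_configs(const std::vector<int>& team_sizes,
                                     const std::vector<int>& vector_lengths) {
  std::vector<TeamConfig> configs;
  for (int t : team_sizes) {
    for (int v : vector_lengths) {
      configs.push_back({t, v});
    }
  }
  return configs;
}

// Tuning the team size alone: distances and ratios between values make sense.
TuningVariable describe_team_size(const std::vector<int>& team_sizes) {
  return {ValueCategory::ratio, team_sizes.size()};
}

// Tuning an index into the configuration list: the index is just a name for
// a pair, so it is described as categorical in this sketch.
TuningVariable describe_config_index(const std::vector<TeamConfig>& configs) {
  return {ValueCategory::categorical, configs.size()};
}
```

A search tool that believes a variable is ratio data may interpolate between candidates, which is meaningful for team sizes but not for positions in the flattened list.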

DavidPoliakoff · 2020-07-22T22:18:08Z

Getting errors in Jenkins:

/var/jenkins/workspace/Kokkos/core/src/Kokkos_ExecPolicy.hpp:603:5: error: use of undeclared identifier 'first_arg' [clang-diagnostic-error]
    first_arg = false;

I don't recognize this, anybody know what's up?


DavidPoliakoff · 2020-09-23T13:42:52Z

Current state of play:

1. Need a review from @crtrott on the Profiling and Tuners changes (or for him to say he's okay with other people's reviews).
2. @dalg24 (fairly) points out that the ValueHierarchyNode implementation is extremely convoluted. I don't have a less convoluted phrasing for all of this, but we need to see if anybody else has any good ideas. If not, we need to decide whether this is good enough.
3. We need to design tests, or defer that. Frankly I prefer deferring: we really need a tool (or toolkit) to test Tuning, and that's likely bigger than this PR.
4. Decide whether "vector_length" gets marked impl (edit: yes. vector_length is deprecated in existing backends; the impl_-prefixed one is preferred).
5. Fix how TeamPolicyInternal was implemented.
6. Need redesigned ValueHierarchyNode infrastructure (this is the outcome of 1&2).


DavidPoliakoff · 2020-10-05T15:46:43Z

Retest this please

DavidPoliakoff · 2020-10-05T16:53:09Z

Retest this please

DavidPoliakoff · 2020-10-06T14:31:36Z

Retest this please

crtrott · 2020-10-06T17:22:20Z

core/src/impl/Kokkos_Profiling.hpp

+size_t get_current_context_id();
+}  // namespace Experimental
+
+namespace Impl {


Uhm double Impl namespace?

crtrott

Do we have a follow up issue to change how the tuning stuff is enabled? I.e. add a runtime option which enables it and remove the compile time thing?

DavidPoliakoff · 2020-10-06T17:43:57Z

> Do we have a follow up issue to change how the tuning stuff is enabled? I.e. add a runtime option which enables it and remove the compile time thing?

#3454
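As a rough illustration of what the follow-up asks for, the sketch below replaces a compile-time guard such as `#ifdef KOKKOS_ENABLE_TUNING` with a flag that is set while parsing the command line. Everything here is hypothetical: the `toy_tools` namespace, the `--toy-tune-internals` flag name, and `choose_team_size` are made up for this example and say nothing about how #3454 was eventually resolved.

```cpp
#include <cassert>
#include <cstring>

namespace toy_tools {

// Process-wide switch, off by default. A function-local static avoids
// initialization-order problems.
inline bool& tuning_enabled() {
  static bool enabled = false;
  return enabled;
}

// Hypothetical argument parser: one flag turns tuning on at run time.
inline void parse_args(int argc, const char* const* argv) {
  for (int i = 1; i < argc; ++i) {
    if (std::strcmp(argv[i], "--toy-tune-internals") == 0) {
      tuning_enabled() = true;
    }
  }
}

// The same binary serves both modes: with tuning off, the caller's request
// passes through untouched.
inline int choose_team_size(int requested, int tuned) {
  return tuning_enabled() ? tuned : requested;
}

}  // namespace toy_tools
```

The trade-off in this sketch is one branch per launch in exchange for not needing a separate build to turn tuning on.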

DavidPoliakoff added 6 commits July 15, 2020 10:10

Moved Profiling implementation to its own file

d948808

Finished up Serial

b14e1b7

Fixes to an extra overload

530b835

Updated test

9ff95d5

Intermediate commit. This has an implementation for CUDA, Serial, and OpenMP. To be done: OMPTarget, HIP, and Threads

4d9a3e5

Fixed a few more callsites

ed984f5

DavidPoliakoff requested a review from crtrott July 22, 2020 16:46

jeffmiles63 reviewed Jul 22, 2020

core/src/Cuda/Kokkos_Cuda_Parallel.hpp (outdated, resolved)

jeffmiles63 reviewed Jul 22, 2020

core/src/Cuda/Kokkos_Cuda_Parallel.hpp (outdated, resolved)

DavidPoliakoff added 3 commits July 22, 2020 10:35

Made the code nicer to read

dc51c03

Fixed broken name of type

2e50d63

Unused parameters

8d87867

jjwilke previously requested changes Jul 22, 2020


DavidPoliakoff added 2 commits July 22, 2020 15:04

Fixed easily fixable comments

5bca3e1

Merge branch 'develop' into feature/team-tuning

891c599

Fixed unused parameter

b7de586

masterleinad reviewed Jul 22, 2020

core/src/Kokkos_ExecPolicy.hpp (outdated, resolved)

DavidPoliakoff added 4 commits July 22, 2020 15:52

Finished conceptual merge commit

9d7f73c

Merge branch 'develop' into feature/team-tuning

0eb56e5

Conflicts: core/src/Cuda/Kokkos_Cuda_Parallel.hpp core/src/OpenMP/Kokkos_OpenMP_Team.hpp

Fixed up the unused parameter errors/warnings

7d03fa5

Build fixes

084fb77

masterleinad reviewed Jul 23, 2020

core/src/OpenMP/Kokkos_OpenMP_Team.hpp (resolved)

DavidPoliakoff added 6 commits July 23, 2020 13:42

Good catch, bad constructor

250653b

Merge branch 'develop' into feature/team-tuning

14d5aba

Modified test to cover new constructors, pushing for CI

2aeaca9

Unused param

4c44d7c

DO NOT MERGE. Removing warnings for OMPTarget build to find my actual error

c72b266

DO NOT MERGE (turn off Tuning on OMPTarget builds)

c2a9389

DavidPoliakoff added 4 commits September 22, 2020 14:20

Explicitly making int from unsigned guy to avoid signed-unsigned comparison.

e62fd8e

Used @masterleinad's fun construction for "make a thing in the map, or find it if it already exists"

df0dd4d

Updated iterator pattern

3991f9e

Fix to the calculator type in reducer (copy-paste error)

2673fb3

DavidPoliakoff mentioned this pull request Sep 23, 2020

Use constructor delegation in TeamPolicyInternal implementations #3412

Closed

8 tasks

DavidPoliakoff added 5 commits September 23, 2020 09:08

Fixed Serial TeamPolicyInternal implementation

9870569

Unused parameter warning

481284f

Fixed Threads TeamPolicyInternal implementation

33c6960

Fixes to the space

e720ea3

Fixup for Threads backend

56639b8

crtrott reviewed Sep 28, 2020

core/src/Threads/Kokkos_Threads_Parallel.hpp (outdated, resolved)

As per @crtrott's comment, fixed up the recommended team size

f52d1f0

DavidPoliakoff mentioned this pull request Sep 29, 2020

defaultdevicetype.reduce_instantiation_b* test failures on SKX with gcc/7.2 and Pthreads backend, hwloc/1.11.8 #3433

Closed

DavidPoliakoff force-pushed the feature/team-tuning branch from 34ec427 to f52d1f0 Compare October 1, 2020 17:19

DavidPoliakoff added 4 commits October 1, 2020 10:24

10 isn't a valid chunk size for all backends (Threads), changing to 16

661523b

vector_length -> impl_vector_length

b3d8be7

Moved namespaces

5c5b4c2

Fixed bad deprecation

fcf02a2

crtrott reviewed Oct 6, 2020


crtrott approved these changes Oct 6, 2020


crtrott merged commit 153d255 into kokkos:develop Oct 6, 2020

DavidPoliakoff mentioned this pull request Oct 6, 2020

Changed namespacing of Kokkos::Tools::Impl::Impl::tune_policy #3455

Merged

This was referenced Oct 7, 2020

Gcc/Intel+Pthreads with Hwloc Numerical Exception Thrown in unit tests (SKX arch): KokkosCore_UnitTest_Threads, KokkosCore_UnitTest_Default, KokkosCore_PerfTestExec #3460

Closed

HPX/1.3 backend compilation error #3463

Closed

DavidPoliakoff mentioned this pull request Dec 23, 2020

Autotuning team size/vector length #3155

Closed

