Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Autotuning for TeamPolicy team sizes and vector lengths #3206

Merged
merged 83 commits into from
Oct 6, 2020

Conversation

DavidPoliakoff
Copy link
Contributor

Apologies for what will be a large pull request. The end goal of all of this is to make it so that we can tune the team sizes and vector lengths as I mentioned in #3155. My ask of reviewers is that you look at this series of steps, and see if they're the right steps, and also that I've done them correctly. So far I've only done Serial, CUDA, and OpenMP (HPX, Threads, HIP, and OMPTarget remain).

  1. Reorganizing Tools functionality. It turns out that the best place to tune policies is in Kokkos_Parallel.hpp (and Parallel_Reduce.hpp), in our outermost parallel_x calls. I don't want to bloat this function, so what I've done is add begin/end functions for each pattern in Kokkos_Profiling.hpp that receive the arguments to that function (by non-const reference, they can change the behavior of the call)
  2. Reworking TeamPolicies. TeamPolicies need a set of functionality added to support this. First, there was actually no way to request an automatic vector length, so I added a set of constructor overloads. Also, tuning needed some privileged access to TeamPolicies, specifically the ability to alter a team size or vector length on a constructed policy.
  3. Writing the actual thing that does Tuning (Kokkos_Tuners.hpp is holding this for the moment. Note: that code might be user-facing, so we need to really hammer it out)

There are a couple of details I'm still figuring out (how to make this all work when Tuning is turned off), but before I went ham on all of this I wanted to make sure this was an acceptable path forward on this

Copy link

@jjwilke jjwilke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My main objection is how we choose to do the tuning. I'm not convinced it's correct to call the configuration list a "ratio tuning variable"?

The other concern is connecting different invocations of the "same" kernel. Getting metaphysical, how do we determine 'same'?

core/src/Cuda/Kokkos_Cuda_Parallel.hpp Outdated Show resolved Hide resolved
core/src/Cuda/Kokkos_Cuda_Parallel.hpp Outdated Show resolved Hide resolved
core/src/Kokkos_ExecPolicy.hpp Show resolved Hide resolved
}

ExecPolicy inner_policy = policy;
Kokkos::Tools::Experimental::begin_parallel_for(inner_policy, functor, str,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't been following the profiling changes well enough. This seems like a major behavioral change that isn't intrinsic to the autotuning piece.

I'm not sure what the final plan for the PR is. Putting namespace 'Experimental' inside the most important Kokkos function seems problematic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not intrinsic, but it is required (in my opinion). And if we're putting Tuning in the most important Kokkos function, and Tuning is Experimental, we're putting Experimental in that function in some sense. That's a question I'd like @crtrott to answer on, because if the answer is "this is a problem" I just need to close the PR

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shouldn't be in the public namespace (actually that is probably also true for beginParallelFor come to speak of it - what reason do we have to expose that call to users ?? [and yes I know David that is WELL before your time ...]). Generally I don't have a problem with this use of experimental here otherwise. But again I don't see a reason to expose this new function to users - and rather want to remove the other ones from the public namespace too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@crtrott: Kokkos::Tools::Impl or Kokkos::Tools::Experimental::Impl? I'm in favor of the first

core/src/Kokkos_Parallel.hpp Show resolved Hide resolved
core/src/Kokkos_Tuners.hpp Outdated Show resolved Hide resolved
core/src/Kokkos_Tuners.hpp Outdated Show resolved Hide resolved
core/src/Kokkos_Tuners.hpp Outdated Show resolved Hide resolved
&kpID);
}
#ifdef KOKKOS_ENABLE_TUNING
size_t context_id = Kokkos::Tools::Experimental::get_new_context_id();
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we getting a new context ID every time we execute a parallel_for? Doesn't this mean that the "same" parallel will be a different context every time? How do we 'tune' if we can't connect parallel_for's to each other?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might just need to redo #2422, since everybody seems to think that a context is some permanent thing. A context is a just a way of saying when values go in and out of scope. You know to connect parallel_x's to each other if you are providing the same output types to the same set of input types

core/src/impl/Kokkos_Profiling.hpp Show resolved Hide resolved
@DavidPoliakoff
Copy link
Contributor Author

Yikes. I used to just tune the team size, which is ratio data, and didn't change when I moved to tuning both. Good catch. I think the rest of your comments are good and I'll chase them, but thanks so much for catching that

@DavidPoliakoff
Copy link
Contributor Author

Getting errors in Jenkins:

/var/jenkins/workspace/Kokkos/core/src/Kokkos_ExecPolicy.hpp:603:5: error: use of undeclared identifier 'first_arg' [clang-diagnostic-error]
    first_arg = false;

I don't recognize this, anybody know what's up?

@DavidPoliakoff
Copy link
Contributor Author

DavidPoliakoff commented Sep 23, 2020

Current state of play:

  • Need a review from @crtrott on the Profiling and Tuners changes (or for him to say he's okay with other people's reviews)
  • @dalg24 (fairly) points out that the ValueHierarchyNode implementation is extremely convoluted. I don't have a less convoluted phrasing for all of this, but we need to see if anybody else has any good ideas. If not, we need to decide whether this is good enough.
  • We need to design tests, or defer that. Frankly I prefer deferring, we really need a tool (or toolkit) to test Tuning, that's likely bigger than this PR
  • Decisions on whether "vector_length" gets marked impl (edit: yes. vector_length is deprecated in existing backends, the impl_ prefixed one is preferred)
  • Fix how TeamPolicyInternal was implemented
  • Need redesigned ValueHierarchyNode infrastructure (this is the outcome of 1&2)

@DavidPoliakoff
Copy link
Contributor Author

Retest this please

2 similar comments
@DavidPoliakoff
Copy link
Contributor Author

Retest this please

@DavidPoliakoff
Copy link
Contributor Author

Retest this please

size_t get_current_context_id();
} // namespace Experimental

namespace Impl {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uhm double Impl namespace?

Copy link
Member

@crtrott crtrott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a follow up issue to change how the tuning stuff is enabled? I.e. add a runtime option which enables it and remove the compile time thing?

@crtrott crtrott merged commit 153d255 into kokkos:develop Oct 6, 2020
@DavidPoliakoff
Copy link
Contributor Author

Do we have a follow up issue to change how the tuning stuff is enabled? I.e. add a runtime option which enables it and remove the compile time thing?

#3454

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants