Implement new blocksize deduction method for HIP Backend #3953

skyreflectedinmirrors · 2021-04-21T00:20:49Z

The general theory is to switch between a blocksize of 1024 and 256 depending on resource usage:
For kernels where resource use is light (e.g., STREAM-like kernels) -> 1024
For kernels with heavy resource use, or register spills -> 256
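To make the rule above concrete, here is a minimal sketch of such a two-way switch. The `KernelResources` struct, the `choose_block_size` function, and both threshold constants are hypothetical and chosen only for illustration; they are not the names or values used in this PR.

```cpp
#include <cassert>

// Hypothetical summary of a compiled kernel's resource footprint.
struct KernelResources {
  int vgprs_per_thread;     // vector registers used by each thread
  int lds_bytes_per_block;  // shared (LDS) memory used by each block
  int spill_bytes;          // scratch memory used for register spills
};

// Illustrative thresholds only (assumptions, not taken from the PR).
constexpr int kLightVgprLimit = 64;
constexpr int kLightLdsLimit  = 16 * 1024;

// Return 1024 for light kernels, 256 for heavy or spilling kernels.
inline int choose_block_size(const KernelResources& r) {
  assert(r.vgprs_per_thread >= 0 && r.lds_bytes_per_block >= 0 &&
         r.spill_bytes >= 0);
  const bool light = r.spill_bytes == 0 &&
                     r.vgprs_per_thread <= kLightVgprLimit &&
                     r.lds_bytes_per_block <= kLightLdsLimit;
  return light ? 1024 : 256;
}
```

Under these assumed thresholds, a STREAM-like kernel with few registers and no spills would get 1024, while any kernel that spills would fall back to 256.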

This also fixes our long-standing issues with incorrect resource-use calculations, which resulted from passing the FunctorTypes into the blocksize deduction instead of the Driver/ClosureTypes. It also unifies the blocksize deductions that were scattered all over the place (ParallelRange, Teams, MDRange, etc.) into a common set of methods so that we can modify them more easily.

As some sample data-points, on an MI-100 this results in:

an approximately 50% speedup in the GramSchmidt tester
an approximately 30% performance uplift for ReaxFF / Tersoff in upstream LAMMPS / Kokkos, while EAM/LJ is neutral (within +/- 1%). Note: I'm using llvm from f9a8c6a0e50540f68e6740a849a7caf5e4d46ca6, as it fixes issues present in the current mainline ROCm compilers.

Some additional test cases we might want to look at:

VTK-m had a 0x100f crash that I anticipate this will fix [still testing]
Maybe some Kokkos Kernels results, to make sure I didn't throw a wrench into things over there (cc: @lucbv)

If we have some more test cases to run, I'd be happy to take a look at them as it would be good to tune the heuristic as broadly as possible :)

Rombur · 2021-04-21T13:37:49Z

core/src/HIP/Kokkos_HIP_BlockSize_Deduction.hpp

+    }
+    block_size >>= 1;
+  } while (block_size >= HIPTraits::WarpSize);
+  return 0;


We should use KOKKOS_ASSERT so we don't propagate a wrong block size.

So, we could put an assert in here -- but the block_size=0 case is handled differently by the various layers, e.g. ParallelReduce would error out here:

https://github.com/kokkos/kokkos/blob/develop/core/src/HIP/Kokkos_HIP_Parallel_Team.hpp#L961

But, I think we could figure out a way to make this more uniform between the backends.

Let's discuss what we want to do on this and #3944 on Monday

See 2145bcd for a potential implementation
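As an illustration of the point discussed in this thread, the sketch below pairs a halving search that returns 0 on failure with a single checked wrapper, so a zero block size cannot silently reach a launch. It is not the code from 2145bcd: `kWarpSize`, `deduce_block_size`, `checked_block_size`, and the `fits` predicate are hypothetical, and `std::runtime_error` merely stands in for whatever error mechanism the backend actually uses.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>

constexpr int kWarpSize = 64;  // assumed wavefront size for this sketch

// Halve the candidate until `fits` accepts it or it drops below one warp.
// Returns 0 when no candidate is acceptable.
inline int deduce_block_size(int max_block_size,
                             const std::function<bool(int)>& fits) {
  for (int block_size = max_block_size; block_size >= kWarpSize;
       block_size >>= 1) {
    if (fits(block_size)) return block_size;
  }
  return 0;
}

// Single place where the "no valid block size" case becomes an error.
inline int checked_block_size(int max_block_size,
                              const std::function<bool(int)>& fits) {
  const int block_size = deduce_block_size(max_block_size, fits);
  if (block_size == 0) {
    throw std::runtime_error("no valid block size for this kernel");
  }
  return block_size;
}
```

Routing every caller through one checked wrapper is one possible way to get uniform behavior across the different parallel constructs; whether an assertion or an exception is preferable is a separate design decision.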

core/src/HIP/Kokkos_HIP_BlockSize_Deduction.hpp

core/src/HIP/Kokkos_HIP_Parallel_MDRange.hpp

core/src/HIP/Kokkos_HIP_Parallel_Team.hpp

core/src/Kokkos_Macros.hpp

skyreflectedinmirrors · 2021-04-26T16:09:00Z

Added a change to use a scoped enum, as per our discussion this morning. Edit: resubmitted, as I had missed a few bare bools in the Teams bits.
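For readers unfamiliar with the pattern, a scoped enum replaces a bare `bool` parameter so that call sites state their intent explicitly. The sketch below is only an illustration: the enumerator names and the `pick_block_size` helper are assumptions, not necessarily what the PR uses.

```cpp
#include <cassert>

// Hypothetical scoped enum naming the two kinds of block size a caller
// may ask for, instead of passing `true` / `false`.
enum class BlockType { Max, Preferred };

// Hypothetical helper: select one of two precomputed sizes by request type.
inline int pick_block_size(BlockType type, int max_size, int preferred_size) {
  return type == BlockType::Max ? max_size : preferred_size;
}
```

A call such as `pick_block_size(BlockType::Preferred, 1024, 256)` reads unambiguously, whereas the equivalent call with a bare `false` would not, and an `int` or `bool` cannot be passed by accident because scoped enums do not convert implicitly.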

masterleinad · 2021-04-26T16:16:59Z

Retest this please.

The general theory is to switch between a blocksize of 1024 and 256
For places where resource use is light (e.g., STREAM-y) kernels -> 1024
For places with lots of resource use, or spills -> 256

This also fixes our long-standing bugs due to resource use miscounting from the FunctorTypes vs Driver/ClosureTypes in the blocksize deduction.

Change-Id: I2730e254a478d4b936d2f80b4d0e5c96614d1142

Change-Id: Ic0221337bd6c43132eb832da52fd3f239be844ce

Rombur · 2021-05-10T13:07:44Z

core/src/HIP/Kokkos_HIP_Parallel_Team.hpp

+  int internal_team_size_common(const FunctorType& f) const {
+    // FIXME_HIP: this could be unified with the
+    // internal_team_size_common_reduce
+    //            once we can turn c++17 constexpr on by default.


The indentation is weird here

skyreflectedinmirrors changed the title ~~[WIP] Implement new blocksize deduction method for HIP Backend~~ Implement new blocksize deduction method for HIP Backend Apr 21, 2021

Rombur reviewed Apr 21, 2021


masterleinad reviewed Apr 21, 2021


core/src/Kokkos_Macros.hpp (outdated, resolved)

skyreflectedinmirrors force-pushed the hip_new_blocksize_deduction branch 4 times, most recently from 51ad919 to e09090a on April 26, 2021 16:08

skyreflectedinmirrors force-pushed the hip_new_blocksize_deduction branch from e09090a to fcccfd2 on April 26, 2021 16:29

Add proper error checking for invalid launch configs

2145bcd

Change-Id: Ic0221337bd6c43132eb832da52fd3f239be844ce

Rombur approved these changes May 10, 2021


dalg24 merged commit 38949e2 into kokkos:develop May 10, 2021

Rombur mentioned this pull request Jul 15, 2021

[HIP] Add multiple LaunchMechanism #3820

Merged

skyreflectedinmirrors mentioned this pull request Sep 24, 2021

hip.team_policy_max_recommended test failing nightly tests in Hip_Serial build, rocm/4.3.0, MI50 and MI100 #4338

Closed

masterleinad mentioned this pull request Oct 5, 2021

HIP fix compiling with Tuning #4380

Merged

masterleinad mentioned this pull request Jan 15, 2022

Stop hardcoding max team size in HIP backend #3410

Closed
