OpenMPTarget: Update hierarchical parallelism. #6043

rgayatri23 · 2023-04-10T23:07:18Z

The PR does the following:

Update concurrency based on underlying architecture
Update max_active_teams based on the updated concurrency
Introduce a different style of hierarchical parallelism for Intel architectures

Todo:

Update concurrency specifically for AMD GPUs.

masterleinad · 2023-04-11T14:36:06Z

core/src/OpenMPTarget/Kokkos_OpenMPTarget_Instance.cpp

+  // Multiply the number of processors with teh SIMD length.
+  max_threads *= 64;


Suggested change

-  // Multiply the number of processors with teh SIMD length.
-  max_threads *= 64;
+  // Multiply the number of processors by the SIMD length.
+  max_threads *= 64;

Where do you get 64 from?

I assumed the SIMD length to be 64 to create a maximum number of possible "threads" on Intel architectures.
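
The concurrency estimate discussed here is a product of two numbers. A minimal sketch, assuming the processor count has already been queried on the device (the patch does this with omp_get_num_procs() inside a target region); estimate_max_threads and its parameters are hypothetical names, not Kokkos API, and the SIMD length is an assumed value rather than a queried one:

```cpp
#include <cassert>

// Hypothetical helper, not part of Kokkos: estimates how many "threads" a
// device can keep in flight as (number of processors) x (assumed SIMD length).
inline int estimate_max_threads(int num_procs, int simd_length) {
  assert(num_procs > 0 && simd_length > 0);
  return num_procs * simd_length;
}
```

For example, estimate_max_threads(512, 64) yields 32768. A later revision in this PR multiplies by 32 instead of 64, which halves the estimate for the same processor count.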

masterleinad · 2023-04-11T14:43:39Z

core/src/OpenMPTarget/Kokkos_OpenMPTarget_Instance.cpp

+#endif
+#elif defined(KOKKOS_ARCH_INTEL_GPU)
+#pragma omp target map(max_threads)
+  { max_threads = omp_get_num_procs(); }


Does this give you the number of EUs? If so, shouldn't the value be a little higher, since multiple workgroups could be scheduled by an EU? See https://github.com/intel/llvm/blob/756ba2616111235bba073e481b7f1c8004b34ee6/sycl/source/detail/reduction.cpp#L51-L62.

As per the OpenMP spec, this should give us the number of procs on the device, and each proc can then execute a SIMD instruction.

It gives you the total number of hardware threads.

Is that OK to invoke every time? Should we be caching the value after the first call?

masterleinad · 2023-04-11T14:47:00Z

core/src/OpenMPTarget/Kokkos_OpenMPTarget_ParallelFor_Team.hpp

+#if !defined(KOKKOS_IMPL_HIERARCHICAL_INTEL_GPU)
+#pragma omp target teams thread_limit(team_size) firstprivate(a_functor) \
+    num_teams(max_active_teams) is_device_ptr(scratch_ptr)


This looks similar to #6035 and thus I would expect #6035 (comment) to also apply. Why is restricting the number of workgroups/teams here, in general, a good idea?

No, it's not a good idea to restrict the number of teams, but unfortunately the OpenMPTarget backend needs tight control over the number of teams generated, because we have data structures that depend on the maximum number of in-flight teams.
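
To illustrate why a fixed cap on in-flight teams matters, here is a minimal sketch of a scratch pool with one slot per active team. TeamScratchPool and league_rank_for are hypothetical names, and the strided mapping of league ranks onto teams is an assumption made for illustration, not a description of the actual backend code:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: one scratch slot per in-flight team. If more teams run
// than there are slots, two teams would share a slot and race on it.
struct TeamScratchPool {
  int num_slots;
  std::size_t bytes_per_team;
  std::vector<char> storage;

  TeamScratchPool(int max_active_teams, std::size_t bytes)
      : num_slots(max_active_teams),
        bytes_per_team(bytes),
        storage(static_cast<std::size_t>(max_active_teams) * bytes) {}

  // Returns the start of the slot owned by the given team.
  char* slot(int team_id) {
    assert(team_id >= 0 && team_id < num_slots);
    return storage.data() + static_cast<std::size_t>(team_id) * bytes_per_team;
  }
};

// Assumed mapping: team `team_id` processes league ranks team_id,
// team_id + nteams, team_id + 2 * nteams, ... and reuses its slot each time.
inline int league_rank_for(int team_id, int nteams, int iteration) {
  return team_id + iteration * nteams;
}
```

With 4 active teams and 128 bytes per team, the pool holds 512 bytes regardless of the league size, which is why the number of teams actually launched must never exceed the number of slots.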

dalg24 · 2023-04-13T12:31:27Z

core/src/OpenMPTarget/Kokkos_OpenMPTarget_Instance.cpp

+#endif
+#elif defined(KOKKOS_ARCH_INTEL_GPU)
+#pragma omp target map(max_threads)
+  { max_threads = omp_get_num_procs(); }


Is that OK to invoke every time? Should we be caching the value after the first call?
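
One way to address the caching question is to memoize the query result so the device is asked only once. A minimal sketch, assuming the query can be wrapped in a callable; CachedDeviceQuery is a hypothetical name, not part of Kokkos, and the callable stands in for the omp_get_num_procs() target region:

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical wrapper: runs a potentially expensive query on first use and
// returns the stored result on every later call.
class CachedDeviceQuery {
 public:
  explicit CachedDeviceQuery(std::function<int()> query)
      : m_query(std::move(query)) {
    assert(static_cast<bool>(m_query));
  }

  int get() {
    if (!m_valid) {
      m_value = m_query();  // only reached on the first call
      m_valid = true;
    }
    return m_value;
  }

 private:
  std::function<int()> m_query;
  int m_value = 0;
  bool m_valid = false;
};
```

Calling get() twice on the same object invokes the wrapped query once. This sketch is not thread-safe; concurrent callers would need a std::once_flag or similar.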

dalg24 · 2023-04-13T12:33:57Z

core/src/OpenMPTarget/Kokkos_OpenMPTarget_Instance.cpp

+
+  // Multiply the number of processors with the SIMD length.
+  max_threads *= 32;
+#endif


What about AMD? It is OK to fix later, but maybe you still need the FIXME.

The omp_get_num_procs call is only invoked once per instance, and I think that should be OK.
AMD currently uses the default of 2048*80 threads, which is fine for now. I will add a FIXME there to fix it in the future. I don't have a way to get the right number for AMD GPUs right now.

dalg24 · 2023-04-13T21:03:15Z

core/src/OpenMPTarget/Kokkos_OpenMPTarget_ParallelFor_Team.hpp

+// Intel architectures prefer the classical hierarchical parallelism that relies
+// on OpenMP.
+#if defined(KOKKOS_ARCH_INTEL_GPU)
+#define KOKKOS_IMPL_HIERARCHICAL_INTEL_GPU
+#endif


Would it make sense to define that somewhere more "central" and include it in the printed configuration?

We can put that macro in a more central place; I will do that. It will be better, since we need it in more than one file.
But I don't think we need it in the configuration printout, since the user is not concerned (IMO) with how we implement hierarchical parallelism.

dalg24 · 2023-04-26T02:09:29Z

core/src/OpenMPTarget/Kokkos_OpenMPTarget_Exec.cpp

+  // max_active_teams is the number of active teams on the given hardware.
+  // We set the number of teams to be twice the number of max_active_teams for
+  // the compiler to pick the right number in its case.


I do not understand this comment. Why are we setting the number to twice the max instead of just once?
Also what do you mean "in its case"?

The ideal case would be to set a large enough upper bound on the number of teams using omp_set_num_teams and let the compiler pick the right number of teams for a given target region. We do that in resize_scratch, where we set the upper bound to 2*max_active_teams. However, that call is not respected, hence the need to add num_teams.

The idea is not to hamper the compiler's ability to choose the appropriate number of teams (hence a large upper bound), but also to keep control over that number so we can allocate data for each team.
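
The upper bound described above can be sketched as a small helper. This is an illustration under assumptions: teams_upper_bound is a hypothetical name, and combining the league size with twice max_active_teams via a minimum is inferred from the discussion in this thread, not copied from resize_scratch:

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper: caps the number of teams at twice the number of teams
// the hardware can keep active, and never asks for more teams than the league
// has work items.
inline int teams_upper_bound(int league_size, int max_active_teams) {
  assert(league_size > 0 && max_active_teams > 0);
  return std::min(league_size, 2 * max_active_teams);
}
```

A small league of 10 on hardware with 100 active teams yields 10, while a league of 1000 yields 200, so per-team data can be sized for at most 200 teams in that case.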

dalg24

I am pretty sure 2ac6a07 accidentally disabled all the code paths guarded by #ifdef KOKKOS_IMPL_HIERARCHICAL_INTEL_GPU

dalg24 · 2023-04-26T12:02:16Z

core/src/OpenMPTarget/Kokkos_OpenMPTarget_ParallelFor_Team.hpp

+      if (omp_get_num_teams() > max_active_teams)
+        Kokkos::abort("`omp_set_num_teams` call was not respected.\n");


What is the point of that check? Can this ever fail? That would indicate a bug in the OpenMP implementation, wouldn't it?

Yes, that's the intention: if there is a bug in the OpenMP implementation, don't run, because in that case it might lead to race conditions.
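
The same fail-fast idea can be shown outside of a target region. A minimal host-side sketch, with the caveat that it throws a std::runtime_error where the patch calls Kokkos::abort, and that require_team_limit_respected is a hypothetical name:

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical guard: refuses to continue if the runtime launched more teams
// than the per-team data structures were sized for.
inline void require_team_limit_respected(int observed_teams,
                                         int max_active_teams) {
  assert(max_active_teams > 0);
  if (observed_teams > max_active_teams) {
    throw std::runtime_error("team limit was not respected by the runtime");
  }
}
```

Stopping here is cheaper than debugging the silent data race that would follow if two teams ended up sharing one slot.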

rgayatri23 · 2023-04-26T15:23:24Z

I am pretty sure 2ac6a07 accidentally disabled all the code paths guarded by #ifdef KOKKOS_IMPL_HIERARCHICAL_INTEL_GPU

I don't understand; the macro is now in core/src/OpenMPTarget/Kokkos_OpenMPTarget_Parallel.hpp.
Am I missing something?

dalg24 · 2023-04-26T15:34:25Z

I am pretty sure 2ac6a07 accidentally disabled all the code paths guarded by #ifdef KOKKOS_IMPL_HIERARCHICAL_INTEL_GPU

I don't understand; the macro is now in core/src/OpenMPTarget/Kokkos_OpenMPTarget_Parallel.hpp. Am I missing something?

It is #undef'd at the end of the header and is never visible from the parallel construct implementations.

rgayatri23 · 2023-04-26T17:27:57Z

I am pretty sure 2ac6a07 accidentally disabled all the code paths guarded by #ifdef KOKKOS_IMPL_HIERARCHICAL_INTEL_GPU

I don't understand; the macro is now in core/src/OpenMPTarget/Kokkos_OpenMPTarget_Parallel.hpp. Am I missing something?

It is #undef'd at the end of the header and is never visible from the parallel construct implementations.

OK, I am now undefining it in the files that include the file that defines the macro.

dalg24 · 2023-04-26T17:39:11Z

OK, I am now undefining it in the files that include the file that defines the macro.

That means only one of the files will have the new code path enabled, whichever gets included first.
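
The include-order problem can be reproduced in a single file. A minimal sketch, assuming the shared header has an include guard so its #define runs only once per translation unit; DEMO_IMPL_HIERARCHICAL and both function names are hypothetical stand-ins, not the Kokkos macro or headers:

```cpp
#include <cassert>

// Stand-in for the shared header: thanks to its include guard, this #define
// is processed only the first time the header is included.
#define DEMO_IMPL_HIERARCHICAL

// Stand-in for the first implementation header.
inline bool first_header_uses_new_path() {
#ifdef DEMO_IMPL_HIERARCHICAL
  return true;
#else
  return false;
#endif
}
#undef DEMO_IMPL_HIERARCHICAL  // the first header cleans up after itself

// Stand-in for the second implementation header. Including the shared header
// again is a no-op, so the macro stays undefined here.
inline bool second_header_uses_new_path() {
#ifdef DEMO_IMPL_HIERARCHICAL
  return true;
#else
  return false;
#endif
}

inline void demo_include_order() {
  assert(first_header_uses_new_path());
  assert(!second_header_uses_new_path());
}
```

Only the code that is preprocessed before the first #undef sees the macro, which matches the observation that whichever file gets included first is the only one with the new code path.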

rgayatri23 · 2023-04-26T17:54:33Z

OK, I am now undefining it in the files that include the file that defines the macro.

That means only one of the files will have the new code path enabled, whichever gets included first.

Oh right, that's true. I need to undef it in a common place, only once.

rgayatri23 · 2023-04-26T21:11:17Z

OK, I am now undefining it in the files that include the file that defines the macro.

That means only one of the files will have the new code path enabled, whichever gets included first.

Oh right, that's true. I need to undef it in a common place, only once.

Removed the undef and added the macro in print_configuration.

masterleinad

Looks OK to me (apart from the typo you might want to fix).

crtrott

Couple small things

crtrott · 2023-05-02T17:03:38Z

core/src/OpenMPTarget/Kokkos_OpenMPTarget_ParallelFor_Team.hpp

-    // nteams should not exceed the maximum in-flight teams possible.
-    const auto nteams =
-        league_size < max_active_teams ? league_size : max_active_teams;
+    int max_active_teams = omp_get_max_teams();


Why not min(nteams, omp_get_max_teams())?

That min happens in resize_scratch, and the resulting value is set via omp_set_num_teams.
The value is then read back here using omp_get_max_teams, rather than through another variable passed between routines.

dalg24 · 2023-05-03T20:33:19Z

Ignoring HIP build that timed out.

* OpenMPTarget: Update hierarchical parallelism.
* OpenMPTarget: Update initialize routine.
* OpenMPTarget: Remove num_teams for Intel GPUs.
* OpenMPTarget: fix comment.
* OpenMPTarget: Oversubscribe number of teams.
* OpenMPTarget: Move KOKKOS_IMPL_HIERARCHICAL_INTEL_GPU macro to a central location.
* OpenMPTarget: Add num_teams clause for Intel GPUs too.
* OpenMPTarget: Moving the undef for Intel GPUs into files that include the macro.
* OpenMPTarget: Updated macro name and added to print_configuration.
* OpenMPTarget: Adding impl to macro.
* OpenMPTarget: Fix typo for Intel GPUs.
* OpenMPTarget: Fix print_configuration.
* OpenMPTarget: Rename variable names.
* OpenMPTarget: clang format.
---------
Co-authored-by: Rahulkumar Gayatri <rgayatri@lbl.gov>

OpenMPTarget: Update hierarchical parallelism.

73175e9

masterleinad reviewed Apr 11, 2023

dalg24 reviewed Apr 11, 2023

Rahulkumar Gayatri added 3 commits April 11, 2023 09:46

OpenMPTarget: Update initialize routine.

9c51bb7

OpenMPTarget: Remove num_teams for Intel GPUs.

b1e61ea

OpenMPTarget: fix comment.

a4c66f0

rgayatri23 force-pushed the OpenMPTarget_update_hierarchical branch from d3fb726 to a4c66f0 Compare April 12, 2023 06:14

OpenMPTarget: Oversubscribe number of teams.

a039ed5

dalg24 reviewed Apr 13, 2023

Rahulkumar Gayatri added 2 commits April 13, 2023 16:09

OpenMPTarget: Move KOKKOS_IMPL_HIERARCHICAL_INTEL_GPU macro to a central location.

2ac6a07

OpenMPTarget: Add num_teams clause for Intel GPUs too.

c9c698c

dalg24 reviewed Apr 26, 2023

masterleinad reviewed Apr 26, 2023

OpenMPTarget: Moving the undef for Intel GPUs into files that include the macro.

30d56dc

OpenMPTarget: Updated macro name and added to print_configuration.

f7fdafb

masterleinad reviewed Apr 26, 2023

Rahulkumar Gayatri added 2 commits April 26, 2023 15:36

OpenMPTarget: Adding impl to macro.

f818e01

OpenMPTarget: Fix typo for Intel GPUs.

fc9c6eb

masterleinad approved these changes Apr 28, 2023

OpenMPTarget: Fix print_configuration.

1e5725d

crtrott reviewed May 2, 2023

crtrott reviewed May 2, 2023

crtrott requested changes May 2, 2023

Rahulkumar Gayatri added 2 commits May 2, 2023 10:41

OpenMPTarget: Rename variable names.

0ce76a9

OpenMPTarget: clang format.

14c4045

rgayatri23 added this to the Release 4.1 milestone May 3, 2023

crtrott approved these changes May 3, 2023

dalg24 merged commit 4b6d971 into kokkos:develop May 3, 2023
26 of 27 checks passed

rgayatri23 deleted the OpenMPTarget_update_hierarchical branch May 4, 2023 01:32

masterleinad mentioned this pull request Jun 16, 2023

CHANGELOG: 4.1.0 #5902

Closed

rgayatri23 mentioned this pull request Sep 20, 2023

OpenMPTarget init-join fix #6444

Merged
