Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[OpenMP] Add Environment Variable to disable Reuse of Blocks for High Loop Trip Counts #89239

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

tgymnich
Copy link
Member

Sometimes it might be beneficial to spawn more thread blocks instead of reusing existing for multiple loop iterations.

Alternatives considered:

Make DefaultNumBlocks settable via an environment variable.

@llvmbot llvmbot added the openmp:libomptarget OpenMP offload runtime label Apr 18, 2024
Copy link

github-actions bot commented Apr 18, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

Copy link
Contributor

@jhuber6 jhuber6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sheer number of environment variables we seem to be accumulating is concerning. We really need an API function that toggles features on or off instead, but that's an entirely different discussion.

// 2>&1 | %fcheck-generic UNSUPPORTED: aarch64-unknown-linux-gnu UNSUPPORTED:
// aarch64-unknown-linux-gnu-LTO UNSUPPORTED: x86_64-pc-linux-gnu UNSUPPORTED:
// x86_64-pc-linux-gnu-LTO UNSUPPORTED: s390x-ibm-linux-gnu UNSUPPORTED:
// s390x-ibm-linux-gnu-LTO
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This got formatted badly.

Also, add a second RUN line to test the default. Then don't check for pass but just print the values and CHECK for the two different expected ones.

@@ -742,6 +742,7 @@ variables is defined below.
* ``LIBOMPTARGET_JIT_PRE_OPT_IR_MODULE=<out:Filename> (LLVM-IR file)``
* ``LIBOMPTARGET_JIT_POST_OPT_IR_MODULE=<out:Filename> (LLVM-IR file)``
* ``LIBOMPTARGET_MIN_THREADS_FOR_LOW_TRIP_COUNT=<Num> (default: 32)``
* ``LIBOMPTARGET_REUSE_BLOCKS_FOR_HIGH_TRIP_COUNT=[TRUE/FALSE] (default TRUE)``
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also needs a section in the docs explaining what it does.

@llvmbot
Copy link
Collaborator

llvmbot commented May 6, 2024

@llvm/pr-subscribers-offload

Author: Tim Gymnich (tgymnich)

Changes

Sometimes it might be beneficial to spawn more thread blocks instead of reusing existing for multiple loop iterations.

Alternatives considered:

Make DefaultNumBlocks settable via an environment variable.


Full diff: https://github.com/llvm/llvm-project/pull/89239.diff

4 Files Affected:

  • (modified) offload/plugins-nextgen/common/include/PluginInterface.h (+9)
  • (modified) offload/plugins-nextgen/common/src/PluginInterface.cpp (+4-1)
  • (added) offload/test/offloading/high_trip_count_block_limit.cpp (+29)
  • (modified) openmp/docs/design/Runtimes.rst (+1)
diff --git a/offload/plugins-nextgen/common/include/PluginInterface.h b/offload/plugins-nextgen/common/include/PluginInterface.h
index 79e8464bfda5c1..1b7a0ca2136e3d 100644
--- a/offload/plugins-nextgen/common/include/PluginInterface.h
+++ b/offload/plugins-nextgen/common/include/PluginInterface.h
@@ -829,6 +829,12 @@ struct GenericDeviceTy : public DeviceAllocatorTy {
     return OMPX_MinThreadsForLowTripCount;
   }
 
+  /// Whether or not to reuse blocks for high trip count loops.
+  /// @see OMPX_ReuseBlocksForHighTripCount
+  bool getReuseBlocksForHighTripCount() {
+    return OMPX_ReuseBlocksForHighTripCount;
+  }
+
   /// Get the total amount of hardware parallelism supported by the target
   /// device. This is the total amount of warps or wavefronts that can be
   /// resident on the device simultaneously.
@@ -904,6 +910,9 @@ struct GenericDeviceTy : public DeviceAllocatorTy {
   UInt32Envar OMPX_MinThreadsForLowTripCount =
       UInt32Envar("LIBOMPTARGET_MIN_THREADS_FOR_LOW_TRIP_COUNT", 32);
 
+  BoolEnvar OMPX_ReuseBlocksForHighTripCount =
+      BoolEnvar("LIBOMPTARGET_REUSE_BLOCKS_FOR_HIGH_TRIP_COUNT", true);
+
 protected:
   /// Environment variables defined by the LLVM OpenMP implementation
   /// regarding the initial number of streams and events.
diff --git a/offload/plugins-nextgen/common/src/PluginInterface.cpp b/offload/plugins-nextgen/common/src/PluginInterface.cpp
index b5f3c45c835fdb..d88ba8a47d2708 100644
--- a/offload/plugins-nextgen/common/src/PluginInterface.cpp
+++ b/offload/plugins-nextgen/common/src/PluginInterface.cpp
@@ -705,8 +705,11 @@ uint64_t GenericKernelTy::getNumBlocks(GenericDeviceTy &GenericDevice,
       TripCountNumBlocks = LoopTripCount;
     }
   }
+
+  uint32_t PreferredNumBlocks = TripCountNumBlocks;
   // If the loops are long running we rather reuse blocks than spawn too many.
-  uint32_t PreferredNumBlocks = std::min(TripCountNumBlocks, DefaultNumBlocks);
+  if (GenericDevice.getReuseBlocksForHighTripCount())
+    PreferredNumBlocks = std::min(TripCountNumBlocks, DefaultNumBlocks);
   return std::min(PreferredNumBlocks, GenericDevice.getBlockLimit());
 }
 
diff --git a/offload/test/offloading/high_trip_count_block_limit.cpp b/offload/test/offloading/high_trip_count_block_limit.cpp
new file mode 100644
index 00000000000000..1972188c93e92a
--- /dev/null
+++ b/offload/test/offloading/high_trip_count_block_limit.cpp
@@ -0,0 +1,29 @@
+// RUN: %libomptarget-compilexx-generic && env
+// LIBOMPTARGET_REUSE_BLOCKS_FOR_HIGH_TRIP_COUNT=False %libomptarget-run-generic
+// 2>&1 | %fcheck-generic UNSUPPORTED: aarch64-unknown-linux-gnu UNSUPPORTED:
+// aarch64-unknown-linux-gnu-LTO UNSUPPORTED: x86_64-pc-linux-gnu UNSUPPORTED:
+// x86_64-pc-linux-gnu-LTO UNSUPPORTED: s390x-ibm-linux-gnu UNSUPPORTED:
+// s390x-ibm-linux-gnu-LTO
+
+/*
+  Check if there is a thread for each loop iteration
+*/
+#include <omp.h>
+#include <stdio.h>
+
+int main() {
+  int N = 819200;
+  int num_threads[N];
+
+#pragma omp target teams distribute parallel for
+  for (int j = 0; j < N; j++) {
+    num_threads[j] = omp_get_num_threads() * omp_get_num_teams();
+  }
+
+  // CHECK: PASS
+  if (num_threads[0] == N)
+    printf("PASS\n");
+  else
+    printf("FAIL: num_threads: %d\n != N: %d", num_threads[0], N);
+  return 0;
+}
diff --git a/openmp/docs/design/Runtimes.rst b/openmp/docs/design/Runtimes.rst
index f8a8cb87e83e66..520620ddb78735 100644
--- a/openmp/docs/design/Runtimes.rst
+++ b/openmp/docs/design/Runtimes.rst
@@ -742,6 +742,7 @@ variables is defined below.
     * ``LIBOMPTARGET_JIT_PRE_OPT_IR_MODULE=<out:Filename> (LLVM-IR file)``
     * ``LIBOMPTARGET_JIT_POST_OPT_IR_MODULE=<out:Filename> (LLVM-IR file)``
     * ``LIBOMPTARGET_MIN_THREADS_FOR_LOW_TRIP_COUNT=<Num> (default: 32)``
+    * ``LIBOMPTARGET_REUSE_BLOCKS_FOR_HIGH_TRIP_COUNT=[TRUE/FALSE] (default TRUE)``
 
 LIBOMPTARGET_DEBUG
 """"""""""""""""""

Copy link
Contributor

@shiltian shiltian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
offload openmp:libomp OpenMP host runtime openmp:libomptarget OpenMP offload runtime
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants