-
Notifications
You must be signed in to change notification settings - Fork 10.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[OpenMP] Add Environment Variable to disable Reuse of Blocks for High Loop Trip Counts #89239
base: main
Are you sure you want to change the base?
Conversation
46f1090
to
328d58b
Compare
✅ With the latest revision this PR passed the C/C++ code formatter. |
328d58b
to
78a9ab8
Compare
openmp/libomptarget/plugins-nextgen/common/include/PluginInterface.h
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The sheer number of environment variables we seem to be accumulating is concerning. We really need an API function that toggles features on or off instead, but that's an entirely different discussion.
openmp/libomptarget/plugins-nextgen/common/include/PluginInterface.h
Outdated
Show resolved
Hide resolved
openmp/libomptarget/plugins-nextgen/common/src/PluginInterface.cpp
Outdated
Show resolved
Hide resolved
59c6931
to
73392c0
Compare
// 2>&1 | %fcheck-generic UNSUPPORTED: aarch64-unknown-linux-gnu UNSUPPORTED: | ||
// aarch64-unknown-linux-gnu-LTO UNSUPPORTED: x86_64-pc-linux-gnu UNSUPPORTED: | ||
// x86_64-pc-linux-gnu-LTO UNSUPPORTED: s390x-ibm-linux-gnu UNSUPPORTED: | ||
// s390x-ibm-linux-gnu-LTO |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This got formatted badly.
Also, add a second RUN line to test the default. Then don't check for pass but just print the values and CHECK for the two different expected ones.
@@ -742,6 +742,7 @@ variables is defined below. | |||
* ``LIBOMPTARGET_JIT_PRE_OPT_IR_MODULE=<out:Filename> (LLVM-IR file)`` | |||
* ``LIBOMPTARGET_JIT_POST_OPT_IR_MODULE=<out:Filename> (LLVM-IR file)`` | |||
* ``LIBOMPTARGET_MIN_THREADS_FOR_LOW_TRIP_COUNT=<Num> (default: 32)`` | |||
* ``LIBOMPTARGET_REUSE_BLOCKS_FOR_HIGH_TRIP_COUNT=[TRUE/FALSE] (default TRUE)`` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This also needs a section in the docs explaining what it does.
….cpp Co-authored-by: Joseph Huber <huberjn@outlook.com>
@llvm/pr-subscribers-offload Author: Tim Gymnich (tgymnich) ChangesSometimes it might be beneficial to spawn more thread blocks instead of reusing existing for multiple loop iterations. Alternatives considered: Make Full diff: https://github.com/llvm/llvm-project/pull/89239.diff 4 Files Affected:
diff --git a/offload/plugins-nextgen/common/include/PluginInterface.h b/offload/plugins-nextgen/common/include/PluginInterface.h
index 79e8464bfda5c1..1b7a0ca2136e3d 100644
--- a/offload/plugins-nextgen/common/include/PluginInterface.h
+++ b/offload/plugins-nextgen/common/include/PluginInterface.h
@@ -829,6 +829,12 @@ struct GenericDeviceTy : public DeviceAllocatorTy {
return OMPX_MinThreadsForLowTripCount;
}
+ /// Whether or not to reuse blocks for high trip count loops.
+ /// @see OMPX_ReuseBlocksForHighTripCount
+ bool getReuseBlocksForHighTripCount() {
+ return OMPX_ReuseBlocksForHighTripCount;
+ }
+
/// Get the total amount of hardware parallelism supported by the target
/// device. This is the total amount of warps or wavefronts that can be
/// resident on the device simultaneously.
@@ -904,6 +910,9 @@ struct GenericDeviceTy : public DeviceAllocatorTy {
UInt32Envar OMPX_MinThreadsForLowTripCount =
UInt32Envar("LIBOMPTARGET_MIN_THREADS_FOR_LOW_TRIP_COUNT", 32);
+ BoolEnvar OMPX_ReuseBlocksForHighTripCount =
+ BoolEnvar("LIBOMPTARGET_REUSE_BLOCKS_FOR_HIGH_TRIP_COUNT", true);
+
protected:
/// Environment variables defined by the LLVM OpenMP implementation
/// regarding the initial number of streams and events.
diff --git a/offload/plugins-nextgen/common/src/PluginInterface.cpp b/offload/plugins-nextgen/common/src/PluginInterface.cpp
index b5f3c45c835fdb..d88ba8a47d2708 100644
--- a/offload/plugins-nextgen/common/src/PluginInterface.cpp
+++ b/offload/plugins-nextgen/common/src/PluginInterface.cpp
@@ -705,8 +705,11 @@ uint64_t GenericKernelTy::getNumBlocks(GenericDeviceTy &GenericDevice,
TripCountNumBlocks = LoopTripCount;
}
}
+
+ uint32_t PreferredNumBlocks = TripCountNumBlocks;
// If the loops are long running we rather reuse blocks than spawn too many.
- uint32_t PreferredNumBlocks = std::min(TripCountNumBlocks, DefaultNumBlocks);
+ if (GenericDevice.getReuseBlocksForHighTripCount())
+ PreferredNumBlocks = std::min(TripCountNumBlocks, DefaultNumBlocks);
return std::min(PreferredNumBlocks, GenericDevice.getBlockLimit());
}
diff --git a/offload/test/offloading/high_trip_count_block_limit.cpp b/offload/test/offloading/high_trip_count_block_limit.cpp
new file mode 100644
index 00000000000000..1972188c93e92a
--- /dev/null
+++ b/offload/test/offloading/high_trip_count_block_limit.cpp
@@ -0,0 +1,29 @@
+// RUN: %libomptarget-compilexx-generic && env
+// LIBOMPTARGET_REUSE_BLOCKS_FOR_HIGH_TRIP_COUNT=False %libomptarget-run-generic
+// 2>&1 | %fcheck-generic UNSUPPORTED: aarch64-unknown-linux-gnu UNSUPPORTED:
+// aarch64-unknown-linux-gnu-LTO UNSUPPORTED: x86_64-pc-linux-gnu UNSUPPORTED:
+// x86_64-pc-linux-gnu-LTO UNSUPPORTED: s390x-ibm-linux-gnu UNSUPPORTED:
+// s390x-ibm-linux-gnu-LTO
+
+/*
+ Check if there is a thread for each loop iteration
+*/
+#include <omp.h>
+#include <stdio.h>
+
+int main() {
+ int N = 819200;
+ int num_threads[N];
+
+#pragma omp target teams distribute parallel for
+ for (int j = 0; j < N; j++) {
+ num_threads[j] = omp_get_num_threads() * omp_get_num_teams();
+ }
+
+ // CHECK: PASS
+ if (num_threads[0] == N)
+ printf("PASS\n");
+ else
+ printf("FAIL: num_threads: %d\n != N: %d", num_threads[0], N);
+ return 0;
+}
diff --git a/openmp/docs/design/Runtimes.rst b/openmp/docs/design/Runtimes.rst
index f8a8cb87e83e66..520620ddb78735 100644
--- a/openmp/docs/design/Runtimes.rst
+++ b/openmp/docs/design/Runtimes.rst
@@ -742,6 +742,7 @@ variables is defined below.
* ``LIBOMPTARGET_JIT_PRE_OPT_IR_MODULE=<out:Filename> (LLVM-IR file)``
* ``LIBOMPTARGET_JIT_POST_OPT_IR_MODULE=<out:Filename> (LLVM-IR file)``
* ``LIBOMPTARGET_MIN_THREADS_FOR_LOW_TRIP_COUNT=<Num> (default: 32)``
+ * ``LIBOMPTARGET_REUSE_BLOCKS_FOR_HIGH_TRIP_COUNT=[TRUE/FALSE] (default TRUE)``
LIBOMPTARGET_DEBUG
""""""""""""""""""
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Sometimes it might be beneficial to spawn more thread blocks instead of reusing existing for multiple loop iterations.
Alternatives considered:
Make
DefaultNumBlocks
settable via an environment variable.