[OpenMP] Introduce -fopenmp-force-usm flag #75468
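@llvm/pr-subscribers-clang-codegen @llvm/pr-subscribers-backend-amdgpu
Author: Jan Patrick Lehr (jplehr)
Changes
The new flag implements logic to include #pragma omp requires unified_shared_memory in every translation unit.
This is the flag mentioned in #75467
Full diff: https://github.com/llvm/llvm-project/pull/75468.diff
4 Files Affected:
clang/include/clang/Driver/Options.td
clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
clang/lib/Headers/CMakeLists.txt
clang/lib/Headers/openmp_wrappers/usm/force_usm.h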
diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index 1b02087425b751..b9cd3043a13a9a 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -3381,6 +3381,8 @@ def fopenmp_cuda_blocks_per_sm_EQ : Joined<["-"], "fopenmp-cuda-blocks-per-sm=">
Flags<[NoArgumentUnused, HelpHidden]>, Visibility<[ClangOption, CC1Option]>;
def fopenmp_cuda_teams_reduction_recs_num_EQ : Joined<["-"], "fopenmp-cuda-teams-reduction-recs-num=">, Group<f_Group>,
Flags<[NoArgumentUnused, HelpHidden]>, Visibility<[ClangOption, CC1Option]>;
+def fopenmp_force_usm : Flag<["-"], "fopenmp-force-usm">, Group<f_Group>,
+ Flags<[NoArgumentUnused, HelpHidden]>, Visibility<[CC1Option]>;
//===----------------------------------------------------------------------===//
// Shared cc1 + fc1 OpenMP Target Options
diff --git a/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp b/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
index b012b7cb729378..2484a59085c276 100644
--- a/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
+++ b/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
@@ -129,6 +129,20 @@ AMDGPUOpenMPToolChain::GetCXXStdlibType(const ArgList &Args) const {
void AMDGPUOpenMPToolChain::AddClangSystemIncludeArgs(
const ArgList &DriverArgs, ArgStringList &CC1Args) const {
HostTC.AddClangSystemIncludeArgs(DriverArgs, CC1Args);
+
+ CC1Args.push_back("-internal-isystem");
+ SmallString<128> P(HostTC.getDriver().ResourceDir);
+ llvm::sys::path::append(P, "include/cuda_wrappers");
+ CC1Args.push_back(DriverArgs.MakeArgString(P));
+
+ // Force APU mode will forcefully include #pragma omp requires
+ // unified_shared_memory via the force_usm header
+ if (DriverArgs.hasArg(options::OPT_fopenmp_force_usm)) {
+ CC1Args.push_back("-include");
+ CC1Args.push_back(
+ DriverArgs.MakeArgString(HostTC.getDriver().ResourceDir +
+ "/include/openmp_wrappers/force_usm.h"));
+ }
}
void AMDGPUOpenMPToolChain::AddIAMCUIncludeArgs(const ArgList &Args,
diff --git a/clang/lib/Headers/CMakeLists.txt b/clang/lib/Headers/CMakeLists.txt
index f8fdd402777e48..aac232fa8b4405 100644
--- a/clang/lib/Headers/CMakeLists.txt
+++ b/clang/lib/Headers/CMakeLists.txt
@@ -319,6 +319,7 @@ set(openmp_wrapper_files
openmp_wrappers/__clang_openmp_device_functions.h
openmp_wrappers/complex_cmath.h
openmp_wrappers/new
+ openmp_wrappers/usm/force_usm.h
)
set(llvm_libc_wrapper_files
diff --git a/clang/lib/Headers/openmp_wrappers/usm/force_usm.h b/clang/lib/Headers/openmp_wrappers/usm/force_usm.h
new file mode 100644
index 00000000000000..15c394e27ce9c2
--- /dev/null
+++ b/clang/lib/Headers/openmp_wrappers/usm/force_usm.h
@@ -0,0 +1,6 @@
+#ifndef __CLANG_FORCE_OPENMP_USM
+#define __CLANG_FORCE_OPENMP_USM
+
+#pragma omp requires unified_shared_memory
+
+#endif
bc912bf to 9809ba1
if (DriverArgs.hasArg(options::OPT_fopenmp_force_usm)) {
  CC1Args.push_back("-include");
  CC1Args.push_back(
      DriverArgs.MakeArgString(HostTC.getDriver().ResourceDir +
                               "/include/openmp_wrappers/force_usm.h"));
I don't think this is a good way to handle this. We should make this a CC1 argument, forward it in the standard way, and make CGOpenMPRuntime always emit the associated runtime call.
Also note that I'm planning on removing the current "requires" handling because emitting spurious global constructors into the runtime is difficult to work around.
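The flow suggested in this comment (driver flag, forwarded cc1 argument, language option, codegen decision) can be sketched as a standalone model. This is only an illustration under stated assumptions: LangOptionsModel, forwardForceUSM, parseCC1 and requiresUnifiedSharedMemory are hypothetical names, and nothing here calls the real Clang driver or CGOpenMPRuntime APIs. Only the flag spelling and the OpenMPForceUSM option name come from this PR.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical mirror of the single language option under discussion.
struct LangOptionsModel {
  bool OpenMPForceUSM = false;
};

// Driver step (model): pass the flag through to the cc1 command line
// unchanged instead of translating it into "-include <header>".
std::vector<std::string>
forwardForceUSM(const std::vector<std::string> &driverArgs) {
  std::vector<std::string> cc1Args;
  if (std::find(driverArgs.begin(), driverArgs.end(),
                "-fopenmp-force-usm") != driverArgs.end())
    cc1Args.push_back("-fopenmp-force-usm");
  return cc1Args;
}

// Frontend step (model): turn the cc1 argument into a language option.
LangOptionsModel parseCC1(const std::vector<std::string> &cc1Args) {
  LangOptionsModel opts;
  opts.OpenMPForceUSM =
      std::find(cc1Args.begin(), cc1Args.end(), "-fopenmp-force-usm") !=
      cc1Args.end();
  return opts;
}

// Codegen step (model): the requirement holds if the source contained the
// requires pragma or if the option forces it.
bool requiresUnifiedSharedMemory(const LangOptionsModel &opts,
                                 bool sawRequiresPragma) {
  return sawRequiresPragma || opts.OpenMPForceUSM;
}
```

In this model no header is injected into the translation unit; the decision is carried by an option that code generation reads directly.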
Here's the patch for -fopenmp-offload-mandatory, which is a similar use-case: https://reviews.llvm.org/D120353.
I'm happy to change that to something more reasonable, if you can point out where to look for inspiration on how to do it properly.
def fopenmp_force_usm : Flag<["-"], "fopenmp-force-usm">, Group<f_Group>,
  Flags<[NoArgumentUnused, HelpHidden]>, Visibility<[CC1Option]>;
-f options tend to have a -fno variant as well.
With the intent to remove the USM behavior from a codebase that has the requires pragma, by basically just ignoring it?
No, it would just override the flag before it. E.g. -fopenmp-force-usm -fno-openmp-force-usm would return to not having it on.
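The override rule described here, where the last occurrence of the positive or negative spelling decides, can be sketched as a small standalone model. lastFlagWins is a hypothetical helper written for illustration, not a Clang API; in the driver itself, llvm::opt::ArgList::hasFlag serves this purpose.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a "last flag wins" lookup over a command line.
// Returns defaultValue when neither spelling is present.
bool lastFlagWins(const std::vector<std::string> &args,
                  const std::string &pos, const std::string &neg,
                  bool defaultValue) {
  assert(pos != neg && "positive and negative spellings must differ");
  bool value = defaultValue;
  for (const std::string &arg : args) {
    if (arg == pos)
      value = true; // a later positive flag re-enables the feature
    else if (arg == neg)
      value = false; // a later negative flag disables it again
  }
  return value;
}
```

With the arguments from the comment above, -fopenmp-force-usm followed by -fno-openmp-force-usm, the model returns false; with the two flags in the opposite order it returns true.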
9809ba1 to 4ecd07d
The new flag implements logic to include #pragma omp requires unified_shared_memory in every translation unit. This enables a straightforward way to enable USM for an application without the need to modify sources.
This reverts commit 4ecd07d786a5a994b33b9177d4e21d839bfe3fc9. To test the other solution.
This uses an implicitly added OpenMP USM Clause when initializing SEMA to enforce the use of USM.
4ecd07d to f0aaefb
You can test this locally with the following command:
git-clang-format --diff 3c92011b600bdf70424e2547594dd461fe411a41 f0aaefbe923d2daa1752f3a9664dab3958346c51 -- clang/lib/CodeGen/CGOpenMPRuntime.cpp clang/lib/Driver/ToolChains/Clang.cpp
View the diff from clang-format here.
diff --git a/clang/lib/CodeGen/CGOpenMPRuntime.cpp b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
index 09204c3017..4855e7410a 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntime.cpp
+++ b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
@@ -1045,7 +1045,8 @@ CGOpenMPRuntime::CGOpenMPRuntime(CodeGenModule &CGM)
: StringRef{});
OMPBuilder.setConfig(Config);
- // The user forces the compiler to behave as if omp requires unified_shared_memory was given.
+ // The user forces the compiler to behave as if omp requires
+ // unified_shared_memory was given.
if (CGM.getLangOpts().OpenMPForceUSM) {
HasRequiresUnifiedSharedMemory = true;
OMPBuilder.Config.setHasRequiresUnifiedSharedMemory(true);
Hmm.. I guess I screwed something up with git and the history.
The new flag implements logic to include #pragma omp requires unified_shared_memory in every translation unit. This enables a straightforward way to enable USM for an application without the need to modify sources.
This is the flag mentioned in #75467
Once the test has landed, I'll rebase and enable the test with this patch.
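For reference, a minimal sketch of a translation unit with the pragma written by hand, which is the source change the flag is meant to make unnecessary. sumOnDevice is a hypothetical function, and the sketch assumes an offloading toolchain and a device that supports unified shared memory. When OpenMP is not enabled, the pragmas are ignored and the loop runs on the host.

```cpp
#include <cassert>

// Declares that this translation unit relies on unified shared memory.
// According to the PR description, -fopenmp-force-usm is intended to have
// the same effect without this line appearing in the source.
#pragma omp requires unified_shared_memory

// Sums n values in a target region. No map clause is given for the pointee
// of `data`; under unified_shared_memory the device may dereference the
// host pointer directly.
long sumOnDevice(const long *data, int n) {
  assert(n >= 0 && "element count must be non-negative");
  long sum = 0;
#pragma omp target teams distribute parallel for reduction(+ : sum)
  for (int i = 0; i < n; ++i)
    sum += data[i];
  return sum;
}
```

Calling sumOnDevice on an ordinary host array is enough to exercise it; no explicit data mapping is written anywhere in the source.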