New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[OpenMP][USM] Introduces -fopenmp-force-usm flag #76571
Conversation
@llvm/pr-subscribers-clang-codegen @llvm/pr-subscribers-clang-driver Author: Jan Patrick Lehr (jplehr) ChangesThis flag forces the compiler to generate code for OpenMP target regions as if the user specified the #pragma omp requires unified_shared_memory in each source file. The option does not have a -fno-* friend since OpenMP requires the unified_shared_memory clause to be present in all source files. Since this flag does no harm if the clause is present, it can be used in conjunction. My understanding is that USM should not be turned off selectively, hence, no -fno- version. In favor of #75468, sorry for the noise and confusion. Full diff: https://github.com/llvm/llvm-project/pull/76571.diff 4 Files Affected:
diff --git a/clang/include/clang/Basic/LangOptions.def b/clang/include/clang/Basic/LangOptions.def
index 21abc346cf17ac..81cf2ad9498a7f 100644
--- a/clang/include/clang/Basic/LangOptions.def
+++ b/clang/include/clang/Basic/LangOptions.def
@@ -260,6 +260,7 @@ LANGOPT(OpenMPTeamSubscription , 1, 0, "Assume distributed loops do not have mo
LANGOPT(OpenMPNoThreadState , 1, 0, "Assume that no thread in a parallel region will modify an ICV.")
LANGOPT(OpenMPNoNestedParallelism , 1, 0, "Assume that no thread in a parallel region will encounter a parallel region")
LANGOPT(OpenMPOffloadMandatory , 1, 0, "Assert that offloading is mandatory and do not create a host fallback.")
+LANGOPT(OpenMPForceUSM , 1, 0, "Enable OpenMP unified shared memory mode via compiler.")
LANGOPT(NoGPULib , 1, 0, "Indicate a build without the standard GPU libraries.")
LANGOPT(RenderScript , 1, 0, "RenderScript")
diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index 2b93ddf033499c..28290da438c62d 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -3451,6 +3451,10 @@ def fopenmp_offload_mandatory : Flag<["-"], "fopenmp-offload-mandatory">, Group<
Flags<[NoArgumentUnused]>, Visibility<[ClangOption, CC1Option]>,
HelpText<"Do not create a host fallback if offloading to the device fails.">,
MarshallingInfoFlag<LangOpts<"OpenMPOffloadMandatory">>;
+def fopenmp_force_usm : Flag<["-"], "fopenmp-force-usm">, Group<f_Group>,
+ Flags<[NoArgumentUnused]>, Visibility<[ClangOption, CC1Option]>,
+ HelpText<"Force behvaior as if the user specified pragma omp requires unified_shared_memory.">,
+ MarshallingInfoFlag<LangOpts<"OpenMPForceUSM">>;
def fopenmp_target_jit : Flag<["-"], "fopenmp-target-jit">, Group<f_Group>,
Flags<[NoArgumentUnused]>, Visibility<[ClangOption, CLOption]>,
HelpText<"Emit code that can be JIT compiled for OpenMP offloading. Implies -foffload-lto=full">;
diff --git a/clang/lib/CodeGen/CGOpenMPRuntime.cpp b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
index ea6645a39e8321..4855e7410a015a 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntime.cpp
+++ b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
@@ -1044,6 +1044,13 @@ CGOpenMPRuntime::CGOpenMPRuntime(CodeGenModule &CGM)
? CGM.getLangOpts().OMPHostIRFile
: StringRef{});
OMPBuilder.setConfig(Config);
+
+ // The user forces the compiler to behave as if omp requires
+ // unified_shared_memory was given.
+ if (CGM.getLangOpts().OpenMPForceUSM) {
+ HasRequiresUnifiedSharedMemory = true;
+ OMPBuilder.Config.setHasRequiresUnifiedSharedMemory(true);
+ }
}
void CGOpenMPRuntime::clear() {
diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp
index acfa119805068d..ffc24201ab2e0b 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -6382,6 +6382,8 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA,
CmdArgs.push_back("-fopenmp-assume-no-nested-parallelism");
if (Args.hasArg(options::OPT_fopenmp_offload_mandatory))
CmdArgs.push_back("-fopenmp-offload-mandatory");
+ if (Args.hasArg(options::OPT_fopenmp_force_usm))
+ CmdArgs.push_back("-fopenmp-force-usm");
break;
default:
// By default, if Clang doesn't know how to generate useful OpenMP code
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs a test. There should be some difference in codegen we can key off of.
Is the approach taken in this approach acceptable as opposed to the header solution I put up earlier? |
Yes, it's pretty much exactly what I had in mind from my suggestion in the last PR. Thanks. |
Perfect. I'll go ahead and add lit and runtime tests. |
Documentation missing as well. |
While I add some documentation, I'd appreciate feedback especially on the lit side of things. I would very much like to rename the pretty happy tripple-X workaround for substitution debugging into something sane. |
✅ With the latest revision this PR passed the C/C++ code formatter. |
@carlobertolli can you have another look at the runtime test I added to see if that addresses your feedback? |
Automatic zero-copy doesn't work on some of the bbot's. I will have to land this once the lit test harness extension in #77851 re-lands. |
Having your work landed would be very helpful indeed. |
f1639cd
to
41b227e
Compare
I just realized that I need to update the clang lit tests, so this is not ready to land, but I don't see a button to indicate that. |
This flag forces the compiler to generate code for OpenMP target regions as if the user specified the #pragma omp requires unified_shared_memory in each source file. The option does not have a -fno-* friend since OpenMP requires the unified_shared_memory clause to be present in all source files. Since this flag does no harm if the clause is present, it can be used in conjunction. My understanding is that USM should not be turned off selectively, hence, no -fno- version. This adds a basic test to check the correct generation of double indirect access to declare target globals in USM mode vs non-USM mode. Which I think is the only difference observable in code generation. This runtime test checks for the (non-)occurence of data movement between host and device. It does one run without the flag and one with the flag to also see that both versions behave as expected. In the case w/o the new flag data movement between host and device is expected. In the case with the flag such data movement should not be present / reported.
41b227e
to
a6c437a
Compare
This should fix the AMDGPU buildbot breakage from #76571
This flag forces the compiler to generate code for OpenMP target regions as if the user specified the #pragma omp requires unified_shared_memory in each source file.
The option does not have a -fno-* friend since OpenMP requires the unified_shared_memory clause to be present in all source files. Since this flag does no harm if the clause is present, it can be used in conjunction. My understanding is that USM should not be turned off selectively, hence, no -fno- version.
In favor of #75468, sorry for the noise and confusion.