Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[OpenMP][USM] Introduces -fopenmp-force-usm flag #76571

Merged
merged 1 commit into from Jan 22, 2024

Conversation

jplehr
Copy link
Contributor

@jplehr jplehr commented Dec 29, 2023

This flag forces the compiler to generate code for OpenMP target regions as if the user specified the #pragma omp requires unified_shared_memory in each source file.

The option does not have a -fno-* friend since OpenMP requires the unified_shared_memory clause to be present in all source files. Since this flag does no harm if the clause is present, it can be used in conjunction. My understanding is that USM should not be turned off selectively, hence, no -fno- version.

In favor of #75468, sorry for the noise and confusion.

@jplehr jplehr requested a review from jhuber6 December 29, 2023 16:28
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:codegen clang:openmp OpenMP related changes to Clang labels Dec 29, 2023
@llvmbot
Copy link
Collaborator

llvmbot commented Dec 29, 2023

@llvm/pr-subscribers-clang-codegen

@llvm/pr-subscribers-clang-driver

Author: Jan Patrick Lehr (jplehr)

Changes

This flag forces the compiler to generate code for OpenMP target regions as if the user specified the #pragma omp requires unified_shared_memory in each source file.

The option does not have a -fno-* friend since OpenMP requires the unified_shared_memory clause to be present in all source files. Since this flag does no harm if the clause is present, it can be used in conjunction. My understanding is that USM should not be turned off selectively, hence, no -fno- version.

In favor of #75468, sorry for the noise and confusion.


Full diff: https://github.com/llvm/llvm-project/pull/76571.diff

4 Files Affected:

  • (modified) clang/include/clang/Basic/LangOptions.def (+1)
  • (modified) clang/include/clang/Driver/Options.td (+4)
  • (modified) clang/lib/CodeGen/CGOpenMPRuntime.cpp (+7)
  • (modified) clang/lib/Driver/ToolChains/Clang.cpp (+2)
diff --git a/clang/include/clang/Basic/LangOptions.def b/clang/include/clang/Basic/LangOptions.def
index 21abc346cf17ac..81cf2ad9498a7f 100644
--- a/clang/include/clang/Basic/LangOptions.def
+++ b/clang/include/clang/Basic/LangOptions.def
@@ -260,6 +260,7 @@ LANGOPT(OpenMPTeamSubscription  , 1, 0, "Assume distributed loops do not have mo
 LANGOPT(OpenMPNoThreadState  , 1, 0, "Assume that no thread in a parallel region will modify an ICV.")
 LANGOPT(OpenMPNoNestedParallelism  , 1, 0, "Assume that no thread in a parallel region will encounter a parallel region")
 LANGOPT(OpenMPOffloadMandatory  , 1, 0, "Assert that offloading is mandatory and do not create a host fallback.")
+LANGOPT(OpenMPForceUSM     , 1, 0, "Enable OpenMP unified shared memory mode via compiler.")
 LANGOPT(NoGPULib  , 1, 0, "Indicate a build without the standard GPU libraries.")
 LANGOPT(RenderScript      , 1, 0, "RenderScript")
 
diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index 2b93ddf033499c..28290da438c62d 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -3451,6 +3451,10 @@ def fopenmp_offload_mandatory : Flag<["-"], "fopenmp-offload-mandatory">, Group<
   Flags<[NoArgumentUnused]>, Visibility<[ClangOption, CC1Option]>,
   HelpText<"Do not create a host fallback if offloading to the device fails.">,
   MarshallingInfoFlag<LangOpts<"OpenMPOffloadMandatory">>;
+def fopenmp_force_usm : Flag<["-"], "fopenmp-force-usm">, Group<f_Group>,
+  Flags<[NoArgumentUnused]>, Visibility<[ClangOption, CC1Option]>,
+  HelpText<"Force behvaior as if the user specified pragma omp requires unified_shared_memory.">,
+  MarshallingInfoFlag<LangOpts<"OpenMPForceUSM">>;
 def fopenmp_target_jit : Flag<["-"], "fopenmp-target-jit">, Group<f_Group>,
   Flags<[NoArgumentUnused]>, Visibility<[ClangOption, CLOption]>,
   HelpText<"Emit code that can be JIT compiled for OpenMP offloading. Implies -foffload-lto=full">;
diff --git a/clang/lib/CodeGen/CGOpenMPRuntime.cpp b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
index ea6645a39e8321..4855e7410a015a 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntime.cpp
+++ b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
@@ -1044,6 +1044,13 @@ CGOpenMPRuntime::CGOpenMPRuntime(CodeGenModule &CGM)
                                          ? CGM.getLangOpts().OMPHostIRFile
                                          : StringRef{});
   OMPBuilder.setConfig(Config);
+
+  // The user forces the compiler to behave as if omp requires
+  // unified_shared_memory was given.
+  if (CGM.getLangOpts().OpenMPForceUSM) {
+    HasRequiresUnifiedSharedMemory = true;
+    OMPBuilder.Config.setHasRequiresUnifiedSharedMemory(true);
+  }
 }
 
 void CGOpenMPRuntime::clear() {
diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp
index acfa119805068d..ffc24201ab2e0b 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -6382,6 +6382,8 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA,
         CmdArgs.push_back("-fopenmp-assume-no-nested-parallelism");
       if (Args.hasArg(options::OPT_fopenmp_offload_mandatory))
         CmdArgs.push_back("-fopenmp-offload-mandatory");
+      if (Args.hasArg(options::OPT_fopenmp_force_usm))
+        CmdArgs.push_back("-fopenmp-force-usm");
       break;
     default:
       // By default, if Clang doesn't know how to generate useful OpenMP code

Copy link
Contributor

@jhuber6 jhuber6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs a test. There should be some difference in codegen we can key off of.

@jplehr
Copy link
Contributor Author

jplehr commented Dec 29, 2023

Is the approach taken in this approach acceptable as opposed to the header solution I put up earlier?

@jhuber6
Copy link
Contributor

jhuber6 commented Dec 29, 2023

Is the approach taken in this approach acceptable as opposed to the header solution I put up earlier?

Yes, it's pretty much exactly what I had in mind from my suggestion in the last PR. Thanks.

@jplehr
Copy link
Contributor Author

jplehr commented Dec 29, 2023

Is the approach taken in this approach acceptable as opposed to the header solution I put up earlier?

Yes, it's pretty much exactly what I had in mind from my suggestion in the last PR. Thanks.

Perfect. I'll go ahead and add lit and runtime tests.

@jdoerfert
Copy link
Member

Documentation missing as well.

@llvmbot llvmbot added the openmp:libomptarget OpenMP offload runtime label Jan 18, 2024
@jplehr jplehr requested a review from jhuber6 January 18, 2024 13:48
@jplehr
Copy link
Contributor Author

jplehr commented Jan 18, 2024

While I add some documentation, I'd appreciate feedback especially on the lit side of things. I would very much like to rename the pretty happy tripple-X workaround for substitution debugging into something sane.

Copy link

github-actions bot commented Jan 18, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@jplehr
Copy link
Contributor Author

jplehr commented Jan 19, 2024

@carlobertolli can you have another look at the runtime test I added to see if that addresses your feedback?

@carlobertolli
Copy link
Member

Automatic zero-copy doesn't work on some of the bbot's. I will have to land this once the lit test harness extension in #77851 re-lands.

@jplehr
Copy link
Contributor Author

jplehr commented Jan 19, 2024

Automatic zero-copy doesn't work on some of the bbot's. I will have to land this once the lit test harness extension in #77851 re-lands.

Having your work landed would be very helpful indeed.

@jplehr
Copy link
Contributor Author

jplehr commented Jan 19, 2024

I just realized that I need to update the clang lit tests, so this is not ready to land, but I don't see a button to indicate that.

This flag forces the compiler to generate code for OpenMP target regions
as if the user specified the #pragma omp requires unified_shared_memory
in each source file.

The option does not have a -fno-* friend since OpenMP requires the
unified_shared_memory clause to be present in all source files. Since
this flag does no harm if the clause is present, it can be used in
conjunction. My understanding is that USM should not be turned off
selectively, hence, no -fno- version.

This adds a basic test to check the correct generation of double
indirect access to declare target globals in USM mode vs non-USM mode.
Which I think is the only difference observable in code generation.

This runtime test checks for the (non-)occurence of data movement between host
and device. It does one run without the flag and one with the flag to
also see that both versions behave as expected. In the case w/o the new
flag data movement between host and device is expected. In the case with
the flag such data movement should not be present / reported.
@jplehr jplehr merged commit fa4780f into llvm:main Jan 22, 2024
4 checks passed
jhuber6 pushed a commit that referenced this pull request Jan 22, 2024
This should fix the AMDGPU buildbot breakage from #76571
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:codegen clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:openmp OpenMP related changes to Clang clang Clang issues not falling into any other category openmp:libomptarget OpenMP offload runtime
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants