[llvm][frontend][offloading] Move clang-linker-wrapper/OffloadWrapper.* to llvm/Frontend/Offloading #78057

Merged
fabianmcg merged 4 commits into llvm:main on Jan 15, 2024

Conversation

fabianmcg

This patch moves clang/tools/clang-linker-wrapper/OffloadWrapper.* to
llvm/Frontend/Offloading so that other projects can reuse them.

Additionally, it makes minor modifications to the API to make it more flexible.
Concretely:

  • The wrap* methods are moved to the OffloadWrapper class.
  • The OffloadWrapper includes Suffix and EmitSurfacesAndTextures fields
    to specify some additional options.
  • The Suffix field is appended to the names of the emitted descriptor,
    registration functions, etc., to make them more readable. It is empty by
    default.
  • The EmitSurfacesAndTextures field controls whether to emit surface and
    texture registration code, as those functions were removed from CUDART
    in CUDA 12. It is true by default.
  • The wrap* methods now take an optional EntryArray argument;
    this change is needed to enable JIT compilation, as ORC doesn't fully support
    __start_ and __stop_ symbols. Thus, to JIT the code, the EntryArray has
    to be constructed explicitly in the IR.
  • The function getOffloadingEntryInitializer was added to help create the
    EntryArray, as it returns the constant initializer and not a global variable.
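
The optional EntryArray argument can be pictured with a small standalone model. This is only a sketch in plain C++ with no LLVM dependency: `Entry`, `EntryRange`, `lookupSectionRange`, `resolveEntries`, and `countEntries` are hypothetical names that stand in for the real `GlobalVariable` pair and for the section-based lookup that `getOffloadEntryArray` performs in the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one offloading entry (not the LLVM type).
struct Entry {
  std::string Name;
};

// Stand-in for EntryArrayTy: a [begin, end) pair delimiting the entries.
using EntryRange = std::pair<const Entry *, const Entry *>;

// Stand-in for the section-based default lookup. In the patch this role is
// played by getOffloadEntryArray; here it returns a range over a fixed table.
inline EntryRange lookupSectionRange() {
  static const std::vector<Entry> SectionEntries = {{"from_section"}};
  return {SectionEntries.data(),
          SectionEntries.data() + SectionEntries.size()};
}

// Use the caller-provided range when present, otherwise fall back to the
// section-based one, mirroring `EntryArray ? *EntryArray : <default>`.
inline EntryRange
resolveEntries(std::optional<EntryRange> Explicit = std::nullopt) {
  return Explicit ? *Explicit : lookupSectionRange();
}

// Number of entries in a [begin, end) range.
inline std::size_t countEntries(EntryRange Range) {
  assert(Range.first <= Range.second && "begin must not be past end");
  return static_cast<std::size_t>(Range.second - Range.first);
}
```

Calling `resolveEntries()` with no argument keeps the existing section-based behaviour, which is presumably why clang-linker-wrapper needs no functional change, while a JIT client can pass a range it built itself.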

@fabianmcg fabianmcg marked this pull request as ready for review January 13, 2024 19:19
@llvmbot llvmbot added the clang Clang issues not falling into any other category label Jan 13, 2024

llvmbot commented Jan 13, 2024

@llvm/pr-subscribers-clang-driver

@llvm/pr-subscribers-clang

Author: Fabian Mora (fabianmcg)

Patch is 24.07 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/78057.diff

8 Files Affected:

  • (modified) clang/tools/clang-linker-wrapper/CMakeLists.txt (-1)
  • (modified) clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp (+7-4)
  • (removed) clang/tools/clang-linker-wrapper/OffloadWrapper.h (-28)
  • (added) llvm/include/llvm/Frontend/Offloading/OffloadWrapper.h (+62)
  • (modified) llvm/include/llvm/Frontend/Offloading/Utility.h (+6)
  • (modified) llvm/lib/Frontend/Offloading/CMakeLists.txt (+2)
  • (renamed) llvm/lib/Frontend/Offloading/OffloadWrapper.cpp (+79-44)
  • (modified) llvm/lib/Frontend/Offloading/Utility.cpp (+15-6)
diff --git a/clang/tools/clang-linker-wrapper/CMakeLists.txt b/clang/tools/clang-linker-wrapper/CMakeLists.txt
index 744026a37b22c0..5556869affaa62 100644
--- a/clang/tools/clang-linker-wrapper/CMakeLists.txt
+++ b/clang/tools/clang-linker-wrapper/CMakeLists.txt
@@ -28,7 +28,6 @@ endif()
 
 add_clang_tool(clang-linker-wrapper
   ClangLinkerWrapper.cpp
-  OffloadWrapper.cpp
 
   DEPENDS
   ${tablegen_deps}
diff --git a/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp b/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
index 122ba1998eb83f..ebe8b634c7ae73 100644
--- a/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
+++ b/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
@@ -14,11 +14,11 @@
 //
 //===---------------------------------------------------------------------===//
 
-#include "OffloadWrapper.h"
 #include "clang/Basic/Version.h"
 #include "llvm/BinaryFormat/Magic.h"
 #include "llvm/Bitcode/BitcodeWriter.h"
 #include "llvm/CodeGen/CommandFlags.h"
+#include "llvm/Frontend/Offloading/OffloadWrapper.h"
 #include "llvm/IR/Constants.h"
 #include "llvm/IR/DiagnosticPrinter.h"
 #include "llvm/IR/Module.h"
@@ -906,15 +906,18 @@ wrapDeviceImages(ArrayRef<std::unique_ptr<MemoryBuffer>> Buffers,
 
   switch (Kind) {
   case OFK_OpenMP:
-    if (Error Err = wrapOpenMPBinaries(M, BuffersToWrap))
+    if (Error Err =
+            offloading::OffloadWrapper().wrapOpenMPBinaries(M, BuffersToWrap))
       return std::move(Err);
     break;
   case OFK_Cuda:
-    if (Error Err = wrapCudaBinary(M, BuffersToWrap.front()))
+    if (Error Err = offloading::OffloadWrapper().wrapCudaBinary(
+            M, BuffersToWrap.front()))
       return std::move(Err);
     break;
   case OFK_HIP:
-    if (Error Err = wrapHIPBinary(M, BuffersToWrap.front()))
+    if (Error Err = offloading::OffloadWrapper().wrapHIPBinary(
+            M, BuffersToWrap.front()))
       return std::move(Err);
     break;
   default:
diff --git a/clang/tools/clang-linker-wrapper/OffloadWrapper.h b/clang/tools/clang-linker-wrapper/OffloadWrapper.h
deleted file mode 100644
index 679333975b2120..00000000000000
--- a/clang/tools/clang-linker-wrapper/OffloadWrapper.h
+++ /dev/null
@@ -1,28 +0,0 @@
-//===- OffloadWrapper.h --r-------------------------------------*- C++ -*-===//
-//
-// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
-// See https://llvm.org/LICENSE.txt for license information.
-// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-//
-//===----------------------------------------------------------------------===//
-
-#ifndef LLVM_CLANG_TOOLS_CLANG_LINKER_WRAPPER_OFFLOAD_WRAPPER_H
-#define LLVM_CLANG_TOOLS_CLANG_LINKER_WRAPPER_OFFLOAD_WRAPPER_H
-
-#include "llvm/ADT/ArrayRef.h"
-#include "llvm/IR/Module.h"
-
-/// Wraps the input device images into the module \p M as global symbols and
-/// registers the images with the OpenMP Offloading runtime libomptarget.
-llvm::Error wrapOpenMPBinaries(llvm::Module &M,
-                               llvm::ArrayRef<llvm::ArrayRef<char>> Images);
-
-/// Wraps the input fatbinary image into the module \p M as global symbols and
-/// registers the images with the CUDA runtime.
-llvm::Error wrapCudaBinary(llvm::Module &M, llvm::ArrayRef<char> Images);
-
-/// Wraps the input bundled image into the module \p M as global symbols and
-/// registers the images with the HIP runtime.
-llvm::Error wrapHIPBinary(llvm::Module &M, llvm::ArrayRef<char> Images);
-
-#endif
diff --git a/llvm/include/llvm/Frontend/Offloading/OffloadWrapper.h b/llvm/include/llvm/Frontend/Offloading/OffloadWrapper.h
new file mode 100644
index 00000000000000..6b23f875a8f15f
--- /dev/null
+++ b/llvm/include/llvm/Frontend/Offloading/OffloadWrapper.h
@@ -0,0 +1,62 @@
+//===- OffloadWrapper.h --r-------------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_FRONTEND_OFFLOADING_OFFLOADWRAPPER_H
+#define LLVM_FRONTEND_OFFLOADING_OFFLOADWRAPPER_H
+
+#include "llvm/ADT/ArrayRef.h"
+#include "llvm/IR/Module.h"
+
+namespace llvm {
+namespace offloading {
+/// Class for embedding and registering offloading images and related objects in
+/// a Module.
+class OffloadWrapper {
+public:
+  using EntryArrayTy = std::pair<GlobalVariable *, GlobalVariable *>;
+
+  OffloadWrapper(const Twine &Suffix = "", bool EmitSurfacesAndTextures = true)
+      : Suffix(Suffix.str()), EmitSurfacesAndTextures(EmitSurfacesAndTextures) {
+  }
+
+  /// Wraps the input device images into the module \p M as global symbols and
+  /// registers the images with the OpenMP Offloading runtime libomptarget.
+  /// \param EntryArray Optional pair pointing to the `__start` and `__stop`
+  /// symbols holding the `__tgt_offload_entry` array.
+  llvm::Error wrapOpenMPBinaries(
+      llvm::Module &M, llvm::ArrayRef<llvm::ArrayRef<char>> Images,
+      std::optional<EntryArrayTy> EntryArray = std::nullopt) const;
+
+  /// Wraps the input fatbinary image into the module \p M as global symbols and
+  /// registers the images with the CUDA runtime.
+  /// \param EntryArray Optional pair pointing to the `__start` and `__stop`
+  /// symbols holding the `__tgt_offload_entry` array.
+  llvm::Error
+  wrapCudaBinary(llvm::Module &M, llvm::ArrayRef<char> Images,
+                 std::optional<EntryArrayTy> EntryArray = std::nullopt) const;
+
+  /// Wraps the input bundled image into the module \p M as global symbols and
+  /// registers the images with the HIP runtime.
+  /// \param EntryArray Optional pair pointing to the `__start` and `__stop`
+  /// symbols holding the `__tgt_offload_entry` array.
+  llvm::Error
+  wrapHIPBinary(llvm::Module &M, llvm::ArrayRef<char> Images,
+                std::optional<EntryArrayTy> EntryArray = std::nullopt) const;
+
+protected:
+  /// Suffix used when emitting symbols. It defaults to the empty string.
+  std::string Suffix;
+
+  /// Whether to emit surface and textures registration code. It defaults to
+  /// false.
+  bool EmitSurfacesAndTextures;
+};
+} // namespace offloading
+} // namespace llvm
+
+#endif // LLVM_FRONTEND_OFFLOADING_OFFLOADWRAPPER_H
diff --git a/llvm/include/llvm/Frontend/Offloading/Utility.h b/llvm/include/llvm/Frontend/Offloading/Utility.h
index 520c192996a066..f54dd7ba7ab45f 100644
--- a/llvm/include/llvm/Frontend/Offloading/Utility.h
+++ b/llvm/include/llvm/Frontend/Offloading/Utility.h
@@ -61,6 +61,12 @@ StructType *getEntryTy(Module &M);
 void emitOffloadingEntry(Module &M, Constant *Addr, StringRef Name,
                          uint64_t Size, int32_t Flags, int32_t Data,
                          StringRef SectionName);
+/// Create a constant struct initializer used to register this global at
+/// runtime.
+/// \return the constant struct and the global variable holding the symbol name.
+std::pair<Constant *, GlobalVariable *>
+getOffloadingEntryInitializer(Module &M, Constant *Addr, StringRef Name,
+                              uint64_t Size, int32_t Flags, int32_t Data);
 
 /// Creates a pair of globals used to iterate the array of offloading entries by
 /// accessing the section variables provided by the linker.
diff --git a/llvm/lib/Frontend/Offloading/CMakeLists.txt b/llvm/lib/Frontend/Offloading/CMakeLists.txt
index 2d0117c9e10059..16e0dcfa0e90d6 100644
--- a/llvm/lib/Frontend/Offloading/CMakeLists.txt
+++ b/llvm/lib/Frontend/Offloading/CMakeLists.txt
@@ -1,5 +1,6 @@
 add_llvm_component_library(LLVMFrontendOffloading
   Utility.cpp
+  OffloadWrapper.cpp
 
   ADDITIONAL_HEADER_DIRS
   ${LLVM_MAIN_INCLUDE_DIR}/llvm/Frontend
@@ -9,6 +10,7 @@ add_llvm_component_library(LLVMFrontendOffloading
 
   LINK_COMPONENTS
   Core
+  BinaryFormat
   Support
   TransformUtils
   TargetParser
diff --git a/clang/tools/clang-linker-wrapper/OffloadWrapper.cpp b/llvm/lib/Frontend/Offloading/OffloadWrapper.cpp
similarity index 86%
rename from clang/tools/clang-linker-wrapper/OffloadWrapper.cpp
rename to llvm/lib/Frontend/Offloading/OffloadWrapper.cpp
index 161374ae555233..f34a879b99dd02 100644
--- a/clang/tools/clang-linker-wrapper/OffloadWrapper.cpp
+++ b/llvm/lib/Frontend/Offloading/OffloadWrapper.cpp
@@ -6,7 +6,7 @@
 //
 //===----------------------------------------------------------------------===//
 
-#include "OffloadWrapper.h"
+#include "llvm/Frontend/Offloading/OffloadWrapper.h"
 #include "llvm/ADT/ArrayRef.h"
 #include "llvm/BinaryFormat/Magic.h"
 #include "llvm/Frontend/Offloading/Utility.h"
@@ -21,8 +21,11 @@
 #include "llvm/Transforms/Utils/ModuleUtils.h"
 
 using namespace llvm;
+using namespace llvm::offloading;
 
 namespace {
+using EntryArrayTy = OffloadWrapper::EntryArrayTy;
+
 /// Magic number that begins the section containing the CUDA fatbinary.
 constexpr unsigned CudaFatMagic = 0x466243b1;
 constexpr unsigned HIPFatMagic = 0x48495046;
@@ -110,10 +113,10 @@ PointerType *getBinDescPtrTy(Module &M) {
 /// };
 ///
 /// Global variable that represents BinDesc is returned.
-GlobalVariable *createBinDesc(Module &M, ArrayRef<ArrayRef<char>> Bufs) {
+GlobalVariable *createBinDesc(Module &M, ArrayRef<ArrayRef<char>> Bufs,
+                              EntryArrayTy EntryArray, StringRef Suffix) {
   LLVMContext &C = M.getContext();
-  auto [EntriesB, EntriesE] =
-      offloading::getOffloadEntryArray(M, "omp_offloading_entries");
+  auto [EntriesB, EntriesE] = EntryArray;
 
   auto *Zero = ConstantInt::get(getSizeTTy(M), 0u);
   Constant *ZeroZero[] = {Zero, Zero};
@@ -126,7 +129,7 @@ GlobalVariable *createBinDesc(Module &M, ArrayRef<ArrayRef<char>> Bufs) {
     auto *Data = ConstantDataArray::get(C, Buf);
     auto *Image = new GlobalVariable(M, Data->getType(), /*isConstant=*/true,
                                      GlobalVariable::InternalLinkage, Data,
-                                     ".omp_offloading.device_image");
+                                     ".omp_offloading.device_image" + Suffix);
     Image->setUnnamedAddr(GlobalValue::UnnamedAddr::Global);
     Image->setSection(".llvm.offloading");
     Image->setAlignment(Align(object::OffloadBinary::getAlignment()));
@@ -166,7 +169,7 @@ GlobalVariable *createBinDesc(Module &M, ArrayRef<ArrayRef<char>> Bufs) {
   auto *Images =
       new GlobalVariable(M, ImagesData->getType(), /*isConstant*/ true,
                          GlobalValue::InternalLinkage, ImagesData,
-                         ".omp_offloading.device_images");
+                         ".omp_offloading.device_images" + Suffix);
   Images->setUnnamedAddr(GlobalValue::UnnamedAddr::Global);
 
   auto *ImagesB =
@@ -180,14 +183,15 @@ GlobalVariable *createBinDesc(Module &M, ArrayRef<ArrayRef<char>> Bufs) {
 
   return new GlobalVariable(M, DescInit->getType(), /*isConstant*/ true,
                             GlobalValue::InternalLinkage, DescInit,
-                            ".omp_offloading.descriptor");
+                            ".omp_offloading.descriptor" + Suffix);
 }
 
-void createRegisterFunction(Module &M, GlobalVariable *BinDesc) {
+void createRegisterFunction(Module &M, GlobalVariable *BinDesc,
+                            StringRef Suffix) {
   LLVMContext &C = M.getContext();
   auto *FuncTy = FunctionType::get(Type::getVoidTy(C), /*isVarArg*/ false);
   auto *Func = Function::Create(FuncTy, GlobalValue::InternalLinkage,
-                                ".omp_offloading.descriptor_reg", &M);
+                                ".omp_offloading.descriptor_reg" + Suffix, &M);
   Func->setSection(".text.startup");
 
   // Get __tgt_register_lib function declaration.
@@ -210,11 +214,13 @@ void createRegisterFunction(Module &M, GlobalVariable *BinDesc) {
   appendToGlobalCtors(M, Func, /*Priority*/ 1);
 }
 
-void createUnregisterFunction(Module &M, GlobalVariable *BinDesc) {
+void createUnregisterFunction(Module &M, GlobalVariable *BinDesc,
+                              StringRef Suffix) {
   LLVMContext &C = M.getContext();
   auto *FuncTy = FunctionType::get(Type::getVoidTy(C), /*isVarArg*/ false);
-  auto *Func = Function::Create(FuncTy, GlobalValue::InternalLinkage,
-                                ".omp_offloading.descriptor_unreg", &M);
+  auto *Func =
+      Function::Create(FuncTy, GlobalValue::InternalLinkage,
+                       ".omp_offloading.descriptor_unreg" + Suffix, &M);
   Func->setSection(".text.startup");
 
   // Get __tgt_unregister_lib function declaration.
@@ -251,7 +257,8 @@ StructType *getFatbinWrapperTy(Module &M) {
 
 /// Embed the image \p Image into the module \p M so it can be found by the
 /// runtime.
-GlobalVariable *createFatbinDesc(Module &M, ArrayRef<char> Image, bool IsHIP) {
+GlobalVariable *createFatbinDesc(Module &M, ArrayRef<char> Image, bool IsHIP,
+                                 StringRef Suffix) {
   LLVMContext &C = M.getContext();
   llvm::Type *Int8PtrTy = PointerType::getUnqual(C);
   llvm::Triple Triple = llvm::Triple(M.getTargetTriple());
@@ -263,7 +270,7 @@ GlobalVariable *createFatbinDesc(Module &M, ArrayRef<char> Image, bool IsHIP) {
   auto *Data = ConstantDataArray::get(C, Image);
   auto *Fatbin = new GlobalVariable(M, Data->getType(), /*isConstant*/ true,
                                     GlobalVariable::InternalLinkage, Data,
-                                    ".fatbin_image");
+                                    ".fatbin_image" + Suffix);
   Fatbin->setSection(FatbinConstantSection);
 
   // Create the fatbinary wrapper
@@ -282,7 +289,7 @@ GlobalVariable *createFatbinDesc(Module &M, ArrayRef<char> Image, bool IsHIP) {
   auto *FatbinDesc =
       new GlobalVariable(M, getFatbinWrapperTy(M),
                          /*isConstant*/ true, GlobalValue::InternalLinkage,
-                         FatbinInitializer, ".fatbin_wrapper");
+                         FatbinInitializer, ".fatbin_wrapper" + Suffix);
   FatbinDesc->setSection(FatbinWrapperSection);
   FatbinDesc->setAlignment(Align(8));
 
@@ -312,10 +319,12 @@ GlobalVariable *createFatbinDesc(Module &M, ArrayRef<char> Image, bool IsHIP) {
 ///                         0, entry->size, 0, 0);
 ///   }
 /// }
-Function *createRegisterGlobalsFunction(Module &M, bool IsHIP) {
+Function *createRegisterGlobalsFunction(Module &M, bool IsHIP,
+                                        EntryArrayTy EntryArray,
+                                        StringRef Suffix,
+                                        bool EmitSurfacesAndTextures) {
   LLVMContext &C = M.getContext();
-  auto [EntriesB, EntriesE] = offloading::getOffloadEntryArray(
-      M, IsHIP ? "hip_offloading_entries" : "cuda_offloading_entries");
+  auto [EntriesB, EntriesE] = EntryArray;
 
   // Get the __cudaRegisterFunction function declaration.
   PointerType *Int8PtrTy = PointerType::get(C, 0);
@@ -339,7 +348,7 @@ Function *createRegisterGlobalsFunction(Module &M, bool IsHIP) {
       IsHIP ? "__hipRegisterVar" : "__cudaRegisterVar", RegVarTy);
 
   // Get the __cudaRegisterSurface function declaration.
-  auto *RegSurfaceTy =
+  FunctionType *RegSurfaceTy =
       FunctionType::get(Type::getVoidTy(C),
                         {Int8PtrPtrTy, Int8PtrTy, Int8PtrTy, Int8PtrTy,
                          Type::getInt32Ty(C), Type::getInt32Ty(C)},
@@ -348,7 +357,7 @@ Function *createRegisterGlobalsFunction(Module &M, bool IsHIP) {
       IsHIP ? "__hipRegisterSurface" : "__cudaRegisterSurface", RegSurfaceTy);
 
   // Get the __cudaRegisterTexture function declaration.
-  auto *RegTextureTy = FunctionType::get(
+  FunctionType *RegTextureTy = FunctionType::get(
       Type::getVoidTy(C),
       {Int8PtrPtrTy, Int8PtrTy, Int8PtrTy, Int8PtrTy, Type::getInt32Ty(C),
        Type::getInt32Ty(C), Type::getInt32Ty(C)},
@@ -454,19 +463,20 @@ Function *createRegisterGlobalsFunction(Module &M, bool IsHIP) {
   Builder.CreateBr(IfEndBB);
   Switch->addCase(Builder.getInt32(llvm::offloading::OffloadGlobalManagedEntry),
                   SwManagedBB);
-
   // Create surface variable registration code.
   Builder.SetInsertPoint(SwSurfaceBB);
-  Builder.CreateCall(
-      RegSurface, {RegGlobalsFn->arg_begin(), Addr, Name, Name, Data, Extern});
+  if (EmitSurfacesAndTextures)
+    Builder.CreateCall(RegSurface, {RegGlobalsFn->arg_begin(), Addr, Name, Name,
+                                    Data, Extern});
   Builder.CreateBr(IfEndBB);
   Switch->addCase(Builder.getInt32(llvm::offloading::OffloadGlobalSurfaceEntry),
                   SwSurfaceBB);
 
   // Create texture variable registration code.
   Builder.SetInsertPoint(SwTextureBB);
-  Builder.CreateCall(RegTexture, {RegGlobalsFn->arg_begin(), Addr, Name, Name,
-                                  Data, Normalized, Extern});
+  if (EmitSurfacesAndTextures)
+    Builder.CreateCall(RegTexture, {RegGlobalsFn->arg_begin(), Addr, Name, Name,
+                                    Data, Normalized, Extern});
   Builder.CreateBr(IfEndBB);
   Switch->addCase(Builder.getInt32(llvm::offloading::OffloadGlobalTextureEntry),
                   SwTextureBB);
@@ -497,18 +507,21 @@ Function *createRegisterGlobalsFunction(Module &M, bool IsHIP) {
 // Create the constructor and destructor to register the fatbinary with the CUDA
 // runtime.
 void createRegisterFatbinFunction(Module &M, GlobalVariable *FatbinDesc,
-                                  bool IsHIP) {
+                                  bool IsHIP,
+                                  std::optional<EntryArrayTy> EntryArrayOpt,
+                                  StringRef Suffix,
+                                  bool EmitSurfacesAndTextures) {
   LLVMContext &C = M.getContext();
   auto *CtorFuncTy = FunctionType::get(Type::getVoidTy(C), /*isVarArg*/ false);
-  auto *CtorFunc =
-      Function::Create(CtorFuncTy, GlobalValue::InternalLinkage,
-                       IsHIP ? ".hip.fatbin_reg" : ".cuda.fatbin_reg", &M);
+  auto *CtorFunc = Function::Create(
+      CtorFuncTy, GlobalValue::InternalLinkage,
+      (IsHIP ? ".hip.fatbin_reg" : ".cuda.fatbin_reg") + Suffix, &M);
   CtorFunc->setSection(".text.startup");
 
   auto *DtorFuncTy = FunctionType::get(Type::getVoidTy(C), /*isVarArg*/ false);
-  auto *DtorFunc =
-      Function::Create(DtorFuncTy, GlobalValue::InternalLinkage,
-                       IsHIP ? ".hip.fatbin_unreg" : ".cuda.fatbin_unreg", &M);
+  auto *DtorFunc = Function::Create(
+      DtorFuncTy, GlobalValue::InternalLinkage,
+      (IsHIP ? ".hip.fatbin_unreg" : ".cuda.fatbin_unreg") + Suffix, &M);
   DtorFunc->setSection(".text.startup");
 
   auto *PtrTy = PointerType::getUnqual(C);
@@ -536,7 +549,7 @@ void createRegisterFatbinFunction(Module &M, GlobalVariable *FatbinDesc,
   auto *BinaryHandleGlobal = new llvm::GlobalVariable(
       M, PtrTy, false, llvm::GlobalValue::InternalLinkage,
       llvm::ConstantPointerNull::get(PtrTy),
-      IsHIP ? ".hip.binary_handle" : ".cuda.binary_handle");
+      (IsHIP ? ".hip.binary_handle" : ".cuda.binary_handle") + Suffix);
 
   // Create the constructor to register this image with the runtime.
   IRBuilder<> CtorBuilder(BasicBlock::Create(C, "entry", CtorFunc));
@@ -546,7 +559,16 @@ void createRegisterFatbinFunction(Module &M, GlobalVariable *FatbinDesc,
   CtorBuilder.CreateAlignedStore(
       Handle, BinaryHandleGlobal,
       Align(M.getDataLayout().getPointerTypeSize(PtrTy)));
-  CtorBuilder.CreateCall(createRegisterGlobalsFunction(M, IsHIP), Handle);
+  EntryArrayTy EntryArray =
+      (EntryArrayOpt ? *EntryArrayOpt
+                     : (IsHIP ? offloading::getOffloadEntryArray(
+                                    M, "hip_offloading_entries")
+                              : offloading::getOffloadEntryArray(
+                                    M, "cuda_offloading_entries")));
+  CtorBuilder.CreateCall(createRegisterGlobalsFunction(M, IsHIP, EntryArray,
+                                                       Suffix,
+                                                       EmitSurfacesAndTextures),
+                         Handle);
   if (!IsHIP)
     CtorBuilder.CreateCall(RegFatbinEnd, Handle);
   CtorBuilder.CreateCall(AtExit, DtorFunc);
@@ -568,32 +590,45 @@ void createRegisterFatbinFunction(Module &M, GlobalVariable *FatbinDesc,
 
 } // namespace
 
-Error wrapOpenMPBinaries(Module &M, ArrayRef<ArrayRef<char>> Images) {
-  GlobalVariable *Desc = createBinDesc(M, Images);
+Error OffloadWrapper:...
[truncated]

@jhuber6 jhuber6 left a comment

Thanks, some comments.

   if (!Desc)
     return createStringError(inconvertibleErrorCode(),
                              "No fatinbary section created.");

-  createRegisterFatbinFunction(M, Desc, /* IsHIP */ true);
+  createRegisterFatbinFunction(M, Desc, /* IsHIP */ true, EntryArray, Suffix,

Can you fix these comments while you're at it? LLVM inline comments should be /*IsHIP=*/

namespace offloading {
/// Class for embedding and registering offloading images and related objects in
/// a Module.
class OffloadWrapper {

I feel like these should just be free functions and the extra two bits of state here are additional default arguments like you've done with EntryArray.


/// Whether to emit surface and textures registration code. It defaults to
/// false.
bool EmitSurfacesAndTextures;

@jhuber6 jhuber6 Jan 14, 2024

So, I wasn't sure about this either. I know that CUDA emits these __cudaRegisterSurface calls, but I can't seem to find them in any of the exported libraries. It caused linker errors due to that and I was too lazy to fix it. Wondering if they've been deprecated, maybe @Artem-B knows.
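
As a rough illustration of what the `EmitSurfacesAndTextures` flag changes, the standalone sketch below models which registration calls end up in the generated code. It is not the LLVM implementation: `EntryKind`, `registrationCallFor`, and `plannedCalls` are hypothetical names, and only three entry kinds are modeled. The runtime function names are the ones that appear in the diff.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified entry kinds; the patch switches on more kinds than modeled here.
enum class EntryKind { Variable, Surface, Texture };

// Returns the registration call that would be emitted for one entry kind, or
// an empty string when the call is skipped.
inline std::string registrationCallFor(EntryKind Kind,
                                       bool EmitSurfacesAndTextures) {
  switch (Kind) {
  case EntryKind::Variable:
    return "__cudaRegisterVar";
  case EntryKind::Surface:
    return EmitSurfacesAndTextures ? "__cudaRegisterSurface" : "";
  case EntryKind::Texture:
    return EmitSurfacesAndTextures ? "__cudaRegisterTexture" : "";
  }
  return "";
}

// Collects the calls emitted for a list of entries, dropping skipped ones.
inline std::vector<std::string>
plannedCalls(const std::vector<EntryKind> &Kinds,
             bool EmitSurfacesAndTextures) {
  std::vector<std::string> Calls;
  for (EntryKind Kind : Kinds) {
    std::string Call = registrationCallFor(Kind, EmitSurfacesAndTextures);
    if (!Call.empty())
      Calls.push_back(Call);
  }
  assert(Calls.size() <= Kinds.size() && "at most one call per entry");
  return Calls;
}
```

With the flag off, surface and texture entries still reach their switch case but produce no call, so no reference to the missing runtime symbols would be left for the linker to resolve.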

   if (!Desc)
     return createStringError(inconvertibleErrorCode(),
                              "No binary descriptors created.");
-  createRegisterFunction(M, Desc);
-  createUnregisterFunction(M, Desc);
+  createRegisterFunction(M, Desc, Suffix);

What is the Suffix for exactly? It might be better just to give it some generic name, since in the expected use it is currently always _cuda_ or _omp_, embedded within a longer name.

@fabianmcg fabianmcg Jan 14, 2024

So, in MLIR we can have multiple binaries (PTX, fatbinaries, etc.) in a single IR module:

gpu.binary @binary_sm_70 [#gpu.object<#nvvm.target<chip="sm_70">, "BINARY BLOB">]
gpu.binary @binary_gfx90a [#gpu.object<#rocdl.target<chip="gfx90a">, "BINARY BLOB">]
...
// Call `kernel_name` in `binary_sm_70`
 gpu.launch_func @binary_sm_70::kernel_name
// Call `kernel_name` in `binary_gfx90a`
 gpu.launch_func @binary_gfx90a::kernel_name

I added the suffix field so that in MLIR we can append the binary identifier to the descriptor, registration constructor, etc. This makes the IR more readable.
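
The naming effect of the suffix can be sketched in plain C++. This is an illustration only, with no LLVM dependency: `withSuffix` and `cudaSymbolNames` are hypothetical helpers, the base names are copied from the diff, and the suffix values used in the usage note are made up for the example.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Appends Suffix to a base symbol name, as the patch does with `Base + Suffix`.
inline std::string withSuffix(const std::string &Base,
                              const std::string &Suffix) {
  assert(!Base.empty() && "base symbol name must not be empty");
  return Base + Suffix;
}

// Names emitted for one CUDA binary; the base names are taken from the diff.
inline std::vector<std::string> cudaSymbolNames(const std::string &Suffix) {
  return {withSuffix(".fatbin_image", Suffix),
          withSuffix(".fatbin_wrapper", Suffix),
          withSuffix(".cuda.fatbin_reg", Suffix),
          withSuffix(".cuda.fatbin_unreg", Suffix),
          withSuffix(".cuda.binary_handle", Suffix)};
}
```

For example, `cudaSymbolNames(".binary_sm_70")` yields `.fatbin_image.binary_sm_70` and so on, so the symbols of each binary in a module can be told apart by name, while an empty suffix reproduces the original names.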

M, Images,
EntryArray
? *EntryArray
: offloading::getOffloadEntryArray(M, "omp_offloading_entries"),

I'm thinking that this argument shouldn't have a default; it should be up to whoever calls it to create such an array. For the linker wrapper, that would mean calling the offloading utility first. Making these arrays is quite complicated for implicit default behavior if we're expecting other things to happen, I feel.

I made it default so clang-linker-wrapper didn't see any functional changes, while allowing new usages. I think we should revisit this API for project offload.

Think it should be fine to just call this with offloading::getOffloadEntryArray(M, "xxx_offloading_entries") at the callsite. std::optional makes it a little weird here.

I see what you mean. First, some broader context: this patch is part of a patch series that will add GPU compilation for OMP operations in MLIR without the need for flang or clang, which is not currently possible. The series also enables JIT compilation of OMP operations in MLIR. The goal of the series is to make OMP target functional in MLIR as a standalone.

I allow passing a custom entry array because ORC JIT doesn't fully support the __start and __stop symbols for grouping section data. My solution was to allow a custom entry array, so in MLIR I build the full entry array and never rely on sections; this applies to OMP, CUDA and HIP.
Thus, the following MLIR:

module attributes {gpu.container_module} {
  gpu.binary @binary <#gpu.offload_embedding<cuda>> [#gpu.object<#nvvm.target, bin = "BLOB">]
  llvm.func @func() {
    %1 = llvm.mlir.constant(1 : index) : i64
    gpu.launch_func  @binary::@hello blocks in (%1, %1, %1) threads in (%1, %1, %1) : i64
    gpu.launch_func  @binary::@world blocks in (%1, %1, %1) threads in (%1, %1, %1) : i64
    llvm.return
  }
}

Produces:

@__begin_offload_binary = internal constant [2 x %struct.__tgt_offload_entry] [%struct.__tgt_offload_entry { ptr @binary_Khello, ptr @.omp_offloading.entry_name, i64 0, i32 0, i32 0 }, %struct.__tgt_offload_entry { ptr @binary_Kworld, ptr @.omp_offloading.entry_name.2, i64 0, i32 0, i32 0 }]
@__end_offload_binary = internal constant ptr getelementptr inbounds (%struct.__tgt_offload_entry, ptr @__begin_offload_binary, i64 2)
@.fatbin_image.binary = internal constant [4 x i8] c"BLOB", section ".nv_fatbin"
@.fatbin_wrapper.binary = internal constant %fatbin_wrapper { i32 1180844977, i32 1, ptr @.fatbin_image.binary, ptr null }, section ".nvFatBinSegment", align 8
@.cuda.binary_handle.binary = internal global ptr null
@llvm.global_ctors = appending global [1 x { i32, ptr, ptr }] [{ i32, ptr, ptr } { i32 1, ptr @.cuda.fatbin_reg.binary, ptr null }]
@binary_Khello = weak constant i8 0
@.omp_offloading.entry_name = internal unnamed_addr constant [6 x i8] c"hello\00"
@binary_Kworld = weak constant i8 0
@.omp_offloading.entry_name.2 = internal unnamed_addr constant [6 x i8] c"world\00"
...

And this works.
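
The idea of an explicitly built entry array, delimited by its own begin and end pointers instead of linker-provided section symbols, can be sketched in standalone C++. This is a model under stated assumptions, not the generated code: `OffloadEntry`, `HelloStub`, `WorldStub`, `Entries`, `EntriesBegin`, `EntriesEnd`, and `collectEntryNames` are hypothetical names, and the struct only mirrors the `{ ptr, ptr, i64, i32, i32 }` layout visible in the IR.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Mirrors the field layout of %struct.__tgt_offload_entry in the IR:
// { ptr, ptr, i64, i32, i32 }.
struct OffloadEntry {
  const void *Addr;
  const char *Name;
  std::uint64_t Size;
  std::int32_t Flags;
  std::int32_t Data;
};

// Placeholder kernel stubs, analogous to the `weak constant i8 0` globals.
static const char HelloStub = 0;
static const char WorldStub = 0;

// The array is built explicitly, so no section symbols are needed to find it.
static const OffloadEntry Entries[] = {
    {&HelloStub, "hello", 0, 0, 0},
    {&WorldStub, "world", 0, 0, 0},
};
static const OffloadEntry *const EntriesBegin = Entries;
static const OffloadEntry *const EntriesEnd = Entries + 2; // one past the end

// Walks [Begin, End) the way a registration loop would, collecting names.
inline std::vector<std::string> collectEntryNames(const OffloadEntry *Begin,
                                                  const OffloadEntry *End) {
  assert(Begin <= End && "begin must not be past end");
  std::vector<std::string> Names;
  for (const OffloadEntry *E = Begin; E != End; ++E)
    Names.emplace_back(E->Name);
  return Names;
}
```

Because the end pointer is computed from the array itself, a consumer such as a JIT never has to resolve section boundary symbols to iterate the entries.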

@llvmbot llvmbot added the clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' label Jan 15, 2024

@jhuber6 jhuber6 left a comment

Thanks. I'll probably make a patch after this to make the surface handling for CUDA default off because it seems to be unsupported.

@fabianmcg fabianmcg merged commit 9fa9d9a into llvm:main Jan 15, 2024
5 checks passed
@fabianmcg fabianmcg deleted the offload_wrapper branch January 15, 2024 23:56
fabianmcg added a commit to fabianmcg/llvm-project that referenced this pull request Jan 16, 2024
This patch adds the offloading translation attribute. This attribute uses LLVM
offloading infrastructure to embed GPU binaries in the IR. At the program start,
the LLVM offloading mechanism registers kernels and variables with the runtime
library: CUDA RT, HIP RT, or LibOMPTarget.

The offloading mechanism relies on the runtime library to dispatch the correct
kernel based on the registered symbols.

This patch is 3/4 on introducing the OffloadEmbeddingAttr GPU translation
attribute.

Note: Ignore the base commits; those are being reviewed in PRs llvm#78057, llvm#78098,
and llvm#78073.
justinfargnoli pushed a commit to justinfargnoli/llvm-project that referenced this pull request Jan 28, 2024
….* to llvm/Frontend/Offloading (llvm#78057)

This patch moves `clang/tools/clang-linker-wrapper/OffloadWrapper.*` to
`llvm/Frontend/Offloading` allowing them to be re-utilized by other
projects.

Additionally, it makes minor modifications to the API to make it more
flexible.
Concretely:
 - The `wrap*` methods now have additional arguments `EntryArray`, 
`Suffix` and `EmitSurfacesAndTextures` to specify some additional options.
- The `EntryArray` is now constructed by the caller. This change is needed to
enable JIT compilation, as ORC doesn't fully support `__start_` and `__stop_` 
symbols. Thus, to JIT the code, the `EntryArray` has to be constructed explicitly in the IR.
- The `Suffix` field is used when emitting the descriptor, registration
methods, etc, to make them more readable. It is empty by default.
- The `EmitSurfacesAndTextures` field controls whether to emit surface
and texture registration code, as those functions were removed from `CUDART`
in CUDA 12. It is true by default.
- The function `getOffloadingEntryInitializer` was added to help create
the `EntryArray`, as it returns the constant initializer and not a global
variable.
Labels
clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' clang Clang issues not falling into any other category