[llvm][frontend][offloading] Move clang-linker-wrapper/OffloadWrapper.* to llvm/Frontend/Offloading #78057
….* to llvm/Frontend/Offloading
This patch moves `clang/tools/clang-linker-wrapper/OffloadWrapper.*` to `llvm/Frontend/Offloading`, allowing them to be reused by other projects. Additionally, it makes minor modifications to the API to make it more flexible. Concretely:
- The `wrap*` methods are moved to the `OffloadWrapper` class.
- The `OffloadWrapper` class includes `Suffix` and `EmitSurfacesAndTextures` fields to specify some additional options.
- The `Suffix` field is used when emitting the descriptor, registration functions, etc., to make them more readable. It is empty by default.
- The `EmitSurfacesAndTextures` field controls whether to emit surface and texture registration code, as those functions were removed from `CUDART` in CUDA 12. It is true by default.
- The `wrap*` methods now have an optional parameter to specify the `EntryArray`; this change is needed to enable JIT compilation, as ORC doesn't fully support `__start_` and `__stop_` symbols. Thus, to JIT the code, the `EntryArray` has to be constructed explicitly in the IR.
- The function `getOffloadingEntryInitializer` was added to help create the `EntryArray`, as it returns the constant initializer and not a global variable.
Branch updated from 5c3bdac to fad7a36.
@llvm/pr-subscribers-clang-driver @llvm/pr-subscribers-clang
Author: Fabian Mora (fabianmcg)
Changes
This patch moves `clang/tools/clang-linker-wrapper/OffloadWrapper.*` to `llvm/Frontend/Offloading`, allowing them to be reused by other projects. Additionally, it makes minor modifications to the API to make it more flexible.
Patch is 24.07 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/78057.diff
8 Files Affected:
diff --git a/clang/tools/clang-linker-wrapper/CMakeLists.txt b/clang/tools/clang-linker-wrapper/CMakeLists.txt
index 744026a37b22c0..5556869affaa62 100644
--- a/clang/tools/clang-linker-wrapper/CMakeLists.txt
+++ b/clang/tools/clang-linker-wrapper/CMakeLists.txt
@@ -28,7 +28,6 @@ endif()
add_clang_tool(clang-linker-wrapper
ClangLinkerWrapper.cpp
- OffloadWrapper.cpp
DEPENDS
${tablegen_deps}
diff --git a/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp b/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
index 122ba1998eb83f..ebe8b634c7ae73 100644
--- a/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
+++ b/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
@@ -14,11 +14,11 @@
//
//===---------------------------------------------------------------------===//
-#include "OffloadWrapper.h"
#include "clang/Basic/Version.h"
#include "llvm/BinaryFormat/Magic.h"
#include "llvm/Bitcode/BitcodeWriter.h"
#include "llvm/CodeGen/CommandFlags.h"
+#include "llvm/Frontend/Offloading/OffloadWrapper.h"
#include "llvm/IR/Constants.h"
#include "llvm/IR/DiagnosticPrinter.h"
#include "llvm/IR/Module.h"
@@ -906,15 +906,18 @@ wrapDeviceImages(ArrayRef<std::unique_ptr<MemoryBuffer>> Buffers,
switch (Kind) {
case OFK_OpenMP:
- if (Error Err = wrapOpenMPBinaries(M, BuffersToWrap))
+ if (Error Err =
+ offloading::OffloadWrapper().wrapOpenMPBinaries(M, BuffersToWrap))
return std::move(Err);
break;
case OFK_Cuda:
- if (Error Err = wrapCudaBinary(M, BuffersToWrap.front()))
+ if (Error Err = offloading::OffloadWrapper().wrapCudaBinary(
+ M, BuffersToWrap.front()))
return std::move(Err);
break;
case OFK_HIP:
- if (Error Err = wrapHIPBinary(M, BuffersToWrap.front()))
+ if (Error Err = offloading::OffloadWrapper().wrapHIPBinary(
+ M, BuffersToWrap.front()))
return std::move(Err);
break;
default:
diff --git a/clang/tools/clang-linker-wrapper/OffloadWrapper.h b/clang/tools/clang-linker-wrapper/OffloadWrapper.h
deleted file mode 100644
index 679333975b2120..00000000000000
--- a/clang/tools/clang-linker-wrapper/OffloadWrapper.h
+++ /dev/null
@@ -1,28 +0,0 @@
-//===- OffloadWrapper.h --r-------------------------------------*- C++ -*-===//
-//
-// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
-// See https://llvm.org/LICENSE.txt for license information.
-// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-//
-//===----------------------------------------------------------------------===//
-
-#ifndef LLVM_CLANG_TOOLS_CLANG_LINKER_WRAPPER_OFFLOAD_WRAPPER_H
-#define LLVM_CLANG_TOOLS_CLANG_LINKER_WRAPPER_OFFLOAD_WRAPPER_H
-
-#include "llvm/ADT/ArrayRef.h"
-#include "llvm/IR/Module.h"
-
-/// Wraps the input device images into the module \p M as global symbols and
-/// registers the images with the OpenMP Offloading runtime libomptarget.
-llvm::Error wrapOpenMPBinaries(llvm::Module &M,
- llvm::ArrayRef<llvm::ArrayRef<char>> Images);
-
-/// Wraps the input fatbinary image into the module \p M as global symbols and
-/// registers the images with the CUDA runtime.
-llvm::Error wrapCudaBinary(llvm::Module &M, llvm::ArrayRef<char> Images);
-
-/// Wraps the input bundled image into the module \p M as global symbols and
-/// registers the images with the HIP runtime.
-llvm::Error wrapHIPBinary(llvm::Module &M, llvm::ArrayRef<char> Images);
-
-#endif
diff --git a/llvm/include/llvm/Frontend/Offloading/OffloadWrapper.h b/llvm/include/llvm/Frontend/Offloading/OffloadWrapper.h
new file mode 100644
index 00000000000000..6b23f875a8f15f
--- /dev/null
+++ b/llvm/include/llvm/Frontend/Offloading/OffloadWrapper.h
@@ -0,0 +1,62 @@
+//===- OffloadWrapper.h --r-------------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_FRONTEND_OFFLOADING_OFFLOADWRAPPER_H
+#define LLVM_FRONTEND_OFFLOADING_OFFLOADWRAPPER_H
+
+#include "llvm/ADT/ArrayRef.h"
+#include "llvm/IR/Module.h"
+
+namespace llvm {
+namespace offloading {
+/// Class for embedding and registering offloading images and related objects in
+/// a Module.
+class OffloadWrapper {
+public:
+ using EntryArrayTy = std::pair<GlobalVariable *, GlobalVariable *>;
+
+ OffloadWrapper(const Twine &Suffix = "", bool EmitSurfacesAndTextures = true)
+ : Suffix(Suffix.str()), EmitSurfacesAndTextures(EmitSurfacesAndTextures) {
+ }
+
+ /// Wraps the input device images into the module \p M as global symbols and
+ /// registers the images with the OpenMP Offloading runtime libomptarget.
+ /// \param EntryArray Optional pair pointing to the `__start` and `__stop`
+ /// symbols holding the `__tgt_offload_entry` array.
+ llvm::Error wrapOpenMPBinaries(
+ llvm::Module &M, llvm::ArrayRef<llvm::ArrayRef<char>> Images,
+ std::optional<EntryArrayTy> EntryArray = std::nullopt) const;
+
+ /// Wraps the input fatbinary image into the module \p M as global symbols and
+ /// registers the images with the CUDA runtime.
+ /// \param EntryArray Optional pair pointing to the `__start` and `__stop`
+ /// symbols holding the `__tgt_offload_entry` array.
+ llvm::Error
+ wrapCudaBinary(llvm::Module &M, llvm::ArrayRef<char> Images,
+ std::optional<EntryArrayTy> EntryArray = std::nullopt) const;
+
+ /// Wraps the input bundled image into the module \p M as global symbols and
+ /// registers the images with the HIP runtime.
+ /// \param EntryArray Optional pair pointing to the `__start` and `__stop`
+ /// symbols holding the `__tgt_offload_entry` array.
+ llvm::Error
+ wrapHIPBinary(llvm::Module &M, llvm::ArrayRef<char> Images,
+ std::optional<EntryArrayTy> EntryArray = std::nullopt) const;
+
+protected:
+ /// Suffix used when emitting symbols. It defaults to the empty string.
+ std::string Suffix;
+
+ /// Whether to emit surface and textures registration code. It defaults to
+ /// true.
+ bool EmitSurfacesAndTextures;
+};
+} // namespace offloading
+} // namespace llvm
+
+#endif // LLVM_FRONTEND_OFFLOADING_OFFLOADWRAPPER_H
diff --git a/llvm/include/llvm/Frontend/Offloading/Utility.h b/llvm/include/llvm/Frontend/Offloading/Utility.h
index 520c192996a066..f54dd7ba7ab45f 100644
--- a/llvm/include/llvm/Frontend/Offloading/Utility.h
+++ b/llvm/include/llvm/Frontend/Offloading/Utility.h
@@ -61,6 +61,12 @@ StructType *getEntryTy(Module &M);
void emitOffloadingEntry(Module &M, Constant *Addr, StringRef Name,
uint64_t Size, int32_t Flags, int32_t Data,
StringRef SectionName);
+/// Create a constant struct initializer used to register this global at
+/// runtime.
+/// \return the constant struct and the global variable holding the symbol name.
+std::pair<Constant *, GlobalVariable *>
+getOffloadingEntryInitializer(Module &M, Constant *Addr, StringRef Name,
+ uint64_t Size, int32_t Flags, int32_t Data);
/// Creates a pair of globals used to iterate the array of offloading entries by
/// accessing the section variables provided by the linker.
diff --git a/llvm/lib/Frontend/Offloading/CMakeLists.txt b/llvm/lib/Frontend/Offloading/CMakeLists.txt
index 2d0117c9e10059..16e0dcfa0e90d6 100644
--- a/llvm/lib/Frontend/Offloading/CMakeLists.txt
+++ b/llvm/lib/Frontend/Offloading/CMakeLists.txt
@@ -1,5 +1,6 @@
add_llvm_component_library(LLVMFrontendOffloading
Utility.cpp
+ OffloadWrapper.cpp
ADDITIONAL_HEADER_DIRS
${LLVM_MAIN_INCLUDE_DIR}/llvm/Frontend
@@ -9,6 +10,7 @@ add_llvm_component_library(LLVMFrontendOffloading
LINK_COMPONENTS
Core
+ BinaryFormat
Support
TransformUtils
TargetParser
diff --git a/clang/tools/clang-linker-wrapper/OffloadWrapper.cpp b/llvm/lib/Frontend/Offloading/OffloadWrapper.cpp
similarity index 86%
rename from clang/tools/clang-linker-wrapper/OffloadWrapper.cpp
rename to llvm/lib/Frontend/Offloading/OffloadWrapper.cpp
index 161374ae555233..f34a879b99dd02 100644
--- a/clang/tools/clang-linker-wrapper/OffloadWrapper.cpp
+++ b/llvm/lib/Frontend/Offloading/OffloadWrapper.cpp
@@ -6,7 +6,7 @@
//
//===----------------------------------------------------------------------===//
-#include "OffloadWrapper.h"
+#include "llvm/Frontend/Offloading/OffloadWrapper.h"
#include "llvm/ADT/ArrayRef.h"
#include "llvm/BinaryFormat/Magic.h"
#include "llvm/Frontend/Offloading/Utility.h"
@@ -21,8 +21,11 @@
#include "llvm/Transforms/Utils/ModuleUtils.h"
using namespace llvm;
+using namespace llvm::offloading;
namespace {
+using EntryArrayTy = OffloadWrapper::EntryArrayTy;
+
/// Magic number that begins the section containing the CUDA fatbinary.
constexpr unsigned CudaFatMagic = 0x466243b1;
constexpr unsigned HIPFatMagic = 0x48495046;
@@ -110,10 +113,10 @@ PointerType *getBinDescPtrTy(Module &M) {
/// };
///
/// Global variable that represents BinDesc is returned.
-GlobalVariable *createBinDesc(Module &M, ArrayRef<ArrayRef<char>> Bufs) {
+GlobalVariable *createBinDesc(Module &M, ArrayRef<ArrayRef<char>> Bufs,
+ EntryArrayTy EntryArray, StringRef Suffix) {
LLVMContext &C = M.getContext();
- auto [EntriesB, EntriesE] =
- offloading::getOffloadEntryArray(M, "omp_offloading_entries");
+ auto [EntriesB, EntriesE] = EntryArray;
auto *Zero = ConstantInt::get(getSizeTTy(M), 0u);
Constant *ZeroZero[] = {Zero, Zero};
@@ -126,7 +129,7 @@ GlobalVariable *createBinDesc(Module &M, ArrayRef<ArrayRef<char>> Bufs) {
auto *Data = ConstantDataArray::get(C, Buf);
auto *Image = new GlobalVariable(M, Data->getType(), /*isConstant=*/true,
GlobalVariable::InternalLinkage, Data,
- ".omp_offloading.device_image");
+ ".omp_offloading.device_image" + Suffix);
Image->setUnnamedAddr(GlobalValue::UnnamedAddr::Global);
Image->setSection(".llvm.offloading");
Image->setAlignment(Align(object::OffloadBinary::getAlignment()));
@@ -166,7 +169,7 @@ GlobalVariable *createBinDesc(Module &M, ArrayRef<ArrayRef<char>> Bufs) {
auto *Images =
new GlobalVariable(M, ImagesData->getType(), /*isConstant*/ true,
GlobalValue::InternalLinkage, ImagesData,
- ".omp_offloading.device_images");
+ ".omp_offloading.device_images" + Suffix);
Images->setUnnamedAddr(GlobalValue::UnnamedAddr::Global);
auto *ImagesB =
@@ -180,14 +183,15 @@ GlobalVariable *createBinDesc(Module &M, ArrayRef<ArrayRef<char>> Bufs) {
return new GlobalVariable(M, DescInit->getType(), /*isConstant*/ true,
GlobalValue::InternalLinkage, DescInit,
- ".omp_offloading.descriptor");
+ ".omp_offloading.descriptor" + Suffix);
}
-void createRegisterFunction(Module &M, GlobalVariable *BinDesc) {
+void createRegisterFunction(Module &M, GlobalVariable *BinDesc,
+ StringRef Suffix) {
LLVMContext &C = M.getContext();
auto *FuncTy = FunctionType::get(Type::getVoidTy(C), /*isVarArg*/ false);
auto *Func = Function::Create(FuncTy, GlobalValue::InternalLinkage,
- ".omp_offloading.descriptor_reg", &M);
+ ".omp_offloading.descriptor_reg" + Suffix, &M);
Func->setSection(".text.startup");
// Get __tgt_register_lib function declaration.
@@ -210,11 +214,13 @@ void createRegisterFunction(Module &M, GlobalVariable *BinDesc) {
appendToGlobalCtors(M, Func, /*Priority*/ 1);
}
-void createUnregisterFunction(Module &M, GlobalVariable *BinDesc) {
+void createUnregisterFunction(Module &M, GlobalVariable *BinDesc,
+ StringRef Suffix) {
LLVMContext &C = M.getContext();
auto *FuncTy = FunctionType::get(Type::getVoidTy(C), /*isVarArg*/ false);
- auto *Func = Function::Create(FuncTy, GlobalValue::InternalLinkage,
- ".omp_offloading.descriptor_unreg", &M);
+ auto *Func =
+ Function::Create(FuncTy, GlobalValue::InternalLinkage,
+ ".omp_offloading.descriptor_unreg" + Suffix, &M);
Func->setSection(".text.startup");
// Get __tgt_unregister_lib function declaration.
@@ -251,7 +257,8 @@ StructType *getFatbinWrapperTy(Module &M) {
/// Embed the image \p Image into the module \p M so it can be found by the
/// runtime.
-GlobalVariable *createFatbinDesc(Module &M, ArrayRef<char> Image, bool IsHIP) {
+GlobalVariable *createFatbinDesc(Module &M, ArrayRef<char> Image, bool IsHIP,
+ StringRef Suffix) {
LLVMContext &C = M.getContext();
llvm::Type *Int8PtrTy = PointerType::getUnqual(C);
llvm::Triple Triple = llvm::Triple(M.getTargetTriple());
@@ -263,7 +270,7 @@ GlobalVariable *createFatbinDesc(Module &M, ArrayRef<char> Image, bool IsHIP) {
auto *Data = ConstantDataArray::get(C, Image);
auto *Fatbin = new GlobalVariable(M, Data->getType(), /*isConstant*/ true,
GlobalVariable::InternalLinkage, Data,
- ".fatbin_image");
+ ".fatbin_image" + Suffix);
Fatbin->setSection(FatbinConstantSection);
// Create the fatbinary wrapper
@@ -282,7 +289,7 @@ GlobalVariable *createFatbinDesc(Module &M, ArrayRef<char> Image, bool IsHIP) {
auto *FatbinDesc =
new GlobalVariable(M, getFatbinWrapperTy(M),
/*isConstant*/ true, GlobalValue::InternalLinkage,
- FatbinInitializer, ".fatbin_wrapper");
+ FatbinInitializer, ".fatbin_wrapper" + Suffix);
FatbinDesc->setSection(FatbinWrapperSection);
FatbinDesc->setAlignment(Align(8));
@@ -312,10 +319,12 @@ GlobalVariable *createFatbinDesc(Module &M, ArrayRef<char> Image, bool IsHIP) {
/// 0, entry->size, 0, 0);
/// }
/// }
-Function *createRegisterGlobalsFunction(Module &M, bool IsHIP) {
+Function *createRegisterGlobalsFunction(Module &M, bool IsHIP,
+ EntryArrayTy EntryArray,
+ StringRef Suffix,
+ bool EmitSurfacesAndTextures) {
LLVMContext &C = M.getContext();
- auto [EntriesB, EntriesE] = offloading::getOffloadEntryArray(
- M, IsHIP ? "hip_offloading_entries" : "cuda_offloading_entries");
+ auto [EntriesB, EntriesE] = EntryArray;
// Get the __cudaRegisterFunction function declaration.
PointerType *Int8PtrTy = PointerType::get(C, 0);
@@ -339,7 +348,7 @@ Function *createRegisterGlobalsFunction(Module &M, bool IsHIP) {
IsHIP ? "__hipRegisterVar" : "__cudaRegisterVar", RegVarTy);
// Get the __cudaRegisterSurface function declaration.
- auto *RegSurfaceTy =
+ FunctionType *RegSurfaceTy =
FunctionType::get(Type::getVoidTy(C),
{Int8PtrPtrTy, Int8PtrTy, Int8PtrTy, Int8PtrTy,
Type::getInt32Ty(C), Type::getInt32Ty(C)},
@@ -348,7 +357,7 @@ Function *createRegisterGlobalsFunction(Module &M, bool IsHIP) {
IsHIP ? "__hipRegisterSurface" : "__cudaRegisterSurface", RegSurfaceTy);
// Get the __cudaRegisterTexture function declaration.
- auto *RegTextureTy = FunctionType::get(
+ FunctionType *RegTextureTy = FunctionType::get(
Type::getVoidTy(C),
{Int8PtrPtrTy, Int8PtrTy, Int8PtrTy, Int8PtrTy, Type::getInt32Ty(C),
Type::getInt32Ty(C), Type::getInt32Ty(C)},
@@ -454,19 +463,20 @@ Function *createRegisterGlobalsFunction(Module &M, bool IsHIP) {
Builder.CreateBr(IfEndBB);
Switch->addCase(Builder.getInt32(llvm::offloading::OffloadGlobalManagedEntry),
SwManagedBB);
-
// Create surface variable registration code.
Builder.SetInsertPoint(SwSurfaceBB);
- Builder.CreateCall(
- RegSurface, {RegGlobalsFn->arg_begin(), Addr, Name, Name, Data, Extern});
+ if (EmitSurfacesAndTextures)
+ Builder.CreateCall(RegSurface, {RegGlobalsFn->arg_begin(), Addr, Name, Name,
+ Data, Extern});
Builder.CreateBr(IfEndBB);
Switch->addCase(Builder.getInt32(llvm::offloading::OffloadGlobalSurfaceEntry),
SwSurfaceBB);
// Create texture variable registration code.
Builder.SetInsertPoint(SwTextureBB);
- Builder.CreateCall(RegTexture, {RegGlobalsFn->arg_begin(), Addr, Name, Name,
- Data, Normalized, Extern});
+ if (EmitSurfacesAndTextures)
+ Builder.CreateCall(RegTexture, {RegGlobalsFn->arg_begin(), Addr, Name, Name,
+ Data, Normalized, Extern});
Builder.CreateBr(IfEndBB);
Switch->addCase(Builder.getInt32(llvm::offloading::OffloadGlobalTextureEntry),
SwTextureBB);
@@ -497,18 +507,21 @@ Function *createRegisterGlobalsFunction(Module &M, bool IsHIP) {
// Create the constructor and destructor to register the fatbinary with the CUDA
// runtime.
void createRegisterFatbinFunction(Module &M, GlobalVariable *FatbinDesc,
- bool IsHIP) {
+ bool IsHIP,
+ std::optional<EntryArrayTy> EntryArrayOpt,
+ StringRef Suffix,
+ bool EmitSurfacesAndTextures) {
LLVMContext &C = M.getContext();
auto *CtorFuncTy = FunctionType::get(Type::getVoidTy(C), /*isVarArg*/ false);
- auto *CtorFunc =
- Function::Create(CtorFuncTy, GlobalValue::InternalLinkage,
- IsHIP ? ".hip.fatbin_reg" : ".cuda.fatbin_reg", &M);
+ auto *CtorFunc = Function::Create(
+ CtorFuncTy, GlobalValue::InternalLinkage,
+ (IsHIP ? ".hip.fatbin_reg" : ".cuda.fatbin_reg") + Suffix, &M);
CtorFunc->setSection(".text.startup");
auto *DtorFuncTy = FunctionType::get(Type::getVoidTy(C), /*isVarArg*/ false);
- auto *DtorFunc =
- Function::Create(DtorFuncTy, GlobalValue::InternalLinkage,
- IsHIP ? ".hip.fatbin_unreg" : ".cuda.fatbin_unreg", &M);
+ auto *DtorFunc = Function::Create(
+ DtorFuncTy, GlobalValue::InternalLinkage,
+ (IsHIP ? ".hip.fatbin_unreg" : ".cuda.fatbin_unreg") + Suffix, &M);
DtorFunc->setSection(".text.startup");
auto *PtrTy = PointerType::getUnqual(C);
@@ -536,7 +549,7 @@ void createRegisterFatbinFunction(Module &M, GlobalVariable *FatbinDesc,
auto *BinaryHandleGlobal = new llvm::GlobalVariable(
M, PtrTy, false, llvm::GlobalValue::InternalLinkage,
llvm::ConstantPointerNull::get(PtrTy),
- IsHIP ? ".hip.binary_handle" : ".cuda.binary_handle");
+ (IsHIP ? ".hip.binary_handle" : ".cuda.binary_handle") + Suffix);
// Create the constructor to register this image with the runtime.
IRBuilder<> CtorBuilder(BasicBlock::Create(C, "entry", CtorFunc));
@@ -546,7 +559,16 @@ void createRegisterFatbinFunction(Module &M, GlobalVariable *FatbinDesc,
CtorBuilder.CreateAlignedStore(
Handle, BinaryHandleGlobal,
Align(M.getDataLayout().getPointerTypeSize(PtrTy)));
- CtorBuilder.CreateCall(createRegisterGlobalsFunction(M, IsHIP), Handle);
+ EntryArrayTy EntryArray =
+ (EntryArrayOpt ? *EntryArrayOpt
+ : (IsHIP ? offloading::getOffloadEntryArray(
+ M, "hip_offloading_entries")
+ : offloading::getOffloadEntryArray(
+ M, "cuda_offloading_entries")));
+ CtorBuilder.CreateCall(createRegisterGlobalsFunction(M, IsHIP, EntryArray,
+ Suffix,
+ EmitSurfacesAndTextures),
+ Handle);
if (!IsHIP)
CtorBuilder.CreateCall(RegFatbinEnd, Handle);
CtorBuilder.CreateCall(AtExit, DtorFunc);
@@ -568,32 +590,45 @@ void createRegisterFatbinFunction(Module &M, GlobalVariable *FatbinDesc,
} // namespace
-Error wrapOpenMPBinaries(Module &M, ArrayRef<ArrayRef<char>> Images) {
- GlobalVariable *Desc = createBinDesc(M, Images);
+Error OffloadWrapper:...
[truncated]
Thanks, some comments.
if (!Desc)
  return createStringError(inconvertibleErrorCode(),
                           "No fatinbary section created.");

- createRegisterFatbinFunction(M, Desc, /* IsHIP */ true);
+ createRegisterFatbinFunction(M, Desc, /* IsHIP */ true, EntryArray, Suffix,
Can you fix these comments while you're at it? LLVM inline comments should be /*IsHIP=*/
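A minimal illustration of the argument-comment style being requested; `describeRuntime` and `hipRuntimeName` are hypothetical helpers, not code from the patch. The `/*IsHIP=*/true` form (rather than `/* IsHIP */ true`) is the one that clang-tidy's `bugprone-argument-comment` check can match against the parameter name.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper with a bare boolean parameter (not the LLVM function).
std::string describeRuntime(bool IsHIP) { return IsHIP ? "HIP" : "CUDA"; }

// LLVM style: the comment names the parameter and ends with '=', placed
// directly before the literal it labels.
std::string hipRuntimeName() { return describeRuntime(/*IsHIP=*/true); }
```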
namespace offloading {
/// Class for embedding and registering offloading images and related objects in
/// a Module.
class OffloadWrapper {
I feel like these should just be free functions, with the extra two bits of state here passed as additional default arguments, like you've done with `EntryArray`.
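A sketch of the free-function shape suggested here, assuming the two fields simply become trailing default arguments. `Module` and `GlobalVariable` are hypothetical stand-ins so the snippet compiles without LLVM, and `wrapCudaBinarySketch` is not the merged signature; it only records in a log what it would emit.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>

// Hypothetical stand-ins for llvm::Module and llvm::GlobalVariable.
struct Module {
  std::string Log; // records what the sketch would emit
};
struct GlobalVariable {
  std::string Name;
};
using EntryArrayTy = std::pair<GlobalVariable *, GlobalVariable *>;

// Former class state (Suffix, EmitSurfacesAndTextures) as default arguments.
bool wrapCudaBinarySketch(Module &M, const std::string &Image,
                          std::optional<EntryArrayTy> EntryArray = std::nullopt,
                          const std::string &Suffix = "",
                          bool EmitSurfacesAndTextures = true) {
  if (Image.empty())
    return false; // nothing to embed
  M.Log += ".fatbin_image" + Suffix + ";";
  // Explicit array (e.g. for ORC JIT) versus linker-provided section bounds.
  M.Log += EntryArray ? "explicit-entries;" : "section-entries;";
  if (EmitSurfacesAndTextures)
    M.Log += "surfaces-and-textures;";
  return true;
}
```

With this shape, a call that passes only the module and image keeps the current behavior, and a caller such as MLIR can opt in per call without constructing an object first.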
/// Whether to emit surface and textures registration code. It defaults to
/// false.
bool EmitSurfacesAndTextures;
So, I wasn't sure about this either. I know that CUDA emits these `__cudaRegisterSurface` calls, but I can't seem to find them in any of the exported libraries. That caused linker errors, and I was too lazy to fix it. I'm wondering if they've been deprecated; maybe @Artem-B knows.
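The effect of the `EmitSurfacesAndTextures` guard in the diff can be modeled without LLVM as below. `EntryKind` and `registrationCallFor` are hypothetical; only the CUDA entry-point names come from the diff. When the flag is false, the surface and texture cases in the generated code still branch to the common exit block, so those entries are skipped rather than rejected.

```cpp
#include <cassert>
#include <string>

// Hypothetical entry classification; the real code switches on the
// llvm::offloading::OffloadGlobal*Entry flags stored in each entry.
enum class EntryKind { Kernel, Global, Surface, Texture };

// Name of the CUDA runtime call the generated registration function would
// emit for one entry, or "" when the call is omitted.
std::string registrationCallFor(EntryKind Kind, bool EmitSurfacesAndTextures) {
  switch (Kind) {
  case EntryKind::Kernel:
    return "__cudaRegisterFunction";
  case EntryKind::Global:
    return "__cudaRegisterVar";
  case EntryKind::Surface:
    // Guarded: per the commit message, CUDART dropped this in CUDA 12.
    return EmitSurfacesAndTextures ? "__cudaRegisterSurface" : "";
  case EntryKind::Texture:
    return EmitSurfacesAndTextures ? "__cudaRegisterTexture" : "";
  }
  return "";
}
```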
if (!Desc)
  return createStringError(inconvertibleErrorCode(),
                           "No binary descriptors created.");
- createRegisterFunction(M, Desc);
- createUnregisterFunction(M, Desc);
+ createRegisterFunction(M, Desc, Suffix);
What is the `Suffix` for, exactly? It might be better to just give it some generic name, since the existing uses currently always have `_cuda_` or `_omp_` as part of the name.
So, in MLIR we can have multiple binaries, PTX, fatbinaries in a single IR module:
gpu.binary @binary_sm_70 [#gpu.object<#nvvm.target<chip="sm_70">, "BINARY BLOB">]
gpu.binary @binary_gfx90a [#gpu.object<#rocdl.target<chip="gfx90a">, "BINARY BLOB">]
...
// Call `kernel_name` in `binary_sm_70`
gpu.launch_func @binary_sm_70::kernel_name
// Call `kernel_name` in `binary_gfx90a`
gpu.launch_func @binary_gfx90a::kernel_name
I added the suffix field so that in MLIR we can append the binary identifier to the descriptor, registration constructor, etc. This makes the IR more readable.
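A small model of the naming scheme, using base names that appear in the diff (`.fatbin_image`, `.fatbin_wrapper`, `.cuda.binary_handle`, `.cuda.fatbin_reg`, `.cuda.fatbin_unreg`). The IR dump further down in this thread (`@.fatbin_image.binary`, `@.cuda.fatbin_reg.binary`) is consistent with a suffix of `.binary`; how MLIR builds that string is an assumption here, and `cudaSymbolNames` is a hypothetical helper. Without a suffix, LLVM would still keep colliding names distinct by appending a number (as with `@.omp_offloading.entry_name.2`), so the gain is readability rather than correctness.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Mirrors expressions in the diff such as ".fatbin_image" + Suffix and
// (IsHIP ? ".hip.fatbin_reg" : ".cuda.fatbin_reg") + Suffix, CUDA case only.
std::vector<std::string> cudaSymbolNames(const std::string &Suffix) {
  static const char *const Bases[] = {".fatbin_image", ".fatbin_wrapper",
                                      ".cuda.binary_handle", ".cuda.fatbin_reg",
                                      ".cuda.fatbin_unreg"};
  std::vector<std::string> Names;
  for (const char *Base : Bases)
    Names.push_back(std::string(Base) + Suffix);
  return Names;
}
```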
M, Images,
EntryArray
    ? *EntryArray
    : offloading::getOffloadEntryArray(M, "omp_offloading_entries"),
I'm thinking this argument shouldn't have a default; it should be up to the caller to create such an array. For the linker wrapper, that would mean calling the offloading utility first. I feel that making these arrays is too complicated to be implicit default behavior if we're expecting other things to happen.
I made it a default argument so that `clang-linker-wrapper` wouldn't see any functional changes, while still allowing new uses. I think we should revisit this API for project offload.
I think it should be fine to just call this with `offloading::getOffloadEntryArray(M, "xxx_offloading_entries")` at the call site. `std::optional` makes it a little weird here.
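A sketch of the call-site choice being discussed: the caller picks the entries section and would pass it to `offloading::getOffloadEntryArray(M, ...)` itself. The section names come from the diff; `OffloadKind`, `entrySectionFor` and the two symbol helpers are hypothetical. The `__start_`/`__stop_` spelling reflects the ELF linker convention (GNU ld, lld) of synthesizing bounds for sections whose names are valid C identifiers, which is what the default path relies on.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical mirror of the linker wrapper's OFK_OpenMP/OFK_Cuda/OFK_HIP.
enum class OffloadKind { OpenMP, Cuda, HIP };

// Section names passed to offloading::getOffloadEntryArray in the diff.
std::string entrySectionFor(OffloadKind Kind) {
  switch (Kind) {
  case OffloadKind::OpenMP:
    return "omp_offloading_entries";
  case OffloadKind::Cuda:
    return "cuda_offloading_entries";
  case OffloadKind::HIP:
    return "hip_offloading_entries";
  }
  throw std::invalid_argument("unknown OffloadKind");
}

// Linker-synthesized bounds of that section (ELF convention).
std::string startSymbolFor(OffloadKind Kind) {
  return "__start_" + entrySectionFor(Kind);
}
std::string stopSymbolFor(OffloadKind Kind) {
  return "__stop_" + entrySectionFor(Kind);
}
```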
I see what you mean. First, some broader context: this patch is also part of a patch series that will add GPU compilation for OMP operations in MLIR without the need for `flang` or `clang`, which is not currently possible. The series also enables JIT compilation of OMP operations in MLIR. Its goal is to make OMP target functional in MLIR as a standalone.
I allow passing a custom entry array because ORC JIT doesn't fully support the `__start` and `__stop` symbols used for grouping section data. My solution was to allow a custom entry array, so in MLIR I build the full entry array and never rely on sections; this applies to OMP, CUDA and HIP.
Thus, the following MLIR:
module attributes {gpu.container_module} {
gpu.binary @binary <#gpu.offload_embedding<cuda>> [#gpu.object<#nvvm.target, bin = "BLOB">]
llvm.func @func() {
%1 = llvm.mlir.constant(1 : index) : i64
gpu.launch_func @binary::@hello blocks in (%1, %1, %1) threads in (%1, %1, %1) : i64
gpu.launch_func @binary::@world blocks in (%1, %1, %1) threads in (%1, %1, %1) : i64
llvm.return
}
}
Produces:
@__begin_offload_binary = internal constant [2 x %struct.__tgt_offload_entry] [%struct.__tgt_offload_entry { ptr @binary_Khello, ptr @.omp_offloading.entry_name, i64 0, i32 0, i32 0 }, %struct.__tgt_offload_entry { ptr @binary_Kworld, ptr @.omp_offloading.entry_name.2, i64 0, i32 0, i32 0 }]
@__end_offload_binary = internal constant ptr getelementptr inbounds (%struct.__tgt_offload_entry, ptr @__begin_offload_binary, i64 2)
@.fatbin_image.binary = internal constant [4 x i8] c"BLOB", section ".nv_fatbin"
@.fatbin_wrapper.binary = internal constant %fatbin_wrapper { i32 1180844977, i32 1, ptr @.fatbin_image.binary, ptr null }, section ".nvFatBinSegment", align 8
@.cuda.binary_handle.binary = internal global ptr null
@llvm.global_ctors = appending global [1 x { i32, ptr, ptr }] [{ i32, ptr, ptr } { i32 1, ptr @.cuda.fatbin_reg.binary, ptr null }]
@binary_Khello = weak constant i8 0
@.omp_offloading.entry_name = internal unnamed_addr constant [6 x i8] c"hello\00"
@binary_Kworld = weak constant i8 0
@.omp_offloading.entry_name.2 = internal unnamed_addr constant [6 x i8] c"world\00"
...
And this works.
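For readers less used to the IR, the same explicit-array idea can be sketched in plain C++. `OffloadEntry` mirrors the five fields of `%struct.__tgt_offload_entry` in the dump above; all variable and function names are hypothetical. Because the array and its end pointer are ordinary constants, nothing depends on the linker (or ORC) synthesizing section bounds.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Field layout of %struct.__tgt_offload_entry in the IR above:
// { ptr addr, ptr name, i64 size, i32 flags, i32 data }.
struct OffloadEntry {
  const void *Addr;
  const char *Name;
  std::int64_t Size;
  std::int32_t Flags;
  std::int32_t Data;
};

// Hypothetical stand-ins for @binary_Khello and @binary_Kworld.
static const char BinaryKHello = 0;
static const char BinaryKWorld = 0;

// Counterpart of @__begin_offload_binary: an explicitly built array.
static const OffloadEntry BeginOffloadBinary[] = {
    {&BinaryKHello, "hello", 0, 0, 0},
    {&BinaryKWorld, "world", 0, 0, 0},
};
// Counterpart of @__end_offload_binary (the GEP with index 2).
static const OffloadEntry *const EndOffloadBinary = BeginOffloadBinary + 2;

// Walks [Begin, End) as a registration function would, collecting the
// names it would register with the runtime.
std::vector<std::string> registeredNames(const OffloadEntry *Begin,
                                         const OffloadEntry *End) {
  std::vector<std::string> Names;
  for (const OffloadEntry *E = Begin; E != End; ++E)
    Names.push_back(E->Name);
  return Names;
}
```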
Thanks. I'll probably make a patch after this to make the surface handling for CUDA default off because it seems to be unsupported.
This patch adds the offloading translation attribute. This attribute uses the LLVM offloading infrastructure to embed GPU binaries in the IR. At program start, the LLVM offloading mechanism registers kernels and variables with the runtime library: CUDA RT, HIP RT, or LibOMPTarget. The offloading mechanism relies on the runtime library to dispatch the correct kernel based on the registered symbols. This patch is 3/4 in the series introducing the OffloadEmbeddingAttr GPU translation attribute. Note: Ignore the base commits; those are being reviewed in PRs llvm#78057, llvm#78098, and llvm#78073.
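A rough C++ analogue of "registers kernels and variables with the runtime library at program start", assuming a simple in-process registry in place of CUDA RT, HIP RT or LibOMPTarget. All names here are hypothetical; in the generated IR the equivalent constructor (for example `.cuda.fatbin_reg.binary`) is listed in `@llvm.global_ctors`, as in the IR dump earlier in this thread.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical runtime-side registry standing in for the real runtime.
std::vector<std::string> &kernelRegistry() {
  static std::vector<std::string> Registry; // initialized on first use
  return Registry;
}

// Stand-in for the generated constructor: a static object whose
// constructor runs during static initialization, before main.
struct RegisterEmbeddedKernels {
  RegisterEmbeddedKernels() {
    kernelRegistry().push_back("hello");
    kernelRegistry().push_back("world");
  }
};
static RegisterEmbeddedKernels RegisterAtStartup;

// After startup, dispatch can look kernels up by their registered name.
bool isRegistered(const std::string &Name) {
  for (const std::string &K : kernelRegistry())
    if (K == Name)
      return true;
  return false;
}
```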
….* to llvm/Frontend/Offloading (llvm#78057)
This patch moves `clang/tools/clang-linker-wrapper/OffloadWrapper.*` to `llvm/Frontend/Offloading`, allowing them to be reused by other projects. Additionally, it makes minor modifications to the API to make it more flexible. Concretely:
- The `wrap*` methods now have additional arguments `EntryArray`, `Suffix` and `EmitSurfacesAndTextures` to specify some additional options.
- The `EntryArray` is now constructed by the caller. This change is needed to enable JIT compilation, as ORC doesn't fully support `__start_` and `__stop_` symbols. Thus, to JIT the code, the `EntryArray` has to be constructed explicitly in the IR.
- The `Suffix` argument is used when emitting the descriptor, registration functions, etc., to make them more readable. It is empty by default.
- The `EmitSurfacesAndTextures` argument controls whether to emit surface and texture registration code, as those functions were removed from `CUDART` in CUDA 12. It is true by default.
- The function `getOffloadingEntryInitializer` was added to help create the `EntryArray`, as it returns the constant initializer and not a global variable.