[OpenMP] Rework handling of global ctor/dtors in OpenMP #71739
Conversation
@llvm/pr-subscribers-backend-amdgpu @llvm/pr-subscribers-clang @llvm/pr-subscribers-clang-codegen Author: Joseph Huber (jhuber6) Changes
Patch is 57.79 KiB, truncated to 20.00 KiB below. Full version: https://github.com/llvm/llvm-project/pull/71739.diff
21 Files Affected:
diff --git a/clang/lib/CodeGen/CGDeclCXX.cpp b/clang/lib/CodeGen/CGDeclCXX.cpp
index 3fa28b343663f61..d816aa8554df8bb 100644
--- a/clang/lib/CodeGen/CGDeclCXX.cpp
+++ b/clang/lib/CodeGen/CGDeclCXX.cpp
@@ -22,6 +22,7 @@
#include "llvm/IR/Intrinsics.h"
#include "llvm/IR/MDBuilder.h"
#include "llvm/Support/Path.h"
+#include "llvm/Transforms/Utils/ModuleUtils.h"
using namespace clang;
using namespace CodeGen;
@@ -327,6 +328,15 @@ void CodeGenFunction::registerGlobalDtorWithAtExit(const VarDecl &VD,
registerGlobalDtorWithAtExit(dtorStub);
}
+/// Register a global destructor using the LLVM 'llvm.global_dtors' global.
+void CodeGenFunction::registerGlobalDtorWithLLVM(const VarDecl &VD,
+ llvm::FunctionCallee Dtor,
+ llvm::Constant *Addr) {
+ // Create a function which calls the destructor.
+ llvm::Function *dtorStub = createAtExitStub(VD, Dtor, Addr);
+ CGM.AddGlobalDtor(dtorStub);
+}
+
void CodeGenFunction::registerGlobalDtorWithAtExit(llvm::Constant *dtorStub) {
// extern "C" int atexit(void (*f)(void));
assert(dtorStub->getType() ==
@@ -519,10 +529,6 @@ CodeGenModule::EmitCXXGlobalVarDeclInitFunc(const VarDecl *D,
D->hasAttr<CUDASharedAttr>()))
return;
- if (getLangOpts().OpenMP &&
- getOpenMPRuntime().emitDeclareTargetVarDefinition(D, Addr, PerformInit))
- return;
-
// Check if we've already initialized this decl.
auto I = DelayedCXXInitPosition.find(D);
if (I != DelayedCXXInitPosition.end() && I->second == ~0U)
diff --git a/clang/lib/CodeGen/CGOpenMPRuntime.cpp b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
index a8e1150e44566b8..d2be8141a3a4b31 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntime.cpp
+++ b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
@@ -1747,136 +1747,6 @@ llvm::Function *CGOpenMPRuntime::emitThreadPrivateVarDefinition(
return nullptr;
}
-bool CGOpenMPRuntime::emitDeclareTargetVarDefinition(const VarDecl *VD,
- llvm::GlobalVariable *Addr,
- bool PerformInit) {
- if (CGM.getLangOpts().OMPTargetTriples.empty() &&
- !CGM.getLangOpts().OpenMPIsTargetDevice)
- return false;
- std::optional<OMPDeclareTargetDeclAttr::MapTypeTy> Res =
- OMPDeclareTargetDeclAttr::isDeclareTargetDeclaration(VD);
- if (!Res || *Res == OMPDeclareTargetDeclAttr::MT_Link ||
- ((*Res == OMPDeclareTargetDeclAttr::MT_To ||
- *Res == OMPDeclareTargetDeclAttr::MT_Enter) &&
- HasRequiresUnifiedSharedMemory))
- return CGM.getLangOpts().OpenMPIsTargetDevice;
- VD = VD->getDefinition(CGM.getContext());
- assert(VD && "Unknown VarDecl");
-
- if (!DeclareTargetWithDefinition.insert(CGM.getMangledName(VD)).second)
- return CGM.getLangOpts().OpenMPIsTargetDevice;
-
- QualType ASTTy = VD->getType();
- SourceLocation Loc = VD->getCanonicalDecl()->getBeginLoc();
-
- // Produce the unique prefix to identify the new target regions. We use
- // the source location of the variable declaration which we know to not
- // conflict with any target region.
- llvm::TargetRegionEntryInfo EntryInfo =
- getEntryInfoFromPresumedLoc(CGM, OMPBuilder, Loc, VD->getName());
- SmallString<128> Buffer, Out;
- OMPBuilder.OffloadInfoManager.getTargetRegionEntryFnName(Buffer, EntryInfo);
-
- const Expr *Init = VD->getAnyInitializer();
- if (CGM.getLangOpts().CPlusPlus && PerformInit) {
- llvm::Constant *Ctor;
- llvm::Constant *ID;
- if (CGM.getLangOpts().OpenMPIsTargetDevice) {
- // Generate function that re-emits the declaration's initializer into
- // the threadprivate copy of the variable VD
- CodeGenFunction CtorCGF(CGM);
-
- const CGFunctionInfo &FI = CGM.getTypes().arrangeNullaryFunction();
- llvm::FunctionType *FTy = CGM.getTypes().GetFunctionType(FI);
- llvm::Function *Fn = CGM.CreateGlobalInitOrCleanUpFunction(
- FTy, Twine(Buffer, "_ctor"), FI, Loc, false,
- llvm::GlobalValue::WeakODRLinkage);
- Fn->setVisibility(llvm::GlobalValue::ProtectedVisibility);
- if (CGM.getTriple().isAMDGCN())
- Fn->setCallingConv(llvm::CallingConv::AMDGPU_KERNEL);
- auto NL = ApplyDebugLocation::CreateEmpty(CtorCGF);
- CtorCGF.StartFunction(GlobalDecl(), CGM.getContext().VoidTy, Fn, FI,
- FunctionArgList(), Loc, Loc);
- auto AL = ApplyDebugLocation::CreateArtificial(CtorCGF);
- llvm::Constant *AddrInAS0 = Addr;
- if (Addr->getAddressSpace() != 0)
- AddrInAS0 = llvm::ConstantExpr::getAddrSpaceCast(
- Addr, llvm::PointerType::get(CGM.getLLVMContext(), 0));
- CtorCGF.EmitAnyExprToMem(Init,
- Address(AddrInAS0, Addr->getValueType(),
- CGM.getContext().getDeclAlign(VD)),
- Init->getType().getQualifiers(),
- /*IsInitializer=*/true);
- CtorCGF.FinishFunction();
- Ctor = Fn;
- ID = Fn;
- } else {
- Ctor = new llvm::GlobalVariable(
- CGM.getModule(), CGM.Int8Ty, /*isConstant=*/true,
- llvm::GlobalValue::PrivateLinkage,
- llvm::Constant::getNullValue(CGM.Int8Ty), Twine(Buffer, "_ctor"));
- ID = Ctor;
- }
-
- // Register the information for the entry associated with the constructor.
- Out.clear();
- auto CtorEntryInfo = EntryInfo;
- CtorEntryInfo.ParentName = Twine(Buffer, "_ctor").toStringRef(Out);
- OMPBuilder.OffloadInfoManager.registerTargetRegionEntryInfo(
- CtorEntryInfo, Ctor, ID,
- llvm::OffloadEntriesInfoManager::OMPTargetRegionEntryCtor);
- }
- if (VD->getType().isDestructedType() != QualType::DK_none) {
- llvm::Constant *Dtor;
- llvm::Constant *ID;
- if (CGM.getLangOpts().OpenMPIsTargetDevice) {
- // Generate function that emits destructor call for the threadprivate
- // copy of the variable VD
- CodeGenFunction DtorCGF(CGM);
-
- const CGFunctionInfo &FI = CGM.getTypes().arrangeNullaryFunction();
- llvm::FunctionType *FTy = CGM.getTypes().GetFunctionType(FI);
- llvm::Function *Fn = CGM.CreateGlobalInitOrCleanUpFunction(
- FTy, Twine(Buffer, "_dtor"), FI, Loc, false,
- llvm::GlobalValue::WeakODRLinkage);
- Fn->setVisibility(llvm::GlobalValue::ProtectedVisibility);
- if (CGM.getTriple().isAMDGCN())
- Fn->setCallingConv(llvm::CallingConv::AMDGPU_KERNEL);
- auto NL = ApplyDebugLocation::CreateEmpty(DtorCGF);
- DtorCGF.StartFunction(GlobalDecl(), CGM.getContext().VoidTy, Fn, FI,
- FunctionArgList(), Loc, Loc);
- // Create a scope with an artificial location for the body of this
- // function.
- auto AL = ApplyDebugLocation::CreateArtificial(DtorCGF);
- llvm::Constant *AddrInAS0 = Addr;
- if (Addr->getAddressSpace() != 0)
- AddrInAS0 = llvm::ConstantExpr::getAddrSpaceCast(
- Addr, llvm::PointerType::get(CGM.getLLVMContext(), 0));
- DtorCGF.emitDestroy(Address(AddrInAS0, Addr->getValueType(),
- CGM.getContext().getDeclAlign(VD)),
- ASTTy, DtorCGF.getDestroyer(ASTTy.isDestructedType()),
- DtorCGF.needsEHCleanup(ASTTy.isDestructedType()));
- DtorCGF.FinishFunction();
- Dtor = Fn;
- ID = Fn;
- } else {
- Dtor = new llvm::GlobalVariable(
- CGM.getModule(), CGM.Int8Ty, /*isConstant=*/true,
- llvm::GlobalValue::PrivateLinkage,
- llvm::Constant::getNullValue(CGM.Int8Ty), Twine(Buffer, "_dtor"));
- ID = Dtor;
- }
- // Register the information for the entry associated with the destructor.
- Out.clear();
- auto DtorEntryInfo = EntryInfo;
- DtorEntryInfo.ParentName = Twine(Buffer, "_dtor").toStringRef(Out);
- OMPBuilder.OffloadInfoManager.registerTargetRegionEntryInfo(
- DtorEntryInfo, Dtor, ID,
- llvm::OffloadEntriesInfoManager::OMPTargetRegionEntryDtor);
- }
- return CGM.getLangOpts().OpenMPIsTargetDevice;
-}
-
void CGOpenMPRuntime::emitDeclareTargetFunction(const FunctionDecl *FD,
llvm::GlobalValue *GV) {
std::optional<OMPDeclareTargetDeclAttr *> ActiveAttr =
diff --git a/clang/lib/CodeGen/CGOpenMPRuntime.h b/clang/lib/CodeGen/CGOpenMPRuntime.h
index 0c4ad46e881b9c5..b01b39abd1606af 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntime.h
+++ b/clang/lib/CodeGen/CGOpenMPRuntime.h
@@ -1089,14 +1089,6 @@ class CGOpenMPRuntime {
SourceLocation Loc, bool PerformInit,
CodeGenFunction *CGF = nullptr);
- /// Emit a code for initialization of declare target variable.
- /// \param VD Declare target variable.
- /// \param Addr Address of the global variable \a VD.
- /// \param PerformInit true if initialization expression is not constant.
- virtual bool emitDeclareTargetVarDefinition(const VarDecl *VD,
- llvm::GlobalVariable *Addr,
- bool PerformInit);
-
/// Emit code for handling declare target functions in the runtime.
/// \param FD Declare target function.
/// \param Addr Address of the global \a FD.
diff --git a/clang/lib/CodeGen/CodeGenFunction.h b/clang/lib/CodeGen/CodeGenFunction.h
index 42f94c9b540191e..f25e03b02762628 100644
--- a/clang/lib/CodeGen/CodeGenFunction.h
+++ b/clang/lib/CodeGen/CodeGenFunction.h
@@ -4527,6 +4527,11 @@ class CodeGenFunction : public CodeGenTypeCache {
void registerGlobalDtorWithAtExit(const VarDecl &D, llvm::FunctionCallee fn,
llvm::Constant *addr);
+ /// Registers the dtor using 'llvm.global_dtors' for platforms that do not
+ /// support an 'atexit()' function.
+ void registerGlobalDtorWithLLVM(const VarDecl &D, llvm::FunctionCallee fn,
+ llvm::Constant *addr);
+
/// Call atexit() with function dtorStub.
void registerGlobalDtorWithAtExit(llvm::Constant *dtorStub);
diff --git a/clang/lib/CodeGen/CodeGenModule.h b/clang/lib/CodeGen/CodeGenModule.h
index 793861f23b15f95..e81edc979c208b1 100644
--- a/clang/lib/CodeGen/CodeGenModule.h
+++ b/clang/lib/CodeGen/CodeGenModule.h
@@ -1570,6 +1570,13 @@ class CodeGenModule : public CodeGenTypeCache {
const VarDecl *D,
ForDefinition_t IsForDefinition = NotForDefinition);
+ // FIXME: Hardcoding priority here is gross.
+ void AddGlobalCtor(llvm::Function *Ctor, int Priority = 65535,
+ unsigned LexOrder = ~0U,
+ llvm::Constant *AssociatedData = nullptr);
+ void AddGlobalDtor(llvm::Function *Dtor, int Priority = 65535,
+ bool IsDtorAttrFunc = false);
+
private:
llvm::Constant *GetOrCreateLLVMFunction(
StringRef MangledName, llvm::Type *Ty, GlobalDecl D, bool ForVTable,
@@ -1641,13 +1648,6 @@ class CodeGenModule : public CodeGenTypeCache {
void EmitPointerToInitFunc(const VarDecl *VD, llvm::GlobalVariable *Addr,
llvm::Function *InitFunc, InitSegAttr *ISA);
- // FIXME: Hardcoding priority here is gross.
- void AddGlobalCtor(llvm::Function *Ctor, int Priority = 65535,
- unsigned LexOrder = ~0U,
- llvm::Constant *AssociatedData = nullptr);
- void AddGlobalDtor(llvm::Function *Dtor, int Priority = 65535,
- bool IsDtorAttrFunc = false);
-
/// EmitCtorList - Generates a global array of functions and priorities using
/// the given list and name. This array will have appending linkage and is
/// suitable for use as a LLVM constructor or destructor array. Clears Fns.
diff --git a/clang/lib/CodeGen/ItaniumCXXABI.cpp b/clang/lib/CodeGen/ItaniumCXXABI.cpp
index 89a2127f3761af4..af022002cd5702a 100644
--- a/clang/lib/CodeGen/ItaniumCXXABI.cpp
+++ b/clang/lib/CodeGen/ItaniumCXXABI.cpp
@@ -2794,6 +2794,13 @@ void ItaniumCXXABI::registerGlobalDtor(CodeGenFunction &CGF, const VarDecl &D,
if (D.isNoDestroy(CGM.getContext()))
return;
+ // OpenMP offloading supports C++ constructors and destructors but we do not
+ // always have 'atexit' available. Instead lower these to use the LLVM global
+ // destructors which we can handle directly in the runtime.
+ if (CGM.getLangOpts().OpenMP && CGM.getLangOpts().OpenMPIsTargetDevice &&
+ (CGM.getTriple().isAMDGPU() || CGM.getTriple().isNVPTX()))
+ return CGF.registerGlobalDtorWithLLVM(D, dtor, addr);
+
// emitGlobalDtorWithCXAAtExit will emit a call to either __cxa_thread_atexit
// or __cxa_atexit depending on whether this VarDecl is a thread-local storage
// or not. CXAAtExit controls only __cxa_atexit, so use it if it is enabled.
diff --git a/clang/test/Headers/amdgcn_openmp_device_math_constexpr.cpp b/clang/test/Headers/amdgcn_openmp_device_math_constexpr.cpp
index a5bb949ccaad3ac..0fdc02edc15086f 100644
--- a/clang/test/Headers/amdgcn_openmp_device_math_constexpr.cpp
+++ b/clang/test/Headers/amdgcn_openmp_device_math_constexpr.cpp
@@ -35,7 +35,7 @@ const float constexpr_fmaxf_f32 = fmaxf(2.0f, -4.0f);
#pragma omp end declare target
-// CHECK-LABEL: define {{[^@]+}}@{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}_constexpr_fabsf_f32_l14_ctor
+// CHECK-LABEL: define {{[^@]+}}@__cxx_global_var_init
// CHECK-SAME: () #[[ATTR0:[0-9]+]] {
// CHECK-NEXT: entry:
// CHECK-NEXT: [[RETVAL_I:%.*]] = alloca float, align 4, addrspace(5)
@@ -49,7 +49,7 @@ const float constexpr_fmaxf_f32 = fmaxf(2.0f, -4.0f);
// CHECK-NEXT: ret void
//
//
-// CHECK-LABEL: define {{[^@]+}}@{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}_constexpr_fabs_f32_l15_ctor
+// CHECK-LABEL: define {{[^@]+}}@__cxx_global_var_init.1
// CHECK-SAME: () #[[ATTR0]] {
// CHECK-NEXT: entry:
// CHECK-NEXT: [[RETVAL_I_I:%.*]] = alloca float, align 4, addrspace(5)
@@ -69,7 +69,7 @@ const float constexpr_fmaxf_f32 = fmaxf(2.0f, -4.0f);
// CHECK-NEXT: ret void
//
//
-// CHECK-LABEL: define {{[^@]+}}@{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}_constexpr_sinf_f32_l17_ctor
+// CHECK-LABEL: define {{[^@]+}}@__cxx_global_var_init.2
// CHECK-SAME: () #[[ATTR0]] {
// CHECK-NEXT: entry:
// CHECK-NEXT: [[RETVAL_I:%.*]] = alloca float, align 4, addrspace(5)
@@ -83,7 +83,7 @@ const float constexpr_fmaxf_f32 = fmaxf(2.0f, -4.0f);
// CHECK-NEXT: ret void
//
//
-// CHECK-LABEL: define {{[^@]+}}@{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}_constexpr_sin_f32_l18_ctor
+// CHECK-LABEL: define {{[^@]+}}@__cxx_global_var_init.3
// CHECK-SAME: () #[[ATTR0]] {
// CHECK-NEXT: entry:
// CHECK-NEXT: [[RETVAL_I_I:%.*]] = alloca float, align 4, addrspace(5)
@@ -103,7 +103,7 @@ const float constexpr_fmaxf_f32 = fmaxf(2.0f, -4.0f);
// CHECK-NEXT: ret void
//
//
-// CHECK-LABEL: define {{[^@]+}}@{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}_constexpr_cosf_f32_l20_ctor
+// CHECK-LABEL: define {{[^@]+}}@__cxx_global_var_init.4
// CHECK-SAME: () #[[ATTR0]] {
// CHECK-NEXT: entry:
// CHECK-NEXT: [[RETVAL_I:%.*]] = alloca float, align 4, addrspace(5)
@@ -117,7 +117,7 @@ const float constexpr_fmaxf_f32 = fmaxf(2.0f, -4.0f);
// CHECK-NEXT: ret void
//
//
-// CHECK-LABEL: define {{[^@]+}}@{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}_constexpr_cos_f32_l21_ctor
+// CHECK-LABEL: define {{[^@]+}}@__cxx_global_var_init.5
// CHECK-SAME: () #[[ATTR0]] {
// CHECK-NEXT: entry:
// CHECK-NEXT: [[RETVAL_I_I:%.*]] = alloca float, align 4, addrspace(5)
@@ -137,7 +137,7 @@ const float constexpr_fmaxf_f32 = fmaxf(2.0f, -4.0f);
// CHECK-NEXT: ret void
//
//
-// CHECK-LABEL: define {{[^@]+}}@{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}_constexpr_fmaf_f32_l23_ctor
+// CHECK-LABEL: define {{[^@]+}}@__cxx_global_var_init.6
// CHECK-SAME: () #[[ATTR0]] {
// CHECK-NEXT: entry:
// CHECK-NEXT: [[RETVAL_I:%.*]] = alloca float, align 4, addrspace(5)
@@ -159,7 +159,7 @@ const float constexpr_fmaxf_f32 = fmaxf(2.0f, -4.0f);
// CHECK-NEXT: ret void
//
//
-// CHECK-LABEL: define {{[^@]+}}@{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}_constexpr_fma_f32_l24_ctor
+// CHECK-LABEL: define {{[^@]+}}@__cxx_global_var_init.7
// CHECK-SAME: () #[[ATTR0]] {
// CHECK-NEXT: entry:
// CHECK-NEXT: [[RETVAL_I_I:%.*]] = alloca float, align 4, addrspace(5)
@@ -195,7 +195,7 @@ const float constexpr_fmaxf_f32 = fmaxf(2.0f, -4.0f);
// CHECK-NEXT: ret void
//
//
-// CHECK-LABEL: define {{[^@]+}}@{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}_constexpr_min_f32_l27_ctor
+// CHECK-LABEL: define {{[^@]+}}@__cxx_global_var_init.8
// CHECK-SAME: () #[[ATTR0]] {
// CHECK-NEXT: entry:
// CHECK-NEXT: [[RETVAL_I:%.*]] = alloca float, align 4, addrspace(5)
@@ -213,7 +213,7 @@ const float constexpr_fmaxf_f32 = fmaxf(2.0f, -4.0f);
// CHECK-NEXT: ret void
//
//
-// CHECK-LABEL: define {{[^@]+}}@{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}_constexpr_max_f32_l28_ctor
+// CHECK-LABEL: define {{[^@]+}}@__cxx_global_var_init.9
// CHECK-SAME: () #[[ATTR0]] {
// CHECK-NEXT: entry:
// CHECK-NEXT: [[RETVAL_I:%.*]] = alloca float, align 4, addrspace(5)
@@ -231,7 +231,7 @@ const float constexpr_fmaxf_f32 = fmaxf(2.0f, -4.0f);
// CHECK-NEXT: ret void
//
//
-// CHECK-LABEL: define {{[^@]+}}@{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}_constexpr_fmin_f32_l30_ctor
+// CHECK-LABEL: define {{[^@]+}}@__cxx_global_var_init.10
// CHECK-SAME: () #[[ATTR0]] {
// CHECK-NEXT: entry:
// CHECK-NEXT: [[CALL:%.*]] = call noundef float @_Z4fminff(float noundef 2.000000e+00, float noundef -4.000000e+00) #[[ATTR4:[0-9]+]]
@@ -239,7 +239,7 @@ const float constexpr_fmaxf_f32 = fmaxf(2.0f, -4.0f);
// CHECK-NEXT: ret void
//
//
-// CHECK-LABEL: define {{[^@]+}}@{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}_constexpr_fmax_f32_l31_ctor
+// CHECK-LABEL: define {{[^@]+}}@__cxx_global_var_init.11
// CHECK-SAME: () #[[ATTR0]] {
// CHECK-NEXT: entry:
// CHECK-NEXT: [[CALL:%.*]] = call noundef float @_Z4fmaxff(float noundef 2.000000e+00, float noundef -4.000000e+00) #[[ATTR4]]
@@ -247,7 +247,7 @@ const float constexpr_fmaxf_f32 = fmaxf(2.0f, -4.0f);
// CHECK-NEXT: ret void
//
//
-// CHECK-LABEL: define {{[^@]+}}@{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}_constexpr_fminf_f32_l33_ctor
+// CHECK-LABEL: define {{[^@]+}}@__cxx_global_var_init.12
// CHECK-SAME: () #[[ATTR0]] {
// CHECK-NEXT: entry:
// CHECK-NEXT: [[RETVAL_I:%.*]] = alloca float, align 4, addrspace(5)
@@ -265,7 +265,7 @@ const float constexpr_fmaxf_f32 = fmaxf(2.0f, -4.0f);
// CHECK-NEXT: ret void
//
//
-// CHECK-LABEL: define {{[^@]+}}@{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}_constexpr_fmaxf_f32_l34_ctor
+// CHECK-LABEL: define {{[^@]+}}@__cxx_global_var_init.13
// CHECK-SAME: () #[[ATTR0]] {
// CHECK-NEXT: entry:
// CHECK-NEXT: [[RETVAL_I:%.*]] = alloca float, align 4, addrspace(5)
@@ -282,3 +282,23 @@ const float constexpr_fmaxf_f32 = fmaxf(2.0f, -4.0f);
// CHECK-NEXT: store float [[TMP2]], ptr addrspacecast (ptr addrspace(1) @_ZL19constexpr_fmaxf_f32 to ptr), align 4
// CHECK-NEXT: ret void
//
+//
+// CHECK-LABEL: define {{[^@]+}}@_GLOBAL__sub_I_amdgcn_openmp_device_math_constexpr.cpp
+// CHECK-SAME: () #[[ATTR0]] {
+// CHECK-NEXT: entry:
+// CHECK-NEXT: call void @__cxx_global_var_init()
+// CHECK-NEXT: call void @__cxx_global_var_init.1()
+// CHECK-NEXT: call void @__cxx_global_var_init.2()
+// CHECK-NEXT: call void @__cxx_global_var_init.3()
+// CHECK-NEXT: call void @__cxx_global_var_init.4()
+// CHECK-NEXT: call void @__cxx_global_var_init.5()
+// CHECK-NEXT: call void @__cxx_global_var_init.6()
+// CHECK-NEXT: call void @__cxx_global_var_init.7()
+// CHECK-NEXT: call void @__cxx_global_var_init.8()
+// CHECK-NEXT: call void @__cxx_global_var_init.9()
+// CHECK-NEXT: call void @__cxx_global_var_init.10()
+// CHECK-NEXT: call void @__cxx_global_var_init.11()
+// CHECK-NEXT: call void @__cxx_global_var_init.12()
+// CHECK-NEXT: call void @__cxx_global_var_init.13()
+// CHECK-NEXT: ret void
+//
diff --git a/clang/test/OpenMP/amdgcn_target_global_constructor.cpp b/clang/test/OpenMP/amdgcn_target_global_constructor.cpp
index ff5a3ba2b95d267..c8f150431c7fded 100644
--- a/clang/test/OpenMP/amdgcn_target_global_constructor.cpp
+++ b/clang/test/OpenMP/amdgcn_target_global_constructor.cpp
@@ -1,4 +1,4 @@
-// NOTE: Asser...
[truncated]
Force-pushed from 6be02dc to a9f8285.
Force-pushed from a9f8285 to 159031c.
Force-pushed from 07a74b4 to 5e378ae.
I have only briefly looked at the NVPTX implementation.
if (auto Err = Handler.getGlobalMetadataFromImage(*this, Image, Global)) {
  consumeError(std::move(Err));
  return Error::success();
Is there a specific reason we do not return the error here, but instead consume and return success?
Also, I think this should be Plugin::success() to not deviate from what is used in the plugin.
If there were any global ctors / dtors the backend will emit a kernel. This is simply encoding "Does this symbol exist? If not continue on". We check the ELF symbol table directly as it's more efficient than going through the device API.
We probably need to encode the logic better, since consumeError is a bit of a code smell. Maybe a helper function like Handler.hasGlobal or something.
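A helper along those lines might look roughly like the following sketch; the name hasGlobal, the GlobalTy construction, and the exact parameter types are assumptions, while getGlobalMetadataFromImage and consumeError come from the snippet above.

```cpp
// Hypothetical helper: report whether a symbol exists in the image without
// turning a missing ctor/dtor kernel into a hard error.
static bool hasGlobal(GenericGlobalHandlerTy &Handler, GenericDeviceTy &Device,
                      DeviceImageTy &Image, llvm::StringRef Name) {
  GlobalTy Global(Name.str(), /*Size=*/0);
  if (auto Err = Handler.getGlobalMetadataFromImage(Device, Image, Global)) {
    // The symbol is simply absent; swallow the error and report false.
    llvm::consumeError(std::move(Err));
    return false;
  }
  return true;
}
```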
That would certainly make it more obvious.
Force-pushed from 5e378ae to 0a1f4b5.
Thanks Joseph. Another two nits.
openmp/libomptarget/plugins-nextgen/common/PluginInterface/PluginInterface.h
openmp/libomptarget/plugins-nextgen/common/PluginInterface/PluginInterface.h
clang/lib/CodeGen/ItaniumCXXABI.cpp
Outdated
if (CGM.getLangOpts().OpenMP && CGM.getLangOpts().OpenMPIsTargetDevice &&
    !D.isStaticLocal() &&
    (CGM.getTriple().isAMDGPU() || CGM.getTriple().isNVPTX()))
Oh look, it's both of my favorite patterns. Can you refine this into something better than language X | language Y and AMDGPU || PTX
Yeah, these types of things are problematic especially if we consider getting SPIR-V support eventually. The logic basically goes like this: OpenMP supports global destructors but does not always support the atexit function. The old logic used to replace everything. This now at least lets CPU based targets use regular handling. I could make this unconditional for OpenMP, but I figured it'd be better to allow the CPU based targets to use the regular handling.
More or less this is just a concession to prevent regressions from this patch. The old logic looked like this, which did this unconditionally. Like I said, could remove the AMD and PTX checks and just do this on the CPU as well if it would be better.
if (CGM.getLangOpts().OMPTargetTriples.empty() &&
!CGM.getLangOpts().OpenMPIsTargetDevice)
return false;
Just make this apply to all triples. I don't want to remove the dependency on the OpenMP language because this is somewhat of a hack. We can revisit this later if needed.
Would also just hide this in a target/lang predicate that lists these.
So just some random helper function like "Does target support X?"
I could put something in LangOptions that just returns the same thing. Wasn't sure if it's worth forcing a recompile of everything though.
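As a sketch of the direction being discussed, such a predicate could centralize the check in one place; the helper name and its placement are assumptions, while the individual checks mirror the condition quoted from ItaniumCXXABI.cpp above.

```cpp
// Hypothetical predicate: "OpenMP device target that lowers global dtors
// through llvm.global_dtors instead of atexit".
static bool useLLVMGlobalDtorsForOpenMP(const CodeGenModule &CGM) {
  const LangOptions &Opts = CGM.getLangOpts();
  const llvm::Triple &T = CGM.getTriple();
  return Opts.OpenMP && Opts.OpenMPIsTargetDevice &&
         (T.isAMDGPU() || T.isNVPTX());
}
```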
Force-pushed from 0a1f4b5 to 5283c5e.
Just noticed I'm actually calling the destructors backwards in AMDGPU. Will fix that.
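For reference, a minimal illustration of the intended ordering (illustrative only, not the plugin's code): destructors should run in the reverse of the constructors' registration order.

```cpp
#include <algorithm>
#include <vector>

using CtorDtorFn = void (*)();

// Invoke the gathered destructor stubs in reverse registration order,
// matching the usual llvm.global_dtors / atexit teardown semantics.
static void runDtorsInOrder(std::vector<CtorDtorFn> Dtors) {
  std::reverse(Dtors.begin(), Dtors.end());
  for (CtorDtorFn Fn : Dtors)
    Fn();
}
```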
Force-pushed from 5283c5e to c3df637.
Force-pushed from c3df637 to 45a645c.
✅ With the latest revision this PR passed the C/C++ code formatter.
Force-pushed from 0f18490 to 2d16f64.
// Allocate a buffer to store all of the known constructor / destructor
// functions in so we can iterate them on the device.
void *Buffer =
    allocate(Funcs.size() * sizeof(void *), nullptr, TARGET_ALLOC_SHARED);
Do we really need to use shared/managed memory here?
It's much more convenient than copying over the buffer. SHARED in CUDA context would be "migratable" memory without async access AFAIK. So this will most likely just invoke a migration once it's accessed. Unsure if that's slower or faster than waiting on an explicit memcpy.
I'm more worried about systems that do not have support than about the time.
If you think it's always supported, we can keep it for now.
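For context, a self-contained sketch of the shared-buffer setup under discussion; the allocator signature and enum here are stand-ins for the plugin's allocate interface and the TARGET_ALLOC_SHARED kind quoted above, and the helper name is made up.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Stand-ins for the plugin's allocator interface (assumptions).
enum TargetAllocTy { TARGET_ALLOC_SHARED };
using AllocFnTy = void *(*)(std::size_t Size, void *HostPtr, TargetAllocTy Kind);

// Pack the device addresses of all ctor/dtor stubs into one shared
// (host-writable, device-visible) array so the backend-emitted init/fini
// kernel can iterate over them.
static void *packCtorDtorTable(const std::vector<void *> &Funcs,
                               AllocFnTy Allocate) {
  void *Buffer =
      Allocate(Funcs.size() * sizeof(void *), nullptr, TARGET_ALLOC_SHARED);
  // Shared memory is directly writable from the host, so no explicit
  // host-to-device copy is needed before the kernel launch.
  std::memcpy(Buffer, Funcs.data(), Funcs.size() * sizeof(void *));
  return Buffer;
}
```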
Force-pushed from 2d16f64 to 5366317.
LG, check my comments.
// Allocate and construct the AMDGPU kernel.
AMDGPUKernelTy AMDGPUKernel(Name);
if (auto Err = AMDGPUKernel.initImpl(*this, Image))
Generally, we should always call the generic entry points, so init, not initImpl.
Assuming you have no specific reason not to. Also below for launch.
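Concretely, the suggestion is roughly the following change to the snippet above, assuming the generic init entry point takes the same parameters as initImpl.

```cpp
// Allocate and construct the AMDGPU kernel, then go through the generic
// entry point rather than the target-specific *Impl variant.
AMDGPUKernelTy AMDGPUKernel(Name);
if (auto Err = AMDGPUKernel.init(*this, Image))
  return Err;
```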
Summary:
This patch reworks how we handle global constructors in OpenMP. Previously, we emitted individual kernels that were all registered and called individually. In order to provide more generic support, this patch moves all handling of this to the target backend and the runtime plugin. This has the benefit of supporting the GNU extensions for constructors and destructors, removing a class of failures related to shared library destruction order, and allows targets other than OpenMP to use the same support without needing to change the frontend.

This is primarily done by calling kernels that the backend emits to iterate a list of ctor / dtor functions. For x64, this is automatic and we get it for free with the standard `dlopen` handling. For AMDGPU, we emit `amdgcn.device.init` and `amdgcn.device.fini` functions which handle everything automatically and simply need to be called. For NVPTX, a patch llvm#71549 provides the kernels to call, but the runtime needs to set up the array manually by pulling out all the known constructor / destructor functions.

One concession that this patch requires is the change that for GPU targets in OpenMP offloading we will use `llvm.global_dtors` instead of using `atexit`. This is because `atexit` is a separate runtime function that does not mesh well with the handling we're trying to do here. This should be equivalent in all cases except for cases where we would need to destruct manually such as:

```
struct S { ~S() { foo(); } };
void foo() { static S s; }
```

However this is broken in many other ways on the GPU, so it is not regressing any support, simply increasing the scope of what we can handle.

This changes the handling of ctors / dtors. This patch now outputs an information message regarding the deprecation if the old format is used. This will be completely removed in a later release.

Depends on: llvm#71549

Add LangOption for atexit usage

Summary: This method isn't 1-to-1 but it's more functional than not having it.
Force-pushed from 5366317 to e0281fc.
Summary:
This patch reworks how we handle global constructors in OpenMP.
Previously, we emitted individual kernels that were all registered and
called individually. In order to provide more generic support, this
patch moves all handling of this to the target backend and the runtime
plugin. This has the benefit of supporting the GNU extensions for
constructors and destructors, removing a class of failures related to
shared library destruction order, and allows targets other than OpenMP
to use the same support without needing to change the frontend.
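As an illustration of the GNU extension support mentioned above (an example written for this description, not taken from the patch's tests), device code can now rely on constructor/destructor attributes inside a declare target region:

```cpp
#pragma omp declare target
// Lowered into llvm.global_ctors and run by the backend-emitted init kernel.
[[gnu::constructor]] void device_setup() {}
// Lowered into llvm.global_dtors and run by the backend-emitted fini kernel.
[[gnu::destructor]] void device_teardown() {}
#pragma omp end declare target
```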
This is primarily done by calling kernels that the backend emits to
iterate a list of ctor / dtor functions. For x64, this is automatic and
we get it for free with the standard dlopen handling. For AMDGPU, we emit
amdgcn.device.init and amdgcn.device.fini functions which handle everything
automatically and simply need to be called. For NVPTX,
a patch #71549 provides the
kernels to call, but the runtime needs to set up the array manually by
pulling out all the known constructor / destructor functions.
One concession that this patch requires is the change that for GPU
targets in OpenMP offloading we will use llvm.global_dtors instead of using
atexit. This is because atexit is a separate runtime function that does not
mesh well with the handling we're trying to do here. This
should be equivalent in all cases except for cases where we would need
to destruct manually such as:
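```
struct S { ~S() { foo(); } };
void foo() { static S s; }
```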
However this is broken in many other ways on the GPU, so it is not
regressing any support, simply increasing the scope of what we can
handle.
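For readers unfamiliar with the mechanism, here is a small standalone illustration of registering a destructor stub through llvm.global_dtors using the ModuleUtils helper the patch includes in CGDeclCXX.cpp; it is illustrative only and not the patch's own code path.

```cpp
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Module.h"
#include "llvm/Transforms/Utils/ModuleUtils.h"

// Create a trivial stub and append it to llvm.global_dtors. Priority 65535
// matches the AddGlobalDtor default shown in the diff above.
static void addDtorStub(llvm::Module &M) {
  llvm::LLVMContext &Ctx = M.getContext();
  auto *FnTy = llvm::FunctionType::get(llvm::Type::getVoidTy(Ctx), /*isVarArg=*/false);
  auto *Stub = llvm::Function::Create(FnTy, llvm::Function::InternalLinkage,
                                      "__dtor_stub", M);
  llvm::IRBuilder<> Builder(llvm::BasicBlock::Create(Ctx, "entry", Stub));
  Builder.CreateRetVoid(); // A real stub would call the destructor first.
  llvm::appendToGlobalDtors(M, Stub, /*Priority=*/65535);
}
```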
This changes the handling of ctors / dtors. This patch now outputs an
information message regarding the deprecation if the old format is used.
This will be completely removed in a later release.
Depends on: #71549