
[AMDGPU] Call the FINI_ARRAY destructors in the correct order #71815

Merged

jhuber6 merged 1 commit into llvm:main on Nov 10, 2023

Conversation

Contributor

@jhuber6 jhuber6 commented Nov 9, 2023

Summary:
The AMDGPU backend uses the linker-provided INIT_ARRAY and FINI_ARRAY
sections to call all the global constructors in a single kernel.
Previously this mistakenly used the same iteration logic for both
arrays. The destructors stored in FINI_ARRAY are stored in the same order as
the ones in the INIT_ARRAY section, so we need to traverse it in reverse order.

Collaborator

llvmbot commented Nov 9, 2023

@llvm/pr-subscribers-backend-amdgpu

Author: Joseph Huber (jhuber6)

Changes

Summary:
The AMDGPU backend uses the linker-provided INIT_ARRAY and FINI_ARRAY
sections to call all the global constructors in a single kernel.
Previously this mistakenly used the same iteration logic for both
arrays. The destructors stored in FINI_ARRAY are actually stored in
reverse order, so we must start at the end of the array and decrement.
This patch makes the necessary changes to properly respect priority.


Full diff: https://github.com/llvm/llvm-project/pull/71815.diff

4 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/AMDGPUCtorDtorLowering.cpp (+37-4)
  • (modified) llvm/test/CodeGen/AMDGPU/lower-ctor-dtor-constexpr-alias.ll (+4-6)
  • (modified) llvm/test/CodeGen/AMDGPU/lower-ctor-dtor.ll (+7-16)
  • (modified) llvm/test/CodeGen/AMDGPU/lower-multiple-ctor-dtor.ll (+5-22)
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUCtorDtorLowering.cpp b/llvm/lib/Target/AMDGPU/AMDGPUCtorDtorLowering.cpp
index a13447586bd4ba3..8814d2ca456d8f5 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUCtorDtorLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUCtorDtorLowering.cpp
@@ -53,13 +53,22 @@ static Function *createInitOrFiniKernelFunction(Module &M, bool IsCtor) {
 //
 // extern "C" void * __init_array_start[];
 // extern "C" void * __init_array_end[];
+// extern "C" void * __fini_array_start[];
+// extern "C" void * __fini_array_end[];
 //
 // using InitCallback = void();
+// using FiniCallback = void(void);
 //
 // void call_init_array_callbacks() {
 //   for (auto start = __init_array_start; start != __init_array_end; ++start)
 //     reinterpret_cast<InitCallback *>(*start)();
 // }
+//
+// void call_fini_array_callbacks() {
+//  size_t fini_array_size = __fini_array_end - __fini_array_start;
+//  for (size_t i = fini_array_size; i > 0; --i)
+//    reinterpret_cast<FiniCallback *>(__fini_array_start[i - 1])();
+// }
 static void createInitOrFiniCalls(Function &F, bool IsCtor) {
   Module &M = *F.getParent();
   LLVMContext &C = M.getContext();
@@ -96,15 +105,39 @@ static void createInitOrFiniCalls(Function &F, bool IsCtor) {
   // for now we just call them with no arguments.
   auto *CallBackTy = FunctionType::get(IRB.getVoidTy(), {});
 
-  IRB.CreateCondBr(IRB.CreateICmpNE(Begin, End), LoopBB, ExitBB);
+  Constant *Start = Begin;
+  Constant *Stop = End;
+  // The destructor array must be called in reverse order. Get a constant
+  // expression to the end of the array and iterate backwards instead.
+  if (!IsCtor) {
+    Type *Int64Ty = IntegerType::getInt64Ty(C);
+    auto *Offset = ConstantExpr::getSub(
+        ConstantExpr::getAShr(
+            ConstantExpr::getSub(ConstantExpr::getPtrToInt(End, Int64Ty),
+                                 ConstantExpr::getPtrToInt(Begin, Int64Ty)),
+            ConstantInt::get(Int64Ty, 3)),
+        ConstantInt::get(Int64Ty, 1));
+    Start = ConstantExpr::getGetElementPtr(
+        ArrayType::get(IRB.getPtrTy(), 0), Begin,
+        ArrayRef<Constant *>({ConstantInt::get(Int64Ty, 0), Offset}),
+        /*InBounds=*/true);
+    Stop = Begin;
+  }
+
+  IRB.CreateCondBr(
+      IRB.CreateCmp(IsCtor ? ICmpInst::ICMP_NE : ICmpInst::ICMP_UGE, Start,
+                    Stop),
+      LoopBB, ExitBB);
   IRB.SetInsertPoint(LoopBB);
   auto *CallBackPHI = IRB.CreatePHI(PtrTy, 2, "ptr");
   auto *CallBack = IRB.CreateLoad(CallBackTy->getPointerTo(F.getAddressSpace()),
                                   CallBackPHI, "callback");
   IRB.CreateCall(CallBackTy, CallBack);
-  auto *NewCallBack = IRB.CreateConstGEP1_64(PtrTy, CallBackPHI, 1, "next");
-  auto *EndCmp = IRB.CreateICmpEQ(NewCallBack, End, "end");
-  CallBackPHI->addIncoming(Begin, &F.getEntryBlock());
+  auto *NewCallBack =
+      IRB.CreateConstGEP1_64(PtrTy, CallBackPHI, IsCtor ? 1 : -1, "next");
+  auto *EndCmp = IRB.CreateCmp(IsCtor ? ICmpInst::ICMP_EQ : ICmpInst::ICMP_ULT,
+                               NewCallBack, Stop, "end");
+  CallBackPHI->addIncoming(Start, &F.getEntryBlock());
   CallBackPHI->addIncoming(NewCallBack, LoopBB);
   IRB.CreateCondBr(EndCmp, ExitBB, LoopBB);
   IRB.SetInsertPoint(ExitBB);
diff --git a/llvm/test/CodeGen/AMDGPU/lower-ctor-dtor-constexpr-alias.ll b/llvm/test/CodeGen/AMDGPU/lower-ctor-dtor-constexpr-alias.ll
index a1929a2e8931c11..f9dfa8b4e106656 100644
--- a/llvm/test/CodeGen/AMDGPU/lower-ctor-dtor-constexpr-alias.ll
+++ b/llvm/test/CodeGen/AMDGPU/lower-ctor-dtor-constexpr-alias.ll
@@ -25,8 +25,6 @@ define void @bar() addrspace(1) {
   ret void
 }
 
-
-
 ;.
 ; CHECK: @[[LLVM_GLOBAL_CTORS:[a-zA-Z0-9_$"\\.-]+]] = appending addrspace(1) global [2 x { i32, ptr, ptr }] [{ i32, ptr, ptr } { i32 1, ptr @foo.alias, ptr null }, { i32, ptr, ptr } { i32 1, ptr inttoptr (i64 4096 to ptr), ptr null }]
 ; CHECK: @[[LLVM_GLOBAL_DTORS:[a-zA-Z0-9_$"\\.-]+]] = appending addrspace(1) global [1 x { i32, ptr, ptr }] [{ i32, ptr, ptr } { i32 1, ptr addrspacecast (ptr addrspace(1) @bar to ptr), ptr null }]
@@ -65,13 +63,13 @@ define void @bar() addrspace(1) {
 ; CHECK-LABEL: define weak_odr amdgpu_kernel void @amdgcn.device.fini(
 ; CHECK-SAME: ) #[[ATTR2:[0-9]+]] {
 ; CHECK-NEXT:  entry:
-; CHECK-NEXT:    br i1 icmp ne (ptr addrspace(1) @__fini_array_start, ptr addrspace(1) @__fini_array_end), label [[WHILE_ENTRY:%.*]], label [[WHILE_END:%.*]]
+; CHECK-NEXT:    br i1 icmp uge (ptr addrspace(1) getelementptr inbounds ([0 x ptr], ptr addrspace(1) @__fini_array_start, i64 0, i64 sub (i64 ashr (i64 sub (i64 ptrtoint (ptr addrspace(1) @__fini_array_end to i64), i64 ptrtoint (ptr addrspace(1) @__fini_array_start to i64)), i64 3), i64 1)), ptr addrspace(1) @__fini_array_start), label [[WHILE_ENTRY:%.*]], label [[WHILE_END:%.*]]
 ; CHECK:       while.entry:
-; CHECK-NEXT:    [[PTR:%.*]] = phi ptr addrspace(1) [ @__fini_array_start, [[ENTRY:%.*]] ], [ [[NEXT:%.*]], [[WHILE_ENTRY]] ]
+; CHECK-NEXT:    [[PTR:%.*]] = phi ptr addrspace(1) [ getelementptr inbounds ([0 x ptr], ptr addrspace(1) @__fini_array_start, i64 0, i64 sub (i64 ashr (i64 sub (i64 ptrtoint (ptr addrspace(1) @__fini_array_end to i64), i64 ptrtoint (ptr addrspace(1) @__fini_array_start to i64)), i64 3), i64 1)), [[ENTRY:%.*]] ], [ [[NEXT:%.*]], [[WHILE_ENTRY]] ]
 ; CHECK-NEXT:    [[CALLBACK:%.*]] = load ptr, ptr addrspace(1) [[PTR]], align 8
 ; CHECK-NEXT:    call void [[CALLBACK]]()
-; CHECK-NEXT:    [[NEXT]] = getelementptr ptr addrspace(1), ptr addrspace(1) [[PTR]], i64 1
-; CHECK-NEXT:    [[END:%.*]] = icmp eq ptr addrspace(1) [[NEXT]], @__fini_array_end
+; CHECK-NEXT:    [[NEXT]] = getelementptr ptr addrspace(1), ptr addrspace(1) [[PTR]], i64 -1
+; CHECK-NEXT:    [[END:%.*]] = icmp ult ptr addrspace(1) [[NEXT]], @__fini_array_start
 ; CHECK-NEXT:    br i1 [[END]], label [[WHILE_END]], label [[WHILE_ENTRY]]
 ; CHECK:       while.end:
 ; CHECK-NEXT:    ret void
diff --git a/llvm/test/CodeGen/AMDGPU/lower-ctor-dtor.ll b/llvm/test/CodeGen/AMDGPU/lower-ctor-dtor.ll
index 968442182229723..4f228af90c65a00 100644
--- a/llvm/test/CodeGen/AMDGPU/lower-ctor-dtor.ll
+++ b/llvm/test/CodeGen/AMDGPU/lower-ctor-dtor.ll
@@ -12,20 +12,19 @@
 @llvm.global_ctors = appending addrspace(1) global [1 x { i32, ptr, ptr }] [{ i32, ptr, ptr } { i32 1, ptr @foo, ptr null }]
 @llvm.global_dtors = appending addrspace(1) global [1 x { i32, ptr, ptr }] [{ i32, ptr, ptr } { i32 1, ptr @bar, ptr null }]
 
-
-
-
-
 ; VISIBILITY: FUNC   WEAK PROTECTED {{.*}} amdgcn.device.init
 ; VISIBILITY: OBJECT WEAK DEFAULT {{.*}} amdgcn.device.init.kd
 ; VISIBILITY: FUNC   WEAK PROTECTED {{.*}} amdgcn.device.fini
 ; VISIBILITY: OBJECT   WEAK DEFAULT {{.*}} amdgcn.device.fini.kd
+
 ; SECTION: .init_array.1     INIT_ARRAY      {{.*}} {{.*}} 000008 00  WA  0   0  8
 ; SECTION: .fini_array.1     FINI_ARRAY      {{.*}} {{.*}} 000008 00  WA  0   0  8
+
 ; DISABLED-NOT: FUNC   GLOBAL PROTECTED {{.*}} amdgcn.device.init
 ; DISABLED-NOT: OBJECT GLOBAL DEFAULT {{.*}} amdgcn.device.init.kd
 ; DISABLED-NOT: FUNC   GLOBAL PROTECTED {{.*}} amdgcn.device.fini
 ; DISABLED-NOT: OBJECT   GLOBAL DEFAULT {{.*}} amdgcn.device.fini.kd
+
 ; METADATA:  amdhsa.kernels:
 ; METADATA:    .kind:           init
 ; METADATA:    .max_flat_workgroup_size: 1
@@ -53,13 +52,6 @@ define internal void @bar() {
 ; CHECK: @[[__FINI_ARRAY_END:[a-zA-Z0-9_$"\\.-]+]] = external addrspace(1) constant [0 x ptr addrspace(1)]
 ; CHECK: @[[LLVM_USED:[a-zA-Z0-9_$"\\.-]+]] = appending addrspace(1) global [2 x ptr] [ptr @amdgcn.device.init, ptr @amdgcn.device.fini], section "llvm.metadata"
 ;.
-; CHECK-LABEL: define internal void @foo() {
-; CHECK-NEXT:    ret void
-;
-;
-; CHECK-LABEL: define internal void @bar() {
-; CHECK-NEXT:    ret void
-;
 ;
 ; CHECK-LABEL: define weak_odr amdgpu_kernel void @amdgcn.device.init(
 ; CHECK-SAME: ) #[[ATTR0:[0-9]+]] {
@@ -79,13 +71,13 @@ define internal void @bar() {
 ; CHECK-LABEL: define weak_odr amdgpu_kernel void @amdgcn.device.fini(
 ; CHECK-SAME: ) #[[ATTR1:[0-9]+]] {
 ; CHECK-NEXT:  entry:
-; CHECK-NEXT:    br i1 icmp ne (ptr addrspace(1) @__fini_array_start, ptr addrspace(1) @__fini_array_end), label [[WHILE_ENTRY:%.*]], label [[WHILE_END:%.*]]
+; CHECK-NEXT:    br i1 icmp uge (ptr addrspace(1) getelementptr inbounds ([0 x ptr], ptr addrspace(1) @__fini_array_start, i64 0, i64 sub (i64 ashr (i64 sub (i64 ptrtoint (ptr addrspace(1) @__fini_array_end to i64), i64 ptrtoint (ptr addrspace(1) @__fini_array_start to i64)), i64 3), i64 1)), ptr addrspace(1) @__fini_array_start), label [[WHILE_ENTRY:%.*]], label [[WHILE_END:%.*]]
 ; CHECK:       while.entry:
-; CHECK-NEXT:    [[PTR:%.*]] = phi ptr addrspace(1) [ @__fini_array_start, [[ENTRY:%.*]] ], [ [[NEXT:%.*]], [[WHILE_ENTRY]] ]
+; CHECK-NEXT:    [[PTR:%.*]] = phi ptr addrspace(1) [ getelementptr inbounds ([0 x ptr], ptr addrspace(1) @__fini_array_start, i64 0, i64 sub (i64 ashr (i64 sub (i64 ptrtoint (ptr addrspace(1) @__fini_array_end to i64), i64 ptrtoint (ptr addrspace(1) @__fini_array_start to i64)), i64 3), i64 1)), [[ENTRY:%.*]] ], [ [[NEXT:%.*]], [[WHILE_ENTRY]] ]
 ; CHECK-NEXT:    [[CALLBACK:%.*]] = load ptr, ptr addrspace(1) [[PTR]], align 8
 ; CHECK-NEXT:    call void [[CALLBACK]]()
-; CHECK-NEXT:    [[NEXT]] = getelementptr ptr addrspace(1), ptr addrspace(1) [[PTR]], i64 1
-; CHECK-NEXT:    [[END:%.*]] = icmp eq ptr addrspace(1) [[NEXT]], @__fini_array_end
+; CHECK-NEXT:    [[NEXT]] = getelementptr ptr addrspace(1), ptr addrspace(1) [[PTR]], i64 -1
+; CHECK-NEXT:    [[END:%.*]] = icmp ult ptr addrspace(1) [[NEXT]], @__fini_array_start
 ; CHECK-NEXT:    br i1 [[END]], label [[WHILE_END]], label [[WHILE_ENTRY]]
 ; CHECK:       while.end:
 ; CHECK-NEXT:    ret void
@@ -93,4 +85,3 @@ define internal void @bar() {
 ;.
 ; CHECK: attributes #[[ATTR0]] = { "amdgpu-flat-work-group-size"="1,1" "device-init" }
 ; CHECK: attributes #[[ATTR1]] = { "amdgpu-flat-work-group-size"="1,1" "device-fini" }
-;.
diff --git a/llvm/test/CodeGen/AMDGPU/lower-multiple-ctor-dtor.ll b/llvm/test/CodeGen/AMDGPU/lower-multiple-ctor-dtor.ll
index 83bb61d1a632351..75445b99719281c 100644
--- a/llvm/test/CodeGen/AMDGPU/lower-multiple-ctor-dtor.ll
+++ b/llvm/test/CodeGen/AMDGPU/lower-multiple-ctor-dtor.ll
@@ -3,10 +3,10 @@
 ; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx700 -filetype=obj -o - < %s | llvm-readelf -s - 2>&1 | FileCheck %s -check-prefix=CHECK-VIS
 
 
-; UTC_ARGS: --disable
 @llvm.global_ctors = appending addrspace(1) global [2 x { i32, ptr, ptr }] [{ i32, ptr, ptr } { i32 1, ptr @foo, ptr null }, { i32, ptr, ptr } { i32 1, ptr @foo.5, ptr null }]
 @llvm.global_dtors = appending addrspace(1) global [2 x { i32, ptr, ptr }] [{ i32, ptr, ptr } { i32 1, ptr @bar, ptr null }, { i32, ptr, ptr } { i32 1, ptr @bar.5, ptr null }]
 
+; UTC_ARGS: --disable
 ; CHECK: @__init_array_start = external addrspace(1) constant [0 x ptr addrspace(1)]
 ; CHECK: @__init_array_end = external addrspace(1) constant [0 x ptr addrspace(1)]
 ; CHECK: @__fini_array_start = external addrspace(1) constant [0 x ptr addrspace(1)]
@@ -36,22 +36,6 @@ define internal void @bar.5() {
   ret void
 }
 
-; CHECK-LABEL: define internal void @foo() {
-; CHECK-NEXT:    ret void
-;
-;
-; CHECK-LABEL: define internal void @bar() {
-; CHECK-NEXT:    ret void
-;
-;
-; CHECK-LABEL: define internal void @foo.5() {
-; CHECK-NEXT:    ret void
-;
-;
-; CHECK-LABEL: define internal void @bar.5() {
-; CHECK-NEXT:    ret void
-;
-;
 ; CHECK-LABEL: define weak_odr amdgpu_kernel void @amdgcn.device.init(
 ; CHECK-SAME: ) #[[ATTR0:[0-9]+]] {
 ; CHECK-NEXT:  entry:
@@ -70,14 +54,13 @@ define internal void @bar.5() {
 ; CHECK-LABEL: define weak_odr amdgpu_kernel void @amdgcn.device.fini(
 ; CHECK-SAME: ) #[[ATTR1:[0-9]+]] {
 ; CHECK-NEXT:  entry:
-; CHECK-NEXT:    br i1 icmp ne (ptr addrspace(1) @__fini_array_start, ptr addrspace(1) @__fini_array_end), label [[WHILE_ENTRY:%.*]], label [[WHILE_END:%.*]]
+; CHECK-NEXT:    br i1 icmp uge (ptr addrspace(1) getelementptr inbounds ([0 x ptr], ptr addrspace(1) @__fini_array_start, i64 0, i64 sub (i64 ashr (i64 sub (i64 ptrtoint (ptr addrspace(1) @__fini_array_end to i64), i64 ptrtoint (ptr addrspace(1) @__fini_array_start to i64)), i64 3), i64 1)), ptr addrspace(1) @__fini_array_start), label [[WHILE_ENTRY:%.*]], label [[WHILE_END:%.*]]
 ; CHECK:       while.entry:
-; CHECK-NEXT:    [[PTR:%.*]] = phi ptr addrspace(1) [ @__fini_array_start, [[ENTRY:%.*]] ], [ [[NEXT:%.*]], [[WHILE_ENTRY]] ]
+; CHECK-NEXT:    [[PTR:%.*]] = phi ptr addrspace(1) [ getelementptr inbounds ([0 x ptr], ptr addrspace(1) @__fini_array_start, i64 0, i64 sub (i64 ashr (i64 sub (i64 ptrtoint (ptr addrspace(1) @__fini_array_end to i64), i64 ptrtoint (ptr addrspace(1) @__fini_array_start to i64)), i64 3), i64 1)), [[ENTRY:%.*]] ], [ [[NEXT:%.*]], [[WHILE_ENTRY]] ]
 ; CHECK-NEXT:    [[CALLBACK:%.*]] = load ptr, ptr addrspace(1) [[PTR]], align 8
 ; CHECK-NEXT:    call void [[CALLBACK]]()
-; CHECK-NEXT:    [[NEXT]] = getelementptr ptr addrspace(1), ptr addrspace(1) [[PTR]], i64 1
-; CHECK-NEXT:    [[END:%.*]] = icmp eq ptr addrspace(1) [[NEXT]], @__fini_array_end
+; CHECK-NEXT:    [[NEXT]] = getelementptr ptr addrspace(1), ptr addrspace(1) [[PTR]], i64 -1
+; CHECK-NEXT:    [[END:%.*]] = icmp ult ptr addrspace(1) [[NEXT]], @__fini_array_start
 ; CHECK-NEXT:    br i1 [[END]], label [[WHILE_END]], label [[WHILE_ENTRY]]
 ; CHECK:       while.end:
 ; CHECK-NEXT:    ret void
-;

Contributor Author

jhuber6 commented Nov 9, 2023

This was tested using the support in #71739

Collaborator

JonChesterfield commented Nov 9, 2023

The requirement is for destructors to be called in reverse order to constructors in pairwise fashion. The only way we have to merge these arrays between translation units is concatenation in link object order. The choice we have is whether to iterate the arrays in order or not, and what order to put elements in the array per TU. Assume without loss of generality that constructors will be iterated from 0 to N.

Store constructors and destructors in the same order
Given TUA containing (ctorA0 ctorA1) (dtorA0 dtorA1) // dtors must be walked from N-1 to 0
Given TUB containing (ctorB0 ctorB1) (dtorB0 dtorB1)

Concatenates to (ctorA0 ctorA1 ctorB0 ctorB1) (dtorA0 dtorA1 dtorB0 dtorB1)
Ctor A0 first, B1 last. Dtor A0 last, seems fine.

Store constructors and destructors in opposite order
Given TUA containing (ctorA0 ctorA1) (dtorA1 dtorA0) // dtors must be walked from 0 to N
Given TUB containing (ctorB0 ctorB1) (dtorB1 dtorB0)

(ctorA0 ctorA1 ctorB0 ctorB1) (dtorA1 dtorA0 dtorB1 dtorB0)
Ctor A0 first, B1 last. Dtor A1 first, B0 last. Not fine.

Therefore either your commit message is wrong:

The destructors stored in FINI_ARRAY are actually stored in
reverse order

Or the implementation is wrong. I note that the test cases all use a single array and thus would not notice.

Constructor and destructor arrays must be in the same order - object A, then object B - as otherwise concatenation between translation units does not work.

Contributor Author

jhuber6 commented Nov 9, 2023

Constructor and destructor arrays must be in the same order - object A, then object B - as otherwise concatenation between translation units does not work.

There is no concatenation here. This is handled entirely by the linker, which sorts all objects in this section by priority order. For destructors we want to call things in reverse priority order, so we walk the array backwards. If you look at loader implementations that walk this section on x64, they walk the array backwards; we are merely encoding this logic in LLVM IR.

@JonChesterfield
Collaborator

This needs tests with multiple translation units. Multiple IR files with appending linkage would also be fine. It also needs tests that don't force the ordering with priority.

Destructors need to fire in reverse order of constructors. If your comment is accurate to the implementation:

The destructors stored in FINI_ARRAY are actually stored in reverse order

Then that won't happen for all of the above test cases.

Contributor Author

jhuber6 commented Nov 9, 2023

This needs tests with multiple translation units. Multiple IR files with appending linkage would also be fine. It also needs tests that don't force the ordering with priority.

Destructors need to fire in reverse order of constructors. If your comment is accurate to the implementation:

The tests here are only concerned with the backend. For the lowering of these into the .init_array and .fini_array sections we have tests in lld. For direct use on the GPU we have libc and the tests added in #71739, which require this patch to pass.

The destructors stored in FINI_ARRAY are actually stored in reverse order

Then that won't happen for all of the above test cases.

The commit message was not clear enough. The .init_array and .fini_array sections are created by the linker, and each section is sorted according to priority. If two destructors have identical priority, the order they execute in is somewhat arbitrary, but in practice boils down to the order in which the objects were passed to the linker (e.g. whether foo.c came before bar.c on the link line). This is separate from the destructor handling in the C++ language, which uses __cxa_atexit to ensure that paired constructors and destructors execute in a fixed order once initialized.

The execution of this should be identical to the handling you would find in glibc, musl, or any other implementation that uses the init/fini arrays. The main difference here is that we emit a kernel to do this traversal, while those implementations have a function for it that the startup code calls. This implementation is correct and in line with standard semantics as far as I'm aware.

@JonChesterfield
Collaborator

Constructors without additional constraints execute in somewhat arbitrary order. Destructors, in the C++ sense, do not. Their execution order is the reverse of the arbitrary constructor call order, i.e. exactly fixed.

However that does not detract from this patch looking like the right thing. Thank you for changing the commit message to better correlate with the existing implementation.

@jhuber6 jhuber6 merged commit c1d5865 into llvm:main Nov 10, 2023
4 checks passed
nikic added a commit that referenced this pull request Nov 10, 2023
…er (#71815)"

This reverts commit c1d5865.

Introduces a new use of ConstantExpr::getAShr().
jhuber6 added a commit that referenced this pull request Nov 10, 2023
Summary:
The AMDGPU backend uses the linker-provided INIT_ARRAY and FINI_ARRAY
sections to call all the global constructors in a single kernel.
Previously this mistakenly used the same iteration logic for both
arrays. The destructors stored in FINI_ARRAY are stored in the same
order as the ones in the INIT_ARRAY section, so we need to traverse it
in reverse order.

Relanding after the revert in fe7b5e2, using the IR builder interface
instead of ConstantExpr.
zahiraam pushed a commit to zahiraam/llvm-project that referenced this pull request Nov 20, 2023
…#71815)

Summary:
The AMDGPU backend uses the linker-provided INIT_ARRAY and FINI_ARRAY
sections to call all the global constructors in a single kernel.
Previously this mistakenly used the same iteration logic for both
arrays. The destructors stored in FINI_ARRAY are stored in the same
order as the ones in the INIT_ARRAY section, so we need to traverse it
in reverse order.
zahiraam pushed a commit to zahiraam/llvm-project that referenced this pull request Nov 20, 2023
…er (llvm#71815)"

This reverts commit c1d5865.

Introduces a new use of ConstantExpr::getAShr().
zahiraam pushed a commit to zahiraam/llvm-project that referenced this pull request Nov 20, 2023
…#71815)

Summary:
The AMDGPU backend uses the linker-provided INIT_ARRAY and FINI_ARRAY
sections to call all the global constructors in a single kernel.
Previously this mistakenly used the same iteration logic for both
arrays. The destructors stored in FINI_ARRAY are stored in the same
order as the ones in the INIT_ARRAY section, so we need to traverse it
in reverse order.

Relanding after the revert in fe7b5e2, using the IR builder interface
instead of ConstantExpr.
Guzhu-AMD pushed a commit to GPUOpen-Drivers/llvm-project that referenced this pull request Nov 23, 2023
Local branch amd-gfx f34f6bd Merged main:8474bfdd149b into amd-gfx:24c3950d1abc
Remote branch main c1d5865 [AMDGPU] Call the `FINI_ARRAY` destructors in the correct order (llvm#71815)