[libc] Implement 'atexit' on the GPU correctly #83037

jhuber6 · 2024-02-26T17:19:10Z

Summary:
This function was never marked at supported because it was fundamentally
broken when called with multiple threads. The patch in
#83026 introduces a lock-free
stack that can be used to correctly handle enqueuing callbacks from
multiple threads. Although the previous interface tried to provide a
consistent API, this was not feasible with the needs for a lock-free
stack so I have elected to just use ifdefs. The size is fixed to
whatever we use for testing, which currently amounts to about 8KiB
dedicated for this thing, which isn't enough to be concenred about.

Depends on #83026

llvmbot · 2024-02-26T17:19:45Z

@llvm/pr-subscribers-backend-amdgpu

@llvm/pr-subscribers-libc

Author: Joseph Huber (jhuber6)

Changes

Summary:
This function was never marked at supported because it was fundamentally
broken when called with multiple threads. The patch in
#83026 introduces a lock-free
stack that can be used to correctly handle enqueuing callbacks from
multiple threads. Although the previous interface tried to provide a
consistent API, this was not feasible with the needs for a lock-free
stack so I have elected to just use ifdefs. The size is fixed to
whatever we use for testing, which currently amounts to about 8KiB
dedicated for this thing, which isn't enough to be concenred about.

Depends on #83026

Full diff: https://github.com/llvm/llvm-project/pull/83037.diff

2 Files Affected:

(modified) libc/docs/gpu/support.rst (+1)
(modified) libc/src/stdlib/atexit.cpp (+17-7)

diff --git a/libc/docs/gpu/support.rst b/libc/docs/gpu/support.rst
index 250f0a7794de3c..b2231eb69a590a 100644
--- a/libc/docs/gpu/support.rst
+++ b/libc/docs/gpu/support.rst
@@ -102,6 +102,7 @@ atol           |check|
 atoll          |check|
 exit           |check|    |check|
 abort          |check|    |check|
+atexit         |check|
 labs           |check|
 llabs          |check|
 div            |check|
diff --git a/libc/src/stdlib/atexit.cpp b/libc/src/stdlib/atexit.cpp
index 1513b7969f0dbc..5951c5c77e8d1f 100644
--- a/libc/src/stdlib/atexit.cpp
+++ b/libc/src/stdlib/atexit.cpp
@@ -9,6 +9,7 @@
 #include "src/stdlib/atexit.h"
 #include "src/__support/blockstore.h"
 #include "src/__support/common.h"
+#include "src/__support/fixedstack.h"
 #include "src/__support/fixedvector.h"
 #include "src/__support/threads/mutex.h"
 
@@ -16,7 +17,7 @@ namespace LIBC_NAMESPACE {
 
 namespace {
 
-Mutex handler_list_mtx(false, false, false);
+[[maybe_unused]] Mutex handler_list_mtx(false, false, false);
 
 using AtExitCallback = void(void *);
 using StdCAtExitCallback = void(void);
@@ -29,12 +30,10 @@ struct AtExitUnit {
 };
 
 #if defined(LIBC_TARGET_ARCH_IS_GPU)
-// The GPU build cannot handle the potentially recursive definitions required by
-// the BlockStore class. Additionally, the liklihood that someone exceeds this
-// while executing on the GPU is extremely small.
-// FIXME: It is not generally safe to use 'atexit' on the GPU because the
-//        mutexes simply passthrough. We will need a lock free stack.
-using ExitCallbackList = FixedVector<AtExitUnit, 64>;
+// The GPU interface cannot use the standard implementation because it does not
+// support the Mutex type. Instead we use a lock free stack with a sufficiently
+// large size.
+using ExitCallbackList = FixedStack<AtExitUnit, CALLBACK_LIST_SIZE_FOR_TESTS>;
 #elif defined(LIBC_COPT_PUBLIC_PACKAGING)
 using ExitCallbackList = cpp::ReverseOrderBlockStore<AtExitUnit, 32>;
 #else
@@ -60,6 +59,11 @@ void stdc_at_exit_func(void *payload) {
 namespace internal {
 
 void call_exit_callbacks() {
+#if defined(LIBC_TARGET_ARCH_IS_GPU)
+  AtExitUnit unit;
+  while (exit_callbacks.pop(unit))
+    unit.callback(unit.payload);
+#else
   handler_list_mtx.lock();
   while (!exit_callbacks.empty()) {
     auto unit = exit_callbacks.back();
@@ -69,14 +73,20 @@ void call_exit_callbacks() {
     handler_list_mtx.lock();
   }
   ExitCallbackList::destroy(&exit_callbacks);
+#endif
 }
 
 } // namespace internal
 
 static int add_atexit_unit(const AtExitUnit &unit) {
+#if defined(LIBC_TARGET_ARCH_IS_GPU)
+  if (!exit_callbacks.push(unit))
+    return -1;
+#else
   MutexLock lock(&handler_list_mtx);
   if (!exit_callbacks.push_back(unit))
     return -1;
+#endif
   return 0;
 }

lntue

Can you split the GPU's atexit implementation to src/stdlib/gpu/atexit.cpp similar to malloc and free there? So that they won't have maybe-unused non-overlapping dependency.

jhuber6 · 2024-02-26T19:48:35Z

Can you split the GPU's atexit implementation to src/stdlib/gpu/atexit.cpp similar to malloc and free there? So that they won't have maybe-unused non-overlapping dependency.

Somewhat of a pain because it required adding some "generic/" directory only for a single file.

lntue · 2024-02-26T20:03:51Z

Can you split the GPU's atexit implementation to src/stdlib/gpu/atexit.cpp similar to malloc and free there? So that they won't have maybe-unused non-overlapping dependency.

Somewhat of a pain because it required adding some "generic/" directory only for a single file.

You don't need the generic folder for that. I'm thinking more about putting atexit entry point creation under the switch https://github.com/llvm/llvm-project/blob/main/libc/src/stdlib/CMakeLists.txt#L400, where you simply alias it with the .gpu.atexit target when building for GPU.

lntue · 2024-02-27T01:26:04Z

libc/src/stdlib/atexit.cpp

@@ -9,6 +9,7 @@
 #include "src/stdlib/atexit.h"
 #include "src/__support/blockstore.h"
 #include "src/__support/common.h"
+#include "src/__support/fixedstack.h"


fixedstack is not needed here anymore.

Summary: This function was never marked at supported because it was fundamentally broken when called with multiple threads. The patch in llvm#83026 introduces a lock-free stack that can be used to correctly handle enqueuing callbacks from multiple threads. Although the previous interface tried to provide a consistent API, this was not feasible with the needs for a lock-free stack so I have elected to just use ifdefs. The size is fixed to whatever we use for testing, which currently amounts to about 8KiB dedicated for this thing, which isn't enough to be concenred about. Depends on llvm#83026

lntue

Thanks for splitting this out! The implementation is cleaner and easier to follow now.

michaelrj-google

LGTM

jhuber6 requested review from gchatelet, JonChesterfield, lntue and michaelrj-google February 26, 2024 17:19

llvmbot added the libc label Feb 26, 2024

jhuber6 mentioned this pull request Feb 26, 2024

[libc] Implement simple lock-free stack data structure #83026

Open

lntue reviewed Feb 26, 2024

View reviewed changes

jhuber6 force-pushed the FixAtexit branch from 2abf5a3 to 8cc743b Compare February 27, 2024 00:34

llvmbot added the backend:AMDGPU label Feb 27, 2024

lntue reviewed Feb 27, 2024

View reviewed changes

jhuber6 force-pushed the FixAtexit branch from 8cc743b to d4df77a Compare February 27, 2024 01:27

lntue approved these changes Feb 27, 2024

View reviewed changes

michaelrj-google approved these changes Feb 28, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[libc] Implement 'atexit' on the GPU correctly #83037

[libc] Implement 'atexit' on the GPU correctly #83037

jhuber6 commented Feb 26, 2024

llvmbot commented Feb 26, 2024 •

edited

Loading

lntue left a comment

jhuber6 commented Feb 26, 2024

lntue commented Feb 26, 2024

lntue Feb 27, 2024

lntue left a comment

michaelrj-google left a comment

[libc] Implement 'atexit' on the GPU correctly #83037

Are you sure you want to change the base?

[libc] Implement 'atexit' on the GPU correctly #83037

Conversation

jhuber6 commented Feb 26, 2024

llvmbot commented Feb 26, 2024 • edited Loading

lntue left a comment

Choose a reason for hiding this comment

jhuber6 commented Feb 26, 2024

lntue commented Feb 26, 2024

lntue Feb 27, 2024

Choose a reason for hiding this comment

lntue left a comment

Choose a reason for hiding this comment

michaelrj-google left a comment

Choose a reason for hiding this comment

llvmbot commented Feb 26, 2024 •

edited

Loading