Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[libc] Implement 'atexit' on the GPU correctly #83037

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

jhuber6
Copy link
Contributor

@jhuber6 jhuber6 commented Feb 26, 2024

Summary:
This function was never marked at supported because it was fundamentally
broken when called with multiple threads. The patch in
#83026 introduces a lock-free
stack that can be used to correctly handle enqueuing callbacks from
multiple threads. Although the previous interface tried to provide a
consistent API, this was not feasible with the needs for a lock-free
stack so I have elected to just use ifdefs. The size is fixed to
whatever we use for testing, which currently amounts to about 8KiB
dedicated for this thing, which isn't enough to be concenred about.

Depends on #83026

@llvmbot
Copy link
Collaborator

llvmbot commented Feb 26, 2024

@llvm/pr-subscribers-backend-amdgpu

@llvm/pr-subscribers-libc

Author: Joseph Huber (jhuber6)

Changes

Summary:
This function was never marked at supported because it was fundamentally
broken when called with multiple threads. The patch in
#83026 introduces a lock-free
stack that can be used to correctly handle enqueuing callbacks from
multiple threads. Although the previous interface tried to provide a
consistent API, this was not feasible with the needs for a lock-free
stack so I have elected to just use ifdefs. The size is fixed to
whatever we use for testing, which currently amounts to about 8KiB
dedicated for this thing, which isn't enough to be concenred about.

Depends on #83026


Full diff: https://github.com/llvm/llvm-project/pull/83037.diff

2 Files Affected:

  • (modified) libc/docs/gpu/support.rst (+1)
  • (modified) libc/src/stdlib/atexit.cpp (+17-7)
diff --git a/libc/docs/gpu/support.rst b/libc/docs/gpu/support.rst
index 250f0a7794de3c..b2231eb69a590a 100644
--- a/libc/docs/gpu/support.rst
+++ b/libc/docs/gpu/support.rst
@@ -102,6 +102,7 @@ atol           |check|
 atoll          |check|
 exit           |check|    |check|
 abort          |check|    |check|
+atexit         |check|
 labs           |check|
 llabs          |check|
 div            |check|
diff --git a/libc/src/stdlib/atexit.cpp b/libc/src/stdlib/atexit.cpp
index 1513b7969f0dbc..5951c5c77e8d1f 100644
--- a/libc/src/stdlib/atexit.cpp
+++ b/libc/src/stdlib/atexit.cpp
@@ -9,6 +9,7 @@
 #include "src/stdlib/atexit.h"
 #include "src/__support/blockstore.h"
 #include "src/__support/common.h"
+#include "src/__support/fixedstack.h"
 #include "src/__support/fixedvector.h"
 #include "src/__support/threads/mutex.h"
 
@@ -16,7 +17,7 @@ namespace LIBC_NAMESPACE {
 
 namespace {
 
-Mutex handler_list_mtx(false, false, false);
+[[maybe_unused]] Mutex handler_list_mtx(false, false, false);
 
 using AtExitCallback = void(void *);
 using StdCAtExitCallback = void(void);
@@ -29,12 +30,10 @@ struct AtExitUnit {
 };
 
 #if defined(LIBC_TARGET_ARCH_IS_GPU)
-// The GPU build cannot handle the potentially recursive definitions required by
-// the BlockStore class. Additionally, the liklihood that someone exceeds this
-// while executing on the GPU is extremely small.
-// FIXME: It is not generally safe to use 'atexit' on the GPU because the
-//        mutexes simply passthrough. We will need a lock free stack.
-using ExitCallbackList = FixedVector<AtExitUnit, 64>;
+// The GPU interface cannot use the standard implementation because it does not
+// support the Mutex type. Instead we use a lock free stack with a sufficiently
+// large size.
+using ExitCallbackList = FixedStack<AtExitUnit, CALLBACK_LIST_SIZE_FOR_TESTS>;
 #elif defined(LIBC_COPT_PUBLIC_PACKAGING)
 using ExitCallbackList = cpp::ReverseOrderBlockStore<AtExitUnit, 32>;
 #else
@@ -60,6 +59,11 @@ void stdc_at_exit_func(void *payload) {
 namespace internal {
 
 void call_exit_callbacks() {
+#if defined(LIBC_TARGET_ARCH_IS_GPU)
+  AtExitUnit unit;
+  while (exit_callbacks.pop(unit))
+    unit.callback(unit.payload);
+#else
   handler_list_mtx.lock();
   while (!exit_callbacks.empty()) {
     auto unit = exit_callbacks.back();
@@ -69,14 +73,20 @@ void call_exit_callbacks() {
     handler_list_mtx.lock();
   }
   ExitCallbackList::destroy(&exit_callbacks);
+#endif
 }
 
 } // namespace internal
 
 static int add_atexit_unit(const AtExitUnit &unit) {
+#if defined(LIBC_TARGET_ARCH_IS_GPU)
+  if (!exit_callbacks.push(unit))
+    return -1;
+#else
   MutexLock lock(&handler_list_mtx);
   if (!exit_callbacks.push_back(unit))
     return -1;
+#endif
   return 0;
 }
 

Copy link
Contributor

@lntue lntue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you split the GPU's atexit implementation to src/stdlib/gpu/atexit.cpp similar to malloc and free there? So that they won't have maybe-unused non-overlapping dependency.

@jhuber6
Copy link
Contributor Author

jhuber6 commented Feb 26, 2024

Can you split the GPU's atexit implementation to src/stdlib/gpu/atexit.cpp similar to malloc and free there? So that they won't have maybe-unused non-overlapping dependency.

Somewhat of a pain because it required adding some "generic/" directory only for a single file.

@lntue
Copy link
Contributor

lntue commented Feb 26, 2024

Can you split the GPU's atexit implementation to src/stdlib/gpu/atexit.cpp similar to malloc and free there? So that they won't have maybe-unused non-overlapping dependency.

Somewhat of a pain because it required adding some "generic/" directory only for a single file.

You don't need the generic folder for that. I'm thinking more about putting atexit entry point creation under the switch https://github.com/llvm/llvm-project/blob/main/libc/src/stdlib/CMakeLists.txt#L400, where you simply alias it with the .gpu.atexit target when building for GPU.

@@ -9,6 +9,7 @@
#include "src/stdlib/atexit.h"
#include "src/__support/blockstore.h"
#include "src/__support/common.h"
#include "src/__support/fixedstack.h"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixedstack is not needed here anymore.

Summary:
This function was never marked at supported because it was fundamentally
broken when called with multiple threads. The patch in
llvm#83026 introduces a lock-free
stack that can be used to correctly handle enqueuing callbacks from
multiple threads. Although the previous interface tried to provide a
consistent API, this was not feasible with the needs for a lock-free
stack so I have elected to just use ifdefs. The size is fixed to
whatever we use for testing, which currently amounts to about 8KiB
dedicated for this thing, which isn't enough to be concenred about.

Depends on llvm#83026
Copy link
Contributor

@lntue lntue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for splitting this out! The implementation is cleaner and easier to follow now.

Copy link
Contributor

@michaelrj-google michaelrj-google left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants