Skip to content

[libc][rwlock] fix the race condition in waiter queue#201629

Merged
SchrodingerZhu merged 2 commits into
llvm:mainfrom
SchrodingerZhu:libc/fix-rwlock
Jun 4, 2026
Merged

[libc][rwlock] fix the race condition in waiter queue#201629
SchrodingerZhu merged 2 commits into
llvm:mainfrom
SchrodingerZhu:libc/fix-rwlock

Conversation

@SchrodingerZhu
Copy link
Copy Markdown
Contributor

@SchrodingerZhu SchrodingerZhu commented Jun 4, 2026

Fix #201615.

Fix the issue that non atomic operations race in waiting queue, which causes missed futex wakeup signals.

Confirmed by TSAN:

==================
WARNING: ThreadSanitizer: data race (pid=388518)
  Write of size 4 at 0x7ffd21cf98e4 by thread T23:
    #0 __llvm_libc_23_0_0_git::RawRwLock::notify_pending_threads() ./libc/src/__support/threads/raw_rwlock.h:443:44
    #1 __llvm_libc_23_0_0_git::RawRwLock::unlock() ./libc/src/__support/threads/raw_rwlock.h:520:5
    #2 randomized_thread_operation(SharedData*) ./libc/test/integration/src/__support/threads/tsan_full_rwlock.cpp:104:18
    #3 thread_runner(void*) ./libc/test/integration/src/__support/threads/tsan_full_rwlock.cpp:148:5

  Previous atomic read of size 4 at 0x7ffd21cf98e4 by thread T4:
    #0 __llvm_libc_23_0_0_git::cpp::Atomic<unsigned int>::load(...) ./libc/src/__support/CPP/atomic.h:115:5
    #1 __llvm_libc_23_0_0_git::Futex::wait(...) ./libc/src/__support/threads/linux/futex_utils.h:43:17
    #2 __llvm_libc_23_0_0_git::cpp::expected<int, int> __llvm_libc_23_0_0_git::rwlock::WaitingQueue::wait<Role::Reader>(...) ./libc/src/__support/threads/raw_rwlock.h:101:35
    #3 __llvm_libc_23_0_0_git::rwlock::LockResult __llvm_libc_23_0_0_git::RawRwLock::lock_slow<Role::Reader>(...) ./libc/src/__support/threads/raw_rwlock.h:402:34
    #4 __llvm_libc_23_0_0_git::RawRwLock::read_lock(...) ./libc/src/__support/threads/raw_rwlock.h:485:12
    #5 randomized_thread_operation(SharedData*) ./libc/test/integration/src/__support/threads/tsan_full_rwlock.cpp:79:16
    #6 thread_runner(void*) ./libc/test/integration/src/__support/threads/tsan_full_rwlock.cpp:148:5

  Thread T23 (tid=388553, running) created by main thread at:
    #0 pthread_create ...
    #1 main ./libc/test/integration/src/__support/threads/tsan_full_rwlock.cpp:166:5

  Thread T4 (tid=388533, running) created by main thread at:
    #0 pthread_create ...
    #1 main ./libc/test/integration/src/__support/threads/tsan_full_rwlock.cpp:166:5

SUMMARY: ThreadSanitizer: data race ./libc/src/__support/threads/raw_rwlock.h:443:44 in __llvm_libc_23_0_0_git::RawRwLock::notify_pending_threads()
==================

AI wrote the detection script. Manually fixed.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 4, 2026

✅ With the latest revision this PR passed the C/C++ code formatter.

@SchrodingerZhu SchrodingerZhu marked this pull request as ready for review June 4, 2026 16:45
@SchrodingerZhu SchrodingerZhu requested a review from a team as a code owner June 4, 2026 16:45
@llvmorg-github-actions
Copy link
Copy Markdown

@llvm/pr-subscribers-libc

Author: Schrodinger ZHU Yifan (SchrodingerZhu)

Changes

Fix #201615.

Fix the issue that non atomic operations races in waiting queue, which causes missed futex wakeup signals.

Confirmed by TSAN:

==================
WARNING: ThreadSanitizer: data race (pid=388518)
  Write of size 4 at 0x7ffd21cf98e4 by thread T23:
    #<!-- -->0 __llvm_libc_23_0_0_git::RawRwLock::notify_pending_threads() ./libc/src/__support/threads/raw_rwlock.h:443:44
    #<!-- -->1 __llvm_libc_23_0_0_git::RawRwLock::unlock() ./libc/src/__support/threads/raw_rwlock.h:520:5
    #<!-- -->2 randomized_thread_operation(SharedData*) ./libc/test/integration/src/__support/threads/tsan_full_rwlock.cpp:104:18
    #<!-- -->3 thread_runner(void*) ./libc/test/integration/src/__support/threads/tsan_full_rwlock.cpp:148:5

  Previous atomic read of size 4 at 0x7ffd21cf98e4 by thread T4:
    #<!-- -->0 __llvm_libc_23_0_0_git::cpp::Atomic&lt;unsigned int&gt;::load(...) ./libc/src/__support/CPP/atomic.h:115:5
    #<!-- -->1 __llvm_libc_23_0_0_git::Futex::wait(...) ./libc/src/__support/threads/linux/futex_utils.h:43:17
    #<!-- -->2 __llvm_libc_23_0_0_git::cpp::expected&lt;int, int&gt; __llvm_libc_23_0_0_git::rwlock::WaitingQueue::wait&lt;Role::Reader&gt;(...) ./libc/src/__support/threads/raw_rwlock.h:101:35
    #<!-- -->3 __llvm_libc_23_0_0_git::rwlock::LockResult __llvm_libc_23_0_0_git::RawRwLock::lock_slow&lt;Role::Reader&gt;(...) ./libc/src/__support/threads/raw_rwlock.h:402:34
    #<!-- -->4 __llvm_libc_23_0_0_git::RawRwLock::read_lock(...) ./libc/src/__support/threads/raw_rwlock.h:485:12
    #<!-- -->5 randomized_thread_operation(SharedData*) ./libc/test/integration/src/__support/threads/tsan_full_rwlock.cpp:79:16
    #<!-- -->6 thread_runner(void*) ./libc/test/integration/src/__support/threads/tsan_full_rwlock.cpp:148:5

  Thread T23 (tid=388553, running) created by main thread at:
    #<!-- -->0 pthread_create ...
    #<!-- -->1 main ./libc/test/integration/src/__support/threads/tsan_full_rwlock.cpp:166:5

  Thread T4 (tid=388533, running) created by main thread at:
    #<!-- -->0 pthread_create ...
    #<!-- -->1 main ./libc/test/integration/src/__support/threads/tsan_full_rwlock.cpp:166:5

SUMMARY: ThreadSanitizer: data race ./libc/src/__support/threads/raw_rwlock.h:443:44 in __llvm_libc_23_0_0_git::RawRwLock::notify_pending_threads()
==================

AI wrote the detection script. Manually fixed.


Full diff: https://github.com/llvm/llvm-project/pull/201629.diff

1 Files Affected:

  • (modified) libc/src/__support/threads/raw_rwlock.h (+10-7)
diff --git a/libc/src/__support/threads/raw_rwlock.h b/libc/src/__support/threads/raw_rwlock.h
index 9bc1b88cabd69..1b0bba9c72675 100644
--- a/libc/src/__support/threads/raw_rwlock.h
+++ b/libc/src/__support/threads/raw_rwlock.h
@@ -75,11 +75,11 @@ class WaitingQueue final : private RawMutex {
       else
         return queue.pending_writers;
     }
-    template <Role role> LIBC_INLINE FutexWordType &serialization() {
+    template <Role role> LIBC_INLINE Futex &serialization() {
       if constexpr (role == Role::Reader)
-        return queue.reader_serialization.val;
+        return queue.reader_serialization;
       else
-        return queue.writer_serialization.val;
+        return queue.writer_serialization;
     }
     friend WaitingQueue;
   };
@@ -391,8 +391,9 @@ class RawRwLock {
         // sleep on the futex, we can avoid such waiting.
         old = RwState::fetch_set_pending_bit<role>(state,
                                                    cpp::MemoryOrder::RELAXED);
-        // no need to use atomic since it is already protected by the mutex.
-        serial_number = guard.serialization<role>();
+        // relaxed atomic since it is already protected by the mutex.
+        serial_number =
+            guard.serialization<role>().load(cpp::MemoryOrder::RELAXED);
       }
 
       // Phase 6: do futex wait until the lock is available or timeout is
@@ -437,10 +438,12 @@ class RawRwLock {
     {
       WaitingQueue::Guard guard = queue.acquire(is_pshared);
       if (guard.pending_count<Role::Writer>() != 0) {
-        guard.serialization<Role::Writer>()++;
+        guard.serialization<Role::Writer>().fetch_add(
+            1, cpp::MemoryOrder::RELEASE);
         status = WakeTarget::Writers;
       } else if (guard.pending_count<Role::Reader>() != 0) {
-        guard.serialization<Role::Reader>()++;
+        guard.serialization<Role::Reader>().fetch_add(
+            1, cpp::MemoryOrder::RELEASE);
         status = WakeTarget::Readers;
       } else {
         status = WakeTarget::None;

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the libc RawRwLock waiter-queue serialization handling to eliminate data races on the futex “serialization” words, addressing flakiness/hangs caused by missed futex wakeups (Issue #201615). The change aligns the serialization counters with the Futex atomic interface so waiters and wakers always interact with the futex word atomically.

Changes:

  • Switch WaitingQueue::Guard::serialization() to return a Futex& instead of a plain FutexWordType&.
  • Read the serialization value with an atomic load(RELAXED) when registering as pending.
  • Increment the serialization value with an atomic fetch_add when deciding which waiters to wake.

Comment thread libc/src/__support/threads/raw_rwlock.h
Comment thread libc/src/__support/threads/raw_rwlock.h
@SchrodingerZhu SchrodingerZhu merged commit 70f7167 into llvm:main Jun 4, 2026
42 checks passed
@SchrodingerZhu SchrodingerZhu deleted the libc/fix-rwlock branch June 4, 2026 22:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[libc] hanging rwlock integration tests due to race conditions

3 participants