Conversation

rjmccall
Contributor

The current system is based on MetadataCompletionQueueEntry objects which are allocated and then enqueued on dependencies. Blocking is achieved using a condition variable associated with the lock on the appropriate metadata cache. Condition variables are inherently susceptible to priority inversions because the waiting threads have no dynamic knowledge of which thread will notify the condition. In the current system, threads that unblock dependencies synchronously advance their dependent metadata completions, so the identity of the signaling thread is unpredictable even if condition variables could represent it. As a result, the current system is wholly unsuited to eliminating these priority inversions.
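
The inversion-prone pattern can be reduced to the following sketch (hypothetical names, not the runtime's actual code): the waiter sleeps on a condition variable, and because a condition variable has no notion of an owning thread, the scheduler has no thread to boost on the waiter's behalf.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical reduction of the old scheme; names are illustrative.
struct OldCacheEntry {
  std::mutex Lock;
  std::condition_variable Done;  // no owner: nobody knows who will signal
  bool Complete = false;
};

// A waiter sleeps on the condition variable.  The scheduler cannot
// propagate this thread's priority to whichever thread will eventually
// call notify_all, because that thread's identity is unknown.
void waitForCompletion(OldCacheEntry &E) {
  std::unique_lock<std::mutex> G(E.Lock);
  E.Done.wait(G, [&] { return E.Complete; });
}

void markComplete(OldCacheEntry &E) {
  {
    std::lock_guard<std::mutex> G(E.Lock);
    E.Complete = true;
  }
  E.Done.notify_all();
}
```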

The new system uses AtomicWaitQueue, which is essentially an object containing a lock. The queue is eagerly allocated, and the lock is held, whenever a thread is doing work that other threads might wish to block on. In the metadata completion system, this means whenever we construct a metadata cache entry and the metadata isn't already allocated and transitively complete after that construction. Blocking is done by safely acquiring a shared reference to the queue object (which, in the current implementation, requires briefly taking a lock that's global to the surrounding metadata cache) and then acquiring the contained lock. For typical lock implementations, this will avoid priority inversions by ensuring that the priority of waiting threads is temporarily propagated to the thread responsible for advancing the metadata's completion.
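
A minimal sketch of that shape, with illustrative names rather than the runtime's actual API: the worker locks the queue before publishing it, so a waiter can never observe an unlocked queue, and blocking is just acquiring the queue's own lock.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <thread>

// Hypothetical sketch of the new scheme.  The queue is just a
// reference-counted object that owns a lock.
struct WaitQueue {
  std::mutex Lock;  // held by the worker for the duration of the work
};

struct MetadataCache {
  std::mutex GlobalLock;                  // global to the metadata cache
  std::shared_ptr<WaitQueue> ActiveQueue; // non-null while work is in flight
  int State = 0;                          // stands in for completion state
};

// Worker side: eagerly allocate the queue and acquire its lock *before*
// publishing it, so waiters can never observe an unlocked queue.
std::shared_ptr<WaitQueue> beginWork(MetadataCache &C) {
  auto Q = std::make_shared<WaitQueue>();
  Q->Lock.lock();
  std::lock_guard<std::mutex> G(C.GlobalLock);
  C.ActiveQueue = Q;
  return Q;
}

// Must be called on the same thread that called beginWork, since a
// std::mutex must be unlocked by the thread that locked it.
void finishWork(MetadataCache &C, const std::shared_ptr<WaitQueue> &Q) {
  {
    std::lock_guard<std::mutex> G(C.GlobalLock);
    C.ActiveQueue.reset();  // unpublish the queue
  }
  Q->Lock.unlock();  // unblock every waiter at once
}

// Waiter side: briefly take the cache-global lock to get a shared
// reference, then block on the queue's own lock.  A priority-aware
// mutex can propagate the waiter's priority to the lock's holder.
void waitForWork(MetadataCache &C) {
  std::shared_ptr<WaitQueue> Ref;
  {
    std::lock_guard<std::mutex> G(C.GlobalLock);
    Ref = C.ActiveQueue;
  }
  if (Ref) {
    std::lock_guard<std::mutex> Wait(Ref->Lock);  // blocks until finishWork
  }
}
```

Note that unlike a condition variable, the lock has a well-defined holder while the work is in flight, which is exactly what priority donation needs.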

Dependencies are unblocked by simply releasing the lock contained in the queue. The unblocking thread doesn't know exactly what metadata are blocked on it and doesn't make any effort to directly advance their completion; instead, each blocked thread will wake up and then attempt to advance the dependent metadata completion itself, eliminating a source of priority overhang that affected the old system. Successive rounds of unblocking (e.g. when a metadata makes partial progress but isn't yet complete) can be achieved by creating a new queue and unlocking the old one. We can still record dependencies and use them to dynamically diagnose metadata cycles.
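
The round-by-round handoff can be sketched like this (again with hypothetical names): the worker publishes a fresh, already-locked queue before unlocking the old one, so a woken waiter that re-checks the status and finds it still insufficient simply blocks on the new queue.

```cpp
#include <cassert>
#include <memory>
#include <mutex>

// Hypothetical sketch of publishing partial progress.
struct WaitQueue { std::mutex Lock; };

struct Entry {
  std::mutex GlobalLock;
  std::shared_ptr<WaitQueue> Queue;  // current round's queue, locked by worker
  int Progress = 0;
};

// Worker: swap in a fresh, already-locked queue, then unlock the old
// one to wake that round's waiters.  Assumes the caller is the thread
// that locked the old queue.
void publishProgress(Entry &E, int NewProgress) {
  auto NewQ = std::make_shared<WaitQueue>();
  NewQ->Lock.lock();              // lock the new queue before publishing it
  std::shared_ptr<WaitQueue> OldQ;
  {
    std::lock_guard<std::mutex> G(E.GlobalLock);
    OldQ = E.Queue;
    E.Queue = NewQ;
    E.Progress = NewProgress;
  }
  if (OldQ)
    OldQ->Lock.unlock();          // wake waiters on the previous round
}

// Waiter: loop until enough progress has been made, blocking on
// whichever queue is current each time around.
void waitForProgress(Entry &E, int Needed) {
  while (true) {
    std::shared_ptr<WaitQueue> Ref;
    {
      std::lock_guard<std::mutex> G(E.GlobalLock);
      if (E.Progress >= Needed || !E.Queue)
        return;
      Ref = E.Queue;
    }
    std::lock_guard<std::mutex> Wait(Ref->Lock);  // block until next round
  }
}
```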

The new system allocates more eagerly than the old one. Formerly, metadata completions which were never blocked never needed to allocate a MetadataCompletionQueueEntry. (As a downside, we were unable to actually deallocate those entries if they did need to be allocated.) The new system will allocate a queue for most metadata completions. (As an upside, they can be reliably deallocated.) Cache entries are also now slightly smaller because some of the excess storage for status has been folded into the queue.

The fast path of an actual read of the metadata remains a simple load-acquire. Slow paths may require a bit more locking. On Darwin, the metadata cache lock can now use os_unfair_lock instead of pthread_mutex_t (which is a massive improvement) because it does not need to support associated condition variables.
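The fast path can be sketched as follows (hypothetical names; the real runtime's slow path takes the cache lock and may block on a wait queue): a single load-acquire of the published pointer, falling back to the slow path only on a miss.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical sketch of the read fast path.  A completed value is
// published with a store-release; readers need only a load-acquire.
struct Metadata { int Kind; };

std::atomic<Metadata *> CachedMetadata{nullptr};

Metadata *getMetadataSlow();  // locking slow path, defined below

Metadata *getMetadata() {
  // Fast path: one load-acquire, no locks.
  if (Metadata *M = CachedMetadata.load(std::memory_order_acquire))
    return M;
  return getMetadataSlow();
}

Metadata *getMetadataSlow() {
  // Stand-in for the real slow path: construct and publish the value.
  static Metadata Instance{42};
  CachedMetadata.store(&Instance, std::memory_order_release);
  return &Instance;
}
```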

In the future, the excess locking could be eliminated with some sort of generational scheme. Sadly, such schemes are not portable, and I didn't want to take that on up-front.

rdar://76127798

@rjmccall rjmccall requested review from mikeash and rokhinip December 14, 2021 04:55
@rjmccall
Contributor Author

@swift-ci Please test

@swift-ci
Contributor

Build failed
Swift Test Linux Platform
Git Sha - 34b6548c46155f8e971bc788353b9b6a6a8c5753

@swift-ci
Contributor

Build failed
Swift Test OS X Platform
Git Sha - 34b6548c46155f8e971bc788353b9b6a6a8c5753

@rjmccall rjmccall force-pushed the atomic-wait-queue-metadata branch from 34b6548 to cade366 Compare December 14, 2021 08:53
@rjmccall
Contributor Author

@swift-ci Please test

@swift-ci
Contributor

Build failed
Swift Test Linux Platform
Git Sha - cade36601c10bfed04392f146da4deadf2d8eb93

@rjmccall rjmccall force-pushed the atomic-wait-queue-metadata branch from cade366 to 2c265e4 Compare December 14, 2021 09:01
@rjmccall
Contributor Author

@swift-ci Please test

@swift-ci
Contributor

Build failed
Swift Test Linux Platform
Git Sha - 2c265e4923bf272bf8a0122506a95e0166f44e97

@swift-ci
Contributor

Build failed
Swift Test OS X Platform
Git Sha - 2c265e4923bf272bf8a0122506a95e0166f44e97

@mikeash
Contributor

mikeash commented Dec 14, 2021

This looks fantastic, thanks! Now to see if I can get my brain to read through it properly....

- take critical sections by reference
- forward critical sections into calls (except when called
  multiple times)
- add withLock methods where they were missing
@rjmccall rjmccall force-pushed the atomic-wait-queue-metadata branch from 2c265e4 to 662716a Compare December 14, 2021 20:31
@rjmccall
Contributor Author

@swift-ci Please test

@swift-ci
Contributor

Build failed
Swift Test OS X Platform
Git Sha - 662716a50e68e7796fd2d41e0647d7a198f2d6d8

@swift-ci
Contributor

Build failed
Swift Test Linux Platform
Git Sha - 662716a50e68e7796fd2d41e0647d7a198f2d6d8

@rjmccall
Contributor Author

@swift-ci Please test macOS

@rjmccall
Contributor Author

@swift-ci Please test Linux

@swift-ci
Contributor

Build failed
Swift Test OS X Platform
Git Sha - 662716a50e68e7796fd2d41e0647d7a198f2d6d8

@rjmccall rjmccall force-pushed the atomic-wait-queue-metadata branch from 662716a to 749ed09 Compare December 15, 2021 03:19
@rjmccall
Contributor Author

@swift-ci Please test

@swift-ci
Contributor

Build failed
Swift Test OS X Platform
Git Sha - 749ed09

@DougGregor
Member

@swift-ci Please test macOS

Contributor

@mikeash mikeash left a comment


Read through it all, may have understood some of it. The fact that waiters may be waiting for different states really complicates things, but this seems like a solid approach (and implementation) for solving that. Thanks for the extensive comments explaining various parts, and for putting in the work to fix this problem.

@rjmccall rjmccall merged commit 4969407 into swiftlang:main Dec 16, 2021
@rjmccall rjmccall deleted the atomic-wait-queue-metadata branch December 16, 2021 02:42