Eliminate priority inversions in the metadata completion runtime #40542
Conversation
@swift-ci Please test
Build failed
Build failed
Force-pushed 34b6548 to cade366
@swift-ci Please test
Build failed
Force-pushed cade366 to 2c265e4
@swift-ci Please test
Build failed
Build failed
This looks fantastic, thanks! Now to see if I can get my brain to read through it properly....
- take critical sections by reference
- forward critical sections into calls (except when called multiple times)
- add `withLock` methods where they were missing
Force-pushed 2c265e4 to 662716a
@swift-ci Please test
Build failed
Build failed
@swift-ci Please test macOS
@swift-ci Please test Linux
Build failed
The current system is based on `MetadataCompletionQueueEntry` objects which are allocated and then enqueued on dependencies. Blocking is achieved using a condition variable associated with the lock on the appropriate metadata cache. Condition variables are inherently susceptible to priority inversions because the waiting threads have no dynamic knowledge of which thread will notify the condition. In the current system, threads that unblock dependencies synchronously advance their dependent metadata completions, which means the signaling thread is unreliable even if we could represent it in condition variables. As a result, the current system is wholly unsuited for eliminating these priority inversions.

The new system is based on `AtomicWaitQueue`, an object containing a lock. The queue is eagerly allocated, and the lock is held, whenever a thread is doing work that other threads might wish to block on. In the metadata completion system, this means whenever we construct a metadata cache entry and the metadata isn't already allocated and transitively complete after said construction. Blocking is done by safely acquiring a shared reference to the queue object (which, in the current implementation, requires briefly taking a lock that's global to the surrounding metadata cache) and then acquiring the contained lock. For typical lock implementations, this avoids priority inversions by temporarily propagating the priority of the waiting threads to the locking thread.

Dependencies are unblocked by simply releasing the lock held in the queue. The unblocking thread doesn't know exactly which metadata are blocked on it and doesn't make any effort to directly advance their completion; instead, each blocked thread will wake up and then attempt to advance the dependent metadata completion itself, eliminating a source of priority overhang that affected the old system. Successive rounds of unblocking (e.g. when a metadata makes partial progress but isn't yet complete) can be achieved by creating a new queue and unlocking the old one. We can still record dependencies and use them to dynamically diagnose metadata cycles.

The new system allocates more eagerly than the old one. Formerly, metadata completions which were never blocked never needed to allocate a `MetadataCompletionQueueEntry`; on the other hand, we were unable to actually deallocate those entries once they were allocated. The new system will allocate a queue for most metadata completions, although, on the positive side, we can reliably deallocate these queues. Cache entries are also now slightly smaller because some of the excess storage for status has been folded into the queue.

The fast path of an actual read of the metadata remains a simple load-acquire. Slow paths may require a bit more locking. On Darwin, the metadata cache lock can now use `os_unfair_lock` instead of `pthread_mutex_t` (which is a massive improvement) because it does not need to support associated condition variables. In the future, the excess locking could be eliminated with some sort of generational scheme. Sadly, those are not portable, and I didn't want to take that on up-front.

rdar://76127798
Force-pushed 662716a to 749ed09
@swift-ci Please test
Build failed
@swift-ci Please test macOS
Read through it all, may have understood some of it. The fact that waiters may be waiting for different states really complicates things, but this seems like a solid approach (and implementation) for solving that. Thanks for the extensive comments explaining various parts, and for putting in the work to fix this problem.