Proposal: shrink types in sync package
Current API
The types in the sync package use semaphore operations provided by the runtime package. Specifically, they use the following two APIs:
semacquire(s *uint32): waits until *s > 0 and atomically decrements it.
semrelease(s *uint32): atomically increments *s and wakes a goroutine blocked in semacquire on s, if there is one.
Each semaphore logically manages a queue of goroutines. The sync types store their atomically modified state separately from their semaphores, so the semaphores have to tolerate out-of-order operations. For example, when a sync.Mutex is unlocked, the semrelease may reach the semaphore 'before' the corresponding semacquire. Keeping the atomic state and the semaphores in separate words also gives the sync types a larger memory footprint.
Proposed new API
Let's consider a slightly different API for queueing and dequeueing goroutines.
semqueue(s *uint32, mask uint32, cmp uint32, bit uint8) bool: checks whether (*s & mask) == cmp. If not, it returns false immediately (indicating 'barging'). Otherwise, it puts the calling goroutine to sleep. If this goroutine is the first to sleep on the given s and bit, it also performs *s = *s | (1 << bit).
semdequeue(s *uint32, bit uint8, dequeueAll bool) int: wakes one goroutine sleeping in a semqueue call with matching s and bit (or all of them, if dequeueAll is true). If no goroutines remain asleep, it clears the corresponding bit: *s = *s &^ (1 << bit). Returns the number of goroutines woken.
Notes about this API:
The names listed above are placeholders (other suggestions welcome).
All operations within a single call are atomic (implemented with a lock keyed on s).
Also, the real signatures will likely need to be more complicated to support mutex profiling and FIFO vs. LIFO queueing.
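One way to implement "a lock keyed on s" is a fixed table of mutexes indexed by a hash of the address. In this sketch, lockTable and lockFor are hypothetical names. The table size (251) and the shift by 3 are assumptions. As far as I know they match the runtime's existing semaphore table, but nothing below depends on that.

```go
package main

import (
	"fmt"
	"sync"
	"unsafe"
)

const lockTableSize = 251 // assumption: a small prime-sized table

var lockTable [lockTableSize]sync.Mutex

// lockFor maps an address to one of a fixed set of mutexes. Different words
// may share a mutex (that only adds contention). The same word always maps
// to the same mutex, which is what makes each call atomic with respect to s.
func lockFor(s *uint32) *sync.Mutex {
	addr := uintptr(unsafe.Pointer(s))
	return &lockTable[(addr>>3)%lockTableSize] // >>3 drops low alignment bits
}

// word is a package-level variable so its address cannot change (a stack
// variable could move when the goroutine's stack grows).
var word uint32

func main() {
	mu := lockFor(&word)
	mu.Lock()
	word |= 1 // check/update the word and the wait queue while holding the lock
	mu.Unlock()
	fmt.Println(lockFor(&word) == mu, word)
}
```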
Benefits
Given this kind of API, I believe we could shrink the types in the sync package. The new API only ever modifies a single bit of an atomic word, so all of the other bits are free to store the type's own atomic state. Several logical queues can share the same 4-byte word, which lets it handle types like sync.RWMutex that need one queue for readers and another for writers. Here are the possible savings from reimplementing various types in the sync package:
| Type | Current size (bytes) | New size (bytes) |
|---|---|---|
| sync.Once | 12 | 4 |
| sync.Mutex | 8 | 4 |
| sync.RWMutex | 24 | 4 |
These are the ones I'm fairly confident about. I also think we can shrink sync.WaitGroup (from 12 bytes to 4) and sync.Cond (by 28 bytes), but I'm less sure about these two because I'm unfamiliar with their implementations.
Also, while this isn't the goal, I think this might improve the performance of (*sync.RWMutex).Unlock. It currently calls semrelease in a loop to wake readers, acquiring and releasing a lock on each iteration. The batch dequeue in the API above would let it wake all the readers at once and avoid a number of atomic operations.
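The difference between waking readers with one semrelease each and waking them with a single batch dequeue can be shown by counting how often a queue's internal lock is taken. waitQueue, wakeOne, wakeAll, and run are hypothetical. A real semaphore queue also does atomic operations and scheduler work that this toy leaves out.

```go
package main

import (
	"fmt"
	"sync"
)

// waitQueue is a toy stand-in for a semaphore's wait queue. lockOps counts
// lock acquisitions on the wake-up side only.
type waitQueue struct {
	mu      sync.Mutex
	waiters []chan struct{}
	lockOps int
}

func (q *waitQueue) add() chan struct{} {
	ch := make(chan struct{})
	q.mu.Lock()
	q.waiters = append(q.waiters, ch)
	q.mu.Unlock()
	return ch
}

// wakeOne mirrors one semrelease per reader: one lock round trip per waiter.
func (q *waitQueue) wakeOne() {
	q.mu.Lock()
	q.lockOps++
	if len(q.waiters) > 0 {
		close(q.waiters[0])
		q.waiters = q.waiters[1:]
	}
	q.mu.Unlock()
}

// wakeAll mirrors a batch dequeue: one lock round trip for all waiters.
func (q *waitQueue) wakeAll() int {
	q.mu.Lock()
	defer q.mu.Unlock()
	q.lockOps++
	n := len(q.waiters)
	for _, ch := range q.waiters {
		close(ch)
	}
	q.waiters = nil
	return n
}

// run parks `readers` goroutines, wakes them, and returns the lock count.
func run(readers int, batch bool) int {
	q := &waitQueue{}
	var wg sync.WaitGroup
	for i := 0; i < readers; i++ {
		ch := q.add()
		wg.Add(1)
		go func() { defer wg.Done(); <-ch }()
	}
	if batch {
		q.wakeAll()
	} else {
		for i := 0; i < readers; i++ {
			q.wakeOne()
		}
	}
	wg.Wait()
	return q.lockOps
}

func main() {
	fmt.Println("one wakeup per reader, lock acquisitions:", run(8, false))
	fmt.Println("batch wakeup, lock acquisitions:", run(8, true))
}
```

With 8 waiting readers, the per-reader strategy takes the lock 8 times and the batch strategy takes it once.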
Backwards Compatibility
This change only modifies internal details of the Go runtime, so from that perspective it should be backwards compatible. There are two other considerations:
If any users are carefully sizing their types to avoid false sharing or to align with cache-line boundaries, changing the size of types in sync could be problematic for them.
Some users call semacquire and semrelease directly via go:linkname; see gVisor for an example. So even if we convert every use in the standard library, we will likely want to keep these functions around for a while.
Edits
- Reordered types in order of implementation simplicity
- Increased size of sync.RWMutex from 4 to 8 bytes (see comments for explanation)
- Decreased size of sync.RWMutex back down to 4 bytes (see comments for explanation)