
Implement an efficient mutex in std::sync #11462

Closed
alexcrichton wants to merge 2 commits

Conversation

@alexcrichton (Member)

These commits introduce Mutex and StaticMutex types in libstd as first-class primitives. One of the major goals of this implementation is to be efficient, with pthread mutexes as the benchmark to meet.

Most of the comments can be found on the mutex module itself, and I won't repeat the details here. The good news is that this provides a different building block for libextra, which means that try_send_deferred can now be completely removed!

In profiling, native performance is on par with pthreads, and green performance is about 3x faster than pthreads, pending fixes to the scheduler so it doesn't attempt to fan out all workloads as quickly as possible.


impl StaticMutex {
    /// Attempts to grab this lock, see `Mutex::trylock`
    pub fn trylock<'a>(&'a self) -> Option<Guard<'a>> {

Contributor

Minor nit: I think it would be more rust-y if this were named try_lock().

@alexcrichton (Member Author)

Good point! I remembered we had things like try_recv and try_send, so I think try_lock is certainly the most appropriate name.
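
For reference, a minimal usage sketch of the renamed API (the module path and the MUTEX_INIT-style static initializer are assumptions about this PR, not verified against its commits):

// Hypothetical usage sketch of StaticMutex::try_lock().
use std::sync::mutex::{StaticMutex, MUTEX_INIT};

static LOCK: StaticMutex = MUTEX_INIT;

fn maybe_do_work() {
    match LOCK.try_lock() {
        Some(_guard) => {
            // Acquired without blocking; released when `_guard` is dropped.
        }
        None => {
            // Currently held elsewhere; skip the work or fall back to LOCK.lock().
        }
    }
}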

Much of the justification can be found in the large implementation comment in the module itself.

@bill-myers (Contributor)

Is it really necessary to have the "held" variable?

Isn't it possible to use the pthread mutex itself to figure out whether it's held, by unconditionally trylocking after the push in lock(), and by unlocking it before (not after, like in the current code!) the pop() in unlock()?

Note that this probably needs an explicit memory fence in both paths.

The invariant is of course that if the pthread lock is locked, everything on the queue will be woken up eventually.

You would run into the issue that "the memory of a mutex must be guaranteed to be valid until all unlocks have returned", but I don't think this is a problem in Rust, where you can't just free memory arbitrarily as in C and there is always a reference count protecting shared memory.

@alexcrichton (Member Author)

If the point of this mutex is to be the mutex, then the clause I mentioned is incredibly important in my opinion. That's the exact problem that a mutex is trying to solve (wait for a time at which it's OK to do something). Most Rust code wouldn't be susceptible to that condition, but any code that is susceptible is really going to need it (and otherwise won't have a solution in libstd).

You also can't pop outside of the lock because it's a single-consumer queue.

I would love to get rid of the held flag, but I thought about this for two days and couldn't come up with anything better.

@bill-myers (Contributor)

The problem that the mutex is trying to solve is mainly making sure that the code protected by a single mutex is globally serialized, not that you can kill the structure as soon as you unlock it and know that nothing else is going to lock it again.

Figuring out when you can kill the structure is generally the job of the reference count in the structure containing the mutex; I'm not even sure that all native OS mutexes provide that guarantee (EDIT: POSIX requires that it is provided, Windows seems to say nothing).

Rust code without unsafe code that breaks Rust semantics cannot be affected, because to call unlock() you need to pass an &mut to the mutex; if that &mut is reborrowed, then the original &mut it was reborrowed from continues to be semantically valid just after the unlock() call, so the memory must continue to be valid after the unlock(), making it irrelevant whether unlock() touches the memory after unlocking.

You also can't pop outside of the lock because it's a single-consumer queue.

That's a serious issue though.

A possible solution is to wrap the q.pop() in an additional pthread mutex used only for that, and also keep an atomic int with the queue size, which would allow skipping the lock and the pop when it is zero.
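
A rough sketch of that shape in terms of today's std types, with a Mutex<VecDeque<T>> standing in for the intrusive MPSC queue (names are hypothetical; only the "skip the lock when the counter is zero" fast path is the point here):

use std::collections::VecDeque;
use std::sync::Mutex;
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical wait queue: an atomic length lets consumers skip the pop lock
// entirely in the common case where nothing is queued.
struct WaitQueue<T> {
    len: AtomicUsize,
    queue: Mutex<VecDeque<T>>,
}

impl<T> WaitQueue<T> {
    fn new() -> WaitQueue<T> {
        WaitQueue { len: AtomicUsize::new(0), queue: Mutex::new(VecDeque::new()) }
    }

    fn push(&self, item: T) {
        let mut q = self.queue.lock().unwrap();
        q.push_back(item);
        // Update the counter while still holding the lock; the fast-path check
        // in pop() treats it only as a hint.
        self.len.fetch_add(1, Ordering::Release);
    }

    fn pop(&self) -> Option<T> {
        // Fast path: nothing queued, so don't take the lock at all.
        if self.len.load(Ordering::Acquire) == 0 {
            return None;
        }
        let item = self.queue.lock().unwrap().pop_front();
        if item.is_some() {
            self.len.fetch_sub(1, Ordering::Release);
        }
        item
    }
}

In the actual design the push side would stay on the lock-free MPSC push and only pop() would take the extra lock, but the counting trick is the same.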

@bill-myers (Contributor)

BTW, it's possible to create a mutex providing that guarantee out of one that doesn't, by boxing the non-guaranteeing mutex in an UnsafeArc and keeping a clone of the UnsafeArc in the lock guard for the mutex wrapper.

But as already argued no proper Rust code can need that guarantee, and anyway the sensible way of doing things is to put the mutex and the data it protects inside an Arc or RWARC, where the Arc or RWARC reference count determines when to destroy the object independently of the mutex.

So I think that guarantee is unimportant for Rust and should only be provided if it can be provided for free, which doesn't seem to be the case.
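
In terms of the std API as it eventually stabilized, that ownership pattern looks roughly like this (a sketch; Arc<Mutex<T>> here plays the role of the Arc/RWARC being discussed):

use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    // The mutex and the data it protects live together inside the Arc; the
    // Arc's reference count, not the mutex, decides when the whole structure
    // is destroyed.
    let shared = Arc::new(Mutex::new(0u32));

    let handles: Vec<_> = (0..4)
        .map(|_| {
            let shared = Arc::clone(&shared);
            thread::spawn(move || {
                *shared.lock().unwrap() += 1;
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    // The last clone to be dropped frees the mutex and the counter together.
    assert_eq!(*shared.lock().unwrap(), 4);
}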

This removes all usage of `try_send_deferred` and all related functionality.
Primarily, this builds all extra::sync primitives on top of `sync::Mutex`
instead of `unstable::sync::Mutex`.

@bill-myers (Contributor)

I think I have found a lockless algorithm that works.

Note that it requires support for 64-bit atomics even on 32-bit platforms, and either limits tasks to 2^32 or requires 128-bit atomics on 64-bit platforms.

However, while libstd doesn't currently implement those, 64-bit atomics are available on all modern 32-bit x86 and ARM systems (x86 >= Pentium Pro, ARM >= ARMv6).

The idea is that we hold the state in an atomic (u32, u32) called (queue_size, lockers).

Lockers is the number of threads that are about to call lock() on the internal lock or that hold the internal lock; queue_size is the number of queued tasks.

Basically the "held" variable is replaced by lockers != 0, and having queue_size in the same atomic value allows the held check and the enqueuing to be done atomically.

The "atomically" blocks are to be implemented with compare-and-swap, the queue using the intrusive MPSC queue already written.

// try_lock() {
//     if atomically {queue_size == 0 && lockers == 0} && lock.try_lock() {
//         if atomically {let b = queue_size == 0; if b {++lockers; true} else {false}} {
//             ok
//         } else {
//             lock.unlock()
//             fail
//         }
//     } else {
//         fail
//     }
// }
// 
// lock() {
//     if try_lock() {
//         return guard;
//     }
//
//     if can_block && atomically {let b = queue_size == 0; if b {++lockers; true} else {false}} {
//         lock.lock();
//     } else {
//         q.push();
//         if atomically {++queue_size; if(lockers == 0) {++lockers; true} else {false}} {
//             // this never blocks indefinitely
//             // this is because lockers was 0, so we have no one having or trying to get the lock
//             // and we atomically set queue_size to a positive value, so no one will start blocking
//             lock.lock();
//             atomically {--queue_size}
//             t = q.pop();
//             if t != ourselves {
//                 t.wakeup();
//                 go to sleep
//             }
//         } else {
//             go to sleep
//         }
//     }
// }
// 
// unlock() {
//     if atomically {if(queue_size != 0 && lockers == 1) {--queue_size; true} else {--lockers; false}} {
//         t = q.pop();
//         t.wakeup();
//     } else {
//         lock.unlock()
//     }
// }

Transitions:

(0, lockers) => (0, lockers + 1) // task locks with nothing queued
(queue_size, lockers > 1) => (queue_size, lockers - 1) // task unlocks with native task waiting
(0, 1) => (0, 0) // task unlocks with nothing else waiting

(queue_size, lockers) => (queue_size + 1, lockers) // task queues up
(queue_size > 0, 1) => (queue_size - 1, 1) // task unlocks, waking up queued task

EDIT: changed to now also be globally fair

The idea is that if lockers > 1, we don't pop the queue, but rather just unlock, which means that queued native tasks get run.

New native tasks instead queue up due to queue_size != 0

EDIT 2: added try_lock, fixed lock to use it and not have a race that caused blocking on green tasks

EDIT 3: and if we replace the MPSC queue implementation with one using list reversal, we can remove the last yields

EDIT 4: implemented this in pull #11520, suggesting to consider that pull request as an improvement over the code in this one (assuming the algorithm indeed works)
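
For concreteness, here is a minimal sketch of the packed (queue_size, lockers) state and one of the "atomically" blocks above, written against today's AtomicU64 (the type and method names are hypothetical):

use std::sync::atomic::{AtomicU64, Ordering};

// Packed state: queue_size in the high 32 bits, lockers in the low 32 bits.
struct State(AtomicU64);

fn pack(queue_size: u32, lockers: u32) -> u64 {
    ((queue_size as u64) << 32) | lockers as u64
}

fn unpack(v: u64) -> (u32, u32) {
    ((v >> 32) as u32, v as u32)
}

impl State {
    // The "atomically {let b = queue_size == 0; if b {++lockers; true} else {false}}"
    // block from the pseudocode, as a compare-and-swap loop.
    fn try_become_locker(&self) -> bool {
        let mut cur = self.0.load(Ordering::Relaxed);
        loop {
            let (queue_size, lockers) = unpack(cur);
            if queue_size != 0 {
                return false;
            }
            let next = pack(queue_size, lockers + 1);
            match self.0.compare_exchange_weak(cur, next, Ordering::AcqRel, Ordering::Relaxed) {
                Ok(_) => return true,
                // Another task changed the state concurrently; retry with the observed value.
                Err(observed) => cur = observed,
            }
        }
    }
}

The other "atomically" blocks follow the same load / modify / compare_exchange_weak loop, only with different conditions and updates.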

@alexcrichton (Member Author)

Closing; adding another crucial Thread::yield_now() makes me cringe, and #11520 looks like a much better alternative.

@alexcrichton deleted the mutex branch January 23, 2014 18:22