Mutexes, take 5 #11866

alexcrichton · 2014-01-28T07:06:53Z

Let's try this again.

This is an implementation of mutexes which I believe is free from undefined behavior of OS mutexes (the pitfall of the previous implementation).

This implementation is not ideal. There's a yield-loop spot, and it's not particularly fair with respect to lockers who steal without going through the normal code paths. That being said, I believe that this is a correct implementation which is a stepping stone to move from.

I haven't done rigorous benchmarking of this mutex, but preliminary results show that it's about 25% slower in the uncontended case on linux (same runtime on OSX), and it's actually faster than a pthreads mutex on high contention (again, not rigorous benchmarking, I just saw these numbers come up).

bill-myers · 2014-01-28T09:42:27Z

It seems it's missing logic to alternate between waking up native_blocker and green_blocker.

Also, while it's kind of nice to have the symmetry between native_blocker and green_blocker, this design seems suboptimal compared to replacing code that accesses green_blocker with code that pops from the queue, which would avoid making green tasks deschedule and get rescheduled twice in the native+green task case (first on the queue, then on green_blocker).

This probably requires to merge green_cnt and state, by reserving the two least significant bits for LOCKED and NATIVE_BLOCKED. This would also reduce the size of struct Mutex by two words.

Overall, this implementation is quite nice and might be the optimal approach for performance in both the green-only and native-only cases.

It has, however, the significant caveat that it makes taking advantage of lock elision that should eventually be provided by all OS mutexes completely impossible due to always writing to state (maybe it can still be used by implementing it "by hand").

[I think the only workable alternative so far for automatic lock elision is to use https://github.com//pull/11610#issuecomment-33075571 after adapting to always lock/unlock on each call.]

Also, it doesn't reduce to just using pthread mutexes in the native-only case, which would be a problem for taking advantage of priority inheritance for native tasks where the OS mutex supports it and in general anything that requires the OS to know that a thread owns a mutex; this is fixable if desired by removing TryLockAcquisition by removing the initial try lock code in lock, and implementing try_lock in a similar way to lock (first try_locking the green or native parts, then the whole).

alexcrichton · 2014-01-28T17:48:10Z

It seems it's missing logic to alternate between waking up native_blocker and green_blocker.

We actually get this for free because if a green thread holds the lock, you're guaranteed that the green blocker is empty (unless you've got a steal from a try_lock, as noted by the FIXME in the code).

which would avoid making green tasks deschedule and get rescheduled twice in the native+green task case

It's true, and this is even a problem with native tasks. This unlocks the OS mutex and then flags the outer mutex as available, so it's possible for the child thread to run first, go to sleep on a cvar, and then have to get woken up. I agree this is a little odd, but the focus now is to get something working rather than get the fastest thing working up front.

This probably requires to merge green_cnt and state

I can see how that'd work, nice idea! For now I'd rather focus on clarity, however. I'd like to get this merged in one form or another so we can start using it. I'll add a FIXME though so we don't forget that we can shrink down the size if we need to.

Overall, this implementation is quite nice

Thank you!

It has, however, the significant caveat that it makes taking advantage of lock elision that should eventually be provided by all OS mutexes completely impossible due to always writing to state

While regrettable, I'm feeling at this point we should get something merged before getting the perfect thing merged. I personally don't know enough about lock elision to write a mutex for it, but regardless that probably shouldn't be the focus this time around.

Also, it doesn't reduce to just using pthread mutexes in the native-only case

Hmm, that's not a bad idea, I think I'll play around with that and see how it works out. I was initially worried about the TLS hit, but my profiling found this to not be that expensive.

bill-myers · 2014-01-30T01:14:19Z

We actually get this for free because if a green thread holds the lock, you're guaranteed that the green blocker is empty.

Indeed, I was wrong.

Yes, getting something merged is definitely a priority, since the implementation can be changed in a backwards compatible way, and this one seems to work.

The only significant issue regarding API and backwards compatibility is whether to commit to providing the "immediate destruction after unlock" guarantee to unsafe code, or to explicitly reserve the right to reimplement Mutex in a way that no longer provides it.

Also, what behavior is guaranteed when attempting to recursively lock the Mutex (always fails, always deadlocks, undefined).

I'm thinking lock elision will probably not require a redesign anyway, since it's likely going to be faster to do it ourselves, by simply starting an HTM transaction and checking that state == 0 inside; the runtime probably needs to be aware of it anyway since it seems it should abort any active transaction on green context switches.

thestinger · 2014-01-30T01:16:14Z

@bill-myers: Keep in mind that the real lock elision API has better performance characteristics than RTM, and with RTM you still need a fallback abort path (in all likelihood, to a lock).

This also generalizes all atomic intrinsics over T so we'll be able to add u8 atomics if we really feel the need to (do we really want to?)

With the recently added double word CAS functionality on 32-bit ARM (enabled via a 64-bit atomic instruction in LLVM IR), without some extra features enabled LLVM lowers code to function calls which emulate atomic instructions. With the v7 feature enabled, proper 64-bit CAS instructions are used on 32-bit arm. I've been told that v7 for arm is what we should have been doing anyway. This is overridable by providing some other non-empty feature string.

This allows for easier static initialization of a pthread mutex, although the windows mutexes still sadly suffer. Note that this commit removes the clone() method from a mutex because it no longer makes sense for pthreads mutexes. This also removes the Once type for now, but it'll get added back shortly.

This originally lived in std::unstable::mutex, but now it has a new home (and a more proper one).

Now that extra::sync primitives are built on a proper mutex instead of a pthreads one, there's no longer any use for this function.

Let's try this again. This is an implementation of mutexes which I believe is free from undefined behavior of OS mutexes (the pitfall of the previous implementation). This implementation is not ideal. There's a yield-loop spot, and it's not particularly fair with respect to lockers who steal without going through the normal code paths. That being said, I believe that this is a correct implementation which is a stepping stone to move from. I haven't done rigorous benchmarking of this mutex, but preliminary results show that it's about 25% slower in the uncontended case on linux (same runtime on OSX), and it's actually faster than a pthreads mutex on high contention (again, not rigorous benchmarking, I just saw these numbers come up).

…ap_or_else_none, r=flip1995 Simplify code for `result_map_or_else_none` As mentioned in rust-lang/rust-clippy#11864. r? `@flip1995` changelog: Simplify code for `result_map_or_else_none`

alexcrichton added 9 commits February 3, 2014 12:04

Add an AtomicU64 type to std::sync::atomics

b00147a

This also generalizes all atomic intrinsics over T so we'll be able to add u8 atomics if we really feel the need to (do we really want to?)

extra: Make room for more sync primitives

a417de9

extra: Add an intrusive MPSC to be used soon

24631c8

extra: Re-add the Once primitve to extra::sync

21e8466

This originally lived in std::unstable::mutex, but now it has a new home (and a more proper one).

std: Remove try_send_deferred plus all fallout

b49771e

Now that extra::sync primitives are built on a proper mutex instead of a pthreads one, there's no longer any use for this function.

extra: Introduce a mutex type for native/green threads

984727f

Various bug fixes and rebase conflicts

acacfb2

bors closed this Feb 3, 2014

alexcrichton deleted the atomic-u64 branch February 5, 2014 00:08

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Mutexes, take 5 #11866

Mutexes, take 5 #11866

alexcrichton commented Jan 28, 2014

bill-myers commented Jan 28, 2014

alexcrichton commented Jan 28, 2014

bill-myers commented Jan 30, 2014

thestinger commented Jan 30, 2014

Mutexes, take 5 #11866

Mutexes, take 5 #11866

Conversation

alexcrichton commented Jan 28, 2014

bill-myers commented Jan 28, 2014

alexcrichton commented Jan 28, 2014

bill-myers commented Jan 30, 2014

thestinger commented Jan 30, 2014