Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mutex isn't guaranteed to have atomic acquire/release semantics #126239

Open
briansmith opened this issue Jun 10, 2024 · 13 comments
Open

Mutex isn't guaranteed to have atomic acquire/release semantics #126239

briansmith opened this issue Jun 10, 2024 · 13 comments
Labels
A-atomic Area: atomics, barriers, and sync primitives A-docs Area: documentation for any part of the project, including the compiler, standard library, and tools C-discussion Category: Discussion or questions that doesn't represent real issues. T-libs Relevant to the library team, which will review and decide on the PR/issue. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@briansmith
Copy link
Contributor

Location

https://doc.rust-lang.org/std/sync/struct.Mutex.html, including at least:
https://doc.rust-lang.org/std/sync/struct.Mutex.html#method.lock
https://doc.rust-lang.org/std/sync/struct.Mutex.html#method.try_lock
https://doc.rust-lang.org/std/sync/struct.MutexGuard.html

Summary

See https://marabos.nl/atomics/memory-ordering.html#example-locking, emphasis mine:

Mutexes are the most common use case for release and acquire ordering (see "Locking: Mutexes and RwLocks" in Chapter 1). When locking, they use an atomic operation to check if it was unlocked, using acquire ordering, while also (atomically) changing the state to "locked." When unlocking, they set the state back to "unlocked" using release ordering. This means that there will be a happens-before relationship between unlocking a mutex and subsequently locking it.

Many people are relying on this guarantee, but the documentation for Mutex does not state this guarantee. We should explicitly guarantee this in the documentation for Mutex.

In theory, a mutex could have "consume" semantics instead of acquire/release semantics, such that reads/writes to the "contents" of a mutex (the T in Mutex<T>) and pointers/reference contained therein, are treated as "consume" load/stores (even if Rust's atomics API doesn't support consume ordering). In practice, people have already written much code that relies on the acquire/release semantics.

Or, in theory, a mutex could have SeqCst semantics during lock and unlock, but I think we don't want people to rely on that (though I already have seen code that does, accidentally).

Why does this matter?. Consider:

use std::{
    fs, io,
    os::fd::{IntoRawFd as _, RawFd},
    sync::{
        atomic::{compiler_fence, AtomicI32, Ordering},
        Mutex,
    },
};

fn get_lazy_fd() -> Result<RawFd, io::Error> {
    const FD_UNINIT: i32 = -1; // libstd guarantees a RawFD is never -1.

    // FD isn't stored within the mutex so we can usually use it
    // without locking overhead.
    static FD: AtomicI32 = AtomicI32::new(FD_UNINIT);
    static LOCK: Mutex<()> = Mutex::new(());
    match FD.load(Ordering::Relaxed) {
        fd if fd != FD_UNINIT => Ok(fd), // Common case: Fast, no Acquire at all.
        _ => {
            let _guard = LOCK.lock().unwrap();

            if let Some(fd) = FD.load(Ordering::Relaxed) {
                return fd;
            }

            let fd = fs::File::open("/dev/urandom")?.into_raw_fd();

            FD.store(fd, Ordering::Relaxed);
            
            Ok(fd)
        }
    }
}

We want some official reassurance of the "obvious" fact that the FD.store(fd, Ordering::Relaxed) will be sequenced before the unlocking of the Mutex, i.e. that the mutex is doing an atomic release when unlocking the mutex so that the side effect of storing to FD will be seen by other thread after locking the mutex, which would only be guaranteed if locking the mutex does an atomic aquire that synchronizes with that atomic release.

This would also clarify the more basic fact that in a Mutex<T>, there is nothing special about what's "contained" in the Mutex, as far as synchronization is concerned. (This is unique to rust; C/C++/Posix mutexes don't have the concept of the Mutex containing a value.)

@briansmith briansmith added the A-docs Area: documentation for any part of the project, including the compiler, standard library, and tools label Jun 10, 2024
@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Jun 10, 2024
@chorman0773
Copy link
Contributor

I don't think we need to explicitly say "It does an atomic operation".

We can just say

The return from a call to Mutex::lock synchronizes-with a preceeding call to MutexGuard::drop that unlocks that Mutex.

@briansmith
Copy link
Contributor Author

We can just say

The return from a call to Mutex::lock synchronizes-with a preceeding call to MutexGuard::drop that unlocks that Mutex.

That sounds good. However, do we have a place that says that synchronizes-with implies happens-before? I only see it implicitly in https://doc.rust-lang.org/std/sync/atomic/fn.fence.html:

A fence ‘A’ which has (at least) Release ordering semantics, synchronizes with a fence ‘B’ with (at least) Acquire semantics, if and only if there exist operations X and Y, both operating on some atomic object ‘M’ such that A is sequenced before X, Y is sequenced before B and Y observes the change to M. This provides a happens-before dependence between A and B.

Whereas Ordering::Release and Ordering::Acquire basically indirectly say that AtomicX.store(x, Ordering::Release) and AtomicX::load(Ordering::Acquire) establish a happens-before relationship, but without mentioning synchronized-with or happens-before.

It seems like if we want to use the term synchronizes-with for Mutex, then we should change the definition of Release/Acquire to also use that term, and then somewhere give some kind of definition of synchronizes-with that specifically mentions "happens before."

https://preshing.com/20130823/the-synchronizes-with-relation/#other-ways-to-achieve-synchronizes-with and some other places also mention that spawning a thread synchronizes-with (Ordering::Release) the execution of the closure on the new thread (Ordering::Acquire).

I believe std::sync::Once and OnceLock also need similar language; every call to Once::call_once()/OnceLock::get_andinit().OnceLOck::get_or_try_init() synchronizes-with (Ordering::Acquire) the successful (i.e. non-panicking, non-Err result) completion of the initialization function (Ordering::Release). (Note that the documentation for std::sync::Once seems to say something much stronger right now, which is another issue.)

@veera-sivarajan
Copy link
Contributor

@rustbot label -needs-triage +A-atomic

@rustbot rustbot added A-atomic Area: atomics, barriers, and sync primitives and removed needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Jun 10, 2024
@chorman0773
Copy link
Contributor

chorman0773 commented Jun 10, 2024

That sounds good. However, do we have a place that says that synchronizes-with implies happens-before?

Yes, the C++ Standard, [intro.races]#9, [intro.races]#10, and [intro.races]#11 https://eel.is/c++draft/intro.races#9.

@jieyouxu jieyouxu added T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue. C-discussion Category: Discussion or questions that doesn't represent real issues. labels Jun 11, 2024
@alexpyattaev
Copy link

Mutex explicitly documented as having what amounts to "side effects" (a.k.a. spooky action at a distance) w.r.t. random atomics does sound somewhat inappropriate for rust in general, especially in environments that might have the machinery to support a mutex, but not necessarily the side-effects that you find on e.g. an ARM implementation of it. Specifically, if you want to enforce a particular ordering w.r.t. a particular atomic, maybe that is what you need to state in the code using an appropriate fence call? Expecting a mutex to implicitly do that for you essentially relies on mutex's internal implementation to work. I might be missing something very important here, of course, but it seems that exposing these internal details into the docs may not be the best idea for future portability of Mutex to newer architectures, unless someone can certify that it is impossible to implement mutex in any other way.

@chorman0773
Copy link
Contributor

chorman0773 commented Jun 11, 2024

I don't think any other implementation exists that is sound, no.
Consider Mutex<Box<T>>. For it to be sound to access the memory inside the Box, every access from all previous attempts at locking the Mutex must happen-before the next time you can access the Box. This implies either synchronizes-with or dependency ordered before at the very least, otherwise you don't get happens-before when you have two different threads.

Being a fence is an overly restrictive specification, but it saying synchronizes-with is required IMO for the API to be sound.

Edit: Yes, Mutex could probably use dependency ordered before, depending on what exactly the preconditions on Send impls are. However, given consume is a hot mess in C++, to the point where it's outright discouraged as of C++17, I don't think we want to use dependency ordering. It may be nice if, in Rust, happens before=simply happens before.

@alexpyattaev
Copy link

happens before=simply happens before.

That does indeed sound like "the rust way".

However, from what I can see in c++ specs, some of them claim that memory order consume was in fact re-added in c++20, and its former definition was discouraged since c++17. It is all a bit wonky in there, so I'd just ignore the C++ spec entirely for the purpose of this discussion, especially since llvm has no implementation for consume anyway. What truly matters is what CPUs can or can not do in principle.

With respect to what consume ordering was supposed to do, as far as I can see, it was intended to support specifically use-cases such as rust's Mutex. As a former hardware engineer, I can envision mechanisms that would enable hardware-level simplifications in cases when Mutex-protected area is very small (e.g. one cache line), as I could plausibly do less work then in the case of a full "happens before" synchronization. Arguably, the main benefits would be seen on NUMA machines and GPUs, and we should look for what is done e.g. in CUDA to be certain.

On the other hand, I'd imagine this sort of aggressive optimization, even if we find a way to implement consume "correctly", would break horribly around unsafe code. Compiler would need a far better understanding of what depends on what to not mess this up, and unsafe/interior mutability stuff will not make it easy at all.

As a solution to the doc problem, a generic parameter may be provided to std::Mutex that would toggle whether it uses memory ordering acquire-release or acquire-consume, with acquire-release being the default. This way we are explicit about the side-effects that may occur or relations that can be relied upon, and can document it as such. If/when memory order consume becomes a thing, users can switch to that without breaking the world.

@RalfJung
Copy link
Member

RalfJung commented Jun 12, 2024

Consume relies on a notion of "dependency", and there is no credible proposal for how to define "dependency" in a language that is aggressively optimized (other than by introducing some form of provenance on integers, but that's not going to happen). What the hardware does is largely irrelevant here (as is often the case when discussing Rust semantics); whatever semantics we want has to first pass the test of fire of being compatible with all sorts of analyses and optimizations -- and then we can ask whether it can be efficiently mapped to hardware.

Adding "consume" to Rust is definitely out-of-scope for this issue.

In the Rust memory model (basically, C++ without "consume"), there's not really any way to implement a Mutex that doesn't establish synchronization between release and future acquire calls. (In fact, the memory orderings are literally named after these lock operations!) Furthermore, using Mutex<()> to guard data "outside" the Mutex is, to my knowledge, a generally accepted-to-be-correct pattern. So I don't think there is a realistic alternative to adding such a guarantee.

@alexpyattaev
Copy link

alexpyattaev commented Jun 12, 2024 via email

@chorman0773
Copy link
Contributor

chorman0773 commented Jun 12, 2024

It is all a bit wonky in there, so I'd just ignore the C++ spec entirely for the purpose of this discussion

Considering that is also Rust's Specification (for atomics/Multithreaded Memory Model), I don't think we can.

You make a good point, in this case the docs indeed should acknowledge this. If someone ever implements hardware that can do better, then we bother:)

Hardware does support Consume, fwiw. ARMv8's "default" memory order is dependency ordered, aka consume. However, no optimizing compiler I'm aware of supports consume (opting to promote it to acquire in all cases).

@alexpyattaev
Copy link

Does anyone have a functional example of a mutex built around consume memory ordering?

@briansmith
Copy link
Contributor Author

briansmith commented Jun 12, 2024

Does anyone have a functional example of a mutex built around consume memory ordering?

I think it would be very cool if Rust could get a useful (higher-performing) consume-based Mutex. However, I think much code already exists that expects acquire-release, so without some kind of opt-in (probably a separate type), Mutex needs to have acquire-release semantics by default.

(acquire-release is usually introduced conceptually as working "like acquiring and releasing a mutex" so it would be humorous, but confusing, if Rust's actual Mutex didn't have acquire-release semantics).

@chorman0773
Copy link
Contributor

chorman0773 commented Jun 12, 2024

Furthermore, using Mutex<()> to guard data "outside" the Mutex is, to my knowledge, a generally accepted-to-be-correct pattern

I've done this before. In my emulator, I do this in my implementation of physical memory:

#[derive(Debug)]
pub struct SysMemory {
    page_limit: u32,
    // SAFETY:
    // page_count is synchronized by the `pages` lock despite not being directly covered by it.
    // It is sound to perform read accesses to this cell from a thread that owns a shared read lock to `pages` or the exclusive write lock to `pages`
    // It is sound to perform write accesses to this cell from a thread that owns the excusive write lock to `pages`
    page_count: SyncUnsafeCell<u32>,
    pages: RwLock<HashMap<LeU32, RwLock<Box<Page>>>>,
}

(Yes that's a nested RwLock, I don't want reading a page to conflict with writing to an already allocated page)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-atomic Area: atomics, barriers, and sync primitives A-docs Area: documentation for any part of the project, including the compiler, standard library, and tools C-discussion Category: Discussion or questions that doesn't represent real issues. T-libs Relevant to the library team, which will review and decide on the PR/issue. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

7 participants