Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mutex and RwLock are unsound in presence of discriminant elision #68206

Closed
tmiasko opened this issue Jan 14, 2020 · 31 comments · Fixed by #68491
Closed

Mutex and RwLock are unsound in presence of discriminant elision #68206

tmiasko opened this issue Jan 14, 2020 · 31 comments · Fixed by #68491

Comments

@tmiasko
Copy link

@tmiasko tmiasko commented Jan 14, 2020

Neither Mutex type nor RwLock type inhibit layout optimization and so either of
them might contain a niche. When a niche is used to perform a discriminant
elision extracting the discriminant of the outer type will inadvertently load
the content of Mutex / RwLock without proper synchronization.

@CryZe

This comment has been minimized.

Copy link
Contributor

@CryZe CryZe commented Jan 14, 2020

Can you show an example where this is happening? I can't come up with an example where the niche inside the Mutex is ever used where there even is an initialized Mutex at all.

@tmiasko

This comment has been minimized.

Copy link
Author

@tmiasko tmiasko commented Jan 14, 2020

For example following code contains a data race on x86_64-unknown-linux-gnu, between store in thread1 and load in thread2.

use std::mem::size_of;
use std::sync::Arc;
use std::sync::Mutex;

#[inline(never)]
#[no_mangle]
pub fn thread1(v: &mut bool) {
    *v = true;
}

#[inline(never)]
#[no_mangle]
pub fn thread2(v: &Option<Mutex<bool>>) -> usize {
    match *v {
        None => 0,
        Some(_) => 1,
    }
}

fn main() {
    // Verify that discriminant elision takes place.
    assert_eq!(
        size_of::<Mutex<bool>>(),
        size_of::<Option<Mutex<bool>>>(),
    );

    let a = Arc::new(Some(Mutex::new(false)));
    let b = a.clone();

    let t = std::thread::spawn(move || {
        thread1(&mut (*a).as_ref().unwrap().lock().unwrap())
    });

    println!("{}", thread2(&b));
    t.join().unwrap();
}

The Option<Mutex<T>> combination is not uncommon. It happens when a mutex is placed inside a lazy static for example.

@hanna-kruppe

This comment has been minimized.

Copy link
Member

@hanna-kruppe hanna-kruppe commented Jan 14, 2020

Note that there's another niche in Mutex, the Box<sys::Mutex> field, which also enables e.g. Option<Mutex<i32>> to have the same size as Mutex<i32>. But in this case, indeed the data: UnsafeCell<T> field is used for storing the discriminant (presumably because it has a bigger niche?). This can be confirmed by looking at the assembly code generated for this playground and noticing that the same offset is used for locating the data in the mutex as for locating the discriminant.

A simple fix would be to change the UnsafeCell<T> field to UnsafeCell<MaybeUninit<T>>. In reality this field is always initialized, but MaybeUninit<T> has no niche regardless of T, so it would stop the layout optimizations.

@CryZe

This comment has been minimized.

Copy link
Contributor

@CryZe CryZe commented Jan 14, 2020

I'd say the problem is rather with UnsafeCell itself. match shouldn't look for the tag through the boundary of an UnsafeCell at all, as accessing its value is inherently unsafe and can only be safe if you know the invariants of the safe abstraction on top, which match in general can not know. So UnsafeCell, like MaybeUninit shouldn't allow outside tags to use niches of its T.

@pnkfelix

This comment has been minimized.

Copy link
Member

@pnkfelix pnkfelix commented Jan 14, 2020

I'd say the problem is rather with UnsafeCell itself. match shouldn't look for the tag through the boundary of an UnsafeCell at all, as accessing its value is inherently unsafe and can only be safe if you know the invariants of the safe abstraction on top, which match in general can not know.

I tend to agree that the UnsafeCell<T> exported from std should not allow uses of niches within T

However, I also tend to think that we should have Cell<T> continue to allow use of the niches in its T

This then raises the immediate question of whether we will need to make a new variant of UnsafeCell<T>, e.g. UnsafeCellWithExposedNiches<T> (deliberately ugly name there), that Cell can use...

@hanna-kruppe

This comment has been minimized.

Copy link
Member

@hanna-kruppe hanna-kruppe commented Jan 14, 2020

Cell<T> having the same niche as T should be sound and sensible, but I don't expect it'll actually matter much whether it's done or not. In my experience, type combinations that would benefit from it are pretty rare: generally cells are not (directly) placed in enums, as they are rarely passed around by value in general. If a field with interior mutability is optional, then Cell<Option<T>> is easier to work with and strictly more general. Types containing cells being wrapped in enums is slightly more likely (e.g. with a fallible constructor (fn new(...) -> Result<Self, E>), but I still don't recall ever seeing it in practice.

On its own that isn't a reason to leave the optimization opporturnity on the table, it would still be nice to have if only for "peace of mind". But I don't think hacks such as adding a second variant of UnsafeCell are worthwhile. I'd rather wait and hope that Rust gains a more general and first-class (than rustc_layout_scalar_valid_range_{start,end}) way of declaring niches that allows Cell<T> to inherit the niche of T without such hacks.

@Aaron1011

This comment has been minimized.

Copy link
Contributor

@Aaron1011 Aaron1011 commented Jan 14, 2020

But I don't think hacks such as adding a second variant of UnsafeCell are worthwhile. I'd rather wait and hope that Rust gains a more general and first-class (than rustc_layout_scalar_valid_range_{start,end}) way of declaring niches that allows Cell to inherit the niche of T without such hacks.

I think it would be useful to have a perma-unstable UnsafeCellWithExposedNiches type that's only used within libstd. If we can avoid penalizing user code (even if it's rare), I think we should do so. UnsafeCellWithExposedNiches could then be removed whenever a more general rustc_layout_scalar_valid_range_ is developed.

@MikailBag

This comment has been minimized.

Copy link
Contributor

@MikailBag MikailBag commented Jan 15, 2020

It seems to me that this bug also happens in parking_lot and atomic types, because they don't put MaybeUninit in the UnsafeCell. Is it correct?

@tmiasko

This comment has been minimized.

Copy link
Author

@tmiasko tmiasko commented Jan 15, 2020

Atomics have no niches, but Sync types that use UnsafeCell with user provided type are likely to be affected, including parking_lot.

In terms of naming one might signify !Sync aspect and name variant that participates in layout optimization as UnsafeUnsyncCell.

@RalfJung

This comment has been minimized.

Copy link
Member

@RalfJung RalfJung commented Jan 15, 2020

Ah, good catch! This is very closely related to rust-lang/unsafe-code-guidelines#204, I think. Basically, Rust's notion of "disjointness" says that the discriminant is disjoint from the actual enum data, but with niche optimizations that is not actually the case.

@pnkfelix

This comment has been minimized.

Copy link
Member

@pnkfelix pnkfelix commented Jan 16, 2020

T-compiler triage: P-high, leaving nominated label in place since this needs discussion by T-lang and T-libs (at least).

@pnkfelix pnkfelix added the P-high label Jan 16, 2020
@pitdicker

This comment has been minimized.

Copy link
Contributor

@pitdicker pitdicker commented Jan 16, 2020

I am curious, is this only a problem for Sync types because of data races, or if it is always a problem, for all uses of UnsafeCell.

For example with niches &Option<Cell<bool>> is a shared reference pointing to the same address as the inner bool, which can be mutated. Doesn't this violate the aliasing rules?

use std::cell::Cell;

fn main() {
    let cell = Some(Cell::new(false));
    let cell_ref = &cell;
    if let Some(x) = cell.as_ref() {
        x.set(true);
    }
    // Could this next decision be made with a stale value for cell?
    if let Some(x) = cell_ref.as_ref() {
        x.set(false);
    }
}

I would expect the compiler can already do the load of cell_ref for the second if before or during the first if, because it is a shared reference. Then the second comparison would act on a stale value, although the decision can never be wrong.

I don't know how to choose the right words, but it seems to me using niches through an UnsafeCell is always wrong.

@RalfJung

This comment has been minimized.

Copy link
Member

@RalfJung RalfJung commented Jan 16, 2020

@pitdicker that's basically what rust-lang/unsafe-code-guidelines#204 that I mentioned is about.

@nikomatsakis

This comment has been minimized.

Copy link
Contributor

@nikomatsakis nikomatsakis commented Jan 16, 2020

Interesting. I am of the mind that UnsafeCell should not expose niches. I think if we want to enable Cell<T> to have a niche later, we should think about adding some way for a type to "re-expose" niches, but I'd leave that as a "future work" thing for now.

@RalfJung

This comment has been minimized.

Copy link
Member

@RalfJung RalfJung commented Jan 16, 2020

Citing what I said on Zulip (about rust-lang/unsafe-code-guidelines#204), and going a bit beyond that:

I agree with @Centril that we should not mix concepts (superficially, it looks like niches and interior mutability are entirely unrelated), but I also cannot say that I know how to define the concept of interior mutability on its own, ignoring niches.

A variant of Stacked Borrows that correctly models the "one thing UnsafeCell does" with support for exposing niches might be possible, but I think it will require things I would like to strictly avoid -- like "ghost state" that records the actual enum discriminant in some magical place even when in memory, we just see its encoding via some niche. this is exactly the same kind of "ghost state" as the "active union variant", and I did everything I could to make sure we do not have that. And even if we did do that, we'd end up making niches (or rather, discriminants) very explicit in the language semantics just for the purpose of being able to model their interaction with interior mutability.

I see an inherent conflict here between niches, that basically "overlay" multiple pieces of information on the same location, and trying to define interior mutability operationally (in terms of which memory can and cannot be mutated when) in a way that it only applies to "parts of" the information stored in that location.

@pitdicker

This comment has been minimized.

Copy link
Contributor

@pitdicker pitdicker commented Jan 16, 2020

I agree with @Centril that we should not mix concepts (superficially, it looks like niches and interior mutability are entirely unrelated)

Probably my mental model is just very different, but to me they seem quite related.

With interior mutability you need to have a set of conventions about when to create references, when you can do reads, and when mutation is allowed. The read that can happen with niches that pass the UnsafeCell boundary is a read from the interior that doesn't uphold any conventions.

You can easily do a read of the UnsafeCell interior through a niche while there is a mutable reference active with RefCell. A decision on a niche can be done on a stale value after Cell::set. With Mutex and RwLock the read can be done while another thread is writing, or without synchronizing the data. I have trouble coming up with a convention that needs UnsafeCell where you can 'just do a read' from the outside.

@RalfJung

This comment has been minimized.

Copy link
Member

@RalfJung RalfJung commented Jan 16, 2020

In a sequential setting, "reading a stale value" is not necessarily a problem -- we could define everything such that this is fine. As someone (you?) remarked, the only relevant result -- the discriminant encoded in that value -- will indeed not change.

But defining things like this is complicated, and that's indeed why I agree that they are pretty related.

@pitdicker

This comment has been minimized.

Copy link
Contributor

@pitdicker pitdicker commented Jan 16, 2020

Finding such a definition for a sequential setting seems pretty close to defining when data races are 'benign'. Sounds interesting, but I fully believe the this is complicated part 😄.

@nikomatsakis

This comment has been minimized.

Copy link
Contributor

@nikomatsakis nikomatsakis commented Jan 17, 2020

@rfcbot fcp merge

We discussed this in today's @rust-lang/lang meeting. The consensus was that we should inhibit niches from propagating out from UnsafeCell<T> in order to resolve this unsoundness. I'm doing an fcp merge to make this decision official.

I've prepared a write-up of the situation that covers reasoning and implications. Please review it and see if there are things missing from this thread.

In particular, I outlined the problems and implications of the decision, but I'd like to add anything I've overlooked. One aspect I'd like some feedback on in particular is my summary of @RalfJung's comment, in the What about cell section.

Presuming we decide to 'merge' this proposal, I would think that we could close the bug by:

  • Implementing the change
  • Adding a "normative" section to the discussion of niches in the unsafe code guidelines reference
@rfcbot

This comment has been minimized.

Copy link

@rfcbot rfcbot commented Jan 17, 2020

Team member @nikomatsakis has proposed to merge this. The next step is review by the rest of the tagged team members:

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

rodrimati1992 added a commit to rodrimati1992/abi_stable_crates that referenced this issue Jan 17, 2020
rodrimati1992 added a commit to rodrimati1992/abi_stable_crates that referenced this issue Jan 17, 2020
* Bumped patch release,updated layout for Cell and UnsafeCell due to rustc bug.

Link to rust bug rust-lang/rust#68206

* Travis config:remove Cargo.lock after beta builds

* Fix previous commit
@RalfJung

This comment has been minimized.

Copy link
Member

@RalfJung RalfJung commented Jan 17, 2020

Further evidence that Rust's aliasing constraints (at least as formalized by Stacked Borrows) don't work well with niches for interior mutable data: #68303

@nikomatsakis

This comment has been minimized.

Copy link
Contributor

@nikomatsakis nikomatsakis commented Jan 18, 2020

Based on @RalfJung's comment, I have greatly expanded the section in my hackmd writeup on the deeper conflict at play here.

Ping @joshtriplett @pnkfelix @scottmcm and @withoutboats boats on the FCP merge for the idea that UnsafeCell<T> inhibits niche optimizations.

@rfcbot

This comment has been minimized.

Copy link

@rfcbot rfcbot commented Jan 22, 2020

🔔 This is now entering its final comment period, as per the review above. 🔔

@RalfJung

This comment has been minimized.

Copy link
Member

@RalfJung RalfJung commented Jan 26, 2020

@nikomatsakis that looks good, except for

Stacked borrows has the goal of not introducing “ghost state” or other such things, but instead defining the rules in terms of “which memory can be mutated and when”.

The stacks and tags introduced by Stacked Borrows are "ghost state". But they are entirely orthogonal to the data stored in memory, and they are handled uniformly across all memory accesses.

The "ghost state" needed here would be specific to enums, that feels much more ad-hoc than what we are doing currently.

@nikomatsakis

This comment has been minimized.

Copy link
Contributor

@nikomatsakis nikomatsakis commented Jan 27, 2020

@RalfJung yes, good point, thanks.

@rpjohnst

This comment has been minimized.

Copy link
Contributor

@rpjohnst rpjohnst commented Jan 30, 2020

Cell is different from RefCell and Mutex, since it does not permit references to its interior.

Does this interact with RFC 1789? That not only allows projecting references into a Cell's interior, but also allows a program to "materialize" Cells where there were none.

Given this ability to project into and materialize Cells, giving T and Cell<T> different layouts seems dangerous. It's possible it all works out because really the difference is limited to NicheUser<T>/NicheUser<Cell<T>>, where NicheUser does not allow Cell projection, but that's not immediately obvious to me.

@eddyb

This comment has been minimized.

Copy link
Member

@eddyb eddyb commented Jan 30, 2020

@rpjohnst I think it's fine because it doesn't let you make any assumptions, the Cell projections only go through product types.

Also, creating a Cell where there was none requires a mutable reference, which I believe is wrong enough to account for all possible effects.

Dylan-DPC added a commit to Dylan-DPC/rust that referenced this issue Jan 30, 2020
…ell, r=oli

Hide niches under UnsafeCell

Hide any niche of T from type-construction context of `UnsafeCell<T>`.

Fix rust-lang#68303
Fix rust-lang#68206
@RalfJung

This comment has been minimized.

Copy link
Member

@RalfJung RalfJung commented Jan 30, 2020

Notably, T and Cell<T> are still the same -- just when wrapped in some other type, they might differ (and specifically not when wrapped in a product type). So I do not see how there can be a problem here. But I might be missing something.

@rfcbot

This comment has been minimized.

Copy link

@rfcbot rfcbot commented Feb 1, 2020

The final comment period, with a disposition to merge, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

The RFC will be merged soon.

bors added a commit that referenced this issue Feb 3, 2020
Hide niches under UnsafeCell

Hide any niche of T from type-construction context of `UnsafeCell<T>`.

Fix #68303
Fix #68206
@nikomatsakis

This comment has been minimized.

Copy link
Contributor

@nikomatsakis nikomatsakis commented Feb 3, 2020

@rpjohnst a good catch for sure, definitely something to think about.

@pnkfelix pnkfelix self-assigned this Feb 3, 2020
@pnkfelix

This comment has been minimized.

Copy link
Member

@pnkfelix pnkfelix commented Feb 3, 2020

(assigning self, since I authored PR #68491)

bors added a commit that referenced this issue Feb 6, 2020
Hide niches under UnsafeCell

Hide any niche of T from type-construction context of `UnsafeCell<T>`.

Fix #68303
Fix #68206
@bors bors closed this in fc23a81 Feb 11, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

You can’t perform that action at this time.