Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

It is unclear how UnsafeCell is special #34496

Closed
diwic opened this issue Jun 27, 2016 · 13 comments
Closed

It is unclear how UnsafeCell is special #34496

diwic opened this issue Jun 27, 2016 · 13 comments
Labels
T-lang Relevant to the language team, which will review and decide on the PR/issue.

Comments

@diwic
Copy link
Contributor

diwic commented Jun 27, 2016

I'm raising this as an issue, even if it might look more like a question; I think the docs need to be clarified.

I'm trying to figure out if and how UnsafeCell is special. Let's start with two statements, the first from std API docs

The UnsafeCell type is the only legal way to obtain aliasable data that is considered mutable. In general, transmuting an &T type into an &mut T is considered undefined behavior.

The second from nomicon:

Transmuting an & to &mut is UB
Transmuting an & to &mut is always UB
No you can't do it
No you're not special

So, this seems contradictory, and the first theory is that UnsafeCell is indeed special and that the Nomicon just forgot to mention it. The fact that unsafecell is a lang_item points in that direction too. But it seems there is more to it than that. The API docs mention &mut T, but UnsafeCell never hands out a &mut T, only a *mut T. And for *mut T, the compiler cannot assume that two pointers are never the same; that is only true for &mut T. (Is this correct?)

And AFAIK, none of the std wrappers allow for two &mut T - Cell never gives out a &mut T, it just copies values in and out, RefCell, Mutex and RwLock uses runtime checks to make sure there is only one &mut T at a time. And atomics never deal with &mut either.

So, that seems to point to the nomicon being correct, but then I don't see why we would need an UnsafeCell in the first place - Cell and friends could just implement their functionality on top of a T rather than an UnsafeCell<T>.

@pnkfelix pnkfelix added T-lang Relevant to the language team, which will review and decide on the PR/issue. A-docs labels Jun 27, 2016
@sorear
Copy link
Contributor

sorear commented Jun 27, 2016

UnsafeCell is not documented because we don't actually know what it does, not at an abstraction level that would be meaningful to programmers anyway. Operationally it causes rustc to suppress emitting llvm noalias attributes in certain situations, but part of the task of the ongoing Rust memory model effort will be to come up with clear rules for when UnsafeCell is needed.

@durka
Copy link
Contributor

durka commented Jun 27, 2016

There's no contradiction. UnsafeCell is the way to get aliasable, mutable data. Transmuting a &T to a &mut T would be another way, but doing that is always UB.

That said the docs should definitely be improved with the results of the memory model effort.

@brson
Copy link
Contributor

brson commented Jun 27, 2016

So, this seems contradictory, and the first theory is that UnsafeCell is indeed special and that the Nomicon just forgot to mention it.

I don't think they are actually contradictory here, just confusing. As you point out UnsafeCell never does give you access to an &mut T. It really is totally bogus to have aliased &mut Ts. UnsafeCell is special though as the root of interior mutability.

And for *mut T, the compiler cannot assume that two pointers are never the same; that is only true for &mut T. (Is this correct?)

Yes.

So, that seems to point to the nomicon being correct, but then I don't see why we would need an UnsafeCell in the first place - Cell and friends could just implement their functionality on top of a T rather than an UnsafeCell.

UnsafeCell tells the compiler that this type might by mutable despite not containing &mut T. The compiler needs to know this to e.g. make judgments about what types are Sync. Values with no UnsafeCells in them (most Rust values) can be optimized better because 1) they are guaranteed not to be written (in LLVM they are readonly), and 2) as @sorear mentions they can be noalias (a complex subject).

@Manishearth
Copy link
Member

and the first theory is that UnsafeCell is indeed special and that the Nomicon just forgot to mention it.

This is correct.

However, you should not be transmuting an &UnsafeCell to an &mut UnsafeCell anyway. I'm not sure if that is UB or not (probably not), but don't do it.

What you should be doing is converting that *mut T to an &mut T and using it. But you must ensure that at runtime that is the only &mut T in active use.


You should never ever ever have two &mut Ts active pointing to the same T. However, when building abstractions like RefCell, Mutex, or Cell, you need the ability to turn off this restriction for a while.

The compiler basically makes optimizations assuming that & is not mutated and &mut is unique. UnsafeCell turns those optimizations off; letting you build things like Cell and RefCell.

But it is fundamentally unsafe to have two &muts lying around; which is why the stdlib never gives those out without some form of checking.

@Manishearth
Copy link
Member

#34520

@diwic
Copy link
Contributor Author

diwic commented Jun 28, 2016

First, thanks for all explanations!

About the contradiction; I think that if one doc says "In general, transmuting...", that implies that there are exceptions to the rule, and "No, no you're not special" implies that there are no exceptions. That's the core of the contradiction to me.

I'm afraid your answers left me with more questions, but feel free to poke me in case you feel this is taking up too much of your time.

@brson

UnsafeCell tells the compiler that this type might by mutable despite not containing &mut T. The compiler needs to know this to e.g. make judgments about what types are Sync.

I'm not sure we require a lang_item for this part - wouldn't just the existing impl<T> !Sync for UnsafeCell<T> where T: ?Sized suffice?

Values with no UnsafeCells in them (most Rust values) can be optimized better because 1) they are guaranteed not to be written (in LLVM they are readonly),

Okay, so this makes some sense, maybe you can reason about function purity if a &T does not contain UnsafeCells, and also make the assumption that two sequential reads of &T - and things reachable through &T - will return the same value.

@Manishearth

You should never ever ever have two &mut Ts active pointing to the same T. However, when building abstractions like RefCell, Mutex, or Cell, you need the ability to turn off this restriction for a while.

Okay, so this is the part I don't get. Cell, e g, could just as well just have used ptr::write (which takes a *mut T, not a &mut T), and never even touch &mut T.

A related question here is also about the typecast &T -> *const T -> *mut T -> &mut T, which does not require a call to mem::transmute. Is it UB anyhow? It is only the last step that's unsafe, so that must be the step that's UB. (And why do we allow it without a call to mem::transmute if it's always UB?)
Or could it be something like *mut T -> &mut T being UB in all cases where the *mut T did not come from an UnsafeCell...?

@hanna-kruppe
Copy link
Contributor

maybe you can reason about function purity if a &T does not contain UnsafeCells

Since functions can always access statics and call external functions such as printf, it's impossible to tell from the signature if a function is in any sense "pure".

and also make the assumption that two sequential reads of &T - and things reachable through &T - will return the same value.

This is the main win. The compiler doesn't have to prove that intervening writes can't affect the &T. Whether this will actually be assumed, and in which circumstances, is part of the whole memory model question, but I expect it will happen to some degree.

Okay, so this is the part I don't get. Cell, e g, could just as well just have used ptr::write (which takes a *mut T, not a &mut T), and never even touch &mut T.

Cell could, but RefCell hands out &muts to its contents (but only one at a time).

A related question here is also about the typecast &T -> *const T -> *mut T -> &mut T, which does not require a call to mem::transmute. Is it UB anyhow?

The exact rules for what's UB and what isn't are still being hammered out, but it's pretty clear that e.g. this shall be illegal:

let x = &mut something;
let y = unsafe { transmute_copy(&x) };
// Use both *x and *y

And this shall be legal:

let c = RefCell::new(..);
{ let x: RefMut<_> = c.borrow_mut(); let x = &mut *x; /* use *x */ }
{ let y: RefMut<_> = c.borrow_mut(); let y = &mut *y; /* use *y */ }

The latter also creates a &mut from a *mut, but unlike the first it never has two at the same time, which is the main source of UB. The strong wording against transmute in the Nomicon is presumably not because creating a &mut from a *mut is always illegal, but because doing it by transmute is a stupid way to do something that's already incredibly dangerous. If you took the source code of RefCell and replaced the part where it casts the raw pointer to &mut and made it use transmute instead, it would probably be fine, unless it's (somewhat arbitrarily) decided that the mere act of transmuting & to &mut shall also be illegal.

@arielb1
Copy link
Contributor

arielb1 commented Jun 28, 2016

@diwic

We have still not decided on the semantics of &mut UnsafeCell<T>, but the current implementation has it the same as &mut T. In that case, transmuting a &UnsafeCell<T> to an &mut UnsafeCell<T> is likely to cause problems (maybe the UB will only occur at the actual access rather than at the transmute, but it is still not a Good Idea).

Also, doing ptr::write to an &T (through a *mut T) is Definitely Undefined.

@brson
Copy link
Contributor

brson commented Jun 28, 2016

I'm not sure we require a lang_item for this part - wouldn't just the existing impl !Sync for UnsafeCell where T: ?Sized suffice?

I believe you are correct, yes.

@durka
Copy link
Contributor

durka commented Jun 28, 2016

The lang item part seems to be used in various corners of the compiler for
inferring variance, making sure consts are const, etc.

On Tue, Jun 28, 2016 at 2:10 PM, Brian Anderson notifications@github.com
wrote:

I'm not sure we require a lang_item for this part - wouldn't just the
existing impl !Sync for UnsafeCell where T: ?Sized suffice?

I believe you are correct, yes.


You are receiving this because you commented.
Reply to this email directly, view it on GitHub
#34496 (comment),
or mute the thread
https://github.com/notifications/unsubscribe/AAC3n8vsqOW-m5eFbKM_rRIrw_Z1OKErks5qQWOIgaJpZM4I-zcx
.

@Manishearth
Copy link
Member

That's the core of the contradiction to me.

But you're not supposed to transmute it. That it internally does something similar to a transmute is irrelevant, don't transmute &UnsafeCell.

Okay, so this is the part I don't get. Cell, e g, could just as well just have used ptr::write

Cell is usually found behind an &, e.g. within an Rc. Mutating this is unsafe, unless unsafecell is involved. &mut is only half the story, there are rules about & too.

@diwic
Copy link
Contributor Author

diwic commented Jun 29, 2016

I think things are a bit clearer now. I also found a third source of information, form the Language reference:

&mut T and &T follow LLVM’s scoped noalias model, except if the &T contains an UnsafeCell. Unsafe code must not violate these aliasing guarantees.
Mutating non-mutable data (that is, data reached through a shared reference or data owned by a let binding), unless that data is contained within an UnsafeCell.

@Manishearth

But you're not supposed to transmute it. That it internally does something similar to a transmute is irrelevant, don't transmute &UnsafeCell.

Yes. Let's just remove the "In general, transmuting an &T type into an &mut T is considered undefined behavior." because the "In general" part suggests otherwise (or can at least be interpreted as such).

&mut is only half the story, there are rules about & too.

Yes, that's what's becoming more obvious to me now after the explanations from you all. Thanks!

@Manishearth
Copy link
Member

In general" part suggests otherwise (or can at least be interpreted as such).

But it is true. Remember that UB isn't completely defined in Rust, so weasel-word statements like this are necessary 😄

bors added a commit that referenced this issue Aug 3, 2016
bors added a commit that referenced this issue Aug 4, 2016
@bors bors closed this as completed in 3873402 Aug 4, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T-lang Relevant to the language team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

8 participants