New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deprecate uninitialized in favor of a new MaybeUninit type #1892

Merged
merged 20 commits into from Aug 19, 2018

Conversation

@canndrew
Contributor

canndrew commented Feb 9, 2017

My thoughts on what to do with uninitialized and !.

Rendered

Tracking issue

Edit: This has been updated to instead recommend deprecating uninitialized entirely. The old Inhabited trait proposal is now listed as an alternative.

Edit: FCP Proposal is #1892 (comment) (in the collapsed-by-default part).

@canndrew

This comment has been minimized.

Contributor

canndrew commented Feb 9, 2017

cc @nikomatsakis, @arielb1, @eddyb. I know y'all have some thoughts on what to do with all this.

@ubsan

This comment has been minimized.

Contributor

ubsan commented Feb 9, 2017

Calling uninitialized is already extremely dangerous. This just seems like unnecessary complication. If you want to create a !, for some reason, whatever you're doing is probably UB anyways. At most, we might want to have a warning in the docs.

@canndrew

This comment has been minimized.

Contributor

canndrew commented Feb 9, 2017

@ubsan, what about the catch_unwind example in the RFC? That's based on a compiler bug that got created when feature(never_type) got switched on in the standard library.

Without this, any (otherwise correct) code that uses uninitialized::<T> is landmine waiting to go off when someone tries to set T = !.

@nagisa

This comment has been minimized.

Contributor

nagisa commented Feb 9, 2017

The compiler bug happened mostly because previously the T would become () due to ! not existing. I personally think this PR is going in the right direction (a-la Sized).

This trait is automatically implemented for all inhabited types.
Change the type of `uninitialized` to:

This comment has been minimized.

@nagisa

nagisa Feb 9, 2017

Contributor

This conflicts somewhat with Inhabited being an auto-trait (a-la Sized), or rather T is already Inhabited unless specified otherwise (i.e. T: ?Inhabited)

This comment has been minimized.

@mark-i-m

mark-i-m Feb 10, 2017

Contributor

Question: Does ! impl Sized? Does that even make sense?

This comment has been minimized.

@canndrew

canndrew Feb 10, 2017

Contributor

@mark-i-m Yes, size_of::<!>() == 0, More generally though, for any given size every element of ! has that size. Similarly, any two elements of ! have the same size. These are both trivially true since ! has no elements.

This comment has been minimized.

@mark-i-m

mark-i-m Feb 10, 2017

Contributor

So we are defining size_of::<T>() == n iff for all values v of T that v takes n bits to express, and since there are no values of T = !, this vacuously true?

This comment has been minimized.

@canndrew

canndrew Feb 10, 2017

Contributor

Basically, yeah. There's two subtly different definitions you could give of Sized but ! satisfies both of them vacuously.

This could be a rather large breaking change depending on how many people are
currently calling `uninitialized::<T>` with a generic `T`. However all such
code is already somewhat future-incompatible as it will malfunction (or panic)
if used with `!`.

This comment has been minimized.

@nagisa

nagisa Feb 9, 2017

Contributor

If Inhabited is auto-trait, like Sized, I do not see how that’s problematic in any way.

The author of the crate may expect this change to be private and its effects
contained to within the crate. But in making this change they've also stopped
exporting the `Inhabited` impl, causing potential breakages for downstream
users.

This comment has been minimized.

@nagisa

nagisa Feb 9, 2017

Contributor

Again, that’s pretty much the same story with Sized. I.e. you cannot change pub struct Sized { a: [u8; 42] } to pub struct Sized { a: [u8] }.

@arielb1

This comment has been minimized.

Contributor

arielb1 commented Feb 9, 2017

We pretty much decided in the design sprint to deprecate mem::uninitialized in favor of using a MaybeUninitialized<T> union in the standard library.

@canndrew

This comment has been minimized.

Contributor

canndrew commented Feb 9, 2017

@arielb1 Ah okay. I figured that might too radical for now so I intended this RFC as a (possibly temporary) middle ground.

@canndrew

This comment has been minimized.

Contributor

canndrew commented Feb 9, 2017

@nagisa Sorry if my wording is confusing. Inhabited isn't intended to be an auto-trait like Sized in the sense that you have to explicitly opt-out with ?Inhabited. That would create unnecessary restrictions on using uninhabited types. ! and Void should be usable pretty much anywhere, it's only really uninitialized which is problematic.

@nagisa

This comment has been minimized.

Contributor

nagisa commented Feb 9, 2017

it's only really uninitialized which is problematic.

That’s not true. ptr::read (and probably a number of other functions) has the same problem as uninitialized and we ain’t deprecating that, so an auto-trait is still worthwhile IMO:

fn main() {
let x: *const ! = &0 as *const _ as *const _; // imagine this comes as an generic argument from somewhere
let z: ! = unsafe {
    ::std::mem::read(x) // boom bam blamma
};
}
@canndrew

This comment has been minimized.

Contributor

canndrew commented Feb 9, 2017

Ah true, I hadn't thought of ptr::read. transmute has the same problem but I've made a seperate RFC for that. I just went through the list of intrinsics and I couldn't find any others which return a generic T without also taking one as an argument.

I still think ?Inhabited would make uninhabited types almost unusable unless people added the ?Inhabited bound everywhere. It seems simpler and a lot less restrictive to just put T: Inhabited on ptr::read as well.

match std::panic::catch_unwind(|| {
let val = f();
unsafe {
(*foo_ref).value = val;

This comment has been minimized.

@ubsan

ubsan Feb 9, 2017

Contributor

This line is broken for types with Drop impls. It should be ptr::write(&mut (*foo_ref).value, val)

This comment has been minimized.

@canndrew

canndrew Feb 10, 2017

Contributor

Hmm, I thought we settled on different rules for overwriting union fields for some reason. Thanks for the catch!

```
Yet calling this function does not diverge! It just breaks everything then eats
your laundry instead.

This comment has been minimized.

@mark-i-m

mark-i-m Feb 10, 2017

Contributor

Somehow, this seems preferable to folding my laundry at the moment...

if used with `!`.
Another drawback is that the `Inhabited` trait leaks private information about
types. Consider a type with the following definition:

This comment has been minimized.

@mark-i-m

mark-i-m Feb 10, 2017

Contributor

I am not convinced this is more serious than already-existing leaks... for example, you can already find out the size of a type. Is there any fundamental difference with this?

This comment has been minimized.

@canndrew

canndrew Feb 10, 2017

Contributor

Not really, it's the same as Sized in this regard.

This comment has been minimized.

@mark-i-m

mark-i-m Feb 10, 2017

Contributor

Ok, that makes sense 😄

Ideally, Rust's type system should have a way of talking about initializedness
statically. In the past there have been proposals for new pointer types which
could safely handle uninitialized data. We should seriously consider pursuing
one of these proposals.

This comment has been minimized.

@mark-i-m

mark-i-m Feb 10, 2017

Contributor

👍 I totally agree! This would make static muts much more powerful while still being safe.

@ranma42

This comment has been minimized.

Contributor

ranma42 commented Feb 10, 2017

@canndrew could you add the alternative from the design sprint? (add MaybeUninit<T> and deprecate mem::uninitialized)

I find it compelling, as it removes some magic from the compiler (no special checks or automatic traits for size/inhabitedness) and lets the type system just do "its thing".

@ranma42

This comment has been minimized.

Contributor

ranma42 commented Feb 10, 2017

Sorry, I just realised the option of completely deprecating mem::uninitialized is mentioned as a future direction. While it is certainly possible to make this change in smaller steps, I think it might be very interesting to assess the potential disruption of the "direct" change (would a crater run be sufficient for this?). If deemed possible, I think that would be the best course of action.

@ranma42

This comment has been minimized.

Contributor

ranma42 commented Feb 10, 2017

Are there any plans to do the same to mem::zeroed, either through this RFC or through another one? I found no relevant results searching through the RFCs repo.

@whataloadofwhat

This comment has been minimized.

whataloadofwhat commented Feb 10, 2017

I still think ?Inhabited would make uninhabited types almost unusable unless people added the ?Inhabited bound everywhere. It seems simpler and a lot less restrictive to just put T: Inhabited on ptr::read as well.

I agree with what you're saying about ?Inhabited. I don't want that either. But then I think we'd need some kind of read_or_unreachable function without a bound anyway, otherwise Vec<T> will need an Inhabited bound (at least for remove and into_iter), which will infect a LOT of code. It's genuinely unreachable code to get to ptr::read from Vec<!> anyway because the only way to do it would be to push an uninhabited value onto it in the first place. Vec<!> is harmless, so having an Inhabited bound on it would be pointless.

I'd be (more) okay with Inhabited as a regular (compiler implemented) trait but it's completely unworkable to add it onto these functions as it would just break too much code, uninitialized, zeroed and read are all stable and there's probably a lot of code out there that calls them generically. I think the best we could do to make Inhabited as a trait work (though I don't think it will be used much, this is if you want the Inhabited trait in general rather than just for these cases) is to deprecate everything which should have the bound and then provide better alternatives: mem::MaybeInitialized<T> for mem::uninitialized and mem::zeroed. ptr::read_or_unreachable<T> and ptr::read_inhabited<T: Inhabited> for ptr::read/ptr::read_volatile/ptr::read_unaligned (those last two will become pretty verbose :(). Adding the bound (unless it's ?Inhabited) onto the functions is, unfortunately, a no-go because it will cause a lot of breakage.

Myself, I think the best route is:

  • Don't have an Inhabited trait, nor ?Inhabited. I don't think T: Inhabited is the best default to enforce because I think that cases where T = ! will cause issues are rare. It will lead to a lot of unnecessary restrictions on ! or will lead to ?Inhabited everywhere. Neither of those are ideal. Similarly, I don't think there are many cases where knowing that a type is inhabited is useful, but I'm not sure.
  • Deprecate uninitialized and zeroed and leave the signatures as they are. We could, perhaps, throw an error or lint at trans time if T = !, because it's highly likely to be a bug. I don't know if that's possible, but I think it's what is (was?) done for mem::transmute.
  • Add MaybeInitialized<T> to core::mem. Give it a zeroed constructor which calls memset on it. (maybe one day a const constructor by using [0u8; size_of::<T>()], but that's later). In the deprecation message for uninitialized and zeroed, direct them here.
  • Add a note onto ptr::read's docs that if T in uninhabited then the call must be unreachable.
@djzin

This comment has been minimized.

djzin commented Feb 12, 2017

Could this issue be solved with ! having no size? If the size of a type is defined to be ceil(log(n)) where n is the number of possible representations, then ! should have undefined size, or, if you like, "negative infinity" size. The main issue with this is that empty enums are already defined as being sized... and also "negative infinity" is not representable in a usize... but just a thought I had ;)

@mark-i-m

This comment has been minimized.

Contributor

mark-i-m commented Feb 12, 2017

@newpavlov

This comment has been minimized.

newpavlov commented Aug 11, 2018

we could declare that &mut T to uninitialized memory is actually okay as long as you do not read from that memory.

I think we either should do this (honestly I still do not understand why creating &mut to uninitialized memory is automatically undefined behaviour and not reading from it), or we could introduce something like &out (I think there was a proposal on internals forum) to which &mut will be automatically coerced (in the same way as &mut coerces to &), but it's forbidden for &out to be coerced to & in safe code (doing so in unsafe via raw pointers is UB). Using &out will be a contract "we will only write to this memory" (thus attempts to read from it can be denied at compile time). It could be relaxed to "we will not read from it until we write (partially or fully)", but it's probably will be harder to track and enforce.

In other words we will have the following code:

// io::Read, this change is backwards compatible with the existing signature
fn read_exact(&mut self, buf: &out [u8]) -> io::Result<()>;

// user code, note that we have only one unsafe line
let mut buffer: MaybeUninit<[u8; 4096]> = MaybeUninit::uninitialized();
// `get_out` returns `&out [u8; 4096]`
file.read_exact(buffer.get_out())?;
// we are sure that buffer is fully initialized
let buffer = unsafe { buffer.into_inner() };
parse(&buffer);
@ubsan

This comment has been minimized.

Contributor

ubsan commented Aug 11, 2018

I think it's reasonable to say "an &T or &mut T points at a read-[write] object which may or may not be valid", from a UB level, and only check for validity of the value at load time - i.e.,

fn foo1(x: &i32) { // not UB to pass &uninit
}
fn foo2(x: &mut i32) { // not UB to pass &mut uninit
  ptr::write(x, 0);
}
fn foo3(x: &i32) {
  *x; // maybe UB if one passes &uninit?
  // (we may choose to guarantee that only the lvalue -> value conversion reads the memory)
}
fn foo4(x: &i32) {
  *x == 0; // definitely UB if one passes &uninit, imo
}

Where this gets interesting is something like

fn foo(x: &&i32, y: bool) -> i32 {
  *x;
  if (y) {
    **x // are we allowed to speculatively load **x ahead of time?
  } else {
    0
  }
}
@pftbest

This comment has been minimized.

pftbest commented Aug 11, 2018

@newpavlov Having an &out parameters is a neat idea. Maybe if we would have something like this in Rust, then there will be no need for a hacks like prepare_uninitialized_buffer to exist.

@gnzlbg

This comment has been minimized.

Contributor

gnzlbg commented Aug 11, 2018

The Rust reference says:

Shared references (&): These point to memory owned by some other value.

Surprisingly, that is what most Rust users actually need to know about references: &T always refers to a valid T. That's pretty much it.

Also, per definition, mem::uninitialized() does not initialize memory, which I mentally translate as "the memory contains no value / object". I don't think this is clearly written anywhere in the reference, but we should come up with one liner explanations of what these dangerous features actually do.

@newpavlov

I still do not understand why creating &mut to uninitialized memory is automatically undefined behaviour

It just follows from those two very simple rules. The memory you get from uninitialized contains no value and &T must refer to a valid value of T. Therefore, one cannot create a &T to uninitialized memory. When put it that way, does it really feel like it does not make sense?

I find it almost unbelievable how simple these two rules are, and how much follows from them: &Enum means enum discriminant points to a valid variant, &T the memory is properly aligned so you can use ptr::read instead of ptr::read_unaligned, the object is initialized, &Trait points to a valid object and vtable so you can call methods on it without having to check that the vtbl pointer is... null... or whatever...

I don't see how we could allow &T to point to uninitialized memory without making a pretty big backwards incompatible language change: all code that expects a &T and doesn't guard against T potentially being uninitialized would be at risk at best. Could it become broken or incorrect? No idea.

That can be a pretty big time bomb considering how people interpret uninitialized as "this memory can have any bit-pattern" from which many conclude that "if I am lucky enough to guard against the bit-pattern that causes UB, my program won't have UB". But in practice, things are more subtle, and more often than not if one single bit pattern can trigger undefined behavior then undefined behavior is always invoked.

IMO the current basic definition of &T is simple and pretty reasonable for most Rust programmers.
Making their lives harder so that we can do what? Pass uninitialized memory across APIs more easily, e.g. via slices instead of raw pointers ? That feels like a pretty bad trade-off to me. Whatever we do here, shouldn't IMO impact people not using uninitialized in such a significant way.

@Thomasdezeeuw

This comment has been minimized.

Thomasdezeeuw commented Aug 11, 2018

For the reading: maybe Initializer (RFC read_initializer rust-lang/rust#42788), can provide a method which takes a uninitialized buffer and optionally initializes it. Something like fn initialize(&self, buf: &mut MaybeUninit<[u8]>) -> &mut [u8].

@newpavlov

This comment has been minimized.

newpavlov commented Aug 11, 2018

@gnzlbg

It just follows from those two very simple rules. The memory you get from uninitialized contains no value and &T must refer to a valid value of T. Therefore, one cannot create a &T to uninitialized memory. When put it that way, does it really feel like it does not make sense?

It's just that in my understanding, real UB here is creation of &T to uninitialized memory, or if we are being even more specific reading from an uninitialized memory. (IIUC there shouldn't be any problems in creation of &T from which we will never read) So in this lower-level interpretation it's just one simple rule: "don't read from uninitialized memory".

I guess it's about trade-off between higher level rules which are easier to track and verify, but with annoying false-positives (as in the example with read_exact), and lower-level rules which are much harder to track, because you must know exactly what called functions are doing and that they will not read from &mut T which points to uninitialized memory. So if you use third-party code UB can be potentially introduced by a simple library update, with a simple read from &mut T, which is of course completely safe.

Probably the most optimal solution will be to introduce &out which will allow us to enforce behaviour of programs using compiler checks, and not with unsafe guidelines and documentation. This way we will be able to keep higher level rules and solve some of the false-positives.

@gnzlbg

This comment has been minimized.

Contributor

gnzlbg commented Aug 11, 2018

It's just that in my understanding, real UB here is creation of &T to uninitialized memory,

This is correct, doing so violates the contract of &T.

or if we are being even more specific reading from an uninitialized memory. [...] So in this lower-level interpretation it's just one simple rule: "don't read from uninitialized memory".

I think this is incorrect. As long as you don't violate any of Rust contracts, you have no undefined behavior. That &T must always refer to a valid T is one contract.

The following contract is not written anywhere AFAIK, and I am not a memory model person so there might be a million things wrong about the following contract, but I'd expect every Rust memory model to somehow specify it in one way or another:

The execution of Rust programs does not depend on the contents of uninitialized memory.

Copying uninitialized memory from one memory location to another does not violate this contract because it does not make the execution of the program depend on the content of the memory. Using the memory in any way that affects the execution of the program does, however, invoke undefined behavior.

I agree with you that creating a &T to uninitialized memory does not make the execution of the program depend on the contents of that memory. It does, however, break a different Rust contract: that &T always refer to a valid T.

Rust has many contracts, and all of them must hold for a program to have defined behavior. These are / could just be two of them.

@ubsan

This comment has been minimized.

Contributor

ubsan commented Aug 11, 2018

@gnzlbg I would argue that code like @newpavlov showed is common enough that we should think about not making it UB - there should be a distinction here between what is UB, and what one is allowed to assume. &T should, in general, be assumed to point at an object with a valid value, but it might be that the language shouldn't assume such without a read.

@newpavlov

This comment has been minimized.

newpavlov commented Aug 11, 2018

Copying uninitialized memory from one memory location to another is therefore not undefined behavior under this model because the execution of the Rust program is not affected by the contents of the memory.

But before it you can safely create &T to a previously initialized destination memory, and after such copy this pointer becomes source of UB, as later you can do whatever you want with the previously valid &T, so this interpretation does not look good to me.

The execution of Rust programs does not depend on the contents of uninitialized memory.

I think that this quote is mostly about branching dependent on UB. With "don't read from uninitialized memory" rule this quote is still true, because program execution can't depend on something which is never read, but doing copy from uninitialized memory becomes UB.

@gnzlbg

This comment has been minimized.

Contributor

gnzlbg commented Aug 11, 2018

@ubsan

there should be a distinction here between what is UB, and what one is allowed to assume. &T should, in general, be assumed to point at an object with a valid value, but it might be that the language shouldn't assume such without a read.

I agree, we could do that, but I am unconvinced on whether that is worth it the increased complexity.

@newpavlov

But before it you can safely create &T to a previously initialized destination memory, and after such copy this pointer becomes source of UB,

Yes. If you write uninitialized memory via a pointer somewhere, and that breaks the contract of a reference, that is instant undefined behavior in my mental model:

let mut x = 0_i32;
let y= &mut x; 
let z = y as *mut i32;
drop(y); // not dropping y here would be UB
ptr::write(z, uninitialized());
@ubsan

This comment has been minimized.

Contributor

ubsan commented Aug 11, 2018

@gnzlbg It seems to me that writing an uninit value to an object is where the UB should happen, not just because you happen to have a reference that points to that value (especially since drop(z: &T) is a no-op (and frankly drop(z: &mut i32) should also probably be a no-op))

@gnzlbg

This comment has been minimized.

Contributor

gnzlbg commented Aug 11, 2018

@ubsan That makes sense. I guess one would need to drop the object instead, the storage after the drop should be uninit, so there is no need to write uninit again to it, although I guess one could do so.

@RalfJung

This comment has been minimized.

Member

RalfJung commented Aug 17, 2018

honestly I still do not understand why creating &mut to uninitialized memory is automatically undefined behaviour and not reading from it

Well, there are other people saying the exact opposite. ;) We have to make a choice either way, but for both cases we have people arguing that this is clearly the more intuitive option.

I don't see how we could allow &T to point to uninitialized memory without making a pretty big backwards incompatible language change: all code that expects a &T and doesn't guard against T potentially being uninitialized would be at risk at best. Could it become broken or incorrect? No idea.

No, there would be no incompatible change. As @ubsan mentioned, we have two invariants at play here: Which assumptions the compiler can make for its optimizations, and which assumptions safe code can make for values it sees. There is no a priori reason to think these are the same, and in fact I think it is rather impossible to make them the same. I have a blogpost upcoming for this that should hopefully be done no later than Monday...

But just one example: A safe higher-order function which takes an argument f: fn(&i32) -> &i32 can assume that f can be called with any shared reference. So following your "violating safe code assumptions is insta-UB", it would be UB to have a function which does not have this property. On the other hand, many libraries have private functions not marked unsafe that actually are de-facto unsafe because they make extra assumptions that are guaranteed by the surrounding module. So your proposal makes all those libraries UB.

There are other problems as well. For example, the invariant that may be assumed by safe code is impossible to check for because it is frequently not computable (as in, would require solving the halting problem). I think we should have a definition of UB that can, at least in principle, be checked.

japaric added a commit to japaric/heapless that referenced this pull request Aug 19, 2018

@japaric japaric referenced this pull request Aug 19, 2018

Merged

internally use MaybeUninit #55

bors bot added a commit to japaric/heapless that referenced this pull request Aug 19, 2018

Merge #55
55: internally use MaybeUninit r=japaric a=japaric

which has been proposed in rust-lang/rfcs#1892

Co-authored-by: Jorge Aparicio <jorge@japaric.io>
@rfcbot

This comment has been minimized.

rfcbot commented Aug 19, 2018

The final comment period, with a disposition to merge, as per the review above, is now complete.

@Centril Centril merged commit 21f887f into rust-lang:master Aug 19, 2018

@Centril

This comment has been minimized.

Contributor

Centril commented Aug 19, 2018

Huzzah! This RFC has been merged!

Tracking issue: rust-lang/rust#53491

XOSplicer added a commit to XOSplicer/heapless that referenced this pull request Aug 22, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment