Tracking issue for RFC 1892, "Deprecate uninitialized in favor of a new MaybeUninit type" #53491

Open
Centril opened this Issue Aug 19, 2018 · 179 comments

@Centril
Contributor

Centril commented Aug 19, 2018

This is a tracking issue for the RFC "Deprecate uninitialized in favor of a new MaybeUninit type" (rust-lang/rfcs#1892).

Steps:

Unresolved questions:

  • Should we have a safe setter that returns an &mut T?
  • Should we rename MaybeUninit? (#56138)
  • Should we rename into_inner? Should it be more like take instead and take &mut self?
  • Should MaybeUninit<T> be Copy for T: Copy?
  • Should we allow calling get_ref and get_mut (but not reading from the returned references) before data got initialized? (AKA: "Are references to uninitialized data insta-UB, or only UB when being read from?") If no, should we rename it similar to into_inner?
  • Can we make into_inner (or whatever it ends up being called) panic when T is uninhabited, like mem::uninitialized does currently? (done)
@eddyb

Member

eddyb commented Aug 19, 2018

@japaric

Member

japaric commented Aug 19, 2018

[ ] Implement the RFC

I can help implement the RFC.

@RalfJung

Member

RalfJung commented Aug 19, 2018

Awesome, I can help reviewing :)

@japaric

Member

japaric commented Aug 19, 2018

I'd like some clarification on this part of the RFC:

Make calling uninitialized on an empty type trigger a runtime panic which also prints the deprecation message.

Should only mem::uninitialized::<!>() panic? Or should this also cover structs (and maybe enums?) that contain the empty type (e.g. (!, u8))?

@RalfJung

Member

RalfJung commented Aug 19, 2018

AFAIK we only do the really harmful code generation for !. Most other uses of mem::uninitialized are just as incorrect, but the compiler does not happen to exploit them.

So I'd do it for ! only, but also for mem::zeroed. (I forgot to amend that part when I added zeroed to the RFC, it seems.)

@eddyb

Member

eddyb commented Aug 19, 2018

We could start off by making this:

"init" => {
    let ty = substs.type_at(0);
    if !cx.layout_of(ty).is_zst() {
        // Just zero out the stack slot.
        // If we store a zero constant, LLVM will drown in vreg allocation for large data
        // structures, and the generated code will be awful. (A telltale sign of this is
        // large quantities of `mov [byte ptr foo],0` in the generated code.)
        memset_intrinsic(bx, false, ty, llresult, C_u8(cx, 0), C_usize(cx, 1));
    }
    return;
}
// Effectively no-ops
"uninit" => {
    return;
}

check whether fn_ty.ret.layout.abi is Abi::Uninhabited and at the very least emit a trap, e.g.:

// Allow RalfJ to sleep soundly knowing that even refactorings that remove
// the above error (or silence it under some conditions) will not cause UB
let fnname = bx.cx.get_intrinsic(&("llvm.trap"));
bx.call(fnname, &[], None);

Once you've seen the trap (i.e. intrinsics::abort) in action, you can see if there's any nice way of triggering a panic. It'll be tricky because of unwinding; we'll need to special-case them here:

let intrinsic = intrinsic.as_ref().map(|s| &s[..]);
if intrinsic == Some("transmute") {

To actually panic, you'd need something like this:

// Get the location information.
let loc = bx.sess().codemap().lookup_char_pos(span.lo());
let filename = Symbol::intern(&loc.file.name.to_string()).as_str();
let filename = C_str_slice(bx.cx, filename);
let line = C_u32(bx.cx, loc.line as u32);
let col = C_u32(bx.cx, loc.col.to_usize() as u32 + 1);
let align = tcx.data_layout.aggregate_align
    .max(tcx.data_layout.i32_align)
    .max(tcx.data_layout.pointer_align);
// Put together the arguments to the panic entry point.
let (lang_item, args) = match *msg {
    EvalErrorKind::BoundsCheck { ref len, ref index } => {
        let len = self.codegen_operand(&mut bx, len).immediate();
        let index = self.codegen_operand(&mut bx, index).immediate();
        let file_line_col = C_struct(bx.cx, &[filename, line, col], false);
        let file_line_col = consts::addr_of(bx.cx,
                                            file_line_col,
                                            align,
                                            Some("panic_bounds_check_loc"));
        (lang_items::PanicBoundsCheckFnLangItem,
         vec![file_line_col, index, len])
    }
    _ => {
        let str = msg.description();
        let msg_str = Symbol::intern(str).as_str();
        let msg_str = C_str_slice(bx.cx, msg_str);
        let msg_file_line_col = C_struct(bx.cx,
                                         &[msg_str, filename, line, col],
                                         false);
        let msg_file_line_col = consts::addr_of(bx.cx,
                                                msg_file_line_col,
                                                align,
                                                Some("panic_loc"));
        (lang_items::PanicFnLangItem,
         vec![msg_file_line_col])
    }
};
// Obtain the panic entry point.
let def_id = common::langcall(bx.tcx(), Some(span), "", lang_item);
let instance = ty::Instance::mono(bx.tcx(), def_id);
let fn_ty = FnType::of_instance(bx.cx, &instance);
let llfn = callee::get_fn(bx.cx, instance);
// Codegen the actual panic invoke/call.
do_call(self, bx, fn_ty, llfn, &args, None, cleanup);

(you can ignore the EvalErrorKind::BoundsCheck arm)

@japaric

Member

japaric commented Aug 19, 2018

@eddyb Thanks for the pointers.


I'm now fixing (several) deprecation warnings and I feel (very) tempted to just run sed -i s/mem::uninitialized()/mem::MaybeUninit::uninitialized().into_inner()/g but I guess that would miss the point ... Or is that OK if I know that the value is a concrete (Copy) type? e.g. let x: [u8; 1024] = mem::uninitialized();.

@RalfJung

Member

RalfJung commented Aug 19, 2018

That would exactly miss the point, yeah.^^

At least for now, I would like to consider mem::MaybeUninit::uninitialized().into_inner() UB for all non-union types. Notice that Copy is certainly not sufficient; both bool and &'static i32 are Copy and your snippet is intended to be insta-UB for them. We may want an exception for "types where all bit patterns are okay" (integer types, essentially), but I would be opposed to making such an exception because undef is not a normal bit pattern. That's why the RFC says you need to fully initialize before calling into_inner.

It also says that for get_mut, but the RFC discussion brought up a desire by some folks to relax the restriction here. That's an option I could live with. But not for into_inner.

I'm afraid all these uses of uninitialized will have to be more carefully reviewed, and in fact this was one of the intents of the RFC. We'd like the wider ecosystem to be more careful here, if everyone just uses into_inner immediately then the RFC was worthless.
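
For a buffer like the `[u8; 1024]` in the question above, a minimal sketch of "fully initialize before extracting" might look like the following. It assumes the names the API was later stabilized under (`MaybeUninit::uninit` and `assume_init` in place of `uninitialized` and `into_inner`); `counting_buffer` is a hypothetical helper, not something from the RFC.

```rust
use std::mem::MaybeUninit;

/// Hypothetical helper: fills a 1024-byte buffer without ever producing
/// an uninitialized `[u8; 1024]` value.
fn counting_buffer() -> [u8; 1024] {
    let mut buf = MaybeUninit::<[u8; 1024]>::uninit();
    // Work through a raw pointer so no reference to uninitialized data is created.
    let base = buf.as_mut_ptr() as *mut u8;
    for i in 0..1024 {
        // SAFETY: `base.add(i)` stays inside the 1024-byte buffer.
        unsafe { base.add(i).write((i % 256) as u8) };
    }
    // SAFETY: every one of the 1024 bytes was written in the loop above.
    unsafe { buf.assume_init() }
}

fn main() {
    let buf = counting_buffer();
    assert_eq!(buf[0], 0);
    assert_eq!(buf[255], 255);
    assert_eq!(buf[256], 0);
    println!("initialized {} bytes", buf.len());
}
```

The point of the shape is that the only call that asserts initialization sits after the last write, which is what a blanket `sed` replacement would not give you.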

@Centril

Contributor

Centril commented Aug 19, 2018

We'd like the wider ecosystem to be more careful here, if everyone just uses into_inner immediately then the RFC was worthless.

This gives me an idea... perhaps we should lint (group: "correctness") for this sort of code? cc @oli-obk

@SimonSapin

Contributor

SimonSapin commented Aug 19, 2018

I'm now fixing (several) deprecation warnings

We should only ship Nightly with those warnings once the recommended replacement is available at least on Stable. See similar discussion at #52994 (comment)

@cramertj

Member

cramertj commented Aug 20, 2018

@RalfJung

We may want an exception for "types where all bit patterns are okay" (integer types, essentially)

You've participated in discussion about this before, but I'll post here to circulate more widely: this is already something we have many existing use-cases for in Fuchsia, and we have a trait for this (FromBytes) and a derive macro for these types. There was also an internals Pre-RFC for adding these to the standard library (cc @gnzlbg @joshlf).

I would be opposed to making such an exception because undef is not a normal bit pattern.

Yeah, this is an aspect in which mem::zeroed() is significantly different from mem::uninitialized().

@gnzlbg

Contributor

gnzlbg commented Aug 20, 2018

@cramertj

You've participated in discussion about this before, but I'll post here to circulate more widely: this is already something we have many existing use-cases for in Fuchsia, and we have a trait for this (FromBytes) and a derive macro for these types. There was also an internals Pre-RFC for adding these to the standard library (cc @gnzlbg @joshlf).

Those discussions were about ways of allowing safe memcpys across types, but I think that's pretty much orthogonal to whether the memory being copied is initialized or not - if you put uninitialized memory in, you get uninitialized memory out.

The consensus also was that it would be unsound for any approach discussed to allow reading padding bytes, which are a form of uninitialized memory, in safe Rust. That is if you put initialized memory in, you can't get uninitialized memory out.

IIRC, nobody there suggested or discussed any approach in which you could put uninitialized memory in and get initialized memory out, so I don't follow what those discussions have to do with this one. To me they are completely orthogonal.

@joshlf

Contributor

joshlf commented Aug 21, 2018

To drive the point home a bit more, LLVM defines uninitialized data as Poison, which is distinct from "some arbitrary but valid bit pattern." Branching based on a Poison value or using it to compute an address which is then dereferenced is UB. So, unfortunately, "types where all bit patterns are okay" are still not safe to construct because using them without separately initializing them will be UB.

@cramertj

Member

cramertj commented Aug 21, 2018

Right, sorry, I should have clarified what I meant. I was trying to say that "types where all bit patterns are okay" is already something that we're interested in defining for other reasons. Like @RalfJung said above,

I would be opposed to making such an exception because undef is not a normal bit pattern.

@joshlf

Contributor

joshlf commented Aug 21, 2018

Thank god there are people who can read, because apparently I can't...

@RalfJung

Member

RalfJung commented Aug 21, 2018

Right, so what I meant to say is: We definitely have types where all initialized bit patterns are okay -- all the i* and u* types, raw pointers, I think f* as well and then tuples/structs only consisting of such types.

What is an open question is under which circumstances which of these types are allowed to be uninitialized, i.e., poison. My own preferred answer is "never".

The consensus also was that it would be unsound for any approach discussed to allow reading padding bytes, which are a form of uninitialized memory, in safe Rust. That is if you put initialized memory in, you can't get uninitialized memory out.

Reading padding bytes as MaybeUninit<u8> should be fine.
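
A minimal sketch of what "reading padding bytes as `MaybeUninit<u8>`" could look like, assuming the later-stabilized `assume_init` name. `Padded` and `as_maybe_uninit_bytes` are hypothetical names for illustration, and the helper is only meant for types without interior mutability.

```rust
use std::mem::{size_of, MaybeUninit};
use std::slice;

/// Hypothetical example type: with `repr(C)` there is typically padding
/// between `a` and `b`.
#[repr(C)]
struct Padded {
    a: u8,
    b: u32,
}

/// Views a value as possibly-uninitialized bytes. Padding is covered too,
/// which is fine because `MaybeUninit<u8>` does not require initialization.
fn as_maybe_uninit_bytes<T>(value: &T) -> &[MaybeUninit<u8>] {
    // SAFETY: the pointer is valid for `size_of::<T>()` bytes for the
    // lifetime of `value`, and `MaybeUninit<u8>` has alignment 1.
    unsafe {
        slice::from_raw_parts(value as *const T as *const MaybeUninit<u8>, size_of::<T>())
    }
}

fn main() {
    let p = Padded { a: 7, b: 0xDEAD_BEEF };
    let bytes = as_maybe_uninit_bytes(&p);
    assert_eq!(bytes.len(), size_of::<Padded>());
    // SAFETY: offset 0 holds the field `a`, which is initialized.
    let first = unsafe { bytes[0].assume_init() };
    assert_eq!(first, 7);
    assert_eq!(p.b, 0xDEAD_BEEF);
    println!("{} bytes, first = {}", bytes.len(), first);
}
```

Only bytes known to belong to an initialized field are passed to `assume_init`; the padding bytes are carried around but never asserted to be initialized.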

@gnzlbg

Contributor

gnzlbg commented Aug 21, 2018

The consensus also was that it would be unsound for any approach discussed to allow reading padding bytes, which are a form of uninitialized memory, in safe Rust. That is if you put initialized memory in, you can't get uninitialized memory out.

Reading padding bytes as MaybeUninit<u8> should be fine.

The discussion in a nutshell was about providing a trait, Compatible<T>, with a safe method fn safe_transmute(self) -> T that "reinterprets"/"memcpys" the bits of self into a T. The guarantee of this method is that if self is properly initialized, so is the resulting T. It was proposed for the compiler to fill in transitive implementations automatically, e.g., if there is an impl Compatible<V> for U, and an impl Compatible<W> for V then there is an impl Compatible<W> for U (either because it was provided manually, or the compiler auto generates it - how this could be implemented was completely handwaved).

It was proposed that it should be unsafe to implement the trait: if you implement it for a T that has padding bytes where Self has fields, then everything is fine at least until you try to use the T and your program behavior ends up depending on the contents of the uninitialized memory.

I have no idea what any of this has to do with MaybeUninit<u8>, maybe you could elaborate on that?

The only thing I can imagine is that we could add a blanket impl: unsafe impl<T> Compatible<[MaybeUninit<u8>; size_of::<T>()]> for T { ... } since transmuting any type into a [MaybeUninit<u8>; N] of its size is safe for all types. I don't know how useful such an impl would be, given that MaybeUninit is a union, and whoever uses the [MaybeUninit<u8>; N] has no idea of whether a particular element of the array is initialized or not.

@RalfJung

Member

RalfJung commented Aug 21, 2018

@gnzlbg back then you were talking about FromBits<T> for [u8]. That is where I say we have to use [MaybeUninit<u8>] instead.

@joshlf

Contributor

joshlf commented Aug 21, 2018

I discussed this proposal with @nikomatsakis at RustConf, and he encouraged me to go forward with an RFC. I was going to do it in a few weeks, but if there's interest, I can try getting one done this weekend. Would that be useful for this discussion?

@RalfJung

Member

RalfJung commented Aug 21, 2018

@joshlf which proposal are you talking about?

@gnzlbg

Contributor

gnzlbg commented Aug 21, 2018

@RalfJung

@gnzlbg back then you were talking about FromBits<T> for [u8]. That is where I say we have to use [MaybeUninit<u8>] instead.

Gotcha, fully agree here. Had completely forgotten that we also wanted to do that 😆

@joshlf

Contributor

joshlf commented Aug 21, 2018

@joshlf which proposal are you talking about?

A FromBits/IntoBits proposal. TLDR: T: FromBits<U> means that any bit pattern which is a valid U corresponds to a valid T. U: IntoBits<T> means the same thing. The compiler automatically infers both for all pairs of types given certain rules, and this unlocks lots of fun goodness that currently requires unsafe. There's a draft of this RFC here that I wrote a while back, but I intend to change large parts of it, so don't take that text as anything more than a rough guide.
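
To make the shape of such a trait concrete, here is a rough sketch. Everything in it is an assumption for illustration: the trait is written by hand rather than inferred by the compiler, only one direction (`FromBits`) is shown, and `from_bits` is a hypothetical helper that is not part of the draft RFC.

```rust
use std::mem::{size_of, transmute_copy, ManuallyDrop};

/// Hypothetical marker trait: every bit pattern that is a valid `U`
/// is also a valid `Self`. In the proposal the compiler would infer
/// these impls; here one is written by hand.
unsafe trait FromBits<U>: Sized {}

// SAFETY: any four initialized bytes form a valid `u32`.
unsafe impl FromBits<[u8; 4]> for u32 {}

/// Safe conversion built on top of the unsafe trait.
fn from_bits<T: FromBits<U>, U>(src: U) -> T {
    assert_eq!(size_of::<T>(), size_of::<U>());
    // Prevent `src` from being dropped after its bits have been copied out.
    let src = ManuallyDrop::new(src);
    // SAFETY: the sizes match (checked above) and the `FromBits` impl
    // promises that the bits of a valid `U` are a valid `T`.
    unsafe { transmute_copy::<U, T>(&*src) }
}

fn main() {
    let n: u32 = from_bits([1u8, 0, 0, 0]);
    assert_eq!(n, u32::from_ne_bytes([1, 0, 0, 0]));
    println!("{}", n);
}
```

Note that the guarantee only covers initialized input: nothing here turns uninitialized memory into initialized memory.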

@RalfJung

Member

RalfJung commented Aug 21, 2018

@joshlf I think such a pair of traits would more build on top of this discussion than be part of it. AFAIK we have two open questions in terms of validity:

  • Does it recurse below references? I more and more strongly think it should not, as we see more examples. So likely we should adapt the MaybeUninit::get_mut docs accordingly (it is not actually UB to use that before completing initialization, but it is UB to dereference it before completing initialization). However, we first have to make that decision for validity, and I am not sure what the right venue is for that. Probably a dedicated RFC?
  • Does a u8 (and other integer types, floating point, raw pointer) have to be initialized, i.e., is MaybeUninit<u8>::uninitialized().into_inner() insta-UB? I think so, but mostly based on a gut feeling that we want to keep the places where we allow poison/undef to a minimum. However, I could be persuaded otherwise if there are plenty of uses of this pattern (and I hope to use miri to help determining this).
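
Whatever the answer to the first question turns out to be, code can sidestep it by never creating a reference to the uninitialized value. A minimal sketch, assuming the later-stabilized `uninit`/`assume_init` names and the `addr_of_mut!` macro, which was added to the standard library after this discussion; `Pair` and `make_pair` are hypothetical.

```rust
use std::mem::MaybeUninit;
use std::ptr::addr_of_mut;

/// Hypothetical example type with a field that has invalid bit patterns
/// (`bool`) and a field that owns heap memory (`String`).
struct Pair {
    flag: bool,
    name: String,
}

fn make_pair() -> Pair {
    let mut slot = MaybeUninit::<Pair>::uninit();
    let p = slot.as_mut_ptr();
    unsafe {
        // SAFETY: `p` points to properly aligned storage for a `Pair`.
        // `addr_of_mut!` computes field pointers without creating `&mut`
        // references, and `write` does not read or drop the old contents.
        addr_of_mut!((*p).flag).write(true);
        addr_of_mut!((*p).name).write(String::from("hello"));
        // SAFETY: both fields have been written.
        slot.assume_init()
    }
}

fn main() {
    let pair = make_pair();
    assert!(pair.flag);
    assert_eq!(pair.name, "hello");
    println!("{} {}", pair.flag, pair.name);
}
```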
@Centril

Contributor

Centril commented Dec 29, 2018

Maybe we should rename it to MaybeInvalid or something like that to better convey the problem it solves and the dangers it avoids.

Bikeshed in #56138.

@mjbshaw

Contributor

mjbshaw commented Dec 30, 2018

@gnzlbg

there are some "native" types (e.g. bool

As long as bool is FFI-safe (which it generally is considered to be, despite RFC 954 being rejected and then unofficially-officially accepted), it should be safe to use mem::zeroed for it.

, &T, etc.) for which mem::zeroed invokes undefined behavior.

Yes, but these types that have UB for mem::zeroed also have UB for MaybeUninit::zeroed().into_inner() (I was careful to intentionally include .into_inner() in my original comment). MaybeUninit adds nothing if the user just immediately calls .into_inner() (which is precisely what I and many others would do if mem::zeroed was deprecated, because I'm only using mem::zeroed for types which are zero-safe).

@gnzlbg

Contributor

gnzlbg commented Dec 30, 2018

As long as bool is FFI-safe (which it generally is considered to be, despite RFC 954 being rejected and then unofficially-officially accepted), it should be safe to use mem::zeroed for it.

I didn't want to get into the specifics of this, but bool is FFI-safe in the sense that it is defined to be equal to C's _Bool. However, the true and false values of C's _Bool are not defined in the C standard (although they might be some day, maybe in C20), so whether mem::zeroed creates a valid bool or not is technically implementation-defined.

Yes, but these types that have UB for mem::zeroed also have UB for MaybeUninit::zeroed().into_inner() (I was careful to intentionally include .into_inner() in my original comment). MaybeUninit adds nothing if the user just immediately calls .into_inner() (which is precisely what I and many others would do if mem::zeroed was deprecated, because I'm only using mem::zeroed for types which are zero-safe).

I don't really understand which point you are trying to make here. MaybeUninit adds the option of calling or not calling into_inner, which mem::zeroed doesn't have, and there is value in that since that is the operation that can introduce undefined behavior (constructing the union as uninitialized or zeroed is safe).

Why would anyone blindly translate mem::zeroed to MaybeUninit + into_inner? That is not the appropriate way to "fix" the deprecation warning of mem::zeroed, and silencing the deprecation warning has the same effect and a much lower cost.

The appropriate way of moving from mem::zeroed to MaybeUninit is to evaluate whether it is safe to call into_inner, in which case one can just do so and write a comment explaining why that is safe, or just keep working with MaybeUninit as a union until calling into_inner becomes safe (one might need to change a lot of code until that's the case, do API breaking changes to return MaybeUninit instead of Ts, etc.).
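
As an illustration of the first option (call it immediately, but document why that is sound), a sketch might look like this. It assumes the later-stabilized `assume_init` name in place of `into_inner`; `Counters` and `zeroed_counters` are hypothetical.

```rust
use std::mem::MaybeUninit;

/// Hypothetical plain-data struct: every field is an integer.
#[repr(C)]
#[derive(Debug, PartialEq)]
struct Counters {
    hits: u64,
    misses: u64,
}

fn zeroed_counters() -> Counters {
    // SAFETY: `Counters` consists only of integer fields, and the all-zero
    // bit pattern is a valid value for every integer type.
    unsafe { MaybeUninit::<Counters>::zeroed().assume_init() }
}

fn main() {
    let c = zeroed_counters();
    assert_eq!(c, Counters { hits: 0, misses: 0 });
    println!("{:?}", c);
}
```

If a reference or a `bool`-like enum field were later added to the struct, the safety comment is the place a reviewer would notice that the argument no longer holds.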

@mjbshaw

Contributor

mjbshaw commented Dec 30, 2018

I didn't want to get into the specifics of this, but bool is FFI-safe in the sense that it is defined to be equal to C's _Bool. However, the true and false values of C's _Bool are not defined in the C standard (although they might be some day, maybe in C20), so whether mem::zeroed creates a valid bool or not is technically implementation-defined.

Apologies for continuing the tangent, but C11 requires that all-bits-set-to-zero represents the value 0 for integer types (see section 6.2.6.2 "Integer types", paragraph 5) (which includes _Bool). Additionally, the values of true and false are explicitly defined (see the section 7.18 "Boolean type and values <stdbool.h>").

I don't really understand which point you are trying to make here. MaybeUninit adds the option of calling or not calling into_inner, which mem::zeroed doesn't have, and there is value in that since that is the operation that can introduce undefined behavior (constructing the union as uninitialized or zeroed is safe).

There is value in MaybeUninit and MaybeUninit::zeroed. We both agree on that. I'm not arguing for MaybeUninit::zeroed to be removed. My point is that there is also value in std::mem::zeroed.

@jethrogb

Contributor

jethrogb commented Dec 30, 2018

There are some types for which mem::uninitialized is perfectly safe (e.g. unit), while there are some "native" types (e.g. bool, &T, etc.) for which mem::zeroed invokes undefined behavior.

This is a red herring. Just because both zeroed and uninitialized are valid for some subset of types doesn't make them comparable in actual use. You need to look at the size of those subsets. The number of types for which mem::uninitialized is valid is very very small (in fact, is it only zero-sized types?), and no one would actually write code that does that (e.g. for ZSTs you would just use the type constructor). On the other hand, there are many many types for which mem::zeroed is valid. mem::zeroed is valid for at least the following types (hope I got this right):

  • all integer types (including bool, as mentioned above)
  • all raw pointer types
  • Option<T> where T triggers enum layout optimization. T includes:
    • NonZeroXXX (all integer types)
    • NonNull<U>
    • &U
    • &mut U
    • fn-pointers
    • any array of any type in this list
    • any struct where any field is a type in this list.
  • Any array, struct, or union consisting only of types in this list.
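
A few entries from the list can be checked directly. This is a small sketch covering only cases where the all-zero representation is documented (integers, `bool`, raw pointers, and `Option` around a reference or a `NonZero` integer); `zeroed_samples` is a hypothetical helper, and the nested array/struct bullets are deliberately left out.

```rust
use std::mem;
use std::num::NonZeroU32;

/// Hypothetical helper returning zeroed values of several types from the list.
fn zeroed_samples() -> (u64, bool, *const u8, Option<&'static u8>, Option<NonZeroU32>) {
    // SAFETY: all-zero bits are a valid value for each of these types:
    // 0, false, a null raw pointer, and `None` for the two options.
    unsafe {
        (
            mem::zeroed(),
            mem::zeroed(),
            mem::zeroed(),
            mem::zeroed(),
            mem::zeroed(),
        )
    }
}

fn main() {
    let (n, b, p, r, z) = zeroed_samples();
    assert_eq!(n, 0);
    assert!(!b);
    assert!(p.is_null());
    assert!(r.is_none());
    assert!(z.is_none());
    println!("all zeroed samples are valid");
}
```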

Yes, both uninitialized and zeroed deal with potentially-invalid values. However, programmers use these primitives in very different ways.

The common pattern for mem::uninitialized is:

let mut val = MaybeUninit::uninitialized();
initialize_value(val.get_mut()); // or val.as_mut_ptr, or val.set
val.into_inner()

If you are not writing your use of uninitialized values this way, you are most likely making a big mistake.
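
Spelled out as a compilable sketch, the pattern might look like the following. It assumes the later-stabilized names (`uninit`, `as_mut_ptr`, `assume_init`), and `initialize_value` is a hypothetical stand-in for whatever fills the out-parameter, for example a C function.

```rust
use std::mem::MaybeUninit;

/// Hypothetical initializer that fills an out-parameter, in the style of a C API.
///
/// # Safety
/// `out` must be valid for writes of a `u64`.
unsafe fn initialize_value(out: *mut u64) {
    // SAFETY: the caller guarantees `out` is valid for writes.
    unsafe { out.write(42) };
}

fn make_value() -> u64 {
    // Step 1: reserve storage without claiming it holds a valid value.
    let mut val = MaybeUninit::<u64>::uninit();
    unsafe {
        // Step 2: let the initializer write through a raw pointer.
        initialize_value(val.as_mut_ptr());
        // Step 3: only now assert that the value is initialized.
        // SAFETY: `initialize_value` wrote a `u64` into the slot.
        val.assume_init()
    }
}

fn main() {
    assert_eq!(make_value(), 42);
    println!("{}", make_value());
}
```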

The most common use of mem::zeroed today is for types described above, and this is perfectly valid. I completely agree with @bluss that I don't see any footgun-prevention gain by replacing mem::zeroed() everywhere by MaybeUninit::zeroed().into_inner().

To summarize, common use of uninitialized is for types which can have invalid values. Common use of zeroed is for types which are valid if zeroed.

A Zeroed trait or similar (e.g. Pod, but note that T: Zeroed does not imply T: Pod) as has been suggested seems like a fine thing to add in the future, but let's not deprecate fn zeroed<T>() -> T until we actually have a stable fn zeroed2<T: Zeroed>() -> T.

@gnzlbg

Contributor

gnzlbg commented Dec 30, 2018

@mjbshaw

Apologies for continuing the tangent, but C11 requires that

Indeed! It's only C++'s bool which leaves the valid values unspecified! Thanks for correcting me, gonna send a PR to the UCG with this guarantee.

@Centril

Contributor

Centril commented Dec 30, 2018

@jethrogb

You need to look at the size of those subsets. The number of types for which mem::uninitialized is valid is very very small (in fact, is it only zero-sized types?), and no one would actually write code that does that (e.g. for ZSTs you would just use the type constructor).

It's not even correct for all ZSTs if you factor in privacy with which it's possible to have ZSTs as a sort of "proof of work" or "token for resource" or just "proof witness" in general. A trivial example:

mod refl {
    use core::marker::PhantomData;
    use core::mem;

    /// Having an object of type `Id<A, B>` is a proof witness that `A` and `B`
    /// are nominally equal types according to Rust's type system.
    pub struct Id<A, B> {
        witness: PhantomData<(
            // Make sure `Id` is invariant wrt. `A`.
            fn(A) -> A,
            // Make sure `Id` is invariant wrt. `B`.
            fn(B) -> B,
        )>
    }

    impl<A> Id<A, A> {
        /// The type `A` is always equal to itself.
        /// `REFL` provides a proof of this trivial fact.
        pub const REFL: Self = Id { witness: PhantomData };
    }

    impl<A, B> Id<A, B> {
        /// Casts a value of type `A` to `B`.
        ///
        /// This is safe because the `Id` type is always guaranteed to
        /// only be inhabited by `Id<A, B>` types by construction.
        pub fn cast(self, value: A) -> B {
            unsafe {
                // Transmute the value;
                // This is safe since we know by construction that
                // A == B (including lifetime invariance) always holds.
                let cast_value = mem::transmute_copy(&value);
        
                // Forget the value;
                // otherwise the destructor of A would be run.
                mem::forget(value);
        
                cast_value
            }
        }
    }
}

fn main() {
    use core::mem::uninitialized;

    // `Id<?A, ?B>` is a ZST; let's make one out of thin air:
    let prf: refl::Id<u8, String> = unsafe { uninitialized() };

    // Segfault:
    let _ = prf.cast(42u8);
}
@jethrogb

Contributor

jethrogb commented Dec 30, 2018

@Centril this is kind of a tangent, but I'm not sure if your code is actually an example of a type for which calling uninitialized creates an invalid value. You're using unsafe code to violate the internal invariants that Id is supposed to uphold. There are many ways to do this, for example transmute(()), or type-casting raw pointers.

@Centril

Contributor

Centril commented Dec 30, 2018

@jethrogb My only points are that a) please be more careful with wording, b) privacy doesn't seem sufficiently reasoned about in discussions about what valid values even are. It seems to me that "violate the internal invariants" and "invalid value" are the same thing; there's a side-condition here "if A != B then Id<A, B> is uninhabited.".

@rkruppe

Contributor

rkruppe commented Dec 30, 2018

It seems to me that "violate the internal invariants" and "invalid value" are the same thing; there's a side-condition here "if A != B then Id<A, B> is uninhabited.".

Invariants "imposed by library code" are different from invariants "imposed by the compiler" in several ways, see @RalfJung's blog post about the topic. In that terminology, your Id example has a safety invariant and mem::zeroed or other ways to generically synthesize an Id<A, B> cannot be safe, but it is not immediate UB to just construct a wrong Id value with mem::zeroed or mem::uninitialized because Id has no validity invariant. While unsafe code authors certainly need to keep both kinds of invariants in mind, there are some reasons why these discussions mostly focus on validity:

  • The safety invariants are user-defined, rarely formalized, and can be arbitrarily complicated, so there is little hope of reasoning generically about them or the compiler/language helping with upholding any particular safety invariant.
  • Breaking the safety invariant can occasionally be needed (internally within a sound library), so even if we could mechanically rule out mem::zeroed::<T>() based on T's safety invariant, we may not want to.
  • Relatedly, the consequences of broken validity invariants are in some ways worse than a broken safety invariant (less chance to debug it because all hell breaks loose immediately, and often the actual behavior resulting from the UB is less comprehensible because all of the compiler and optimizer factors into it, while the safety invariant is only directly exploited by code in the same module/crate).
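
A small sketch of the distinction, under the terminology above. `Percent` and `forge_percent` are hypothetical; the example assumes, as the comment argues, that a broken library-level invariant is not by itself immediate UB.

```rust
mod percent {
    /// Safety invariant (library-level): the wrapped value is at most 100.
    /// Validity invariant (language-level): none beyond that of `u8`.
    #[repr(transparent)]
    pub struct Percent(u8);

    impl Percent {
        /// The only safe constructor; it upholds the safety invariant.
        pub fn new(value: u8) -> Option<Percent> {
            if value <= 100 {
                Some(Percent(value))
            } else {
                None
            }
        }

        pub fn get(&self) -> u8 {
            self.0
        }
    }
}

use percent::Percent;

/// Hypothetical helper that bypasses `Percent::new`.
fn forge_percent(raw: u8) -> Percent {
    // Every initialized byte is a valid `u8`, so this produces a *valid*
    // `Percent`. For `raw > 100` it is still an *unsafe* value, because code
    // inside `percent` may rely on the "at most 100" invariant.
    // (By contrast, transmuting `2u8` to `bool` would break a validity
    // invariant and be immediate UB.)
    unsafe { std::mem::transmute::<u8, Percent>(raw) }
}

fn main() {
    assert!(Percent::new(200).is_none());
    let forged = forge_percent(200);
    assert_eq!(forged.get(), 200);
    println!("forged value: {}", forged.get());
}
```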
@scottjmaddox


scottjmaddox commented Jan 2, 2019

After reading @jethrogb's comment, I agree that mem::zeroed should not be deprecated with the introduction of MaybeUninit.

@cramertj

Member

cramertj commented Jan 2, 2019

@jethrogb Small nit:

any array of any type in this list
any struct where any field is a type in this list.

Not sure if this is a simple typo or a semantic difference, but I think you need to out-dent these two bullets-- I don't believe it's necessarily the case that None of e.g. Option<[&u8; 2]> has bitwise-zeros as a valid representation (it could e.g. use [0, 24601] as the representation of the None case-- only one of the inner values must take on a niche representation -- cc @eddyb to check me on this). I doubt we do this today, but it doesn't seem completely impossible that something like this could appear in the future.

@gnzlbg

Contributor

gnzlbg commented Jan 2, 2019

@jethrogb

The most common use of mem::zeroed today is for types described above, and this is perfectly valid.

Is there a source for this?

On the other hand, there are many many types for which mem::zeroed is valid.

There are also infinitely many cases for which it can be used incorrectly.

I understand that for those using mem::zeroed heavily and correctly, delaying the deprecation until a more ergonomic solution is available is a very appealing alternative.

I prefer the trade-off of reducing or eliminating the number of incorrect usages of mem::zeroed even if that incurs a temporary ergonomic cost. A deprecation warns users (particularly new users who use it for the first time) that what they are doing potentially invokes undefined behavior, and we have a sound alternative to offer instead, which makes the warning actionable.

I use MaybeUninit often and it is less ergonomic to use than mem::zeroed and mem::uninitialized, but it hasn't been painfully unergonomic for me. If MaybeUninit is as painful as some comments in this discussion claim, then a library and / or RFC for a safe mem::zeroed alternative will pop up in no time (nothing is blocking anyone here AFAICT).

Alternatively, users can ignore the warning and keep using mem::zeroed, that's up to them, we can't ever remove mem::zeroed from libcore anyways.

But people using mem::zeroed heavily should be actively inspecting if all their usages are correct anyways. Particularly those using mem::zeroed heavily, those using it in generic code, those using it as a "less scary" alternative to mem::uninitialized, etc. Delaying the deprecation just delays warning users that what they are doing might be undefined behavior.

@RalfJung

Member

RalfJung commented Jan 4, 2019

@bluss

When removing zeroed it seems like it is only replaced by MaybeUninit::zeroed().into_inner() which becomes an equivalent way to write the same thing. There is no practical change. With uninit values we instead have the practical change of all uninitialized data being kept stored in values of the type MaybeUninit or equivalent union.

This is true when we are talking about integers, but once we look at e.g. reference types, mem::zeroed() becomes a problem as well.

However, I agree that it is much more likely that people will actually realize that mem::zeroed::<&T>() is a problem, than people realizing that mem::uninitialized::<bool>() is a problem. So maybe it makes sense to keep mem::zeroed().

Notice, however, that we might still decide that mem::uninitialized::<u32>() is fine -- if we allow uninitialized bits in integer types, mem::uninitialized() becomes valid for almost all "POD types". I don't think we should allow this, but we still have to have this discussion.

The number of types for which mem::uninitialized is valid is very very small (in fact, is it only zero-sized types?), and no one would actually write code that does that (e.g. for ZSTs you would just use the type constructor).

FWIW, some slice iterator code actually has to create a ZST in generic code without being able to write a type constructor. It uses mem::zeroed()/MaybeUninit::zeroed().into_inner() for that.
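
A rough sketch of what such generic ZST creation might look like. This is not the actual slice iterator code; conjure_zst and Marker are hypothetical names, and the stabilized assume_init is used in place of into_inner.

```rust
use std::mem::{self, MaybeUninit};

/// Marker type with no data; a stand-in for whatever ZST generic code might see.
#[derive(Debug, PartialEq)]
struct Marker;

/// # Safety
/// `T` must be zero-sized, inhabited (so not an empty enum), and must not
/// carry invariants that conjuring a value out of thin air would break.
unsafe fn conjure_zst<T>() -> T {
    assert_eq!(mem::size_of::<T>(), 0, "only meant for zero-sized types");
    // A ZST has no bytes, so there is nothing left uninitialized.
    unsafe { MaybeUninit::<T>::zeroed().assume_init() }
}
```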

@Amanieu

Contributor

Amanieu commented Jan 4, 2019

mem::zeroed() is useful for certain FFI cases where you are expected to zero a value with memset(&x, 0, sizeof(x)) before calling a C function. I think this is a sufficient reason to keep it undeprecated.

@RalfJung

Member

RalfJung commented Jan 4, 2019

@Amanieu That seems unnecessary. The Rust construct matching memset is write_bytes.
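
For illustration, a sketch of the memset pattern expressed with ptr::write_bytes. The Request struct is a made-up stand-in for a C type, and the code uses the stabilized MaybeUninit::uninit / assume_init names.

```rust
use std::mem::MaybeUninit;
use std::ptr;

/// Hypothetical stand-in for a C struct that an FFI call expects to be zeroed.
#[repr(C)]
#[derive(Debug, PartialEq)]
struct Request {
    flags: u32,
    len: u64,
    reserved: [u8; 16],
}

fn zeroed_request() -> Request {
    let mut req = MaybeUninit::<Request>::uninit();
    unsafe {
        // Counterpart of `memset(&req, 0, sizeof(req))`; the count is in
        // units of `Request`, not bytes.
        ptr::write_bytes(req.as_mut_ptr(), 0, 1);
        // All-zero bytes are a valid `Request` (integers and a byte array only).
        req.assume_init()
    }
}
```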

@petrochenkov

Contributor

petrochenkov commented Jan 4, 2019

mem::zeroed() is useful for certain FFI cases

Also, the last time I checked, mem::zeroed was the idiomatic way to initialize libc structures with private or platform-dependent fields.

@rkruppe

Contributor

rkruppe commented Jan 4, 2019

@RalfJung The full code in question is usually Type x; memset(&x, 0, sizeof(x)); and the first part doesn't have a great Rust equivalent. Using MaybeUninit for this pattern is a lot of line noise (and much worse codegen without optimizations) when the memory is never actually invalid after the memset.
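
To make the "line noise" comparison concrete, here are the two spellings side by side. TimeSpec is a hypothetical mirror of a C struct (a real one would come from a bindings crate), and assume_init is the stabilized name for what this thread calls into_inner.

```rust
use std::mem::{self, MaybeUninit};

/// Hypothetical mirror of a C struct for which all-zero bytes are valid.
#[repr(C)]
#[derive(Debug, PartialEq)]
struct TimeSpec {
    tv_sec: i64,
    tv_nsec: i64,
}

/// The short spelling under discussion.
fn via_mem_zeroed() -> TimeSpec {
    unsafe { mem::zeroed() }
}

/// The same value produced through `MaybeUninit`.
fn via_maybe_uninit() -> TimeSpec {
    unsafe { MaybeUninit::<TimeSpec>::zeroed().assume_init() }
}
```

Both functions return the same value for a type like this; the difference is only in how much ceremony the call site carries.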

@nicoburns

nicoburns commented Jan 10, 2019

I have a question about the design of MaybeUninit: Is there any way to write to a single field of the T contained inside a MaybeUninit<T> such that you could over time write to all of the fields and end up with a valid/initialized type?

Suppose we have a struct like the following:

// Let us suppose that Foo can in principle be any struct containing arbitrary types
struct Foo {bar: bool, baz: String}

Does generating an &mut Foo reference, and then writing to it trigger UB?

fn main() {
    let mut uninit_foo = MaybeUninit::<Foo>::uninitialized();
    unsafe { uninit_foo.get_mut().bar = true; }
    unsafe { uninit_foo.get_mut().baz = "hello world".to_owned(); }
}

Does using a raw pointer instead of a reference avoid this problem?

fn main() {
    let mut uninit_foo = MaybeUninit::<Foo>::uninitialized();
    unsafe { (*uninit_foo.as_mut_ptr()).bar = true; }
    unsafe { (*uninit_foo.as_mut_ptr()).baz = "hello world".to_owned(); }
}

Or is there any other way in which this pattern can be implemented without triggering UB? Intuitively, it seems to me that as long as I'm not reading uninitialized/invalid memory, then everything should be fine, but several of the comments in this thread lead me to doubt that.

My use case for this functionality would be an in-place builder pattern for types where some of the fields must be specified by the user (and don't have a sensible default), but some of the fields do have a default value.

@RalfJung

Member

RalfJung commented Jan 11, 2019

Is there any way to write to a single field of the T contained inside a MaybeUninit such that you could over time write to all of the fields and end up with a valid/initialized type?

Yes. Use

ptr::write(&mut (*uninit.as_mut_ptr()).bar, val1);
ptr::write(&mut (*uninit.as_mut_ptr()).baz, val2);
...

You must not use get_mut() for this, that's why the docs for get_mut say that the value must be initialized before calling this method. We might relax that rule in the future, that is being discussed at https://github.com/rust-rfcs/unsafe-code-guidelines/.

@scottjmaddox

scottjmaddox commented Jan 11, 2019

@RalfJung Wouldn't (*uninit.as_mut_ptr()).bar = val1; risk dropping the value previously in bar, which might be uninitialized? I think it's necessary to do

ptr::write(&mut (*uninit.as_mut_ptr()).bar, val1);

@RalfJung

Member

RalfJung commented Jan 11, 2019

@scottjmaddox ah, right. I forgot about Drop. I will update the post.

@HeroicKatora

HeroicKatora commented Jan 12, 2019

In what way does this variant of writing to uninitialized fields exhibit less undefined behaviour than get_mut()? At the code point where the first argument to ptr::write is evaluated, the code has created a &mut _ to the inner field which should be just as undefined as the reference to the whole struct that would otherwise be created. Should the compiler not be allowed to assume this to be in initialized state already then?

Would that not necessitate a new pointer-projection method that does not require exposed &mut _ intermediates?


Slightly interesting example:

pub struct A { inner: bool }

pub fn init(mut uninit: MaybeUninit<A>) -> A {
    unsafe {
        let mut previous: [u8; std::mem::size_of::<bool>()] = [0];
        
        {
            // Doesn't the temporary reference assert inner was in valid state before?
            let inner_ptr: *mut _ = &mut (*uninit.as_mut_ptr()).inner;
            ptr::copy(inner_ptr as *const [u8; 1], (&mut previous) as *mut _, 1);
            
            // With the assert below, couldn't the compiler drop this?
            std::ptr::write(inner_ptr, true);
        }
        
        // Assert Inner wasn't false before, so it must have been true already!
        assert!(previous[0] != 0);
        
        // initialized all fields, good to proceed.
        uninit.into_inner()
    }
}

But if the compiler may assume &mut _ to be a valid representation, may it just outright throw away the ptr::write? If we get past the assert, the content was not 0, and the only other valid bool is true/1. So it could assume the write to be a no-op if we get past the assert. Since the value is not accessed before, after reordering we could end up with this? It doesn't look like LLVM exploits this right now, but I'm very unsure whether this is guaranteed.


If we instead create our own MaybeUninit within the function, we get a slightly different reality. On the playground we instead find that it assumes the assert can never trigger, presumably because it assumes std::ptr::write is the only write to inner and thus must already have happened before we read from previous? This seems a bit fishy anyway. To support this theory, watch what happens when you change the pointer write to false instead.


I realize this tracking issue may not be the best place for this question.

@nicoburns

nicoburns commented Jan 12, 2019

@RalfJung @scottjmaddox Thank you for your answers. These nuances are exactly why I asked.
@HeroicKatora Yes, I was wondering about that.

Perhaps the correct incantation is this?

struct Foo {bar: bool, baz: String}

fn main () {
    let mut uninit_foo = MaybeUninit::<Foo>::uninitialized();
    unsafe { ptr::write_unaligned(&mut ((*uninit_foo.as_mut_ptr()).bar) as *mut bool, true); }
    unsafe { ptr::write_unaligned(&mut ((*uninit_foo.as_mut_ptr()).baz) as *mut String, "".to_string()); }
}

(playground)

I read a comment on Reddit (which unfortunately I can no longer find) which suggested that immediately casting a reference to a pointer (&mut foo as *mut T) actually compiles to just creating a pointer. However, the *uninit_foo.as_mut_ptr() bit worries me. Is it ok to dereference the pointer to the uninitialized memory like this? We're not actually reading anything, but it's unclear to me whether the compiler knows that.

I figured the unaligned variant of ptr::write might be required for generic code over MaybeUninit<T> as not all types will have aligned fields?

@scottjmaddox

scottjmaddox commented Jan 13, 2019

No need for write_unaligned. The compiler handles field alignment for you. And the as *mut bool shouldn't be necessary either, since the compiler can infer that it needs to coerce the &mut into a *mut. I think this inferred coercion is why it's safe/valid. If you want to be explicit and do as *mut _, that should be fine, too. If you want to save the pointer in a variable, then it's necessary to coerce it into a pointer.

@mjbshaw

Contributor

mjbshaw commented Jan 13, 2019

@scottjmaddox Is ptr::write still safe even if the struct is #[repr(packed)]? ptr::write says the pointer must be correctly aligned, so I assume ptr::write_unaligned is required in cases where you are writing some generic code that needs to handle packed representations (though to be honest I'm not sure I can think of an example of "generic code over MaybeUninit<T>" that wouldn't know whether the field was properly aligned or not).
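
A small sketch of the packed case raised here. The Packed struct is hypothetical, and the code relies on ptr::addr_of_mut! and assume_init, both of which were stabilized after this discussion, so treat it as one possible shape rather than what was available at the time.

```rust
use std::mem::MaybeUninit;
use std::ptr;

/// With `packed`, `value` sits at offset 1 and is not 4-byte aligned.
#[repr(C, packed)]
struct Packed {
    tag: u8,
    value: u32,
}

fn build_packed() -> Packed {
    let mut uninit = MaybeUninit::<Packed>::uninit();
    let p = uninit.as_mut_ptr();
    unsafe {
        // `addr_of_mut!` yields raw field pointers without creating references;
        // `write_unaligned` makes no alignment assumption about the target.
        ptr::addr_of_mut!((*p).tag).write_unaligned(7);
        ptr::addr_of_mut!((*p).value).write_unaligned(0xDEAD_BEEF);
        // Every field has been written, so the value is fully initialized.
        uninit.assume_init()
    }
}
```

Reading the fields back should also go through a by-value copy (let v = packed.value;), since taking a reference to a misaligned field is not allowed.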

@HeroicKatora

HeroicKatora commented Jan 13, 2019

@nicoburns

which suggested that immediately casting a reference to a pointer (&mut foo as *mut T) actually compiles to just creating a pointer.

What it compiles to is distinct from the semantics the compiler is allowed to use to perform this compilation. Even if it is a no-op in IR, it can still have a semantic effect such as asserting additional assumptions to the compiler. @scottjmaddox is correct in which operations are at play here but the critical part of the question is the creation of the mutable reference which happens before and independently of the ref-to-ptr coercion. Then @mjbshaw is technically correct about the general safety requiring ptr::write_unaligned when the argument is an unknown generic argument.

@jethrogb

Contributor

jethrogb commented Jan 13, 2019

I don't remember where I read this (nomicon? One of @RalfJung's blog posts?) but I'm fairly certain that field access via raw pointer dereference, reference, and immediate conversion of the reference into a pointer (either via coercion or casting) is special cased.

@RalfJung

Member

RalfJung commented Jan 13, 2019

In what way does this variant of writing to uninitialized fields exhibit less undefined behaviour than get_mut()? At the code point where the first argument to ptr::write is evaluated, the code has created a &mut _ to the inner field which should be just as undefined as the reference to the whole struct that would otherwise be created. Should the compiler not be allowed to assume this to be in initialized state already then?

Very good question! These concerns are one reason why I opened rust-lang/rfcs#2582. With that RFC accepted, the code I showed does not create an &mut, it creates a *mut.
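
For readers arriving later: the raw-reference idea from that RFC was subsequently exposed in the standard library as ptr::addr_of_mut!. A sketch of field-by-field initialization in that style follows, assuming the stabilized MaybeUninit::uninit / assume_init names; build_foo is an illustrative helper, not something from this thread.

```rust
use std::mem::MaybeUninit;
use std::ptr;

struct Foo {
    bar: bool,
    baz: String,
}

fn build_foo() -> Foo {
    let mut uninit = MaybeUninit::<Foo>::uninit();
    let p = uninit.as_mut_ptr();
    unsafe {
        // `addr_of_mut!` produces a `*mut` to the field directly, with no
        // intermediate `&mut` to uninitialized memory.
        // `write` does not drop the old (uninitialized) contents.
        ptr::addr_of_mut!((*p).bar).write(true);
        ptr::addr_of_mut!((*p).baz).write(String::from("hello world"));
        // Both fields are now initialized.
        uninit.assume_init()
    }
}
```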

@scottjmaddox

scottjmaddox commented Jan 14, 2019

@mjbshaw Touché. Yes, I suppose you're right about the possibility of the struct being packed, and therefore needing ptr::write_unaligned. I had not considered that before, primarily because I've yet to use packed structures in Rust. This should probably be a clippy lint, if it's not already.

Edit: I didn't see a relevant clippy lint, so I submitted an issue: rust-lang/rust-clippy#3659
