New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Direct and Partial Initialization using &uninit T #2534

Open
wants to merge 14 commits into
base: master
from

Conversation

Projects
None yet
@KrishnaSannasi
Copy link

KrishnaSannasi commented Aug 29, 2018

Use &uninit T to initialize your memory directly!

This can have performance improvements and makes dealing with uninitialized memory completely safe.

Rendered


this RFC has been edited This RFC used to also have &out, but I removed them due to insufficient motivation, and a relatively easy workaround to &out.

@KrishnaSannasi

This comment has been minimized.

Copy link

KrishnaSannasi commented Aug 29, 2018

Earlier conversation can be found in the Pre-RFC

@Centril Centril added the T-lang label Aug 29, 2018

@Centril Centril changed the title Initial Commit for Write Pointer RFC: Partial Initialization and Write Pointers using &out T and &uninit T Aug 29, 2018

@newpavlov

This comment has been minimized.

Copy link

newpavlov commented Aug 29, 2018

I think it's better to write a separate RFC for &out T and use this one only for &uninit T.

EDIT: &mut T -> &out T

@KrishnaSannasi KrishnaSannasi changed the title RFC: Partial Initialization and Write Pointers using &out T and &uninit T RFC: Write Pointers for Direct and Partial Initialization using &out T and &uninit T Aug 29, 2018

@KrishnaSannasi

This comment has been minimized.

Copy link

KrishnaSannasi commented Aug 29, 2018

Did you mean &out T? I would like to keep &out T and &uninit T together under the banner of write pointers because they overlap in many ways, and have only a few notable differences. Therefore, I think it is better to talk about them in one RFC.

@KrishnaSannasi KrishnaSannasi changed the title RFC: Write Pointers for Direct and Partial Initialization using &out T and &uninit T RFC: Write References for Direct and Partial Initialization using &out T and &uninit T Aug 29, 2018

@cramertj

This comment has been minimized.

Copy link
Member

cramertj commented Aug 29, 2018

Can you add some discussion of the advantages of &uninit and &out over &mut MaybeUninit<T> (which provides a safe set method)? (WIP PR for MaybeUninit)

@KrishnaSannasi

This comment has been minimized.

Copy link

KrishnaSannasi commented Aug 30, 2018

@cramertj With a pointer you could write to fields of fields in a write only way, for example:

This example is broken, see my comment below for the udpated version

#[derive(Clone, Copy)]
pub struct Outer {
    pub public_field: u32
    private_field: u32
}

impl Outer {
    pub fn new(private_field: u32) -> Self {
        Outer { public_field: 0, private_field }
    }
}

// The code below lives in a different crate

// relevant imports here

let outer = Outer::new(20);

foo(&out outer);

let outer = MaybeUninit::new(Outer::new(30));

bar(&mut outer);

fn foo(outer: &out Outer) {
    // println!("{}", outer.should_only_write); // Compile time error

    outer.should_only_write = 20; // Fine, because we only write
}

fn bar(outer: &mut MaybeUninit<Outer>) {
    outer.set(
        // what to put in here?
    )
}

Note bar cannot be written without unsafe code and knowning the layout of Outer, which is not stable.

This example had a few problems, see my below for the updated version.

@KrishnaSannasi

This comment has been minimized.

Copy link

KrishnaSannasi commented Aug 30, 2018

Also, &out T and &uninit T are unsafe free, whereas MaybeUninit fundamentally cannot be unsafe free. At least without using more space, (through an enum instead of a union, but that defeats the purpose).
edit: after looking through the proposal for MaybeUninit, you can still read the values inside, but only within a unsafe, and writing is safe.

@KrishnaSannasi

This comment has been minimized.

Copy link

KrishnaSannasi commented Aug 30, 2018

&uninit T can be safely read after it is written to. MaybeUninit cannot handle this case at all without unsafe. MaybeUninit<T> seems to be very similar to &out T, two different ways to do the same thing.

@cramertj

This comment has been minimized.

Copy link
Member

cramertj commented Aug 30, 2018

@KrishnaSannasi Reading values from a MaybeUninit is safe after setting it, since the set method (or another method with a similar name) returns an &mut reference to the inner value. SImilarly, since MaybeUninit is repr(transparent), it can be used to safely refer to uninitialized elements inside of an array.

@KrishnaSannasi

This comment has been minimized.

Copy link

KrishnaSannasi commented Aug 30, 2018

@cramertj How about values on the heap? (Such as in a Vec<T>, if we want to implement something like emplace_back)
I'm not too familiar with #[repr(transparent)], so I don't exactly know how it works. Thanks for your help.

@cramertj

This comment has been minimized.

Copy link
Member

cramertj commented Aug 30, 2018

The process works no differently on the heap, but the specific API for placement-new inside of Vec<T> is more interesting because the compiler needs to know for sure that the value has been initialized before it can move on to other operations.

#[repr(transparent)] means "this type's runtime and ABI representation is exactly the same as that of it's only non-zero-sized field." e.g. #[repr(transparent)] struct Foo { bar: Bar } has exactly the same in-memory representation as Bar.

@KrishnaSannasi

This comment has been minimized.

Copy link

KrishnaSannasi commented Aug 30, 2018

A Vec<T> emplace function using &uninit T would look something like:

impl<T> Vec<T> {
    fn emplace_back(put: impl FnOnce(&uninit T)) {
        // ... allocate magic ...
        put(&uninit end);
    }
}

and we could have some-placement-new like sugar for ease of use, but that's not necessary.

@cramertj

This comment has been minimized.

Copy link
Member

cramertj commented Aug 30, 2018

@KrishnaSannasi Yeah, that works out nicely. A similar thing using MaybeUninit could look like this:

pub struct Uninit<'a, T, Tag> {
    location: &'a mut MaybeUninit<T>,
    marker: PhantomData<Tag>,
}

pub struct UninitFilled<Tag>(PhantomData<Tag>);

impl<'a, T, Tag> Uninit<'a, T, Tag> {
    pub fn new(location: &'a mut MaybeUninit<T>) -> Self {
        ...
    }

    #[inline(always)]
    pub fn fill(self, value: T) -> UninitFilled<UniqToken> {
        self.location.set(value);
        UninitFilled(PhantomData)
    }
}

impl<T> Vec<T> {
    pub fn emplace_back(put: impl for<Tag> FnOnce(Uninit<'_, T, Tag>) -> UninitFilled<Tag>) {
        ...
    }
}

The Place and PlaceFilled definitions could exist in the standard library. I personally like this solution because it avoids introducing additional language constructs (new reference types for the compiler and users to understand), but it does have the unfortunate downside of requiring type-level HRTB (for<Tag> ...) in order to prevent users manufacturing their own Place with Place::new and making a PlaceFilled that they then hand back directly rather than actually calling set.

@KrishnaSannasi

This comment has been minimized.

Copy link

KrishnaSannasi commented Aug 30, 2018

@cramertj Cool, why do we need Tag?

Also, how about this:
You have a public and private field(s), the public field should only be written to in the relevant functions, otherwise bad things happen (like segfault).

#[derive(Clone, Copy)]
pub struct Outer {
    pub public_field: u32
    private_field: u32
}

impl Outer {
    pub fn new(private_field: u32) -> Self {
        Outer {
            public_field: 0,
            private_field
        }
    }
}

#[repr(transparent)]
struct MaybeUninitWrite<'a, T: 'a>(&mut MaybeUninit<T>);

impl<'a, T: 'a> MaybeUninitWrite<'a, T> {
    pub fn new(inner: &'a mut MaybeUninit) -> Self {
        MaybeUninitWrite(inner)
    }

    pub fn set(&mut self, value: T) {
        (self.0).set(value);
    }
}
// The code below lives in a different crate

// relevant imports here

let outer = Outer::new(20);

foo(&out outer);

let mut outer = MaybeUninit::new(Outer::new(30));
let outer = MaybeUninitWrite::new(&mut outer);

bar(&mut outer);

let mut outer = MaybeUninit::new(Outer::new(40));

unsafe { baz(&mut outer); }

// These functions should not be able to read any value in Outer. It could cause a segfault. 
// (obviously this code won't but we can imagine code very similiar to this
// that might actually segfault, using hw interfaces managed by the OS)

// This enforces that contract statically
fn foo(outer: &out Outer) {
    // println!("{}", outer.public_field); // Compile time error

    outer.public_field = 20; // Fine, because we only write
}

// This cannot do the same thing as foo
fn bar(outer: &mut MaybeUninitWrite<Outer>) {
    // not possible to change public_field on outer
}

// This cannot enforce a write-only contract,
// and might segfualt if outer holds an uninitialized value
unsafe fn baz(outer: &mut MaybeUninit<Outer>) {
    outer.get_mut().public_field = 20;
}

Doing something this simple, should not require unsafe code, or complex type usage.

added unresolved questions
should raw pointers, *out T be added
should a new trait, OverwriteSafe, be used for the bounds of &out T
@KrishnaSannasi

This comment has been minimized.

Copy link

KrishnaSannasi commented Aug 31, 2018

Another idea for the bounds of &out T, would be an auto-implemented trait OverwriteSafe (name not important), that is implemented for all types, that T: !Drop and all fields implement T: OverwriteSafe. This would prevent nested Drop types from being inside structs or enum. Thereby allowing it to be safely overwritten.

The downside to this approach is that we add a new trait to the standard library, and adding a Drop implementation becomes a breaking change. This is probably undesirable. But, adding a Drop implementation later doesn't seem likely, so this might be fine.

@bjorn3

This comment has been minimized.

Copy link

bjorn3 commented Sep 5, 2018

[…] but it does have the unfortunate downside of requiring type-level HRTB (for ..) […]

Why not use a lifetime, instead of a type as tag?

@cramertj

This comment has been minimized.

Copy link
Member

cramertj commented Sep 5, 2018

@bjorn3 It's possible that an invariant lifetime would be sufficient (so that the user couldn't create a 'static version or another longer-lived version that could be used instead).

@YaLTeR

This comment has been minimized.

Copy link

YaLTeR commented Sep 6, 2018

In the examples with vector emplace_back() returning an &uninit T what happens if you do nothing and just mem::forget() it? Wouldn't the vector try to drop this uninitialized value in its destructor? The same applies to emplace_back() taking a closure.

@KrishnaSannasi

This comment has been minimized.

Copy link

KrishnaSannasi commented Sep 7, 2018

@YaLTeR You would be forgetting the pointer, not the underlying value. In the closure case, you would not be initializing the value, so it would break this rule and get a compiler error

  • Functions and closures that take a &uninit T argument must initialize it before returning

and you cannot return a &uninit T by that same rule's sub-rule and would get a compiler error

  • You cannot return an &uninit T

I put this rule in for the exact reason you pointed out. If we create an uninitialized value in a Vec, then try and drop the Vec, that would cause many problems, and I don't want to enforce a rule of &uninit T must be initialized if you get one from the return value of a function, although that is a viable alternative.

That way, only &uninit T that you create may be allowed to be left uninitialized. This would also allow functions to leave &uninit T uninitialized, if they return that &uninit T. But it seems more complicated than it's worth.

@cramertj

... invariant lifetime would be sufficient ...

How would you define an invariant lifetime, using references will always give a covariant lifetime, and there doesn't seem to be another way to constrain lifetimes, if I'm understanding the nomicon section of variance correctly.

@ExpHP

This comment has been minimized.

Copy link

ExpHP commented Sep 9, 2018

In the closure case, you would not be initializing the value, so it would break this rule and get a compiler error

This assumes the compiler can identify that mem::forget doesn't initialize it, without using global analysis.

...and this is certainly something it can do; any function with the signature fn foo<T>(x: T) couldn't possibly initialize an &uninit by argument from parametricity. But this should lead to a more general restriction; e.g. any generic parameter T can never resolve to &uninit _ as the outermost type constructor, or something like that.

@derekdreery

This comment has been minimized.

Copy link

derekdreery commented Sep 10, 2018

Would this help to solve the problem where only part of a struct has defaults? e.g.

struct Person {
    given_name: String, // no default
    family_name: String, // no default
    other_names: Vec<String>, // default = Vec::new()
    email: Option<String>, // default = None
    //...
}

There could be some function that returns an &uninit Person where the compiler checks that you initialize the remaining values.

Would this be possible?

@KrishnaSannasi

This comment has been minimized.

Copy link

KrishnaSannasi commented Sep 10, 2018

@ExpHP Yes, that sounds right.

@derekdreery That would require global knowledge of the entire program in the worst case, and that is undesirable, so no, this will not solve that problem. This is fine for the compiler to do, but the human programming would find this confusing, and hard to reason about. This also makes changing the implementation of a function a breaking change, which is also undesirable.

@KrishnaSannasi

This comment has been minimized.

Copy link

KrishnaSannasi commented Oct 16, 2018

Would changing the name of &out to &write be better? It would more clearly explain what is going on and it doesn't advertise using &out for output parameters, which I think is good.

@Centril Centril added the I-nominated label Oct 18, 2018

@nikomatsakis

This comment has been minimized.

Copy link
Contributor

nikomatsakis commented Oct 25, 2018

I definitely feel like there are real use cases here, but I am very wary of growing the number of reference types, especially in a sort of "narrow, targeted" fashion. I would rather that we address these patterns by adding some form of more fundamental capability: for example, being able to "transition" the permissions that you have on a reference (so that you can take an &mut and have it "degrade" to a shared borrow when you return etc). This might be used to allow you to "upgrade" from &uninit to &mut.

That said, I don't see such an extension as on the short path yet. I would rather see us pushing towards:

  • adopting NLL =)
  • adopting polonius =)
  • then maybe looking at self-referential structs, fns that only access a subset of fields or other problems
    • this may intersect here, it's worth keeping in mind

@Centril Centril removed the I-nominated label Oct 25, 2018

@KrishnaSannasi

This comment has been minimized.

Copy link

KrishnaSannasi commented Oct 27, 2018

@nikomatsakis Is this RFC going to be postponed or closed?

@SoniEx2

This comment has been minimized.

Copy link

SoniEx2 commented Oct 27, 2018

so, it turns out PITs would benefit both of us, for different reasons.

you want uninit references, I want opaque references, and PITs provide both in one! (and both need language support anyway)

out references are a little weird and low-level-ish tho and I'm not sure we really need them - can they be done with a safe wrapper around a pointer instead? (similar to Cell etc but without compiler support)

@KrishnaSannasi

This comment has been minimized.

Copy link

KrishnaSannasi commented Oct 27, 2018

@SoniEx2 I don't think you're proposal is worth the cost though. @nikomatsakis didn't outright say that it can't be done, only that he is wary of adding new reference types (which is also what you are doing), and that this proposal doesn't fit in with the near-term goals of Rust right now (which also applies to PITs).

@nikomatsakis

This comment has been minimized.

Copy link
Contributor

nikomatsakis commented Dec 20, 2018

@rfcbot fcp postpone

Per my previous comment, I am moving to postpone this RFC. My reasoning remains unchanged:

I definitely feel like there are real use cases here, but I am very wary of growing the number of reference types, especially in a sort of "narrow, targeted" fashion.

(Per that same comment, we are still working through the NLL and Polonius transition and will be for some time.)

@rfcbot

This comment has been minimized.

Copy link

rfcbot commented Dec 20, 2018

Team member @nikomatsakis has proposed to postpone this. The next step is review by the rest of the tagged teams:

No concerns currently listed.

Once a majority of reviewers approve (and none object), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

# Unresolved questions
[unresolved-questions]: #unresolved-questions

- What is the correct bounds for `T` in `&out T`? conservatively, `T: Copy` works and is sound. But as [@TechnoMancer](https://internals.rust-lang.org/u/TechnoMancer) pointed out, `T: !Drop` is not the correct bound. For example, `Wrapper(Vec<String>)`, clearly cannot be overwritten safely, because `Vec<String>`, must drop do deallocate the `Vec`, but `Wrapper` itself does not implement drop. Therefore either a new trait is needed (but unwanted), or we must keep the `Copy` bound.

This comment has been minimized.

@fstirlitz

fstirlitz Dec 31, 2018

Failing to call destructors is safe; better find another word. This is mostly a technicality, however; resource leaks are still something Rust would rather avoid, and I agree not accounting for this here at all would make them too easy.

But besides this, I've been long thinking that Drop needs an overhaul. I'd replace it with something which looks like this:

  • trait Drop { /* ... */ }: same interface as before
  • unsafe trait Destruct: Drop {}: marker trait, assures that values of a given type can be disposed of by merely calling Drop implementations of their constituent fields; the type has no disposal logic of its own and can be safely destructured (this corresponds to what is now informally expressed as !Drop)
  • unsafe auto trait Forget: Destruct {}: marker trait, assures that all Drop implementations of transitively constituent fields are no-ops; in other words, mem::drop on the type is completely equivalent to mem::forget.

Now, automatic deriving behaviour:

  • for any given type T, unless the containing module provides an explicit Drop implementation, the compiler automatically derives a no-op Drop and Destruct for T;
  • if all constituent fields are also Forget, the compiler derives Forget as well.

This system would remove a major motivation for introducing negative traits/bounds (which have been pointed out to be rather problematic, especially in the context of automatic impls for !), as it would create a way to formalise 'trivial disposal behaviour' in the type system without them; in this case, the correct bound would be Forget. It would also open the doors to adding true linear typing to Rust: those could simply explicitly opt-out of implementing Drop. In other words, lacking an impl Drop would no longer mean 'trivial disposal logic', but 'no automatic disposal at all'.

However, I'm not sure how backwards compatible this would be. Obviously, introducing true linear typing would incur some major changes.

This comment has been minimized.

@KrishnaSannasi

KrishnaSannasi Dec 31, 2018

The Drop overhaul sounds nice on paper, I would still still want some form of negative trait impls, because there are some APIs to need this to be work at all. But that is beside the point. The Drop overhaul doesn't fit in with this RFC, and you could post it to internals if you want more feedback.

I did note that failing to call destructors is safe as the very next sub bullet. This is an unresolved question because I wasn't sure if we wanted to allow leaks this way.

@graydon

This comment has been minimized.

Copy link

graydon commented Jan 12, 2019

Strong oppose. Cognitive load and risk of writing wrong code is high, motivation is marginal.

@Centril

This comment has been minimized.

Copy link
Contributor

Centril commented Jan 12, 2019

Cognitive load and risk of writing wrong code is high, motivation is marginal.

@graydon Could you perhaps elaborate with examples of where it could go wrong and examples of the risks?

@KrishnaSannasi

This comment has been minimized.

Copy link

KrishnaSannasi commented Jan 12, 2019

@graydon I'm currently in the middle of stripping down this proposal. No more &out as it is completely unnecessary (it can be written as a library item based off of &mut). I am also thinking about how to rework uninit.

I can see how it can be a large cognitive load, but dealing with uninitialized memory is hard and so the rules around &uninit are there to make sure nothing can go wrong. The compiler will be the one to check your code so you shouldn't have to think too hard about it. It's like the orphan rules, they can be somewhat complicated, but most of the time you don't need to think about it.


&out as a library item, for those who are curious

struct Out<'a, T> { ptr: &'a mut T }

impl<'a, T> Out<'a, T> {
    // to set without dropping old value
    fn set(&mut self, value: T) {  unsafe { std::ptr::write(self.ptr, value); } }
    fn reborrow<'b>(out: &mut Self) where 'a: 'b { Out { ptr: out.ptr } }
}

impl<'a, T> From<&'a mut T> for Out<'a, T> {
    fn from(ptr: &'a mut T) -> Self {
        Self { ptr }
    }
}

edit: I know this isn't exactly what I put out for &out, but it does cover most of the use case, so I am fine with it.

@KrishnaSannasi KrishnaSannasi changed the title RFC: Write References for Direct and Partial Initialization using &out T and &uninit T RFC: Direct and Partial Initialization using &uninit T Jan 12, 2019

@graydon

This comment has been minimized.

Copy link

graydon commented Jan 12, 2019

Cognitive load and risk of writing wrong code is high, motivation is marginal.

@graydon Could you perhaps elaborate with examples of where it could go wrong and examples of the risks?

Users obtaining pointers to uninitialized memory casually because "it's faster", which they then convert to a raw pointer. Presently casually taking a ref to an uninit variable gets you an error; you have to work somewhat to get yourself some uninit memory. We worked hard to paint uninitialized memory into a very small corner in this language you have to do some ceremony to get at. Now it'll be a chapter heading in a book that's very easy to see, learn, get interested in, tinker with, use in your own libraries.

Additional unclear risk: drop semantics. Drops execute top-down, so this has to not work on drop types. The proposal isn't entirely clear if it's going to try to allow that.

Cognitive load I think is obvious: it's a whole new qualifier with a complex set of rules that people will naturally sprinkle all over codebases, especially if it's associated with speed. We worked super hard to minimize the number of qualifiers. It looks like it can all be done with a library type, so it should be.

It's also a new state of initialized-ness for users to think about, for every memory cell. Currently the user's mental model does not include (unless they're off in unsafe land) the initialized-ness of the substructure of a struct. Now it will.

@SoniEx2

This comment has been minimized.

Copy link

SoniEx2 commented Jan 12, 2019

PITs are supposed to put stricter requirements on this thing. a pointer to a PIT is valid, but is a pointer to a PIT, not to a full object, because PITs encode initialization state in the type. you can still access uninitialized or invalid memory by going through usize but we have that issue with many existing types anyway.

@KrishnaSannasi

This comment has been minimized.

Copy link

KrishnaSannasi commented Jan 12, 2019

@graydon

Users obtaining pointers to uninitialized memory casually because "it's faster", which they then convert to a raw pointer. Presently casually taking a ref to an uninit variable gets you an error; you have to work somewhat to get yourself some uninit memory. We worked hard to paint uninitialized memory into a very small corner in this language you have to do some ceremony to get at. Now it'll be a chapter heading in a book that's very easy to see, learn, get interested in, tinker with, use in your own libraries.

I am not convinced by this argument, to do anything with with raw pointers you need unsafe, and if you are already using unsafe it is really easy to get a raw pointer to uninit memory.

let x: String = unsafe { std::mem::unintialized() };

let y: *mut _ = &mut x; // raw pointer to uninit memory

But now that you have a raw pointer to uninit memory you need to do pretty much all the same work that I have laid out to make sure that you don't do anything unsound with it. This is cognitive load that could be shifted onto the compiler to have it validate our program. Which is what this proposal is trying to do

Additional unclear risk: drop semantics. Drops execute top-down, so this has to not work on drop types. The proposal isn't entirely clear if it's going to try to allow that.

Currently this will be allowed, because it logically identical to storing the data in a bunch of temporaries, then constructing the Drop type. Once the final field is initialized, the Drop type is initialized and will be dropped at the end of the scope.

Cognitive load I think is obvious: it's a whole new qualifier with a complex set of rules that people will naturally sprinkle all over codebases, especially if it's associated with speed. We worked super hard to minimize the number of qualifiers. It looks like it can all be done with a library type, so it should be.
It's also a new state of initialized-ness for users to think about, for every memory cell. Currently the user's mental model does not include (unless they're off in unsafe land) the initialized-ness of the substructure of a struct. Now it will.

I agree that this is a large cognitive load, but I would like to make dealing with uninitialized code safe. Also dealing with uninitialized code is entirely local, with absolutely no global analysis, so it shouldn't be too bad.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment