generator fields are not necessarily initialized #56100

Merged
merged 3 commits into from Nov 25, 2018

7 participants
@RalfJung
Member

RalfJung commented Nov 20, 2018

Looking at the MIR we generate for generators, I think we deliberately leave fields of the generator uninitialized in ways that would be illegal if this was a normal struct (or rather, one would have to use MaybeUninit). Consider [this example](https://play.rust-lang.org/?version=nightly&mode=debug&edition=2018&gist=417b4a2950421b726dd7b307e9ee3bec):

#![feature(generators, generator_trait)]

fn main() {
    let generator = || {
        let mut x = Box::new(5);
        {
            let y = &mut *x;
            *y = 5;
            yield *y;
            *y = 10;
        }
        *x
    };
    let _gen = generator;
}

It generates the MIR

fn main() -> (){
    let mut _0: ();                      // return place
    scope 1 {
        scope 3 {
        }
        scope 4 {
            let _2: [generator@src/main.rs:4:21: 13:6 for<'r> {std::boxed::Box<i32>, i32, &'r mut i32, ()}]; // "_gen" in scope 4 at src/main.rs:14:9: 14:13
        }
    }
    scope 2 {
        let _1: [generator@src/main.rs:4:21: 13:6 for<'r> {std::boxed::Box<i32>, i32, &'r mut i32, ()}]; // "generator" in scope 2 at src/main.rs:4:9: 4:18
    }

    bb0: {                              
        StorageLive(_1);                 // bb0[0]: scope 0 at src/main.rs:4:9: 4:18
        (_1.0: u32) = const 0u32;        // bb0[1]: scope 0 at src/main.rs:4:21: 13:6
                                         // ty::Const
                                         // + ty: u32
                                         // + val: Scalar(Bits { size: 4, bits: 0 })
                                         // mir::Constant
                                         // + span: src/main.rs:4:21: 13:6
                                         // + ty: u32
                                         // + literal: Const { ty: u32, val: Scalar(Bits { size: 4, bits: 0 }) }
        StorageLive(_2);                 // bb0[2]: scope 1 at src/main.rs:14:9: 14:13
        _2 = move _1;                    // bb0[3]: scope 1 at src/main.rs:14:16: 14:25
        drop(_2) -> bb1;                 // bb0[4]: scope 1 at src/main.rs:15:1: 15:2
    }

    bb1: {                              
        StorageDead(_2);                 // bb1[0]: scope 1 at src/main.rs:15:1: 15:2
        StorageDead(_1);                 // bb1[1]: scope 0 at src/main.rs:15:1: 15:2
        return;                          // bb1[2]: scope 0 at src/main.rs:15:2: 15:2
    }
}

Notice how we only initialize the first field of _1 (even though it contains a Box!), and then assign it to _2. This violates the rule "on assignment, all data must satisfy the validity invariant", and hence miri complains about this code.

What this PR effectively does is to change the validity invariant for generators such that it says nothing about the fields of the generator. We behave as if every field of the generator was wrapped in a MaybeUninit.
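To make the "as if every field was wrapped in a MaybeUninit" reading concrete, here is a minimal sketch in stable Rust. `GeneratorModel`, its field names, and the two helper functions are hypothetical stand-ins, not what rustc actually generates; the sketch only shows that a value whose discriminant alone is initialized can be moved legally once the other fields are `MaybeUninit`.

```rust
use std::mem::MaybeUninit;

// Hypothetical hand-written stand-in for the generator state. The real layout
// is produced by the compiler and is not a user-visible struct.
#[allow(dead_code)]
struct GeneratorModel {
    // Plays the role of field 0 in the MIR above: the state discriminant,
    // which is always initialized.
    state: u32,
    // Saved locals are only meaningful in some states, so they are modelled
    // as `MaybeUninit` and carry no validity requirement of their own.
    boxed: MaybeUninit<Box<i32>>,
    scratch: MaybeUninit<i32>,
}

// Mirrors `(_1.0: u32) = const 0u32`: only the discriminant is written.
fn new_generator() -> GeneratorModel {
    GeneratorModel {
        state: 0,
        boxed: MaybeUninit::uninit(),
        scratch: MaybeUninit::uninit(),
    }
}

// Mirrors `_2 = move _1`: moving the value is fine because `MaybeUninit`
// fields are allowed to hold uninitialized bytes.
fn move_generator(g: GeneratorModel) -> GeneratorModel {
    g
}
```

Dropping a `GeneratorModel` never touches the `Box`, since `MaybeUninit` does not run destructors; a real generator's drop glue instead consults the state to decide which locals are live.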

r? @oli-obk

Cc @nikomatsakis @eddyb @cramertj @withoutboats @Zoxc

@@ -142,6 +142,7 @@ macro_rules! make_value_visitor {
self.walk_value(v)
}
/// Visit the given value as a union. No automatic recursion can happen here.
/// Also called for the fields of a generator, which may or may not be initialized.

@oli-obk

oli-obk Nov 20, 2018

Contributor

I don't see this happening in the code below.

@RalfJung

RalfJung Nov 20, 2018

Member

Oh yeah I went back on this because it doesn't work very well... I guess I could still do it and go through visit_field though.

@RalfJung

RalfJung Nov 20, 2018

Member

Actually no that doesn't work, it doesn't have a union type. I don't think there is a way to visit the other generator fields at all with the current interface, and it doesn't seem worth extending the interface?

@oli-obk

oli-obk Nov 20, 2018

Contributor

Oh yea, that's totally fine, as long as the comments mirror reality ;)

Well, as long as validation doesn't get hiccups elsewhere because https://github.com/solson/miri/blob/adfede5cec2c8a136830f7fc309dbb45ac7a098a/src/helpers.rs#L221 wasn't visited in miri.

@RalfJung

RalfJung Nov 20, 2018

Member

Hm, that is a good point. I forgot that I was using visit_union there.

This is relevant when determining where there are UnsafeCell inside a generator. If there is no UnsafeCell, shared references enforce memory to be frozen. So we probably should go conservatively type-based here like we do for unions... dang.

Just calling visit_union after doing the field projections would actually work, but it would violate the protocol that lets a visitor keep track of which "path" inside the data structure we are at. The only visitor relying on the path is validation, which doesn't do anything for unions, so this is fine in principle... but it's not nice. Any ideas?

@RalfJung

RalfJung Nov 20, 2018

Member

I added a new visit_generator_field hook for this. Now at least it makes sense, and likely nobody will ever overwrite that hook...
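For readers unfamiliar with the visitor design being discussed, the idea of a dedicated hook can be sketched roughly as follows. This is a heavily simplified, hypothetical model: `Value`, `Visitor`, `CountingVisitor`, and `count_visits` are invented for illustration and do not match the real interface in rustc. It only shows the shape of the idea: the walker visits the discriminant normally, and hands every other generator field to a hook that does nothing by default.

```rust
// Hypothetical toy value tree; not the interpreter's real value representation.
enum Value {
    Int(i64),
    Struct(Vec<Value>),
    // `state` is always initialized; each saved field may or may not be.
    Generator { state: u32, fields: Vec<Option<Value>> },
}

trait Visitor {
    // Called for every initialized integer leaf.
    fn visit_int(&mut self, _v: i64) {}

    // Hook for a generator field. The default does nothing and never looks
    // at the field's contents, because they may be uninitialized.
    fn visit_generator_field(&mut self, _index: usize) {}

    // Recursive walk over the value tree.
    fn walk(&mut self, v: &Value) {
        match v {
            Value::Int(i) => self.visit_int(*i),
            Value::Struct(fields) => {
                for f in fields {
                    self.walk(f);
                }
            }
            Value::Generator { state, fields } => {
                // The discriminant is safe to visit like any other integer.
                self.visit_int(i64::from(*state));
                // Everything else goes through the hook, without recursion.
                for index in 0..fields.len() {
                    self.visit_generator_field(index);
                }
            }
        }
    }
}

// Example visitor that counts visited integers and skipped generator fields.
struct CountingVisitor {
    ints: usize,
    skipped: usize,
}

impl Visitor for CountingVisitor {
    fn visit_int(&mut self, _v: i64) {
        self.ints += 1;
    }
    fn visit_generator_field(&mut self, _index: usize) {
        self.skipped += 1;
    }
}

fn count_visits(v: &Value) -> (usize, usize) {
    let mut c = CountingVisitor { ints: 0, skipped: 0 };
    c.walk(v);
    (c.ints, c.skipped)
}
```

A visitor that needs to be conservative about generator contents (for example, one looking for `UnsafeCell` by type) would override the hook; everything else can ignore it.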

// (which is the state) are actually implicitly `MaybeUninit`, i.e.,
// they may or may not be initialized, so we cannot visit them.
match v.layout().ty.sty {
ty::Generator(..) => {

@oli-obk

oli-obk Nov 20, 2018

Contributor

Niche code also has an exception for generator fields

rust/src/librustc/ty/layout.rs

Lines 1812 to 1817 in 7a0cef7

// Locals variables which live across yields are stored
// in the generator type as fields. These may be uninitialized
// so we don't look for niches there.
if let ty::Generator(..) = layout.ty.sty {
    return Ok(None);
}

Would it make sense to try to simplify all downstream code for generators by wrapping all its fields with MaybeUninit very early?
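The link between "may be uninitialized" and "no niches" can be observed on stable Rust with ordinary types. The sketch below is only an analogy (the helper function names are made up, and nothing here touches generators): `Option<Box<i32>>` reuses the null pointer as its `None` value, whereas wrapping the `Box` in `MaybeUninit` hides that niche, so on current compilers the `Option` needs extra space for a tag.

```rust
use std::mem::{size_of, MaybeUninit};

// `Box<i32>` is never null, so `Option` can use the null pointer as `None`.
fn option_box_size() -> usize {
    size_of::<Option<Box<i32>>>()
}

// `MaybeUninit<Box<i32>>` may hold any bytes, including all zeroes, so no bit
// pattern is free for `None` and the `Option` needs a separate tag.
fn option_uninit_box_size() -> usize {
    size_of::<Option<MaybeUninit<Box<i32>>>>()
}
```

Comparing both results against `size_of::<Box<i32>>()` shows the first is equal and the second is larger, which is the same reasoning the layout code applies to generator fields.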

@RalfJung

RalfJung Nov 20, 2018

Member

Interesting, yes that would be the same exception.

I am not sure how complicated it would be for generators to do this wrapping.

@eddyb

eddyb Nov 20, 2018

Member

IMO generators should be treated like a union with field offsets.
Unless we want to generate "variants" for the states involved, which would be a bit more work, but would provide a safe view into the state of the generator.

@RalfJung

RalfJung Nov 20, 2018

Member

They don't have Union layout though, so right now they need special treatment everywhere.

@oli-obk

oli-obk Nov 20, 2018

Contributor

> Unless we want to generate "variants" for the states involved, which would be a bit more work, but would provide a safe view into the state of the generator.

I was considering that, but I don't know if that actually works in a non-scary way, as you'll want to switch from one variant to another without copying everything.

> IMO generators should be treated like a union with field offsets.

But why the entire generator? The discriminant field is perfectly safe to read and we could even do value range restrictions on it to be able to use niche optimizations on generators.

@RalfJung

RalfJung Nov 20, 2018

Member

> The discriminant field is perfectly safe to read and we could even do value range restrictions on it to be able to use niche optimizations on generators.

In fact that would make perfect sense, it encodes the state after all and hence has a limited value range.

@RalfJung

Member

RalfJung commented Nov 20, 2018

@cramertj

Member

cramertj commented Nov 20, 2018

I'm assuming this is going to be another size pessimization for generators. :( sighs and looks longingly at #52924

@RalfJung

Member

RalfJung commented Nov 20, 2018

@cramertj I don't follow. This PR doesn't change generator layout at all, and layout computation already pretty much treats them as MaybeUninit because all your code would miscompile if it didn't.^^

@cramertj

Member

cramertj commented Nov 20, 2018

@RalfJung Ah I missed the comment above saying that we already ignored niches in the layout optimizations. You could imagine initializing the object such that it had a bit-valid repr on creation to prevent UB, but we don't do that, so... :)

@eddyb

Member

eddyb commented Nov 20, 2018

@cramertj Since it's like a tagged enum, IMO we should use the tag ("current state") as a niche, by giving it a validity range based on the number of states.
We can probably even give it a full tagged enum layout, with "variant" layouts.
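What "using the tag as a niche" would buy can be sketched with a hand-written model. Everything below is hypothetical (`State`, its variant names, `GeneratorModel`, and `model_sizes` are invented for illustration, and generators did not have this layout at the time of this PR): the tag is an enum with a limited value range, the saved data stays `MaybeUninit`, and on current compilers `Option` of the whole thing fits in the same space by storing `None` in an unused tag value. This is observed compiler behavior rather than a language guarantee.

```rust
use std::mem::{size_of, MaybeUninit};

// Hypothetical state tag with a restricted range of valid values (0..=2).
#[allow(dead_code)]
#[repr(u8)]
enum State {
    Unresumed,
    Suspended,
    Returned,
}

// Hypothetical generator model: a tag with a niche plus opaque saved data.
#[allow(dead_code)]
struct GeneratorModel {
    state: State,
    saved: MaybeUninit<u64>,
}

// Returns the size of the model and of `Option` of the model, in bytes.
fn model_sizes() -> (usize, usize) {
    (
        size_of::<GeneratorModel>(),
        size_of::<Option<GeneratorModel>>(),
    )
}
```

If the two sizes agree, the `Option` was laid out using the tag's spare values alone, without inspecting the possibly-uninitialized `saved` field.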

@RalfJung

Member

RalfJung commented Nov 20, 2018

That might not even be possible for some types (uninhabited types, for example), and would certainly be "fun" for references (which have to be dereferenceable).^^

But also... why? MaybeUninit actually expresses the reality quite well here; the local variables of this generator are not initialized yet, after all. More abstractly, the "body" of a generator really is just some kind of arena used as the backing store in lieu of a proper stack frame. We don't do layout optimizations on stack frames either.^^ (I hope this doesn't give @eddyb ideas...)

@cramertj

Member

cramertj commented Nov 20, 2018

@RalfJung

> We don't do layout optimizations on stack frames either

I mean, this doesn't seem unreasonable to me?

@RalfJung

Member

RalfJung commented Nov 20, 2018

TBH I don't even see how it helps, let alone how it ever amortizes the cost of having to set the right bit pattern on initialization.^^

But, anyway, if the state tag gets a niche then Option<Future> will get layout optimized. I cannot think of a way how anything else would even be possible. And certainly all of this is entirely off-topic in this PR, which is about figuring out what the current invariant and layout of generators is, not about improving it. ;)

@withoutboats

Contributor

withoutboats commented Nov 20, 2018

I think the most valuable size optimization would be that the discriminants of all the generators in a stack of generators get unified into a single discriminant value. I doubt using the niches of fields is that important.

@RalfJung

Member

RalfJung commented Nov 22, 2018

Coming back to the topic of this PR... it seems everyone agrees that currently, the fields of a Generator are de-facto MaybeUninit, and hence the miri visitor should treat them as such? So, can we proceed and land this?

@oli-obk

Contributor

oli-obk commented Nov 22, 2018

@bors r+

Yes, this PR represents the current state of how the compiler views generators, and I think this code will break if we try to change that representation, so we'll notice.

@bors

Contributor

bors commented Nov 22, 2018

📌 Commit 6befe67 has been approved by oli-obk

pietroalbini added a commit to pietroalbini/rust that referenced this pull request Nov 25, 2018

Rollup merge of rust-lang#56100 - RalfJung:visiting-generators, r=oli-obk

bors added a commit that referenced this pull request Nov 25, 2018

Auto merge of #56215 - pietroalbini:rollup, r=pietroalbini
Rollup of 14 pull requests

Successful merges:

 - #56024 (Don't auto-inline const functions)
 - #56045 (Check arg/ret sizedness at ExprKind::Path)
 - #56072 (Stabilize macro_literal_matcher)
 - #56075 (Encode a custom "producers" section in wasm files)
 - #56100 (generator fields are not necessarily initialized)
 - #56101 (Incorporate `dyn` into more comments and docs.)
 - #56144 (Fix BTreeSet and BTreeMap gdb pretty-printers)
 - #56151 (Move a flaky process test out of libstd)
 - #56170 (Fix self profiler ICE on Windows)
 - #56176 (Panic setup msg)
 - #56204 (Suggest correct enum variant on typo)
 - #56207 (Stabilize the int_to_from_bytes feature)
 - #56210 (read_c_str should call the AllocationExtra hooks)
 - #56211 ([master] Forward-ports from beta)

Failed merges:

r? @ghost

@bors bors merged commit 6befe67 into rust-lang:master Nov 25, 2018

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed

@RalfJung RalfJung deleted the RalfJung:visiting-generators branch Nov 30, 2018
