generator fields are not necessarily initialized #56100

Merged
merged 3 commits into from Nov 25, 2018

RalfJung
Member

Looking at the MIR we generate for generators, I think we deliberately leave fields of the generator uninitialized in ways that would be illegal if this was a normal struct (or rather, one would have to use MaybeUninit). Consider this example (https://play.rust-lang.org/?version=nightly&mode=debug&edition=2018&gist=417b4a2950421b726dd7b307e9ee3bec):

#![feature(generators, generator_trait)]

fn main() {
    let generator = || {
        let mut x = Box::new(5);
        {
            let y = &mut *x;
            *y = 5;
            yield *y;
            *y = 10;
        }
        *x
    };
    let _gen = generator;
}

It generates the MIR

fn main() -> (){
    let mut _0: ();                      // return place
    scope 1 {
        scope 3 {
        }
        scope 4 {
            let _2: [generator@src/main.rs:4:21: 13:6 for<'r> {std::boxed::Box<i32>, i32, &'r mut i32, ()}]; // "_gen" in scope 4 at src/main.rs:14:9: 14:13
        }
    }
    scope 2 {
        let _1: [generator@src/main.rs:4:21: 13:6 for<'r> {std::boxed::Box<i32>, i32, &'r mut i32, ()}]; // "generator" in scope 2 at src/main.rs:4:9: 4:18
    }

    bb0: {                              
        StorageLive(_1);                 // bb0[0]: scope 0 at src/main.rs:4:9: 4:18
        (_1.0: u32) = const 0u32;        // bb0[1]: scope 0 at src/main.rs:4:21: 13:6
                                         // ty::Const
                                         // + ty: u32
                                         // + val: Scalar(Bits { size: 4, bits: 0 })
                                         // mir::Constant
                                         // + span: src/main.rs:4:21: 13:6
                                         // + ty: u32
                                         // + literal: Const { ty: u32, val: Scalar(Bits { size: 4, bits: 0 }) }
        StorageLive(_2);                 // bb0[2]: scope 1 at src/main.rs:14:9: 14:13
        _2 = move _1;                    // bb0[3]: scope 1 at src/main.rs:14:16: 14:25
        drop(_2) -> bb1;                 // bb0[4]: scope 1 at src/main.rs:15:1: 15:2
    }

    bb1: {                              
        StorageDead(_2);                 // bb1[0]: scope 1 at src/main.rs:15:1: 15:2
        StorageDead(_1);                 // bb1[1]: scope 0 at src/main.rs:15:1: 15:2
        return;                          // bb1[2]: scope 0 at src/main.rs:15:2: 15:2
    }
}

Notice how we only initialize the first field of _1 (even though it contains a Box!), and then assign it to _2. This violates the rule "on assignment, all data must satisfy the validity invariant", and hence miri complains about this code.

What this PR effectively does is to change the validity invariant for generators such that it says nothing about the fields of the generator. We behave as if every field of the generator was wrapped in a MaybeUninit.
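To make the "every field is wrapped in a MaybeUninit" reading concrete, here is a minimal hand-written sketch of the example generator as an ordinary struct. It is only an illustration: `GenModel`, its field names, and the `resume` signature are hypothetical and do not correspond to what the compiler actually emits. The point is that only the state tag is written at construction, and moving the value is still fine because `MaybeUninit` places no validity requirement on the saved local.

```rust
use std::mem::MaybeUninit;

// Hypothetical stand-in for the compiler-generated generator type.
struct GenModel {
    // Always initialized; plays the role of `_1.0` in the MIR above.
    state: u32,
    // Saved local `x`; only initialized once the body has run far enough.
    x: MaybeUninit<Box<i32>>,
}

impl GenModel {
    fn new() -> Self {
        // Mirrors the MIR: only the state field receives a value.
        GenModel { state: 0, x: MaybeUninit::uninit() }
    }

    // Returns the yielded value on the first call, the final value on the
    // second call, and `None` afterwards (a simplification of `Generator::resume`).
    fn resume(&mut self) -> Option<i32> {
        match self.state {
            0 => {
                self.x.write(Box::new(5));
                self.state = 1;
                // SAFETY: `x` was written on the line above.
                Some(unsafe { **self.x.assume_init_ref() })
            }
            1 => {
                // SAFETY: state 1 is only reachable after `x` was written in state 0.
                let x = unsafe { self.x.assume_init_mut() };
                **x = 10;
                let result = **x;
                // SAFETY: `x` is still initialized here and is not used again.
                unsafe { self.x.assume_init_drop() };
                self.state = 2;
                Some(result)
            }
            _ => None,
        }
    }
}

impl Drop for GenModel {
    fn drop(&mut self) {
        if self.state == 1 {
            // SAFETY: `x` is initialized exactly while the model is in state 1.
            unsafe { self.x.assume_init_drop() };
        }
    }
}

fn main() {
    let g = GenModel::new();
    // Moving is fine even though `x` holds no valid `Box` yet.
    let mut moved = g;
    assert_eq!(moved.resume(), Some(5));
    assert_eq!(moved.resume(), Some(10));
    assert_eq!(moved.resume(), None);
}
```

Had `x` been declared as a plain `Box<i32>`, there would be no way to construct the struct in state 0 without already having a valid box, which is exactly the situation the MIR above is in.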

r? @oli-obk

Cc @nikomatsakis @eddyb @cramertj @withoutboats @Zoxc

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Nov 20, 2018
@@ -142,6 +142,7 @@ macro_rules! make_value_visitor {
self.walk_value(v)
}
/// Visit the given value as a union. No automatic recursion can happen here.
/// Also called for the fields of a generator, which may or may not be initialized.
Contributor

I don't see this happening in the code below.

Member Author

@RalfJung RalfJung Nov 20, 2018

Oh yeah, I went back on this because it doesn't work very well... I guess I could still do it and go through visit_field though.

Member Author

Actually no that doesn't work, it doesn't have a union type. I don't think there is a way to visit the other generator fields at all with the current interface, and it doesn't seem worth extending the interface?

Contributor

Oh yea, that's totally fine, as long as the comments mirror reality ;)

Well, as long as validation doesn't get hiccups elsewhere because https://github.com/solson/miri/blob/adfede5cec2c8a136830f7fc309dbb45ac7a098a/src/helpers.rs#L221 wasn't visited in miri.

Member Author

Hm, that is a good point. I forgot that I was using visit_union there.

This is relevant when determining where there are UnsafeCell inside a generator. If there is no UnsafeCell, shared references enforce memory to be frozen. So we probably should go conservatively type-based here like we do for unions... dang.

Just calling visit_union after doing the field projections would actually work, but it would violate the protocol that lets a visitor keep track of which "path" inside the data structure we are at. The only visitor relying on the path is validation, which doesn't do anything for unions, so this is fine in principle... but it's not nice. Any ideas?

Member Author

I added a new visit_generator_field hook for this. Now at least it makes sense, and likely nobody will ever override that hook...
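As a rough illustration of what a dedicated hook buys, here is a toy value visitor. This is not the rustc `make_value_visitor!` interface: the `Value` enum, the trait, and the choice to have the default `visit_generator_field` forward to `visit_union` are all assumptions made for this sketch. It only shows the shape of the idea, namely that saved generator locals go through their own hook and are treated as opaque, possibly uninitialized data unless a visitor decides otherwise.

```rust
// Toy data model; NOT the interpreter's real value representation.
enum Value {
    Int(i64),
    Struct(Vec<Value>),
    // Unions are opaque bytes: no recursion into them is possible.
    Union(Vec<u8>),
    // A generator: an always-initialized state tag plus opaque saved locals.
    Generator { state: u32, saved: Vec<u8> },
}

trait ValueVisitor {
    fn visit_int(&mut self, _v: i64) {}

    fn visit_union(&mut self, _bytes: &[u8]) {}

    // Assumed default for this sketch: handle generator fields like a union.
    fn visit_generator_field(&mut self, bytes: &[u8]) {
        self.visit_union(bytes)
    }

    fn walk(&mut self, v: &Value) {
        match v {
            Value::Int(i) => self.visit_int(*i),
            Value::Struct(fields) => {
                for f in fields {
                    self.walk(f);
                }
            }
            Value::Union(bytes) => self.visit_union(bytes),
            Value::Generator { state, saved } => {
                // The state tag is ordinary initialized data.
                self.visit_int(i64::from(*state));
                // The saved locals may be uninitialized, so they get the hook.
                self.visit_generator_field(saved);
            }
        }
    }
}

// Example visitor: counts integers seen and bytes that had to be treated as opaque.
struct OpaqueByteCounter {
    ints: usize,
    opaque: usize,
}

impl ValueVisitor for OpaqueByteCounter {
    fn visit_int(&mut self, _v: i64) {
        self.ints += 1;
    }

    fn visit_union(&mut self, bytes: &[u8]) {
        self.opaque += bytes.len();
    }
}

fn count(v: &Value) -> (usize, usize) {
    let mut c = OpaqueByteCounter { ints: 0, opaque: 0 };
    c.walk(v);
    (c.ints, c.opaque)
}

fn main() {
    let v = Value::Struct(vec![
        Value::Int(1),
        Value::Generator { state: 0, saved: vec![0; 16] },
        Value::Union(vec![0; 4]),
    ]);
    let (ints, opaque) = count(&v);
    println!("ints = {ints}, opaque bytes = {opaque}");
}
```

A visitor that only overrides `visit_union`, like the counter here, picks up generator fields automatically, which matches the conservative, type-based treatment discussed above for locating `UnsafeCell`.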

// (which is the state) are actually implicitly `MaybeUninit`, i.e.,
// they may or may not be initialized, so we cannot visit them.
match v.layout().ty.sty {
ty::Generator(..) => {
Contributor

Niche code also has an exception for generator fields

rust/src/librustc/ty/layout.rs

Lines 1812 to 1817 in 7a0cef7

// Locals variables which live across yields are stored
// in the generator type as fields. These may be uninitialized
// so we don't look for niches there.
if let ty::Generator(..) = layout.ty.sty {
return Ok(None);
}

Would it make sense to try to simplify all downstream code for generators by wrapping all its fields with MaybeUninit very early?
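For readers unfamiliar with why possibly-uninitialized fields rule out niches, a small size comparison may help. It is a sketch that relies only on the documented layout guarantees for `Option<Box<T>>` and `MaybeUninit<T>`; the last assertion reflects how current compilers behave rather than a formal guarantee.

```rust
use std::mem::{size_of, MaybeUninit};

fn main() {
    // `Box` is never null, so `Option<Box<_>>` stores `None` in that niche.
    assert_eq!(size_of::<Option<Box<i32>>>(), size_of::<Box<i32>>());

    // `MaybeUninit<T>` has the same size as `T`...
    assert_eq!(size_of::<MaybeUninit<Box<i32>>>(), size_of::<Box<i32>>());

    // ...but any bit pattern is allowed inside it, so there is no niche left
    // and `Option` needs room for a separate tag.
    assert!(size_of::<Option<MaybeUninit<Box<i32>>>>() > size_of::<Box<i32>>());
}
```

The layout code quoted above makes the same decision for generator fields by hand, which is why wrapping them in `MaybeUninit` would express the same exception through the type.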

Member Author

Interesting, yes that would be the same exception.

I am not sure how complicated it would be for generators to do this wrapping.

Member

IMO generators should be treated like a union with field offsets.
Unless we want to generate "variants" for the states involved, which would be a bit more work, but would provide a safe view into the state of the generator.

Member Author

They don't have Union layout though, so right now they need special treatment everywhere.

Contributor

> Unless we want to generate "variants" for the states involved, which would be a bit more work, but would provide a safe view into the state of the generator.

I was considering that, but I don't know if that actually works in a non-scary way, as you'll want to switch from one variant to another without copying everything.

> IMO generators should be treated like a union with field offsets.

But why the entire generator? The discriminant field is perfectly safe to read and we could even do value range restrictions on it to be able to use niche optimizations on generators.

Member Author

> The discriminant field is perfectly safe to read and we could even do value range restrictions on it to be able to use niche optimizations on generators.

In fact that would make perfect sense, it encodes the state after all and hence has a limited value range.

@RalfJung
Member Author

Cc @Nemo157

@cramertj
Member

I'm assuming this is going to be another size pessimization for generators. :( sighs and looks longingly at #52924

@RalfJung
Member Author

@cramertj I don't follow. This PR doesn't change generator layout at all, and layout computation already pretty much treats them as MaybeUninit because all your code would miscompile if it didn't.^^

@cramertj
Member

@RalfJung Ah I missed the comment above saying that we already ignored niches in the layout optimizations. You could imagine initializing the object such that it had a bit-valid repr on creation to prevent UB, but we don't do that, so... :)

@eddyb
Member

eddyb commented Nov 20, 2018

@cramertj Since it's like a tagged enum, IMO we should use the tag ("current state") as a niche, by giving it a validity range based on the number of states.
We can probably even give it a full tagged enum layout, with "variant" layouts.
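The effect of giving the state tag a restricted validity range can be seen with an ordinary enum. The sketch below is an analogy only: `State` is a hypothetical three-state tag, not the generator's real discriminant (which the MIR above shows as a plain `u32`), and the sizes are what current rustc produces for the default representation.

```rust
use std::mem::size_of;

// Hypothetical state tag with a limited set of valid values.
#[allow(dead_code)]
enum State {
    Unresumed,
    Suspended0,
    Returned,
}

fn main() {
    // A plain integer tag has no invalid values, so `Option` needs extra space.
    assert_eq!(size_of::<u32>(), 4);
    assert_eq!(size_of::<Option<u32>>(), 8);

    // Only three of the tag's byte values are valid, so `None` can use a spare one.
    assert_eq!(size_of::<State>(), 1);
    assert_eq!(size_of::<Option<State>>(), size_of::<State>());
}
```

This is the sense in which an `Option` around a generator could become free if the tag carried a validity range, independently of how the saved locals are treated.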

@RalfJung
Member Author

That might not even be possible for some types (uninhabited types, for example), and would certainly be "fun" for references (which have to be dereferenceable).^^

But also... why? MaybeUninit actually expresses the reality quite well here; the local variables of this generator are not initialized yet, after all. More abstractly, the "body" of a generator really is just some kind of arena used as the backing store in lieu of a proper stack frame. We don't do layout optimizations on stack frames either.^^ (I hope this doesn't give @eddyb ideas...)

@cramertj
Member

@RalfJung

> We don't do layout optimizations on stack frames either

I mean, this doesn't seem unreasonable to me?

@RalfJung
Member Author

TBH I don't even see how it helps, let alone how it ever amortizes the cost of having to set the right bit pattern on initialization.^^

But, anyway, if the state tag gets a niche then Option<Future> will get layout optimized. I cannot think of a way how anything else would even be possible. And certainly all of this is entirely off-topic in this PR, which is about figuring out what the current invariant and layout of generators is, not about improving it. ;)

@withoutboats
Contributor

withoutboats commented Nov 20, 2018

I think the most valuable size optimization would be that the discriminants of all the generators in a stack of generators get unified into a single discriminant value. I doubt using the niches of fields is that important.

@RalfJung
Member Author

Coming back to the topic of this PR... it seems everyone agrees that currently, the fields of a Generator are de facto MaybeUninit, and hence the miri visitor should treat them as such? So, can we proceed and land this?

@oli-obk
Contributor

oli-obk commented Nov 22, 2018

@bors r+

Yes, this PR represents the current state of how the compiler views generators, and I think this code will break if we try to change that representation, so we'll notice.

@bors
Contributor

bors commented Nov 22, 2018

📌 Commit 6befe67 has been approved by oli-obk

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 22, 2018
pietroalbini added a commit to pietroalbini/rust that referenced this pull request Nov 25, 2018
…-obk

generator fields are not necessarily initialized

bors added a commit that referenced this pull request Nov 25, 2018
Rollup of 14 pull requests

Successful merges:

 - #56024 (Don't auto-inline const functions)
 - #56045 (Check arg/ret sizedness at ExprKind::Path)
 - #56072 (Stabilize macro_literal_matcher)
 - #56075 (Encode a custom "producers" section in wasm files)
 - #56100 (generator fields are not necessarily initialized)
 - #56101 (Incorporate `dyn` into more comments and docs.)
 - #56144 (Fix BTreeSet and BTreeMap gdb pretty-printers)
 - #56151 (Move a flaky process test out of libstd)
 - #56170 (Fix self profiler ICE on Windows)
 - #56176 (Panic setup msg)
 - #56204 (Suggest correct enum variant on typo)
 - #56207 (Stabilize the int_to_from_bytes feature)
 - #56210 (read_c_str should call the AllocationExtra hooks)
 - #56211 ([master] Forward-ports from beta)

Failed merges:

r? @ghost
@bors bors merged commit 6befe67 into rust-lang:master Nov 25, 2018
@RalfJung RalfJung deleted the visiting-generators branch November 30, 2018 08:27