
Generators are too big #52924

Closed
cramertj opened this issue Jul 31, 2018 · 29 comments
@cramertj (Member) commented Jul 31, 2018

Currently, generators won't ever reuse storage slots, causing them to take up more space than is necessary:

#![feature(generators)]

fn main() {
    let a = || { 
        {
            let x: i32 = 5;
            yield;
            println!("{:?}", x);
        }
        {
            let x: i32 = 5;
            yield;
            println!("{:?}", x);
        }
    };
    println!("{}", std::mem::size_of_val(&a)); // prints "12", could print "8" (4 for state, 4 for `x`)
}

Finding optimal solutions seems like a difficult packing problem, but it should be easy to do quite a bit better.

Also see the example case in #59087.

@RalfJung (Member) commented Nov 20, 2018

I am wondering if thinking of this as "slots of a struct" is the right model anyway. The content of the generator type (aside from the state flag) is basically the call frame for this generator, so shouldn't this be treated much more like the stack? There probably shouldn't even be types, just a large enough opaque "blob of bytes" that can be used for generator execution. If you think of this in terms of fields, you are not going to be able to reuse the space of two `i16` for an `i32` later, but there is no reason you shouldn't do that -- just like in `{ let a = 0i16; let b = 1i16; /* ... */ } let c = 2i32;`, `c` can reuse the space that was previously occupied by `a` and `b`.

@cramertj (Member, Author) commented Nov 20, 2018

Yes, it's just a stack of bytes. This optimization was referring to the possibility to overlap some of the bytes to hold data that is currently stored in separate "slots of a struct".

@withoutboats (Contributor) commented Nov 20, 2018

My intuition about generator layout is that it would be:

  1. A discriminant large enough to hold a value for each yield point.
  2. The maximum number of bytes needed to hold the live stack variables at any yield points.

I expect that this is complicated by alignment and wanting to avoid moving values around inside of the generator as it iterates through states, but clearly we should be doing better than we are if the size is currently the sum of all yield point sizes instead of the max. If benchmarks are bad enough this may be a pre-stabilization priority. :-\

@cramertj (Member, Author) commented Nov 20, 2018

It's the sum of all variables that are alive across a yield point (one slot for every variable, not one slot for every yield point). We can do better by tracking which variables are live across non-intersecting sets of yield points and using the same space to store both.

@nikomatsakis (Contributor) commented Mar 5, 2019

Marking as blocking for async/await stabilization. This may be too strong, but this issue certainly prevents us from having the performant sort of futures that we want.

@cramertj had advocated for a simple strategy, vaguely described here, and covered in some detail in the video

@nikomatsakis (Contributor) commented Mar 5, 2019

Created a topic on Zulip which contains some notes on this

@cramertj (Member, Author) commented Mar 5, 2019

I think @tmandry is intending to work on this, so tentatively assigning-- feel free to let me know if I should unassign.

@cramertj changed the title from "Reuse generator slots after StorageDead" to "Generators are too big" on Mar 11, 2019
@tmandry (Contributor) commented Mar 11, 2019

#59087 was merged into this, but I'm not sure it would be fixed by the optimization we were discussing here. Taking this (simplified) segment:

    let composed_2 = async {
        let inner = i_am_1kb();
        println!("");
        await!(inner);
    };

Here, removing the println!(""); statement does not change which variables are live across a yield point, but it does cut down the size of composed_2 by half. So it seems something else is going on here. Am I missing something?

@petrochenkov (Contributor) commented May 14, 2019

I haven't followed the Rust async story closely, but I recently encountered this paper in the most recent C++ committee mailing:
"Coroutines: Language and Implementation Impact" (it compares several implementation strategies for stackless coroutines, a.k.a. #![feature(generators)]).

Apparently the size issue was considered a deal-breaker for adopting the same strategy that's implemented in Rust, and the type-erased version was chosen instead.

Where are Rust coroutines exactly in the "trade-offs" tables from the paper?

@cramertj (Member, Author) commented May 14, 2019

@petrochenkov We currently have some optimizations that are performed before the generator type is created, and these are allowed to affect the size_of of the generated type, which can only be calculated after generator layout has been solved.

As far as the requirements around being called recursively, placing it on an ABI boundary, being virtual, these can all be done by choosing to type-erase by using the .boxed() combinator in the futures crate, and returning BoxFuture<'lt, OutputTy>. This lines up with the "library based type-erasure" rather than "compiler based type-erasure" set of properties in the doc (flexible wrapping, fewer customization points, harder to optimize out heap allocation, harder to devirtualize indirect calls, harder to inline).

The header-file bits don't apply to Rust for obvious reasons (though it is interesting to consider that these optimizations are a dependency of crate metadata due to their effects on the generator size).

@Matthias247 (Contributor) commented May 19, 2019

I played around a bit with @tmandry's optimization from #60187 in my project and wanted to share the results here instead of spamming the CR:

Here are my results (for the top level Future in my project):

Old nightly:

[project/src/main.rs:107] std::mem::size_of_val(&x) = 74216
[project/src/main.rs:103] std::mem::size_of_val(&everything) = 37096
[project/src/main.rs:88] std::mem::size_of_val(&starter) = 96
[project/src/main.rs:97] std::mem::size_of_val(&conn_fut) = 11840
[project/src/main.rs:98] std::mem::size_of_val(&send_task_futs) = 16
[project/src/main.rs:99] std::mem::size_of_val(&lifecycle_fut) = 648

tmandry:generator-optimization:

[project/src/main.rs:107] std::mem::size_of_val(&x) = 41608
[project/src/main.rs:103] std::mem::size_of_val(&everything) = 20792
[project/src/main.rs:88] std::mem::size_of_val(&starter) = 96
[project/src/main.rs:97] std::mem::size_of_val(&conn_fut) = 6576
[project/src/main.rs:98] std::mem::size_of_val(&send_task_futs) = 16
[project/src/main.rs:99] std::mem::size_of_val(&lifecycle_fut) = 392

And here is the relevant code that puts those things together:

let x = async {
    let everything = async {
        let (ctrl, starter) = connection_builder.build().unwrap();
        dbg!(std::mem::size_of_val(&starter));
        let conn_fut = async move {
            let conn_res = starter.start_connection().await;
            trace!("Res: {:?}", conn_res);
        };

        let send_task_futs = join_all((0..8).map(|_| send_msgs(&ctrl)));
        let lifecycle_fut = lifecycle_task(&ctrl);

        dbg!(std::mem::size_of_val(&conn_fut));
        dbg!(std::mem::size_of_val(&send_task_futs));
        dbg!(std::mem::size_of_val(&lifecycle_fut));

        join!(conn_fut, send_task_futs, lifecycle_fut);
    };
    dbg!(std::mem::size_of_val(&everything));
    everything.await
};

dbg!(std::mem::size_of_val(&x));

block_on(x);

So it definitely seems to help quite a bit. The inner futures (conn_fut, lifecycle_fut) shrank, which led everything else to shrink too. I haven't yet looked in detail at how those inner tasks changed their size.

However, it seems the main issue is still the exponential growth caused by some operations. E.g. here size_of_val(&everything) should equal size_of_val(&conn_fut) + size_of_val(&send_task_futs) + size_of_val(&lifecycle_fut) = 6984, but it is 20792 bytes (3x as much).

Afterwards I was curious how much join! contributes to the problem, since I reported it at rust-lang/futures-rs#1473. To check, I implemented a join future for this use case myself so I could measure its size.

The code then looks like:

let x = async {
    let everything = async {
        let (ctrl, starter) = connection_builder.build().unwrap();
        dbg!(std::mem::size_of_val(&starter));
        let conn_fut = async move {
            let conn_res = starter.start_connection().await;
            trace!("Res: {:?}", conn_res);
        };

        let send_task_futs = join_all((0..8).map(|_| send_msgs(&ctrl)));
        let lifecycle_fut = lifecycle_task(&ctrl);

        dbg!(std::mem::size_of_val(&conn_fut));
        dbg!(std::mem::size_of_val(&send_task_futs));
        dbg!(std::mem::size_of_val(&lifecycle_fut));

        let joiner = Joiner {
                a: Some(conn_fut),
                b: Some(send_task_futs),
                c: Some(lifecycle_fut),
                a_res: None,
                b_res: None,
                c_res: None,
        };

        dbg!(std::mem::size_of_val(&joiner));
        joiner.await;
    };
    dbg!(std::mem::size_of_val(&everything));
    everything.await
};

dbg!(std::mem::size_of_val(&x));

block_on(x);

The sizes with that change are:

[project/src/main.rs:205] std::mem::size_of_val(&x) = 42472
[project/src/main.rs:201] std::mem::size_of_val(&everything) = 21224
[project/src/main.rs:174] std::mem::size_of_val(&starter) = 96
[project/src/main.rs:183] std::mem::size_of_val(&conn_fut) = 6576
[project/src/main.rs:184] std::mem::size_of_val(&send_task_futs) = 16
[project/src/main.rs:185] std::mem::size_of_val(&lifecycle_fut) = 392
[project/src/main.rs:196] std::mem::size_of_val(&joiner) = 7032

These are in the same ballpark as with join!, which indicates that there is no huge overhead for the join of the subtasks (6576 + 16 + 392 = 6984 -> 48 bytes of overhead for my Joiner struct).

The interesting thing here: there is actually only a single .await point in this function, right at the end of it (joiner.await). The only thing that needs to be persisted across await points is therefore joiner, so the resulting future's size should be exactly the same 7032 bytes. Yet it gets blown up to triple that size.

Edit: The last sentence was not quite true; ctrl also needs to be persisted, since its reference is captured by the subtasks. However, it's only 56 bytes.

Maybe those are the issues described in #59087 and #59123

@tmandry (Contributor) commented May 22, 2019

@Matthias247 This is really helpful, thanks! I'm not sure what the root cause here is, but if you put that future in a crate I can check out and build somewhere, I can probably find out.

bors added a commit that referenced this issue May 22, 2019

Preserve local scopes in generator MIR

Part of #52924, depended upon by the generator layout optimization #60187.

This PR adds `StorageDead` statements in more places in generators, so we can see when non-`Drop` locals have gone out of scope and recover their storage.

The reason this is only done for generators is compiler performance. See #60187 (comment) for what happens when we do this for all functions.

For `Drop` locals, we modify the `MaybeStorageLive` analysis to use `drop` to indicate that storage is no longer live for the local. Once `drop` returns or unwinds to our function, we implicitly assume that the local is `StorageDead`.

Instead of using `drop`, it is possible to emit more `StorageDead` statements in the MIR for `Drop` locals so we can handle all locals the same. I am fine with doing it that way, but this was the simplest approach for my purposes. It is also likely to be more performant.

r? @Zoxc (feel free to reassign)
cc @cramertj @eddyb @RalfJung @rust-lang/wg-async-await
@tmandry (Contributor) commented Jun 12, 2019

So I've taken a close look at @Matthias247's example, and believe that the solution here is the same as the one in #59123 (comment).

@cramertj (Member, Author) commented Jun 14, 2019

Removing the "blocking" label here, as I believe the most urgent issue has been resolved. These types are still bigger than we'd like them to be, and there are lots of opportunities for improvement, but I think we're now at a point where I'd feel comfortable stabilizing with respect to size.

@PvdBerg1998 commented Jun 14, 2019

This issue caused stack overflows for me in some cases. I think it should be a blocker.

@tmandry (Contributor) commented Jun 14, 2019

@PvdBerg1998 A major optimization PR (#60187) landed the other day, have you tested with that?

That said, I think there's still a huge improvement left to do in #59123. I'm hoping to have a fix for that up soon.

@PvdBerg1998 commented Jun 15, 2019

@tmandry I didn't, that's great! I'll take a look some time in the future.

@bstrie (Contributor) commented Jun 16, 2019

Is generator size only a stability hazard because of the performance implications, or are there concerns about adding future generator size optimizations backward-compatibly? I was under the impression that we only guarantee size_of stability for repr(C) types.

@bstrie (Contributor) commented Jun 16, 2019

Once we have @tmandry's patch for #59123, it might be nice to put out a call on internals for people who have reported overlarge generators to re-test their code, so we can get more real-world before/after confidence.

@withoutboats mentioned this issue Jun 26, 2019
@petrochenkov (Contributor) commented Oct 1, 2019

I recently had to deal with a special-purpose implementation of mini stackless coroutines, based on LLVM IR and used for running GPU shaders with barriers on a CPU.
Some approaches I noticed there that could be used to minimize state saving/restoring at suspension points:

  • Some variables that are live at a suspension point are cheaper to recalculate than to save and restore. In that case, the code calculating them can be duplicated and run again after the suspension point.
  • Some passes like CFG simplification, dead code elimination, and mem2reg can be run before building the coroutine states to reduce the amount of saving/restoring.

Not sure how applicable these ideas are to our MIR, but I'm still just going to leave them here.

@DustinByfuglien commented Oct 21, 2019

The resulting size of the generator _gen is 4100 bytes.
Can it be optimized down to a much smaller size? I feel it can.

#![feature(generators, generator_trait)]

fn main() {
    let _gen = move || {
        let x = [1u8; 1024];
        yield;
        drop(x);
        let x = [1u8; 1024];
        yield;
        drop(x);
        let x = [1u8; 1024];
        yield;
        drop(x);
        let x = [1u8; 1024];
        yield;
        drop(x);
    };
    dbg!(std::mem::size_of_val(&_gen));
}


Output:

   Compiling playground v0.0.1 (/playground)
    Finished dev [unoptimized + debuginfo] target(s) in 0.69s
     Running `target/debug/playground`
[src/main.rs:18] std::mem::size_of_val(&_gen) = 4100

@cramertj (Member, Author) commented Oct 21, 2019

@DustinByfuglien Yes, note that adding manual scopes around each variable keeps the size at 1028 no matter how many are added:

#![feature(generators, generator_trait)]

fn main() {
    let _gen = move || {
        {
            let x = [1u8; 1024];
            yield;
            drop(x);
        }
        {
            let x = [1u8; 1024];
            yield;
            drop(x);
        }
        {
            let x = [1u8; 1024];
            yield;
            drop(x);
        }
        {
            let x = [1u8; 1024];
            yield;
            drop(x);
        }
    };
    dbg!(std::mem::size_of_val(&_gen));
}
   Compiling playground v0.0.1 (/playground)
    Finished dev [unoptimized + debuginfo] target(s) in 0.66s
     Running `target/debug/playground`
[src/main.rs:26] std::mem::size_of_val(&_gen) = 1028

@DustinByfuglien commented Oct 21, 2019

But adding one more yield to each scope gives 4100 again.

#![feature(generators, generator_trait)]

fn main() {
    let _gen = || {
        {
            let x = [1u8; 1024];
            yield;
            yield;
            drop(x);
        }
        {
            let x = [1u8; 1024];
            yield;
            yield;
            drop(x);
        }
        {
            let x = [1u8; 1024];
            yield;
            yield;
            drop(x);
        }
        {
            let x = [1u8; 1024];
            yield;
            yield;
            drop(x);
        }

    };
    dbg!(std::mem::size_of_val(&_gen));
}


@DustinByfuglien commented Oct 21, 2019

@tmandry Maybe you can take some time to look at this. Maybe these examples can be useful for improving the optimization algorithm.

@cramertj (Member, Author) commented Oct 21, 2019

Yes, that's because we only consider temporaries eligible for overlap if they're live across exactly one yield point.

@bstrie (Contributor) commented Jan 28, 2020

The original impetus for this issue (getting generators to ever re-use storage slots) has been addressed to some degree; shall we close this? Alternatively, should we re-purpose this issue into a tracking issue for all generator-size-related issues?

@nikomatsakis (Contributor) commented Feb 4, 2020

I feel like it would likely be better to create a new tracking issue, rather than inherit all the history from this one.

@jonas-schievink (Member) commented Mar 8, 2020

Closing in favor of #69826
