
Stack space in async closures is not shared between branches in debug builds #72247

Closed
khuey opened this issue May 15, 2020 · 14 comments
Labels
A-async-await A-generators AsyncAwait-Triaged C-enhancement I-heavy T-compiler

Comments

@khuey (Contributor) commented May 15, 2020

Consider:

#![feature(test)]
use std::future::Future;
use std::hint::black_box;
use std::pin::Pin;

#[derive(PartialEq, Eq)]
enum Kind {
    Thing1,
    Thing2,
    Thing3,
}

impl Into<i32> for Kind {
    fn into(self) -> i32 {
        match self {
            Kind::Thing1 => 42,
            Kind::Thing2 => -1,
            Kind::Thing3 => 0,
        }
    }
}

fn f1(kind: Kind) -> Pin<Box<dyn Future<Output = i32>>> {
    Box::pin(async move {
        match kind {
            Kind::Thing1 => return kind.into(),
            Kind::Thing2 => return kind.into(),
            Kind::Thing3 => return kind.into(),
        }
    })
}

fn f2(kind: Kind) -> Pin<Box<dyn Future<Output = i32>>> {
    match kind {
        Kind::Thing1 => Box::pin(async move { kind.into() }),
        Kind::Thing2 => Box::pin(async move { kind.into() }),
        Kind::Thing3 => Box::pin(async move { kind.into() }),
    }
}

fn f3(kind: Kind) -> Pin<Box<dyn Future<Output = i32>>> {
    Box::pin(async move {
        if kind == Kind::Thing1 {
            return kind.into();
        } else if kind == Kind::Thing2 {
            return kind.into();
        } else {
            return kind.into();
        }
    })
}

fn main() {
    let f_1 = f1(Kind::Thing2);
    let f_2 = f2(Kind::Thing2);
    let f_3 = f3(Kind::Thing2);
    black_box(Kind::Thing1);
    black_box(Kind::Thing3);
    black_box(f_1);
    black_box(f_2);
    black_box(f_3);
}

If you build this (on nightly 2020-05-14, a74d186) you'll see that the stack frames for f1::{{closure}} and f3::{{closure}} are bigger than those for the f2::{{closure}}s. If you add or remove a branch, you'll also see the sizes of the f1::{{closure}} and f3::{{closure}} stacks vary accordingly.

This is frustrating because it means that branch-heavy code blows out the stack much faster in a debug build than in an opt build when recursing. I saw an order-of-magnitude difference in stack size for one of my functions between debug and opt because of this.
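The async blocks in f1/f2/f3 never await, so each future completes on its first poll. As a side note, they can be driven without an executor crate; here is a hedged sketch of a minimal hand-rolled `block_on` (the `noop_raw_waker`/`block_on` names and the stand-in future are my own, not part of the repro):

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// The standard no-op waker pattern: a waker whose wake/clone/drop do nothing.
fn noop_raw_waker() -> RawWaker {
    fn no_op(_: *const ()) {}
    fn clone(_: *const ()) -> RawWaker {
        noop_raw_waker()
    }
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, no_op, no_op, no_op);
    RawWaker::new(std::ptr::null(), &VTABLE)
}

// Repeatedly poll until the future is ready. Busy-waiting is fine here
// because the futures in question never return Pending.
fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = unsafe { Waker::from_raw(noop_raw_waker()) };
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

fn main() {
    // Stand-in for f1(Kind::Thing2) etc., which all resolve to -1.
    let fut: Pin<Box<dyn Future<Output = i32>>> = Box::pin(async { -1 });
    assert_eq!(block_on(fut), -1);
    println!("ok");
}
```

This only demonstrates driving the futures; the frame-size difference described above is observed on the generated resume functions themselves, not through polling.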

@khuey khuey added the C-bug label May 15, 2020
@jonas-schievink jonas-schievink added C-enhancement T-compiler I-heavy and removed C-bug labels May 15, 2020
@jonas-schievink (Member) commented May 15, 2020

Is this really limited to async code?

@khuey (Contributor, Author) commented May 15, 2020

I don't know. There's not really any equivalent to f2 in non-async code, so it's not as easy to measure.

@khuey (Contributor, Author) commented May 15, 2020

From a quick test on a non-async function with a bunch of trivial branches, the stack space is constant regardless of how many branches I add.
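A sketch of the kind of quick test described above (my own reconstruction, not the actual code used): a plain synchronous function whose match arms mirror f1's, so its debug-build frame size can be compared as branches are added or removed (e.g. with a debugger breakpoint):

```rust
// Hypothetical names (sync_f1, Kind) for illustration only.
#[derive(PartialEq, Eq, Clone, Copy)]
enum Kind {
    Thing1,
    Thing2,
    Thing3,
}

// Same branch structure as f1, but no async block wrapping it; in a
// debug build its frame size stays constant as branches are added.
fn sync_f1(kind: Kind) -> i32 {
    match kind {
        Kind::Thing1 => 42,
        Kind::Thing2 => -1,
        Kind::Thing3 => 0,
    }
}

fn main() {
    assert_eq!(sync_f1(Kind::Thing2), -1);
    println!("ok");
}
```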

@khuey (Contributor, Author) commented May 16, 2020

I believe the problem here is panic handling. In create_generator_resume_function, the generator MIR transform takes every basic block in the generator whose terminator (a) can unwind, (b) doesn't already have an unwind handler, and (c) isn't in a cleanup block, and points its unwind edge at the generator's poisoned block.

This means any MIR of the form

    bbN: {
        StorageLive(_X);
        _Y = call of some sort -> [return: bbN+1];
    }

    bbN+1: {
        StorageDead(_X);
        goto -> bbN+2;
    }

becomes

    bbN: {
        StorageLive(_X);
        _Y = call of some sort -> [return: bbN+1, unwind: bbPoisoned];
    }

    bbN+1: {
        StorageDead(_X);
        goto -> bbN+2;
    }

    bbPoisoned (cleanup): {
        discriminant(Generator) = 2;
        resume;
    }

So now every storage location that was live across something that could theoretically panic is live at the same time at the poisoned block, which means none of them can share stack space, and the size of the stack blows out enormously. This is also consistent with my observation that adding a match branch that calls no functions but merely returns a constant does not increase the stack size.

So it seems to me that, in essence, #61373 has been regressed here. Maybe @tmandry has some idea what's going on?

@khuey (Contributor, Author) commented May 16, 2020

And, to be clear, I simplified this example from some async code with a 100KB stack frame, so this is (at least a part of) a rather serious problem.

@jonas-schievink (Member) commented May 16, 2020

FWIW, all generators in the posted code have a state size of 2.

@khuey (Contributor, Author) commented May 16, 2020

Sorry if I wasn't clear: the values here are not live across yield points so they don't take up space in the generator itself, just on the C stack.

@jonas-schievink (Member) commented May 16, 2020

Yeah, you're right. It's just that #61373 was meant to reduce the state size, not the stack usage. This issue was likely introduced by #69814 then (even though it did not affect state size negatively).

@jonas-schievink jonas-schievink added A-generators A-async-await labels May 16, 2020
@tmandry (Contributor) commented May 19, 2020

@khuey your analysis makes sense to me – the locals are all considered StorageLive when they hit the cleanup block, and that means they must all have stack space reserved. LLVM is smart enough to optimize the stack space allocation in release, but apparently doesn't do such an analysis in debug builds.

cc #68622, here's a case where using StorageDead as a statement makes it awkward/inefficient when we really just want to say "all these locals are dead before point X."

For now, I think we can revert the "collapsing" of cleanup blocks that #69814 did. @jonas-schievink, are there reasons not to besides possible compile-time slowness?

@tmandry tmandry added this to On deck in wg-async work via automation May 19, 2020
@tmandry tmandry added the AsyncAwait-Triaged label May 19, 2020
@jonas-schievink (Member) commented May 19, 2020

All cleanup edges need to poison the generator for soundness, that was the intention behind that change.

@tmandry (Contributor) commented May 19, 2020

Oh I see, sorry, I glossed over that. I guess we can add the poison statement to every block that terminates in Resume instead of redirecting them all to the same block, even though it seems unfortunate.
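Concretely, the alternative would look something like this in the MIR shown earlier (a sketch of the proposed shape, not the actual patch): each cleanup block that ends in resume poisons the generator itself, rather than every unwind edge funneling into one shared bbPoisoned that keeps all locals live.

    bbN: {
        StorageLive(_X);
        _Y = call of some sort -> [return: bbN+1, unwind: bbCleanupN];
    }

    bbCleanupN (cleanup): {
        StorageDead(_X);
        discriminant(Generator) = 2;  // poison before resuming unwind
        resume;
    }

Because each cleanup block only keeps its own locals live, the shared-liveness problem at the poisoned block goes away, at the cost of duplicating the poison statement.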

@tmandry tmandry self-assigned this Jun 11, 2020
@tmandry tmandry moved this from On deck to In progress in wg-async work Jun 11, 2020
tmandry added a commit to tmandry/rust that referenced this issue Jun 12, 2020
@tmandry (Contributor) commented Jun 12, 2020

So, I did some experimenting here. It turns out this behavior pre-existed #61373: even if we emit storage markers to LLVM, LLVM doesn't seem to make use of them when optimizations are disabled. I have a work-in-progress branch that emits the markers.

Is there some attribute we can emit on the resume function that will cause LLVM to optimize storage of that function?

@tmandry (Contributor) commented Jun 12, 2020

Is there some attribute we can emit on the resume function that will cause LLVM to optimize [stack] storage of that function?

Tagging @nikic who might know?

@tmandry tmandry moved this from In progress to On deck in wg-async work Jul 7, 2020
@tmandry tmandry removed their assignment Jul 7, 2020
@tmandry tmandry removed this from On deck in wg-async work Jul 14, 2020
@khuey (Contributor, Author) commented Mar 27, 2022

I think this has been fixed at some point. I don't see this on the test case I provided anymore.

@khuey khuey closed this as completed Mar 27, 2022