Adding --emit=asm speeds up generated code #57235

Open
jrmuizel opened this Issue Dec 31, 2018 · 5 comments

jrmuizel commented Dec 31, 2018

With the following rust code:

```rust
pub fn main() {
    print_triples();
    println!("hello");
}


fn print_triples() {
    let mut i = 0 as i32;
    for z in 1.. {
        for x in 1..=z {
            for y in x..=z {
                if x*x + y*y == z*z {
                    i = i + 1;
                    if i == 1000 {
                        return;
                    }
                }
            }
        }
    }
}
```

I get:

```
/tmp/pythagoras$ rustc --emit=link -O simple.rs
/tmp/pythagoras$ time ./simple
hello

real	0m0.290s
user	0m0.287s
sys	0m0.002s
/tmp/pythagoras$ rustc --emit=asm,link -O simple.rs
/tmp/pythagoras$ time ./simple
hello

real	0m0.005s
user	0m0.002s
sys	0m0.002s
/tmp/pythagoras$ rustc --version
rustc 1.32.0-nightly (400c2bc5e 2018-11-27)
/tmp/pythagoras$
```

jonas-schievink commented Dec 31, 2018

AFAIK this forces rustc to use a single codegen unit. This generally makes code run faster because every function is available for inlining, although I don't see what might cause such a drastic difference.

You can try to reproduce by using -Ccodegen-units=1.

jrmuizel commented Dec 31, 2018

Indeed -Ccodegen-units=1 fixes the problem. It's pretty surprising/dangerous that --emit=asm changes the generated code. Why is a single codegen unit forced with --emit=asm?

jonas-schievink commented Dec 31, 2018

This was done in #30208. Multiple codegen units would result in multiple compilation outputs, which is generally not expected when using --emit (which should only output a single file).

EDIT: Also see #30063, which is now obsolete since the build system changed, but the discussion there is still relevant.

nikic commented Dec 31, 2018

@jrmuizel I think the answer is that nobody has implemented the necessary handling for that. If there are multiple codegen units and we want to produce a single artifact, we'd have to merge the LLVM modules prior to emitting IR/BC/asm (unless LTO, either thin or fat, already takes care of that).

I agree that the current behavior is not great, as these are often used for debugging performance issues and changing the number of codegen-units can impact optimization a lot.

lambda commented Dec 31, 2018

One thing to note is that the difference in speed listed here is a bit artificial: this program doesn't actually do anything in the print_triples loop, so it looks like with one codegen unit the loop is eliminated entirely, while with the default settings it's actually running the loop even though it has no effect.
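
To compare the two builds on equal footing, the loop needs an observable result so that it cannot be removed as dead code. A minimal sketch of one way to do that, assuming it is acceptable to change the reproducer so it returns and prints the triple it stops on (`nth_triple` is a hypothetical name, not something from the original report):

```rust
// Hypothetical variant of the reproducer from this issue.
// Same search order as `print_triples`, but the n-th triple found is
// returned, so the loop's result is actually used by the caller.
fn nth_triple(n: u32) -> (i32, i32, i32) {
    assert!(n > 0, "n must be at least 1");
    let mut found = 0;
    for z in 1.. {
        for x in 1..=z {
            for y in x..=z {
                if x * x + y * y == z * z {
                    found += 1;
                    if found == n {
                        return (x, y, z);
                    }
                }
            }
        }
    }
    // The outer range is unbounded, so the function always returns above.
    unreachable!("the search over z is unbounded");
}

fn main() {
    // Printing the result keeps the computation observable.
    let (x, y, z) = nth_triple(1000);
    println!("{} {} {}", x, y, z);
    println!("hello");
}
```

With the result printed, any timing difference between the default settings and `-Ccodegen-units=1` should reflect how well the loop itself was optimized rather than whether it was deleted. Wrapping the call in `std::hint::black_box` (stable in later Rust releases than the nightly used above) is another option when printing is undesirable.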
