
Slower binaries since some days #42935

Closed · leonardo-m opened this issue on Jun 27, 2017 · 20 comments

leonardo-m commented Jun 27, 2017

This could have the same cause as issue #42808 (i.e. a regression due to #42313), or perhaps not.

This small test program solves Project Euler problem 214:

```rust
#![feature(step_by)]

fn main() {
    fn e214() -> u64 {
        const N_MAX: usize = 40_000_000;
        const SIZE: usize = N_MAX / 2;
        const LEN: u8 = 25;

        let mut chain_length = vec![2u8; SIZE];
        let mut tot: u64 = 0;

        for i in 1 .. SIZE {
            // if not prime, then try next one
            if chain_length[i] > 2 { continue; }

            // p is prime
            let p = 2 * i + 1;

            // calculate q*2^k = p-1
            let mut q = i;
            let mut k = 1;
            while (q & 1) == 0 {
                q /= 2;
                k += 1;
            }

            // calculate cl = chain_length(p)
            let cl = chain_length[q / 2] + k;

            // add p to tot if the length matches.
            if cl == LEN { tot += p as u64; }

            // loop over powers of p.
            let mut pp = p as u64;
            while pp < N_MAX as u64 {
                // Loop over multiples of pp.
                for j in ((pp / 2) .. SIZE as u64).step_by(pp) {
                    chain_length[j as usize] += cl - 2;
                }
                pp *= p as u64;
            }
        }
        tot
    }
    assert_eq!(e214(), 1_677_366_278_943);
}
```

On my PC, using nightly-2017-06-18-x86_64-pc-windows-gnu, it runs in about 0.39 seconds, while using a more recent nightly, like the current rustc 1.19.0-nightly (78d8416 2017-06-17), it runs in about 0.87 seconds.

leonardo-m commented Jun 27, 2017

I compile with:

```
rustc -C opt-level=3 -C target-cpu=native -C panic=abort
```

leonardo-m commented Jun 27, 2017

Old asm compared to the new asm:
https://gist.github.com/anonymous/c4708600946f04b87c8b9586b15e3017

(Moved to a gist after suggestion by @pnkfelix).

pnkfelix commented Jun 27, 2017

I also observe a time slowdown when ~~compiling~~ executing this benchmark on a Linux system (using the same flags -C opt-level=3 -C target-cpu=native -C panic=abort as documented in the comments above), comparing e00c040 to 1143eb2:

e00c040: real 0m0.265s
1143eb2: real 0m0.690s

(I note this largely to add the data point that this is not just a Windows issue.)

pnkfelix commented Jun 27, 2017

@leonardo-m it might be good to move those asm dumps to a gist; they're pretty long to scroll through on the thread...

pnkfelix commented Jun 27, 2017

(It seems obvious from the asm dumps that one piece of low-hanging fruit would be to add some inlining directives to Layout::from_size_align, I think...)
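
For context, a minimal sketch of the kind of change being suggested; the DemoLayout type below is purely illustrative and not the actual libstd source. The idea is that marking a small, hot constructor like Layout::from_size_align with #[inline] lets it be inlined into downstream crates even without LTO.

```rust
// Illustrative sketch only: a stand-in for the real Layout type, to show
// where the suggested #[inline] hint would go. Without the attribute, a
// non-generic function like this is compiled once in its own crate and
// called out-of-line from user code (unless LTO is enabled).
#[derive(Debug, PartialEq)]
pub struct DemoLayout {
    size: usize,
    align: usize,
}

impl DemoLayout {
    #[inline] // <- the suggested low-hanging fruit
    pub fn from_size_align(size: usize, align: usize) -> Option<DemoLayout> {
        // `align` must be a nonzero power of two, and `size` must not
        // overflow when rounded up to a multiple of `align`.
        if !align.is_power_of_two() {
            return None;
        }
        if size > usize::MAX - (align - 1) {
            return None;
        }
        Some(DemoLayout { size, align })
    }
}

fn main() {
    assert!(DemoLayout::from_size_align(64, 8).is_some());
    assert!(DemoLayout::from_size_align(64, 3).is_none()); // 3 is not a power of two
}
```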

leonardo-m commented Jun 27, 2017

I have a few other similar small programs that got slower in this timeframe (though their slowdown is smaller; on average I've seen a 10%+ increase in run time).

> I also observe a time slowdown on when compiling this benchmark

Yes, the compilation time of my programs has also increased (perhaps because the compiler itself is slower).

pnkfelix commented Jun 27, 2017

> Yes, the compilation time of my programs has also increased (perhaps because the compiler itself is slower).

Sorry, I meant "executing this benchmark when compiled with the new Allocator support." (The part I was trying to emphasize was that I reproduced the problem on a non-Windows platform.)

pnkfelix self-assigned this Jun 27, 2017

Mark-Simulacrum commented Jun 27, 2017

Yeah, this was picked up by perf.rlo as well. MIR end regions and then allocators slowed down compilation time in all crates, I believe.

pnkfelix commented Jun 27, 2017

(yeah a little embarrassing to have authored both of those PRs and have to deal with the fallout now...)

pnkfelix commented Jun 28, 2017

A collection of judiciously placed #[inline] attributes seems to completely fix the speed regression here.

leonardo-m commented Jun 28, 2017

> A collection of judiciously placed #[inline] attributes seems to completely fix the speed regression here.

Very nice. I'll test your changes when they are in a Nightly. And I'll report if there are other performance regressions.

pnkfelix commented Jun 28, 2017

However, the #[inline] directives make the binary-size regression for my microbenchmark on #42808 worse.

pnkfelix commented Jun 28, 2017

Also, it seems that @alexcrichton has independently added many of the same #[inline] directives in PR #42727, as he mentioned here.

pnkfelix commented Jun 28, 2017

(yes, I just built a rustc atop #42727, and it looks like it will definitely fix the execution time regression described in this issue.)

leonardo-m commented Jul 9, 2017

Replacing the step_by feature with iterator_step_by and using .step_by(pp as usize), the best-of-three run time is now 0.47 seconds.
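
For reference, a minimal self-contained sketch of that adaptation (assuming the iterator_step_by nightly feature of mid-2017; Iterator::step_by has been stable since Rust 1.28, so no feature gate is needed on current compilers). The key difference from the removed Range::step_by is that the step argument is a usize, hence the pp as usize cast:

```rust
// Sketch of the adapted inner loop from the program above.
// Old (removed Range::step_by):    for j in ((pp / 2)..SIZE).step_by(pp)
// New (Iterator::step_by, usize):  for j in ((pp / 2)..SIZE).step_by(pp as usize)
fn main() {
    const SIZE: u64 = 20_000_000;
    let pp: u64 = 7;

    let mut count = 0u64;
    // Visits pp/2, pp/2 + pp, pp/2 + 2*pp, ... while below SIZE.
    for _j in ((pp / 2)..SIZE).step_by(pp as usize) {
        count += 1;
    }
    // Number of stepped offsets in [pp/2, SIZE).
    assert_eq!(count, (SIZE - pp / 2 + pp - 1) / pp);
}
```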

pnkfelix commented Jul 13, 2017

Hmm, indeed, it seems the running times regressed further between (734c83642 2017-07-03) and (720c596ec 2017-07-08) (or maybe I'm not adapting the benchmark correctly...).

arielb1 commented Aug 9, 2017

1.20 is now in beta.

alexcrichton commented Aug 12, 2017

Looking into this, some things I've learned:

- Technically I don't think this is anything other than a nightly regression, due to the use of nightly features. Despite that, if you use RUSTC_BOOTSTRAP=1 to test 1.19.0, 1.20.0, and 1.21.0, you can see that performance regressed in 1.20.0 and was fixed in 1.21.0.
- Historically this program used Range::step_by.
- The Range::step_by function was removed in #43012 (not in 1.20.0).
- The new method this program is using is Iterator::step_by, introduced in #41439 (not in 1.20.0).
- The StepBy iterator implementation calls the nth method to do iteration (a simplified sketch of this call chain follows the list).
- The nth implementation for ops::Range calls the add_usize method.
- The add_usize method calls TryFrom.
- On 1.20.0 the number-two function in the profile is a TryFrom-related method.
- This method, which should be inlined, was inlined in #43248.
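
For readers following along, here is a simplified, hypothetical sketch of that call chain; the names and structure are illustrative, not the actual libcore source. The point is that the step gets converted via TryFrom on every advance, so if the conversion helper is not inlined the hot loop pays for an out-of-line call on each iteration.

```rust
// Hypothetical, simplified model of the chain described above:
// StepBy -> nth on the inner Range -> an add_usize-style helper -> TryFrom.
use std::convert::TryFrom;

// The add_usize-style helper: widens the usize step via TryFrom.
// If this conversion is not inlined, every advance pays for a call.
fn add_usize(base: u64, n: usize) -> Option<u64> {
    let n = u64::try_from(n).ok()?; // the TryFrom hop seen in the profile
    base.checked_add(n)
}

struct DemoStepBy {
    range: std::ops::Range<u64>,
    step: usize,
}

impl Iterator for DemoStepBy {
    type Item = u64;

    fn next(&mut self) -> Option<u64> {
        if self.range.start >= self.range.end {
            return None;
        }
        let item = self.range.start;
        // Advance by `step`, saturating to `end` on overflow.
        self.range.start = add_usize(item, self.step).unwrap_or(self.range.end);
        Some(item)
    }
}

fn main() {
    let demo = DemoStepBy { range: 3..100, step: 7 };
    // Matches the real iterator: (3u64..100).step_by(7) yields 3, 10, ..., 94.
    assert_eq!(demo.count(), (3u64..100).step_by(7).count());
}
```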

tl;dr this is fixed, no need for a backport, closing.

leonardo-m commented Aug 13, 2017

Thank you, this specific program is indeed fixed (the run time is now 0.40 s), but other performance regressions are not yet fixed, so I'll open separate bug reports.

alexcrichton commented Aug 13, 2017

Yes that'd be much appreciated, thanks @leonardo-m!
