New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compile time regression #43141

Closed
Boddlnagg opened this Issue Jul 10, 2017 · 15 comments

Comments

Projects
None yet
8 participants
@Boddlnagg
Contributor

Boddlnagg commented Jul 10, 2017

There was a major compile time (and memory consumption) regression when I compare building winrt-rust on Rust 1.18 stable and current 1.20 nightly. (I couldn't test on beta because error: the optionZis only accepted on the nightly compiler.)

rustc 1.18.0 (03fc9d6 2017-06-06): 137.31 secs, 2873 MB peak memory
Compiling winrt v0.2.1 (file:///D:/Code/Rust/winrt-rust)
warning: the option `Z` is unstable and should only be used on the nightly compiler, but it is currently accepted for backwards compatibility; this will soon change, see issue #31847 for more details

time: 2.145; rss: 449MB parsing
time: 0.000; rss: 449MB recursion limit
time: 0.000; rss: 449MB crate injection
time: 0.000; rss: 449MB plugin loading
time: 0.000; rss: 449MB plugin registration
time: 12.711; rss: 808MB        expansion
time: 0.000; rss: 808MB maybe building test harness
time: 0.129; rss: 808MB maybe creating a macro crate
time: 0.000; rss: 808MB checking for inline asm in case the target doesn't support it
time: 0.537; rss: 808MB early lint checks
time: 0.158; rss: 808MB AST validation
time: 1.477; rss: 935MB name resolution
time: 0.710; rss: 935MB complete gated feature checking
time: 2.102; rss: 1453MB        lowering ast -> hir
time: 0.642; rss: 1361MB        indexing hir
time: 0.142; rss: 1361MB        attribute checking
time: 0.088; rss: 948MB language item collection
time: 0.451; rss: 954MB lifetime resolution
time: 0.000; rss: 954MB looking for entry point
time: 0.004; rss: 954MB looking for plugin registrar
time: 0.858; rss: 1095MB        region resolution
time: 0.194; rss: 1095MB        loop checking
time: 0.102; rss: 1095MB        static item recursion checking
time: 1.601; rss: 1110MB        compute_incremental_hashes_map
time: 0.000; rss: 1110MB        load_dep_graph
time: 0.248; rss: 1110MB        stability index
time: 0.437; rss: 1123MB        stability checking
time: 1.905; rss: 1292MB        type collecting
time: 0.122; rss: 1293MB        variance inference
time: 0.236; rss: 1307MB        impl wf inference
time: 3.658; rss: 1328MB        coherence checking
time: 8.390; rss: 1419MB        wf checking
time: 5.613; rss: 1477MB        item-types checking
time: 48.962; rss: 2098MB       item-bodies checking
time: 3.430; rss: 2164MB        const checking
time: 2.696; rss: 2174MB        privacy checking
time: 0.700; rss: 2174MB        intrinsic checking
time: 0.257; rss: 2174MB        effect checking
time: 0.880; rss: 2175MB        match checking
time: 0.513; rss: 2175MB        liveness checking
time: 3.851; rss: 2589MB        MIR dump
time: 0.499; rss: 2645MB      SimplifyCfg
time: 1.689; rss: 2645MB      TypeckMir
time: 1.294; rss: 2648MB      QualifyAndPromoteConstants
time: 0.087; rss: 2648MB      SimplifyBranches
time: 0.344; rss: 2648MB      SimplifyCfg
time: 3.914; rss: 2648MB        MIR cleanup and validation
time: 5.023; rss: 2655MB        borrow checking
time: 0.471; rss: 2662MB        reachability checking
time: 1.121; rss: 2662MB        death checking
time: 0.000; rss: 2662MB        unused lib feature checking
time: 3.540; rss: 2662MB        lint checking
time: 0.000; rss: 2662MB        resolving dependency formats
time: 0.085; rss: 2662MB      NoLandingPads
time: 0.349; rss: 2662MB      SimplifyCfg
time: 0.579; rss: 2677MB      EraseRegions
time: 0.168; rss: 2677MB      AddCallGuards
time: 1.869; rss: 2679MB      ElaborateDrops
time: 0.084; rss: 2679MB      NoLandingPads
time: 0.388; rss: 2679MB      SimplifyCfg
time: 0.000; rss: 2679MB      Inline
time: 0.307; rss: 2679MB      InstCombine
time: 0.533; rss: 2679MB      Deaggregator
time: 0.100; rss: 2679MB      CopyPropagation
time: 0.357; rss: 2681MB      SimplifyLocals
time: 0.168; rss: 2681MB      AddCallGuards
time: 0.093; rss: 2681MB      PreTrans
time: 5.084; rss: 2681MB        MIR optimisations
time: 2.501; rss: 2802MB      write metadata
time: 0.991; rss: 2831MB      translation item collection
time: 0.115; rss: 2873MB      codegen unit partitioning
time: 0.134; rss: 2866MB      internalize symbols
time: 6.013; rss: 2836MB        translation
time: 0.000; rss: 2836MB        assert dep graph
time: 0.000; rss: 2836MB        serialize dep graph
time: 0.204; rss: 357MB       llvm function passes [0]
time: 1.705; rss: 361MB       llvm module passes [0]
time: 1.634; rss: 371MB       codegen passes [0]
time: 0.002; rss: 371MB       codegen passes [0]
time: 3.600; rss: 371MB LLVM passes
time: 0.000; rss: 371MB serialize work products
time: 0.639; rss: 358MB linking
 Finished release [optimized] target(s) in 137.31 secs
rustc 1.20.0-nightly (d84693b 2017-07-09): 262.64 secs, 3346 MB peak memory
Compiling winrt v0.2.1 (file:///D:/Code/Rust/winrt-rust)
time: 2.448; rss: 449MB parsing
time: 0.000; rss: 449MB recursion limit
time: 0.000; rss: 449MB crate injection
time: 0.000; rss: 449MB plugin loading
time: 0.000; rss: 449MB plugin registration
time: 12.485; rss: 782MB        expansion
time: 0.000; rss: 782MB maybe building test harness
time: 0.142; rss: 782MB maybe creating a macro crate
time: 0.593; rss: 782MB creating allocators
time: 0.000; rss: 782MB checking for inline asm in case the target doesn't support it
time: 0.602; rss: 782MB early lint checks
time: 0.162; rss: 782MB AST validation
time: 2.673; rss: 982MB name resolution
time: 0.825; rss: 982MB complete gated feature checking
time: 2.580; rss: 1538MB        lowering ast -> hir
time: 0.728; rss: 1391MB        indexing hir
time: 0.166; rss: 1391MB        attribute checking
time: 0.105; rss: 964MB language item collection
time: 0.536; rss: 971MB lifetime resolution
time: 0.000; rss: 971MB looking for entry point
time: 0.005; rss: 971MB looking for plugin registrar
time: 0.228; rss: 971MB loop checking
time: 0.124; rss: 971MB static item recursion checking
time: 1.963; rss: 986MB compute_incremental_hashes_map
time: 0.000; rss: 986MB load_dep_graph
time: 0.271; rss: 987MB stability index
time: 0.490; rss: 1000MB        stability checking
time: 2.389; rss: 1163MB        type collecting
time: 0.327; rss: 1178MB        impl wf inference
time: 5.498; rss: 1195MB        coherence checking
time: 0.042; rss: 1195MB        variance testing
time: 96.245; rss: 1580MB       wf checking
time: 6.401; rss: 1634MB        item-types checking
time: 52.552; rss: 2229MB       item-bodies checking
time: 9.095; rss: 2350MB        const checking
time: 4.262; rss: 2360MB        privacy checking
time: 0.352; rss: 2363MB        intrinsic checking
time: 0.364; rss: 2363MB        effect checking
time: 0.638; rss: 2363MB        match checking
time: 0.607; rss: 2363MB        liveness checking
time: 15.272; rss: 3014MB       borrow checking
time: 0.499; rss: 3022MB        reachability checking
time: 1.118; rss: 3023MB        death checking
time: 0.000; rss: 3023MB        unused lib feature checking
time: 6.384; rss: 3023MB        lint checking
time: 0.000; rss: 3023MB        resolving dependency formats
time: 21.937; rss: 3226MB     write metadata
time: 1.238; rss: 3254MB      translation item collection
time: 0.119; rss: 3309MB      codegen unit partitioning
time: 0.187; rss: 3346MB      internalize symbols
time: 26.257; rss: 3338MB       translation
time: 0.000; rss: 3338MB        assert dep graph
time: 0.000; rss: 3338MB        serialize dep graph
time: 0.003; rss: 430MB       codegen passes [0]
time: 0.226; rss: 432MB       llvm function passes [1]
time: 1.994; rss: 436MB       llvm module passes [1]
time: 1.853; rss: 445MB       codegen passes [1]
time: 4.135; rss: 445MB LLVM passes
time: 0.000; rss: 445MB serialize work products
time: 0.597; rss: 431MB linking
 Finished release [optimized] target(s) in 262.64 secs
 

winrt-rust certainly is a rather exotic crate in that it mainly consists of a > 200k LOC generated file that wraps Windows COM APIs, but something seems to be wrong here, especially in the wf checking pass, which went from 8.4 seconds to 96.2 seconds.

To reproduce the numbers, you can run cargo rustc --release --features all -- -Z time-passes (--features all is important, otherwise only part of the crate is being built).

@Mark-Simulacrum

This comment has been minimized.

Show comment
Hide comment
@Mark-Simulacrum

Mark-Simulacrum Jul 10, 2017

Member

cc @rust-lang/compiler -- Possibly just the standard regression currently caused by MIR borrowck and allocators being introduced, but somewhat seems different.

Member

Mark-Simulacrum commented Jul 10, 2017

cc @rust-lang/compiler -- Possibly just the standard regression currently caused by MIR borrowck and allocators being introduced, but somewhat seems different.

@eddyb

This comment has been minimized.

Show comment
Hide comment
@eddyb

eddyb Jul 10, 2017

Member

For WF checking I'd ask @nikomatsakis and @arielb1 specifically. Sounds like a new check was added, or an existing one became much slower.

Member

eddyb commented Jul 10, 2017

For WF checking I'd ask @nikomatsakis and @arielb1 specifically. Sounds like a new check was added, or an existing one became much slower.

@Boddlnagg

This comment has been minimized.

Show comment
Hide comment
@Boddlnagg

Boddlnagg Jul 12, 2017

Contributor

I just tested on beta (rustc 1.19.0-beta.3) without -Z time-passes (which is no longer allowed) and it took 246.59 secs. So the major regression seems to have been introduced between 1.18 and 1.19.

Contributor

Boddlnagg commented Jul 12, 2017

I just tested on beta (rustc 1.19.0-beta.3) without -Z time-passes (which is no longer allowed) and it took 246.59 secs. So the major regression seems to have been introduced between 1.18 and 1.19.

@nikomatsakis

This comment has been minimized.

Show comment
Hide comment
@nikomatsakis

nikomatsakis Jul 12, 2017

Contributor

@Boddlnagg any chance you can run this in some sort of other profiler? I presume it only builds on Windows, right?

Contributor

nikomatsakis commented Jul 12, 2017

@Boddlnagg any chance you can run this in some sort of other profiler? I presume it only builds on Windows, right?

@nikomatsakis

This comment has been minimized.

Show comment
Hide comment
@nikomatsakis

nikomatsakis Jul 12, 2017

Contributor

@Boddlnagg another awesome thing -- if you are able to do it -- would be bisecting with nightly releases to try and narrow this down a bit more. I don't have easy access to a windows machine, but I'll try to find someone who does also.

Contributor

nikomatsakis commented Jul 12, 2017

@Boddlnagg another awesome thing -- if you are able to do it -- would be bisecting with nightly releases to try and narrow this down a bit more. I don't have easy access to a windows machine, but I'll try to find someone who does also.

@Mark-Simulacrum

This comment has been minimized.

Show comment
Hide comment
@Mark-Simulacrum

Mark-Simulacrum Jul 12, 2017

Member

Bisection narrowed to 75b0568...14f30da61(commits between nightly-2017-05-16 and nightly-2017-05-21. It seems likely this means #41911 is at fault here, cc @michaelwoerister. No guarantees though. cargo check works to reproduce, so anyone should be able to test this.

Member

Mark-Simulacrum commented Jul 12, 2017

Bisection narrowed to 75b0568...14f30da61(commits between nightly-2017-05-16 and nightly-2017-05-21. It seems likely this means #41911 is at fault here, cc @michaelwoerister. No guarantees though. cargo check works to reproduce, so anyone should be able to test this.

@nikomatsakis

This comment has been minimized.

Show comment
Hide comment
@nikomatsakis

nikomatsakis Jul 12, 2017

Contributor

Ah, #41911 could certainly have done it, though the reason is not yet obvious.

Contributor

nikomatsakis commented Jul 12, 2017

Ah, #41911 could certainly have done it, though the reason is not yet obvious.

@brson

This comment has been minimized.

Show comment
Hide comment
@brson

brson Jul 13, 2017

Contributor

Seems like this won't be fixed for 1.19, but it looks pretty bad.

Contributor

brson commented Jul 13, 2017

Seems like this won't be fixed for 1.19, but it looks pretty bad.

@Mark-Simulacrum

This comment has been minimized.

Show comment
Hide comment
@Mark-Simulacrum

Mark-Simulacrum Jul 27, 2017

Member

@rust-lang/compiler -- this looks like it needs a look from someone to investigate the cause here.

Member

Mark-Simulacrum commented Jul 27, 2017

@rust-lang/compiler -- this looks like it needs a look from someone to investigate the cause here.

@nikomatsakis

This comment has been minimized.

Show comment
Hide comment
@nikomatsakis

nikomatsakis Jul 27, 2017

Contributor

Assigning to @arielb1 to do profiling before/after #41911.

Contributor

nikomatsakis commented Jul 27, 2017

Assigning to @arielb1 to do profiling before/after #41911.

@arielb1

This comment has been minimized.

Show comment
Hide comment
@arielb1

arielb1 Aug 7, 2017

Contributor

So after one possibly-unneeded reboot into Windows, it turns out that... drumroll

#41911 broke is_relevant_trait_impl:
https://github.com/rust-lang/rust/pull/41911/files#diff-6bfb45b8d6ccab4d1e18e623dee893faR192

+    let relevant: Vec<DefId> = all_trait_impls
+        .non_blanket_impls
+        .iter()
+        .cloned()
+        .filter(|&impl_def_id| {
+            let impl_self_ty = tcx.type_of(impl_def_id);
+            let impl_simple_self_ty = fast_reject::simplify_type(tcx,
+                                                                 impl_self_ty,
+                                                                 false).unwrap();
+            impl_simple_self_ty == self_ty
+        })
+        .collect(); 
Contributor

arielb1 commented Aug 7, 2017

So after one possibly-unneeded reboot into Windows, it turns out that... drumroll

#41911 broke is_relevant_trait_impl:
https://github.com/rust-lang/rust/pull/41911/files#diff-6bfb45b8d6ccab4d1e18e623dee893faR192

+    let relevant: Vec<DefId> = all_trait_impls
+        .non_blanket_impls
+        .iter()
+        .cloned()
+        .filter(|&impl_def_id| {
+            let impl_self_ty = tcx.type_of(impl_def_id);
+            let impl_simple_self_ty = fast_reject::simplify_type(tcx,
+                                                                 impl_self_ty,
+                                                                 false).unwrap();
+            impl_simple_self_ty == self_ty
+        })
+        .collect(); 
@eddyb

This comment has been minimized.

Show comment
Hide comment
@michaelwoerister

This comment has been minimized.

Show comment
Hide comment
@michaelwoerister

michaelwoerister Aug 7, 2017

Contributor

@eddyb I thought you were talking about sharing a single blanket_impls vec between all results for the same trait back then.

Contributor

michaelwoerister commented Aug 7, 2017

@eddyb I thought you were talking about sharing a single blanket_impls vec between all results for the same trait back then.

@eddyb

This comment has been minimized.

Show comment
Hide comment
@eddyb

eddyb Aug 7, 2017

Member

@michaelwoerister Maybe I noticed one problem, but clearly not all of them.

Member

eddyb commented Aug 7, 2017

@michaelwoerister Maybe I noticed one problem, but clearly not all of them.

arielb1 added a commit to arielb1/rust that referenced this issue Aug 7, 2017

make `for_all_relevant_impls` O(1) again
A change in #41911 had made `for_all_relevant_impls` do a linear scan over
all impls, instead of using an HashMap. Use an HashMap again to avoid
quadratic blowup when there is a large number of structs with impls.

I think this fixes #43141 completely, but I want better measurements in
order to be sure. As a perf patch, please don't roll this up.

arielb1 added a commit to arielb1/rust that referenced this issue Aug 7, 2017

make `for_all_relevant_impls` O(1) again
A change in #41911 had made `for_all_relevant_impls` do a linear scan over
all impls, instead of using an HashMap. Use an HashMap again to avoid
quadratic blowup when there is a large number of structs with impls.

I think this fixes #43141 completely, but I want better measurements in
order to be sure. As a perf patch, please don't roll this up.
@Mark-Simulacrum

This comment has been minimized.

Show comment
Hide comment
@Mark-Simulacrum

Mark-Simulacrum Aug 8, 2017

Member

@arielb1 Responding to your ping on IRC, yes, I was able to reproduce via cargo check (just had to pass --target the-relevant-windows-triple).

Member

Mark-Simulacrum commented Aug 8, 2017

@arielb1 Responding to your ping on IRC, yes, I was able to reproduce via cargo check (just had to pass --target the-relevant-windows-triple).

arielb1 added a commit to arielb1/rust that referenced this issue Aug 8, 2017

make `for_all_relevant_impls` O(1) again
A change in #41911 had made `for_all_relevant_impls` do a linear scan over
all impls, instead of using an HashMap. Use an HashMap again to avoid
quadratic blowup when there is a large number of structs with impls.

I think this fixes #43141 completely, but I want better measurements in
order to be sure. As a perf patch, please don't roll this up.

bors added a commit that referenced this issue Aug 8, 2017

Auto merge of #43723 - arielb1:nonincremental-fast-reject, r=eddyb
make `for_all_relevant_impls` O(1) again

A change in #41911 had made `for_all_relevant_impls` do a linear scan over
all impls, instead of using an HashMap. Use an HashMap again to avoid
quadratic blowup when there is a large number of structs with impls.

I think this fixes #43141 completely, but I want better measurements in
order to be sure. As a perf patch, please don't roll this up.

r? @eddyb
beta-nominating because regression

@bors bors closed this in #43723 Aug 8, 2017

alexcrichton added a commit to alexcrichton/rust that referenced this issue Aug 23, 2017

make `for_all_relevant_impls` O(1) again
A change in #41911 had made `for_all_relevant_impls` do a linear scan over
all impls, instead of using an HashMap. Use an HashMap again to avoid
quadratic blowup when there is a large number of structs with impls.

I think this fixes #43141 completely, but I want better measurements in
order to be sure. As a perf patch, please don't roll this up.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment