Compile time regression #43141

Boddlnagg · 2017-07-10T11:04:01Z

There was a major compile time (and memory consumption) regression when I compare building winrt-rust on Rust 1.18 stable and current 1.20 nightly. (I couldn't test on beta because error: the option Z is only accepted on the nightly compiler.)

rustc 1.18.0 (03fc9d6 2017-06-06): 137.31 secs, 2873 MB peak memory

Compiling winrt v0.2.1 (file:///D:/Code/Rust/winrt-rust)
warning: the option `Z` is unstable and should only be used on the nightly compiler, but it is currently accepted for backwards compatibility; this will soon change, see issue #31847 for more details

time: 2.145; rss: 449MB parsing
time: 0.000; rss: 449MB recursion limit
time: 0.000; rss: 449MB crate injection
time: 0.000; rss: 449MB plugin loading
time: 0.000; rss: 449MB plugin registration
time: 12.711; rss: 808MB        expansion
time: 0.000; rss: 808MB maybe building test harness
time: 0.129; rss: 808MB maybe creating a macro crate
time: 0.000; rss: 808MB checking for inline asm in case the target doesn't support it
time: 0.537; rss: 808MB early lint checks
time: 0.158; rss: 808MB AST validation
time: 1.477; rss: 935MB name resolution
time: 0.710; rss: 935MB complete gated feature checking
time: 2.102; rss: 1453MB        lowering ast -> hir
time: 0.642; rss: 1361MB        indexing hir
time: 0.142; rss: 1361MB        attribute checking
time: 0.088; rss: 948MB language item collection
time: 0.451; rss: 954MB lifetime resolution
time: 0.000; rss: 954MB looking for entry point
time: 0.004; rss: 954MB looking for plugin registrar
time: 0.858; rss: 1095MB        region resolution
time: 0.194; rss: 1095MB        loop checking
time: 0.102; rss: 1095MB        static item recursion checking
time: 1.601; rss: 1110MB        compute_incremental_hashes_map
time: 0.000; rss: 1110MB        load_dep_graph
time: 0.248; rss: 1110MB        stability index
time: 0.437; rss: 1123MB        stability checking
time: 1.905; rss: 1292MB        type collecting
time: 0.122; rss: 1293MB        variance inference
time: 0.236; rss: 1307MB        impl wf inference
time: 3.658; rss: 1328MB        coherence checking
time: 8.390; rss: 1419MB        wf checking
time: 5.613; rss: 1477MB        item-types checking
time: 48.962; rss: 2098MB       item-bodies checking
time: 3.430; rss: 2164MB        const checking
time: 2.696; rss: 2174MB        privacy checking
time: 0.700; rss: 2174MB        intrinsic checking
time: 0.257; rss: 2174MB        effect checking
time: 0.880; rss: 2175MB        match checking
time: 0.513; rss: 2175MB        liveness checking
time: 3.851; rss: 2589MB        MIR dump
time: 0.499; rss: 2645MB      SimplifyCfg
time: 1.689; rss: 2645MB      TypeckMir
time: 1.294; rss: 2648MB      QualifyAndPromoteConstants
time: 0.087; rss: 2648MB      SimplifyBranches
time: 0.344; rss: 2648MB      SimplifyCfg
time: 3.914; rss: 2648MB        MIR cleanup and validation
time: 5.023; rss: 2655MB        borrow checking
time: 0.471; rss: 2662MB        reachability checking
time: 1.121; rss: 2662MB        death checking
time: 0.000; rss: 2662MB        unused lib feature checking
time: 3.540; rss: 2662MB        lint checking
time: 0.000; rss: 2662MB        resolving dependency formats
time: 0.085; rss: 2662MB      NoLandingPads
time: 0.349; rss: 2662MB      SimplifyCfg
time: 0.579; rss: 2677MB      EraseRegions
time: 0.168; rss: 2677MB      AddCallGuards
time: 1.869; rss: 2679MB      ElaborateDrops
time: 0.084; rss: 2679MB      NoLandingPads
time: 0.388; rss: 2679MB      SimplifyCfg
time: 0.000; rss: 2679MB      Inline
time: 0.307; rss: 2679MB      InstCombine
time: 0.533; rss: 2679MB      Deaggregator
time: 0.100; rss: 2679MB      CopyPropagation
time: 0.357; rss: 2681MB      SimplifyLocals
time: 0.168; rss: 2681MB      AddCallGuards
time: 0.093; rss: 2681MB      PreTrans
time: 5.084; rss: 2681MB        MIR optimisations
time: 2.501; rss: 2802MB      write metadata
time: 0.991; rss: 2831MB      translation item collection
time: 0.115; rss: 2873MB      codegen unit partitioning
time: 0.134; rss: 2866MB      internalize symbols
time: 6.013; rss: 2836MB        translation
time: 0.000; rss: 2836MB        assert dep graph
time: 0.000; rss: 2836MB        serialize dep graph
time: 0.204; rss: 357MB       llvm function passes [0]
time: 1.705; rss: 361MB       llvm module passes [0]
time: 1.634; rss: 371MB       codegen passes [0]
time: 0.002; rss: 371MB       codegen passes [0]
time: 3.600; rss: 371MB LLVM passes
time: 0.000; rss: 371MB serialize work products
time: 0.639; rss: 358MB linking
 Finished release [optimized] target(s) in 137.31 secs

rustc 1.20.0-nightly (d84693b 2017-07-09): 262.64 secs, 3346 MB peak memory

Compiling winrt v0.2.1 (file:///D:/Code/Rust/winrt-rust)
time: 2.448; rss: 449MB parsing
time: 0.000; rss: 449MB recursion limit
time: 0.000; rss: 449MB crate injection
time: 0.000; rss: 449MB plugin loading
time: 0.000; rss: 449MB plugin registration
time: 12.485; rss: 782MB        expansion
time: 0.000; rss: 782MB maybe building test harness
time: 0.142; rss: 782MB maybe creating a macro crate
time: 0.593; rss: 782MB creating allocators
time: 0.000; rss: 782MB checking for inline asm in case the target doesn't support it
time: 0.602; rss: 782MB early lint checks
time: 0.162; rss: 782MB AST validation
time: 2.673; rss: 982MB name resolution
time: 0.825; rss: 982MB complete gated feature checking
time: 2.580; rss: 1538MB        lowering ast -> hir
time: 0.728; rss: 1391MB        indexing hir
time: 0.166; rss: 1391MB        attribute checking
time: 0.105; rss: 964MB language item collection
time: 0.536; rss: 971MB lifetime resolution
time: 0.000; rss: 971MB looking for entry point
time: 0.005; rss: 971MB looking for plugin registrar
time: 0.228; rss: 971MB loop checking
time: 0.124; rss: 971MB static item recursion checking
time: 1.963; rss: 986MB compute_incremental_hashes_map
time: 0.000; rss: 986MB load_dep_graph
time: 0.271; rss: 987MB stability index
time: 0.490; rss: 1000MB        stability checking
time: 2.389; rss: 1163MB        type collecting
time: 0.327; rss: 1178MB        impl wf inference
time: 5.498; rss: 1195MB        coherence checking
time: 0.042; rss: 1195MB        variance testing
time: 96.245; rss: 1580MB       wf checking
time: 6.401; rss: 1634MB        item-types checking
time: 52.552; rss: 2229MB       item-bodies checking
time: 9.095; rss: 2350MB        const checking
time: 4.262; rss: 2360MB        privacy checking
time: 0.352; rss: 2363MB        intrinsic checking
time: 0.364; rss: 2363MB        effect checking
time: 0.638; rss: 2363MB        match checking
time: 0.607; rss: 2363MB        liveness checking
time: 15.272; rss: 3014MB       borrow checking
time: 0.499; rss: 3022MB        reachability checking
time: 1.118; rss: 3023MB        death checking
time: 0.000; rss: 3023MB        unused lib feature checking
time: 6.384; rss: 3023MB        lint checking
time: 0.000; rss: 3023MB        resolving dependency formats
time: 21.937; rss: 3226MB     write metadata
time: 1.238; rss: 3254MB      translation item collection
time: 0.119; rss: 3309MB      codegen unit partitioning
time: 0.187; rss: 3346MB      internalize symbols
time: 26.257; rss: 3338MB       translation
time: 0.000; rss: 3338MB        assert dep graph
time: 0.000; rss: 3338MB        serialize dep graph
time: 0.003; rss: 430MB       codegen passes [0]
time: 0.226; rss: 432MB       llvm function passes [1]
time: 1.994; rss: 436MB       llvm module passes [1]
time: 1.853; rss: 445MB       codegen passes [1]
time: 4.135; rss: 445MB LLVM passes
time: 0.000; rss: 445MB serialize work products
time: 0.597; rss: 431MB linking
 Finished release [optimized] target(s) in 262.64 secs

winrt-rust certainly is a rather exotic crate in that it mainly consists of a > 200k LOC generated file that wraps Windows COM APIs, but something seems to be wrong here, especially in the wf checking pass, which went from 8.4 seconds to 96.2 seconds.

To reproduce the numbers, you can run cargo rustc --release --features all -- -Z time-passes (--features all is important, otherwise only part of the crate is being built).

The text was updated successfully, but these errors were encountered:

Mark-Simulacrum · 2017-07-10T11:48:19Z

cc @rust-lang/compiler -- Possibly just the standard regression currently caused by MIR borrowck and allocators being introduced, but somewhat seems different.

eddyb · 2017-07-10T12:41:20Z

For WF checking I'd ask @nikomatsakis and @arielb1 specifically. Sounds like a new check was added, or an existing one became much slower.

Boddlnagg · 2017-07-12T09:18:53Z

I just tested on beta (rustc 1.19.0-beta.3) without -Z time-passes (which is no longer allowed) and it took 246.59 secs. So the major regression seems to have been introduced between 1.18 and 1.19.

nikomatsakis · 2017-07-12T20:23:18Z

@Boddlnagg any chance you can run this in some sort of other profiler? I presume it only builds on Windows, right?

nikomatsakis · 2017-07-12T20:24:44Z

@Boddlnagg another awesome thing -- if you are able to do it -- would be bisecting with nightly releases to try and narrow this down a bit more. I don't have easy access to a windows machine, but I'll try to find someone who does also.

Mark-Simulacrum · 2017-07-12T21:20:03Z

Bisection narrowed to 75b0568...14f30da61(commits between nightly-2017-05-16 and nightly-2017-05-21. It seems likely this means #41911 is at fault here, cc @michaelwoerister. No guarantees though. cargo check works to reproduce, so anyone should be able to test this.

nikomatsakis · 2017-07-12T21:25:01Z

Ah, #41911 could certainly have done it, though the reason is not yet obvious.

brson · 2017-07-13T16:22:41Z

Seems like this won't be fixed for 1.19, but it looks pretty bad.

Mark-Simulacrum · 2017-07-27T16:30:51Z

@rust-lang/compiler -- this looks like it needs a look from someone to investigate the cause here.

nikomatsakis · 2017-07-27T20:28:57Z

Assigning to @arielb1 to do profiling before/after #41911.

arielb1 · 2017-08-07T19:30:37Z

So after one possibly-unneeded reboot into Windows, it turns out that... drumroll

#41911 broke is_relevant_trait_impl:
https://github.com/rust-lang/rust/pull/41911/files#diff-6bfb45b8d6ccab4d1e18e623dee893faR192

+    let relevant: Vec<DefId> = all_trait_impls
+        .non_blanket_impls
+        .iter()
+        .cloned()
+        .filter(|&impl_def_id| {
+            let impl_self_ty = tcx.type_of(impl_def_id);
+            let impl_simple_self_ty = fast_reject::simplify_type(tcx,
+                                                                 impl_self_ty,
+                                                                 false).unwrap();
+            impl_simple_self_ty == self_ty
+        })
+        .collect();

eddyb · 2017-08-07T20:09:34Z

@arielb1 Ahhh I wish we had a good testcase back then.

michaelwoerister · 2017-08-07T20:35:46Z

@eddyb I thought you were talking about sharing a single blanket_impls vec between all results for the same trait back then.

eddyb · 2017-08-07T20:47:39Z

@michaelwoerister Maybe I noticed one problem, but clearly not all of them.

Mark-Simulacrum · 2017-08-08T01:13:15Z

@arielb1 Responding to your ping on IRC, yes, I was able to reproduce via cargo check (just had to pass --target the-relevant-windows-triple).

A change in rust-lang#41911 had made `for_all_relevant_impls` do a linear scan over all impls, instead of using an HashMap. Use an HashMap again to avoid quadratic blowup when there is a large number of structs with impls. I think this fixes rust-lang#43141 completely, but I want better measurements in order to be sure. As a perf patch, please don't roll this up.

@eddyb

make `for_all_relevant_impls` O(1) again A change in #41911 had made `for_all_relevant_impls` do a linear scan over all impls, instead of using an HashMap. Use an HashMap again to avoid quadratic blowup when there is a large number of structs with impls. I think this fixes #43141 completely, but I want better measurements in order to be sure. As a perf patch, please don't roll this up. r? @eddyb beta-nominating because regression

A change in rust-lang#41911 had made `for_all_relevant_impls` do a linear scan over all impls, instead of using an HashMap. Use an HashMap again to avoid quadratic blowup when there is a large number of structs with impls. I think this fixes rust-lang#43141 completely, but I want better measurements in order to be sure. As a perf patch, please don't roll this up.

Mark-Simulacrum added I-compiletime Issue: Problems and improvements with respect to compile times. regression-from-stable-to-nightly Performance or correctness regression from stable to nightly. labels Jul 10, 2017

Mark-Simulacrum added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Jul 10, 2017

eddyb added regression-from-stable-to-beta Performance or correctness regression from stable to beta. and removed regression-from-stable-to-nightly Performance or correctness regression from stable to nightly. labels Jul 12, 2017

glandium mentioned this issue Jul 13, 2017

Firefox compile time regression #43211

Closed

Mark-Simulacrum mentioned this issue Jul 13, 2017

Track wf checking better rust-lang-deprecated/rustc-perf-collector#6

Closed

brson added I-nominated P-high High priority labels Jul 13, 2017

alexcrichton added regression-from-stable-to-stable Performance or correctness regression from one stable version to another. and removed regression-from-stable-to-beta Performance or correctness regression from stable to beta. labels Jul 23, 2017

nikomatsakis assigned arielb1 Jul 27, 2017

nikomatsakis removed the I-nominated label Jul 27, 2017

Mark-Simulacrum added the C-enhancement Category: An issue proposing an enhancement or a PR with one. label Jul 28, 2017

arielb1 mentioned this issue Aug 7, 2017

make for_all_relevant_impls O(1) again #43723

Merged

bors closed this as completed in #43723 Aug 8, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Compile time regression #43141

Compile time regression #43141

Boddlnagg commented Jul 10, 2017

Mark-Simulacrum commented Jul 10, 2017

eddyb commented Jul 10, 2017

Boddlnagg commented Jul 12, 2017

nikomatsakis commented Jul 12, 2017

nikomatsakis commented Jul 12, 2017

Mark-Simulacrum commented Jul 12, 2017

nikomatsakis commented Jul 12, 2017

brson commented Jul 13, 2017

Mark-Simulacrum commented Jul 27, 2017

nikomatsakis commented Jul 27, 2017

arielb1 commented Aug 7, 2017

eddyb commented Aug 7, 2017

michaelwoerister commented Aug 7, 2017

eddyb commented Aug 7, 2017

Mark-Simulacrum commented Aug 8, 2017

Compile time regression #43141

Compile time regression #43141

Comments

Boddlnagg commented Jul 10, 2017

Mark-Simulacrum commented Jul 10, 2017

eddyb commented Jul 10, 2017

Boddlnagg commented Jul 12, 2017

nikomatsakis commented Jul 12, 2017

nikomatsakis commented Jul 12, 2017

Mark-Simulacrum commented Jul 12, 2017

nikomatsakis commented Jul 12, 2017

brson commented Jul 13, 2017

Mark-Simulacrum commented Jul 27, 2017

nikomatsakis commented Jul 27, 2017

arielb1 commented Aug 7, 2017

eddyb commented Aug 7, 2017

michaelwoerister commented Aug 7, 2017

eddyb commented Aug 7, 2017

Mark-Simulacrum commented Aug 8, 2017