Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MIR CopyPropagation is slow #36673

Open
chrivers opened this Issue Sep 23, 2016 · 36 comments

Comments

Projects
None yet
9 participants
@chrivers
Copy link

chrivers commented Sep 23, 2016

Hello everyone

I've tried searching for a github issue that reports this, but I couldn't find anything, so I hope this isn't a duplicate. I'm on nightly, since my project uses type macros. Overall, there is a good amount of macro expansion going on, since I build parsers for binary data based on structs.

Compilation time for a 5500 LOC project has been around 7-9 seconds on my Core i7-4770, going a bit up and down with changes in the project, and changes in nightly.

However, today this changed very drastically:

[chrivers@helios]~/git/artemis/oxide #cargo build
Compiling oxide v0.1.0 (file:///home/chrivers/git/artemis/oxide)
Finished debug [unoptimized + debuginfo] target(s) in 21.94 secs

This is more than twice as slow as before! Something pretty significant must have happened.

This version is definitely affected:
rustc 1.13.0-nightly (4f9812a59 2016-09-21)

Honestly, I'm not quite sure when it worked, but I can say that the version I used yesterday, which was probably no more than 1 week old, worked much better.

@chrivers

This comment has been minimized.

Copy link
Author

chrivers commented Sep 23, 2016

I figured out how to install older nightly versions with rustup - I'm going to try searching for the problem

@jonas-schievink

This comment has been minimized.

Copy link
Member

jonas-schievink commented Sep 23, 2016

Is the code available somewhere (or a smaller reproduction of the bug)? Can you try running with cargo rustc -Ztime-passes?

@chrivers

This comment has been minimized.

Copy link
Author

chrivers commented Sep 23, 2016

I can definitely send the code in private, but unfortunately I can't make it publicly available at this point. I'd be more than happy to send you a targz (or some other format, if you prefer)

I'll try a few different versions to see where it happens, and measure time-passes of the affected version(s)

@chrivers

This comment has been minimized.

Copy link
Author

chrivers commented Sep 23, 2016

e9bc1bac8 2016-08-24: 16.27 # rustc --version
2c01bb885 2016-08-31: 15.88 # rustc --version
f1f40f850 2016-09-09: 15.33 # rustc --version
9399229 2016-09-14: 24.39 # cargo --version (by mistake)
4f9812a59 2016-09-21: 23.23 # rustc --version

And here's the timing pass on the last version:

[chrivers@helios]~/git/artemis/oxide #cargo rustc --bin oxide -- -Ztime-passes -Ztime-llvm-passes                  [pts/0 # 11945H 1J 0R # 2016-09-23 18:27:25 # 0+08:11:05 # 0.65 1.06 0.86]
   Compiling oxide v0.1.0 (file:///home/chrivers/git/artemis/oxide)
warning: this feature has been stable since 1.13.0. Attribute no longer needed, #[warn(stable_features)] on by default
 --> src/lib.rs:2:12
  |
2 | #![feature(type_macros)]
  |            ^^^^^^^^^^^

time: 0.002; rss: 49MB  parsing
time: 0.000; rss: 49MB  recursion limit
time: 0.000; rss: 49MB  crate injection
time: 0.000; rss: 49MB  plugin loading
time: 0.000; rss: 49MB  plugin registration
time: 0.024; rss: 74MB  expansion
time: 0.000; rss: 74MB  maybe building test harness
time: 0.000; rss: 74MB  maybe creating a macro crate
time: 0.000; rss: 74MB  checking for inline asm in case the target doesn't support it
time: 0.000; rss: 74MB  complete gated feature checking
time: 0.000; rss: 74MB  collecting defs
time: 0.028; rss: 93MB  external crate/lib resolution
time: 0.000; rss: 93MB  early lint checks
time: 0.000; rss: 93MB  AST validation
time: 0.002; rss: 93MB  name resolution
time: 0.000; rss: 94MB  lowering ast -> hir
time: 0.000; rss: 94MB  indexing hir
time: 0.000; rss: 94MB  attribute checking
time: 0.000; rss: 94MB  language item collection
time: 0.000; rss: 94MB  lifetime resolution
time: 0.000; rss: 94MB  looking for entry point
time: 0.000; rss: 94MB  looking for plugin registrar
time: 0.000; rss: 94MB  region resolution
time: 0.000; rss: 94MB  loop checking
time: 0.000; rss: 94MB  static item recursion checking
time: 0.002; rss: 94MB  compute_incremental_hashes_map
time: 0.000; rss: 94MB  load_dep_graph
time: 0.003; rss: 96MB  type collecting
time: 0.000; rss: 96MB  variance inference
time: 0.007; rss: 96MB  coherence checking
time: 0.009; rss: 98MB  wf checking
time: 0.002; rss: 98MB  item-types checking
time: 0.056; rss: 104MB item-bodies checking
time: 0.000; rss: 104MB drop-impl checking
time: 0.002; rss: 104MB const checking
time: 0.000; rss: 104MB privacy checking
time: 0.000; rss: 104MB stability index
time: 0.000; rss: 104MB intrinsic checking
time: 0.000; rss: 104MB effect checking
time: 0.000; rss: 104MB match checking
time: 0.000; rss: 105MB liveness checking
time: 0.001; rss: 105MB rvalue checking
time: 0.003; rss: 106MB MIR dump
  time: 0.000; rss: 106MB       SimplifyCfg
  time: 0.001; rss: 106MB       QualifyAndPromoteConstants
  time: 0.001; rss: 106MB       TypeckMir
  time: 0.000; rss: 106MB       SimplifyBranches
  time: 0.000; rss: 106MB       SimplifyCfg
time: 0.002; rss: 106MB MIR passes
time: 0.005; rss: 108MB borrow checking
time: 0.000; rss: 108MB reachability checking
time: 0.000; rss: 108MB death checking
time: 0.000; rss: 108MB stability checking
time: 0.000; rss: 108MB unused lib feature checking
time: 0.006; rss: 108MB lint checking
time: 0.004; rss: 108MB resolving dependency formats
  time: 0.000; rss: 108MB       NoLandingPads
  time: 0.000; rss: 108MB       SimplifyCfg
  time: 0.001; rss: 108MB       EraseRegions
  time: 0.000; rss: 108MB       AddCallGuards
  time: 0.001; rss: 108MB       ElaborateDrops
  time: 0.000; rss: 108MB       NoLandingPads
  time: 0.000; rss: 108MB       SimplifyCfg
  time: 0.000; rss: 108MB       InstCombine
  time: 0.000; rss: 108MB       Deaggregator
  time: 0.005; rss: 108MB       CopyPropagation
  time: 0.000; rss: 108MB       AddCallGuards
  time: 0.000; rss: 108MB       PreTrans
time: 0.008; rss: 108MB Prepare MIR codegen passes
  time: 0.000; rss: 108MB       write metadata
  time: 0.045; rss: 114MB       translation item collection
  time: 0.004; rss: 114MB       codegen unit partitioning
  time: 0.004; rss: 127MB       internalize symbols
time: 0.290; rss: 127MB translation
time: 0.000; rss: 127MB assert dep graph
time: 0.000; rss: 127MB serialize dep graph
  time: 0.026; rss: 115MB       llvm function passes [0]
  time: 0.010; rss: 116MB       llvm module passes [0]
  time: 0.583; rss: 120MB       codegen passes [0]
  time: 0.000; rss: 120MB       codegen passes [0]
===-------------------------------------------------------------------------===
                      Instruction Selection and Scheduling
===-------------------------------------------------------------------------===
  Total Execution Time: 0.1240 seconds (0.1095 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   0.0200 ( 18.5%)   0.0000 (  0.0%)   0.0200 ( 16.1%)   0.0304 ( 27.8%)  Instruction Selection
   0.0160 ( 14.8%)   0.0040 ( 25.0%)   0.0200 ( 16.1%)   0.0185 ( 16.9%)  Instruction Scheduling
   0.0160 ( 14.8%)   0.0080 ( 50.0%)   0.0240 ( 19.4%)   0.0155 ( 14.1%)  DAG Combining 1
   0.0120 ( 11.1%)   0.0000 (  0.0%)   0.0120 (  9.7%)   0.0104 (  9.5%)  DAG Combining 2
   0.0160 ( 14.8%)   0.0000 (  0.0%)   0.0160 ( 12.9%)   0.0104 (  9.5%)  Instruction Creation
   0.0120 ( 11.1%)   0.0040 ( 25.0%)   0.0160 ( 12.9%)   0.0093 (  8.5%)  DAG Legalization
   0.0080 (  7.4%)   0.0000 (  0.0%)   0.0080 (  6.5%)   0.0084 (  7.7%)  Type Legalization
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0033 (  3.0%)  DAG Combining after legalize types
   0.0080 (  7.4%)   0.0000 (  0.0%)   0.0080 (  6.5%)   0.0017 (  1.6%)  Vector Legalization
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0016 (  1.4%)  Instruction Scheduling Cleanup
   0.1080 (100.0%)   0.0160 (100.0%)   0.1240 (100.0%)   0.1095 (100.0%)  Total

===-------------------------------------------------------------------------===
                                 DWARF Emission
===-------------------------------------------------------------------------===
  Total Execution Time: 0.0360 seconds (0.0622 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   0.0160 ( 57.1%)   0.0080 (100.0%)   0.0240 ( 66.7%)   0.0423 ( 68.0%)  Debug Info Emission
   0.0120 ( 42.9%)   0.0000 (  0.0%)   0.0120 ( 33.3%)   0.0194 ( 31.2%)  DWARF Exception Writer
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0005 (  0.8%)  DWARF Debug Writer
   0.0280 (100.0%)   0.0080 (100.0%)   0.0360 (100.0%)   0.0622 (100.0%)  Total

===-------------------------------------------------------------------------===
                      ... Pass execution timing report ...
===-------------------------------------------------------------------------===
  Total Execution Time: 0.5400 seconds (0.5185 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   0.2360 ( 50.4%)   0.0280 ( 38.9%)   0.2640 ( 48.9%)   0.2088 ( 40.3%)  X86 DAG->DAG Instruction Selection
   0.0760 ( 16.2%)   0.0200 ( 27.8%)   0.0960 ( 17.8%)   0.1273 ( 24.5%)  X86 Assembly / Object Emitter
   0.0200 (  4.3%)   0.0000 (  0.0%)   0.0200 (  3.7%)   0.0294 (  5.7%)  Module Verifier
   0.0280 (  6.0%)   0.0000 (  0.0%)   0.0280 (  5.2%)   0.0245 (  4.7%)  Module Verifier
   0.0240 (  5.1%)   0.0040 (  5.6%)   0.0280 (  5.2%)   0.0228 (  4.4%)  Module Verifier
   0.0040 (  0.9%)   0.0000 (  0.0%)   0.0040 (  0.7%)   0.0147 (  2.8%)  Prologue/Epilogue Insertion & Frame Finalization
   0.0160 (  3.4%)   0.0000 (  0.0%)   0.0160 (  3.0%)   0.0129 (  2.5%)  Fast Register Allocator
   0.0120 (  2.6%)   0.0000 (  0.0%)   0.0120 (  2.2%)   0.0082 (  1.6%)  Machine Function Analysis
   0.0040 (  0.9%)   0.0000 (  0.0%)   0.0040 (  0.7%)   0.0073 (  1.4%)  Inliner for always_inline functions
   0.0160 (  3.4%)   0.0000 (  0.0%)   0.0160 (  3.0%)   0.0070 (  1.4%)  Insert stack protectors
   0.0040 (  0.9%)   0.0000 (  0.0%)   0.0040 (  0.7%)   0.0065 (  1.3%)  Live DEBUG_VALUE analysis
   0.0080 (  1.7%)   0.0040 (  5.6%)   0.0120 (  2.2%)   0.0049 (  0.9%)  Dominator Tree Construction
   0.0000 (  0.0%)   0.0080 ( 11.1%)   0.0080 (  1.5%)   0.0046 (  0.9%)  Two-Address instruction pass
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0038 (  0.7%)  Scalar Evolution Analysis
   0.0040 (  0.9%)   0.0000 (  0.0%)   0.0040 (  0.7%)   0.0035 (  0.7%)  Natural Loop Information
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0033 (  0.6%)  Dominator Tree Construction
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0029 (  0.6%)  Dominator Tree Construction
   0.0040 (  0.9%)   0.0000 (  0.0%)   0.0040 (  0.7%)   0.0027 (  0.5%)  Function Alias Analysis Results
   0.0000 (  0.0%)   0.0040 (  5.6%)   0.0040 (  0.7%)   0.0026 (  0.5%)  Dominator Tree Construction
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0024 (  0.5%)  Expand Atomic instructions
   0.0080 (  1.7%)   0.0000 (  0.0%)   0.0080 (  1.5%)   0.0022 (  0.4%)  CallGraph Construction
   0.0040 (  0.9%)   0.0000 (  0.0%)   0.0040 (  0.7%)   0.0019 (  0.4%)  Exception handling preparation
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0014 (  0.3%)  Post-RA pseudo instruction expansion pass
   0.0000 (  0.0%)   0.0040 (  5.6%)   0.0040 (  0.7%)   0.0012 (  0.2%)  Remove unreachable blocks from the CFG
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0012 (  0.2%)  X86 pseudo instruction expansion pass
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0011 (  0.2%)  Basic Alias Analysis (stateless AA impl)
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0011 (  0.2%)  Bundle Machine CFG Edges
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0009 (  0.2%)  Eliminate PHI nodes for register allocation
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0007 (  0.1%)  Expand ISel Pseudo-instructions
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0006 (  0.1%)  Insert XRay ops
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0005 (  0.1%)  Contiguously Lay Out Funclets
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0005 (  0.1%)  X86 FP Stackifier
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0005 (  0.1%)  Implement the 'patchable-function' attribute
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0005 (  0.1%)  X86 PIC Global Base Reg Initialization
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0005 (  0.1%)  Local Stack Slot Allocation
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0004 (  0.1%)  X86 WinAlloca Expander
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0004 (  0.1%)  Analyze Machine Code For Garbage Collection
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0004 (  0.1%)  StackMap Liveness Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0004 (  0.1%)  X86 vzeroupper inserter
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0004 (  0.1%)  Safe Stack instrumentation pass
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0004 (  0.1%)  Shadow Stack GC Lowering
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0004 (  0.1%)  Lower Garbage Collection Instructions
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0002 (  0.0%)  Create Garbage Collector Module Metadata
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)  Assumption Cache Tracker
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)  Pre-ISel Intrinsic Lowering
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Assumption Cache Tracker
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Target Library Information
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Target Library Information
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Rewrite Symbols
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Create Garbage Collector Module Metadata
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Assumption Cache Tracker
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Force set function attributes
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Type-Based Alias Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Target Transform Information
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Target Transform Information
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Scoped NoAlias Alias Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Scoped NoAlias Alias Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Rewrite Symbols
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Profile summary info
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Pre-ISel Intrinsic Lowering
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Machine Module Information
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Machine Module Information
   0.4680 (100.0%)   0.0720 (100.0%)   0.5400 (100.0%)   0.5185 (100.0%)  Total

time: 0.622; rss: 120MB LLVM passes
time: 0.000; rss: 120MB serialize work products
  time: 0.456; rss: 109MB       running linker
time: 0.458; rss: 109MB linking
    Finished debug [unoptimized + debuginfo] target(s) in 21.80 secs

I don't quite understand the output.. it seems like the timing pass describes almost none of the walltime, and yet rustc is pegged at 100% CPU for about 22 seconds before it even gets to the first line of the time output. I don't know if this is because it's summarized only after completion?

Otherwise, something weird seems to be going on..

@chrivers

This comment has been minimized.

Copy link
Author

chrivers commented Sep 23, 2016

Here's something else interesting:

[chrivers@helios]~/git/artemis/oxide #cargo rustc --bin oxide -- -Zparse-only
   Compiling oxide v0.1.0 (file:///home/chrivers/git/artemis/oxide)
warning: this feature has been stable since 1.13.0. Attribute no longer needed, #[warn(stable_features)] on by default
 --> src/lib.rs:2:12
  |
2 | #![feature(type_macros)]
  |            ^^^^^^^^^^^

    Finished debug [unoptimized + debuginfo] target(s) in 20.63 secs

This means about 94% of the time is spent on the parsing step (or the steps leading up to it).

I'm guessing this means my macros are too heavy?

@jonas-schievink

This comment has been minimized.

Copy link
Member

jonas-schievink commented Sep 23, 2016

cargo rustc only passes the flags to the last crate compiled, which is a binary in your case. Try cargo rustc --lib -- -Ztime-passes.

@chrivers

This comment has been minimized.

Copy link
Author

chrivers commented Sep 23, 2016

Oops, you're quite right. This is that:

[chrivers@helios]~/git/artemis/oxide #cargo rustc --lib -- -Ztime-passes
   Compiling oxide v0.1.0 (file:///home/chrivers/git/artemis/oxide)
time: 0.014; rss: 53MB  parsing
time: 0.000; rss: 53MB  recursion limit
time: 0.000; rss: 53MB  crate injection
time: 0.000; rss: 53MB  plugin loading
time: 0.000; rss: 53MB  plugin registration
time: 0.587; rss: 136MB expansion
time: 0.000; rss: 136MB maybe building test harness
time: 0.005; rss: 136MB maybe creating a macro crate
time: 0.000; rss: 136MB checking for inline asm in case the target doesn't support it
time: 0.009; rss: 136MB complete gated feature checking
time: 0.032; rss: 138MB collecting defs
time: 0.017; rss: 142MB external crate/lib resolution
time: 0.022; rss: 142MB early lint checks
time: 0.007; rss: 142MB AST validation
time: 0.074; rss: 159MB name resolution
time: 0.042; rss: 187MB lowering ast -> hir
time: 0.009; rss: 192MB indexing hir
time: 0.006; rss: 192MB attribute checking
time: 0.005; rss: 147MB language item collection
time: 0.008; rss: 147MB lifetime resolution
time: 0.000; rss: 147MB looking for entry point
time: 0.000; rss: 147MB looking for plugin registrar
time: 0.037; rss: 157MB region resolution
time: 0.005; rss: 157MB loop checking
time: 0.005; rss: 157MB static item recursion checking
time: 0.221; rss: 159MB compute_incremental_hashes_map
time: 0.000; rss: 159MB load_dep_graph
time: 0.020; rss: 161MB type collecting
time: 0.001; rss: 161MB variance inference
time: 0.040; rss: 167MB coherence checking
time: 0.044; rss: 168MB wf checking
time: 0.144; rss: 171MB item-types checking
time: 2.333; rss: 204MB item-bodies checking
time: 0.000; rss: 204MB drop-impl checking
time: 0.157; rss: 206MB const checking
time: 0.039; rss: 206MB privacy checking
time: 0.006; rss: 206MB stability index
time: 0.018; rss: 206MB intrinsic checking
time: 0.017; rss: 206MB effect checking
time: 0.038; rss: 206MB match checking
time: 0.149; rss: 205MB liveness checking
time: 0.116; rss: 205MB rvalue checking
time: 0.277; rss: 320MB MIR dump
  time: 0.042; rss: 331MB       SimplifyCfg
  time: 0.055; rss: 331MB       QualifyAndPromoteConstants
  time: 0.110; rss: 332MB       TypeckMir
  time: 0.003; rss: 332MB       SimplifyBranches
  time: 0.026; rss: 332MB       SimplifyCfg
time: 0.235; rss: 332MB MIR passes
time: 1.213; rss: 349MB borrow checking
time: 0.001; rss: 349MB reachability checking
time: 0.033; rss: 349MB death checking
time: 0.030; rss: 349MB stability checking
time: 0.000; rss: 349MB unused lib feature checking
warning: this feature has been stable since 1.13.0. Attribute no longer needed, #[warn(stable_features)] on by default
 --> src/lib.rs:2:12
  |
2 | #![feature(type_macros)]
  |            ^^^^^^^^^^^

time: 0.171; rss: 349MB lint checking
time: 0.000; rss: 349MB resolving dependency formats
  time: 0.002; rss: 349MB       NoLandingPads
  time: 0.017; rss: 349MB       SimplifyCfg
  time: 0.061; rss: 349MB       EraseRegions
  time: 0.008; rss: 349MB       AddCallGuards
  time: 1.188; rss: 377MB       ElaborateDrops
  time: 0.002; rss: 377MB       NoLandingPads
  time: 0.034; rss: 348MB       SimplifyCfg
  time: 0.023; rss: 348MB       InstCombine
  time: 0.007; rss: 348MB       Deaggregator
  time: 6.139; rss: 352MB       CopyPropagation
  time: 0.007; rss: 352MB       AddCallGuards
  time: 0.002; rss: 352MB       PreTrans
time: 7.490; rss: 352MB Prepare MIR codegen passes
  time: 0.049; rss: 353MB       write metadata
  time: 0.139; rss: 356MB       translation item collection
  time: 0.015; rss: 359MB       codegen unit partitioning
  time: 0.007; rss: 417MB       internalize symbols
time: 1.690; rss: 318MB translation
time: 0.000; rss: 318MB assert dep graph
time: 0.000; rss: 318MB serialize dep graph
  time: 0.196; rss: 185MB       llvm function passes [0]
  time: 0.095; rss: 189MB       llvm module passes [0]
  time: 3.877; rss: 194MB       codegen passes [0]
  time: 0.001; rss: 194MB       codegen passes [0]
time: 4.401; rss: 194MB LLVM passes
time: 0.000; rss: 194MB serialize work products
time: 0.239; rss: 112MB linking
    Finished debug [unoptimized + debuginfo] target(s) in 20.18 secs
@jonas-schievink

This comment has been minimized.

Copy link
Member

jonas-schievink commented Sep 23, 2016

time: 6.139; rss: 352MB CopyPropagation

cc @pcwalton

@chrivers

This comment has been minimized.

Copy link
Author

chrivers commented Sep 26, 2016

Is there anything I can do to help?

@nikomatsakis

This comment has been minimized.

Copy link
Contributor

nikomatsakis commented Sep 26, 2016

Nominated to discuss priority. Marked as regression.

@pnkfelix

This comment has been minimized.

Copy link
Member

pnkfelix commented Oct 3, 2016

@chrivers You said that you were not willing to post the code publicly, but that you are willing to send it via some private channel?

If you have some reduced test case to share, are you willing to share it with all the members of the rust compiler team? Or is there some non-disclosure agreement issues in play?

@chrivers

This comment has been minimized.

Copy link
Author

chrivers commented Oct 3, 2016

I haven't really been able to figure out what causes this, but I believe it could be because I'm building a large-ish structure in memory. I actually ended up creating a template compiler just for this (kind of) use case:

https://github.com/chrivers/transwarp

It's nothing super secret at all, and sharing it with the rust language team is certainly fine. Just as long as it doesn't hit google, that's no problem :)

There are some licensing terms I have to clear, before putting it on github. But nothing that prevents me from sharing it with a closed group of experts, certainly.

@pnkfelix can you email me? I can send you a tarball

@nikomatsakis

This comment has been minimized.

Copy link
Contributor

nikomatsakis commented Oct 6, 2016

Nominating to discuss how to fix this in compiler team mtg.

@brson

This comment has been minimized.

Copy link
Contributor

brson commented Oct 6, 2016

Please assign someone.

@nikomatsakis

This comment has been minimized.

Copy link
Contributor

nikomatsakis commented Oct 6, 2016

I think we should probably comment out the copy-propagation fix at least for beta:

  • I believe the pass doesn't take aliasing or borrowing into account
  • The fix for this is likely to be somewhat involved

But we should separately be moving on improving the performance/correctness of CopyPropagation, which is definitely an important thing to have.

@arielb1

This comment has been minimized.

Copy link
Contributor

arielb1 commented Oct 6, 2016

CopyPropagation bails out instantly on the first sign of borrowing, so that shouldn't be a problem.

@nikomatsakis

This comment has been minimized.

Copy link
Contributor

nikomatsakis commented Oct 6, 2016

@arielb1 ah ok I was under the impression it did not do that yet. I have to go re-read the code.

@nikomatsakis

This comment has been minimized.

Copy link
Contributor

nikomatsakis commented Oct 6, 2016

Conclusion from meeting: Proper fix then for nightly would be to try to cache work here, but for the beta branch an easy short-term fix is to disable copy-prop. I'll assign myself the beta PR.

@nikomatsakis nikomatsakis self-assigned this Oct 6, 2016

@chrivers

This comment has been minimized.

Copy link
Author

chrivers commented Oct 7, 2016

@pnkfelix I've sent you and @nikomatsakis a source tarball which reproduces the issue, and I've mentioned the exact version of at least one rustc version with this problem in the mail.

I'm very interested in any updates - and let me know if I can do anything else :)

@brson

This comment has been minimized.

Copy link
Contributor

brson commented Oct 20, 2016

@nikomatsakis @pnkfelix still waiting on a fix to beta. There's about 2 weeks left to get it in.

@pnkfelix

This comment has been minimized.

Copy link
Member

pnkfelix commented Oct 20, 2016

I have been in private communication with @chrivers ; I was able to replicate the regression injected between 9/13 and 9/21, it appears (based on my informal measurements) to have was resolved sometime between 9/22 and 9/27.

@pnkfelix

This comment has been minimized.

Copy link
Member

pnkfelix commented Oct 20, 2016

(I want to verify that @chrivers is observing the same outcome on their end, but have not yet heard back.)

@nikomatsakis

This comment has been minimized.

Copy link
Contributor

nikomatsakis commented Oct 20, 2016

Compiler team meeting: waiting to hear back from @chrivers before we decide whether to close or what. We're a bit curious how this got resolved. =)

@chrivers

This comment has been minimized.

Copy link
Author

chrivers commented Oct 20, 2016

Hey everybody. Can confirm that I have received the emails! However, I'm in a summer house at the moment, and there is only exactly enough internet connection to send you this message :)

I'll be back on monday, then I'll look into the different versions.

I'd say there's a chance that the bug got resolved after my initial report, but before the testing. I'm not sure yet, however. I'll keep you posted

@pnkfelix

This comment has been minimized.

Copy link
Member

pnkfelix commented Oct 28, 2016

After discussion at compiler team mtg last night, I decided to double-check the -Z time-passes output for the crate in question at three dates:

  • Sept 9 (which @chrivers reported as a point where the compiler was fast, or at least fast enough),
  • Sept 22, which I had previously established as a late point in time where it was slow
  • Sept 27, which I had previously established as an early point in time where it was fast enough again.

a gist with the output is here:
https://gist.github.com/pnkfelix/cd77c382822690444988dce3301428e3

I think this data confirms that the compiler got faster between Sept 22 and Sept 27, and the difference between Sept 9 and Sept 27 is probably in the noise.

of particular interest: CopyPropagation went down between Sept 22 and Sept 27

sept 22: time: 8.683 CopyPropagation
sept 27: time: 0.002 CopyPropagation

@brson

This comment has been minimized.

Copy link
Contributor

brson commented Nov 1, 2016

Thanks for following up @pnkfelix!

@nikomatsakis

This comment has been minimized.

Copy link
Contributor

nikomatsakis commented Nov 3, 2016

@pnkfelix think you can test with the current beta? we're not sure whether the "fix" (whatever it was!) made it to beta.

Separately, it would prob be worth trying to find out what has changed, also. It's confusing why it's faster...

@brson

This comment has been minimized.

Copy link
Contributor

brson commented Nov 3, 2016

Monday is probably the cutoff for backporting to beta if this isn't in already.

@pnkfelix

This comment has been minimized.

Copy link
Member

pnkfelix commented Nov 3, 2016

The test crate in question does not compile out of the box with the beta branch itself (it uses features only available on nightly).

However, good news: The point in my experiments where the speed became acceptable again was the nightly from sept 27, which is this commit: d0623cf

That commit, if I understand the output of git log (which may be a big "if") is included in the current beta branch.

@nikomatsakis

This comment has been minimized.

Copy link
Contributor

nikomatsakis commented Nov 3, 2016

triage: P-medium

It is still a mystery why this is a problem, but not a burning issue at this point.

@rust-highfive rust-highfive added P-medium and removed P-high labels Nov 3, 2016

@pnkfelix

This comment has been minimized.

Copy link
Member

pnkfelix commented Nov 4, 2016

Here is a small test case adapted from a vast reduction of the code that @chrivers sent.

(In this reduction, I have left in some seemingly unrelated details, such as repeated calls to println! and some similar looking match constructs. I found no immediately obvious relationship between how many times one repeats these forms and how slow the compilation is; its certainly slower either way. Its probably a sign that control-flow in general is the way to expose the problem.)

// one_file.rs

fn bit() -> Option<bool> { unimplemented!() }

macro_rules! def_thing_fields {
    (  { $($field:ident,)* } )
        =>
    {
        pub struct Thing { $(pub $field: (),)* }
        pub fn go()
        {
            Thing
            {
                $(
                    $field:
                    {
                        println!("Thing");
                        println!("{}", "Thing");
                        println!("{}", stringify!($field));

                        let b = match bit() {
                            Some(x) => x,
                            None => return,
                        };

                        if b {
                            let result = match bit() {
                                Some(x) => x,
                                None => return,
                            };
                            println!("next_bit -> {:?}", result);
                        }
                    }
                ),*
            };
        }
    }
}


fn main() {

}

def_thing_fields! {
    {
        a00, a01, a02, a03, a04, a05, a06, a07, a08, a09,
        a10, a11, a12, a13, a14, a15, a16, a17, a18, a19,
        a20, a21, a22, a23, a24, a25, a26, a27, a28, a29,
        a30, a31, a32, a33, a34, a35, a36, a37, a38, a39,
        a40, a41, a42, a43, a44, a45, a46, a47, a48, a49,
        a50, a51, a52, a53, a54, a55, a56, a57, a58, a59,
        a60, a61, a62, a63, a64, a65, a66, a67, a68, a69,
        a70, a71, a72, a73, a74, a75, a76, a77, a78, a79,
        a80, a81, a82, a83, a84, a85, a86, a87, a88, a89,
        a90, a91, a92, a93, a94, a95, a96, a97, a98, a99,
    }
}

Here are timing results showing that the same compilers I tested above show an analogous change in speed for this test case:

% multirust default nightly-2016-09-22
multirust: using existing install for 'nightly-2016-09-22'
multirust: default toolchain set to 'nightly-2016-09-22'
% time rustc -O one_file.rs  -Z time-passes | grep CopyProp
  time: 1.795   CopyPropagation

real    0m2.804s
user    0m2.585s
sys 0m0.235s
% multirust default nightly-2016-09-27
multirust: using existing install for 'nightly-2016-09-27'
multirust: default toolchain set to 'nightly-2016-09-27'
% time rustc -O one_file.rs  -Z time-passes | grep CopyProp
  time: 0.000   CopyPropagation

real    0m0.970s
user    0m0.754s
sys 0m0.237s
% 
@pnkfelix

This comment has been minimized.

Copy link
Member

pnkfelix commented Nov 4, 2016

Ah!

Someone mentioned in the meeting that the time spent in CopyPropagation may depend on the optimization level. I had thought that passing -O would compensate for that, but I am now inferring that -O has no effect on the mir-optimize-level ? Is that correct?

Here's data that reflects this:

% rustc --version
rustc 1.14.0-nightly (5665bdf3e 2016-11-02)
% time rustc -Z time-passes -O ./one_file.rs  | grep CopyProp
  time: 0.000   CopyPropagation

real    0m1.001s
user    0m0.760s
sys 0m0.265s
% time rustc -Z time-passes -Z mir-opt-level=1 ./one_file.rs  | grep CopyProp
  time: 1.857   CopyPropagation

real    0m2.911s
user    0m2.661s
sys 0m0.277s
% 

(and with that knowledge, it seems obvious now why the problem "went away" -- pcwalton's commit 79cb2db deliberately avoids running the CopyPropogation when the mir optimization level is 0.

@pnkfelix

This comment has been minimized.

Copy link
Member

pnkfelix commented Nov 4, 2016

@brson the short version of all of this that I do not think there is anything that needs to be backported to beta at this point.

@arielb1 arielb1 changed the title Significant compile-speed regression in nightly MIR CopyPropagation is slow Nov 10, 2016

@nikomatsakis

This comment has been minimized.

Copy link
Contributor

nikomatsakis commented Nov 10, 2016

We untagged this as a regression but kept the issue open since we would like to eventually make copy-prop fast again.

@bluss

This comment has been minimized.

Copy link
Contributor

bluss commented Dec 15, 2016

FYI: copy propagation has moved to -Zmir-opt-level=2 (default is 1)

bors added a commit that referenced this issue Jan 5, 2019

Auto merge of #57351 - oli-obk:cheap_const_ops, r=<try>
Don't actually create a full MIR stack frame when not needed

r? @dotdash

This should significantly reduce overhead during const propagation and reduce overhead *after* copy propagation (cc #36673)

bors added a commit that referenced this issue Jan 10, 2019

Auto merge of #57351 - oli-obk:cheap_const_ops, r=RalfJung
Don't actually create a full MIR stack frame when not needed

r? @dotdash

This should significantly reduce overhead during const propagation and reduce overhead *after* copy propagation (cc #36673)

bors added a commit that referenced this issue Jan 10, 2019

Auto merge of #57351 - oli-obk:cheap_const_ops, r=RalfJung
Don't actually create a full MIR stack frame when not needed

r? @dotdash

This should significantly reduce overhead during const propagation and reduce overhead *after* copy propagation (cc #36673)

pietroalbini added a commit to pietroalbini/rust that referenced this issue Jan 12, 2019

Rollup merge of rust-lang#57351 - oli-obk:cheap_const_ops, r=RalfJung
Don't actually create a full MIR stack frame when not needed

r? @dotdash

This should significantly reduce overhead during const propagation and reduce overhead *after* copy propagation (cc rust-lang#36673)

Centril added a commit to Centril/rust that referenced this issue Jan 13, 2019

Rollup merge of rust-lang#57351 - oli-obk:cheap_const_ops, r=RalfJung
Don't actually create a full MIR stack frame when not needed

r? @dotdash

This should significantly reduce overhead during const propagation and reduce overhead *after* copy propagation (cc rust-lang#36673)

Centril added a commit to Centril/rust that referenced this issue Jan 13, 2019

Rollup merge of rust-lang#57351 - oli-obk:cheap_const_ops, r=RalfJung
Don't actually create a full MIR stack frame when not needed

r? @dotdash

This should significantly reduce overhead during const propagation and reduce overhead *after* copy propagation (cc rust-lang#36673)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.