Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce cargo overhead for incremental recompilation #8833

Open
bjorn3 opened this issue Nov 5, 2020 · 8 comments
Open

Reduce cargo overhead for incremental recompilation #8833

bjorn3 opened this issue Nov 5, 2020 · 8 comments
Labels
C-feature-request Category: proposal for a feature. Before PR, ping rust-lang/cargo if this is not `Feature accepted` Performance Gotta go fast!

Comments

@bjorn3
Copy link
Member

bjorn3 commented Nov 5, 2020

Describe the problem you are trying to solve

When only a single small crate of a large crate graph is recompiled, cargo has a significant overhead trying to parse all Cargo.toml files and resolve all dependencies. On Bevy I measured an overhead of 28% when doing touch examples/game/breakout.rs; cargo build --example breakout after bevyengine/bevy#791.

Describe the solution you'd like

I would like the overhead of cargo to be reduced. This could be done by optimizing the relevant functions or by for example adding a binary cache for a specific cargo invocation that contains the build plan (I think it is called that way) of everything that cargo will do. This binary cache could be invalidated on any Cargo.toml change or change to the cargo arguments. In case of argument change it would be preferable to keep the old cache around though, as rust-analyzer may run cargo check and the user cargo run.

Notes

Profiles

full profile of cargo build --example breakout

profile zoomed into cargo

By the way I noticed that the new resolver is a bit slower than the old one:

$ hyperfine --warmup 10 --runs 100 "touch examples/game/breakout.rs && cargo build --example breakout" "touch examples/game/breakout.rs && cargo build -Zfeatures=all -Zpackage-features --example breakout"
Benchmark #1: touch examples/game/breakout.rs && cargo build --example breakout
  Time (mean ± σ):     678.8 ms ±  21.5 ms    [User: 573.7 ms, System: 160.8 ms]
  Range (min … max):   668.1 ms … 812.3 ms    100 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark #2: touch examples/game/breakout.rs && cargo build -Zfeatures=all -Zpackage-features --example breakout
  Time (mean ± σ):     690.2 ms ±  20.4 ms    [User: 582.6 ms, System: 164.8 ms]
  Range (min … max):   683.3 ms … 891.4 ms    100 runs
 
  Warning: The first benchmarking run for this command was significantly slower than the rest (891.4 ms). This could be caused by (filesystem) caches that were not filled until after the first run. You should consider using the '--warmup' option to fill those caches before the actual benchmark. Alternatively, use the '--prepare' option to clear the caches before each timing run.
 
Summary
  'touch examples/game/breakout.rs && cargo build --example breakout' ran
    1.02 ± 0.04 times faster than 'touch examples/game/breakout.rs && cargo build -Zfeatures=all -Zpackage-features --example breakout'

cc bevyengine/bevy#791

@bjorn3 bjorn3 added the C-feature-request Category: proposal for a feature. Before PR, ping rust-lang/cargo if this is not `Feature accepted` label Nov 5, 2020
@alexcrichton
Copy link
Member

Thanks for the report! It's definitely the intention that Cargo is a speedy build system and doesn't have a lot of overhead. We've done a fair bit of performance work in the past, but unfortunately we don't have a great benchmark suite and don't track performance over time.

That being said, unfortunately "make Cargo faster" probably isn't the most actionable here. It's probably best to dig into profiles and knock out slow functions one-by-one.

@bjorn3
Copy link
Member Author

bjorn3 commented Nov 5, 2020

or by for example adding a binary cache for a specific cargo invocation that contains the build plan (I think it is called that way) of everything that cargo will do. This binary cache could be invalidated on any Cargo.toml change or change to the cargo arguments. In case of argument change it would be preferable to keep the old cache around though, as rust-analyzer may run cargo check and the user cargo run.

What do you think about this idea?

@alexcrichton
Copy link
Member

That sounds plausible, but I think it'd be good to dig in and see where the performance is going on a deeper level. It's not entirely obvious to me given the graphs where the time is going and whether that would help a whole lot.

@bjorn3
Copy link
Member Author

bjorn3 commented Nov 6, 2020

@alexcrichton
Copy link
Member

I'd encourage local testing and strategies to try to measure the impact of possible optimizations, and PRs to make things faster are of course always welcome!

@bjorn3
Copy link
Member Author

bjorn3 commented Nov 7, 2020

bjorn3@c7f58d3 saves about 7% by avoiding unnecessarily capturing a backtrace at the start of each job.

@bjorn3
Copy link
Member Author

bjorn3 commented Nov 7, 2020

bjorn3@602a1bd saves another big chunk by avoiding spawning a new thread for each fresh job.

@alexcrichton
Copy link
Member

Nice!

bors added a commit that referenced this issue Nov 11, 2020
Improve performance of almost fresh builds

This currently saves about 15ms out of the 170ms Cargo overhead when compiling Bevy.

part of #8833

<details><summary>Benchmark results as of <a href="https://github.com/rust-lang/cargo/commit/602a1bd7f5a0da9878952f492508aca5eddd1c53"><code>602a1bd</code></a></summary>

Completely fresh:

```
$ RUSTC=$(rustup which rustc) hyperfine --warmup 10 --runs 100 "../../cargo/cargo_master build --release --example breakout" "../../cargo/cargo_improve_perf build --release --example breakout"
Benchmark #1: ../../cargo/cargo_master build --release --example breakout
  Time (mean ± σ):     613.8 ms ± 509.1 ms    [User: 147.6 ms, System: 38.4 ms]
  Range (min … max):   170.8 ms … 1199.2 ms    100 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark #2: ../../cargo/cargo_improve_perf build --release --example breakout
  Time (mean ± σ):     164.2 ms ±   0.8 ms    [User: 137.0 ms, System: 27.1 ms]
  Range (min … max):   162.8 ms … 169.6 ms    100 runs

Summary
  '../../cargo/cargo_improve_perf build --release --example breakout' ran
    3.74 ± 3.10 times faster than '../../cargo/cargo_master build --release --example breakout'
```

(statistical outliers are consistently reproducible and don't happen for any other benchmarks)

```
$ RUSTC=$(rustup which rustc) perf stat -r10 ../../cargo/cargo_master build --release --example breakout
    Finished release [optimized] target(s) in 0.16s
[...]
    Finished release [optimized] target(s) in 0.16s

 Performance counter stats for '../../cargo/cargo_master build --release --example breakout' (10 runs):

            192,95 msec task-clock:u              #    0,242 CPUs utilized            ( +-  0,69% )
                 0      context-switches:u        #    0,000 K/sec
                 0      cpu-migrations:u          #    0,000 K/sec
              4926      page-faults:u             #    0,026 M/sec                    ( +-  0,11% )
         387516813      cycles:u                  #    2,008 GHz                      ( +-  0,69% )
         685141858      instructions:u            #    1,77  insn per cycle           ( +-  0,04% )
         124443483      branches:u                #  644,958 M/sec                    ( +-  0,05% )
           2726944      branch-misses:u           #    2,19% of all branches          ( +-  0,10% )

             0,796 +- 0,168 seconds time elapsed  ( +- 21,06% )
$ RUSTC=$(rustup which rustc) perf stat -r10 ../../cargo/cargo_improve_perf build --release --example breakout
    Finished release [optimized] target(s) in 0.14s
[...]
    Finished release [optimized] target(s) in 0.15s

 Performance counter stats for '../../cargo/cargo_improve_perf build --release --example breakout' (10 runs):

            168,78 msec task-clock:u              #    0,997 CPUs utilized            ( +-  0,56% )
                 0      context-switches:u        #    0,000 K/sec
                 0      cpu-migrations:u          #    0,000 K/sec
              4075      page-faults:u             #    0,024 M/sec                    ( +-  0,16% )
         372565970      cycles:u                  #    2,207 GHz                      ( +-  0,61% )
         667356921      instructions:u            #    1,79  insn per cycle           ( +-  0,03% )
         120170432      branches:u                #  712,010 M/sec                    ( +-  0,04% )
           2642670      branch-misses:u           #    2,20% of all branches          ( +-  0,12% )

          0,169294 +- 0,000933 seconds time elapsed  ( +-  0,55% )
```

Need to recompile single executable:

```
$ RUSTC=$(rustup which rustc) hyperfine --warmup 10 --runs 100 "touch examples/game/breakout.rs && ../../cargo/cargo_master build --release --example breakout" "touch examples/game/breakout.rs && ../../cargo/cargo_improve_perf build --release --example breakout"
Benchmark #1: touch examples/game/breakout.rs && ../../cargo/cargo_master build --release --example breakout
  Time (mean ± σ):     658.9 ms ±   1.8 ms    [User: 538.6 ms, System: 181.1 ms]
  Range (min … max):   655.5 ms … 668.8 ms    100 runs

Benchmark #2: touch examples/game/breakout.rs && ../../cargo/cargo_improve_perf build --release --example breakout
  Time (mean ± σ):     648.6 ms ±   2.1 ms    [User: 534.7 ms, System: 162.6 ms]
  Range (min … max):   645.2 ms … 659.5 ms    100 runs

Summary
  'touch examples/game/breakout.rs && ../../cargo/cargo_improve_perf build --release --example breakout' ran
    1.02 ± 0.00 times faster than 'touch examples/game/breakout.rs && ../../cargo/cargo_master build --release --example breakout'
```

```
$ RUSTC=$(rustup which rustc) perf stat -r10 --no-inherit --pre "touch examples/game/breakout.rs" ../../cargo/cargo_master build --release --example breakout
   Compiling bevy v0.3.0 (/home/bjorn/Documenten/cg_clif3/bevy)
    Finished release [optimized] target(s) in 0.67s
[...]
   Compiling bevy v0.3.0 (/home/bjorn/Documenten/cg_clif3/bevy)
    Finished release [optimized] target(s) in 0.66s

 Performance counter stats for '../../cargo/cargo_master build --release --example breakout' (10 runs):

            183,65 msec task-clock:u              #    0,265 CPUs utilized            ( +-  1,08% )
                 0      context-switches:u        #    0,000 K/sec
                 0      cpu-migrations:u          #    0,000 K/sec
              6603      page-faults:u             #    0,036 M/sec                    ( +-  0,11% )
         382629371      cycles:u                  #    2,083 GHz                      ( +-  0,79% )
         673192095      instructions:u            #    1,76  insn per cycle           ( +-  0,03% )
         121518254      branches:u                #  661,688 M/sec                    ( +-  0,04% )
           2698503      branch-misses:u           #    2,22% of all branches          ( +-  0,18% )

           0,69376 +- 0,00274 seconds time elapsed  ( +-  0,39% )
$ RUSTC=$(rustup which rustc) perf stat -r10 --no-inherit --pre "touch examples/game/breakout.rs" ../../cargo/cargo_improve_perf build --release --example breakout
   Compiling bevy v0.3.0 (/home/bjorn/Documenten/cg_clif3/bevy)
    Finished release [optimized] target(s) in 0.66s
[...]
   Compiling bevy v0.3.0 (/home/bjorn/Documenten/cg_clif3/bevy)
    Finished release [optimized] target(s) in 0.76s

 Performance counter stats for '../../cargo/cargo_improve_perf build --release --example breakout' (10 runs):

            177,03 msec task-clock:u              #    0,256 CPUs utilized            ( +-  1,70% )
                 0      context-switches:u        #    0,000 K/sec
                 0      cpu-migrations:u          #    0,000 K/sec
              5774      page-faults:u             #    0,033 M/sec                    ( +-  0,14% )
         381121369      cycles:u                  #    2,153 GHz                      ( +-  1,29% )
         672129390      instructions:u            #    1,76  insn per cycle           ( +-  0,03% )
         121248111      branches:u                #  684,900 M/sec                    ( +-  0,04% )
           2672832      branch-misses:u           #    2,20% of all branches          ( +-  0,34% )

            0,6924 +- 0,0148 seconds time elapsed  ( +-  2,13% )
```

</details>

<details><summary>Benchmark results as of <a href="https://github.com/rust-lang/cargo/commit/ba49b13e65fd487b8fc1761456c34e8d9feefb0e"><code>ba49b13</code></a></summary>

Completely fresh:

```
$ RUSTC=$(rustup which rustc) hyperfine --warmup 10 --runs 100 "../../cargo/cargo_master build --release --example breakout" "../../cargo/cargo_improve_perf2 build --release --example breakout"
Benchmark #1: ../../cargo/cargo_master build --release --example breakout
  Time (mean ± σ):     635.4 ms ± 511.6 ms    [User: 146.2 ms, System: 41.1 ms]
  Range (min … max):   172.0 ms … 1208.7 ms    100 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark #2: ../../cargo/cargo_improve_perf2 build --release --example breakout
  Time (mean ± σ):     165.3 ms ±   1.2 ms    [User: 137.2 ms, System: 28.0 ms]
  Range (min … max):   163.7 ms … 171.0 ms    100 runs

Summary
  '../../cargo/cargo_improve_perf2 build --release --example breakout' ran
    3.84 ± 3.09 times faster than '../../cargo/cargo_master build --release --example breakout'
```

(statistical outliers are consistently reproducible and don't happen for any other benchmarks)

```
$ RUSTC=$(rustup which rustc) perf stat -r10 ../../cargo/cargo_master build --release --example breakout
    Finished release [optimized] target(s) in 0.16s
[...]
    Finished release [optimized] target(s) in 0.16s

 Performance counter stats for '../../cargo/cargo_master build --release --example breakout' (10 runs):

            197,21 msec task-clock:u              #    0,220 CPUs utilized            ( +-  0,79% )
                 0      context-switches:u        #    0,000 K/sec
                 0      cpu-migrations:u          #    0,000 K/sec
              4918      page-faults:u             #    0,025 M/sec                    ( +-  0,09% )
         395214529      cycles:u                  #    2,004 GHz                      ( +-  0,71% )
         685707083      instructions:u            #    1,74  insn per cycle           ( +-  0,04% )
         124571038      branches:u                #  631,667 M/sec                    ( +-  0,05% )
           2748386      branch-misses:u           #    2,21% of all branches          ( +-  0,35% )

             0,897 +- 0,155 seconds time elapsed  ( +- 17,32% )
$ RUSTC=$(rustup which rustc) perf stat -r10 ../../cargo/cargo_improve_perf2 build --release --example breakout
    Finished release [optimized] target(s) in 0.14s
[...]
    Finished release [optimized] target(s) in 0.15s

 Performance counter stats for '../../cargo/cargo_improve_perf2 build --release --example breakout' (10 runs):

            168,28 msec task-clock:u              #    0,621 CPUs utilized            ( +-  0,51% )
                 0      context-switches:u        #    0,000 K/sec
                 0      cpu-migrations:u          #    0,000 K/sec
              4086      page-faults:u             #    0,024 M/sec                    ( +-  0,15% )
         371029906      cycles:u                  #    2,205 GHz                      ( +-  0,48% )
         667493108      instructions:u            #    1,80  insn per cycle           ( +-  0,02% )
         120202436      branches:u                #  714,308 M/sec                    ( +-  0,03% )
           2659209      branch-misses:u           #    2,21% of all branches          ( +-  0,13% )

             0,271 +- 0,103 seconds time elapsed  ( +- 37,82% )
```

Need to recompile single executable:

```
$ RUSTC=$(rustup which rustc) hyperfine --warmup 10 --runs 100 "touch examples/game/breakout.rs && ../../cargo/cargo_master build --release --example breakout" "touch examples/game/breakout.rs && ../../cargo/cargo_improve_perf2 build --release --example breakout"
Benchmark #1: touch examples/game/breakout.rs && ../../cargo/cargo_master build --release --example breakout
  Time (mean ± σ):     660.7 ms ±   2.9 ms    [User: 545.6 ms, System: 175.2 ms]
  Range (min … max):   656.2 ms … 675.1 ms    100 runs

Benchmark #2: touch examples/game/breakout.rs && ../../cargo/cargo_improve_perf2 build --release --example breakout
  Time (mean ± σ):     650.2 ms ±   5.3 ms    [User: 542.9 ms, System: 156.0 ms]
  Range (min … max):   645.9 ms … 687.7 ms    100 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Summary
  'touch examples/game/breakout.rs && ../../cargo/cargo_improve_perf2 build --release --example breakout' ran
    1.02 ± 0.01 times faster than 'touch examples/game/breakout.rs && ../../cargo/cargo_master build --release --example breakout'
```

```
$ RUSTC=$(rustup which rustc) perf stat -r10 --no-inherit --pre "touch examples/game/breakout.rs" ../../cargo/cargo_master build --release --example breakout
   Compiling bevy v0.3.0 (/home/bjorn/Documenten/cg_clif3/bevy)
    Finished release [optimized] target(s) in 0.65s
[...]
   Compiling bevy v0.3.0 (/home/bjorn/Documenten/cg_clif3/bevy)
    Finished release [optimized] target(s) in 0.66s

 Performance counter stats for '../../cargo/cargo_master build --release --example breakout' (10 runs):

            181,52 msec task-clock:u              #    0,264 CPUs utilized            ( +-  0,29% )
                 0      context-switches:u        #    0,000 K/sec
                 0      cpu-migrations:u          #    0,000 K/sec
              6618      page-faults:u             #    0,036 M/sec                    ( +-  0,09% )
         381324766      cycles:u                  #    2,101 GHz                      ( +-  0,21% )
         673170130      instructions:u            #    1,77  insn per cycle           ( +-  0,03% )
         121511051      branches:u                #  669,422 M/sec                    ( +-  0,04% )
           2700116      branch-misses:u           #    2,22% of all branches          ( +-  0,17% )

           0,68766 +- 0,00293 seconds time elapsed  ( +-  0,43% )
$ RUSTC=$(rustup which rustc) perf stat -r10 --no-inherit --pre "touch examples/game/breakout.rs" ../../cargo/cargo_improve_perf2 build --release --example breakout
   Compiling bevy v0.3.0 (/home/bjorn/Documenten/cg_clif3/bevy)
    Finished release [optimized] target(s) in 0.64s
[...]
   Compiling bevy v0.3.0 (/home/bjorn/Documenten/cg_clif3/bevy)
    Finished release [optimized] target(s) in 0.64s

 Performance counter stats for '../../cargo/cargo_improve_perf2 build --release --example breakout' (10 runs):

            173,78 msec task-clock:u              #    0,257 CPUs utilized            ( +-  0,65% )
                 0      context-switches:u        #    0,000 K/sec
                 0      cpu-migrations:u          #    0,000 K/sec
              5772      page-faults:u             #    0,033 M/sec                    ( +-  0,17% )
         378994346      cycles:u                  #    2,181 GHz                      ( +-  0,62% )
         672499584      instructions:u            #    1,77  insn per cycle           ( +-  0,04% )
         121341331      branches:u                #  698,266 M/sec                    ( +-  0,05% )
           2691563      branch-misses:u           #    2,22% of all branches          ( +-  0,17% )

           0,67554 +- 0,00641 seconds time elapsed  ( +-  0,95% )
```

</summary>
@ehuss ehuss added the Performance Gotta go fast! label Nov 30, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-feature-request Category: proposal for a feature. Before PR, ping rust-lang/cargo if this is not `Feature accepted` Performance Gotta go fast!
Projects
None yet
Development

No branches or pull requests

3 participants