Speed up `plug_leaks` #36917

Merged
merged 2 commits into from Oct 4, 2016

Conversation

Projects
None yet
7 participants
@nnethercote
Contributor

nnethercote commented Oct 3, 2016

Profiling shows that plug_leaks and the functions it calls are hot on some benchmarks. It's very common that skol_map is empty in this function, and we can specialize plug_leaks in that case for some big speed-ups.

The PR has two commits. I'm fairly confident that the first one is correct -- I traced through the code to confirm that the fold_regions and pop_skolemized calls are no-ops when skol_map is empty, and I also temporarily added an assertion to check that result ends up having the same value as value in that case. This commit is responsible for most of the improvement.

I'm less confident about the second commit. The call to resolve_type_vars_is_possible can change value when skol_map is empty... but testing suggests that it doesn't matter if the call is
omitted.

So, please check both patches carefully, especially the second one!

Here are the speed-ups for the first commit alone.

stage1 compiler (built with old rustc, using glibc malloc), doing debug builds:

futures-rs-test  4.710s vs  4.538s --> 1.038x faster (variance: 1.009x, 1.005x)
issue-32062-equ  0.415s vs  0.368s --> 1.129x faster (variance: 1.009x, 1.010x)
issue-32278-big  1.884s vs  1.808s --> 1.042x faster (variance: 1.020x, 1.017x)
jld-day15-parse  1.907s vs  1.668s --> 1.143x faster (variance: 1.011x, 1.007x)
piston-image-0. 13.024s vs 12.421s --> 1.049x faster (variance: 1.004x, 1.012x)
rust-encoding-0  3.335s vs  3.276s --> 1.018x faster (variance: 1.021x, 1.028x)

stage2 compiler (built with new rustc, using jemalloc), doing debug builds:

futures-rs-test  4.167s vs  4.065s --> 1.025x faster (variance: 1.006x, 1.018x)
issue-32062-equ  0.383s vs  0.343s --> 1.118x faster (variance: 1.012x, 1.016x)
issue-32278-big  1.680s vs  1.621s --> 1.036x faster (variance: 1.007x, 1.007x)
jld-day15-parse  1.671s vs  1.478s --> 1.131x faster (variance: 1.016x, 1.004x)
piston-image-0. 11.336s vs 10.852s --> 1.045x faster (variance: 1.003x, 1.006x)
rust-encoding-0  3.036s vs  2.971s --> 1.022x faster (variance: 1.030x, 1.032x)

I've omitted the benchmarks for which the change was negligible.

And here are the speed-ups for the first and second commit in combination.

stage1 compiler (built with old rustc, using glibc malloc), doing debug
builds:

futures-rs-test  4.684s vs  4.498s --> 1.041x faster (variance: 1.012x, 1.012x)
issue-32062-equ  0.413s vs  0.355s --> 1.162x faster (variance: 1.019x, 1.006x)
issue-32278-big  1.869s vs  1.763s --> 1.060x faster (variance: 1.013x, 1.018x)
jld-day15-parse  1.900s vs  1.602s --> 1.186x faster (variance: 1.010x, 1.003x)
piston-image-0. 12.907s vs 12.352s --> 1.045x faster (variance: 1.005x, 1.006x)
rust-encoding-0  3.254s vs  3.248s --> 1.002x faster (variance: 1.063x, 1.045x)

stage2 compiler (built with new rustc, using jemalloc), doing debug builds:

futures-rs-test  4.183s vs  4.046s --> 1.034x faster (variance: 1.007x, 1.004x)
issue-32062-equ  0.380s vs  0.340s --> 1.117x faster (variance: 1.020x, 1.003x)
issue-32278-big  1.671s vs  1.616s --> 1.034x faster (variance: 1.031x, 1.012x)
jld-day15-parse  1.661s vs  1.417s --> 1.172x faster (variance: 1.013x, 1.005x)
piston-image-0. 11.347s vs 10.841s --> 1.047x faster (variance: 1.007x, 1.010x)
rust-encoding-0  3.050s vs  3.000s --> 1.017x faster (variance: 1.016x, 1.012x)

@eddyb: git blame suggests that you should review this. Thanks!

@rust-highfive

This comment has been minimized.

Show comment
Hide comment
@rust-highfive

rust-highfive Oct 3, 2016

Collaborator

r? @nikomatsakis

(rust_highfive has picked a reviewer for you, use r? to override)

Collaborator

rust-highfive commented Oct 3, 2016

r? @nikomatsakis

(rust_highfive has picked a reviewer for you, use r? to override)

@nnethercote

This comment has been minimized.

Show comment
Hide comment
@nnethercote

nnethercote Oct 3, 2016

Contributor

BTW, I used Valgrind's "DHAT" profilter to find this. It told me that lots of malloc calls were happening in and beneath plug_leaks.

Contributor

nnethercote commented Oct 3, 2016

BTW, I used Valgrind's "DHAT" profilter to find this. It told me that lots of malloc calls were happening in and beneath plug_leaks.

@nnethercote

This comment has been minimized.

Show comment
Hide comment
@nnethercote

nnethercote Oct 3, 2016

Contributor

r? @eddyb

Contributor

nnethercote commented Oct 3, 2016

r? @eddyb

src/librustc/infer/higher_ranked/mod.rs
- debug!("plug_leaks: result={:?}",
- result);
+ if skol_map.is_empty() {
+ result = value;

This comment has been minimized.

@eddyb

eddyb Oct 3, 2016

Member

I think you can use early return to avoid indenting the entire function to the right.

@eddyb

eddyb Oct 3, 2016

Member

I think you can use early return to avoid indenting the entire function to the right.

nnethercote added some commits Sep 29, 2016

Optimize plug_leaks.
This commit avoids the `fold_regions` call in `plug_leaks` when
`skol_map` is empty, which is the common case. This gives speed-ups of
up to 1.14x on some of the rustc-benchmarks.
Optimize plug_leaks some more.
This commit avoids the `resolve_type_vars_if_possible` call in
`plug_leaks` when `skol_map` is empty, which is the common case. It also
changes the signature of `plug_leaks` slightly to avoid the need for a
`clone` of `value`. These changes give speed-ups of up a few percent on
some of the rustc-benchmarks.
@eddyb

eddyb approved these changes Oct 3, 2016

@eddyb

This comment has been minimized.

Show comment
Hide comment
@eddyb

eddyb Oct 3, 2016

Member

@bors r+

Member

eddyb commented Oct 3, 2016

@bors r+

@bors

This comment has been minimized.

Show comment
Hide comment
@bors

bors Oct 3, 2016

Contributor

📌 Commit 3779971 has been approved by eddyb

Contributor

bors commented Oct 3, 2016

📌 Commit 3779971 has been approved by eddyb

@arielb1

This comment has been minimized.

Show comment
Hide comment
@arielb1

arielb1 Oct 3, 2016

Contributor

We should also be doing this for leak_check.

Contributor

arielb1 commented Oct 3, 2016

We should also be doing this for leak_check.

@nnethercote

This comment has been minimized.

Show comment
Hide comment
@nnethercote

nnethercote Oct 3, 2016

Contributor

We should also be doing this for leak_check.

leak_check doesn't show up nearly as high as plug_leaks in profiles, but it does show up. I'll do it in a follow-up. Thank you for the suggestion.

Contributor

nnethercote commented Oct 3, 2016

We should also be doing this for leak_check.

leak_check doesn't show up nearly as high as plug_leaks in profiles, but it does show up. I'll do it in a follow-up. Thank you for the suggestion.

@brson brson added the relnotes label Oct 4, 2016

@arielb1

This comment has been minimized.

Show comment
Hide comment
@arielb1

arielb1 Oct 4, 2016

Contributor

leak-check done by #36931.

Contributor

arielb1 commented Oct 4, 2016

leak-check done by #36931.

@nnethercote

This comment has been minimized.

Show comment
Hide comment
@nnethercote

nnethercote Oct 4, 2016

Contributor

leak-check done by #36931.

Huh. I have a local patch to add an early return to leak_check. It produces a measurable speed-up of a couple of percent on some of the rustc-benchmarks, but it doesn't look anything like the commits in #36931.

Contributor

nnethercote commented Oct 4, 2016

leak-check done by #36931.

Huh. I have a local patch to add an early return to leak_check. It produces a measurable speed-up of a couple of percent on some of the rustc-benchmarks, but it doesn't look anything like the commits in #36931.

Manishearth added a commit to Manishearth/rust that referenced this pull request Oct 4, 2016

Rollup merge of #36917 - nnethercote:speed-up-plug_leaks, r=eddyb
Speed up `plug_leaks`

Profiling shows that `plug_leaks` and the functions it calls are hot on some benchmarks. It's very common that `skol_map` is empty in this function, and we can specialize `plug_leaks` in that case for some big speed-ups.

The PR has two commits. I'm fairly confident that the first one is correct -- I traced through the code to confirm that the `fold_regions` and `pop_skolemized` calls are no-ops when `skol_map` is empty, and I also temporarily added an assertion to check that `result` ends up having the same value as `value` in that case. This commit is responsible for most of the improvement.

I'm less confident about the second commit. The call to `resolve_type_vars_is_possible` can change `value` when `skol_map` is empty... but testing suggests that it doesn't matter if the call is
omitted.

So, please check both patches carefully, especially the second one!

Here are the speed-ups for the first commit alone.

stage1 compiler (built with old rustc, using glibc malloc), doing debug builds:
```
futures-rs-test  4.710s vs  4.538s --> 1.038x faster (variance: 1.009x, 1.005x)
issue-32062-equ  0.415s vs  0.368s --> 1.129x faster (variance: 1.009x, 1.010x)
issue-32278-big  1.884s vs  1.808s --> 1.042x faster (variance: 1.020x, 1.017x)
jld-day15-parse  1.907s vs  1.668s --> 1.143x faster (variance: 1.011x, 1.007x)
piston-image-0. 13.024s vs 12.421s --> 1.049x faster (variance: 1.004x, 1.012x)
rust-encoding-0  3.335s vs  3.276s --> 1.018x faster (variance: 1.021x, 1.028x)
```
stage2 compiler (built with new rustc, using jemalloc), doing debug builds:
```
futures-rs-test  4.167s vs  4.065s --> 1.025x faster (variance: 1.006x, 1.018x)
issue-32062-equ  0.383s vs  0.343s --> 1.118x faster (variance: 1.012x, 1.016x)
issue-32278-big  1.680s vs  1.621s --> 1.036x faster (variance: 1.007x, 1.007x)
jld-day15-parse  1.671s vs  1.478s --> 1.131x faster (variance: 1.016x, 1.004x)
piston-image-0. 11.336s vs 10.852s --> 1.045x faster (variance: 1.003x, 1.006x)
rust-encoding-0  3.036s vs  2.971s --> 1.022x faster (variance: 1.030x, 1.032x)
```
I've omitted the benchmarks for which the change was negligible.

And here are the speed-ups for the first and second commit in combination.

stage1 compiler (built with old rustc, using glibc malloc), doing debug
builds:
```
futures-rs-test  4.684s vs  4.498s --> 1.041x faster (variance: 1.012x, 1.012x)
issue-32062-equ  0.413s vs  0.355s --> 1.162x faster (variance: 1.019x, 1.006x)
issue-32278-big  1.869s vs  1.763s --> 1.060x faster (variance: 1.013x, 1.018x)
jld-day15-parse  1.900s vs  1.602s --> 1.186x faster (variance: 1.010x, 1.003x)
piston-image-0. 12.907s vs 12.352s --> 1.045x faster (variance: 1.005x, 1.006x)
rust-encoding-0  3.254s vs  3.248s --> 1.002x faster (variance: 1.063x, 1.045x)
```
stage2 compiler (built with new rustc, using jemalloc), doing debug builds:
```
futures-rs-test  4.183s vs  4.046s --> 1.034x faster (variance: 1.007x, 1.004x)
issue-32062-equ  0.380s vs  0.340s --> 1.117x faster (variance: 1.020x, 1.003x)
issue-32278-big  1.671s vs  1.616s --> 1.034x faster (variance: 1.031x, 1.012x)
jld-day15-parse  1.661s vs  1.417s --> 1.172x faster (variance: 1.013x, 1.005x)
piston-image-0. 11.347s vs 10.841s --> 1.047x faster (variance: 1.007x, 1.010x)
rust-encoding-0  3.050s vs  3.000s --> 1.017x faster (variance: 1.016x, 1.012x)
```
@eddyb: `git blame` suggests that you should review this. Thanks!

bors added a commit that referenced this pull request Oct 4, 2016

Auto merge of #36953 - Manishearth:rollup, r=Manishearth
Rollup of 12 pull requests

- Successful merges: #36798, #36878, #36902, #36903, #36908, #36916, #36917, #36921, #36928, #36938, #36941, #36951
- Failed merges:
@arielb1

This comment has been minimized.

Show comment
Hide comment
@arielb1

arielb1 Oct 4, 2016

Contributor

Huh. I have a local patch to add an early return to leak_check. It produces a measurable speed-up of a couple of percent on some of the rustc-benchmarks, but it doesn't look anything like the commits in #36931.

How do you run the rustc benchmarks? I've been measuring bootstrap times. My patch short-cuts the fast path to leak_check at the callsites.

Contributor

arielb1 commented Oct 4, 2016

Huh. I have a local patch to add an early return to leak_check. It produces a measurable speed-up of a couple of percent on some of the rustc-benchmarks, but it doesn't look anything like the commits in #36931.

How do you run the rustc benchmarks? I've been measuring bootstrap times. My patch short-cuts the fast path to leak_check at the callsites.

@bors bors merged commit 3779971 into rust-lang:master Oct 4, 2016

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

@nnethercote nnethercote deleted the nnethercote:speed-up-plug_leaks branch Oct 4, 2016

@nnethercote

This comment has been minimized.

Show comment
Hide comment
@nnethercote

nnethercote Oct 4, 2016

Contributor

@arielb1: I updated my local repo to get #36931 and the speed-ups from my local patch have now evaporated, so I guess its effect overlapped with the effect of your commit. My patch was simpler, though, adding just a if skol_map.is_empty() { return Ok(()); } near the start of leak_check. I don't know the code well enough to tell if that's functionally equivalent to your commit.

As for running the benchmarks, I compared two stage1 compilers that were both configured with --enable-optimize --enable-debuginfo. I used the compare.py script for this, see https://internals.rust-lang.org/t/recent-rustc-benchmarks-improvements/4153 for instructions.

Contributor

nnethercote commented Oct 4, 2016

@arielb1: I updated my local repo to get #36931 and the speed-ups from my local patch have now evaporated, so I guess its effect overlapped with the effect of your commit. My patch was simpler, though, adding just a if skol_map.is_empty() { return Ok(()); } near the start of leak_check. I don't know the code well enough to tell if that's functionally equivalent to your commit.

As for running the benchmarks, I compared two stage1 compilers that were both configured with --enable-optimize --enable-debuginfo. I used the compare.py script for this, see https://internals.rust-lang.org/t/recent-rustc-benchmarks-improvements/4153 for instructions.

@arielb1

This comment has been minimized.

Show comment
Hide comment
@arielb1

arielb1 Oct 4, 2016

Contributor

It should be functionally equivalent. Is it performance-equivalent (my patch skips a bunch of unneeded setup code)? [checking]

Contributor

arielb1 commented Oct 4, 2016

It should be functionally equivalent. Is it performance-equivalent (my patch skips a bunch of unneeded setup code)? [checking]

@arielb1

This comment has been minimized.

Show comment
Hide comment
@arielb1

arielb1 Oct 4, 2016

Contributor

The leak_check patch seems to be slower. Will investigate more tomorrow.

Contributor

arielb1 commented Oct 4, 2016

The leak_check patch seems to be slower. Will investigate more tomorrow.

@nnethercote

This comment has been minimized.

Show comment
Hide comment
@nnethercote

nnethercote Oct 8, 2016

Contributor

Good news -- the improvement for jld-day15-parser showed up clearly on perf.rust-lang.org.
plug_leaks
It's the big drop on the RHS on Oct 4. The time dropped from 1.402 to 1.199, which is a 1.169x speed-up (a.k.a a 14.5% reduction in execution time).

The other benchmarks mentioned in the first comment also saw speedups similar to what I measured locally, which is good.

Contributor

nnethercote commented Oct 8, 2016

Good news -- the improvement for jld-day15-parser showed up clearly on perf.rust-lang.org.
plug_leaks
It's the big drop on the RHS on Oct 4. The time dropped from 1.402 to 1.199, which is a 1.169x speed-up (a.k.a a 14.5% reduction in execution time).

The other benchmarks mentioned in the first comment also saw speedups similar to what I measured locally, which is good.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment