Speed up plug_leaks #36917

Merged (2 commits, Oct 4, 2016)
Conversation

@nnethercote (Contributor) commented on Oct 3, 2016:

Profiling shows that `plug_leaks` and the functions it calls are hot on some benchmarks. It's very common for `skol_map` to be empty in this function, and we can specialize `plug_leaks` in that case for some big speed-ups.

The PR has two commits. I'm fairly confident that the first one is correct -- I traced through the code to confirm that the `fold_regions` and `pop_skolemized` calls are no-ops when `skol_map` is empty, and I also temporarily added an assertion to check that `result` ends up with the same value as `value` in that case. This commit is responsible for most of the improvement.

I'm less confident about the second commit. The call to `resolve_type_vars_if_possible` can change `value` when `skol_map` is empty... but testing suggests that it doesn't matter if the call is omitted.

So, please check both patches carefully, especially the second one!
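The shape of the specialization can be sketched as follows. This is an illustrative stand-alone example with made-up types (`plug_leaks_sketch`, a `HashMap`-based substitution over strings), not rustc's actual signatures; the point is only the empty-map fast path:

```rust
use std::collections::HashMap;

// Illustrative sketch only; rustc's real `plug_leaks` works on inference
// types, not strings. When `skol_map` is empty the fold is a no-op, so we
// can return the input untouched and skip the traversal (and its
// allocations) entirely.
fn plug_leaks_sketch(skol_map: &HashMap<String, String>, value: Vec<String>) -> Vec<String> {
    // Fast path: nothing to substitute, so the result equals the input.
    if skol_map.is_empty() {
        return value;
    }
    // Slow path: walk the value and apply each substitution in the map.
    value
        .into_iter()
        .map(|v| skol_map.get(&v).cloned().unwrap_or(v))
        .collect()
}

fn main() {
    let empty = HashMap::new();
    // With an empty map, the value passes through unchanged.
    assert_eq!(
        plug_leaks_sketch(&empty, vec!["a".to_string()]),
        vec!["a".to_string()]
    );
}
```

The early return also avoids cloning the input, which is the same motivation behind the second commit's signature change.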

Here are the speed-ups for the first commit alone.

stage1 compiler (built with old rustc, using glibc malloc), doing debug builds:

```
futures-rs-test  4.710s vs  4.538s --> 1.038x faster (variance: 1.009x, 1.005x)
issue-32062-equ  0.415s vs  0.368s --> 1.129x faster (variance: 1.009x, 1.010x)
issue-32278-big  1.884s vs  1.808s --> 1.042x faster (variance: 1.020x, 1.017x)
jld-day15-parse  1.907s vs  1.668s --> 1.143x faster (variance: 1.011x, 1.007x)
piston-image-0. 13.024s vs 12.421s --> 1.049x faster (variance: 1.004x, 1.012x)
rust-encoding-0  3.335s vs  3.276s --> 1.018x faster (variance: 1.021x, 1.028x)
```

stage2 compiler (built with new rustc, using jemalloc), doing debug builds:

```
futures-rs-test  4.167s vs  4.065s --> 1.025x faster (variance: 1.006x, 1.018x)
issue-32062-equ  0.383s vs  0.343s --> 1.118x faster (variance: 1.012x, 1.016x)
issue-32278-big  1.680s vs  1.621s --> 1.036x faster (variance: 1.007x, 1.007x)
jld-day15-parse  1.671s vs  1.478s --> 1.131x faster (variance: 1.016x, 1.004x)
piston-image-0. 11.336s vs 10.852s --> 1.045x faster (variance: 1.003x, 1.006x)
rust-encoding-0  3.036s vs  2.971s --> 1.022x faster (variance: 1.030x, 1.032x)
```

I've omitted the benchmarks for which the change was negligible.

And here are the speed-ups for the first and second commit in combination.

stage1 compiler (built with old rustc, using glibc malloc), doing debug builds:

```
futures-rs-test  4.684s vs  4.498s --> 1.041x faster (variance: 1.012x, 1.012x)
issue-32062-equ  0.413s vs  0.355s --> 1.162x faster (variance: 1.019x, 1.006x)
issue-32278-big  1.869s vs  1.763s --> 1.060x faster (variance: 1.013x, 1.018x)
jld-day15-parse  1.900s vs  1.602s --> 1.186x faster (variance: 1.010x, 1.003x)
piston-image-0. 12.907s vs 12.352s --> 1.045x faster (variance: 1.005x, 1.006x)
rust-encoding-0  3.254s vs  3.248s --> 1.002x faster (variance: 1.063x, 1.045x)
```

stage2 compiler (built with new rustc, using jemalloc), doing debug builds:

```
futures-rs-test  4.183s vs  4.046s --> 1.034x faster (variance: 1.007x, 1.004x)
issue-32062-equ  0.380s vs  0.340s --> 1.117x faster (variance: 1.020x, 1.003x)
issue-32278-big  1.671s vs  1.616s --> 1.034x faster (variance: 1.031x, 1.012x)
jld-day15-parse  1.661s vs  1.417s --> 1.172x faster (variance: 1.013x, 1.005x)
piston-image-0. 11.347s vs 10.841s --> 1.047x faster (variance: 1.007x, 1.010x)
rust-encoding-0  3.050s vs  3.000s --> 1.017x faster (variance: 1.016x, 1.012x)
```

@eddyb: git blame suggests that you should review this. Thanks!

@rust-highfive (Collaborator) commented on Oct 3, 2016:

r? @nikomatsakis

(rust_highfive has picked a reviewer for you, use r? to override)

@nnethercote (Author) commented on Oct 3, 2016:

BTW, I used Valgrind's DHAT profiler to find this. It told me that lots of `malloc` calls were happening in and beneath `plug_leaks`.

@nnethercote (Author) commented on Oct 3, 2016:

r? @eddyb


Review thread on this hunk:

```rust
debug!("plug_leaks: result={:?}",
       result);
if skol_map.is_empty() {
    result = value;
```
@eddyb (Member) commented on Oct 3, 2016:

I think you can use early return to avoid indenting the entire function to the right.

nnethercote added 2 commits on Oct 3, 2016:

1. This commit avoids the `fold_regions` call in `plug_leaks` when `skol_map` is empty, which is the common case. This gives speed-ups of up to 1.14x on some of the rustc-benchmarks.
2. This commit avoids the `resolve_type_vars_if_possible` call in `plug_leaks` when `skol_map` is empty, which is the common case. It also changes the signature of `plug_leaks` slightly to avoid the need for a `clone` of `value`. These changes give speed-ups of up to a few percent on some of the rustc-benchmarks.
@nnethercote force-pushed the speed-up-plug_leaks branch from 6016e8c to 3779971 on Oct 3, 2016.
eddyb approved these changes on Oct 3, 2016.

@eddyb (Member) commented on Oct 3, 2016:

@bors r+

@bors (Contributor) commented on Oct 3, 2016:

📌 Commit 3779971 has been approved by eddyb

@arielb1 (Contributor) commented on Oct 3, 2016:

We should also be doing this for `leak_check`.

@nnethercote (Author) commented on Oct 3, 2016:

> We should also be doing this for leak_check.

`leak_check` doesn't show up nearly as high as `plug_leaks` in profiles, but it does show up. I'll do it in a follow-up. Thank you for the suggestion.

@brson added the relnotes label on Oct 4, 2016.

@arielb1 (Contributor) commented on Oct 4, 2016:

`leak_check` is done by #36931.

@nnethercote (Author) commented on Oct 4, 2016:

> leak-check done by #36931.

Huh. I have a local patch to add an early return to `leak_check`. It produces a measurable speed-up of a couple of percent on some of the rustc-benchmarks, but it doesn't look anything like the commits in #36931.

Manishearth added a commit to Manishearth/rust that referenced this pull request on Oct 4, 2016:
…eddyb

Speed up `plug_leaks`

(The commit message repeats the PR description from the first comment above.)
bors added a commit that referenced this pull request on Oct 4, 2016:
Rollup of 12 pull requests

- Successful merges: #36798, #36878, #36902, #36903, #36908, #36916, #36917, #36921, #36928, #36938, #36941, #36951
- Failed merges:
@arielb1 (Contributor) commented on Oct 4, 2016:

> Huh. I have a local patch to add an early return to leak_check. It produces a measurable speed-up of a couple of percent on some of the rustc-benchmarks, but it doesn't look anything like the commits in #36931.

How do you run the rustc benchmarks? I've been measuring bootstrap times. My patch takes a shortcut to `leak_check`'s fast path at the call sites.

@bors merged commit 3779971 into rust-lang:master on Oct 4, 2016 (1 check passed).
@nnethercote deleted the speed-up-plug_leaks branch on Oct 4, 2016.
@nnethercote (Author) commented on Oct 4, 2016:

@arielb1: I updated my local repo to get #36931, and the speed-ups from my local patch have now evaporated, so I guess its effect overlapped with the effect of your commit. My patch was simpler, though, adding just an `if skol_map.is_empty() { return Ok(()); }` near the start of `leak_check`. I don't know the code well enough to tell whether that's functionally equivalent to your commit.
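The shape of the local patch described here can be sketched in isolation. This is a hypothetical stand-alone example (`leak_check_sketch` and its types are made up for illustration), not the actual rustc code:

```rust
use std::collections::HashMap;

// Hypothetical sketch of the early return described above: when there are
// no skolemized regions recorded, there is nothing to leak-check, so bail
// out before doing any setup work.
fn leak_check_sketch(skol_map: &HashMap<u32, u32>) -> Result<(), String> {
    // Fast path: an empty map means the check trivially succeeds.
    if skol_map.is_empty() {
        return Ok(());
    }
    // The full leak check would run here; this sketch just reports that
    // the slow path was taken.
    Err(format!("would leak-check {} regions", skol_map.len()))
}

fn main() {
    assert!(leak_check_sketch(&HashMap::new()).is_ok());
}
```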

As for running the benchmarks, I compared two stage1 compilers that were both configured with `--enable-optimize --enable-debuginfo`. I used the `compare.py` script for this; see https://internals.rust-lang.org/t/recent-rustc-benchmarks-improvements/4153 for instructions.

@arielb1 (Contributor) commented on Oct 4, 2016:

It should be functionally equivalent. Is it performance-equivalent (my patch skips a bunch of unneeded setup code)? [checking]

@arielb1 (Contributor) commented on Oct 4, 2016:

The `leak_check` patch seems to be slower. Will investigate more tomorrow.

@nnethercote (Author) commented on Oct 8, 2016:

Good news -- the improvement for jld-day15-parse showed up clearly on perf.rust-lang.org.

[graph omitted: jld-day15-parse compile times on perf.rust-lang.org]

It's the big drop on the RHS on Oct 4. The time dropped from 1.402 to 1.199, which is a 1.169x speed-up (a.k.a. a 14.5% reduction in execution time).
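The arithmetic behind those numbers checks out; a small stand-alone calculation (not part of the PR):

```rust
// Speed-up factor: old time divided by new time.
fn speedup(before: f64, after: f64) -> f64 {
    before / after
}

// Fractional reduction in execution time relative to the old time.
fn time_reduction(before: f64, after: f64) -> f64 {
    (before - after) / before
}

fn main() {
    let (before, after) = (1.402, 1.199);
    println!("{:.3}x faster", speedup(before, after)); // 1.169x
    println!("{:.1}% less time", 100.0 * time_reduction(before, after)); // 14.5%
}
```

Note that a 1.169x speed-up and a 14.5% time reduction are the same change expressed two ways: 1 - 1/1.169 ≈ 0.145.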

The other benchmarks mentioned in the first comment also saw speedups similar to what I measured locally, which is good.

Labels: relnotes

7 participants