Optimize `Substs::super_fold_with`. #37108
        // and instead reuse the existing substs.
        // - In that case, when the substs length is one, we also avoid
        //   creating a new Vec, which avoids a heap allocation.
        let params;
KalitaAlexey
Oct 12, 2016
Contributor
`params` is used only in the block at line 338. Can you move it into the block?
nnethercote
Oct 12, 2016
Author
Contributor
No. It's used on lines 336, 338, 349.
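Declaring `params` before the `if` and assigning it inside a branch is Rust's deferred initialization. In the minimal sketch below, it is needed because a borrow of the backing storage is used after the branch ends. The sketch assumes hypothetical stand-ins (`u32` params, `fold_param`, and `intern` in place of rustc's real param type, `fold_with`, and `mk_substs`). It illustrates how the length-1 case can avoid building a `Vec`; it is not the PR's actual code.

```rust
/// Hypothetical stand-in for folding a single param.
fn fold_param(p: u32) -> u32 {
    p + 1
}

/// Hypothetical stand-in for an interner such as `mk_substs`: it copies the
/// folded params into a new boxed slice.
fn intern(params: &[u32]) -> Box<[u32]> {
    params.to_vec().into_boxed_slice()
}

fn fold_all(input: &[u32]) -> Box<[u32]> {
    // Both backing stores are declared before the `if` so they outlive it:
    // `params` borrows one of them and is used after the `if` ends.
    let one: [u32; 1];
    let many: Vec<u32>;
    let params: &[u32] = if input.len() == 1 {
        // Common case: a one-element array on the stack, no temporary Vec.
        one = [fold_param(input[0])];
        &one
    } else {
        many = input.iter().map(|&p| fold_param(p)).collect();
        &many
    };
    intern(params)
}
```

Only the branch that runs initializes its store. The compiler tracks which one needs dropping, so the unused store costs nothing.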
        // - In that case, when the substs length is one, we also avoid
        //   creating a new Vec, which avoids a heap allocation.
        let params;
        if self.params.len() == 1 {
KalitaAlexey
Oct 12, 2016
Contributor
Can you rewrite this as a `match` on `self.params.len()`?
nnethercote
Oct 12, 2016
Author
Contributor
I prefer having it this way. The length-1 case is the most frequent, then >1, and 0 is a distant last. With a `match`, the order of the tests is less clear (i.e., not guaranteed).
nnethercote
Oct 12, 2016
Author
Contributor
I can add a comment to that effect, if it helps.
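The two styles being compared can be shown with a hypothetical `kind_if`/`kind_match` pair that classifies a length. Both return the same result for every input. The difference is in the source. The `if` chain states which test comes first, which is the common length-1 case. A `match` on an integer leaves the lowering (comparison order, jump table, and so on) to the compiler.

```rust
/// Hypothetical classifier written the way the PR prefers: the tests appear
/// in order of expected frequency.
fn kind_if(len: usize) -> &'static str {
    if len == 1 {
        "one" // most frequent
    } else if len > 1 {
        "many"
    } else {
        "empty" // a distant last
    }
}

/// The same logic as a `match`. The results are identical, but the source no
/// longer says which comparison is made first; the compiler decides.
fn kind_match(len: usize) -> &'static str {
    match len {
        0 => "empty",
        1 => "one",
        _ => "many",
    }
}
```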
                return &self;
            }
        } else {
            return &self;
bluss
Oct 12, 2016
Member
Super minor, but the `&` in `&self` is redundant. The `Self` type is `&'tcx Substs<'tcx>` and the `self` identifier is of type `&Self`. Deref coercion means that using `&self`, `self`, or `*self` here is equivalent.
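bluss's point can be checked with a small, self-contained sketch. `Substs` and the `NoopFold` trait below are hypothetical stand-ins that only mimic the shape of the real impl: `Self` is `&'tcx Substs` and the receiver `self` is `&Self`. All three bodies return the same reference. `*self` copies it out directly, while `self` and `&self` rely on deref coercion at the return site.

```rust
/// Hypothetical minimal stand-in for rustc's `Substs`.
struct Substs {
    params: Vec<u32>,
}

/// Hypothetical trait with the same receiver/return shape as
/// `super_fold_with`: `&self` in, `Self` out.
trait NoopFold: Sized {
    fn with_star(&self) -> Self;
    fn with_self(&self) -> Self;
    fn with_ref(&self) -> Self;
}

impl<'tcx> NoopFold for &'tcx Substs {
    // Here `Self` is `&'tcx Substs` and `self` has type `&&'tcx Substs`.
    fn with_star(&self) -> Self {
        *self // copies the inner `&'tcx Substs` out directly
    }
    fn with_self(&self) -> Self {
        self // deref coercion: `&&Substs` -> `&Substs`
    }
    fn with_ref(&self) -> Self {
        &self // deref coercion again: `&&&Substs` -> `&Substs`
    }
}
```

Clippy flags the last form as a needless borrow, which is the same observation bluss makes.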
So this is an interesting take. I'd like to do it after moving to on-stack arrays and slice arena allocations instead of always allocating a …
Oh, and this doesn't handle …
My original version of the patch just checked if … Then I added the skip-`Vec`-creation-in-the-length-1 case, and that improved things a bit more.
I haven't seen it in any of the profiles I've looked at. Can you describe in more detail how you think it might have an impact?
7209d3f to 6310135
This speeds up several rustc-benchmarks by 1–4%.
6310135 to 1e4241a
I have changed the code to do only the "skip `mk_substs` if the fold was a no-op" optimisation. That should avoid any conflicts with @Mark-Simulacrum's changes to …
r? @eddyb
Note that it will be at least a few days, more likely a week, until I can resume work on the …
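The "skip `mk_substs` if the fold was a no-op" optimisation works in three steps: fold every param, compare the result with the original, and reuse the existing substs when nothing changed. Below is a minimal sketch under hypothetical assumptions. Substs are modelled as `Rc<[u32]>`, `Interner::intern` stands in for `mk_substs` and merely counts calls, and `fold_param` stands in for `fold_with`. None of this is rustc's API.

```rust
use std::rc::Rc;

/// Hypothetical interner standing in for `mk_substs`; it counts calls so the
/// effect of the short-circuit is visible.
struct Interner {
    calls: usize,
}

impl Interner {
    fn intern(&mut self, params: Vec<u32>) -> Rc<[u32]> {
        self.calls += 1;
        Rc::from(params)
    }
}

/// Hypothetical fold: replaces every `0` (think "inference variable") with
/// `replacement` and leaves other params untouched.
fn fold_param(p: u32, replacement: u32) -> u32 {
    if p == 0 { replacement } else { p }
}

fn super_fold_with(substs: &Rc<[u32]>, replacement: u32, tcx: &mut Interner) -> Rc<[u32]> {
    let params: Vec<u32> = substs.iter().map(|&p| fold_param(p, replacement)).collect();
    if params[..] == substs[..] {
        // The fold was a no-op: reuse the existing substs, skip the interner.
        Rc::clone(substs)
    } else {
        tcx.intern(params)
    }
}
```

The trade-off this relies on is that comparing two short slices is cheaper than interning. Interning generally means hashing the slice and looking it up in a table.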
        let params: Vec<_> = self.iter().map(|k| k.fold_with(folder)).collect();
        // If folding doesn't change the substs, it's faster to avoid calling
        // `mk_substs` and instead reuse the existing substs.
eddyb
Oct 18, 2016
Member
Hmm, maybe open an issue about doing this in general, or at least investigating it further? That is, there are a bunch of places where we could short-circuit the interner; maybe some are worth doing.
@bors r+
Optimize `Substs::super_fold_with`.

This speeds up some of the rustc-benchmarks by up to ~4%.

```
futures-rs-test  4.467s vs  4.387s --> 1.018x faster (variance: 1.001x, 1.006x)
helloworld       0.242s vs  0.246s --> 0.980x faster (variance: 1.007x, 1.013x)
html5ever-2016-  7.664s vs  7.630s --> 1.004x faster (variance: 1.008x, 1.006x)
hyper.0.5.0      5.218s vs  5.133s --> 1.016x faster (variance: 1.013x, 1.008x)
inflate-0.1.0    5.040s vs  5.103s --> 0.988x faster (variance: 1.005x, 1.008x)
issue-32062-equ  0.361s vs  0.345s --> 1.047x faster (variance: 1.008x, 1.019x)
issue-32278-big  1.874s vs  1.850s --> 1.013x faster (variance: 1.020x, 1.018x)
jld-day15-parse  1.569s vs  1.508s --> 1.040x faster (variance: 1.009x, 1.003x)
piston-image-0. 12.210s vs 11.903s --> 1.026x faster (variance: 1.045x, 1.010x)
regex.0.1.30     2.568s vs  2.555s --> 1.005x faster (variance: 1.018x, 1.044x)
rust-encoding-0  2.139s vs  2.135s --> 1.001x faster (variance: 1.012x, 1.005x)
syntex-0.42.2   33.099s vs 32.353s --> 1.023x faster (variance: 1.003x, 1.028x)
syntex-0.42.2-i 17.989s vs 17.431s --> 1.032x faster (variance: 1.009x, 1.018x)
```

r? @eddyb. I don't know how this interacts with the changes that dikaiosune has been working on.
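In these tables, each ratio is the first time divided by the second. A value below 1.0 (for example helloworld's 0.980x) is therefore a small slowdown despite the "faster" label. The short sketch below shows the arithmetic. `speedup` and `format_speedup` are hypothetical helpers, not part of the benchmark harness. The printed ratios appear to come from unrounded timings, so recomputing them from the rounded times shown can differ in the last digit (0.242 / 0.246 gives about 0.984).

```rust
/// Ratio used in the benchmark tables: old time divided by new time.
/// Above 1.0 the patched compiler was faster; below 1.0 it was slower.
fn speedup(before_secs: f64, after_secs: f64) -> f64 {
    before_secs / after_secs
}

/// Formats a ratio the way the tables print it, e.g. "1.018x".
fn format_speedup(before_secs: f64, after_secs: f64) -> String {
    format!("{:.3}x", speedup(before_secs, after_secs))
}
```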
@bors: retry force clean
Just kidding; I'm doing this only to unstick @bors/homu/buildbot.
@bors r+
… r=eddyb Optimize `Substs::super_fold_with`.
Avoid more unnecessary mk_ty calls in Ty::super_fold_with.

This speeds up several rustc-benchmarks by 1–5%. This PR is the lovechild of #37108 and #37705.

```
futures-rs-test  4.059s vs  4.011s --> 1.012x faster (variance: 1.016x, 1.026x)
helloworld       0.236s vs  0.239s --> 0.986x faster (variance: 1.051x, 1.014x)
html5ever-2016-  3.831s vs  3.824s --> 1.002x faster (variance: 1.020x, 1.019x)
hyper.0.5.0      4.928s vs  4.936s --> 0.998x faster (variance: 1.003x, 1.012x)
inflate-0.1.0    4.135s vs  4.104s --> 1.007x faster (variance: 1.026x, 1.028x)
issue-32062-equ  0.309s vs  0.303s --> 1.017x faster (variance: 1.019x, 1.084x)
issue-32278-big  1.818s vs  1.797s --> 1.011x faster (variance: 1.011x, 1.008x)
jld-day15-parse  1.304s vs  1.271s --> 1.026x faster (variance: 1.018x, 1.012x)
piston-image-0. 10.938s vs 10.921s --> 1.002x faster (variance: 1.025x, 1.016x)
reddit-stress    2.327s vs  2.208s --> 1.054x faster (variance: 1.016x, 1.006x)
regex-0.1.80     8.796s vs  8.727s --> 1.008x faster (variance: 1.012x, 1.019x)
regex.0.1.30     2.294s vs  2.249s --> 1.020x faster (variance: 1.013x, 1.026x)
rust-encoding-0  1.914s vs  1.886s --> 1.015x faster (variance: 1.027x, 1.026x)
```