Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize `Substs::super_fold_with`. #37108

Merged
merged 2 commits into from Oct 19, 2016

Conversation

@nnethercote
Copy link
Contributor

commented Oct 12, 2016

This speeds up some of the rustc-benchmarks by up to ~4%.

futures-rs-test  4.467s vs  4.387s --> 1.018x faster (variance: 1.001x, 1.006x)
helloworld       0.242s vs  0.246s --> 0.980x faster (variance: 1.007x, 1.013x)
html5ever-2016-  7.664s vs  7.630s --> 1.004x faster (variance: 1.008x, 1.006x)
hyper.0.5.0      5.218s vs  5.133s --> 1.016x faster (variance: 1.013x, 1.008x)
inflate-0.1.0    5.040s vs  5.103s --> 0.988x faster (variance: 1.005x, 1.008x)
issue-32062-equ  0.361s vs  0.345s --> 1.047x faster (variance: 1.008x, 1.019x)
issue-32278-big  1.874s vs  1.850s --> 1.013x faster (variance: 1.020x, 1.018x)
jld-day15-parse  1.569s vs  1.508s --> 1.040x faster (variance: 1.009x, 1.003x)
piston-image-0. 12.210s vs 11.903s --> 1.026x faster (variance: 1.045x, 1.010x)
regex.0.1.30     2.568s vs  2.555s --> 1.005x faster (variance: 1.018x, 1.044x)
rust-encoding-0  2.139s vs  2.135s --> 1.001x faster (variance: 1.012x, 1.005x)
syntex-0.42.2   33.099s vs 32.353s --> 1.023x faster (variance: 1.003x, 1.028x)
syntex-0.42.2-i 17.989s vs 17.431s --> 1.032x faster (variance: 1.009x, 1.018x)

r? @eddyb. I don't know how this interacts with the changes that dikaiosune has been working on.

// and instead reuse the existing substs.
// - In that case, when the substs length is one, we also avoid
// creating a new Vec, which avoids a heap allocation.
let params;

This comment has been minimized.

Copy link
@KalitaAlexey

KalitaAlexey Oct 12, 2016

Contributor

params is used only in 338 line block. Can you move into the block?

This comment has been minimized.

Copy link
@nnethercote

nnethercote Oct 12, 2016

Author Contributor

No. It's used on lines 336, 338, 349.

// - In that case, when the substs length is one, we also avoid
// creating a new Vec, which avoids a heap allocation.
let params;
if self.params.len() == 1 {

This comment has been minimized.

Copy link
@KalitaAlexey

KalitaAlexey Oct 12, 2016

Contributor

Can you rewrite it to match self.params.len()?

This comment has been minimized.

Copy link
@nnethercote

nnethercote Oct 12, 2016

Author Contributor

I prefer having it this way. The 1 case is most frequent, then >1, and 0 is a distant last. With a match the ordering is less clear (i.e. not guaranteed).

This comment has been minimized.

Copy link
@nnethercote

nnethercote Oct 12, 2016

Author Contributor

I can add a comment to that effect, if it helps.

This comment has been minimized.

Copy link
@KalitaAlexey

KalitaAlexey Oct 12, 2016

Contributor

@nnethercote Okay. I got it.

return &self;
}
} else {
return &self;

This comment has been minimized.

Copy link
@bluss

bluss Oct 12, 2016

Member

super minor, but the & in &self is redundant. The Self type is &'tcx Substs<'tcx> and the self identifier is of type &Self. Deref coercion means that using &self, self or *self here is going to be equivalent.

@eddyb

This comment has been minimized.

Copy link
Member

commented Oct 12, 2016

So this is an interesting take. I'd like to do it after moving to on-stack arrays and slice arena allocations instead of always allocating a Vec. Then it could either be an extension of that algorithm or maybe it wouldn't even have a measurable impact etc

@eddyb

This comment has been minimized.

Copy link
Member

commented Oct 12, 2016

Oh and this doesn't handle ty::relate which may have a larger impact.

@nnethercote

This comment has been minimized.

Copy link
Contributor Author

commented Oct 12, 2016

I'd like to do it after moving to on-stack arrays and slice arena allocations instead of always allocating a Vec.

My original version of the patch just checked if params == self.params and skipped the mk_substs call in that case, and that alone was a decent improvement because it avoids the interning, which is dominated by hash operations.

Then I added the skip-Vec-creation-in-the-length-1 case and that improved things a bit more.

@nnethercote

This comment has been minimized.

Copy link
Contributor Author

commented Oct 12, 2016

Oh and this doesn't handle ty::relate which may have a larger impact.

I haven't seen it in any of the profiles I've looked at. Can you describe in more detail how you think it might have an impact?

@bors

This comment has been minimized.

Copy link
Contributor

commented Oct 15, 2016

☔️ The latest upstream changes (presumably #37100) made this pull request unmergeable. Please resolve the merge conflicts.

@nnethercote nnethercote force-pushed the nnethercote:substs-experimentation branch from 7209d3f to 6310135 Oct 18, 2016
This speeds up several rustc-benchmarks by 1--4%.
@nnethercote nnethercote force-pushed the nnethercote:substs-experimentation branch from 6310135 to 1e4241a Oct 18, 2016
@nnethercote

This comment has been minimized.

Copy link
Contributor Author

commented Oct 18, 2016

I have changed the code to just do the "skip mk_substs if the fold was a no-op" optimisation. That should avoid any conflicts with @Mark-Simulacrum's changes to Substs. It still gives some nice speed-ups.

futures-rs-test  4.189s vs  4.157s --> 1.008x faster (variance: 1.004x, 1.007x)
helloworld       0.223s vs  0.223s --> 1.000x faster (variance: 1.013x, 1.012x)
html5ever-2016-  5.192s vs  5.147s --> 1.009x faster (variance: 1.007x, 1.004x)
hyper.0.5.0      4.788s vs  4.763s --> 1.005x faster (variance: 1.008x, 1.008x)
inflate-0.1.0    4.553s vs  4.494s --> 1.013x faster (variance: 1.010x, 1.044x)
issue-32062-equ  0.324s vs  0.314s --> 1.032x faster (variance: 1.008x, 1.020x)
issue-32278-big  1.661s vs  1.654s --> 1.005x faster (variance: 1.008x, 1.002x)
jld-day15-parse  1.475s vs  1.422s --> 1.038x faster (variance: 1.007x, 1.008x)
piston-image-0. 11.386s vs 11.227s --> 1.014x faster (variance: 1.005x, 1.063x)
reddit-stress    2.359s vs  2.308s --> 1.022x faster (variance: 1.017x, 1.024x)
regex.0.1.30     2.421s vs  2.388s --> 1.014x faster (variance: 1.036x, 1.008x)
rust-encoding-0  1.983s vs  1.973s --> 1.005x faster (variance: 1.008x, 1.016x)
syntex-0.42.2   29.582s vs 29.610s --> 0.999x faster (variance: 1.014x, 1.045x)
syntex-0.42.2-i 15.203s vs 15.175s --> 1.002x faster (variance: 1.011x, 1.006x)

r? @eddyb

@Mark-Simulacrum

This comment has been minimized.

Copy link
Member

commented Oct 18, 2016

Note that it will be at least a few days, more likely a week until I can resume work on the Substs improvements, so I'm not sure removing things now is a good idea. I haven't followed this PR closely though, not sure how large/significant the changes here are.

@eddyb
eddyb approved these changes Oct 18, 2016
let params: Vec<_> = self.iter().map(|k| k.fold_with(folder)).collect();

// If folding doesn't change the substs, it's faster to avoid calling
// `mk_substs` and instead reuse the existing substs.

This comment has been minimized.

Copy link
@eddyb

eddyb Oct 18, 2016

Member

Hmm, maybe open an issue about doing this in general, or at least further investigation? That is, there's a bunch of places where we can short-circuit the interner, maybe some are worth doing.

@eddyb

This comment has been minimized.

Copy link
Member

commented Oct 18, 2016

@bors r+

@bors

This comment has been minimized.

Copy link
Contributor

commented Oct 18, 2016

📌 Commit 1e4241a has been approved by eddyb

@bors

This comment has been minimized.

Copy link
Contributor

commented Oct 18, 2016

⌛️ Testing commit 1e4241a with merge eeb6379...

bors added a commit that referenced this pull request Oct 18, 2016
Optimize `Substs::super_fold_with`.

This speeds up some of the rustc-benchmarks by up to ~4%.
```
futures-rs-test  4.467s vs  4.387s --> 1.018x faster (variance: 1.001x, 1.006x)
helloworld       0.242s vs  0.246s --> 0.980x faster (variance: 1.007x, 1.013x)
html5ever-2016-  7.664s vs  7.630s --> 1.004x faster (variance: 1.008x, 1.006x)
hyper.0.5.0      5.218s vs  5.133s --> 1.016x faster (variance: 1.013x, 1.008x)
inflate-0.1.0    5.040s vs  5.103s --> 0.988x faster (variance: 1.005x, 1.008x)
issue-32062-equ  0.361s vs  0.345s --> 1.047x faster (variance: 1.008x, 1.019x)
issue-32278-big  1.874s vs  1.850s --> 1.013x faster (variance: 1.020x, 1.018x)
jld-day15-parse  1.569s vs  1.508s --> 1.040x faster (variance: 1.009x, 1.003x)
piston-image-0. 12.210s vs 11.903s --> 1.026x faster (variance: 1.045x, 1.010x)
regex.0.1.30     2.568s vs  2.555s --> 1.005x faster (variance: 1.018x, 1.044x)
rust-encoding-0  2.139s vs  2.135s --> 1.001x faster (variance: 1.012x, 1.005x)
syntex-0.42.2   33.099s vs 32.353s --> 1.023x faster (variance: 1.003x, 1.028x)
syntex-0.42.2-i 17.989s vs 17.431s --> 1.032x faster (variance: 1.009x, 1.018x)
```
r? @eddyb. I don't know how this interacts with the changes that dikaiosune has been working on.
@alexcrichton

This comment has been minimized.

Copy link
Member

commented Oct 18, 2016

@bors: retry force clean

  • restarted buildbot
Just kidding I'm doing this only to unstuck @bors/homu/buildbot.
@eddyb

This comment has been minimized.

Copy link
Member

commented Oct 19, 2016

@bors r+

@bors

This comment has been minimized.

Copy link
Contributor

commented Oct 19, 2016

📌 Commit ab5dcff has been approved by eddyb

eddyb added a commit to eddyb/rust that referenced this pull request Oct 19, 2016
… r=eddyb

Optimize `Substs::super_fold_with`.

This speeds up some of the rustc-benchmarks by up to ~4%.
```
futures-rs-test  4.467s vs  4.387s --> 1.018x faster (variance: 1.001x, 1.006x)
helloworld       0.242s vs  0.246s --> 0.980x faster (variance: 1.007x, 1.013x)
html5ever-2016-  7.664s vs  7.630s --> 1.004x faster (variance: 1.008x, 1.006x)
hyper.0.5.0      5.218s vs  5.133s --> 1.016x faster (variance: 1.013x, 1.008x)
inflate-0.1.0    5.040s vs  5.103s --> 0.988x faster (variance: 1.005x, 1.008x)
issue-32062-equ  0.361s vs  0.345s --> 1.047x faster (variance: 1.008x, 1.019x)
issue-32278-big  1.874s vs  1.850s --> 1.013x faster (variance: 1.020x, 1.018x)
jld-day15-parse  1.569s vs  1.508s --> 1.040x faster (variance: 1.009x, 1.003x)
piston-image-0. 12.210s vs 11.903s --> 1.026x faster (variance: 1.045x, 1.010x)
regex.0.1.30     2.568s vs  2.555s --> 1.005x faster (variance: 1.018x, 1.044x)
rust-encoding-0  2.139s vs  2.135s --> 1.001x faster (variance: 1.012x, 1.005x)
syntex-0.42.2   33.099s vs 32.353s --> 1.023x faster (variance: 1.003x, 1.028x)
syntex-0.42.2-i 17.989s vs 17.431s --> 1.032x faster (variance: 1.009x, 1.018x)
```
r? @eddyb. I don't know how this interacts with the changes that dikaiosune has been working on.
@eddyb eddyb referenced this pull request Oct 19, 2016
bors added a commit that referenced this pull request Oct 19, 2016
Rollup of 23 pull requests

- Successful merges: #36964, #37108, #37117, #37124, #37161, #37176, #37182, #37193, #37198, #37202, #37208, #37218, #37221, #37224, #37230, #37231, #37233, #37236, #37240, #37254, #37257, #37265, #37267
- Failed merges: #37213, #37220, #37261
@bors bors merged commit ab5dcff into rust-lang:master Oct 19, 2016
1 check passed
1 check passed
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@nnethercote nnethercote deleted the nnethercote:substs-experimentation branch Oct 19, 2016
@brson brson added the relnotes label Oct 24, 2016
bors added a commit that referenced this pull request Nov 25, 2016
Avoid more unnecessary mk_ty calls in Ty::super_fold_with.

This speeds up several rustc-benchmarks by 1--5%.

This PR is the lovechild of #37108 and #37705.
```
futures-rs-test  4.059s vs  4.011s --> 1.012x faster (variance: 1.016x, 1.026x)
helloworld       0.236s vs  0.239s --> 0.986x faster (variance: 1.051x, 1.014x)
html5ever-2016-  3.831s vs  3.824s --> 1.002x faster (variance: 1.020x, 1.019x)
hyper.0.5.0      4.928s vs  4.936s --> 0.998x faster (variance: 1.003x, 1.012x)
inflate-0.1.0    4.135s vs  4.104s --> 1.007x faster (variance: 1.026x, 1.028x)
issue-32062-equ  0.309s vs  0.303s --> 1.017x faster (variance: 1.019x, 1.084x)
issue-32278-big  1.818s vs  1.797s --> 1.011x faster (variance: 1.011x, 1.008x)
jld-day15-parse  1.304s vs  1.271s --> 1.026x faster (variance: 1.018x, 1.012x)
piston-image-0. 10.938s vs 10.921s --> 1.002x faster (variance: 1.025x, 1.016x)
reddit-stress    2.327s vs  2.208s --> 1.054x faster (variance: 1.016x, 1.006x)
regex-0.1.80     8.796s vs  8.727s --> 1.008x faster (variance: 1.012x, 1.019x)
regex.0.1.30     2.294s vs  2.249s --> 1.020x faster (variance: 1.013x, 1.026x)
rust-encoding-0  1.914s vs  1.886s --> 1.015x faster (variance: 1.027x, 1.026x)
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
8 participants
You can’t perform that action at this time.