Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid most allocations in `Canonicalizer`. #52342

Merged
merged 1 commit into from Jul 18, 2018
Merged

Conversation

@nnethercote
Copy link
Contributor

@nnethercote nnethercote commented Jul 13, 2018

Extra allocations are a significant cost of NLL, and the most common
ones come from within Canonicalizer. In particular, canonical_var()
contains this code:

indices
.entry(kind)
.or_insert_with(|| {
    let cvar1 = variables.push(info);
    let cvar2 = var_values.push(kind);
    assert_eq!(cvar1, cvar2);
    cvar1
})
.clone()

variables and var_values are Vecs. indices is a HashMap used
to track what elements have been inserted into var_values. If kind
hasn't been seen before, indices, variables and var_values all get
a new element. (The number of elements in each container is always the
same.) This results in lots of allocations.

In practice, most of the time these containers only end up holding a few
elements. This PR changes them to avoid heap allocations in the common
case, by changing the Vecs to SmallVecs and only using indices
once enough elements are present. (When the number of elements is small,
a direct linear search of var_values is as good or better than a
hashmap lookup.)

The changes to variables are straightforward and contained within
Canonicalizer. The changes to indices are more complex but also
contained within Canonicalizer. The changes to var_values are more
intrusive because they require defining a new type
SmallCanonicalVarValues -- which is to CanonicalVarValues as
SmallVec is to `Vec -- and passing stack-allocated values of that type
in from outside.

All this speeds up a number of NLL "check" builds, the best by 2%.

r? @nikomatsakis

@nnethercote
Copy link
Contributor Author

@nnethercote nnethercote commented Jul 13, 2018

Here are the NLL builds with a non-negligible speed-up:

coercions-check
        avg: -2.4%      min: -2.4%      max: -2.4%
sentry-cli-check
        avg: -0.9%      min: -0.9%      max: -0.9%
regex-check
        avg: -0.9%      min: -0.9%      max: -0.9%
webrender-check
        avg: -0.8%      min: -0.8%      max: -0.8%
clap-rs-check
        avg: -0.8%      min: -0.8%      max: -0.8%
cargo-check
        avg: -0.8%      min: -0.8%      max: -0.8%
serde-check
        avg: -0.8%      min: -0.8%      max: -0.8%
ripgrep-check
        avg: -0.7%      min: -0.7%      max: -0.7%
style-servo-check
        avg: -0.7%?     min: -0.7%?     max: -0.7%?
regression-31157-check
        avg: -0.7%      min: -0.7%      max: -0.7%
tokio-webpush-simple-check
        avg: -0.6%      min: -0.6%      max: -0.6% 
encoding-check
        avg: -0.6%      min: -0.6%      max: -0.6%
syn-check
        avg: -0.5%      min: -0.5%      max: -0.5% 
futures-check
        avg: -0.5%      min: -0.5%      max: -0.5%
piston-image-check
        avg: -0.5%      min: -0.5%      max: -0.5%
crates.io-check
        avg: -0.4%      min: -0.4%      max: -0.4%
where
V: TypeFoldable<'tcx> + Lift<'gcx>,
{
let mut _var_values = SmallVec::new();

This comment has been minimized.

@Mark-Simulacrum

Mark-Simulacrum Jul 13, 2018
Member

Can probably drop the leading underscore here?

This comment has been minimized.

@nnethercote

nnethercote Jul 15, 2018
Author Contributor

Ok.

assert_eq!(variables.len(), var_values.len());

// If `var_values` has become big enough to be heap-allocated,
// fill up `indices` to hasten subsequent lookups.

This comment has been minimized.

@Mark-Simulacrum

Mark-Simulacrum Jul 13, 2018
Member

AFAICT, the code will not behave correctly if this prefill is removed -- perhaps we should update the comment to not say "hasten subsequent lookups"? i.e. this isn't an optimization but a requirement

This comment has been minimized.

@nnethercote

nnethercote Jul 15, 2018
Author Contributor

I'll change it to "facilitate"

// If `var_values` has become big enough to be heap-allocated,
// fill up `indices` to hasten subsequent lookups.
if !var_values.is_array() {
for (i, &kind) in var_values.iter().enumerate() {

This comment has been minimized.

@Mark-Simulacrum

Mark-Simulacrum Jul 13, 2018
Member

Might as well do an indices.reserve here

This comment has been minimized.

@nnethercote

nnethercote Jul 15, 2018
Author Contributor

I realize that I can use collect here instead.

@nnethercote nnethercote force-pushed the nnethercote:CanonicalVar branch from c93eb64 to 4fbdc01 Jul 16, 2018
@nnethercote
Copy link
Contributor Author

@nnethercote nnethercote commented Jul 16, 2018

New version addresses all the comment.

@nikomatsakis
Copy link
Contributor

@nikomatsakis nikomatsakis commented Jul 16, 2018

Nice.

Copy link
Contributor

@nikomatsakis nikomatsakis left a comment

Seems like a good change! Got a few questions.

@@ -74,6 +75,10 @@ pub struct CanonicalVarValues<'tcx> {
pub var_values: IndexVec<CanonicalVar, Kind<'tcx>>,
}

/// Like CanonicalVarValues, but for use in places where a SmallVec is
/// appropriate.
pub type SmallCanonicalVarValues<'tcx> = SmallVec<[Kind<'tcx>; 8]>;

This comment has been minimized.

@nikomatsakis

nikomatsakis Jul 16, 2018
Contributor

Why not just use this everywhere...?

This comment has been minimized.

@nnethercote

nnethercote Jul 17, 2018
Author Contributor

Because CanonicalVarValue derives Clone, Debug, PartialEq, Eq, Hash, RustcDecodable, and RustcEncodable. In contrast, SmallVec doesn't define any of those. Also, I don't know what the impact of possible copying of SmallVecs (which are quite large, in terms of the number of bytes they take up on the stack) in lots of other places.

@@ -295,7 +304,8 @@ impl<'cx, 'gcx, 'tcx> Canonicalizer<'cx, 'gcx, 'tcx> {
infcx: Option<&'cx InferCtxt<'cx, 'gcx, 'tcx>>,
tcx: TyCtxt<'cx, 'gcx, 'tcx>,
canonicalize_region_mode: CanonicalizeRegionMode,
) -> (Canonicalized<'gcx, V>, CanonicalVarValues<'tcx>)
var_values: &'cx mut SmallCanonicalVarValues<'tcx>
) -> Canonicalized<'gcx, V>

This comment has been minimized.

@nikomatsakis

nikomatsakis Jul 16, 2018
Contributor

Why not just return this? I guess it's more efficient this way...?

This comment has been minimized.

@nnethercote

nnethercote Jul 17, 2018
Author Contributor

Yes. Copying the SmallCanonicalVars reduces the size of the win by about 20--25%. I figure we need every saving we can get for NLL!

// fill up `indices` to facilitate subsequent lookups.
if !var_values.is_array() {
assert!(indices.is_empty());
::std::mem::replace(

This comment has been minimized.

@nikomatsakis

nikomatsakis Jul 16, 2018
Contributor

Why replace and not *indices = ...? Seems simpler.

This comment has been minimized.

@nnethercote

nnethercote Jul 17, 2018
Author Contributor

True!

Extra allocations are a significant cost of NLL, and the most common
ones come from within `Canonicalizer`. In particular, `canonical_var()`
contains this code:

    indices
	.entry(kind)
	.or_insert_with(|| {
	    let cvar1 = variables.push(info);
	    let cvar2 = var_values.push(kind);
	    assert_eq!(cvar1, cvar2);
	    cvar1
	})
	.clone()

`variables` and `var_values` are `Vec`s. `indices` is a `HashMap` used
to track what elements have been inserted into `var_values`. If `kind`
hasn't been seen before, `indices`, `variables` and `var_values` all get
a new element. (The number of elements in each container is always the
same.) This results in lots of allocations.

In practice, most of the time these containers only end up holding a few
elements. This PR changes them to avoid heap allocations in the common
case, by changing the `Vec`s to `SmallVec`s and only using `indices`
once enough elements are present. (When the number of elements is small,
a direct linear search of `var_values` is as good or better than a
hashmap lookup.)

The changes to `variables` are straightforward and contained within
`Canonicalizer`. The changes to `indices` are more complex but also
contained within `Canonicalizer`. The changes to `var_values` are more
intrusive because they require defining a new type
`SmallCanonicalVarValues` -- which is to `CanonicalVarValues` as
`SmallVec` is to `Vec -- and passing stack-allocated values of that type
in from outside.

All this speeds up a number of NLL "check" builds, the best by 2%.
@nnethercote nnethercote force-pushed the nnethercote:CanonicalVar branch from 4fbdc01 to 7cc5277 Jul 17, 2018
@nnethercote
Copy link
Contributor Author

@nnethercote nnethercote commented Jul 17, 2018

Comments have been addressed. r? @nikomatsakis

@nikomatsakis
Copy link
Contributor

@nikomatsakis nikomatsakis commented Jul 17, 2018

@bors r+ -- seems good. we can revisit later, as I would like to do an even more aggressive optimization here anyway (#48417)

@bors
Copy link
Contributor

@bors bors commented Jul 17, 2018

📌 Commit 7cc5277 has been approved by nikomatsakis

@bors
Copy link
Contributor

@bors bors commented Jul 18, 2018

Testing commit 7cc5277 with merge f686885...

bors added a commit that referenced this pull request Jul 18, 2018
Avoid most allocations in `Canonicalizer`.

Extra allocations are a significant cost of NLL, and the most common
ones come from within `Canonicalizer`. In particular, `canonical_var()`
contains this code:

    indices
	.entry(kind)
	.or_insert_with(|| {
	    let cvar1 = variables.push(info);
	    let cvar2 = var_values.push(kind);
	    assert_eq!(cvar1, cvar2);
	    cvar1
	})
	.clone()

`variables` and `var_values` are `Vec`s. `indices` is a `HashMap` used
to track what elements have been inserted into `var_values`. If `kind`
hasn't been seen before, `indices`, `variables` and `var_values` all get
a new element. (The number of elements in each container is always the
same.) This results in lots of allocations.

In practice, most of the time these containers only end up holding a few
elements. This PR changes them to avoid heap allocations in the common
case, by changing the `Vec`s to `SmallVec`s and only using `indices`
once enough elements are present. (When the number of elements is small,
a direct linear search of `var_values` is as good or better than a
hashmap lookup.)

The changes to `variables` are straightforward and contained within
`Canonicalizer`. The changes to `indices` are more complex but also
contained within `Canonicalizer`. The changes to `var_values` are more
intrusive because they require defining a new type
`SmallCanonicalVarValues` -- which is to `CanonicalVarValues` as
`SmallVec` is to `Vec -- and passing stack-allocated values of that type
in from outside.

All this speeds up a number of NLL "check" builds, the best by 2%.

r? @nikomatsakis
@bors
Copy link
Contributor

@bors bors commented Jul 18, 2018

☀️ Test successful - status-appveyor, status-travis
Approved by: nikomatsakis
Pushing f686885 to master...

@bors bors merged commit 7cc5277 into rust-lang:master Jul 18, 2018
2 checks passed
2 checks passed
continuous-integration/travis-ci/pr The Travis CI build passed
Details
homu Test successful
Details
@nnethercote nnethercote deleted the nnethercote:CanonicalVar branch Jul 18, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

5 participants
You can’t perform that action at this time.