New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid most allocations in `Canonicalizer`. #52342

Merged
merged 1 commit into from Jul 18, 2018

Conversation

Projects
None yet
5 participants
@nnethercote
Contributor

nnethercote commented Jul 13, 2018

Extra allocations are a significant cost of NLL, and the most common
ones come from within Canonicalizer. In particular, canonical_var()
contains this code:

indices
.entry(kind)
.or_insert_with(|| {
    let cvar1 = variables.push(info);
    let cvar2 = var_values.push(kind);
    assert_eq!(cvar1, cvar2);
    cvar1
})
.clone()

variables and var_values are Vecs. indices is a HashMap used
to track what elements have been inserted into var_values. If kind
hasn't been seen before, indices, variables and var_values all get
a new element. (The number of elements in each container is always the
same.) This results in lots of allocations.

In practice, most of the time these containers only end up holding a few
elements. This PR changes them to avoid heap allocations in the common
case, by changing the Vecs to SmallVecs and only using indices
once enough elements are present. (When the number of elements is small,
a direct linear search of var_values is as good or better than a
hashmap lookup.)

The changes to variables are straightforward and contained within
Canonicalizer. The changes to indices are more complex but also
contained within Canonicalizer. The changes to var_values are more
intrusive because they require defining a new type
SmallCanonicalVarValues -- which is to CanonicalVarValues as
SmallVec is to `Vec -- and passing stack-allocated values of that type
in from outside.

All this speeds up a number of NLL "check" builds, the best by 2%.

r? @nikomatsakis

@nnethercote

This comment has been minimized.

Contributor

nnethercote commented Jul 13, 2018

Here are the NLL builds with a non-negligible speed-up:

coercions-check
        avg: -2.4%      min: -2.4%      max: -2.4%
sentry-cli-check
        avg: -0.9%      min: -0.9%      max: -0.9%
regex-check
        avg: -0.9%      min: -0.9%      max: -0.9%
webrender-check
        avg: -0.8%      min: -0.8%      max: -0.8%
clap-rs-check
        avg: -0.8%      min: -0.8%      max: -0.8%
cargo-check
        avg: -0.8%      min: -0.8%      max: -0.8%
serde-check
        avg: -0.8%      min: -0.8%      max: -0.8%
ripgrep-check
        avg: -0.7%      min: -0.7%      max: -0.7%
style-servo-check
        avg: -0.7%?     min: -0.7%?     max: -0.7%?
regression-31157-check
        avg: -0.7%      min: -0.7%      max: -0.7%
tokio-webpush-simple-check
        avg: -0.6%      min: -0.6%      max: -0.6% 
encoding-check
        avg: -0.6%      min: -0.6%      max: -0.6%
syn-check
        avg: -0.5%      min: -0.5%      max: -0.5% 
futures-check
        avg: -0.5%      min: -0.5%      max: -0.5%
piston-image-check
        avg: -0.5%      min: -0.5%      max: -0.5%
crates.io-check
        avg: -0.4%      min: -0.4%      max: -0.4%
where
V: TypeFoldable<'tcx> + Lift<'gcx>,
{
let mut _var_values = SmallVec::new();

This comment has been minimized.

@Mark-Simulacrum

Mark-Simulacrum Jul 13, 2018

Member

Can probably drop the leading underscore here?

This comment has been minimized.

@nnethercote

nnethercote Jul 15, 2018

Contributor

Ok.

assert_eq!(variables.len(), var_values.len());
// If `var_values` has become big enough to be heap-allocated,
// fill up `indices` to hasten subsequent lookups.

This comment has been minimized.

@Mark-Simulacrum

Mark-Simulacrum Jul 13, 2018

Member

AFAICT, the code will not behave correctly if this prefill is removed -- perhaps we should update the comment to not say "hasten subsequent lookups"? i.e. this isn't an optimization but a requirement

This comment has been minimized.

@nnethercote

nnethercote Jul 15, 2018

Contributor

I'll change it to "facilitate"

// If `var_values` has become big enough to be heap-allocated,
// fill up `indices` to hasten subsequent lookups.
if !var_values.is_array() {
for (i, &kind) in var_values.iter().enumerate() {

This comment has been minimized.

@Mark-Simulacrum

Mark-Simulacrum Jul 13, 2018

Member

Might as well do an indices.reserve here

This comment has been minimized.

@nnethercote

nnethercote Jul 15, 2018

Contributor

I realize that I can use collect here instead.

@nnethercote

This comment has been minimized.

Contributor

nnethercote commented Jul 16, 2018

New version addresses all the comment.

@nikomatsakis

This comment has been minimized.

Contributor

nikomatsakis commented Jul 16, 2018

Nice.

@nikomatsakis

Seems like a good change! Got a few questions.

@@ -74,6 +75,10 @@ pub struct CanonicalVarValues<'tcx> {
pub var_values: IndexVec<CanonicalVar, Kind<'tcx>>,
}
/// Like CanonicalVarValues, but for use in places where a SmallVec is
/// appropriate.
pub type SmallCanonicalVarValues<'tcx> = SmallVec<[Kind<'tcx>; 8]>;

This comment has been minimized.

@nikomatsakis

nikomatsakis Jul 16, 2018

Contributor

Why not just use this everywhere...?

This comment has been minimized.

@nnethercote

nnethercote Jul 17, 2018

Contributor

Because CanonicalVarValue derives Clone, Debug, PartialEq, Eq, Hash, RustcDecodable, and RustcEncodable. In contrast, SmallVec doesn't define any of those. Also, I don't know what the impact of possible copying of SmallVecs (which are quite large, in terms of the number of bytes they take up on the stack) in lots of other places.

@@ -295,7 +304,8 @@ impl<'cx, 'gcx, 'tcx> Canonicalizer<'cx, 'gcx, 'tcx> {
infcx: Option<&'cx InferCtxt<'cx, 'gcx, 'tcx>>,
tcx: TyCtxt<'cx, 'gcx, 'tcx>,
canonicalize_region_mode: CanonicalizeRegionMode,
) -> (Canonicalized<'gcx, V>, CanonicalVarValues<'tcx>)
var_values: &'cx mut SmallCanonicalVarValues<'tcx>
) -> Canonicalized<'gcx, V>

This comment has been minimized.

@nikomatsakis

nikomatsakis Jul 16, 2018

Contributor

Why not just return this? I guess it's more efficient this way...?

This comment has been minimized.

@nnethercote

nnethercote Jul 17, 2018

Contributor

Yes. Copying the SmallCanonicalVars reduces the size of the win by about 20--25%. I figure we need every saving we can get for NLL!

// fill up `indices` to facilitate subsequent lookups.
if !var_values.is_array() {
assert!(indices.is_empty());
::std::mem::replace(

This comment has been minimized.

@nikomatsakis

nikomatsakis Jul 16, 2018

Contributor

Why replace and not *indices = ...? Seems simpler.

This comment has been minimized.

@nnethercote

nnethercote Jul 17, 2018

Contributor

True!

Avoid most allocations in `Canonicalizer`.
Extra allocations are a significant cost of NLL, and the most common
ones come from within `Canonicalizer`. In particular, `canonical_var()`
contains this code:

    indices
	.entry(kind)
	.or_insert_with(|| {
	    let cvar1 = variables.push(info);
	    let cvar2 = var_values.push(kind);
	    assert_eq!(cvar1, cvar2);
	    cvar1
	})
	.clone()

`variables` and `var_values` are `Vec`s. `indices` is a `HashMap` used
to track what elements have been inserted into `var_values`. If `kind`
hasn't been seen before, `indices`, `variables` and `var_values` all get
a new element. (The number of elements in each container is always the
same.) This results in lots of allocations.

In practice, most of the time these containers only end up holding a few
elements. This PR changes them to avoid heap allocations in the common
case, by changing the `Vec`s to `SmallVec`s and only using `indices`
once enough elements are present. (When the number of elements is small,
a direct linear search of `var_values` is as good or better than a
hashmap lookup.)

The changes to `variables` are straightforward and contained within
`Canonicalizer`. The changes to `indices` are more complex but also
contained within `Canonicalizer`. The changes to `var_values` are more
intrusive because they require defining a new type
`SmallCanonicalVarValues` -- which is to `CanonicalVarValues` as
`SmallVec` is to `Vec -- and passing stack-allocated values of that type
in from outside.

All this speeds up a number of NLL "check" builds, the best by 2%.
@nnethercote

This comment has been minimized.

Contributor

nnethercote commented Jul 17, 2018

Comments have been addressed. r? @nikomatsakis

@nikomatsakis

This comment has been minimized.

Contributor

nikomatsakis commented Jul 17, 2018

@bors r+ -- seems good. we can revisit later, as I would like to do an even more aggressive optimization here anyway (#48417)

@bors

This comment has been minimized.

Contributor

bors commented Jul 17, 2018

📌 Commit 7cc5277 has been approved by nikomatsakis

@bors

This comment has been minimized.

Contributor

bors commented Jul 18, 2018

⌛️ Testing commit 7cc5277 with merge f686885...

bors added a commit that referenced this pull request Jul 18, 2018

Auto merge of #52342 - nnethercote:CanonicalVar, r=nikomatsakis
Avoid most allocations in `Canonicalizer`.

Extra allocations are a significant cost of NLL, and the most common
ones come from within `Canonicalizer`. In particular, `canonical_var()`
contains this code:

    indices
	.entry(kind)
	.or_insert_with(|| {
	    let cvar1 = variables.push(info);
	    let cvar2 = var_values.push(kind);
	    assert_eq!(cvar1, cvar2);
	    cvar1
	})
	.clone()

`variables` and `var_values` are `Vec`s. `indices` is a `HashMap` used
to track what elements have been inserted into `var_values`. If `kind`
hasn't been seen before, `indices`, `variables` and `var_values` all get
a new element. (The number of elements in each container is always the
same.) This results in lots of allocations.

In practice, most of the time these containers only end up holding a few
elements. This PR changes them to avoid heap allocations in the common
case, by changing the `Vec`s to `SmallVec`s and only using `indices`
once enough elements are present. (When the number of elements is small,
a direct linear search of `var_values` is as good or better than a
hashmap lookup.)

The changes to `variables` are straightforward and contained within
`Canonicalizer`. The changes to `indices` are more complex but also
contained within `Canonicalizer`. The changes to `var_values` are more
intrusive because they require defining a new type
`SmallCanonicalVarValues` -- which is to `CanonicalVarValues` as
`SmallVec` is to `Vec -- and passing stack-allocated values of that type
in from outside.

All this speeds up a number of NLL "check" builds, the best by 2%.

r? @nikomatsakis
@bors

This comment has been minimized.

Contributor

bors commented Jul 18, 2018

☀️ Test successful - status-appveyor, status-travis
Approved by: nikomatsakis
Pushing f686885 to master...

@bors bors merged commit 7cc5277 into rust-lang:master Jul 18, 2018

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
homu Test successful
Details

@nnethercote nnethercote deleted the nnethercote:CanonicalVar branch Jul 18, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment