Avoid most allocations in `Canonicalizer`. #52342
Here are the NLL builds with a non-negligible speed-up:
    where
        V: TypeFoldable<'tcx> + Lift<'gcx>,
    {
        let mut _var_values = SmallVec::new();
Mark-Simulacrum
Jul 13, 2018
Member
Can probably drop the leading underscore here?
nnethercote
Jul 15, 2018
Author
Contributor
Ok.
        assert_eq!(variables.len(), var_values.len());

        // If `var_values` has become big enough to be heap-allocated,
        // fill up `indices` to hasten subsequent lookups.
Mark-Simulacrum
Jul 13, 2018
Member
AFAICT, the code will not behave correctly if this prefill is removed -- perhaps we should update the comment to not say "hasten subsequent lookups"? i.e. this isn't an optimization but a requirement
nnethercote
Jul 15, 2018
Author
Contributor
I'll change it to "facilitate"
        // If `var_values` has become big enough to be heap-allocated,
        // fill up `indices` to hasten subsequent lookups.
        if !var_values.is_array() {
            for (i, &kind) in var_values.iter().enumerate() {
Mark-Simulacrum
Jul 13, 2018
Member
Might as well do an indices.reserve here
nnethercote
Jul 15, 2018
Author
Contributor
I realize that I can use collect here instead.
New version addresses all the comments.
Nice.
Seems like a good change! Got a few questions.
@@ -74,6 +75,10 @@ pub struct CanonicalVarValues<'tcx> {
     pub var_values: IndexVec<CanonicalVar, Kind<'tcx>>,
 }

 /// Like CanonicalVarValues, but for use in places where a SmallVec is
 /// appropriate.
 pub type SmallCanonicalVarValues<'tcx> = SmallVec<[Kind<'tcx>; 8]>;
nikomatsakis
Jul 16, 2018
Contributor
Why not just use this everywhere...?
nnethercote
Jul 17, 2018
Author
Contributor
Because CanonicalVarValues derives Clone, Debug, PartialEq, Eq, Hash, RustcDecodable, and RustcEncodable. In contrast, SmallVec doesn't define any of those. Also, I don't know what the impact would be of possibly copying SmallVecs (which are quite large, in terms of the number of bytes they take up on the stack) in lots of other places.
@@ -295,7 +304,8 @@ impl<'cx, 'gcx, 'tcx> Canonicalizer<'cx, 'gcx, 'tcx> {
         infcx: Option<&'cx InferCtxt<'cx, 'gcx, 'tcx>>,
         tcx: TyCtxt<'cx, 'gcx, 'tcx>,
         canonicalize_region_mode: CanonicalizeRegionMode,
-    ) -> (Canonicalized<'gcx, V>, CanonicalVarValues<'tcx>)
+        var_values: &'cx mut SmallCanonicalVarValues<'tcx>
+    ) -> Canonicalized<'gcx, V>
nikomatsakis
Jul 16, 2018
Contributor
Why not just return this? I guess it's more efficient this way...?
nnethercote
Jul 17, 2018
Author
Contributor
Yes. Copying the SmallCanonicalVarValues reduces the size of the win by about 20--25%. I figure we need every saving we can get for NLL!
        // fill up `indices` to facilitate subsequent lookups.
        if !var_values.is_array() {
            assert!(indices.is_empty());
            ::std::mem::replace(
nikomatsakis
Jul 16, 2018
Contributor
Why replace and not *indices = ...? Seems simpler.
nnethercote
Jul 17, 2018
Author
Contributor
True!
Extra allocations are a significant cost of NLL, and the most common
ones come from within `Canonicalizer`. In particular, `canonical_var()`
contains this code:

    indices
        .entry(kind)
        .or_insert_with(|| {
            let cvar1 = variables.push(info);
            let cvar2 = var_values.push(kind);
            assert_eq!(cvar1, cvar2);
            cvar1
        })
        .clone()

`variables` and `var_values` are `Vec`s. `indices` is a `HashMap` used
to track what elements have been inserted into `var_values`. If `kind`
hasn't been seen before, `indices`, `variables` and `var_values` all get
a new element. (The number of elements in each container is always the
same.) This results in lots of allocations.

In practice, most of the time these containers only end up holding a few
elements. This PR changes them to avoid heap allocations in the common
case, by changing the `Vec`s to `SmallVec`s and only using `indices`
once enough elements are present. (When the number of elements is small,
a direct linear search of `var_values` is as good as or better than a
hashmap lookup.)
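
The small-case/large-case split described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in this PR: `VarTable` and `INLINE_CAP` are hypothetical names, `u32` stands in for `Kind<'tcx>`, and a plain `Vec` stands in for the `SmallVec` so that the sketch needs nothing outside the standard library.

```rust
use std::collections::HashMap;

/// Hypothetical threshold, chosen to mirror the inline capacity of 8 in
/// `SmallCanonicalVarValues`.
const INLINE_CAP: usize = 8;

/// Illustrative stand-in for the canonicalizer state; not the rustc type.
struct VarTable {
    /// Stand-in for the `SmallVec`; a `Vec` keeps the sketch std-only.
    var_values: Vec<u32>,
    /// Left empty until `var_values` grows past `INLINE_CAP`.
    indices: HashMap<u32, usize>,
}

impl VarTable {
    fn new() -> Self {
        VarTable { var_values: Vec::new(), indices: HashMap::new() }
    }

    /// Returns the index of `kind`, inserting it if it has not been seen.
    fn canonical_var(&mut self, kind: u32) -> usize {
        if self.var_values.len() <= INLINE_CAP {
            // Small case: a linear search, with no hashing and no map entry.
            if let Some(idx) = self.var_values.iter().position(|&k| k == kind) {
                return idx;
            }
            self.var_values.push(kind);
            let idx = self.var_values.len() - 1;
            // On crossing the threshold, prefill `indices` so that the
            // map-based branch below sees every existing element.
            if self.var_values.len() > INLINE_CAP {
                assert!(self.indices.is_empty());
                self.indices = self
                    .var_values
                    .iter()
                    .enumerate()
                    .map(|(i, &k)| (k, i))
                    .collect();
            }
            idx
        } else {
            // Large case: the original entry-based lookup.
            let var_values = &mut self.var_values;
            *self.indices.entry(kind).or_insert_with(|| {
                var_values.push(kind);
                var_values.len() - 1
            })
        }
    }
}
```

Note that the prefill step is required for correctness in this sketch, not just for speed: without it, the map-based branch would assign fresh indices to values that are already stored in `var_values`.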

The changes to `variables` are straightforward and contained within
`Canonicalizer`. The changes to `indices` are more complex but also
contained within `Canonicalizer`. The changes to `var_values` are more
intrusive because they require defining a new type
`SmallCanonicalVarValues` -- which is to `CanonicalVarValues` as
`SmallVec` is to `Vec` -- and passing stack-allocated values of that type
in from outside.
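
As a rough illustration of passing a stack-allocated buffer in from outside, the sketch below has the caller own the storage and the callee fill it through a `&mut` reference, so the buffer is never moved or copied on return. Everything here is an assumption made for the example: `InlineValues` and `canonicalize` are hypothetical, `u32` stands in for `Kind<'tcx>`, and the fixed-size array only approximates a `SmallVec`, which can also spill to the heap.

```rust
/// Hypothetical fixed-capacity buffer standing in for
/// `SmallCanonicalVarValues`; unlike a real `SmallVec` it cannot spill.
struct InlineValues {
    len: usize,
    items: [u32; 8],
}

impl InlineValues {
    fn new() -> Self {
        InlineValues { len: 0, items: [0; 8] }
    }

    fn push(&mut self, v: u32) {
        assert!(self.len < self.items.len(), "sketch does not handle spilling");
        self.items[self.len] = v;
        self.len += 1;
    }

    fn as_slice(&self) -> &[u32] {
        &self.items[..self.len]
    }
}

/// Illustrative callee: records each distinct input value in the caller's
/// buffer and returns only the per-element indices. (The returned `Vec` is
/// just a convenient result type for the example.)
fn canonicalize(input: &[u32], var_values: &mut InlineValues) -> Vec<usize> {
    input
        .iter()
        .map(|&v| {
            let found = var_values.as_slice().iter().position(|&x| x == v);
            match found {
                Some(i) => i,
                None => {
                    var_values.push(v);
                    var_values.len - 1
                }
            }
        })
        .collect()
}
```

A caller would declare `let mut vals = InlineValues::new();` on its own stack, pass `&mut vals` to `canonicalize`, and read `vals.as_slice()` afterwards.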
All this speeds up a number of NLL "check" builds, the best by 2%.
Comments have been addressed. r? @nikomatsakis