Don't allocate during default HashSet creation. #36734

Merged
merged 1 commit into rust-lang:master from nnethercote:fix-HashSet-sizing on Sep 26, 2016

Conversation

7 participants
@nnethercote
Contributor

nnethercote commented Sep 26, 2016

The following `HashMap` creation functions don't allocate heap storage for elements.
```
HashMap::new()
HashMap::default()
HashMap::with_hasher()
```
This is good, because it's surprisingly common to create a `HashMap` and never
use it, so that case should be cheap.

However, `HashSet` does not have the same behaviour. The corresponding creation
functions *do* allocate heap storage for a non-zero default number of
elements (32 slots, enough for 29 elements).
```
HashSet::new()
HashSet::default()
HashSet::with_hasher()
```
This commit gives `HashSet` the same behaviour as `HashMap`, by simply calling
the corresponding `HashMap` functions (something `HashSet` already does for
`with_capacity` and `with_capacity_and_hasher`). It also reformats one existing
`HashSet` construction to use a consistent single-line format.

This speeds up rustc itself by 1.01--1.04x on most of the non-tiny
rustc-benchmarks.
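To illustrate the delegation pattern the commit relies on (a rough sketch, not the actual standard-library code: `Set` here is a hypothetical stand-in for `HashSet`, which really is a map from keys to unit values internally), a set wrapper whose constructors simply forward to the map's non-allocating constructors might look like:

```rust
use std::collections::HashMap;
use std::hash::Hash;

// Sketch of the delegation pattern: a set is a map from keys to `()`,
// and its constructors forward to the map constructors, which allocate
// no element storage until the first insertion.
struct Set<T> {
    map: HashMap<T, ()>,
}

impl<T: Hash + Eq> Set<T> {
    fn new() -> Set<T> {
        // Delegates to HashMap::new(), which performs no heap allocation.
        Set { map: HashMap::new() }
    }

    fn insert(&mut self, value: T) -> bool {
        // The first insertion is what actually allocates the table.
        self.map.insert(value, ()).is_none()
    }

    fn len(&self) -> usize {
        self.map.len()
    }
}

fn main() {
    let mut s: Set<u32> = Set::new();
    assert_eq!(s.len(), 0);
    assert!(s.insert(1));
    assert!(!s.insert(1)); // already present
    assert_eq!(s.len(), 1);
}
```

The same forwarding already existed for `with_capacity` and `with_capacity_and_hasher`; the commit just extends it to the remaining constructors.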

@rust-highfive
Collaborator

rust-highfive commented Sep 26, 2016

r? @brson

(rust_highfive has picked a reviewer for you, use r? to override)

@nnethercote
Contributor

nnethercote commented Sep 26, 2016
Some detailed measurements...

stage1 rustc (w/glibc malloc) doing debug builds.

futures-rs-test-all    7.360s vs  7.185s --> 1.024x faster
hyper.0.5.0           20.051s vs 19.327s --> 1.037x faster
inflate-0.1.0         10.607s vs 10.404s --> 1.020x faster
issue-32278-big-arra   1.917s vs  1.912s --> 1.003x faster
jld-day15-parser       5.628s vs  5.518s --> 1.020x faster
piston-image-0.10.3   27.727s vs 26.980s --> 1.028x faster
regex.0.1.30           2.710s vs  2.639s --> 1.027x faster
rust-encoding-0.3.0    3.347s vs  3.304s --> 1.013x faster
syntex-0.42.2         50.658s vs 49.310s --> 1.027x faster

stage1 rustc (w/jemalloc) doing debug builds.

futures-rs-test-all    6.251s vs  6.157s --> 1.015x faster
hyper.0.5.0           16.640s vs 16.079s --> 1.035x faster
inflate-0.1.0          8.777s vs  8.682s --> 1.011x faster
issue-32278-big-arra   1.556s vs  1.582s --> 0.984x faster
jld-day15-parser       4.475s vs  4.411s --> 1.015x faster
piston-image-0.10.3   23.127s vs 22.900s --> 1.010x faster
regex.0.1.30           2.402s vs  2.335s --> 1.029x faster
rust-encoding-0.3.0    3.040s vs  3.035s --> 1.002x faster
syntex-0.42.2         42.263s vs 41.344s --> 1.022x faster

Notes:

  • The issue-32278-big-arra slowdown is just noise; I did a few extra runs and
    some showed slight speed-ups, some showed slight slow-downs.
  • I omitted the results for html5ever because the variation was too large to get
    useful results.
  • I omitted the results for helloworld and issue-32062 because they're both
    very small, and didn't see any benefit from this change.
@Aatch
Contributor

Aatch commented Sep 26, 2016

@bors r+

@bors
Contributor

bors commented Sep 26, 2016

📌 Commit 4eb069c has been approved by Aatch

@nnethercote
Contributor

nnethercote commented Sep 26, 2016
BTW, this problem was identified with Valgrind's DHAT tool. Here is the resulting change in allocations for futures-rs-test-all:

tot_alloc:    3,746,817,651 bytes in 8,978,467 blocks
tot_alloc:    1,784,955,527 bytes in 8,060,575 blocks

and for hyper:

tot_alloc:    5,264,555,775 bytes in 10,361,838 blocks
tot_alloc:    2,290,236,835 bytes in 8,636,727 blocks
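For anyone wanting to check this kind of thing without Valgrind, creation-time allocations can also be counted in-process. This is a sketch using the stable `GlobalAlloc` API (which postdates this PR); on a toolchain that includes the fix, the count for default creation is zero:

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::collections::{HashMap, HashSet};
use std::sync::atomic::{AtomicUsize, Ordering};

// A wrapper allocator that counts every allocation it forwards to the
// system allocator.
struct Counting;

static ALLOCS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for Counting {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCS.fetch_add(1, Ordering::SeqCst);
        unsafe { System.alloc(layout) }
    }
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static A: Counting = Counting;

fn main() {
    let before = ALLOCS.load(Ordering::SeqCst);
    let _set: HashSet<u32> = HashSet::new();
    let _map: HashMap<u32, u32> = HashMap::new();
    let after = ALLOCS.load(Ordering::SeqCst);
    // With lazy creation, neither constructor touches the heap.
    assert_eq!(after - before, 0);
}
```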

@bluss bluss added the relnotes label Sep 26, 2016

@bors
Contributor

bors commented Sep 26, 2016

⌛️ Testing commit 4eb069c with merge 3bf4a7a...

bors added a commit that referenced this pull request Sep 26, 2016

Auto merge of #36734 - nnethercote:fix-HashSet-sizing, r=Aatch
Don't allocate during default HashSet creation.

The following `HashMap` creation functions don't allocate heap storage for elements.
```
HashMap::new()
HashMap::default()
HashMap::with_hasher()
```
This is good, because it's surprisingly common to create a HashMap and never
use it. So that case should be cheap.

However, `HashSet` does not have the same behaviour. The corresponding creation
functions *do* allocate heap storage for a non-zero default number of
elements (32 slots, enough for 29 elements).
```
HashSet::new()
HashSet::default()
HashSet::with_hasher()
```
This commit gives `HashSet` the same behaviour as `HashMap`, by simply calling
the corresponding `HashMap` functions (something `HashSet` already does for
`with_capacity` and `with_capacity_and_hasher`). It also reformats one existing
`HashSet` construction to use a consistent single-line format.

This speeds up rustc itself by 1.01--1.04x on most of the non-tiny
rustc-benchmarks.

@bors bors merged commit 4eb069c into rust-lang:master Sep 26, 2016

2 checks passed

continuous-integration/travis-ci/pr: The Travis CI build passed
homu: Test successful
@alexcrichton
Member

alexcrichton commented Sep 26, 2016

Nice!

@nnethercote nnethercote deleted the nnethercote:fix-HashSet-sizing branch Oct 7, 2016

bors added a commit that referenced this pull request Apr 28, 2018

Auto merge of #50240 - nnethercote:LazyBTreeMap, r=cramertj
Implement LazyBTreeMap and use it in a few places.

This is a thin wrapper around BTreeMap that avoids allocating upon creation.

I would prefer to change BTreeMap directly to make it lazy (like I did with HashSet in #36734) and I initially attempted that by making BTreeMap::root an Option<>. But then I also had to change Iter and Range to handle trees with no root, and those types have stability markers on them and I wasn't sure if that was acceptable. Also, BTreeMap has a lot of complex code and changing it all was challenging, and I didn't have high confidence about my general approach.

So I prototyped this wrapper instead and used it in the hottest locations to get some measurements about the effect. The measurements are pretty good!

- Doing a debug build of serde, it reduces the total number of heap allocations from 17,728,709 to 13,359,384, a 25% reduction. The number of bytes allocated drops from 7,474,672,966 to 5,482,308,388, a 27% reduction.

- It gives speedups of up to 3.6% on some rustc-perf benchmark jobs. crates.io, futures, and serde benefit most.
```
futures-check
        avg: -1.9%      min: -3.6%      max: -0.5%
serde-check
        avg: -2.1%      min: -3.5%      max: -0.7%
crates.io-check
        avg: -1.7%      min: -3.5%      max: -0.3%
serde
        avg: -2.0%      min: -3.0%      max: -0.9%
serde-opt
        avg: -1.8%      min: -2.9%      max: -0.3%
futures
        avg: -1.5%      min: -2.8%      max: -0.4%
tokio-webpush-simple-check
        avg: -1.1%      min: -2.2%      max: -0.1%
futures-opt
        avg: -1.2%      min: -2.1%      max: -0.4%
piston-image-check
        avg: -0.8%      min: -1.1%      max: -0.3%
crates.io
        avg: -0.6%      min: -1.0%      max: -0.3%
```
@Gankro, how do you think I should proceed here? Is leaving this as a wrapper reasonable? Or should I try to make BTreeMap itself lazy? If so, can I change the representation of Iter and Range?

Thanks!
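A minimal sketch of the wrapper idea described in that commit message (hypothetical code, not the PR's actual implementation): the inner `BTreeMap` lives in an `Option` and is only materialized on first insertion, so creating the wrapper allocates nothing.

```rust
use std::collections::BTreeMap;

// Lazy wrapper: the inner map is None until the first insertion.
struct LazyBTreeMap<K, V>(Option<BTreeMap<K, V>>);

impl<K: Ord, V> LazyBTreeMap<K, V> {
    fn new() -> Self {
        // No BTreeMap yet, hence no allocation.
        LazyBTreeMap(None)
    }

    fn insert(&mut self, key: K, value: V) -> Option<V> {
        // Materialize the map on first use.
        self.0.get_or_insert_with(BTreeMap::new).insert(key, value)
    }

    fn get(&self, key: &K) -> Option<&V> {
        self.0.as_ref().and_then(|m| m.get(key))
    }

    fn len(&self) -> usize {
        self.0.as_ref().map_or(0, BTreeMap::len)
    }
}

fn main() {
    let mut m: LazyBTreeMap<&str, i32> = LazyBTreeMap::new();
    assert_eq!(m.len(), 0);
    assert_eq!(m.insert("a", 1), None);
    assert_eq!(m.get(&"a"), Some(&1));
    assert_eq!(m.len(), 1);
}
```

Read-only operations simply short-circuit when the map is absent, which is what makes the never-used case cheap.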