Lazily allocate TypedArena's first chunk #36592

Merged
merged 1 commit into rust-lang:master from nnethercote:TypedArena on Sep 22, 2016

Conversation

@nnethercote
Contributor

nnethercote commented Sep 20, 2016

Currently TypedArena allocates its first chunk, which is usually 4096
bytes, as soon as it is created. If no allocations are ever made from
the arena then this allocation (and the corresponding deallocation) is
wasted effort.

This commit changes TypedArena so it doesn't allocate the first chunk
until the first allocation is made.

This change speeds up rustc by a non-trivial amount because rustc uses
TypedArena heavily: compilation speed (producing debug builds) on
several of the rustc-benchmarks increases by 1.02--1.06x. The change
should never cause a slow-down because the hot alloc function is
unchanged. It does increase the size of TypedArena by one usize
field, however.

The commit also fixes some out-of-date comments.
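
As a rough illustration of the idea, here is a hypothetical, simplified model of a lazily allocating arena (stand-in types and capacities, not the actual libarena code): `new` allocates nothing, and the first chunk is only created inside `alloc`.

    use std::cell::RefCell;

    // Hypothetical stand-in for TypedArena; the chunk size is illustrative.
    const FIRST_CHUNK_ENTRIES: usize = 512;

    struct LazyArena<T> {
        // Each inner Vec is one chunk; its heap buffer never moves because
        // we never push past its initial capacity.
        chunks: RefCell<Vec<Vec<T>>>,
    }

    impl<T> LazyArena<T> {
        fn new() -> LazyArena<T> {
            // No heap allocation happens here: an empty Vec does not allocate.
            LazyArena { chunks: RefCell::new(Vec::new()) }
        }

        fn alloc(&self, value: T) -> *mut T {
            let mut chunks = self.chunks.borrow_mut();
            // Lazily create the first chunk, or a doubled chunk when full.
            let needs_chunk = match chunks.last() {
                None => true,                       // first allocation ever
                Some(c) => c.len() == c.capacity(), // current chunk is full
            };
            if needs_chunk {
                let cap = chunks.last().map_or(FIRST_CHUNK_ENTRIES, |c| c.capacity() * 2);
                chunks.push(Vec::with_capacity(cap));
            }
            let chunk = chunks.last_mut().unwrap();
            chunk.push(value);
            // The pointer stays valid: chunk buffers are never reallocated,
            // only the outer Vec of chunks grows.
            chunk.last_mut().unwrap() as *mut T
        }
    }

    fn main() {
        let arena = LazyArena::new(); // allocates nothing
        let x = arena.alloc(42u32);   // the first chunk is created here
        unsafe { assert_eq!(*x, 42); }
    }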

@rust-highfive

Collaborator

rust-highfive commented Sep 20, 2016

r? @alexcrichton

(rust_highfive has picked a reviewer for you, use r? to override)

@nnethercote

Contributor

nnethercote commented Sep 20, 2016

Some more details. For hyper.0.5.0 this reduces cumulative heap allocations
like so:

Before: 12,326,769,350 bytes in 13,849,772 blocks
After:   5,264,559,086 bytes in 10,361,847 blocks

(These measurements are from Valgrind's DHAT tool, which I used to identify
this problem.)

This is due to rustc's frequent use of CtxtArenas, which contains 10(!)
TypedArenas that are rarely used. When unused, the arena memory isn't
touched, but the cost of many malloc/free pairs is non-trivial. Here are the
speedups for the larger rustc-benchmarks on my Linux box.

opt stage1 rustc (w/glibc malloc) producing debug builds:
- hyper.0.5.0                          6.167s vs  5.927s --> 1.040x faster
- html5ever-2016-08-25                 8.511s vs  8.296s --> 1.026x faster
- regex.0.1.30                         2.970s vs  2.797s --> 1.062x faster
- piston-image-0.10.3                 13.848s vs 13.224s --> 1.047x faster
- rust-encoding-0.3.0                  3.654s vs  3.558s --> 1.027x faster

opt stage2 rustc (w/jemalloc) producing debug builds:
- hyper.0.5.0                          5.271s vs  5.188s --> 1.016x faster
- html5ever-2016-08-25                 6.957s vs  6.775s --> 1.027x faster
- regex.0.1.30                         2.518s vs  2.448s --> 1.029x faster
- piston-image-0.10.3                 11.689s vs 11.444s --> 1.021x faster
- rust-encoding-0.3.0                  3.276s vs  3.268s --> 1.002x faster

The stage2 improvements are smaller, presumably because jemalloc is faster at
doing unnecessary malloc/free operations.

@Mark-Simulacrum

Not an official Rust reviewer, but some general thoughts I had when looking through the code.

src/libarena/lib.rs (outdated diff)
let prev_capacity = chunks.last().unwrap().storage.cap();
let new_capacity = prev_capacity.checked_mul(2).unwrap();
if chunks.last_mut().unwrap().storage.double_in_place() {
if chunks.len() == 0 {

@Mark-Simulacrum

Mark-Simulacrum Sep 20, 2016

Member

Prefer Vec::is_empty()

src/libarena/lib.rs (outdated diff)
for mut chunk in chunks_borrow.drain(..last_idx) {
let cap = chunk.storage.cap();
chunk.destroy(cap);
if chunks_borrow.len() > 0 {

@Mark-Simulacrum

Mark-Simulacrum Sep 20, 2016

Member

Prefer !chunks_borrow.is_empty().

src/libarena/lib.rs (outdated diff)
chunk.destroy(cap);
if chunks_borrow.len() > 0 {
let last_idx = chunks_borrow.len() - 1;
self.clear_last_chunk(&mut chunks_borrow[last_idx]);

@Mark-Simulacrum

Mark-Simulacrum Sep 20, 2016

Member

Why not chunks_borrow.last_mut()? It might conflict with the drain below, in which case you can use split_at_mut I think.
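
A minimal sketch of that suggestion, with a plain Vec<u8> standing in for the real chunk type: split_at_mut yields two non-overlapping mutable views, so the last chunk can be reset while the earlier chunks are destroyed, without an index-based borrow like &mut chunks_borrow[last_idx].

    fn clear_chunks(chunks: &mut Vec<Vec<u8>>) {
        if chunks.is_empty() {
            return;
        }
        let last_idx = chunks.len() - 1;
        {
            // Disjoint mutable views: earlier chunks vs. the last chunk.
            let (earlier, last) = chunks.split_at_mut(last_idx);
            last[0].clear();       // stand-in for `clear_last_chunk`
            for chunk in earlier {
                chunk.clear();     // stand-in for `chunk.destroy(cap)`
            }
        }
        // The split borrows have ended, so the earlier chunks can be removed.
        chunks.drain(..last_idx);
    }

    fn main() {
        let mut chunks = vec![vec![1u8, 2], vec![3], vec![4, 5, 6]];
        clear_chunks(&mut chunks);
        assert_eq!(chunks.len(), 1);
        assert!(chunks[0].is_empty());
    }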

src/libarena/lib.rs (outdated diff)
for chunk in chunks_borrow.iter_mut() {
let cap = chunk.storage.cap();
chunk.destroy(cap);
if chunks_borrow.len() > 0 {

@Mark-Simulacrum

Mark-Simulacrum Sep 20, 2016

Member

Prefer is_empty.

@Mark-Simulacrum

Member

Mark-Simulacrum commented Sep 20, 2016

The code (both preexisting and as changed in this PR) duplicates the "pop the last element, then drain/mutably iterate over and destroy the rest of the chunks" pattern. Can it be extracted into a helper function?

I've discussed this with @nnethercote; I think they believe this would be best done as a follow-up PR. Leaving this comment here so the idea doesn't get lost.
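
A hypothetical shape for such a helper, with illustrative names and a simplified Chunk type (not the real libarena code): destroy every chunk except the newest one and hand that chunk back, so clear and drop can each decide what to do with it.

    struct Chunk {
        storage: Vec<u8>, // stand-in for the real RawVec-backed storage
    }

    impl Chunk {
        fn destroy(&mut self) {
            self.storage.clear(); // stand-in for destroying the entries
        }
    }

    /// Destroy every chunk except the newest one and return that chunk,
    /// so the caller can reuse it (`clear`) or destroy it too (`drop`).
    fn take_last_and_destroy_rest(chunks: &mut Vec<Chunk>) -> Option<Chunk> {
        let last = chunks.pop();
        for chunk in chunks.iter_mut() {
            chunk.destroy();
        }
        chunks.clear();
        last
    }

    fn main() {
        let mut chunks = vec![Chunk { storage: vec![1, 2] }, Chunk { storage: vec![3] }];
        // A `clear`-style caller resets the newest chunk and keeps it around:
        if let Some(mut last) = take_last_and_destroy_rest(&mut chunks) {
            last.storage.clear();
            chunks.push(last);
        }
        assert_eq!(chunks.len(), 1);
    }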

@nnethercote

Contributor

nnethercote commented Sep 20, 2016

I replaced the len() > 0 expressions in the original version with is_empty().

@bluss

Contributor

bluss commented Sep 20, 2016

In the libcollections convention, with_capacity is for explicit up-front allocation of capacity, while new is welcome to not allocate anything. Since the compiler exclusively uses new, from what I can see, is it not best to follow the convention here?

Alternatively, with_capacity is already a bit outside the convention, since it's really with_chunk_size or with_chunk_capacity or something like that, so it could be renamed.

Either way, the doc comments need updating so they don't say "preallocated" for new and with_capacity.
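
For reference, the libcollections convention being described, illustrated with Vec:

    fn main() {
        // `new` allocates nothing; the buffer is created on first use.
        let v: Vec<u32> = Vec::new();
        assert_eq!(v.capacity(), 0);

        // `with_capacity` allocates the requested space up front.
        let w: Vec<u32> = Vec::with_capacity(1024);
        assert!(w.capacity() >= 1024);
    }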

@bluss

Contributor

bluss commented Sep 20, 2016

The is_empty tests seem to be inverted
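
That is, a len() > 0 check must become !is_empty(), not is_empty(); dropping the negation flips the condition:

    fn main() {
        let chunks = vec![1, 2, 3];
        assert_eq!(chunks.len() > 0, !chunks.is_empty()); // correct replacement
        assert_ne!(chunks.len() > 0, chunks.is_empty());  // inverted replacement
    }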

@nnethercote

Contributor

nnethercote commented Sep 20, 2016

Thank you for the comments, @bluss. I fixed the inverted is_empty tests and updated the comments for new and with_capacity.

@TimNN

Contributor

TimNN commented Sep 20, 2016

I did a quick grep over the rust source code and I don't think TypedArena::with_capacity is ever used, so it may be possible to just remove it entirely.

src/libarena/lib.rs (outdated diff)
let prev_capacity = chunks.last().unwrap().storage.cap();
let new_capacity = prev_capacity.checked_mul(2).unwrap();
if chunks.last_mut().unwrap().storage.double_in_place() {
if chunks.is_empty() {

@bluss

bluss Sep 20, 2016

Contributor

I too would like to rewrite this to use the .last_mut() Option for control flow (None is the empty case), but it needs some wrangling to be able to call chunks.push at the end.
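
A sketch of what that might look like (hypothetical, simplified types; the real code uses RawVec and double_in_place): the Option from last_mut drives the control flow, and the capacity for any new chunk is computed inside the match so that the mutable borrow has ended by the time chunks.push runs.

    struct Chunk {
        storage: Vec<u8>, // stand-in for the real RawVec-backed storage
    }

    const FIRST_CHUNK_CAPACITY: usize = 4096;

    // Stand-in for `RawVec::double_in_place`, which may or may not succeed.
    fn try_double_in_place(_chunk: &mut Chunk) -> bool {
        false
    }

    fn grow(chunks: &mut Vec<Chunk>) {
        let new_capacity = match chunks.last_mut() {
            // None is the empty case: lazily allocate the first chunk.
            None => Some(FIRST_CHUNK_CAPACITY),
            Some(last) => {
                let doubled = last.storage.capacity().checked_mul(2).unwrap();
                if try_double_in_place(last) {
                    None            // grew the existing chunk in place
                } else {
                    Some(doubled)   // need a fresh chunk twice as big
                }
            }
        };
        // The borrow from `last_mut` has ended, so pushing is fine here.
        if let Some(capacity) = new_capacity {
            chunks.push(Chunk { storage: Vec::with_capacity(capacity) });
        }
    }

    fn main() {
        let mut chunks: Vec<Chunk> = Vec::new();
        grow(&mut chunks); // allocates the first chunk
        assert_eq!(chunks.len(), 1);
    }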

src/libarena/lib.rs (outdated diff)
let cap = chunk.storage.cap();
chunk.destroy(cap);
if !chunks_borrow.is_empty() {
let mut last_chunk = chunks_borrow.pop().unwrap();

@bluss

bluss Sep 20, 2016

Contributor

pop's Option can be used for control flow

src/libarena/lib.rs (outdated diff)
for mut chunk in chunks_borrow.drain(..last_idx) {
let cap = chunk.storage.cap();
chunk.destroy(cap);
if !chunks_borrow.is_empty() {

@bluss

bluss Sep 20, 2016

Contributor

We could use .pop() here too, drain all other chunks, then put the last chunk back (seems like the simplest way to keep the borrow checker happy).
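
A sketch of that shape for clear, again with a simplified stand-in Chunk type rather than the real libarena code (this also illustrates the earlier "pop's Option can be used for control flow" suggestion):

    struct Chunk {
        storage: Vec<u8>,
    }

    impl Chunk {
        fn destroy(&mut self) {
            self.storage.clear(); // stand-in for `chunk.destroy(cap)`
        }
    }

    fn clear_chunks(chunks: &mut Vec<Chunk>) {
        if let Some(mut last) = chunks.pop() {
            // Drain and destroy every earlier chunk; the borrow checker is
            // happy because `last` is no longer inside the Vec.
            for mut chunk in chunks.drain(..) {
                chunk.destroy();
            }
            // Reset the newest chunk and put it back for reuse.
            last.storage.clear();
            chunks.push(last);
        }
    }

    fn main() {
        let mut chunks = vec![
            Chunk { storage: vec![1, 2, 3] },
            Chunk { storage: vec![4, 5] },
        ];
        clear_chunks(&mut chunks);
        assert_eq!(chunks.len(), 1);
        assert!(chunks[0].storage.is_empty());
    }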

@bluss

bluss Sep 20, 2016

Contributor

This is not more work than what drain already does.

@bluss

Using the Options for control flow will end up with prettier Rust code

@bluss

Contributor

bluss commented Sep 20, 2016

(I haven't ever used the review feature before. I haven't heard any news on how we want to use it in the project.)

Lazily allocate TypedArena's first chunk.
Currently `TypedArena` allocates its first chunk, which is usually 4096
bytes, as soon as it is created. If no allocations are ever made from
the arena then this allocation (and the corresponding deallocation) is
wasted effort.

This commit changes `TypedArena` so it doesn't allocate the first chunk
until the first allocation is made.

This change speeds up rustc by a non-trivial amount because rustc uses
`TypedArena` heavily: compilation speed (producing debug builds) on
several of the rustc-benchmarks increases by 1.02--1.06x. The change
should never cause a slow-down because the hot `alloc` function is
unchanged. It does increase the size of `TypedArena` by one `usize`
field, however.

The commit also fixes some out-of-date comments.
@nnethercote

Contributor

nnethercote commented Sep 20, 2016

I made the requested control flow changes. I haven't changed with_capacity, though I'm happy to remove it if there is consensus there.

@nnethercote

Contributor

nnethercote commented Sep 20, 2016

Note that clear and drop are now very similar, although one uses drain and the other iter_mut. I don't know if that similarity can be factored out.

@bluss

bluss approved these changes Sep 20, 2016

@bluss

Contributor

bluss commented Sep 20, 2016

@bors r+

@bors

Contributor

bors commented Sep 20, 2016

📌 Commit 80a4477 has been approved by bluss

@brson brson added the relnotes label Sep 20, 2016

@arielb1

Contributor

arielb1 commented Sep 20, 2016

Nice catch, @nnethercote! The redundant arenas didn't matter back when we had one CtxtArenas struct per compiler run, but we missed the overhead when we moved to one arena per function.

@nnethercote

Contributor

nnethercote commented Sep 20, 2016

Now that I have a better idea of how rustc-benchmarks works, here are some
updated numbers. This is with a rustc configured with '--enable-optimize
--enable-debuginfo', producing debug builds.

stage 1 (uses glibc malloc)

futures-rs-test-all                  4.925s vs  4.755s --> 1.036x faster
helloworld                           0.220s vs  0.221s --> 0.995x faster
html5ever-2016-08-25                23.086s vs 22.216s --> 1.039x faster
hyper.0.5.0                         21.441s vs 20.491s --> 1.046x faster
inflate-0.1.0                        5.083s vs  4.860s --> 1.046x faster
issue-32062-equality-relations-c...  0.397s vs  0.396s --> 1.003x faster
issue-32278-big-array-of-strings     1.839s vs  1.837s --> 1.001x faster
jld-day15-parser                     5.805s vs  5.656s --> 1.026x faster
piston-image-0.10.3                 28.530s vs 27.061s --> 1.054x faster
regex.0.1.30                         2.975s vs  2.798s --> 1.063x faster
rust-encoding-0.3.0                  3.571s vs  3.537s --> 1.010x faster
syntex-0.42.2                       52.195s vs 49.760s --> 1.049x faster
syntex-0.42.2-incr-clean            52.023s vs 49.806s --> 1.045x faster

stage2 (uses jemalloc)

futures-rs-test-all                  4.283s vs  4.188s --> 1.023x faster
helloworld                           0.222s vs  0.221s --> 1.005x faster
html5ever-2016-08-25                17.508s vs 17.154s --> 1.021x faster
hyper.0.5.0                         17.506s vs 17.164s --> 1.020x faster
inflate-0.1.0                        4.410s vs  4.380s --> 1.007x faster
issue-32062-equality-relations-c...  0.366s vs  0.362s --> 1.011x faster
issue-32278-big-array-of-strings     1.636s vs  1.650s --> 0.992x faster
jld-day15-parser                     4.698s vs  4.646s --> 1.011x faster
piston-image-0.10.3                 23.283s vs 22.819s --> 1.020x faster
regex.0.1.30                         2.527s vs  2.460s --> 1.027x faster
rust-encoding-0.3.0                  3.279s vs  3.315s --> 0.989x faster
syntex-0.42.2                       42.986s vs 42.215s --> 1.018x faster
syntex-0.42.2-incr-clean            43.079s vs 42.134s --> 1.022x faster

With glibc malloc they're mostly in the range 1.03--1.06x faster. With jemalloc
they're mostly in the range 1.01--1.03x faster. The couple that look slower are
just due to measurement noise.

jonathandturner added a commit to jonathandturner/rust that referenced this pull request Sep 21, 2016

Rollup merge of #36592 - nnethercote:TypedArena, r=bluss
Lazily allocate TypedArena's first chunk

Currently `TypedArena` allocates its first chunk, which is usually 4096
bytes, as soon as it is created. If no allocations are ever made from
the arena then this allocation (and the corresponding deallocation) is
wasted effort.

This commit changes `TypedArena` so it doesn't allocate the first chunk
until the first allocation is made.

This change speeds up rustc by a non-trivial amount because rustc uses
`TypedArena` heavily: compilation speed (producing debug builds) on
several of the rustc-benchmarks increases by 1.02--1.06x. The change
should never cause a slow-down because the hot `alloc` function is
unchanged. It does increase the size of `TypedArena` by one `usize`
field, however.

The commit also fixes some out-of-date comments.

bors added a commit that referenced this pull request Sep 21, 2016

Auto merge of #36627 - jonathandturner:rollup, r=jonathandturner
Rollup of 9 pull requests

- Successful merges: #36330, #36496, #36539, #36578, #36585, #36589, #36592, #36600, #36623
- Failed merges:

jonathandturner added a commit to jonathandturner/rust that referenced this pull request Sep 21, 2016

Rollup merge of #36592 - nnethercote:TypedArena, r=bluss
Lazily allocate TypedArena's first chunk

Currently `TypedArena` allocates its first chunk, which is usually 4096
bytes, as soon as it is created. If no allocations are ever made from
the arena then this allocation (and the corresponding deallocation) is
wasted effort.

This commit changes `TypedArena` so it doesn't allocate the first chunk
until the first allocation is made.

This change speeds up rustc by a non-trivial amount because rustc uses
`TypedArena` heavily: compilation speed (producing debug builds) on
several of the rustc-benchmarks increases by 1.02--1.06x. The change
should never cause a slow-down because the hot `alloc` function is
unchanged. It does increase the size of `TypedArena` by one `usize`
field, however.

The commit also fixes some out-of-date comments.

bors added a commit that referenced this pull request Sep 21, 2016

Auto merge of #36635 - jonathandturner:rollup, r=jonathandturner
Rollup of 9 pull requests

- Successful merges: #36330, #36539, #36571, #36578, #36585, #36589, #36592, #36600, #36631
- Failed merges:
@eddyb

Member

eddyb commented Sep 21, 2016

@nnethercote FWIW I've been meaning to eventually move to a single common drop-less arena (instead of a dozen typed ones), but there were things to rework first to make that even possible. We're almost there; in fact, Ty only has TraitObject left that's not POD (I think I want that to be a slice of existential predicates), and everything else has at most a Vec somewhere (which can become an arena slice).

_own: PhantomData,
}
TypedArena {
first_chunk_capacity: cmp::max(1, capacity),

@eddyb

eddyb Sep 21, 2016

Member

If with_capacity isn't used, I think it'd be worth just not having first_chunk_capacity around at all.

@nnethercote

nnethercote Sep 21, 2016

Contributor

Good suggestion. I'll file a follow-up PR to remove with_capacity once this one lands.

@eddyb

eddyb Sep 21, 2016

Member

What I'm saying is that this PR would be simpler if it also made that change. I'd r+ it immediately, and this PR will have to wait at least half a day more before getting merged anyway, so you have time now.

@bors

Contributor

bors commented Sep 22, 2016

⌛️ Testing commit 80a4477 with merge b2627b0...

bors added a commit that referenced this pull request Sep 22, 2016

Auto merge of #36592 - nnethercote:TypedArena, r=bluss
Lazily allocate TypedArena's first chunk

Currently `TypedArena` allocates its first chunk, which is usually 4096
bytes, as soon as it is created. If no allocations are ever made from
the arena then this allocation (and the corresponding deallocation) is
wasted effort.

This commit changes `TypedArena` so it doesn't allocate the first chunk
until the first allocation is made.

This change speeds up rustc by a non-trivial amount because rustc uses
`TypedArena` heavily: compilation speed (producing debug builds) on
several of the rustc-benchmarks increases by 1.02--1.06x. The change
should never cause a slow-down because the hot `alloc` function is
unchanged. It does increase the size of `TypedArena` by one `usize`
field, however.

The commit also fixes some out-of-date comments.

@bors bors merged commit 80a4477 into rust-lang:master Sep 22, 2016

2 checks passed

continuous-integration/travis-ci/pr: The Travis CI build passed
homu: Test successful

bors added a commit that referenced this pull request Sep 23, 2016

Auto merge of #36657 - nnethercote:rm-TypedArena-with_capacity, r=eddyb
[breaking-change] Remove TypedArena::with_capacity

This is a follow-up to #36592.

The function is unused by rustc. Also, it doesn't really follow the
usual meaning of a `with_capacity` function because the first chunk
allocation is now delayed until the first `alloc` call.

This change reduces the size of `TypedArena` by one `usize`.

@eddyb: we discussed this on IRC. Would you like to review it?

bors added a commit that referenced this pull request Sep 24, 2016

Auto merge of #36657 - nnethercote:rm-TypedArena-with_capacity, r=eddyb
[breaking-change] Remove TypedArena::with_capacity

This is a follow-up to #36592.

The function is unused by rustc. Also, it doesn't really follow the
usual meaning of a `with_capacity` function because the first chunk
allocation is now delayed until the first `alloc` call.

This change reduces the size of `TypedArena` by one `usize`.

@eddyb: we discussed this on IRC. Would you like to review it?

@nnethercote nnethercote deleted the nnethercote:TypedArena branch Oct 7, 2016
