Change HirVec<P<T>> to HirVec<T> in hir::Expr. #37642

Merged: 1 commit into rust-lang:master on Nov 22, 2016

nnethercote (Contributor) commented Nov 7, 2016

This PR changes data structures like this:

```
[ ExprArray | 8 | P ]
                  |
                  v
                  [ P | P | P | P | P | P | P | P ]
                    |
                    v
                    [ ExprTup | 2 | P ]
                                    |
                                    v
                                    [ P | P ]
                                      |
                                      v
                                      [ Expr ]
```

to this:

```
[ ExprArray | 8 | P ]
                  |
                  v
                  [ [ ExprTup | 2 | P ] | ... ]
                                    |
                                    v
                                    [ Expr | Expr ]
```
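
As a type-level sketch of what changed, with `Box` standing in for rustc's `P` (illustrative definitions, not the compiler's actual ones):

```rust
// Minimal stand-ins: P is a Box-like owned pointer, and HirVec<T> is an
// owned slice, i.e. P<[T]>.
type P<T> = Box<T>;
type HirVec<T> = P<[T]>;

#[allow(dead_code)]
enum Expr {
    // Old: every element sat behind its own heap allocation:
    //     ExprArray(HirVec<P<Expr>>),
    // New: elements are stored inline in a single allocation:
    ExprArray(HirVec<Expr>),
    ExprTup(HirVec<Expr>),
    Lit(u32),
}

fn main() {
    // Builds [ (1, 2) ]: one allocation for the tuple's elements and one
    // for the array's elements, rather than one per element.
    let tup = Expr::ExprTup(vec![Expr::Lit(1), Expr::Lit(2)].into_boxed_slice());
    let _arr = Expr::ExprArray(vec![tup].into_boxed_slice());
}
```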

I thought this would be a win for #36799, and on a cut-down version of that workload this reduces the peak heap size (as measured by Massif) from 885 MiB to 875 MiB. However, the peak RSS as measured by `-Ztime-passes` and by `/usr/bin/time` increases by about 30 MiB.

I'm not sure why. Just look at the picture above -- the second data structure clearly takes up less space than the first. My best idea relates to unused elements in the slices. `HirVec<Expr>` is a typedef for `P<[Expr]>`. If there were any unused excess elements then I could see that memory usage would increase, because those excess elements are larger in `HirVec<Expr>` than in `HirVec<P<Expr>>`. But AIUI there are no such excess elements, and Massif's measurements corroborate that.
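
For a sense of scale on that claim, a rough standalone check using the same `Box`-based stand-ins as above (numbers are for a typical 64-bit target; the real `hir::Expr` is larger):

```rust
use std::mem::size_of;

// Toy type that only mirrors the shape of the real Expr for sizing purposes.
#[allow(dead_code)]
enum Expr {
    Array(Box<[Expr]>),
    Tup(Box<[Expr]>),
    Lit(u64),
}

fn main() {
    // An unused slot costs one pointer in HirVec<P<Expr>>, but a whole
    // Expr in HirVec<Expr>.
    println!("Box<Expr>: {} bytes", size_of::<Box<Expr>>()); // 8
    println!("Expr:      {} bytes", size_of::<Expr>());      // 24
}
```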

However, the two main creation points for these data structures are these lines from `lower_expr`:

```rust
        ExprKind::Vec(ref exprs) => {
            hir::ExprArray(exprs.iter().map(|x| self.lower_expr(x)).collect())
        }
        ExprKind::Tup(ref elts) => {
            hir::ExprTup(elts.iter().map(|x| self.lower_expr(x)).collect())
        }
```

I suspect what is happening is that temporary excess elements are created within the `collect` calls. The workload from #36799 has many 2-tuples and 3-tuples and when `Vec` gets doubled it goes from a capacity of 1 to 4, which would lead to excess elements. Though, having said that, `Vec::with_capacity` doesn't create excess AFAICT. So I'm not sure. What goes on inside `collect` is complex.
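
That speculation is easy to probe in isolation (a hypothetical standalone check, not part of this PR; `Vec`'s growth strategy is an implementation detail, so exact capacities may vary by toolchain):

```rust
fn main() {
    // Growing element by element can leave excess capacity...
    let mut pushed: Vec<u32> = Vec::new();
    for i in 0..3 {
        pushed.push(i);
    }
    println!("pushed:    len={} cap={}", pushed.len(), pushed.capacity()); // cap=4 here

    // ...whereas collecting from an exact-size iterator, as lower_expr does,
    // allocates once with no excess.
    let collected: Vec<u32> = [1u32, 2, 3].iter().map(|x| x * 10).collect();
    println!("collected: len={} cap={}", collected.len(), collected.capacity()); // cap=3
}
```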

Anyway, in its current form this PR is a memory consumption regression and so not worth landing, but I figured I'd post it in case anyone has additional insight.

rust-highfive (Collaborator) commented Nov 7, 2016

r? @pnkfelix

(rust_highfive has picked a reviewer for you, use r? to override)

michaelwoerister (Contributor) commented Nov 8, 2016

Interesting. Thanks for posting, @nnethercote.

bluss (Contributor) commented Nov 8, 2016

`Vec`'s `FromIterator`:

  • Uses only a single allocation up front, if the iterator reports a lower bound in `Iterator::size_hint` that fits at least all the elements in the iterator
  • Uses a more efficient loop with no capacity check in the body, if the iterator is a `TrustedLen` iterator.

Since both of those lines use the slice iterator and then just `.map()`, they should already be "perfect" (which never should be taken for granted, I guess).
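
Both properties are easy to confirm for the slice-plus-`map` pattern used above (a hypothetical standalone check, not from the PR):

```rust
fn main() {
    let elts = [1u32, 2, 3];
    let iter = elts.iter().map(|x| x * 10);

    // Slice iterators report an exact length, and map preserves it, so
    // collect knows the final size up front.
    assert_eq!(iter.size_hint(), (3, Some(3)));

    let v: Vec<u32> = iter.collect();
    assert_eq!(v.len(), 3);
    assert_eq!(v.capacity(), 3); // a single exact allocation in practice
}
```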

bors (Contributor) commented Nov 10, 2016

☔️ The latest upstream changes (presumably #37678) made this pull request unmergeable. Please resolve the merge conflicts.

eddyb (Member) commented Nov 10, 2016

IMO there isn't a lot we can do that's not a compromise (due to heap pressure dynamics) without moving to arena/SoA models for storing the HIR in the first place (replacing P, at least at that level).

michaelwoerister (Contributor) commented Nov 10, 2016

What's "SoA"?

eddyb (Member) commented Nov 10, 2016

"Struct of Arrays", where each type has its own vector for all instances of that type, in a central struct, although it's usually used for fields, not nodes.
We do something similar with our side-tables, this would just be flatter. I'm still a bit torn between this and huge arenas, but the latter are somewhat more ergonomic.
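
A hypothetical sketch of the two layouts being contrasted (illustrative types, not rustc's actual HIR storage):

```rust
#![allow(dead_code)]

struct Expr { /* fields */ }

// Per-node allocations, roughly what P<T> gives today:
struct HirBoxed {
    exprs: Vec<Box<Expr>>,
}

// "Struct of arrays" applied per node type: one central vector, with nodes
// named by cheap, copyable indices instead of pointers.
#[derive(Copy, Clone)]
struct ExprId(u32);

struct HirSoa {
    exprs: Vec<Expr>, // ExprId(i) indexes into this
}

impl HirSoa {
    fn expr(&self, id: ExprId) -> &Expr {
        &self.exprs[id.0 as usize]
    }
}

fn main() {}
```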

nnethercote (Contributor) commented Nov 18, 2016

I just rebased and remeasured and this change is now a small win for a cut-down version of the workload from #36799, reducing peak RSS from 866 MiB to 848 MiB. I'm not sure what changed, other than #37764 landing, though there shouldn't really be any interaction between that PR and this one.

r? @michaelwoerister

nnethercote (Contributor) commented Nov 18, 2016

Some heap allocation numbers from DHAT:

old
```
tot_alloc:    5,980,761,512 in 27,087,211 blocks
max_live:     838,850,202 in 3,978,848 blocks
```

new
```
tot_alloc:    5,971,161,300 in 25,887,185 blocks
max_live:     829,249,992 in 2,778,822 blocks
```

This PR also speeds up compilation of that workload by about 4%. It also speeds up a couple of the rustc-benchmarks by 1-2%.

bors (Contributor) commented Nov 21, 2016

☔️ The latest upstream changes (presumably #37824) made this pull request unmergeable. Please resolve the merge conflicts.

michaelwoerister (Contributor) commented Nov 21, 2016

Looks good to me. Happy to approve it after a rebase.

eddyb (Member) commented Nov 21, 2016

If it's measurably better I have no real concerns - I imagine it'd eventually be `&[&Expr]` vs `&[Expr]`, unless we end up with some sort of ID linked list to help with incremental recompilation or something.

Change HirVec<P<T>> to HirVec<T> in Expr.
This changes structures like this:
```
[ ExprArray | 8 | P ]
                  |
                  v
                  [ P | P | P | P | P | P | P | P ]
                    |
                    v
                    [ ExprTup | 2 | P ]
                                    |
                                    v
                                    [ P | P ]
                                      |
                                      v
                                      [ Expr ]
```
to this:
```
[ ExprArray | 8 | P ]
                  |
                  v
                  [ [ ExprTup | 2 | P ] | ... ]
                                    |
                                    v
                                    [ Expr | Expr ]
```
nnethercote (Contributor) commented Nov 21, 2016

I rebased. r? @michaelwoerister

michaelwoerister (Contributor) commented Nov 21, 2016

@bors r+

bors (Contributor) commented Nov 21, 2016

📌 Commit 382d3b0 has been approved by michaelwoerister

bors (Contributor) commented Nov 21, 2016

⌛️ Testing commit 382d3b0 with merge 82d8833...

bors added a commit that referenced this pull request Nov 21, 2016

Auto merge of #37642 - nnethercote:no-HirVec-of-P, r=michaelwoerister
Change HirVec<P<T>> to HirVec<T> in hir::Expr.


bors merged commit 382d3b0 into rust-lang:master Nov 22, 2016

2 checks passed

continuous-integration/travis-ci/pr: The Travis CI build passed
homu: Test successful

nnethercote deleted the nnethercote:no-HirVec-of-P branch Nov 22, 2016
