
Compilation of a crate using a large static map fails on latest i686-pc-windows-gnu Beta #36799

urschrei opened this Issue Sep 28, 2016 · 99 comments


urschrei commented Sep 28, 2016

I'm trying to build a cdylib (https://github.com/urschrei/lonlat_bng) which requires the https://github.com/urschrei/ostn15_phf crate.

Building on AppVeyor, on i686-pc-windows-gnu, using the latest beta, fails with an OOM error.

The ostn15_phf crate is essentially just a very big static map, built using PHF (the generated, uncompiled map is around 42.9 MB).
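
For a sense of scale, the crate boils down to one enormous compile-time perfect-hash map. A minimal sketch of the same shape, using today's `phf` crate with its `macros` feature (the keys and values here are invented; the real map is machine-generated):

```rust
// Hypothetical miniature of the generated crate: a single huge static
// PHF map. The real crate is ~42.9 MB of generated source.
use phf::phf_map;

pub static SHIFTS: phf::Map<&'static str, (f64, f64)> = phf_map! {
    "0" => (91.866, -81.741),
    "1" => (91.870, -81.742),
    // ...many thousands of further generated entries in the real crate
};
```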

The build passed when running cargo test, using rustc 1.12.0-beta.3 (341bfe43c 2016-09-16):
https://ci.appveyor.com/project/urschrei/lonlat-bng/build/105/job/3y1llt6luqs3phs3

It's now failing when running cargo test, using rustc 1.12.0-beta.6 (d3eb33ef8 2016-09-23):
https://ci.appveyor.com/project/urschrei/lonlat-bng/build/job/27pgrkx2cnn2gw50

The failure occurs when compiling ostn15_phf, with `fatal runtime error: out of memory`.

urschrei changed the title from "Compilation of a crate using a large static map fails on latest i686 Beta" to "Compilation of a crate using a large static map fails on latest i686-pc-windows-gnu Beta" on Sep 28, 2016

sfackler commented Sep 28, 2016

cc @brson just want to make sure you're aware of this.

nikomatsakis commented Oct 4, 2016

Related to #36926 perhaps? ("1.12.0: High memory usage when linking in release mode with debug info")

nikomatsakis commented Oct 6, 2016

@urschrei have you tried this on platforms other than windows, out of curiosity?

nikomatsakis commented Oct 6, 2016

I see. It seems to work on x86_64-pc-windows-gnu, but not i686.

urschrei commented Oct 6, 2016

I haven't tried to build i686 on Linux or OSX, but I easily could…

nikomatsakis commented Oct 6, 2016

Well, I just did a run on my linux box. The memory usage is certainly through the roof: https://gist.github.com/nikomatsakis/ea771dd69f12ebc5d3d5848fa59fb43a

This is using nightly (rustc 1.13.0-nightly (a059cb2f3 2016-09-27)). The peak is around 4GB.

nikomatsakis commented Oct 6, 2016

So @alexcrichton has executed a massif run: https://gist.github.com/alexcrichton/d20d685dd7475b1801a2ccac6ba15b08

Measurement 58 shows something like this:

| Percentage | Area |
| --- | --- |
| 36% | HIR |
| 9% | MIR |
| 14% | type/region arenas |
| 5% | region maps |
| 2% | constants |

The peak result (measurement 48) looks pretty similar, with more memory used by MIR:

| Percentage | Area |
| --- | --- |
| 26% | HIR |
| 14% | MIR |
| 5% | type/region arenas |
| 4% | region maps |
| 3% | constants |

These numbers are just based on a quick scan of the gist.

nikomatsakis commented Oct 6, 2016

It seems clear we need to revisit our handling of statics and constants in a big way. But then this has been clear for a long time. =) I'm wondering if we can find some kind of "quick fix" here.

I also haven't compared with historical records -- but most of that memory we see above, we would have been allocating before too, so I'm guessing this is a case of being pushed just over the threshold on i686, versus a sudden spike.

nikomatsakis commented Oct 7, 2016

Sizes of some MIR types, in bytes (gathered from the playground):

statement:      192
statement-kind: 176
lvalue:         16
rvalue:         152
local-decl:     48
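
Numbers like these come straight from `std::mem::size_of`. A runnable sketch of the mechanism, using a stand-in type since rustc's internal MIR types aren't available outside the compiler:

```rust
use std::mem::size_of;

// Stand-in with roughly the flavor of a MIR statement: source info plus
// a kind enum whose size is dominated by its largest variant.
struct SourceInfo { span: (u32, u32, u32), scope: u32 }
enum StatementKind { Assign(Box<(u64, u64)>), StorageLive(u32), StorageDead(u32) }
struct Statement { source_info: SourceInfo, kind: StatementKind }

fn main() {
    // Inside the compiler the same calls were made on the real types,
    // e.g. size_of::<mir::Statement>().
    println!("statement:      {}", size_of::<Statement>());
    println!("statement-kind: {}", size_of::<StatementKind>());
}
```
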
nikomatsakis commented Oct 12, 2016

I am feeling torn here. It seems the best we can do short-term is to make some small changes to MIR/HIR and try to bring the peak down below 4GB. The long term fix would be to revisit the overall representation particularly around constants and see what we can do to bring the size down. One thing I was wondering about (which is probably answerable from @alexcrichton's measurements) is what percentage of memory is being used just in the side-tables vs the HIR itself.

In any case, it seems like 152 bytes for a mir rvalue is really quite big.

nikomatsakis commented Oct 12, 2016

Looking again at the massif results, it looks like MIR is taking more memory than I initially thought. One place that seems to use quite a bit is the list of scopes.

nikomatsakis commented Oct 12, 2016

Better numbers:

| Percent | Pass |
| --- | --- |
| 35.09% | HIR |
| 34.96% | MIR |
| 11.77% | mk_region, node_type_insert |
| 6.24% | region_maps |
| 2.42% | consts |
| 1.39% | spans |

brson commented Oct 20, 2016

Just pinging @nikomatsakis to keep this P-high bug on his radar.

nikomatsakis commented Oct 20, 2016

I've made basically no progress here, I'm afraid. I think the most likely path forward in short term is to try and reduce memory usage in various data structures. Not very satisfying though. I'm not sure if we can make up the gap that way.

nikomatsakis commented Oct 20, 2016

Discussed in the compiler meeting. Conclusion: miri would be the proper fix, but maybe we can shrink MIR a bit in the short term, perhaps enough to get back under the limit. I personally probably don't have time for this just now (I have some other regressions to examine). Hence re-assigning to @pnkfelix; @arielb1, maybe you're also interested in doing something?

pnkfelix commented Oct 27, 2016

cc @nnethercote in case he might have input on ways to attack this

nnethercote commented Oct 28, 2016

Massif is slow and the output isn't that easy to read, but it's the ideal tool for tackling this. The peak snapshot in @alexcrichton's data is number 48, and I've made a nicer, cut-down version of it here: https://gist.github.com/nnethercote/935db34ff2da854df8a69fa28c978497

You can see that ~30% of the memory usage comes from lower_expr. I did a DHAT run (more on that in a moment) and this is the main culprit:

    // Each tuple element is lowered recursively, with the results
    // collected into a freshly allocated vector:
    ExprKind::Tup(ref elts) => {
        hir::ExprTup(elts.iter().map(|x| self.lower_expr(x)).collect())
    }

push_scope/pop_scope account for another ~15%.

arielb1 commented Oct 28, 2016

@nnethercote

push_scope/pop_scope are MIR things. I think interning lvalues (they take 16 bytes but occur multiple times per statement/terminator; as interned indices they would take 4) and boxing rvalues (many statements are storage statements that don't have an rvalue) should claw back a good amount of memory.

Also, enum Operand is very common, in most contexts just an lvalue, and takes 72 bytes. Something should be done about this, even just boxing constants (which would reduce it to 24 bytes, or 16 bytes with interned lvalues, or 8 bytes with interned lvalues plus bitpacking).
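
A minimal sketch of those two tricks (illustrative types, not rustc's actual definitions): an interned lvalue shrinks to a 4-byte index newtype, and boxing the rare constant payload keeps the common variant small:

```rust
use std::mem::size_of;

// Hypothetical interned lvalue: a 4-byte index into an interner table,
// replacing a 16-byte inline value.
#[derive(Copy, Clone)]
struct LvalueId(u32);

// Hypothetical Operand: the common case is just an interned lvalue; the
// rare, large constant payload lives behind a Box.
enum Operand {
    Consume(LvalueId),
    Constant(Box<[u8; 64]>), // stand-in for a large inline constant
}

fn main() {
    // With the payload boxed and the lvalue interned, the enum fits in
    // two words instead of 72 bytes.
    println!("Operand: {} bytes", size_of::<Operand>());
    assert!(size_of::<Operand>() <= 16);
}
```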

arielb1 commented Oct 28, 2016

Having a 32-bit MIR "shared object representation" MirValue, with 3-4 tag bits and the rest an index into an interner, would basically apply all of these suggestions (MirValue can represent all of Constant, Lvalue, Rvalue, and Operand, and we keep the structs as strongly-typed newtypes).

We can then intern MirValues and/or occasionally GC-intern them, for further memory gain.
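
A hedged sketch of that packed representation (all names hypothetical): a few high tag bits select the kind, and the remaining bits index into an interner:

```rust
// Hypothetical 32-bit packed MIR value: 3 tag bits + 29-bit interner index.
#[derive(Copy, Clone, Debug, PartialEq)]
enum Kind { Constant, Lvalue, Rvalue, Operand }

#[derive(Copy, Clone)]
struct MirValue(u32);

const TAG_BITS: u32 = 3;
const INDEX_BITS: u32 = 32 - TAG_BITS;
const INDEX_MASK: u32 = (1 << INDEX_BITS) - 1;

impl MirValue {
    fn new(kind: Kind, index: u32) -> MirValue {
        assert!(index <= INDEX_MASK);
        MirValue((kind as u32) << INDEX_BITS | index)
    }
    fn kind(self) -> Kind {
        match self.0 >> INDEX_BITS {
            0 => Kind::Constant,
            1 => Kind::Lvalue,
            2 => Kind::Rvalue,
            _ => Kind::Operand,
        }
    }
    fn index(self) -> u32 { self.0 & INDEX_MASK }
}

fn main() {
    let v = MirValue::new(Kind::Lvalue, 12_345);
    assert_eq!(v.kind(), Kind::Lvalue);
    assert_eq!(v.index(), 12_345);
    // 4 bytes per reference, versus 16+ for an inline lvalue.
    assert_eq!(std::mem::size_of::<MirValue>(), 4);
}
```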

arielb1 commented Sep 24, 2017

actually, all memory usage is accounted for, so I'm sure it's not LLVM.

arielb1 added a commit to arielb1/rust that referenced this issue Sep 24, 2017

encode region::Scope using fewer bytes
Now that region::Scope is no longer interned, its size is more
important. This PR encodes region::Scope in 8 bytes instead of 12, which
should speed up region inference somewhat (perf testing needed) and
should improve the margins on rust-lang#36799 by 64MB (that's not a lot, I did
this PR mostly to speed up region inference).

bors added a commit that referenced this issue Sep 24, 2017

Auto merge of #44743 - arielb1:size-rollback, r=eddyb
typeck::check::coercion - roll back failed unsizing type vars

This wraps unsizing coercions within an additional level of
`commit_if_ok`, which rolls back type variables if the unsizing coercion
fails. This prevents a large amount of type-variables from accumulating
while type-checking a large function, e.g. shaving 2GB off one of the
4GB peaks in #36799.

This is a performance-sensitive PR so please don't roll it up.

r? @eddyb
cc @nikomatsakis

bors added a commit that referenced this issue Sep 24, 2017

Auto merge of #44758 - arielb1:a-small-path, r=eddyb
put empty generic lists behind a pointer

This reduces the size of hir::Expr from 128 to 88 bytes (!) and shaves
200MB out of #36799.

This is a performance-sensitive PR so please don't roll it up.

r? @eddyb

bors added a commit that referenced this issue Sep 24, 2017

Auto merge of #44809 - arielb1:small-scope, r=eddyb
encode region::Scope using fewer bytes

Now that region::Scope is no longer interned, its size is more important. This PR encodes region::Scope in 8 bytes instead of 12, which should speed up region inference somewhat (perf testing needed) and should improve the margins on #36799 by 64MB (that's not a lot, I did this PR mostly to speed up region inference).

This is a perf-sensitive PR. Please don't roll me up.

r? @eddyb

This is based on  #44743 so I could get more accurate measurements on #36799.

michaelwoerister commented Sep 25, 2017

@arielb1 Oh right, with one CGU memory usage should be the same unless there's some kind of bug.

arielb1 commented Sep 27, 2017

@petrochenkov's span reform PR has cost us 400MB of headroom here, which makes getting this to compile that much harder. Maybe we should just use enough bits for the crate when encoding spans?

arielb1 commented Sep 27, 2017

Or use 40-bit spans, which should be enough to handle 64MB crates?

michaelwoerister commented Sep 28, 2017

@arielb1 I'm not sure I understand: The span PR makes this test case use 400 MB more -- or less?

arielb1 commented Sep 28, 2017

It makes the test case use 400MB more, because the span just overflows the 24-bit length we have.
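
A hedged sketch (not rustc's actual layout) of the compressed-span trade-off under discussion: the length is stored inline only when it fits a fixed bit budget, and anything longer falls back to a heavier interned form. A span covering the ~43MB generated file blows the inline budget:

```rust
// Hypothetical compressed span: inline when the length fits 24 bits,
// otherwise an index into a side table of full spans.
const LEN_BITS: u32 = 24;
const MAX_INLINE_LEN: u32 = (1 << LEN_BITS) - 1;

enum Span {
    Inline { lo: u32, len: u32 }, // packed much tighter in the real thing
    Interned(u32),                // index into a side table of full spans
}

fn make_span(lo: u32, hi: u32, table: &mut Vec<(u32, u32)>) -> Span {
    let len = hi - lo;
    if len <= MAX_INLINE_LEN {
        Span::Inline { lo, len }
    } else {
        table.push((lo, hi));
        Span::Interned((table.len() - 1) as u32)
    }
}

fn main() {
    let mut table = Vec::new();
    // A span covering a ~43MB generated file overflows the 24-bit length:
    match make_span(0, 43_000_000, &mut table) {
        Span::Interned(_) => println!("interned (len > 2^24 - 1)"),
        Span::Inline { .. } => unreachable!(),
    }
}
```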

arielb1 commented Sep 28, 2017

The current peak is during LLVM translation, where we are holding memory for both MIR and LLVM at once. I think that after miri lands, we could easily store constants as byte arrays, which should bring this well into the green zone.
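
A rough sketch of the "constants as byte arrays" idea (names hypothetical; miri's actual representation differs in detail): keep the evaluated memory of a constant as raw bytes plus relocations, rather than as a tree of typed expression nodes:

```rust
// Hypothetical evaluated-constant representation: raw bytes plus
// relocations, instead of per-element HIR/MIR nodes.
struct Allocation {
    bytes: Vec<u8>,                 // the evaluated memory of the constant
    relocations: Vec<(usize, u32)>, // byte offset -> referenced allocation id
}

fn main() {
    // e.g. the evaluated bytes of one (f64, f64) map entry:
    let mut bytes = Vec::new();
    bytes.extend_from_slice(&91.866f64.to_le_bytes());
    bytes.extend_from_slice(&(-81.741f64).to_le_bytes());
    let entry = Allocation { bytes, relocations: Vec::new() };
    // 16 bytes per entry, versus hundreds of bytes of expression nodes.
    assert_eq!(entry.bytes.len(), 16);
}
```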

nikomatsakis commented Nov 9, 2017

triage: P-medium

It seems like we are going to wait until we can fix this properly.

oli-obk commented Apr 17, 2018

miri has landed. I'm not entirely sure how to reproduce this though.

urschrei commented Apr 17, 2018

oli-obk commented Apr 17, 2018

Yup, it's even in beta (although somewhat broken).

urschrei referenced this issue Apr 17, 2018: Constant evaluation regression in beta 1.26 #49930 (Closed)
urschrei commented Apr 17, 2018

Just ran into #49930, which causes…some failures. Will hold off and try again when the backport lands.

oli-obk commented Apr 19, 2018

we have new nightlies!

urschrei commented Apr 20, 2018

i686 is failing on ac3c228 (2018-04-18) with exit code 3221225501: https://ci.appveyor.com/project/urschrei/lonlat-bng/build/job/qaipuqj88243xs4c#L115

(Which is a memory exhaustion error I think?)

oli-obk commented Apr 20, 2018

Looks like it, since the x86_64 host works. You can probably build for the i686 target on the x86_64 host.

urschrei commented Apr 20, 2018

@oli-obk Oh, I had no idea – can you point me at some details?

oli-obk commented Apr 21, 2018

You need to install the cross toolchain via rustup (`rustup target add i686-pc-windows-gnu`) and invoke cargo with the target flag for that cross target (e.g. `cargo build --target i686-pc-windows-gnu`).
