High memory usage compiling `keccak` benchmark #54208
Now for NLL. According to perf.rust-lang.org, an "Nll" build of … The three allocation sites are here: `rust/src/librustc_mir/borrow_check/mod.rs`, lines 171 to 197 (at commit 28bcffe). Each … `rust/src/librustc_mir/dataflow/mod.rs`, line 710 (at commit 28bcffe). In each case `num_blocks` is 25,994, and `bits_per_block` is 94,972 in the first two and 83,308 in the third.
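For a sense of scale, here is a minimal sketch of a dense bit matrix with the shape described: `num_blocks` rows of `bits_per_block` bits, each row padded up to whole 64-bit words. `DenseBitMatrix` and its methods are hypothetical and only approximate the layout of rustc's dataflow bitsets; they are not the compiler's actual types. Under that assumption, each 25,994 x 94,972 matrix needs about 308.6 MB and the 25,994 x 83,308 one about 270.8 MB.

```rust
/// Hypothetical dense bit matrix: one row per basic block, `cols` bits per
/// row, with each row padded up to a whole number of 64-bit words.
struct DenseBitMatrix {
    cols: usize,
    words_per_row: usize,
    words: Vec<u64>,
}

impl DenseBitMatrix {
    fn new(rows: usize, cols: usize) -> Self {
        let words_per_row = (cols + 63) / 64;
        DenseBitMatrix { cols, words_per_row, words: vec![0; rows * words_per_row] }
    }

    /// Heap bytes a `rows` x `cols` matrix needs, computed without allocating.
    fn bytes_for(rows: usize, cols: usize) -> usize {
        rows * ((cols + 63) / 64) * std::mem::size_of::<u64>()
    }

    /// Word index and bit mask for the bit at (row, col).
    fn locate(&self, row: usize, col: usize) -> (usize, u64) {
        assert!(col < self.cols, "column out of range");
        (row * self.words_per_row + col / 64, 1u64 << (col % 64))
    }

    fn insert(&mut self, row: usize, col: usize) {
        let (word, mask) = self.locate(row, col);
        self.words[word] |= mask;
    }

    fn contains(&self, row: usize, col: usize) -> bool {
        let (word, mask) = self.locate(row, col);
        self.words[word] & mask != 0
    }
}

fn main() {
    // Small sanity check of the bit operations.
    let mut m = DenseBitMatrix::new(3, 100);
    m.insert(2, 99);
    assert!(m.contains(2, 99) && !m.contains(1, 99));

    // num_blocks = 25,994; bits_per_block = 94,972 and 83,308.
    println!("{} bytes", DenseBitMatrix::bytes_for(25_994, 94_972));
    println!("{} bytes", DenseBitMatrix::bytes_for(25_994, 83_308));
}
```

Padding each row to whole words is why the row widths reported further down are 94,976 and 83,328 bits rather than 94,972 and 83,308.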
I tried changing … One trivial idea: it looks like … @nikomatsakis: any other thoughts here from the algorithmic side?
I have implemented this in #54211.
I have implemented this in #54213.
#54420 improves the non-NLL case some more.
Because of this, the NLL:non-NLL ratio for …
@nnethercote two questions:
I guess this answers my question:
@nikomatsakis: I have run out of ideas on this one. If it helps, here is what the …
In other words, it is 25,994 x 94,976 bits (308.6 MB; 94,976 is 94,972 rounded up to a multiple of 64). The rows start off almost entirely set and by the end drop to about half set. About 75% of the bits are set. And here's what …
It is 25,994 x 83,328 bits (270.8 MB). Apart from the second row, the rows start off almost empty and get fuller until they are 77% full by the end. About 38% of the bits are set. I didn't look at …

I can't see how to represent this data more compactly, and I don't understand the algorithm in enough detail to know whether less data could be stored. I also looked into separating the lifetimes of the two structures, but as far as I can tell they are used in tandem.
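A back-of-envelope check supports the conclusion that a sparse encoding would not help at these fill levels. Storing each set bit as a `u32` index costs 32 bits per set element, versus 1 bit per position for the dense matrix, so (ignoring container overhead) a sparse list is only smaller when fewer than 1 in 32 bits, about 3.1%, are set. The helper names below are hypothetical, written for illustration.

```rust
/// Fraction of set bits in a word buffer covering `total_bits` bits
/// (assumes any padding bits beyond `total_bits` are zero).
fn density(words: &[u64], total_bits: u64) -> f64 {
    let set: u64 = words.iter().map(|w| u64::from(w.count_ones())).sum();
    set as f64 / total_bits as f64
}

/// True if a list of `u32` indices (32 bits per set bit) would use fewer
/// bits than a dense bitset (1 bit per position). Ignores container overhead.
fn sparse_u32_is_smaller(density: f64) -> bool {
    density * 32.0 < 1.0
}

fn main() {
    // One fully set word plus one word with its low 32 bits set: 96 of 128.
    let words = [u64::MAX, u64::MAX >> 32];
    println!("density = {}", density(&words, 128));

    // The two matrices described above are roughly 75% and 38% full.
    for fill in [0.75, 0.38, 0.01] {
        println!("{fill}: sparse smaller? {}", sparse_u32_is_smaller(fill));
    }
}
```

The break-even ignores the per-row headers a sparse representation would also need, which only tilts things further toward the dense form.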
Discussed with @nikomatsakis during triage of NLL issues. We decided that the memory usage on this case should not block NLL's inclusion in RC2. As for whether to put this on the Release milestone, we decided it would be better, at least in the short-to-medium term, to focus effort on Polonius, since that component might end up replacing the dataflow entirely, and thus the pay-off from optimizing … So, tagging as NLL-deferred, with the intention of revisiting after we've learned more about what we plan to do with Polonius, if anything.
NLL triage. P-medium. WG-compiler-performance.

According to perf.rust-lang.org, a "Clean" build of `keccak-check` has a `max-rss` of 637 MB. Here's a Massif profile of the heap memory usage. The spike is due to a single allocation of 500,363,244 bytes here:
`rust/src/librustc/middle/liveness.rs`, line 601 (at commit 28bcffe)
Each vector element is a `Users`, which is a three-field struct taking up 12 bytes. `num_live_nodes` is 16,371 and `num_vars` is 2,547, and 12 * 16,371 * 2,547 = 500,363,244.
I have one idea to improve this:
`Users` is a triple containing two `u32`s and a `bool`, which means it takes 96 bits even though it only contains 65 bits of data. If we split it up so we have 3 vectors instead of a vector of triples, we'd end up with 4 * 16,371 * 2,547 + 4 * 16,371 * 2,547 + 1 * 16,371 * 2,547 = 375,272,433 bytes, a reduction of 125,090,811 bytes. This would get `max-rss` down from 637 MB to 512 MB, a reduction of 20%.

Alternatively, if we packed the `bool`s into a bitset, we could get it down to 338,787,613 bytes, a reduction of 161,575,631 bytes. This would get `max-rss` down from 637 MB to 476 MB, a reduction of 25%. But it might slow things down: it depends on whether the improved locality is outweighed by the extra instructions needed for bit manipulation.

@nikomatsakis: do you have any ideas for improving this on the algorithmic side? Is this dense `num_live_nodes * num_vars` representation avoidable?
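The size arithmetic above can be checked with a short sketch. It assumes `Users` holds two `u32`-sized fields and a `bool`; the field names `reader`, `writer`, `used`, the `UsersSoA` type and the helper functions are hypothetical, not rustc's code. It compares the current array-of-structs layout with the two proposed struct-of-arrays variants. The packed variant rounds its bitset up to whole 64-bit words, so it comes out 3 bytes above the 338,787,613 figure, which counts exactly one bit per flag.

```rust
use std::mem::size_of;

// Hypothetical stand-in for liveness's `Users`: two u32-sized fields and a
// bool. Field names are assumptions made for illustration.
#[allow(dead_code)]
#[derive(Clone, Copy)]
struct Users {
    reader: u32,
    writer: u32,
    used: bool,
}

/// Array-of-structs: one padded `Users` (12 bytes) per entry.
fn aos_bytes(len: u64) -> u64 {
    len * size_of::<Users>() as u64
}

/// Struct-of-arrays with one byte per `bool`.
fn soa_bytes(len: u64) -> u64 {
    len * (size_of::<u32>() + size_of::<u32>() + size_of::<bool>()) as u64
}

/// Struct-of-arrays with the bools packed 64 per `u64` word.
fn soa_packed_bytes(len: u64) -> u64 {
    len * (size_of::<u32>() + size_of::<u32>()) as u64 + (len + 63) / 64 * 8
}

/// Hypothetical packed layout; not rustc's actual type.
#[allow(dead_code)]
struct UsersSoA {
    readers: Vec<u32>,
    writers: Vec<u32>,
    used: Vec<u64>, // bit `i % 64` of word `i / 64` is entry i's `used` flag
}

impl UsersSoA {
    fn new(len: usize) -> Self {
        UsersSoA {
            readers: vec![0; len],
            writers: vec![0; len],
            used: vec![0; (len + 63) / 64],
        }
    }

    fn set_used(&mut self, i: usize, value: bool) {
        let mask = 1u64 << (i % 64);
        if value {
            self.used[i / 64] |= mask;
        } else {
            self.used[i / 64] &= !mask;
        }
    }

    fn is_used(&self, i: usize) -> bool {
        self.used[i / 64] & (1u64 << (i % 64)) != 0
    }
}

fn main() {
    let len = 16_371u64 * 2_547; // num_live_nodes * num_vars
    println!("size_of::<Users>() = {}", size_of::<Users>());
    println!("AoS:        {} bytes", aos_bytes(len));
    println!("SoA:        {} bytes", soa_bytes(len));
    println!("SoA packed: {} bytes", soa_packed_bytes(len));

    let mut t = UsersSoA::new(130);
    t.set_used(129, true);
    assert!(t.is_used(129) && !t.is_used(128));
}
```

Each `is_used` lookup in the packed layout costs a divide, a shift and a mask, which is the locality-versus-instructions trade-off mentioned above.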