Binary size blowup presumably because of allocator integration #42808
Comments
leonardo-m commented Jun 21, 2017

The binary size blowup is the minor difference. The main differences are the compilation times and the run-times of the resulting binary, which are quite a bit worse.
The increased binary size is as important as the other factors (at least to me); it's the most important factor in the work I'm doing right now.
Mark-Simulacrum added the P-high, T-compiler, and T-libs labels on Jun 22, 2017
@alexbool can you help us reproduce this? I can investigate in parallel, but bug reports tend to be much more readily fixable if there's a way to reproduce what prompted them!
Looks like this isn't a regression for "hello world", or it's likely just in the noise.
I'll try to make a minimal reproducible example today or later this week.
brson added the regression-from-stable-to-nightly label on Jun 22, 2017
@alexbool any updates on this? @leonardo-m or @hinaria, do y'all have examples of this causing a size increase?
Unfortunately I didn't have much time to dig into this. I will try as soon as I get the chance.
Here is a test case that may serve to exhibit the space increase that @alexbool observed around this time. (Or at least, on my Mac, building without optimizations, I am observing a size increase of about 1.08x on the stripped binaries. With optimizations, the size increase is about 1.05x.)

```rust
// vecel.rs
use std::env;

fn main() {
    let mut args = env::args();
    args.next();
    let arg1 = args.next().unwrap();
    let num: usize = arg1.parse().unwrap();
    let mut v = Vec::new();
    for i in 0..num {
        v.push(i);
    }
    assert_eq!(v.len(), num);
}
```

Build script:
Results:
My previous comment notwithstanding, it would be really great if @alexbool could provide us with either their original test case or a somewhat reduced version of it, since it is entirely possible that my (strawman) object-size microbenchmark does not actually reflect the real issues underpinning the size increase that @alexbool is observing.
leonardo-m commented Jun 27, 2017

@alexcrichton, a small increase in binary size is not important to me, but I am seeing significant increases in the run-time of the binary. I've filed a different issue, but the cause could be the same:
pnkfelix self-assigned this on Jun 27, 2017
This pathological case is producing a ~8.5% increase in release mode between 2017-06-19 and 2017-06-23:

```rust
#[macro_use]
extern crate serde;
extern crate serde_yaml;

use std::fs::OpenOptions;

#[derive(Debug, Deserialize)]
pub struct Something {
    field1: Vec<S1>,
    field2: Vec<S2>,
    field3: Vec<S3>,
    field4: Vec<S4>,
    field5: Vec<S5>,
    field6: Vec<S6>,
    field7: Vec<S7>,
    field8: Vec<S8>,
    field9: Vec<S9>,
}

#[derive(Debug, Deserialize)]
pub struct S1(String);
#[derive(Debug, Deserialize)]
pub struct S2(String);
#[derive(Debug, Deserialize)]
pub struct S3(String);
#[derive(Debug, Deserialize)]
pub struct S4(String);
#[derive(Debug, Deserialize)]
pub struct S5(String);
#[derive(Debug, Deserialize)]
pub struct S6(String);
#[derive(Debug, Deserialize)]
pub struct S7(String);
#[derive(Debug, Deserialize)]
pub struct S8(String);
#[derive(Debug, Deserialize)]
pub struct S9(String);

fn main() {
    println!(
        "{:?}",
        serde_yaml::from_reader::<_, Something>(OpenOptions::new().open("whatever").unwrap())
    );
}
```

```toml
[package]
name = "issue-42808"
version = "0.1.0"
authors = ["Alexander Bulaev <alexbool@yandex-team.ru>"]

[dependencies]
serde = { version = "=1.0.8", features = ["derive"] }
serde_yaml = "=0.7.1"
```
I have a datapoint to provide. For ease of experimentation, I made a small variation on alexbool's benchmark that avoids pulling in serde. (Basically, my goal was to have a single .rs file that could be independently compiled without any crate dependencies.) I think my benchmark is likely to get at the heart of what is problematic here, namely that the code duplication from monomorphization of generics can amplify the effect of otherwise minor code-size regressions.

Manually inspecting the output assembly and comparing what we generated before and after the Allocator API landed, I saw cases where we could/should be inlining more. So I tried comparing against the branch that is the basis for PR #42727, which adds some …

Another thing I noticed was that we seem to be spending a number of instructions just shuffling words around. It wasn't immediately obvious what all the causes were, but one clear candidate was the …

So this shows that some of the code size regression can perhaps be blamed on the change to alloc's return type; it used to be …

Update: I attempted to recreate my above results and discovered that I posted my original comment without finishing a crucial sentence: the 311,296 figure was gathered from a build that both made …
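The dependency-free benchmark file itself is not reproduced in this transcript. As a hedged sketch of the kind of program being described (an illustration only, not pnkfelix's actual file), a single .rs file with several distinct element types forces separate monomorphized copies of `Vec`'s growth and drop paths, so a small per-instantiation regression is multiplied:

```rust
#![allow(dead_code)]
// Illustration only (not the actual benchmark from this thread): each distinct
// element type below gets its own monomorphized copies of fill::<T>, Vec<T>'s
// growth/RawVec machinery, and core::ptr::drop_in_place::<Vec<T>>, so any small
// per-instantiation code-size cost in the allocation paths shows up three times.
use std::env;

struct A(String);
struct B(String);
struct C(String);

fn fill<T>(n: usize, make: impl Fn(usize) -> T) -> Vec<T> {
    let mut v = Vec::new();
    for i in 0..n {
        v.push(make(i));
    }
    v
}

fn main() {
    let n: usize = env::args().nth(1).and_then(|s| s.parse().ok()).unwrap_or(4);
    let total = fill(n, |i| A(i.to_string())).len()
        + fill(n, |i| B(i.to_string())).len()
        + fill(n, |i| C(i.to_string())).len();
    println!("{}", total);
}
```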
Thanks for the investigation and minimization @pnkfelix! I'm still pretty surprised about the increase in code there! I think it may be worth digging in a bit more to see where this "code bloat" is coming from. I would suspect that these are legitimately new locations for more …
One thing that I keep seeing in the diffs, regardless of how I attempt to "correct" for whatever changes that …

Building my benchmark just before the …
Hm, ok, so now I'm getting interesting results again. Using …

However, removing …

Can this still be reproduced on the latest nightly?
Just a quick note for anyone trying to figure out how the numbers have been jumping up and down (at least, I was initially flummoxed when trying to compare @alexcrichton's results with the numbers I provided 12 days ago): the original bug report was comparing …

In case it's hard to see, here's the important sequence of numbers: …

So, there was an observable regression (and it still seems to be present), but it is potentially masked by other improvements to code size that happened between May 31st and June 18th.
Oh, I was actually just picking random dates; looks like I "got lucky"!
Also, another factor that is easy to overlook is that in the earlier comments from both @alexcrichton and myself, our first invocations of …
In an effort to try to understand the size trajectories over time, I generalized the benchmark so that one can choose whether to compile it with 1, 2, 3, 11, or 21 distinct variations on the … My intention was to differentiate between a couple of different things: …

Here is a gist with the program, a shell script to drive it, and two sets of output transcripts (one that uses …): https://gist.github.com/pnkfelix/fe632a4bad0759c52e0001efde83e038

Here are some observations based on looking at both the development of the numbers over time (i.e. scanning down the columns) …
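The gist linked above contains the actual program and driver script. As a hypothetical sketch of the shape such a parameterized benchmark can take (the cfg key `width` and its values are invented for this illustration, not taken from the gist), one can select the number of distinct element types at compile time and let a script build the same file at several widths:

```rust
#![allow(dead_code)]
// Hypothetical sketch (not the program from the gist): the number of distinct
// Vec element types is chosen at compile time, e.g.
//     rustc -O --cfg 'width="3"' size_bench.rs && strip size_bench
// so a driver script can build the same source at several "widths" and compare
// the stripped binary sizes.
macro_rules! variants {
    ($($name:ident),*) => {
        $( struct $name(String); )*

        fn run(n: usize) -> usize {
            let mut total = 0;
            $(
                let mut v: Vec<$name> = Vec::new();
                for i in 0..n {
                    v.push($name(i.to_string()));
                }
                total += v.len();
            )*
            total
        }
    };
}

#[cfg(width = "3")]
variants!(T1, T2, T3);
#[cfg(width = "2")]
variants!(T1, T2);
#[cfg(not(any(width = "2", width = "3")))]
variants!(T1);

fn main() {
    let n: usize = std::env::args().nth(1).and_then(|s| s.parse().ok()).unwrap_or(4);
    println!("{}", run(n));
}
```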
Thanks for the scripts and investigation @pnkfelix! Lots of good data.

This range of commits notably includes #42566, which probably just means that jemalloc 4.0.0 produces a smaller binary than jemalloc 4.5.0. I tweaked the script/source file to account for this and always link the system allocator, and got these results, which notably show the same point of regression you saw, and that the addition of …
alexcrichton added the regression-from-stable-to-beta label and removed the regression-from-stable-to-nightly label on Jul 23, 2017
I'm having some problems reproducing the exact sizes locally (I'll try more precisely tomorrow), but marking …
I think I have a summary of the situation: …

The "jemalloc-related bloat" is an increase in the size of …

I'm not entirely sure what the unexplained bloat is or where it hides, but it's only 8k, so I think I'll let it slide.
Thanks for the analysis @arielb1! It sounds to me, though, like there's not necessarily a huge amount to do here. The jemalloc bloat isn't something that should be a problem (small projects turn it off anyway), and the dealloc-related "unwind bloat" is more of a correctness fix right now than anything else. Also, if projects care about binary size, they're probably compiling with …

That just leaves 8k of unexplained size along with 3k of OOM-related stuff, both of which seem quite reasonable and ripe for future optimization. In that sense, given the lack of a clearly actionable bug, is it time to close this?
In which cases can …
arielb1 removed the T-compiler label on Aug 22, 2017
alexcrichton added the I-nominated label on Aug 22, 2017
alexcrichton added a commit to alexcrichton/rust that referenced this issue on Aug 22, 2017
alexcrichton referenced this issue on Aug 22, 2017 in the merged PR "std: Mark allocation functions as nounwind" #44049
Seeing as no global allocators can unwind right now (system/jemalloc), I've proposed #44049 to fix the regression. I'll open a further issue for more discussion related to unwinding allocators.
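Related context, as a hedged sketch: a registered global allocator is not allowed to unwind out of its allocation hooks, which is what makes marking those functions nounwind sound. The example below uses the `GlobalAlloc` trait, which was stabilized after this thread, and sticks to non-panicking operations inside the hooks:

```rust
// Hedged sketch using the GlobalAlloc trait (stabilized after this thread):
// a thin wrapper around the system allocator that counts allocations. Because
// unwinding out of alloc/dealloc is not permitted, the hooks avoid anything
// that could panic (no unwrap/assert; relaxed atomics that wrap on overflow).
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

static ALLOC_COUNT: AtomicUsize = AtomicUsize::new(0);

struct CountingAlloc;

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOC_COUNT.fetch_add(1, Ordering::Relaxed);
        System.alloc(layout)
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        System.dealloc(ptr, layout)
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

fn main() {
    let v = vec![1u8, 2, 3];
    drop(v);
    println!("allocations so far: {}", ALLOC_COUNT.load(Ordering::Relaxed));
}
```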
alexcrichton added a commit to alexcrichton/rust that referenced this issue on Aug 22, 2017
alexcrichton modified the milestone (1.20) on Aug 23, 2017
alexcrichton added a commit to alexcrichton/rust that referenced this issue on Aug 23, 2017
alexcrichton added three more commits to alexcrichton/rust that referenced this issue on Aug 24, 2017
This is fixed on beta; removing from the milestone.
alexbool commented Jun 21, 2017 • edited by arielb1

Between `rustc 1.19.0-nightly (10d7cb44c 2017-06-18)` and `1.20.0-nightly (445077963 2017-06-20)`, I saw one of my binaries grow from 3.4M to 3.7M, which is approximately 8.8%. At first I was scared that this was fallout from my own PR (#42716), but that particular binary barely uses C strings at all.

I examined a diff of the symbols made with `nm` and saw that the binary compiled with `1.20.0-nightly (445077963 2017-06-20)` has a lot more symbols like these:

- `_<alloc::raw_vec::RawVec<T, A>>::dealloc_buffer` (14 occurrences)
- `_alloc::allocator::Alloc::alloc_array` (57 occurrences)
- `_alloc::allocator::Alloc::realloc_array` (28 occurrences)
- `_core::ptr::drop_in_place` (a mighty 138 new occurrences)

This leads us to the possibility that #42313 is the culprit.
Current Situation
There are 2 reproducers, and I need to make some measurements: