rustc: Default 32 codegen units at O0 #44853
Conversation
rust-highfive assigned nikomatsakis Sep 25, 2017
rust-highfive assigned michaelwoerister and unassigned nikomatsakis Sep 25, 2017
(rust_highfive has picked a reviewer for you, use r? to override)
alexcrichton force-pushed the alexcrichton:debug-codegen-units branch 3 times, most recently from a6c4f73 to 8e09bfb Sep 25, 2017
I feel this is a Cargo thing.
@ishitatsuyuki Cargo can definitely control this, but we want a sane default when running raw rustc. This will make it so that if no other configuration happens, either via rustc parameters or Cargo configuration, rustc defaults to a very parallel build of a single crate. Beginner projects which don't use Cargo shouldn't have to have a slow build just because they aren't using Cargo. This makes a sane default for all uses of rustc.
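For reference, the knob being defaulted here can be set explicitly on a raw rustc invocation; a sketch of the two extremes (the input file name is just a placeholder):

```shell
rustc -C codegen-units=32 main.rs   # many units: parallel LLVM work, faster debug builds
rustc -C codegen-units=1  main.rs   # single unit: slower build, maximum inlining opportunity
```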
This is where the jobserver (for CPU resource management) and async-llvm (for peak memory consumption) really pay off. r=me with the tests fixed.
On a separate note: I don't like how we often duplicate things between
What's the situation with perf.rlo? Are we still limiting benchmarking to a single core there?
To my knowledge, all benchmarks on perf.rlo currently are using 8 threads of parallelism. @alexcrichton may be able to correct me if I recall incorrectly.
alexcrichton force-pushed the alexcrichton:debug-codegen-units branch from 8e09bfb to 9e35b79 Sep 26, 2017
@bors: r=michaelwoerister
Agreed that the duplication is unfortunate! I'd hope that one day we could just use functions to access these rather than accessing fields, but agreed that this is probably best left for a future PR.
Forwarding here a comment that I accidentally left in #44841. I've just run some build-time benchmarks on my project, which uses a lot of popular Rust libraries and codegen (diesel, hyper, serde, tokio, futures, reqwest), on my Intel Core i5 laptop (Skylake, 2c/4t) and got these results:

Cargo profile:

rustc 1.22.0-nightly (17f56c5 2017-09-21)

As expected, the best result is when the number of codegen units matches the number of CPUs, and 32 is way too much for an average machine. Did you consider an option to select the number of codegen units depending on the number of CPUs? Thank you for working on compile times!
@mersinvald fascinating! First up though, can you clarify what you were measuring? Locally I ran some builds myself, though oddly the build times were sometimes quite variable...

I've got an 8 core machine locally, but the number of cores vs number of codegen units should have little effect on compile time (in theory). The codegen units are chosen to be explicitly high here to hopefully make sure that no codegen unit takes too long in the optimizer, allowing ideally for optimal use of all available cores throughout compilation. Additionally, more cgus should mean a lower peak memory of rustc itself due to async translation/codegen. Are you sure you didn't have anything else running in the background when you were collecting that timing? And were the timings you got reproducible?
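The load-balancing intuition here can be illustrated with a toy scheduling sketch. This is not rustc's actual partitioning logic; the unit costs, worker count, and function name are made up for illustration:

```rust
// Toy model: LLVM work for one crate split into codegen units, processed by
// a pool of worker threads. Greedy assignment of the largest remaining unit
// to the least-loaded worker approximates the parallel wall time.
fn makespan(unit_costs: &[u64], workers: usize) -> u64 {
    let mut costs = unit_costs.to_vec();
    costs.sort_unstable_by(|a, b| b.cmp(a)); // largest units first
    let mut loads = vec![0u64; workers];
    for c in costs {
        // hand the next unit to whichever worker is currently least loaded
        *loads.iter_mut().min().unwrap() += c;
    }
    *loads.iter().max().unwrap() // wall time = the busiest worker
}

fn main() {
    // Same total "cost" of LLVM work (3200), split differently, on 4 cores:
    let two_units = makespan(&[1600, 1600], 4); // two cores sit idle
    let many_units = makespan(&[100; 32], 4);   // all four cores stay busy
    assert_eq!(two_units, 1600);
    assert_eq!(many_units, 800);
    println!("2 units: {two_units}, 32 units: {many_units}");
}
```

With only a couple of large units the wall time is dominated by the largest one, which is why the default errs on the side of many smaller units.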
arielb1 added the S-waiting-on-bors label Sep 26, 2017
Good point about background tasks, though. I disabled everything that could eat up CPU to get steadier results and re-ran each build three times.

Results seem to be quite steady and reproducible; 2-4 units are optimal for my setup. Btw, I've updated rustc to the latest nightly version before running the new tests.
@mersinvald hm, so if you're on Linux, mind poking around with perf? Otherwise though this is indeed curious! It may be worth drilling into specific crates as well, maybe going one at a time running rustc by hand. If one crate takes way longer with 32 codegen units than with 2 then that's something to investigate. Overall builds tend to be hard to drill into :(
@alexcrichton ok, I'll do that.
@alexcrichton I've collected statistics for clean builds: https://drive.google.com/drive/folders/0B28cL71oGfpOTVRKMUZjTlFQU2M?usp=sharing

Hope it will help. I don't think I can interpret this data myself, but if you need me to run perf on some specific crates, feel free to ask; I'm happy to help.
bors added a commit that referenced this pull request Sep 29, 2017
alexcrichton commented Sep 25, 2017
This commit changes the default of rustc to use 32 codegen units when compiling in debug mode, typically an opt-level=0 compilation. Since their inception codegen units have matured quite a bit, gaining features such as:

* … more quickly.
* … incremental compilation.
* … the `jobserver` crate to avoid overloading the system.
The largest benefit of codegen units has forever been faster compilation through
parallel processing of modules on the LLVM side of things, using all the cores
available on build machines that typically have many available. Some downsides
have been fixed through the features above, but the major downside remaining is
that using codegen units reduces opportunities for inlining and optimization.
This, however, doesn't matter much during debug builds!
In this commit the default number of codegen units for debug builds has been raised from 1 to 32. This should enable most `cargo build` compiles that are bottlenecked on translation and/or code generation to immediately see speedups through parallelization on available cores.
Work is being done to always enable multiple codegen units (and therefore parallel codegen), but it requires at least #44841 to be landed and stabilized. Stay tuned if you're interested in that aspect!
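The same setting can also be pinned per project through Cargo's profile configuration; a minimal sketch of a `Cargo.toml` override (the value 4 here is arbitrary):

```toml
# Cargo.toml: override the debug-build codegen-unit count explicitly,
# regardless of the default rustc ships with.
[profile.dev]
codegen-units = 4
```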