Parallelize rustc via multi-process approach #47518
Comments
I still think compiling dependent crates in parallel with codegen is better.
I'm thinking of cases like the
Another benefit of this approach is that we can split a crate into
Some crates are rather heavy on parsing and expansion, such as winapi, which spends 17% of its time on just that. Duplicating that work across multiple processes might not be the best idea.
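To put a rough number on that concern, here is a back-of-the-envelope sketch in the style of Amdahl's law. It assumes (and this is only an assumption, not a measurement) that a fraction of single-process compile time is front-end work that every process repeats, and that the remainder splits evenly across processes. The function names `speedup_bound` and `total_work_ratio` are hypothetical.

```rust
/// Upper bound on wall-clock speedup when a fraction `duplicated` of the
/// single-process compile time (e.g. parsing and expansion) is repeated by
/// every process and the remaining work is split evenly across `processes`.
/// This is an idealized model, not a measurement of rustc.
fn speedup_bound(duplicated: f64, processes: u32) -> f64 {
    assert!((0.0..=1.0).contains(&duplicated), "fraction must be in [0, 1]");
    assert!(processes >= 1, "need at least one process");
    1.0 / (duplicated + (1.0 - duplicated) / f64::from(processes))
}

/// Total CPU time spent by all processes, relative to a single-process
/// build: the duplicated part is paid once per process, the rest only once.
fn total_work_ratio(duplicated: f64, processes: u32) -> f64 {
    assert!((0.0..=1.0).contains(&duplicated), "fraction must be in [0, 1]");
    assert!(processes >= 1, "need at least one process");
    duplicated * f64::from(processes) + (1.0 - duplicated)
}
```

Under this model, with the 17% figure quoted for winapi, `speedup_bound(0.17, 4)` comes out at roughly 2.65x while `total_work_ratio(0.17, 4)` is about 1.51, and no number of processes can push the speedup past 1 / 0.17, or about 5.9x.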
I wonder if cargo had a way to recompile the current crate multiple times with different flags and automatically update
We could also scale the compiler to multiple machines by distributing codegen units compiled to LLVM bitcode and running optimizations on multiple machines. This would be very effective for release builds, given how LLVM dominates the build time and is already parallel. My plan to parallelize the compiler using Rayon would ensure we could generate and send LLVM bitcode even faster, making this more effective. This has a number of advantages:
The disadvantage is that only LLVM optimization and code generation can be distributed, though that is a large portion of the compile time. This seems like a good idea to me, especially if we could make it easy to set up. Distributing work across multiple machines also seems to be an effective way to speed up bors. Does @rust-lang/infra have any opinions on this?
#44675 (comment) indicated that tweaking codegen-units decreased bootstrap time but increased time taken to run tests. So even if we distributed and sped up compilation of rustc itself, that's only one part of the story for bors times. I can see us trying it if it was available, just wanted to note it may not be an easy win.
@aidanhs I expect that ThinLTO will bring performance with multiple codegen units on par with a single one. We may have to wait a bit for that though. Updating LLVM would be a good start.
jkordish added the C-enhancement label Feb 1, 2018
@Zoxc's variant would mesh well with MIR-only RLIBs.
rkruppe referenced this issue Oct 7, 2018: Parallelize some phases of compilation between a crate and its dependency #610 (Closed)
Another way to distribute work across multiple machines is to send the whole crate as source code to all the machines. Each machine could then run a single This scheme isn't as efficient as the one I proposed above, since parsing and other things would be done per machine, but other things like type checking could scale better.
That's roughly what I proposed here originally (+work stealing, maybe?).
@michaelwoerister And it would use a single parallel rustc instance per machine, instead of multiple rustc instances per machine, like you proposed.
michaelwoerister commented Jan 17, 2018 (edited)
For big crates, the Rust compiler can be stuck in single-threaded execution for quite some time because only the last phase of compilation is properly parallelized. This issue describes one particular approach for making most of compilation parallel.
Basic Concept: Spawn multiple rustc processes that compile "vertical slices" of a crate
The compiler's internal architecture has become rather flexible and demand-driven over the last couple of years, and one could imagine implementing a compiler option that allows it to compile just part of a crate. Given a deterministic partitioning of a crate, one could then run multiple compiler processes that compile disjoint parts of the crate in parallel and stitch those parts together in a final step. This is very similar to a traditional compiler and linker setup.
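The idea can be illustrated with a toy sketch. This is not how rustc works and it uses no real compiler API: items are plain path strings, threads stand in for separate processes, and the names `slice_of`, `compile_slice`, and `compile_crate` are hypothetical. The one property it tries to show is that a run-independent hash lets each worker decide on its own, without coordination, which items belong to its slice.

```rust
use std::thread;

/// FNV-1a: a simple hash whose result does not depend on the process or the
/// run, so every worker computes the same partition independently.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325;
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash
}

/// Hypothetical partitioning rule: maps an item path to one of `slices` slices.
fn slice_of(item_path: &str, slices: usize) -> usize {
    assert!(slices > 0, "need at least one slice");
    (fnv1a(item_path.as_bytes()) % slices as u64) as usize
}

/// Stand-in for one compiler process: sees the whole crate, but "compiles"
/// only the items that fall into its own slice.
fn compile_slice(items: &[String], slice: usize, slices: usize) -> Vec<String> {
    items
        .iter()
        .filter(|item| slice_of(item, slices) == slice)
        .map(|item| format!("{}.o", item))
        .collect()
}

/// Runs one worker per slice (threads here, processes in the proposal) and
/// stitches the partial results together in a final, deterministic step.
fn compile_crate(items: Vec<String>, slices: usize) -> Vec<String> {
    let handles: Vec<_> = (0..slices)
        .map(|slice| {
            let items = items.clone();
            thread::spawn(move || compile_slice(&items, slice, slices))
        })
        .collect();
    let mut objects: Vec<String> = handles
        .into_iter()
        .flat_map(|handle| handle.join().expect("worker panicked"))
        .collect();
    objects.sort(); // fixed order for the final "link" step
    objects
}
```

Because every item hashes to exactly one slice, the stitched result is the same whether the crate is split into one slice or many. Each worker still receives the full item list, which mirrors the duplicated front-end work discussed elsewhere in the thread.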
Advantages
Disadvantages
Conclusion
I am not particularly advocating for following this approach. This issue is meant to provide input for a wider discussion on how to bring more parallelism to the compilation process. This approach is kind of brute-force. However, I have to say, after thinking about it a little I am surprised to actually find it viable :)
cc @rust-lang/compiler