Join GitHub today
GitHub is home to over 31 million developers working together to host and review code, manage projects, and build software together.
Sign upparallelize LLVM optimization and codegen passes #16367
Conversation
This comment has been minimized.
This comment has been minimized.
|
Have you clocked the resulting code performance difference? This seems like it would be great for servo. |
This comment has been minimized.
This comment has been minimized.
|
Very excited about this. I don't see the words 'codegen-threads' in this patch. Are you sure it exists? What happens when you specify |
This comment has been minimized.
This comment has been minimized.
|
Replace the two with --codegen-tasks ? 2014年8月9日 上午7:45于 "Brian Anderson" notifications@github.com写道:
|
alexcrichton
reviewed
Aug 9, 2014
| // For LTO purposes, the bytecode of this library is also | ||
| // inserted into the archive. We currently do this only when | ||
| // codegen_units == 1, so we don't have to deal with multiple | ||
| // bitcode files per crate. |
This comment has been minimized.
This comment has been minimized.
alexcrichton
Aug 9, 2014
Member
Could each module be linked into one module to be emitted? (is that too timing-intensive?)
Otherwise, could we name the files bytecodeN.deflate and list how many bytecodes are in the metadata?
This comment has been minimized.
This comment has been minimized.
epdtry
Aug 11, 2014
Author
Contributor
Could each module be linked into one module to be emitted? (is that too timing-intensive?)
I haven't tried this yet. I don't think it would take much longer than we already spend linking the object files together. The only problem is, all the LLVM modules are in separate contexts, so we would need to serialize each one and then deserialize into a shared context for linking.
Otherwise, could we name the files bytecodeN.deflate and list how many bytecodes are in the metadata?
Yeah, this is my preferred solution (which I also haven't tried implementing yet).
alexcrichton
reviewed
Aug 9, 2014
| cmd.stdin(::std::io::process::Ignored) | ||
| .stdout(::std::io::process::InheritFd(1)) | ||
| .stderr(::std::io::process::InheritFd(2)); | ||
| cmd.status().unwrap(); |
This comment has been minimized.
This comment has been minimized.
alexcrichton
Aug 9, 2014
Member
In the past I have seen ld -r produce some very odd object files that turn out to not work at all. I initially intended for ld -r to be used to link together the rust object with all native static libraries to produce one object to put inside of an rlib (I even went as far as to hope that the object could be the rlib itself). I ended up abandoning that strategy once I saw that ld -r was flaky across platforms.
Sadly I don't quite remember what errors I was seeing, nor why I abandoned the strategy (specific errors encountered). We may want to not perform ld -r in the meantime. On the other hand, have you confirmed this object to be usable? It may be difficult to verify that as you'd probably still need to link to upstream libraries and such, but it may be a good smoke test!
This comment has been minimized.
This comment has been minimized.
epdtry
Aug 11, 2014
Author
Contributor
have you confirmed this object to be usable?
Yes, it works fine on Linux at least. I can bootstrap rustc and get nearly all tests to pass with codegen-units > 1. (The failing tests are either LTO-related or require a single bitcode file per crate.)
I also added a few run-pass tests that use codegen-units > 1. These should fail if ld -r produces bad objects.
This comment has been minimized.
This comment has been minimized.
|
Could you describe some of the difficulties with sharing the |
This comment has been minimized.
This comment has been minimized.
|
This is also some super amazing work, I'm incredibly excited to see where this goes! Major prosp @epdtry! |
alexcrichton
reviewed
Aug 9, 2014
| } | ||
|
|
||
| match config.opt_level { | ||
| Some(opt_level) => { |
This comment has been minimized.
This comment has been minimized.
alexcrichton
reviewed
Aug 9, 2014
| Some(sess) => | ||
| if sess.lto() { | ||
| let reachable = cgcx.reachable | ||
| .expect("reachable should be Some if sess is Some"); |
This comment has been minimized.
This comment has been minimized.
alexcrichton
reviewed
Aug 9, 2014
| llvm::LLVMWriteBitcodeToFile(llmod, buf); | ||
| match cgcx.sess { | ||
| Some(sess) => | ||
| if sess.lto() { |
This comment has been minimized.
This comment has been minimized.
alexcrichton
Aug 9, 2014
Member
You could take this indent down one level with Some(sess) if sess.lto() => {
alexcrichton
reviewed
Aug 9, 2014
| }); | ||
| } | ||
|
|
||
| time(config.time_passes, "codegen passes", (), |()| { |
This comment has been minimized.
This comment has been minimized.
alexcrichton
reviewed
Aug 9, 2014
| trans: &CrateTranslation, | ||
| output_types: &[OutputType], | ||
| crate_output: &OutputFilenames) { | ||
| let mut cmd = Command::new("ld"); |
This comment has been minimized.
This comment has been minimized.
alexcrichton
Aug 9, 2014
Member
Could this be declared closer to where it's being used? I'm a little worried about hardcoding "ld" as well, but I have worries about invoking ld -r anyway down below.
alexcrichton
reviewed
Aug 9, 2014
| }); | ||
| } | ||
|
|
||
| if sess.opts.cg.codegen_threads == 1 { |
This comment has been minimized.
This comment has been minimized.
alexcrichton
Aug 9, 2014
Member
In theory it would be nice to unify these two code paths. Due to how CodegenContext is created though, that may not be possible.
alexcrichton
reviewed
Aug 9, 2014
|
|
||
|
|
||
| // Populate a queue with a list of codegen tasks. | ||
| let mut work_queue = RingBuf::with_capacity(1 + trans.modules.len()); |
This comment has been minimized.
This comment has been minimized.
alexcrichton
reviewed
Aug 9, 2014
| } | ||
| } | ||
|
|
||
| unsafe fn optimize_and_codegen(cgcx: &CodegenContext, |
This comment has been minimized.
This comment has been minimized.
alexcrichton
Aug 9, 2014
Member
This is a pretty big function to have the entire thing be unsafe. Would it be possible to limit the scopes more? If it ends up having an unsafeblock on every other line it's probably not worth it!
This comment has been minimized.
This comment has been minimized.
alexcrichton
Aug 9, 2014
Member
Hm, looking through, this may not be worth it. Could you add a comment saying why it's unsafe?
alexcrichton
reviewed
Aug 9, 2014
|
|
||
| // Make sure to fail the worker so the main thread can tell | ||
| // that there were errors. | ||
| cgcx.handler.abort_if_errors(); |
This comment has been minimized.
This comment has been minimized.
alexcrichton
Aug 9, 2014
Member
Should this happen right after work runs in case it generates errors?
This comment has been minimized.
This comment has been minimized.
epdtry
Aug 11, 2014
Author
Contributor
OK, sure. We might show fewer errors per compiler invocation, but that's probably fine, since most of the errors you can get at this point are things like "ran out of disk space" which will show up during other work operations as well.
alexcrichton
reviewed
Aug 9, 2014
| llvm::LLVMWriteBitcodeToFile(llmod, buf); | ||
| }); | ||
| llvm::LLVMDisposeModule(llmod); | ||
| llvm::LLVMContextDispose(llcx); |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
epdtry
Aug 11, 2014
Author
Contributor
Whoops, forgot to remove that when I changed the LTO bitcode handling.
alexcrichton
reviewed
Aug 9, 2014
| output.with_extension("lto.bc").with_c_str(|buf| { | ||
| llvm::LLVMWriteBitcodeToFile(llmod, buf); | ||
| }) | ||
| pub fn run_passes(sess: &Session, |
This comment has been minimized.
This comment has been minimized.
alexcrichton
Aug 9, 2014
Member
This function is becoming quite huge, could it be broken apart? The same kinda applies to the write module at this point. It probably shouldn't be defined in back/link.rs, but rather separate submodules at this point. Feel free to reorganize things!
This comment has been minimized.
This comment has been minimized.
asterite
commented
Aug 9, 2014
|
I don't know if it's applicable, but the way we do it in Crystal is to have one llvm module for each "logical unit". In our cases each logical unit is a class or a module. Maybe in Rust a "logical unit" is a struct, an array, etc., together will all its impls. Then you can also have another logical unit to be the top level functions. Then we fire up N threads and each one takes a task (an llvm module) to compile it. This greatly reduces the compilation time. When you split your whole program in N modules and fire up N threads to compile those (as you are proposing here), if a thread finishes early its left without a job to do, so a thread becomes idle. With M smaller modules and N threads, with N > M, when a thread finishes it can start working on another module, reducing the idle time. Additionally, before compiling each module we write its bitcode to a .bc file in a hidden directory (.crystal, in our case). We then compare that .bc file to the .bc file generated by the previous run. If they turn out to be the same (and this will be true as long as you don't modify any impl of that logical unit), we can safely reuse the .o file of the previous run. This, again, reduces dramatically the times to recompile a project that had minimal changes. Bits of the source code implementing this behaviour are here and here, in case you want to take a look. |
This comment has been minimized.
This comment has been minimized.
|
^THIS^ !!! Please, please implement incremental compilation! Though I wonder if (transitively) unchanged modules could be culled right after resolution pass, based on source file timestamps and inter-module dependency info, which should be available at that point. |
This comment has been minimized.
This comment has been minimized.
|
I would say that the 'logical unit' in Rust is a module, they tend to be fairly small (at least relative to crate size) and are naturally self contained. It is probably worth getting data (at least) for smaller units - thanks for the idea! Incremental compilation is the next part of the project - looking forward to what comes out of that :-) |
This comment has been minimized.
This comment has been minimized.
Compiling
It's a codegen flag, so the flag name
Like most of
One way I tried to address this problem was by adding some basic load balancing:
This is basically my next project. Translation is the #2 time sink in |
This comment has been minimized.
This comment has been minimized.
|
@alexcrichton: Regarding |
This comment has been minimized.
This comment has been minimized.
I would definitely expect an It looked like it would make parts of this much nicer to have access to the raw session rather than duplicating some logic here and there, but it may not be too worth it in the end.
I don't think that any of our tests actually use the object file emitted, they just emit it. I also recall that the linker always succeeded in creating an object, but the object itself was just unusable (for one reason or another). Again though, this could all just be misremembering, or some bug which has since been fixed! |
This comment has been minimized.
This comment has been minimized.
The |
This comment has been minimized.
This comment has been minimized.
|
Oh dear, I must be over looking a test! I only see two instances of |
This comment has been minimized.
This comment has been minimized.
|
OK, let me back up. I think the relevant part of the design was unclear. On the master branch, On this branch, So, on this branch, any test that involves compiling and running Rust code will end up using |
This comment has been minimized.
This comment has been minimized.
|
Oh wow, I missed that entirely, I thought it was only used for In that case, I'm definitely willing to trust |
nrc
reviewed
Aug 21, 2014
| codegen_units: uint = (1, parse_uint, | ||
| "divide crate into N units for optimization and codegen"), | ||
| codegen_threads: uint = (1, parse_uint, | ||
| "number of worker threads to use when running codegen"), |
This comment has been minimized.
This comment has been minimized.
nrc
Aug 21, 2014
Member
Given there was no benefit to having different values here, lets just have one option.
This comment has been minimized.
This comment has been minimized.
|
OK, looks good! r=me with all the changes (most of which are nits, TBH) and with Alex's review. @alexcrichton r? (specifically the stuff in back and concerning linking, about which I have no idea). |
This comment has been minimized.
This comment has been minimized.
|
Well then, that's a new segfault I've never seen before! |
epdtry
added some commits
Sep 5, 2014
epdtry
force-pushed the
epdtry:parallel-codegen
branch
from
6a60448
to
6d2d47b
Sep 5, 2014
This comment has been minimized.
This comment has been minimized.
|
Older versions of OSX's On master, the Newer versions of The latest commit on this branch avoids running |
This comment has been minimized.
This comment has been minimized.
|
@epdtry, oh my, that is quite the investigation! That's quite unfortunate that we'll segfault on older versions of OSX. It looks like there's not a whole lot we can do right now though. I'm sad that this may mean that we have to turn off parallel codegen for rustc itself by default (at least for osx), but we can cross that bridge later! |
This comment has been minimized.
This comment has been minimized.
|
Also, major major props for that investigation, that must have been quite a beast to track down! |
This comment has been minimized.
This comment has been minimized.
alexcrichton
commented on 6d2d47b
Sep 6, 2014
|
r+ |
This comment has been minimized.
This comment has been minimized.
|
saw approval from alexcrichton |
This comment has been minimized.
This comment has been minimized.
|
merging epdtry/rust/parallel-codegen = 6d2d47b into auto |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
|
fast-forwarding master to auto = 4bea7b3 |
bors
added a commit
that referenced
this pull request
Sep 6, 2014
bors
closed this
Sep 6, 2014
bors
merged commit 6d2d47b
into
rust-lang:master
Sep 6, 2014
1 of 2 checks passed
This comment has been minimized.
This comment has been minimized.
|
Awesome work! Looks great for an efficient #2369.
I think the Ninja build system use hashes from the compilation commands and source files (all dependencies) instead of relying on timestamps. @bors should like it ;) |
This comment has been minimized.
This comment has been minimized.
|
Q: Is this flag ignored if I just tried Did I do something wrong? (Also |
This comment has been minimized.
This comment has been minimized.
|
The flag is not ignored, it's just that your library is small enough that it doesn't get much benefit from this patch (especially when optimization is turned off). With
And with
Since
I removed that flag because in my testing I found no benefit from setting |
This comment has been minimized.
This comment has been minimized.
|
@epdtry Thanks for the detailed info! It seems that the bottleneck is the type checking phase in my particular case, and now I'm wondering if spending 50%+ of the time in that phase is normal (but that's off-topic). |
epdtry commentedAug 8, 2014
This branch adds support for running LLVM optimization and codegen on different parts of a crate in parallel. Instead of translating the crate into a single LLVM compilation unit,
rustcnow distributes items in the crate among several compilation units, and spawns worker threads to optimize and codegen each compilation unit independently. This improves compile times on multicore machines, at the cost of worse performance in the compiled code. The intent is to speed up build times during development without sacrificing too much optimization.On the machine I tested this on,
librustcbuild time with-Owent from 265 seconds (master branch, single-threaded) to 115s (this branch, with 4 threads), a speedup of 2.3x. For comparison, the build time without-Owas 90s (single-threaded). Bootstrappingrustcusing 4 threads gets a 1.6x speedup over the default settings (870s vs. 1380s), and buildinglibrustcwith the resulting stage2 compiler takes 1.3x as long as the master branch (44s vs. 55s, single threaded, ignoring time spent in LLVM codegen).The user-visible changes from this branch are two new codegen flags:
-C codegen-units=N: Distribute items acrossNcompilation units.-C codegen-threads=N: SpawnNworker threads for running optimization and codegen. (It is possible to setcodegen-threadslarger thancodegen-units, but this is not very useful.)Internal changes to the compiler are described in detail on the individual commit messages.
Note: The first commit on this branch is copied from #16359, which this branch depends on.
r? @nick29581