Add support for splitting linker invocation to a second execution of `rustc` #64191

Open
alexcrichton opened this issue Sep 5, 2019 · 23 comments
Member

@alexcrichton alexcrichton commented Sep 5, 2019

This issue is intended to track support for splitting a rustc invocation that ends up invoking a system linker (e.g. cdylib, proc-macro, bin, dylib, and even staticlib in the sense that everything is assembled) into two different rustc invocations. There are a number of reasons to do this, including:

  • This can improve pipelined compilation support. The initial pass of pipelined compilation explicitly did not pipeline linkable compilations because the linking step needs to wait for codegen of all previous steps. By literally splitting it out, build systems could then synchronize with previous codegen steps and only execute the link step once everything is finished.

  • This makes more artifacts cacheable with caching solutions like sccache. Anything involving the system linker cannot be cached by sccache because it pulls in too many system dependencies. The output of the first half of these linkable compilations, however, is effectively an rlib which can already be cached.

  • This can provide build systems which desire more control over the linker step with, well, more control over the linker step. We could presumably extend the second half here with more options eventually. This is a somewhat amorphous reason to do this; the previous two are the most compelling ones so far.
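
The pipelining argument can be illustrated with a small scheduling model. This is only a sketch under stated assumptions: `Step` and `ready` are hypothetical names (not part of rustc or Cargo), library crates are assumed to have no dependencies of their own, and the crate names in any usage are placeholders. It shows that the binary's codegen can start once upstream metadata exists, while the link step has to wait for every codegen step.

```rust
use std::collections::HashSet;

/// Hypothetical build steps for a set of library crates plus one binary.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum Step {
    /// Metadata for a crate is available (enough for downstream type checking).
    Metadata(&'static str),
    /// Codegen (object files) for a crate has finished.
    Codegen(&'static str),
    /// The final linker invocation for the binary.
    Link(&'static str),
}

/// Returns true if `step` may start, given the set of finished steps.
/// Assumption: `deps` lists every upstream crate of the binary `bin`,
/// and those crates have no dependencies of their own.
fn ready(step: &Step, done: &HashSet<Step>, bin: &str, deps: &[&'static str]) -> bool {
    match step {
        // Simplification: metadata steps have no prerequisites in this model.
        Step::Metadata(_) => true,
        // The binary's crate-local work only needs upstream metadata.
        Step::Codegen(name) if *name == bin => {
            deps.iter().all(|d| done.contains(&Step::Metadata(*d)))
        }
        // A library's codegen follows its own metadata.
        Step::Codegen(name) => done.contains(&Step::Metadata(*name)),
        // Linking needs codegen of the binary and of every dependency.
        Step::Link(name) => {
            done.contains(&Step::Codegen(*name))
                && deps.iter().all(|d| done.contains(&Step::Codegen(*d)))
        }
    }
}
```

With a split invocation, a build system can treat `Codegen(bin)` and `Link(bin)` as separate nodes, so only the short link step sits on the critical path behind upstream codegen.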

This is a relatively major feature of rustc, and as such this may even require an RFC. This issue is intended to get the conversation around this feature started and see if we can drum up support and/or more use cases. To give a bit of an idea about what I'm thinking, though, a strawman for this might be:

  1. Add two new flags to rustc, --only-link and --do-not-link.
  2. Cargo, for example, would first compile the bin crate type by passing the --do-not-link flag, passing all the flags it normally does today.
  3. Cargo, afterwards, would then execute rustc again, only this time passing the --only-link flag.

These two flags would indicate to rustc what's happening, notably:

  • --do-not-link indicates that rustc should be creating a linkable artifact, such as one of the ones mentioned above. This means that rustc should not actually perform the link phase of compilation; it is skipped entirely. In lieu of this a temporary artifact is emitted in the output directory, such as *.rlink. Maybe this artifact is a folder of files? Unsure. (maybe it's just an rlib!)

  • The converse of --do-not-link, --only-link, is then passed to indicate that the compiler's normal phases should all be entirely skipped except for the link phase. Note that for performance it is crucial that this does not rely on incremental compilation, nor on queries, or anything like that. Instead the compiler forcibly skips all this work and goes straight to linking. Anything the compiler needs as input for linking should either be in command line flags (which are reparsed and guaranteed to be the same as the --do-not-link invocation) or be an output of the --do-not-link invocation. For example maybe the --do-not-link invocation emits a file that indicates where to find everything to link (or something like that).

The general gist is that --do-not-link says "prepare to emit the final crate type, like bin, but only do the crate-local stuff". This step can be pipelined, doesn't require upstream objects, and can be cached. This is also the longest step for most final compilations. The gist of --only-link is that its execution time is 99% the linker. The compiler should do the absolute minimal amount of work to figure out how to invoke the linker, invoke it, and then exit. To reiterate, this will not rely on incremental compilation because engaging all of the incremental infrastructure takes quite some time, and additionally the "inputs" to this phase are just object files, not source code.
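
From a build tool's point of view, the two-step flow could be sketched as follows. This is an illustration only: `--do-not-link` and `--only-link` are the strawman flags from this issue and are not real rustc flags, and `split_invocations` is a hypothetical helper. The code only builds argument lists and does not run anything.

```rust
/// Builds the two argument lists a build tool might pass to rustc under the strawman.
/// The flag names are the ones proposed in this issue; they do not exist in rustc.
fn split_invocations(base_args: &[&str]) -> (Vec<String>, Vec<String>) {
    let base: Vec<String> = base_args.iter().map(|s| s.to_string()).collect();

    // First invocation: all the crate-local work, but no linker.
    let mut compile = base.clone();
    compile.push("--do-not-link".to_string());

    // Second invocation: the same flags (they are reparsed), but only the link phase runs.
    let mut link = base;
    link.push("--only-link".to_string());

    (compile, link)
}
```

Keeping the base arguments identical across both lists mirrors the requirement that the flags of the link invocation match those of the compile invocation.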

In any case this is just a strawman. I think it'd be best to prototype this in rustc, learn some requirements, and then perhaps open an RFC asking for feedback on the implementation. This is a big enough change that it'd want a good deal of buy-in! That being said, I believe (without data at this time, but with a strong hunch) that the improvements to both pipelining and the ability to use sccache would be quite significant and worth pursuing.

Member Author

@alexcrichton alexcrichton commented Sep 5, 2019

cc @cramertj and @jsgf, we talked about this at RustConf and figured y'all would want to be aware of this

Contributor

@jsgf jsgf commented Sep 6, 2019

Would --only-link literally just invoke ld? If so, it might be useful to be able to extract whatever extra libraries/options it's adding to the link line, so we can independently regenerate the link line.

This would be useful for linking hybrid Rust/C++/(other) programs where the final executable is non-Rust. In other words, we could have C++ depending on Rust without needing to use staticlib/cdylib.

Member Author

@alexcrichton alexcrichton commented Sep 6, 2019

I don't think it'd just be a thin wrapper around ld, no, but it would also prepare files to get passed to the linker. For example when producing a dylib rustc will unpack an rlib and make a temporary *.a without bytecode/metadata. Additionally if performing LTO this'd probably be the time we'd take out all the bytecode and process it. (maybe LTO throws a wrench in this whole thing). Overall though I don't think it's safe to assume that it'll just be ld.

Contributor

@jsgf jsgf commented Sep 7, 2019

Firstly, we'd want the final linker doing LTO in order to get it cross-language, regardless of whatever language the final target is in and what mix of languages went into the target.

Secondly, since Buck has full dependency information, including Rust dependencies on C/C++, it will arrange for all the right libraries to be on the final link line. As a result we never want to use or honor #[link] directives, and our .rlibs don't contain anything to be unpacked.

(Even if that weren't true, at least on Unix systems, the .rlib is just a .a and could be used directly, except perhaps for the extension).

I like this proposal because it allows us to factor out the Rust-specific details from the language-independent ones. For example there's no reason for rustc to implement LTO if we're already having to solve that for other languages, especially when that solution is pretty infrastructure-specific (distributed thin LTO, for example). There's also no real reason for us to use staticlib/cdylib if we can arrange for all the Rust linker parameters to be on the final link line, even if the final executable is C++, and it would significantly reduce code duplication (unless LTO sees the duplication and eliminates it, but that's still a compile-time cost).

Ultimately, Rust lives in the world of linkable object files, and a final artifact is generated by calling the linker with a certain set of inputs. Since Rust doesn't have unusual requirements that make it incompatible with C/C++ linkage (eg special linkage requirements or elaborate linker scripts) then the final linker stage could be broadly language agnostic.

Member

@cramertj cramertj commented Sep 9, 2019

+1 to wanting the ability to turn off / disable #[link].

Contributor

@michaelwoerister michaelwoerister commented Sep 11, 2019

I'm generally in favor of this. Some thoughts:

  • One of the computationally more heavy things that linking needs is the list of exported symbols (i.e. the linker script). Getting this list involves analyzing the HIR and reading upstream crate metadata. But it should be easy to serialize this information during the --do-not-link step and store it in the .rlink output.
  • Could we move all of LTO out of rustc? That would make things simpler for rustc but probably has some overhead. Also, as far as I know, llvm-ar does not support doing LTO, but for staticlibs one might want to have it (Firefox does this at least).
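
The idea in the first bullet, computing the exported-symbol list once and storing it for the link step, could be sketched like this. Everything here is an assumption for illustration: the one-symbol-per-line text format, the function names, and the use of a standalone file are hypothetical and say nothing about what an actual .rlink file would contain.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Writes the exported symbols, one per line, to `path`.
/// Hypothetical format: a real implementation would likely serialize far more than names.
fn write_exported_symbols(path: &Path, symbols: &[&str]) -> io::Result<()> {
    fs::write(path, symbols.join("\n"))
}

/// Reads the symbol list back without touching any source code or crate metadata.
fn read_exported_symbols(path: &Path) -> io::Result<Vec<String>> {
    Ok(fs::read_to_string(path)?
        .lines()
        .map(str::to_owned)
        .collect())
}
```

The point of the sketch is that the link-only step needs nothing but the stored file to recover the list, so the expensive analysis stays in the first invocation.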

Member Author

@alexcrichton alexcrichton commented Sep 11, 2019

I don't disagree that y'all's sorts of projects don't want to use the rustc-baked-in LTO, but I don't think we can remove it because many other projects do use it (and rightfully want to). Also this is still just a sort of high-level concept, but if a lot of feature requests are piled onto this it's unfortunately unlikely to happen.

Contributor

@0dvictor 0dvictor commented Nov 18, 2019

Hi, my name is Victor and I'm working with @tmandry.
I am very interested in this issue as it potentially enables many other cool things. Therefore, I plan to make a prototype and ultimately implement it. @alexcrichton, do you have any suggestions/tips to start?

Member Author

@alexcrichton alexcrichton commented Nov 18, 2019

Great @0dvictor! The steps I'd recommend for doing this would probably look like:

  • Poke around rustc and the linking phase, and formulate a high-level plan of how you'd like to implement this feature.
  • Confirm with compiler team folks that the plan of action you've got is reasonable. The details probably wouldn't be fully fleshed out, but this is likely to be a large-ish change so the compiler folks will want to be onboard and it's best to start that early.
  • Iterate towards a working prototype (probably with compiler team help)
  • Start writing tests/etc
  • Evaluate with the compiler team at this point if the change needs an RFC or if it's good to land unstable in the compiler

As for the actual change itself I haven't looked too much into this, so I wouldn't know where best to start there.

Contributor

@tmandry tmandry commented Nov 19, 2019

Some implementation notes:

Recommended reading

Current state of things

The main run_compiler function calls Compiler::link, which ensures there is a codegen step running. It then dispatches to the trait method CodegenBackend::join_codegen_and_link.

For the LLVM backend (which is the only one right now), this method is implemented here. join_codegen_and_link joins all the threads running codegen, which save their results to an object file per thread (often named foo.bar.<hash>-cgu.<threadno>.rcgu.o on Linux). The names of these object files are saved in the CodegenResults struct (specifically, look in CompiledModule).

Finally, it calls link_binary which has the main logic for invoking the linker.

Strategy

Obviously, we need to split apart all the code that assumes codegen and linking happen at the same time. This starts with the join_codegen_and_link trait method. Thankfully, it doesn't seem like there is too much code that assumes this, but there's still a question of what to do in the new code.

For the flags, we can start with unstable options (-Z no-link and -Z only-link) and later stabilize them via the RFC process. When the no-link flag is passed, we should not invoke link_binary anymore. When the only-link flag is passed, we need a way of recovering the information that was in our CodegenResults struct so we can call link_binary.
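
The control flow described here can be sketched with simplified stand-ins. This is not rustc code: `CodegenResults` below is a toy struct with one field (the real one carries much more), and `LinkMode`, `Outcome`, `run_codegen`, and `drive` are hypothetical names used only to show how the two flags would change what happens.

```rust
/// Toy stand-in for rustc's `CodegenResults`; the real struct has many more fields.
#[derive(Debug, Clone, PartialEq)]
struct CodegenResults {
    object_files: Vec<String>,
}

/// Hypothetical modes corresponding to no flag, -Z no-link, and -Z only-link.
enum LinkMode {
    Normal,
    NoLink,
    OnlyLink,
}

#[derive(Debug, PartialEq)]
enum Outcome {
    /// The linker step ran with these object files as input.
    Linked(Vec<String>),
    /// Codegen ran and its results were kept for a later only-link run.
    Saved(CodegenResults),
}

/// Placeholder for the codegen phase; the file name is made up.
fn run_codegen() -> CodegenResults {
    CodegenResults {
        object_files: vec!["example.rcgu.o".to_string()],
    }
}

/// Dispatches on the mode: only-link must recover results saved by a no-link run.
fn drive(mode: LinkMode, saved: Option<CodegenResults>) -> Result<Outcome, String> {
    match mode {
        LinkMode::Normal => Ok(Outcome::Linked(run_codegen().object_files)),
        LinkMode::NoLink => Ok(Outcome::Saved(run_codegen())),
        LinkMode::OnlyLink => saved
            .map(|results| Outcome::Linked(results.object_files))
            .ok_or_else(|| "only-link requires saved codegen results".to_string()),
    }
}
```

The property worth preserving is that a no-link run followed by an only-link run feeds the linker the same inputs as a normal run.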

Contributor

@tmandry tmandry commented Nov 19, 2019

  • In lieu of this a temporary artifact is emitted in the output directory, such as *.rlink. Maybe this artifact is a folder of files? Unsure. (maybe it's just an rlib!)

I don't much like using a directory as output, since some build systems might not support this.

Probably the best thing to do is to make an ar file (which I should note is what an rlib is today). I don't much care what the extension is (I like .rlink, since it probably gets handled differently than rlibs or staticlibs do today).

That said, the choice of extension should be up to whoever is invoking rustc, and we should use the -o flag to decide where our output goes.

I think there might be details we need to pay attention to regarding the linker's default behavior when linking object files vs archive files (like symbol visibility), but not sure what those details are. cc @petrhosek

Contributor

@bjorn3 bjorn3 commented Nov 19, 2019

Bundling it together in a single ar file would do unnecessary work (both IO and CPU). Object files are always written to disk. When building an ar file, they are then copied into the ar file and a symtab is created (ranlib). Creating a symtab can't be avoided if you don't want to unpack the ar file again before linking, as the linker requires a symtab to be present.

Contributor

@tmandry tmandry commented Nov 19, 2019

Contributor

@0dvictor 0dvictor commented Dec 2, 2019

I finally got a prototype working for the first part: generating a linkable object or bitcode file, as well as the linking command to invoke manually to finish linking. In addition, I also successfully ran LTO linking with a native lib.

While starting on the second stage, I found Rust is moving to LLD per #39915. Can I make an assumption that I only need to support LLD or "clang/gcc -fuse-ld=lld"?

Member Author

@alexcrichton alexcrichton commented Dec 2, 2019

While moving to LLD is nice, it's unlikely to happen any time soon, so it's best to not make the assumption of LLD.

Contributor

@tmandry tmandry commented Dec 6, 2019

@0dvictor As a suggestion, you may want to file a PR that includes only the -Z no-link option while you work on finishing the implementation of -Z link-only. That way you can start getting feedback sooner and stay more in sync with mainline. But you may decide it's not worth the trouble.

rustc flags are in src/librustc/session/config.rs. See e.g. this PR for an example.

Contributor

@0dvictor 0dvictor commented Dec 8, 2019

@0dvictor As a suggestion, you may want to file a PR that includes only the -Z no-link option w

Good idea, let me polish my changes and create a PR.

0dvictor added a commit to 0dvictor/rust that referenced this issue Dec 10, 2019
Adds a compiler option to allow rustc to compile a crate without linking.

With this flag, rustc usually generates three files:
    1. xxxx.bc/o/ll/s: the main binary;
    2. xxxx.allocator.bc/o/ll/s: the allocator;
    3. libxxxx.rmeta: the rust metadata containing dependency info.

Part of Issue rust-lang#64191

Contributor

@0dvictor 0dvictor commented Dec 10, 2019

Sorry about the delay.

After studying the code and running some experiments, I found that all linker arguments come from the following five sources:

  1. Target specs: e.g. linker, pre/post link args/objects, etc.
  2. CLI of rustc: e.g. -L, -C relocation-model, -C link-args, -Z pre-link-args, etc.
  3. Compiled modules, including the allocator and metadata when necessary
  4. Dependent libraries (rlib and native)
  5. Crate source code: i.e. #[link_args = "-foo -bar -baz"]

At the linking stage, assuming the user always passes the required CLI arguments:

  • (1) and (2) can be constructed without any information from the compiling stage.
  • (3) is the object or bitcode files from the compiling stage.
    • i.e. --emit=obj or --emit=llvm-bc
  • (4) is the information contained in an rmeta file
    • Can be obtained via --emit=metadata
  • (5) is not part of any file generated by the compiling stage
    • Needs to be passed somehow from the compiling stage to the linking stage
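
The classification above can be written down as a small lookup, which makes it explicit that only source (5) lacks a path to the linking stage. The enum and function names are hypothetical and exist only for this sketch; the strings describe the outputs named in the comment and are not rustc identifiers.

```rust
/// The five sources of linker arguments listed above (names are illustrative).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LinkArgSource {
    TargetSpec,
    CommandLine,
    CompiledModules,
    DependentLibraries,
    CrateSource,
}

/// How the linking stage can get at each source.
#[derive(Debug, PartialEq, Eq)]
enum Availability {
    /// Can be rebuilt from the target and CLI flags alone.
    Reconstructible,
    /// Already written by the compiling stage as the described output.
    EmittedFile(&'static str),
    /// Not present in any compile-stage output yet; needs a new channel.
    NeedsNewChannel,
}

fn availability(source: LinkArgSource) -> Availability {
    match source {
        LinkArgSource::TargetSpec | LinkArgSource::CommandLine => Availability::Reconstructible,
        LinkArgSource::CompiledModules => Availability::EmittedFile("object or bitcode files"),
        LinkArgSource::DependentLibraries => Availability::EmittedFile("rmeta file"),
        LinkArgSource::CrateSource => Availability::NeedsNewChannel,
    }
}
```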

Therefore, in my experiment, to compile without linking is basically:
--emit=metadata,obj or --emit=metadata,llvm-bc
The user can choose to generate either an object file or LLVM bitcode file.
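
As a rough sketch, a build tool could assemble that compile-only command as below. Assumptions: `no_link_command` is a hypothetical helper, `-Z no-link` is the unstable flag from the prototype discussed in this thread (so it would need a toolchain that includes it), and the command is only constructed here, never executed.

```rust
use std::process::Command;

/// Builds (but does not run) a compile-without-linking rustc command.
/// `bitcode` selects LLVM bitcode instead of a native object file.
fn no_link_command(source: &str, bitcode: bool) -> Command {
    // rustc spells the object-file emit kind `obj`.
    let emit = if bitcode { "metadata,llvm-bc" } else { "metadata,obj" };
    let mut cmd = Command::new("rustc");
    cmd.arg(source)
        .arg(format!("--emit={}", emit))
        // Unstable flag from the prototype; an assumption outside that branch.
        .arg("-Z")
        .arg("no-link");
    cmd
}
```

Calling `no_link_command("main.rs", true)` yields the bitcode variant, which is the form the LTO experiments later in the thread rely on.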

I had to make the following three changes to get it to work [PR #67195]:

  • Generate and save the .bc/.o/.ll/.s file of the allocator and metadata (if needed) when the user requests --emit=llvm-bc/obj/llvm-ir/asm
    • [I feel this should be the expected behavior instead of completely ignoring allocator and metadata. I was very confused before knowing the .bc/.o/.ll/.s file does not contain all code compiled from the rust source code.]
  • Write the metadata to file when user requests --emit=metadata even if the OutputType is OutputType::Exe.
    • [I also feel this should be the expected behavior.]
  • Skip linking.

To minimize the impact on existing code, all changes are guarded by -Z no-link.

I have not included (5) yet. My plan is to save it in either the rmeta file, or the bitcode/object file using LLVM’s !llvm.linker.options. I prefer the latter as we can get it for free for targets using LLD. (Of course we still have to generate corresponding linker args for targets that do not use LLD.)

If we want one single .rlink file, we can ar the .rmeta and .bc/o files generated in compiling stage.

Contributor

@0dvictor 0dvictor commented Dec 10, 2019

Then for the linking stage, I plan to insert code here to read the .rmeta or .rlink file, resolve all dependencies then reconstruct the CodegenResults, so that we can create and execute the linker.

Contributor

@0dvictor 0dvictor commented Dec 10, 2019

Finally, some thoughts on LTO: once this issue is resolved, we should be able to do LTO easily when we use LLD (either directly or via clang/gcc -fuse-ld=lld). I have successfully run LTO linking a Rust crate and an LLVM bitcode file generated by clang -flto -c. However, LLD only takes uncompressed bitcode files (either standalone or inside a .a file). A Rust rlib only contains compressed ones, so we would have to extract and uncompress them before passing them to LLD.

Out of curiosity, why does an rlib contain both a native object file and an LLVM bitcode file? Is it because the time cost of “LLVM bitcode => native object” is too expensive? Otherwise, we only need to save LLVM bitcode in an rlib and generate native objects when needed.

Contributor

@mati865 mati865 commented Dec 10, 2019

Out of curiosity, why does an rlib contain both a native object file and an LLVM bitcode file? Is it because the time cost of “LLVM bitcode => native object” is too expensive? Otherwise, we only need to save LLVM bitcode in an rlib and generate native objects when needed.

#66961

Contributor

@tmandry tmandry commented Dec 13, 2019

  • Generate and save the .bc/.o/.ll/.s file of the allocator and metadata (if needed) when the user requests --emit=llvm-bc/obj/llvm-ir/asm

    • [I feel this should be the expected behavior instead of completely ignoring allocator and metadata. I was very confused before knowing the .bc/.o/.ll/.s file does not contain all code compiled from the rust source code.]
  • Write the metadata to file when user requests --emit=metadata even if the OutputType is OutputType::Exe.

    • [I also feel this should be the expected behavior.]
  • Skip linking.

To minimize the impact on existing code, all changes are guarded by -Z no-link.

Yeah, those should probably be the default. Would you mind opening an issue to track this?

0dvictor added a commit to 0dvictor/rust that referenced this issue Dec 14, 2019
Adds a compiler option to allow rustc to compile a crate without linking.

With this flag, rustc usually generates three files:
    1. xxxx.bc/o/ll/s: the main binary;
    2. xxxx.allocator.bc/o/ll/s: the allocator;
    3. libxxxx.rmeta: the rust metadata containing dependency info.

Part of Issue rust-lang#64191

Contributor

@0dvictor 0dvictor commented Dec 14, 2019

Yeah, those should probably be the default. Would you mind opening an issue to track this?

Here they are: #67292 and #67293

I also created #67294 to track a different issue found while I am working on splitting compiling and linking.
