Tracking issue for MIR-only RLIBs #38913
Comments
michaelwoerister added the T-compiler and T-tools labels on Jan 8, 2017
This is also potentially breaking people who are linking to rlibs expecting them to at least expose the extern I did that at least once before, though the application where I did it was already very hacky for other reasons and I do not think the project is around anymore. |
Another advantage I see is that pure MIR RLIBs effectively let us "recompile" Basically, Other case where one uses Xargo to recompile the cc @brson ^ pure MIR RLIBs would eliminate the need for std-aware Cargo and I expect the above will also make using sanitizers (cf #38699) cc @alexcrichton ^ relevant to sanitizer support
This would be required for the "easy sanitizers" scenario I'm describing above. Also, note that today one can build statically linked Rust programs using the |
@japaric that’s a tricky advantage as it prevents us from adding any MIR optimisations that depend on the codegen options set :) We already have one which acts upon the |
@nagisa @japaric isn't platform independence listed in the issue description as a non-advantage?
I'd add to the advantages that it would add more parallelism, as the passes up to MIR being finished take less time than passes up to codegen being finished, and in combination with |
Could you elaborate on how it would prevent you from adding such optimizations? The way I see it is that the If you want the most optimized code possible then, yeah, you would have to use Xargo or std-aware Cargo to opt into MIR optimizations that depend on codegen options. While you are at it you can also throw in |
I agree that MIR optimisations don't really prevent you from having platform-agnostic MIR. As both their input and output are MIR, those optimisations could be run in the leaf crates, once the target and other info is known. However, if earlier stages in the compiler depend on the target, which is the case with cfg, one would either have to refactor the entire compiler to understand cfg's in all later stages, or simulate compilation with all possible combinations of cfg's enabled/disabled (in the end cfg is an on/off question). The first approach would probably hugely bloat the code complexity of the compiler; the second would make compile time grow exponentially with the number of distinct cfg's used. So MIR will probably stay platform dependent for some time.
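To make the exponential growth concrete, here is a minimal sketch that enumerates every on/off assignment for a set of cfg flags. The function name `cfg_combinations` and the flag names used with it are hypothetical; this only illustrates the counting argument and is not how rustc handles cfg.

```rust
/// Enumerate every on/off assignment for a set of cfg flags.
/// Each flag is an independent on/off choice, so n flags give 2^n configurations.
/// (Hypothetical helper for illustration only.)
fn cfg_combinations(flags: &[&str]) -> Vec<Vec<(String, bool)>> {
    let n = flags.len();
    // The result doubles with every flag, so keep the sketch to small inputs.
    assert!(n < 32, "sketch only: too many flags to enumerate");
    (0u32..(1u32 << n))
        .map(|mask| {
            flags
                .iter()
                .enumerate()
                // Bit i of the mask decides whether flag i is enabled.
                .map(|(i, flag)| (flag.to_string(), (mask & (1 << i)) != 0))
                .collect()
        })
        .collect()
}
```

With three flags this already yields eight configurations to compile, and every further flag doubles that number.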
I'm not sure what you are trying to get at? The |
@japaric Removing landing pads from MIR is already somewhat of a problem since you cannot add them back after the fact, so you already lose some of the so-called advantage by being unable to reverse that. Later on we might want to add something more invasive. For a completely hypothetical example consider something resembling autovectorisation which, again, is not exactly reversible and thus So, what I’m trying to say is that specifying codegen options on leaf crates only would still not be equivalent (and would diverge more over time with extra hypothetical MIR opts) to specifying the codegen option(s) for every crate. You could (as @est31 did just now) argue for storing unoptimised MIR instead, but that, in addition to increasing the size of intermediate rlibs, serializes MIR opts.
Codegen options aren’t exactly related to platform independence in this context. |
I'm not sure if this can be listed as an advantage but pure MIR RLIBs would have prevented #38824. The TL;DR is that LLVM raises assertions when you try to lower functions that take/return |
Ah, sorry, I've misread, you only talked about codegen options. |
I don't think it will affect const evaluation one way or another, but it would help us test Miri outside rustc to be able to easily build dependencies as MIR-only rlibs (with MIR for all items, not just generic/inline/constant ones like in the existing metadata). Largely, Miri is just like another backend in this context, so it is an instance of this previously mentioned advantage:
MSVC using |
Regarding (I would have more sympathy if someone could give a good reason for using rlibs as archives that isn't already covered by |
I'm very enthusiastic about this. I think separating type checking and code generation into two phases is smart no matter what the exact strategy is for when the MIR finally gets translated. It gives us a lot of flexibility for coordinating the build. For example, we don't have to delay code generation until the final crate. Cargo itself could spawn parallel processes to do code generation for already-typechecked crates, while their downstreams continue type checking. By collapsing duplicate monomorphizations, I'm hopeful that this will lead to significant improvements to the major disadvantages of monomorphization, the bloat and the compile time. We could end up in a position where we can say, "the generics model is like C++, but more efficient". That could be a major advantage. One significant disadvantage with this model is link-time scalability. This will put massive memory pressure on the leaf crate builds, and that could bite us in the future as bigger projects are written in Rust.
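The effect of collapsing duplicate monomorphizations can be illustrated with a toy model. The `Instance` type and both function names below are hypothetical and do not correspond to rustc internals; the sketch only counts how many instances get code generated under each scheme.

```rust
use std::collections::HashSet;

/// A monomorphization request: a generic function plus its concrete type argument.
/// (Hypothetical model; not rustc's internal representation.)
type Instance = (&'static str, &'static str);

/// Current model: every crate generates code for the instances it uses,
/// so an instance used by several crates is generated several times.
fn codegen_per_crate(crates: &[Vec<Instance>]) -> usize {
    crates
        .iter()
        .map(|uses| uses.iter().collect::<HashSet<_>>().len())
        .sum()
}

/// MIR-only model: code generation is deferred to the final crate, which
/// sees every request and generates each distinct instance exactly once.
fn codegen_in_leaf(crates: &[Vec<Instance>]) -> usize {
    crates.iter().flatten().collect::<HashSet<_>>().len()
}
```

If three crates all instantiate the same generic function with the same type, the per-crate count includes it three times while the leaf-only count includes it once. The flip side, as noted above, is that all of that work lands on the leaf crate.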
LTO is a downside too because of compile time. I'd expect we'd need a range of strategies for the actual codegen, to accomplish different goals in |
I’m very worried about this for Servo. There are currently 319 crates in the dependency graph, but after an initial build only a few of them are recompiled in the typical edit-build-test cycle. Even so, compile times are already pretty bad. Do MIR-only rlibs mean doing code generation for the entire dependency graph every time? This sounds like an unacceptable explosion of compile times.
@SimonSapin I see no point in experimenting with this on Servo's scale without enabling incremental recompilation (with ThinLTO in the future, too). Btw I hear @rkruppe is making good progress towards such a compilation mode. |
We discussed this in the last @rust-lang/tools meeting and the consensus was that this looks like a good idea in many ways but we will not pursue it as long as it would mean a significant compile-time regression.
So then we'll be pursuing this as soon as Rust is able to fully take advantage of incremental compilation using ThinLTO? |
Ixrec referenced this issue on Feb 4, 2017: Support generating interface files separately (and before) compiled code #39541 (Open)
Given the recent work to make the compiler incremental, I wonder if it will be possible to perform incremental builds at the level of individual functions, caching anything that hasn't changed. That could allow amazing feats, such as executables that are incrementally updated as the user compiles their source code. |
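As a rough sketch of what function-level caching could look like, the following keeps a fingerprint per function and reports only the functions whose bodies changed. `FnCache`, `fingerprint`, and `rebuild` are hypothetical names, and the model deliberately ignores dependencies between functions (for example, a changed callee that was inlined into an unchanged caller), which a real implementation would have to track.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Fingerprint of a function body (here simply a hash of its source text).
fn fingerprint(body: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    body.hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical cache: function name -> fingerprint it was last compiled with.
struct FnCache {
    compiled: HashMap<String, u64>,
}

impl FnCache {
    fn new() -> Self {
        FnCache { compiled: HashMap::new() }
    }

    /// Returns the names of functions that need recompiling and records
    /// their new fingerprints. Unchanged functions are skipped.
    fn rebuild(&mut self, functions: &[(&str, &str)]) -> Vec<String> {
        let mut dirty = Vec::new();
        for (name, body) in functions {
            let fp = fingerprint(body);
            if self.compiled.get(*name) != Some(&fp) {
                self.compiled.insert(name.to_string(), fp);
                dirty.push(name.to_string());
            }
        }
        dirty
    }
}
```

On the first call every function is reported as dirty; on later calls only the edited ones are.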
Just jotting this down before I forget about it again: Currently, So it would be cleaner to also delay translation of statics to the final binary/staticlib/cdylib. This requires non-trivial refactoring though, as a lot of the current code is written under the assumption that all statics to translate come from the current crate (e.g., It also means metadata needs a way to enumerate all the statics and other collector roots (monomorphic functions, and some more things in "eager" mode) from other crates. The information is all there, but there's no efficient/easy way to enumerate them. |
alexcrichton removed the T-tools label on May 22, 2017
michaelwoerister referenced this issue on Jul 13, 2017: Building with LTO should skip "compilation" #43212 (Open)
Mark-Simulacrum added the C-tracking-issue label on Jul 26, 2017
We can do a stepwise migration towards MIR-only RLIBs:
Can you elaborate on the parenthetical? I don't think MIR inlining on its own can have any effect on where and how statics are translated. Even when statics are lexically nested in a function, they're not part of the function's MIR. Statics are also trans-item-collect'd separately from MIR (as part of walking the HIR of the current crate), at least last time I checked. |
I have been getting undefined references to statics inside functions that were inlined into other crates when compiling libstd via xargo with |
I believe I found yet another benefit to MIR-only rlibs: Currently, With MIR-only rlibs, the only remaining differences between rlibs and rmeta files would be that (1) the rlib has wrapped the metadata in an archive file, and (2) the archive includes bundled native libraries, if any. Creating the archive should have negligible cost, so we could probably get rid of rmeta files and make (Going further, @nagisa (I think?) once suggested to me on IRC that metadata and machine code should be two separate files on disk. I found this appealing for other reasons, but to stay on-topic, such a split would make it possible to pick up a previously-generated rmeta file and generate all the machine code and so on from it, without recompiling the leaf crate from scratch. But that is mostly orthogonal to MIR-only rlibs, so whatever.)
@oli-obk re: the undefined references to statics: I don't have much time to investigate, but one culprit I can think of would be Anyway, it would be great if you could file an issue for that (if there isn't one already) with a small test case. This is definitely a bug, but so far I don't believe it's an issue with statics getting translated locally.
rkruppe referenced this issue on Sep 15, 2017: [idea] Allow .rmeta to be translated in .rlib files #44587 (Open)
aep commented Sep 22, 2017
Uh, I'm scratching my head over what I'm missing here: how is this going to work with -C linker= ? We're relying on the fact that the hash of the input file to the linker is the same on every invocation. If the objects get translated to native code before being passed to the linker, is the translation stable? Or will the target's system linker actually only ever see a single already relocated and re-ordered object file?
This issue only affects which Rust code (or monomorphization of generic Rust code) gets translated into which LLVM compilation unit. It doesn't affect what happens afterwards with these LLVM modules, the resulting object files, etc. — and while it's plausible that MIR-only rlibs would enable more innovation in the later stages of the backend, nothing along those lines has been proposed or even discussed as far as I remember. |
Another (marginal) benefit, assuming |
michaelwoerister referenced this issue on Jan 10, 2018: Experiment with sharing monomorphized code between crates #47317 (Closed)
michaelwoerister referenced this issue on Jan 22, 2018: Tracking Issue for Incremental Compilation 2018 #47660 (Open)
michaelwoerister referenced this issue on Feb 20, 2018: [experimental] Allow for RLIBs to only contain metadata. #48373 (Closed)
bors added a commit that referenced this issue on Feb 20, 2018
I've put together a proof-of-concept implementation of this in #48373. Although the implementation crashes for many crates, I was able to collect timings for a number of projects. The tables show the aggregate time spent for various tasks while compiling the whole crate graph. In many cases we do less work overall but due to worse parallelization, wall-clock time increases. I.e. everything seems to be bottlenecked on the MIR-to-LLVM translation in the leaf crates. To me this suggests that MIR-only RLIBs are blocked on the compiler internals being parallelized.
ripgrep - cargo build
encoding-rs - cargo test --no-run
webrender - cargo build
futures-rs - cargo test --no-run
tokio-webpush-simple - cargo build
Number of LLVM function definitions generated for whole crate graph
|
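The observation that total work can go down while wall-clock time goes up can be illustrated with a toy cost model. All names and numbers here are hypothetical and are not taken from the measurements; the model assumes independent dependency crates, enough cores to build them all at once, and strictly serial code generation inside the leaf crate.

```rust
/// Hypothetical per-crate costs in arbitrary time units.
#[derive(Clone, Copy)]
struct Crate {
    frontend: u32, // parsing, type checking, MIR construction
    codegen: u32,  // MIR-to-LLVM translation plus LLVM itself
}

/// Codegen deferred to the leaf, minus work saved by not duplicating instances.
fn deferred_codegen(deps: &[Crate], shared: u32) -> u32 {
    deps.iter().map(|c| c.codegen).sum::<u32>().saturating_sub(shared)
}

/// Conventional rlibs: dependencies build fully in parallel, then the leaf builds.
fn wall_clock_conventional(deps: &[Crate], leaf: Crate) -> u32 {
    let slowest_dep = deps.iter().map(|c| c.frontend + c.codegen).max().unwrap_or(0);
    slowest_dep + leaf.frontend + leaf.codegen
}

/// MIR-only rlibs: only the frontends run in parallel; all remaining
/// codegen happens serially inside the leaf crate.
fn wall_clock_mir_only(deps: &[Crate], leaf: Crate, shared: u32) -> u32 {
    let slowest_frontend = deps.iter().map(|c| c.frontend).max().unwrap_or(0);
    slowest_frontend + leaf.frontend + leaf.codegen + deferred_codegen(deps, shared)
}

/// Total CPU work under conventional rlibs.
fn total_work_conventional(deps: &[Crate], leaf: Crate) -> u32 {
    deps.iter().map(|c| c.frontend + c.codegen).sum::<u32>() + leaf.frontend + leaf.codegen
}

/// Total CPU work under MIR-only rlibs.
fn total_work_mir_only(deps: &[Crate], leaf: Crate, shared: u32) -> u32 {
    let frontends: u32 = deps.iter().map(|c| c.frontend).sum();
    frontends + leaf.frontend + leaf.codegen + deferred_codegen(deps, shared)
}
```

With four dependencies costing 2 units of frontend and 6 of codegen each, a leaf costing 2 and 4, and 8 units of codegen saved through sharing, total work drops from 38 to 30 while wall-clock time rises from 14 to 24 in this model. Parallelizing codegen within the leaf would shrink that gap.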
rkruppe referenced this issue on Apr 12, 2018: Workarounds for all/any mask reductions on x86, armv7, and aarch64 #425 (Merged)
eddyb referenced this issue on Jun 27, 2018: Experiment with pipelining rustc invocations via metadata #4831 (Closed)
rkruppe referenced this issue on Jul 2, 2018: RFC: Existential types with external definition #2492 (Open)
What if we did this, but for (prompted by @dwijnand's comments on Discord about their workflow of changing EDIT: here's some data, since I wanted to replicate what @dwijnand was seeing:
Most of the time is spent building libstd, which should be improved once #53673 ends up in beta (perhaps at the cost of the |
michaelwoerister commented Jan 8, 2017 (edited)
There's been some talk about switching RLIBs to "MIR-only", that is, make RLIBs contain only the MIR representation of a program and not the LLVM IR and machine code as they do now. This issue will try to collect some advantages, disadvantages, and other concerns such an approach would entail:
Advantages
-C metadata).
libstd is compiled with -Cdebuginfo=1, which is good in general but as a side-effect increases the size of Rust binaries, even if they are built without debuginfo (because the debuginfo from libstd gets statically linked into the binaries). This problem would not exist with MIR-only rlibs.
libstd (see #38699), as @japaric points out.
libstd) can be compiled with -C target-cpu=native, potentially resulting in better code, as @japaric points out.
Disadvantages
pub #[no_mangle] items being exported from RLIBs and link against them directly. This would not be possible anymore, as @nagisa points out.
Non-Advantages
cfg switches, MIR is not platform independent either.
Mitigation strategies for disadvantages:
-C codegen-units already, which provides a means of reducing super-linear optimizations.
Open Questions
Please help collect more data on the viability of MIR-only RLIBs.
cc @rust-lang/core @rust-lang/compiler @rust-lang/tools @rkruppe