Implement shipping a per-target LLVM backend #46819
Comments
cc @rust-lang/compiler, if y'all have more information about converting |
Notes from IRC:
|
With mentoring I might like to work on this, seeing as how I've been interested in getting back into compiler development specifically at my own frustrated impotence at helping out with the LLVM upgrade. :) Things are going to be crazy for a few weeks with the holidays though so don't let me dissuade anyone who wants to snatch this up before then. |
Ok awesome! Just lemme know if any help is needed! |
Further things this will enable:
|
And of course we can also use this to get rid of |
Will it still be possible to easily build against only a single LLVM tree if e.g. I don't care about emscripten, or really anything except x86 and my target architecture? |
@whitequark From what I heard, the |
Ok I've implemented the first PR at #47671 |
And the next and final part at #47730 |
rustc: Load the `rustc_trans` crate at runtime
Building on the work of #45684 this commit updates the compiler to unconditionally load the `rustc_trans` crate at runtime instead of linking to it at compile time. The end goal of this work is to implement #46819 where rustc will have multiple backends available to it to load.
This commit starts off by removing the `extern crate rustc_trans` from the driver. This involved moving some miscellaneous functionality into the `TransCrate` trait and also required an implementation of how to locate and load the trans backend. This ended up being a little tricky because the sysroot isn't always the right location (for example `--sysroot` arguments) so some extra code was added as well to probe a directory relative to the current dll (the rustc_driver dll).
Rustbuild has been updated accordingly as well to have a separate compilation invocation for the `rustc_trans` crate and assemble it accordingly into the sysroot. Finally, the distribution logic for the `rustc` package was also updated to slurp up the trans backends folder.
A number of assorted fallout changes were included here as well to ensure tests pass and such, and they should all be commented inline.
rustc: Split Emscripten to a separate codegen backend
This commit introduces a separately compiled backend for Emscripten, avoiding compiling the `JSBackend` target in the main LLVM codegen backend. This builds on the foundation provided by #47671 to create a new codegen backend dedicated solely to Emscripten, removing the `JSBackend` of the main codegen backend in the process.
A new field was added to each target for this commit which specifies the backend to use for translation, the default being `llvm` which is the main backend that we use. The Emscripten targets specify an `emscripten` backend instead of the main `llvm` one.
There's a whole bunch of consequences of this change, but I'll try to enumerate them here:
* A *second* LLVM submodule was added in this commit. The main LLVM submodule will soon start to drift from the Emscripten submodule, but currently they're both at the same revision.
* Logic was added to rustbuild to *not* build the Emscripten backend by default. This is gated behind a `--enable-emscripten` flag to the configure script. By default users should neither check out the emscripten submodule nor compile it.
* The `init_repo.sh` script was updated to fetch the Emscripten submodule from GitHub the same way we do the main LLVM submodule (a tarball fetch).
* The Emscripten backend, turned off by default, is still turned on for a number of targets on CI. We'll only be shipping an Emscripten backend with Tier 1 platforms, though. All cross-compiled platforms will not be receiving an Emscripten backend yet.
This commit means that when you download the `rustc` package in Rustup for Tier 1 platforms you'll be receiving two trans backends, one for Emscripten and one that's the general LLVM backend. If you never compile for Emscripten you'll never use the Emscripten backend, so we may update this one day to only download the Emscripten backend when you add the Emscripten target. For now though it's just an extra 10MB gzip'd.
Closes #46819
LLVM is quite a flexible compiler with a huge number of targets, but sometimes targets require custom versions or forks of LLVM. Up to now we've got two primary examples of this:
While each of these targets may have a lot more going on with it in terms of future plans and whatnot, it suffices to say that for the near future (6mo-1y) it seems like Emscripten in particular won't be moving away from its LLVM fork and we'd like to keep its functionality. This desire to keep Emscripten creates a tension with upgrading LLVM on our end, as we can't do so until Emscripten does.
As a result, let's ship multiple copies of LLVM!
General idea
The overall idea for this issue is to allow each target to optionally have a custom LLVM backend. We would then be compiling LLVM multiple times, once per necessary backend, and shipping multiple copies of LLVM to users. At compile time the compiler would select which version of LLVM is appropriate, dynamically load it, and then use it to compile and generate code.
This means that our build system will need to prepare itself for building multiple copies of LLVM. By default developers probably won't be building multiple copies of LLVM, but the bots on Travis/AppVeyor would all be compiling multiple copies when making dist builds.
The current thinking is that `rustc_driver`-the-crate will no longer depend on `rustc_trans`. Instead `rustc_trans` will be compiled as usual except it will also expose a C interface. The driver will then dynamically select the right trans backend, open it up, and use the C API to register hooks and whatnot.
Compiler changes
I believe the first thing that'll need to be changed is how we build the compiler, specifically how `librustc_trans` is loaded. I've been told that the `rustc_trans` crate is very close to only exposing basically a C API, and this would require us to complete that work. So the first task for this issue would be to work with the compiler team to ensure that the `rustc_trans` crate has a C API and the `rustc_driver` crate only uses this C API.
Once that's been done the dependency between `librustc_driver` and `librustc_trans` can be broken. Instead we'll be doing something like:
* Remove `rustc_trans` from `librustc_driver/Cargo.toml`
* Change `librustc_llvm` to compile only as an rlib, not as both an rlib and a dylib.
* Add a rustbuild step `RustcTrans` similar to the step called `Rustc`, but this step will compile just the `librustc_trans` target
* Add a rustbuild step `RustcTransLink` similar to `RustcLink`, except it'll link just the one `rustc_trans` dylib into the sysroot in a specific location (detailed below)
* Update the `Assemble` step to require `RustcTransLink` in addition to `RustcLink`
The sysroot (on unix) currently looks like:
I think what we'll want to move to is something that looks like:
Specifically the `librustc_trans.so` dynamic library no longer lives in `lib`. Instead multiple copies of it will live in `lib/rustlib/backends`. The `RustcTransLink` step is what will assemble the `backends` folder. Initially we'll just have the `standard` dynamic library sitting inside there.
Once this is done the driver needs to be modified when loading `rustc_trans` the crate. At runtime the driver will determine the target and look at an optional field in the custom target spec. This'll default to `None`, which says to load the "standard" backend, and if it's `Some` rustc will instead look for a different backend. For now we'll add this later though.
Ok so at this point, hopefully, rustc_driver is now loading librustc_trans through a dynamic library at runtime and we're ready for the next step!
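As a rough illustration of the lookup described above, here is a minimal sketch of mapping the optional target-spec field to a path under `lib/rustlib/backends`. The `TargetSpec` struct, the `trans_backend` field name, and the `librustc_trans-<name>.so` file naming are all assumptions made up for this example; the issue does not pin any of them down.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for the relevant part of a target spec.
struct TargetSpec {
    /// `None` means "load the standard backend" (field name is an assumption).
    trans_backend: Option<String>,
}

/// Pick the backend name for a target, falling back to "standard".
fn backend_name(target: &TargetSpec) -> &str {
    target.trans_backend.as_deref().unwrap_or("standard")
}

/// Build the path the driver would probe for the backend dylib.
/// The file naming scheme is an assumption for illustration only.
fn backend_path(sysroot: &Path, target: &TargetSpec) -> PathBuf {
    let file = format!("librustc_trans-{}.so", backend_name(target));
    sysroot
        .join("lib")
        .join("rustlib")
        .join("backends")
        .join(file)
}

fn main() {
    let default = TargetSpec { trans_backend: None };
    let emscripten = TargetSpec {
        trans_backend: Some("emscripten".to_string()),
    };
    let sysroot = Path::new("/usr/local");
    println!("{}", backend_path(sysroot, &default).display());
    println!("{}", backend_path(sysroot, &emscripten).display());
}
```

Actually opening the dylib and calling into it is left out here; that part depends on what interface `rustc_trans` ends up exposing.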
Changes to rustbuild
Next up we need to get a second version of LLVM compiling. For now we'll stick to the motivational use case for this, Emscripten. First thing to do is to add a config option to `config.toml.example`, let's say something like:
We'll then modify the `Assemble` step to check this config option. For each configured backend we'll execute `RustcTransLink` appropriately (adding a new option for the LLVM backend we'd like to create) and plumb that option all the way down to the `Llvm` target which will get modified appropriately.
Once this is done you should be able to configure via `config.toml` that you'd like to have an emscripten backend and when `./x.py build` is executed it'll compile LLVM/librustc_trans twice into two separate directories.
In order to ensure that `librustc_trans` builds are cached appropriately we may also want to add features to the `rustc_trans` crate which get toggled depending on the LLVM backend, but this can be played around with when implementing.
Now at this point we've got multiple LLVM compilations, so let's put some polishing touches on things!
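To make the "compile LLVM/librustc_trans twice into two separate directories" idea concrete, here is a small sketch of how a build plan could be derived from a list of configured backends. This is not rustbuild code: `Config`, its `backends` field, `BackendBuild`, and the directory names are hypothetical and only serve the illustration.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical build configuration listing extra backends to build.
struct Config {
    backends: Vec<String>,
}

/// One planned LLVM + librustc_trans compilation with its own output dirs.
#[derive(Debug, PartialEq)]
struct BackendBuild {
    name: String,
    llvm_out: PathBuf,
    trans_out: PathBuf,
}

/// Always plan the "standard" backend, then one build per configured extra
/// backend, each in a separate directory so the builds cache independently.
fn plan_builds(config: &Config, build_root: &Path) -> Vec<BackendBuild> {
    let mut names = vec!["standard".to_string()];
    for backend in &config.backends {
        if !names.contains(backend) {
            names.push(backend.clone());
        }
    }
    names
        .into_iter()
        .map(|name| BackendBuild {
            llvm_out: build_root.join(format!("llvm-{}", name)),
            trans_out: build_root.join(format!("rustc_trans-{}", name)),
            name,
        })
        .collect()
}

fn main() {
    let config = Config {
        backends: vec!["emscripten".to_string()],
    };
    for build in plan_builds(&config, Path::new("build")) {
        println!("{:?}", build);
    }
}
```

Keeping one output directory per backend is what lets a developer who never configures an extra backend pay for only a single LLVM build.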
Distribution changes
We'll want to change the `rustc` component package to include the `backends` folder that we're creating. This will involve changing the `Rustc` step in `dist.rs`, and when you run `./x.py dist` the `rustc` packages created should all have the `librustc_trans` dylib inside them at the `backends` location.
Eventually we'll also want to enable the multiple llvm backends by default when the configured release channel is not `dev` and the `DEPLOY` env var is set to 1. This can be done most likely in `src/ci/run.sh` by passing a new option.
Finally what we'll want to do is add a second submodule. We'll want, for example, a `src/llvm-emscripten` submodule. This won't actually get checked out on most builds, but for the dist builds on the bots we'll make sure to update the submodule and run with it.
And... I think that may be it? I'm sure I'll need to fill in a lot of cracks along the way but I'm more than willing to help mentor this issue! If you're interested in implementing this please just let me know!