Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WebAssembly backend for rustc #33205

Closed
brson opened this issue Apr 25, 2016 · 23 comments
Closed

WebAssembly backend for rustc #33205

brson opened this issue Apr 25, 2016 · 23 comments
Labels
C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. O-wasm Target: WASM (WebAssembly), http://webassembly.org/ T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@brson
Copy link
Contributor

brson commented Apr 25, 2016

With the new MIR->LLVM backend being all but done, and being cleaner than the old trans, we can seriously think about writing yet more backends. A relatively important, and relatively easy target to translate to is WebAssembly.

The right starting strategy per @eddyb is to fork miri and prototype the new backend out of tree.

One interesting point that @kripken brought up recently is that wasm is an AST with controlflow, whereas MIR is just a graph of blocks. This means that somebody has to "reloop" the controlflow back out of the graph. Most likely binaryen, the C++ tool that
consumes wasm, will just additionally accept a basic-block form of wasm and do the conversion itself.

@kripken is interested in helping with this, so I'm setting up this issue as a place for interested parties to discuss.

cc @tsion @nagisa

@nagisa
Copy link
Member

nagisa commented Apr 25, 2016

One interesting point that @kripken brought up recently is that wasm is an AST with controlflow, whereas MIR is just a graph of blocks. This means that somebody has to "reloop" the controlflow back out of the graph.

Look at how C’s goto is implemented in wasm, maybe? You cannot reloop everything, and while that might result in non-optimal code, I think it might as well be just good enough for a MVP version of wasm. Also see: https://github.com/WebAssembly/design/blob/master/FutureFeatures.md#more-expressive-control-flow

Seeing what early stages wasm still is, I think there’s not much point in pondering about such issues.

EDIT: in regards of relooping, it might be much more hard to discern original flow constructs from the MIR CFG once we run the transformation passes.

@kripken
Copy link

kripken commented Apr 25, 2016

I'm very excited about this! I think compiling directly to wasm is interesting for a language like rust for several reasons:

  • LLVM is obviously a very powerful optimizing compiler. However, that comes at the cost of slower compilation times. Compiling to wasm directly can be a lot faster: We can skip a bunch of IR layers and transformations between them, by having wasm be the input IR to the backend (binaryen), which then operates directly on that IR all the way to the final binary.
  • Of course, we are trading off compilation speed for power. However, a lot of the power in LLVM happens at the machine level: register allocation, machine architecture-specific techniques, etc. And those things matter a lot less when compiling to something higher level and platform independent like wasm.
  • And on the other hand, a bunch of the optimization opportunities in a language like rust are very language-specific, and must be done by rust anyhow, and not left to LLVM. As I understand it, that's a major motivation for the new MIR. Those optimizations will still happen.
  • Finally, I believe wasm is a good backend input IR. And it should be pretty easy to emit it. Part of that is no coincidence, as wasm is carefully designed to be friendly as an input IR to web browsers.

Basically, an option to compile directly to wasm should give a big boost in compile times, for some loss of throughput but hopefully not too much, and with hopefully not too much effort. I am of course signing up to do the work on the binaryen side.

Practically speaking, emitting wasm for binaryen means emitting an IR whose types are i32, i64, f32, f64, and all the usual math operations on those (note: no odd things like i33 which LLVM has, nor i1 for that matter), but little else. I imagine rustc emits something similar to that when emitting LLVM IR? Then splitting off LLVM vs wasm could be done close to where LLVM is currently emitted in rustc, perhaps?

As @brson and @nagisa mentioned, control flow is structured in wasm and binaryen: loops, blocks, breaks, ifs, etc., which is a concern. I think as a first step here, I'll add support for basic blocks/control flow graphs as input to binaryen - the exact same IR as usual, except with basic blocks that end in branches, etc., much the same as LLVM's input would be. And binaryen will create structured control flow from that, much as emscripten does on LLVM cfgs now ("relooping", which can handle arbitrary cfgs), as @brson said.

Aside from that, there are a bunch of LLVM features we'll need to add special support for, like invoke/landingpad. Currently wasm doesn't have native support for those things, however, we know we can implement them (with some overhead) in an emulated manner (that's what emscripten does for asm.js), while waiting for native support. So we can just add that support to binaryen as needed. But I'm not sure of the full list of such features that rust needs?

Binaryen is a C++ project. I assume rust will want a C API to communicate with? Then a second step (after cfgs in binaryen) for me could be to add a C API to binaryen.

My main question for the rust side is whether emitting something at the wasm IR level (+ basic blocks; as described above) is practical for your project? And overall thoughts on the other details I wrote?

Sorry for the long comment here, I just think this could be a very cool project, hopefully it makes sense for us to do :)

@nagisa
Copy link
Member

nagisa commented Apr 25, 2016

Binaryen is a C++ project. I assume rust will want a C API to communicate with? Then a second step (after cfgs in binaryen) for me could be to add a C API to binaryen.

As long as the C API is exhaustive. LLVM C API is suitable for small projects but is very lacking otherwise and we end up doing some of the binding ourselves anyway.

Aside from that, there are a bunch of LLVM features we'll need to add special support for, like invoke/landingpad.

Yes, this is certainly something rust needs. That being said, unwinding in Rust is pretty much always the cold path (unlike, say, C++), so, personally, it is fine with me if its not very performant. I’m not sure what else wasm hasn’t that we use, but a few things that are more likely to come up:

  • Integer operations with overflow checking;
  • unreachable and abort;
  • bit counting operations like count leading zeroes: I believe these are also non-optional, though it is possible that there might be a fallback to compiler-rt;
  • atomic ops: the pointer sized atomic ops not optional and are stable in the standard library;
  • the math intrinsics, I guess?

I think as a first step here, I'll add support for basic blocks/control flow graphs as input to binaryen - the exact same IR as usual, except with basic blocks that end in branches, etc., much the same as LLVM's input would be.

An interesting idea by @eddyb was to have a CFG wrapper over plain wasm AST: basically CFG blocks with control flow and the blocks would contain regular wasm. Something to consider.

nor i1 for that matter

Rust’s booleans are defined to only have two valid representations (1 and 0), and we could use i1 if that was available. That being said, it is also not a problem to represent them as a i8 (like we do in LLVM most of the time).

@kripken
Copy link

kripken commented Apr 26, 2016

As long as the C API is exhaustive. LLVM C API is suitable for small projects but is very lacking otherwise and we end up doing some of the binding ourselves anyway.

Understood. Yeah, I'll make sure the C API provides everything so that Rust doesn't need any extra binding work.

Integer operations with overflow checking;
unreachable and abort;
bit counting operations like count leading zeroes: I believe these are also non-optional, though it is possible that there might be a fallback to compiler-rt;
atomic ops: the pointer sized atomic ops not optional and are stable in the standard library;
the math intrinsics, I guess?

Thanks for the list! Ok, nothing here sounds like a problem. Wasm already has support for some of those (like abort, ctz, etc.), and I can add compiler-rt type support for the others (most are in emscripten's runtime support libraries anyhow, in JS, which can be linked with automatically, at the cost of some runtime overhead - we can start there and optimize later).

Note though that wasm doesn't have threads yet, so atomics don't matter, and we can only start with single-threaded code. But threads + atomics are on the wasm roadmap.

An interesting idea by @eddyb was to have a CFG wrapper over plain wasm AST: basically CFG blocks with control flow and the blocks would contain regular wasm. Something to consider.

Yeah, that's exactly what I've been thinking too. That way there isn't another special IR, just some additions to the IR for the ends of basic blocks. Then relooping replaces just those as it goes to structured control flow, but everything else is already in the final IR.

That being said, it is also not a problem to represent them as a i8 (like we do in LLVM most of the time).

Note that wasm doesn't have i8 either - only i32, i64, f32, f64.

@eddyb
Copy link
Member

eddyb commented Apr 26, 2016

@kripken Does it at least have byte-level memory access?

@kripken
Copy link

kripken commented Apr 26, 2016

@eddyb: Yes, while locals are i32, i64, f32, f64, the load and store operations can load 8, 16, 32 or 64 bits, and can also load signed or unsigned values for integers. So e.g. i32.load8_s loads one byte, sign-extends it to 32 bits, and returns that i32 value.

@DanielKeep
Copy link
Contributor

A heads up: I have the beginnings of a WAsm stack floating about. Thus far, I've got it parsing the text format and interpreting the results. I'm going through the reference test cases, implementing stuff as I go. Might possibly be of some use for testing purposes.

@kripken
Copy link

kripken commented May 3, 2016

An initial proposal for a C API for binaryen is at WebAssembly/binaryen#427. Feedback is very welcome.

Note that this is not the basic-block + relooping API yet. That will be a second step.

@kripken
Copy link

kripken commented May 10, 2016

There is now what should be a sufficient C API for binaryen, including relooping/arbitrary CFGs as inputs. Also wrote some docs.

@kripken
Copy link

kripken commented May 12, 2016

@brson put up a project for this, mir2wasm, which has some early code. All the setup of binding to binaryen's C API is done, and we have some trivial data being sent (function names; no types nor code yet).

@eholk
Copy link
Contributor

eholk commented Jun 16, 2016

Out of curiosity, what's the status of mir2wasm? I tried to build it but unfortunately it seems to be broken at the moment. I'm interested in contributing, but I probably only have a handful of hours a week to spare so I want to try to make the most of the time I have.

It looks like mir2wasm is forked from miri, but the two code bases seem to have diverged rather significantly.

@badboy
Copy link
Member

badboy commented Jun 16, 2016

@eholk I have a small patch lying around to make it compile on a recent nightly. I will submit it tomorrow.
I also like to help improve it.

@sunjay
Copy link
Member

sunjay commented Jul 7, 2016

Any progress on this? I'm in love with the Rust type system and going back to JavaScript/ES6 just isn't enough.

@eddyb
Copy link
Member

eddyb commented Jul 7, 2016

@sunjay It's at https://github.com/brson/mir2wasm (which has been linked above too).

@brson brson added the E-help-wanted Call for participation: Help is requested to fix this issue. label Sep 1, 2016
@ticki
Copy link
Contributor

ticki commented Sep 5, 2016

For a more directed, but a lot more incomplete, alternative to mir2wasm, see cyano.

@badboy
Copy link
Member

badboy commented Sep 5, 2016

@ticki Is there a way to actually use cyano at the moment? Also, it will only target JavaScript, not WebAssembly, right?

@ticki
Copy link
Contributor

ticki commented Sep 5, 2016

It targets JS, right, and apparantly the HEAD commit is broken. I'll fix it.

@brson brson removed the E-help-wanted Call for participation: Help is requested to fix this issue. label Sep 7, 2016
@brson
Copy link
Contributor Author

brson commented Sep 7, 2016

Oops, I didn't realize this was specifically about mir2wasm. I've opened another issue to track the progress on the LLVM-based wasm backend.

@alexcrichton
Copy link
Member

cc #36339, a PR for this.

@benaryorg
Copy link
Contributor

benaryorg commented Sep 12, 2016

@brson

Oops, I didn't realize this was specifically about mir2wasm. I've opened another issue to track the progress on the LLVM-based wasm backend.

Now I'm confused, didn't you start this issue specifically with LLVM in mind?

Edit: paragraph that made me think that:

With the new MIR->LLVM backend being all but done, and being cleaner than the old trans, we can seriously think about writing yet more backends. A relatively important, and relatively easy target to translate to is WebAssembly.

@brson
Copy link
Contributor Author

brson commented Sep 15, 2016

@benaryorg The MIR->LLVM backend is an enabler for MIR->anything because it is much simpler than the old AST->LLVM backend. It is the rust translation pass done right. So now we can take the lessons from mirtrans to make other backends that target other IRs, and hopefully factor out the common parts. It should get increasingly easier over time to create backends for Rust that don't target LLVM.

@brson brson added the O-wasm Target: WASM (WebAssembly), http://webassembly.org/ label Dec 31, 2016
@steveklabnik steveklabnik added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. and removed A-compiler labels Mar 24, 2017
@WillemMali
Copy link

@brson I think this issue can be closed now, seeing how the WASM backend landed in stable half a year ago?

@Mark-Simulacrum Mark-Simulacrum added the C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. label Jul 25, 2017
@alexcrichton
Copy link
Member

I'm going to close this in favor of #44006, and we've also got a wasm32-unknown-emscripten target for testing now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. O-wasm Target: WASM (WebAssembly), http://webassembly.org/ T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests