
Future of Emterpreter/Asyncify #8561

Closed
kripken opened this issue May 8, 2019 · 4 comments

@kripken
Member

commented May 8, 2019

Background

The Emterpreter and Asyncify features help with the problem of synchronous source code that can't run synchronously on the Web, such as a synchronous network access in C. In general, main loops can be rewritten using the emscripten_set_main_loop method, but that doesn't work for many other cases, where it would simply be too much work to rewrite code with many synchronous operations.

Asyncify

Asyncify transforms the CFG at the LLVM level. It makes it possible to jump from the function entry to any location where an async operation might happen, so that execution can leave the function and resume there later. It can also save and reload all the values in LLVM's virtual registers.

Asyncify works well, but on large functions every new edge that is added can lead to a great many phis, exponentially many in bad cases. This can produce very large amounts of code.

Emterpreter

The second effort in this space, the Emterpreter, aimed to avoid Asyncify's worst-case code size blowups by running interpreted bytecode. This gives a guarantee that the bytecode is of similar size to the original code (usually smaller, in fact). While the code can run much slower, the hope was that projects could selectively emterpret only the necessary code: for example, the main loop might be emterpreted, but physics simulation code that it calls does not need to be, if async operations do not happen during physics. A bytecode also made sense because it allowed experimenting with other things, like fast startup (avoiding JS parse time or asm.js AOT time).
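To illustrate why an interpreter makes pausing easy, here is a toy stack-machine sketch (illustrative Python, not the Emterpreter's actual bytecode or opcodes): the entire execution state is just a program counter and a value stack, so it can be saved at any instruction boundary and resumed later.

```python
# Toy bytecode interpreter: execution state is just (pc, stack), so an
# async pause point only needs to hand that state back to the caller.
# This is an illustration, not the Emterpreter's real bytecode.
PUSH, ADD, PAUSE, HALT = range(4)

def run(code, state=None):
    """Run bytecode until PAUSE or HALT; return ('paused', state) or ('done', stack)."""
    pc, stack = state if state else (0, [])
    while True:
        op = code[pc]
        if op == PUSH:
            stack.append(code[pc + 1])
            pc += 2
        elif op == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
            pc += 1
        elif op == PAUSE:            # e.g. a synchronous network call begins
            return 'paused', (pc + 1, stack)
        elif op == HALT:
            return 'done', stack

program = [PUSH, 2, PUSH, 3, PAUSE, ADD, HALT]
status, state = run(program)          # runs until the "async" operation
assert status == 'paused'
status, stack = run(program, state)   # resume later, e.g. after a callback
assert (status, stack) == ('done', [5])
```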

The Emterpreter works fairly well, and it turns out that selective emterpretation is practical in many cases. However, the other use cases for a bytecode, like fast startup, have become irrelevant over time, as browsers have improved their startup speeds for JS and wasm.

The current challenge

We are close to switching our backend from fastcomp to the upstream LLVM backend. Neither Asyncify nor the Emterpreter can work with that backend: Asyncify because it is an LLVM pass inside fastcomp, and the Emterpreter because it runs on asm.js output.

In general we hoped that an async solution for the wasm backend might not be needed if threads shipped fast enough, since they allow pausing and resuming execution, at least on background threads. However, Spectre has slowed the adoption of threads: only Chrome currently ships them, and only on desktop. And in any case, a solution that works only off the main thread is limiting as well.

One option here might be to compile wasm to JS using wasm2js and then run the normal Emterpreter code on that. A problem, though, is that it might run more slowly than the current Emterpreter, since wasm2js does not emit validating asm.js (it emits more flexible JS in order to be smaller and support more wasm features). Running the emterpreted code more slowly might be fine, but it also means running the rest of the program more slowly, unless we split it out and avoid converting the entire program to JS, which is possible but not trivial. So we don't have a quick solution here, and must do at least some amount of actual work.

Proposal

We can implement a new solution in Binaryen, using the lessons learned from Asyncify and the Emterpreter:

  • A Binaryen pass can break up functions into basic blocks, emulating control flow between them using a switch inside a loop. This allows leaving and resuming code fairly easily. We can then also add the code to save and reload the locals as needed.
  • Doing this at the Binaryen level means we don't need to add phis. We will have mostly the same number of locals as before, plus one for the control flow emulation and a few extra ones to handle corner cases of nested control flow that we must flatten out. Code size and local counts will probably not increase by more than a factor of 2 or so, so this should be smaller and more efficient than Asyncify (which had the right idea overall; doing it at the LLVM level was the problem).
  • As a future optimization, we can emulate control flow only around calls where an async operation might begin. But based on the Emterpreter experience, focusing on speed may not be that crucial here, as users can selectively apply this transformation only to the necessary code. And this will be much faster than the Emterpreter even if it emulates all control flow, since each basic block will run at full speed.
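A sketch of the proposed shape (illustrative Python, not the actual pass's output or ABI): the function's basic blocks are driven by a dispatch loop on a pc local, and at a pause point the pc and live locals are spilled so the function can be re-entered later.

```python
# Sketch of the proposed transform: basic blocks driven by a dispatch
# loop on a "pc" local. At a pause point, pc and the live locals are
# returned so the function can be resumed later. The names (pc, saved)
# are illustrative only.
def transformed(saved=None):
    # restore pc and locals if resuming, else start fresh
    pc, x, total = saved if saved else (0, 0, 0)
    while True:
        if pc == 0:                  # block 0: initialize
            x, total = 10, 0
            pc = 1
        elif pc == 1:                # block 1: loop body
            total += x
            x -= 5
            pc = 2
        elif pc == 2:                # block 2: "async" call site -> unwind
            if x > 0:
                return ('paused', (1, x, total))   # resume at block 1
            pc = 3
        elif pc == 3:                # block 3: done
            return ('done', total)

status, state = transformed()        # pauses mid-loop, handing back state
assert status == 'paused'
status, result = transformed(state)  # resumes and finishes
assert (status, result) == ('done', 15)
```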
@jgravelle-google

Collaborator

commented May 8, 2019

Sounds fairly reasonable?

One thought is we could transform (pseudocode)

fn foo(x, y) {
  [code]
}

into something like

fn foo(x, y) {
  return foo_resumable(x, y, 0);
}
fn foo_resumable(x, y, pc) {
  [transformed code]
}

which should let us model the storage of locals externally, re-inject them on resume, and not have to modify calls to foo elsewhere in the module. In theory that external local storage could let us do something something coroutines. I haven't given this too much thought, so caveat emptor.
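A toy version of that wrapper shape (illustrative Python; foo_resumable is the hypothetical name from the pseudocode above, not a real emscripten symbol): callers keep calling foo() with its original signature, while foo_resumable() carries the extra pc parameter and the externalized locals.

```python
# Wrapper sketch: existing call sites keep calling foo() unchanged,
# while foo_resumable() takes the extra pc argument plus externalized
# locals. Names here are hypothetical, matching the pseudocode above.
def foo_resumable(x, y, pc, saved=None):
    acc = saved if saved is not None else 0
    if pc == 0:
        acc = x + y
        # suppose an async operation starts here: hand back our state
        return ('paused', {'pc': 1, 'locals': acc})
    if pc == 1:                       # re-entered here on resume
        return ('done', acc * 2)

def foo(x, y):
    # unchanged entry point: no call sites need modification
    return foo_resumable(x, y, 0)

status, state = foo(1, 2)             # pauses at the async operation
assert status == 'paused'
status, result = foo_resumable(0, 0, state['pc'], state['locals'])
assert (status, result) == ('done', 6)
```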

Another idea was: is there a reason not to reimplement the emterpreter, but use wasm as the bytecode? Meaning, selectively interpret specific functions already present in the module. In theory this lets us run the wasm either natively or in an interpreter, but in practice I'm not sure if we'd ever want to do that?

@kripken

Member Author

commented May 8, 2019

@jgravelle-google yeah, a wrapper that looks like the original function is indeed useful for keeping things working as they did before. That's how the Emterpreter works, more or less (and I think Asyncify, but not sure).

Using wasm as the bytecode is an option, but it would be slower than the proposal here, and it's not a small interpreter in terms of number of instructions, so my guess is it would be more work. But yeah, it would be cool since you could maybe even reuse the downloaded wasm, somehow...

@kripken

Member Author

commented May 18, 2019

Thinking some more, I believe we can do better than the loop-switch part: there is something both simpler to implement and more efficient to run that keeps the wasm as structured as possible. Basically, it adds ifs in the right places to avoid running code when pausing/resuming, but otherwise leaves the shape as is. I have no code yet, but in my head I'm referring to this as "bysyncify" (for the obvious reason, and it's also nice that it starts with a "b", as a successor to "asyncify").
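A toy sketch of that shape (illustrative Python, not the actual transform): instead of a dispatch loop, the code keeps its original structure, and added ifs skip already-executed code while rewinding and skip not-yet-reached code while unwinding. The state names and globals here are assumptions for illustration only.

```python
# Sketch of the "keep the structure, add ifs" idea: no dispatch loop;
# guards skip code while unwinding/rewinding. State names and globals
# are illustrative, not the real pass's ABI.
NORMAL, UNWINDING, REWINDING = range(3)
state = {'mode': NORMAL, 'resume_at': 0, 'locals': None}

def async_op():
    state['mode'] = UNWINDING          # request an unwind at this call

def func():
    if state['mode'] == REWINDING and state['resume_at'] == 1:
        total = state['locals']        # reload spilled locals
        state['mode'] = NORMAL         # back to normal execution
    else:
        total = 1 + 2                  # step A (skipped when rewinding past it)
        async_op()
        if state['mode'] == UNWINDING:
            state['resume_at'] = 1     # remember where to resume
            state['locals'] = total    # spill live locals
            return None                # unwind out of the function
    return total * 10                  # step B runs after the resume point

assert func() is None                  # first call unwinds at async_op
state['mode'] = REWINDING              # later, the async event completes
assert func() == 30                    # rewinds, skips step A, runs step B
```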

kripken added a commit to WebAssembly/binaryen that referenced this issue Jun 15, 2019

Bysyncify: async transform for wasm (#2172)
This adds a new pass, Bysyncify, which transforms code to allow unwinding and rewinding the call stack and local state. This allows things like coroutines, turning synchronous code asynchronous, etc.

The new pass file itself has a large comment on top with docs.

So far the tests here seem to show this works, but this hasn't been tested heavily yet. My next step is to hook this up to emscripten as a replacement for asyncify/emterpreter, see emscripten-core/emscripten#8561

Note that this is completely usable by itself, so it could be useful for any language that needs coroutines etc., and not just ones using LLVM and/or emscripten. See docs on the ABI in the pass source.
@kripken

Member Author

commented Jul 3, 2019

Bysyncify is basically complete now.

@kripken kripken closed this Jul 3, 2019
