Future of Emterpreter/Asyncify #8561
The Emterpreter and Asyncify features help with the problem of synchronous source code that can't run synchronously on the Web, like synchronous network access in C, etc. In general main loops can be rewritten using the emscripten_set_main_loop API, but other synchronous patterns have no such straightforward rewrite.
Asyncify transforms the CFG at the LLVM level. It makes it possible to jump from the program entry to any location where an async operation might happen, so that you can leave the function and resume there later. It can also save and reload all the values in LLVM's virtual registers.
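As a rough illustration of that entry-dispatch idea (hypothetical names and a contrived pause point, not Asyncify's actual output), a function can be flattened into a loop over a state variable, spilling its live locals before unwinding and reloading them on re-entry:

```c
/* Hypothetical pause/resume state for one function. */
static int saved_state = 0;   /* 0 = fresh entry, 1 = resume after pause */
static int saved_sum = 0;     /* a spilled "virtual register" */

/* Returns -1 when pausing; otherwise the final result. */
int count_to(int n) {
    int state = saved_state;
    int sum;
    while (1) {
        switch (state) {
        case 0:                 /* entry: initialize, then hit an async op */
            sum = 0;
            saved_state = 1;    /* record where to resume */
            saved_sum = sum;    /* spill live locals */
            return -1;          /* unwind out to the event loop */
        case 1:                 /* resumed: reload locals and continue */
            sum = saved_sum;
            for (int i = 0; i < n; i++)
                sum += i;
            saved_state = 0;    /* done; reset for the next call */
            return sum;
        }
    }
}
```

Calling count_to(10) once pauses and returns -1; calling it again resumes and returns 45. Each extra edge into a dispatch like this is what creates the phi growth described next.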
Asyncify works well, but on large functions every new edge that is added can lead to a great many phis, exponentially many in bad cases, which in turn can produce very large amounts of code.
The second effort in this space, the Emterpreter, aimed to avoid Asyncify's worst-case code size blowups by running interpreted bytecode. There is then a guarantee that the bytecode is of similar size to the original code (usually smaller, in fact). While the code can run much slower, the hope was that projects could selectively emterpret only the necessary code; for example, the main loop might be emterpreted, while physics simulation code it calls need not be, as long as no async operations happen during physics. A bytecode also made sense because it allowed experimenting with other things like fast startup (avoiding JS parse time or asm.js AOT time).
The Emterpreter works fairly well, and it does turn out that selective emterpretation is practical in many cases. However, the other use cases for a bytecode, like fast startup, have become irrelevant over time, as browsers have improved their startup speeds for JS and wasm.
The current challenge
We are close to switching our backend from fastcomp to the upstream LLVM backend. Neither Asyncify nor the Emterpreter can work with that backend: Asyncify because it is an LLVM pass inside fastcomp, and the Emterpreter because it runs on asm.js output.
In general we had hoped that an async solution for the wasm backend might not be needed if threads arrived fast enough, since they allow pausing and resuming execution, at least on background threads. However, Spectre has slowed the adoption of threads, with only Chrome shipping them currently, and only on desktop. And in any case, only being able to pause on background threads, and not the main thread, is limiting as well.
One option here might be to compile wasm to JS using wasm2js, then run the normal Emterpreter code on that. A problem though is that it might run more slowly than the current Emterpreter, since wasm2js does not emit validating asm.js (it emits more flexible JS in order to be smaller and support more wasm features). Running the emterpreted code more slowly might be fine, but it also means running the rest more slowly, unless we split it out and don't convert all the original program to JS, which is possible but not trivial. So we don't have a quick solution here, and must do at least some amount of actual work.
We can implement a new solution in Binaryen, using the lessons learned from Asyncify and the Emterpreter.
Sounds fairly reasonable?
One thought is that we could transform a function like foo so that its body moves into an inner implementation, with a wrapper keeping the original name and signature. That should let us model the storage of locals externally, re-inject them on resume, and avoid modifying calls to foo elsewhere in the module. In theory that external local storage could let us do something like coroutines. I haven't given this too much thought, so caveat emptor.
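A minimal C sketch of that wrapper idea (foo_impl, foo_locals, and foo_paused are all hypothetical names, and the pause point is contrived): the body moves into an inner function that re-injects saved locals when resuming, while a wrapper keeps the original name so no callers change:

```c
/* Hypothetical external storage for foo's locals across a pause. */
static struct { int i; int sum; } foo_locals;
static int foo_paused = 0;       /* nonzero while foo is suspended */

/* The original body, renamed; the wrapper below keeps foo's name. */
static int foo_impl(int resuming) {
    int i = 0, sum = 0;
    if (resuming) {              /* re-inject locals saved at the pause */
        i = foo_locals.i;
        sum = foo_locals.sum;
    }
    for (; i < 10; i++) {
        if (!resuming && i == 5) {   /* a pretend async pause point */
            foo_locals.i = i;        /* spill live locals externally */
            foo_locals.sum = sum;
            foo_paused = 1;
            return -1;               /* unwind; no result yet */
        }
        sum += i;
    }
    foo_paused = 0;
    return sum;
}

/* Wrapper with the original name: calls to foo need no changes. */
int foo(void) {
    return foo_impl(foo_paused);
}
```

The first foo() call pauses at i == 5 and returns -1; the second call restores i and sum from foo_locals and returns 45.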
Another idea: is there a reason not to reimplement the Emterpreter, but use wasm as the bytecode? That is, selectively interpret specific functions already present in the module. In theory this lets us run the wasm either natively or in an interpreter, though in practice I'm not sure we'd ever want to do that.
@jgravelle-google yeah, a wrapper that looks like the original function is indeed useful for keeping things working as they did before. That's how the Emterpreter works, more or less (and I think Asyncify does too, but I'm not sure).
Using wasm as the bytecode is an option, but it would be slower than the proposal here, and it's not a small interpreter in terms of number of instructions, so my guess is it would be more work. But yeah, it would be cool since you could maybe even reuse the downloaded wasm, somehow...
Thinking some more, I believe we can do better than the loop-switch approach: there is something both simpler to implement and more efficient to run that keeps the wasm as structured as possible. Basically, it adds ifs in the right places to avoid running code while pausing or resuming, but otherwise leaves the shape as is. I have no code yet, but in my head I'm referring to this as "bysyncify" (for the obvious reason, and also because it's nice that it starts with a "b", as a successor to "asyncify").
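A loose C sketch of that shape-preserving idea (mode, resume_point, and the helpers are all hypothetical; the real pass would operate on wasm in Binaryen): the function keeps its original structured control flow, and added ifs skip statements while unwinding toward a pause or rewinding toward a resume point:

```c
/* Hypothetical state: 0 = normal, 1 = unwinding (pausing),
 * 2 = rewinding (resuming). */
static int mode = 0;
static int resume_point = 0;
static int saved_x = 0;       /* a spilled local */

static int step(int v) { return v + 1; }

/* Pretend async call: triggers an unwind the first time it runs. */
static int maybe_pause(int v) {
    if (mode == 2) { mode = 0; return v; }   /* rewound to here: continue */
    mode = 1;                                /* begin unwinding */
    return 0;
}

int work(void) {
    int x = 0;
    if (mode == 2)
        x = saved_x;                         /* re-inject locals on rewind */
    if (mode == 0)                           /* skipped while rewinding */
        x = step(x);
    if (mode == 0 || (mode == 2 && resume_point == 1)) {
        saved_x = x;                         /* spill before the call */
        x = maybe_pause(x);
        if (mode == 1) {                     /* unwinding: leave early */
            resume_point = 1;
            return -1;
        }
    }
    if (mode == 0)                           /* skipped while unwinding */
        x = step(x);
    return x;
}
```

work() returns -1 on the first call (unwound mid-function); after the runtime sets mode = 2, calling work() again rewinds past the first step and returns 2. No dispatch loop is needed, so the structure stays intact.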