Vladislav Ivanishin edited this page Mar 30, 2016 · 20 revisions

Design documentation

As of now, LLV8 can compile a rather large subset of JavaScript features, including arithmetic, control flow operations, object manipulation, and arrays (although some cases remain unimplemented). The most challenging issues have been safepoints, relocations, deoptimization, inlining support, and a proper ABI. This document provides a high-level architectural overview and goes into some detail on each of these topics.

This document assumes the reader is already familiar with V8's architecture to some extent.

Overview

LLV8 takes the Hydrogen graph, with the type representations known for all values, and lowers all the nodes to LLVM IR in much the same fashion as they are normally lowered to Lithium. However, since code generation is now relinquished to another compiler, we have little control over the process. We can modify LLVM's source, of course, and in some cases we do, but the point is that we don't know exactly what the machine code will look like until the compilation has finished. This is the most prominent difference between LLV8's and Lithium's lowering processes. The information we need can usually be extracted from the generated Stack Map section after the LLVM IR has been compiled to native code.

LLV8 is intended to be a third-tier compiler for V8. To enable this use case, one would have to add execution counters to crankshafted code and support tiering up from Crankshaft. Right now LLV8 is used as an alternative second-tier backend.

Lowering

The lowering of Hydrogen to LLVM IR is for the most part a rewrite of src/x64/lithium-x64.cc and src/x64/lithium-codegen-x64.cc in terms of LLVM IR. There are caveats, of course (described in this document), but if one wants to implement the lowering of a new Hydrogen node, the first thing to do is look at these files.

Deoptimization

In order to properly perform OSR exit (eager deoptimization) we need to take care of two things:

  1. Generate checks of the speculative assumptions and jump to a certain routine, which replaces the code if they do not hold;
  2. Determine the state Full Codegen's stack machine should end up in after such a transition.

The first is fairly straightforward (nothing new here).
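To make the shape of such a check concrete, here is a hypothetical C++ model of a lowered speculative guard. In LLV8 the guard is emitted as LLVM IR (a compare plus a branch to a deopt block); the names `AddWithGuard`, `TagSmi`, and so on are illustrative, not V8's actual API. The Smi tagging scheme (clear low bit, payload in the upper 32 bits on x64) follows V8's convention.

```cpp
#include <cassert>
#include <cstdint>

// On x64, V8 tags small integers (Smis) with a clear low bit and keeps
// the payload in the upper 32 bits.
constexpr uint64_t kSmiTagMask = 1;

bool IsSmi(uint64_t tagged) { return (tagged & kSmiTagMask) == 0; }

int64_t UntagSmi(uint64_t tagged) {
  return static_cast<int64_t>(tagged) >> 32;  // arithmetic shift keeps the sign
}

uint64_t TagSmi(int64_t value) {
  return static_cast<uint64_t>(value) << 32;
}

// A speculative "both operands are Smis" add. Returning false models the
// branch to the deoptimization routine taken when the assumption fails.
bool AddWithGuard(uint64_t a, uint64_t b, int64_t* out) {
  if (!IsSmi(a) || !IsSmi(b)) return false;  // OSR exit: assumption violated
  *out = UntagSmi(a) + UntagSmi(b);
  return true;
}
```

The fast path stays branch-light: one mask test per operand, then the untagged arithmetic.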

To do the second, we use the same mechanism of Environments (see LLVMEnvironment) as Crankshaft does. The locations of the values (stack slot or register) Full Codegen is going to need are recorded into a Stack Map; after the compilation is finished, a TranslationBuffer is filled with the values from the Stack Map and appended to the code object as usual.
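A rough sketch of that replay step, under the assumption that each recorded value carries a location kind and a payload. The types `StackMapLocation` and `Translation` and the function `FillTranslation` are hypothetical stand-ins for the Stack Map records and V8's TranslationBuffer, not their real layouts.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Illustrative model: a value's location as reported by the Stack Map
// after LLVM codegen.
enum class LocKind { kRegister, kStackSlot, kConstant };

struct StackMapLocation {
  LocKind kind;
  int32_t payload;  // register number, frame offset, or constant value
};

// Illustrative model of a deoptimization translation record.
struct Translation {
  std::vector<int32_t> registers;
  std::vector<int32_t> stack_slots;
  std::vector<int32_t> constants;
};

// Replay the Stack Map locations into the translation, the way LLV8
// fills a TranslationBuffer once native code (and thus the final
// locations) exist.
Translation FillTranslation(const std::vector<StackMapLocation>& locs) {
  Translation t;
  for (const auto& loc : locs) {
    switch (loc.kind) {
      case LocKind::kRegister:  t.registers.push_back(loc.payload); break;
      case LocKind::kStackSlot: t.stack_slots.push_back(loc.payload); break;
      case LocKind::kConstant:  t.constants.push_back(loc.payload); break;
    }
  }
  return t;
}
```

The key point the sketch captures is the ordering: the translation cannot be built at lowering time, only after LLVM has assigned final locations.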

TODO: dematerialized objects are not supported as of this writing.

Relocations

For the most part we simply use the aforementioned Stack Map functionality, namely the patchpoint intrinsic, for handling code relocations and the like. We also do one gross thing (see #5), but it is not a conceptual limitation. Again, to get the relocation information, we inspect what we've got after LLVM's code generation.
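A minimal sketch of what "inspect after codegen, then patch" amounts to, assuming each patchpoint yields a record with its intrinsic ID and the code offset it landed at. `PatchpointRecord` and `ApplyRelocations` are illustrative names, and the fixed 4-byte placeholder is an assumption for the example, not LLVM's actual stack map record layout.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Illustrative model: one entry per llvm.experimental.patchpoint call,
// recovered from the generated Stack Map section.
struct PatchpointRecord {
  uint64_t id;           // the ID passed to the patchpoint intrinsic
  uint32_t code_offset;  // where the reserved bytes ended up in the code
};

// Rewrite the 4-byte placeholder at each recorded offset with the real
// target, the way a relocation step would patch a call immediate once
// the final code layout is known.
void ApplyRelocations(std::vector<uint8_t>& code,
                      const std::vector<PatchpointRecord>& records,
                      uint32_t target) {
  for (const auto& r : records) {
    std::memcpy(code.data() + r.code_offset, &target, sizeof(target));
  }
}
```

The offsets cannot be known up front precisely because LLVM owns instruction selection and scheduling; the Stack Map is the channel that reports them back.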

Safepoints

The sets of live values are computed from the SSA representation (LLVM IR). We use the experimental gc.statepoint intrinsics and the accompanying utility passes.
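Conceptually, the statepoint rewriting produces, for each call site, the set of slots holding live heap pointers, which the GC consults while walking frames. The sketch below models that consumer side; `SafepointTable` and `LiveSlotsAt` are illustrative names, not V8 or LLVM APIs.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Illustrative model: code offset of a call's return address mapped to
// the stack slots that hold live heap pointers at that safepoint, as
// derived from gc.statepoint records.
using SafepointTable =
    std::map<uint32_t /*code offset*/, std::vector<int32_t> /*pointer slots*/>;

// The roots a frame contributes when the stack walker stops at a given
// return address. An unknown offset contributes nothing.
std::vector<int32_t> LiveSlotsAt(const SafepointTable& table,
                                 uint32_t return_offset) {
  auto it = table.find(return_offset);
  return it == table.end() ? std::vector<int32_t>{} : it->second;
}
```

Computing liveness on the SSA form means dead values are excluded automatically, so the GC scans only slots that can actually hold pointers.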

Inlining

Inlining itself is done by V8 on the Hydrogen graph (before it is passed to LLV8). The only difficulty is de-inlining when a deoptimization happens. It is supported, and there is nothing novel about it: it works the same way as in Crankshaft.

Application binary interface

For some reason V8 uses a lot of different calling conventions for calling Stubs. They are implemented in our fork of LLVM. See here.

Another complication is interaction with the GC. V8's GC expects the stack frames to have a certain layout as it wants to walk the call stack and collect pointers. We've done some work to conform:

  • RBP, RSI (the context) and RDI (the function) are always saved onto the stack
  • LLVM may preallocate stack slots and later move the callee's arguments into those slots, rather than pushing the exact number of arguments onto the stack right before the call. We store the necessary information (the number of parameters for each call) in safepoint tables and use it to avoid dereferencing garbage when the code was produced by LLV8.

OSR Entry

LLVM only supports one entry point per function. The FTL JIT solves this problem by compiling two slightly different copies of each function. We compile a function only once: an additional register is passed to the function indicating whether this is a normal call or an OSR entry.
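A toy C++ model of the single-entry scheme: the flag (carried in a register in LLV8) selects between the normal prologue and a branch that re-enters the loop with the live state captured at the OSR point. `Entry` and `SumTo` are illustrative, assumed names.

```cpp
#include <cassert>

// Illustrative model of a function with one physical entry point that
// serves both a normal call and an on-stack-replacement entry.
enum class Entry { kNormal, kOsr };

// Sum the integers in [0, n). On an OSR entry, loop state (i, acc)
// comes from the interpreted/baseline frame being replaced.
int SumTo(int n, Entry entry, int osr_i = 0, int osr_acc = 0) {
  int i = 0, acc = 0;
  if (entry == Entry::kOsr) {  // branch "into" the loop with live state
    i = osr_i;
    acc = osr_acc;
  }
  for (; i < n; ++i) acc += i;
  return acc;
}
```

The cost of this scheme is one extra flag check and argument on every call; the benefit is a single compiled copy instead of FTL's two.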