2.7 FEXPR vs. vtable

FEXPR vs. vtable: how LispE evaluates instructions

The LispE paradox: it is an interpreter of compiled objects.

Ah!!! The famous opposition between interpreter and compiler. On one side, you keep the original code, which buys you flexibility and malleability, with little room for machine-level optimization. On the other, compilation hands you an efficient binary, but one horribly local to the machine it was compiled for. At least, you can usually optimize in every direction, though the code is forever frozen in its form. Except that for a Lisp, that's a bit of a shame. Building an ultra-optimized Lisp but losing homoiconicity along the way is not exactly optimal.

But LispE is a walking paradox, because LispE refuses this choice: it elects to be an interpreter of compiled objects. I know, put that way it sounds very hard-boiled. And yet... What LispE proposes is to turn each instruction of the language into a derived class that exposes an eval method. Every class has its own override of eval. And since all these classes derive from a single mother class with an original and evocative name, Element, you can build a C++ vector that holds them all. And that is where the miracle happens: the AST is kept alive, but each node is an object that can be optimized to the hilt with the full C++ arsenal.

The claim of this text is that this miracle rests on a single equivalence, which the rest of the argument makes precise and defends:

datum:        f()           ⟺ F.eval()
instruction:  f(a₁,..,aₙ)   ⟺ F(a₁,..,aₙ).eval()

where F is an instance of a class derived from a universal root class, holding its arguments as sub-nodes, and eval is a method without arguments, uniform across the whole hierarchy. A datum is the zero-arity case, an instruction with no sub-node whose eval returns itself. We will see that the equivalence holds only if F is immutable, and that everything LispE does with its instructions follows from this.

But first, let us go back in time, back to a far-distant era when people asked real questions about Lisp, back to the debate over FEXPRs.

1. What is a FEXPR?

In the earliest Lisps (Lisp 1.5, MacLisp), an operator could be of two kinds:

an EXPR: an ordinary function, whose arguments are evaluated before the call (applicative order);
a FEXPR: a function that receives its arguments unevaluated, the raw form, and that decides itself, at runtime, what to evaluate, calling eval back as needed.

A FEXPR is therefore a runtime mechanism: it is the operator itself that drives the evaluation of its operands. This is very expressive, you can write if, and, quote, loops… as plain FEXPRs, but this expressiveness comes at a prohibitive cost: opacity to the compiler.

;; The idea of a FEXPR: receives (test then else) UNevaluated
(fexpr my-if (test then else)
  (if (eval test) (eval then) (eval else)))

Because it is the FEXPR that decides at runtime which arguments to evaluate, and how, the static analyzer can know nothing about the form: no inlining, no optimization, no early resolution. Every call becomes a black box. This is Kent Pitman's central argument (Special Forms in Lisp, 1980), and the reason why historical Lisps moved away from user FEXPRs to preserve compilability. LispE answers the same constraint, but by a different route, and the route is the equivalence above.

2. The equivalence f(a₁,..,aₙ) ⟺ F(a₁,..,aₙ).eval()

The ordinary way to read function application is functional: f(a₁,..,aₙ) denotes the value obtained by applying f to its arguments. The object-oriented reading says the same thing differently: build an object F holding those arguments, and send it the message eval. The two are not analogous, they are the same act under two notations. Applying a function to its arguments is constructing an object from them and asking it to evaluate itself.

This reverses the usual hierarchy. We tend to think of objects as a way to implement functions, reifying f into an F so it can be carried around. The equivalence says the opposite: f(a₁,..,aₙ) is the surface notation, and F(a₁,..,aₙ).eval() is the underlying form, an object that carries its own evaluation rule. The (op . args) of Lisp, the F(args) of C++, and the f(args) of the mathematician are three spellings of Element(args).eval().

A subtlety must be stated, because it closes the model. A datum is not an object of a different nature than (+ a b). It is an instruction whose evaluation rule is the identity: X.eval() returns X. In the notation above, it is the zero-arity case, f() ⟺ F.eval(), an instruction with no sub-node whose evaluation produces the instance itself. The data is the degenerate case of the instruction, the fixed point of eval. This is what lets a single root class Element cover everything: a list whose evaluation descends, and a value whose evaluation returns itself, are two regimes of one type. An instruction is a std::vector<Element*>, each cell an Element that knows how to evaluate itself, leaf or sub-list alike. No tag distinguishes "code" cells from "data" cells, because there is none to make. This is homoiconicity realized in C++: not "code looks like data," but "data is the minimal code, the one whose evaluation is a fixed point."

Why the equivalence requires F to be immutable

The equivalence is exact only under one condition: F must be immutable. The reason is referential transparency. f(a₁,..,aₙ) when evaluated twice yields the same result, and you can substitute the value for the expression anywhere. Hence for F(a₁,..,aₙ).eval(), eval must not modify F. If evaluation mutated the object or its sub-nodes, the second call would see a different state from the first, and the equality would break on the object side. An eval that mutates is no longer a function but a side-effecting procedure, and the whole construction collapses back into the imperative.

One distinction must be drawn here: this purity is a property of the interpreter, not of the LispE programs it runs. The methods that evaluate the tree are pure functions while the program they execute may well be impure. There is no contradiction: a pure evaluator can run an impure language.

So immutability is not an implementation refinement, it is the precondition for eval to be a function at all. And it is what makes the equivalence richer than f(a₁,..,aₙ) rather than merely equal to it. Because F is immutable, the instance persists: the node (* a b) exists as a stable object in the AST, and a loop that runs it a thousand times sends eval to the same instance a thousand times. One instance, a thousand evaluations. A naive interpreter that re-decides the form on every pass cannot do this; here the form is decided once and frozen in the type, so re-execution changes nothing. The thousand-and-first eval starts from the same state as the first. What changes from one turn to the next is not F but the execution context: F is the invariant, the stack is the variant, and eval is the pure function that reads one and advances the other. The instruction itself is stateless, it holds no mutable state of its own; all the state lives in the stack, outside the instruction.

Note that eval itself takes no argument other than the interpreter. The arguments a₁,..,aₙ are not passed to eval; they are recorded in the instance at composition, as sub-nodes. F(a₁,..,aₙ).eval() is the accurate form: F holds its sub-nodes, each an immutable Element, and eval, uniform across the hierarchy, evaluates each according to its own regime, fixed point for a constant, resolution on the stack for an atom, descent for a sub-instruction. This uniformity of eval is precisely what makes dispatch by C++ vtable possible: one virtual method, one signature, overridden per class.

3. How LispE implements its instructions

The substrate follows directly. What LispE evaluates is an abstract syntax tree whose every node is an Element. On input, each form (op …) is compiled into an instance of a subclass of List chosen according to op:

(cons a b)   ->  instance of List_cons_eval
(+ a b)      ->  instance of List_plusn
(defun f …)  ->  calls to f compiled into List_function_eval
(defpat p …) ->  calls to p compiled into List_pattern_eval

Each such instance is immutable and carries, in its very type, its evaluation rule. Evaluating the tree is then a matter of sending eval to each node, which dispatches through the host's virtual method table to the right overridden eval. There is no central evaluator that inspects the form and branches; the tree evaluates itself, each node carrying its own rule. This is exactly what a vtable is, a dispatch distributed in the objects rather than centralized in a loop, and it is the precise opposite of the FEXPR, which is the archetype of centralized runtime evaluation.

evall_block makes the descent concrete. A block is a list of forms evaluated in sequence; its value is that of the last one:

Element* List::evall_block(LispE* lisp) {
    long listsize = liste.size();

    if (listsize == 2)
        return liste[1]->eval(lisp);

    Element* element = null_;

    for (long i = 1; i < listsize && element->type != l_return; i++) {
        element->release();
        element = liste[i]->eval(lisp);
    }

    return element;
}

The loop is the traversal itself: each liste[i]->eval(lisp) dispatches on its node's own type, no inspection needed on the parent side. Because the instances are immutable, this very block can be evaluated again and again, inside a loop, without reconstruction. Each intermediate result is freed in passing by element->release() as soon as the next supersedes it; the instructions themselves, being immutable, are never touched.

The argument that closes section 1 is now visible. The reproach made to the FEXPR was its opacity to the compiler: because it decides at runtime what it evaluates, no static optimization is possible. LispE takes the inverse path: because it decides at composition, freezing each instruction into a native C++ class, its code becomes ordinary code, exposed to the compiler like any other C++. Where the FEXPR makes optimization limited, the compilation of instructions allows it, for the same reason inverted, the moment the decision is made. The "interpreter of compiled objects" is exactly this: the instance is the immutable functional object F, and its class carries native C++ code, so that the same artifact is at once a pure functional term and an object optimized down to the silicon.

The same statelessness that lets a loop reuse one instance lets several threads share it. Since instructions hold no mutable state, the AST can be evaluated concurrently with no copying and no locks on the tree: give each thread its own stack and the immutable instructions are read-only, shared safely by construction. What would otherwise be the hard part of a parallel interpreter, protecting shared evaluation state, simply does not arise, because there is no shared mutable state to protect. Concurrency falls out of the same decision as purity and re-execution: the instruction is the invariant, the stack is the per-thread variant.

4. eval is only one facet

F(a₁,..,aₙ).eval() is one facet of the instance, not its definition. The same immutable object answers other messages. asString() produces its textual form; unify() matches it against another term as a pattern. Nothing forces these to be a single method, and nothing forces the instance to exist only to be evaluated. One C++ object, several behaviors, each selected by the method you send, all reading the same immutable data.

This is why the same Element hierarchy serves evaluation, serialization, and unification without three separate representations. A number knows how to evaluate itself (return itself), how to print itself, and how to unify with another term (match if equal). A List_cons_eval knows how to evaluate (build a pair), how to print ((cons a b)), and how to unify (match head against head, tail against tail). The instruction is not a procedure that evaluation runs; it is an object that happens to support evaluation among other things, the way it supports printing. eval is the facet this article is about, but it sits beside the others on equal footing, and they all rest on the same fact: the instance is immutable, so every facet reads it without fear of another having changed it.

This is, in effect, the same idea as Haskell's type classes. The root Element declares a set of methods that every derived class is expected to override:

u_ustring asUString(LispE* lisp);
Element* equal(LispE* lisp, Element* e);
Element* less(LispE* lisp, Element* e);
Element* lessorequal(LispE* lisp, Element* e);
Element* more(LispE* lisp, Element* e);
Element* moreorequal(LispE* lisp, Element* e);

(and there are others). asUString is Show, the way a value renders to text. equal is Eq. The four ordering methods are Ord. Where Haskell splits these contracts into separate classes that a type implements one by one, with instance Show, instance Eq, instance Ord, LispE folds them into the mother class and requires every derived class to fill them in. Same principle, a type supplying one behavior per operation, with a different grouping: type classes resolve the instance from the type, LispE resolves it through the vtable. And as before, the facets coexist only because the object underneath never changes.

//Comparing two elements
Element* List_eq_eval::eval(LispE* lisp) {
    Element* first_element = liste[1]->eval(lisp);
    Element* second_element = liste[2]->eval(lisp);

    //Same pointer 
    //or comparison according to first_element underlying implementation
    bool test = ( (first_element == second_element) || first_element->egal(second_element));

    second_element->release();        
    first_element->release();
    return booleans_[test];
}

5. The dynamic case: the explicit table evals[]

There remains the code built on the fly:

(eval (list 'cons 'a 'b))

Here, there is no List_cons_eval object: the list is built at runtime, its type cannot be known at compilation, and C++'s vtable, which assumes a static type, can therefore dispatch nothing. It is then List::eval that takes over, via an indexing table evals[] that LispE manages itself, indexed by the label of the instruction:

Element* List::eval(LispE* lisp) {
    Element* e = (this->*lisp->delegation->evals[lisp->checkState(this, liste.size())])(lisp);
    ...
}

checkState returns the first label of the instruction as a 16-bit integer; it also takes the opportunity to check the interpreter's state.

evals[t_atom]     = &List::eval_call_function;
evals[t_function] = &List::evalt_function;
evals[t_pattern]  = &List::evalt_pattern;
evals[t_lambda]   = &List::evalt_lambda;
...

The decisive point is that this is not a second, centralized mechanism opposed to the first. The vtable and the evals[] table do the same work, indexing a type to a method, under two different regimes. When the type is known at compilation, C++ provides its dispatch vtable implicitly. When the type is resolved only at runtime, this implicit mechanism no longer applies, and LispE provides the same table explicitly, evals[], indexed by hand. Dispatch remains distributed by type in both cases; only who holds the table changes. Actually, each instruction definition feeds the two mechanisms at the same time, which ensures that the evals[] table will be in sync with the class implementations.

And the explicit table converges toward the vtable. On this path, eval_list_instruction borrows, in the buffer-sharing sense without copy, a List_cons_eval object to evaluate the form:

Element* List::eval_list_instruction(LispE* lisp) {
    List* l = lisp->borrowing((Listincode*)this, size());
    Element* e;
    try { 
        e = l->eval(lisp); 
    }
    catch(Error* err) { 
        lisp->relax(l); 
        return lisp->check_error(this, err, -1); 
    }
    lisp->relax(l);
    return e;
}

The explicit table is therefore only a preamble that reconstructs, for the dynamic case, the specialized object that composition could not create, after which we rejoin the normal vtable regime. Functionally, this path receives the raw form and drives its evaluation: it is the behavior of a FEXPR, but confined to the dynamic case alone, not exposed as an extension mechanism, and which does not oppose the distributed dispatch but serves it, by bringing the on-the-fly code back into it. The table is a complete net rather than a catch-all: evals[] is initialized everywhere to eval_error, and only the valid entries are filled in, so a list head of a non-evaluable type falls back on a clean error, never on undefined behavior.

6. Conclusion

LispE rests on the equivalence f(a₁,..,aₙ) ⟺ F(a₁,..,aₙ).eval(), valid because F is immutable. From that single identity everything follows: each instruction is an immutable instance of a class derived from Element, carrying its own eval; the AST is kept alive as a tree of such instances and evaluates itself by virtual dispatch; an instance can be run a thousand times in a loop because nothing in it changes, only the surrounding context advancing; and where the type is not known until runtime, the same type-indexed dispatch is supplied explicitly by the evals[] table, which reconstructs the specialized object and rejoins the vtable. The FEXPR decided late and stayed opaque; LispE decides at composition and stays transparent to the compiler. That is why it can be, without paradox, an interpreter of compiled objects.

2.7 FEXPR vs. vtable

FEXPR vs. vtable: how LispE evaluates instructions

1. What is a FEXPR?

2. The equivalence f(a₁,..,aₙ) ⟺ F(a₁,..,aₙ).eval()

Why the equivalence requires F to be immutable

3. How LispE implements its instructions

4. eval is only one facet

5. The dynamic case: the explicit table evals[]

6. Conclusion

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!