The idea of specializing interpreters on programs is known for a long time. It's known to be described for the first time by Futamura in his seminal 1971 paper, Partial Evaluation of Computation Process -- An approach to a Compiler-Compiler, but also realized by Turchin(inventor of supercompilation) independently1.
In this short project I'm going to try two approaches of specializing interpreters on programs. I'm going to perform interpreter specialization using two different methods: Partial evaluation and multi-stage programming. The object language is Unlambda: A variant of SKI combinator calculus that has some extensions like "read a character", "write a character", "compare read character with X" and call/cc functions.
For the meta language with partial evaluator, I'm going to use Idris2. For the language with multi-staging constructs, I'm going to use MetaOCaml.
These two methods, partial evaluation and multi-stage programming, represent two ends of the program specialization tools spectrum: Partial evaluation is completely automated, it's not possible to give hints to or guide a partial evaluator3. In the other end of the spectrum is multi-stage programming, which is completely manual. A programmer must annotate program terms with constructs like MetaOCaml's "bracket", "escape"(or "splice"), and "run" to be able to generate code and run it4.
Simplest form of interpreter specialization is parsing the object program and generating meta language representations in specialization time. More advanced specializations involve doing reductions, evaluating open terms, evaluating branches of conditionals with dynamic conditions etc. similar to what's done in supercompilation. Main issue is to not specialize too much or too aggressively. We don't want our specializer to loop forever when we try to specialize a looping program. In case of evaluating open terms we can have looping specializer even if the program is guaranteed to terminate in runtime.
So we have to be conservative. Instead of finding and implementing an heuristic that terminates for all programs, in current implementation we have some tuning parameters for the users to specify how much to specialize. One can start with a very powerful specialization parameters, and disable some specializations if his/her program causes specializer to loop.
After having working specializers, I'll compare them for these:
-
Once we had an interpreter, how hard was it to convert it to a specializer? Note that by itself this question doesn't mean a lot: It may be equally simple to get some kind of specializer from an interpreter in both languages with very minimal changes, but those specializers will probably do very different specializations and will have different properties. Answer to this question may still give some ideas.
-
Termination checking is undecidable, which means we can't know if our specializers will loop forever in general and we should implement conservative specializers. Were we able to implement specializers with different powers? For example, we should be able to write specializers that
- Specializes very aggressively, but loops if input program loops.
- Specializes a lot, doesn't loop if program loops depending on a dynamic input. (e.g. we don't specialize branches of conditionals if the condition is dynamic, that is, not known in specialization time)
- Fairly conservative, but specializes some specified cases. (will be described later)
- Only parses the program in specialization time. This is most basic form of compilation. Always terminates.
Couple of notes before describing the implementation and results: What's meant by "specialization" is sometimes becoming confusing. Let's define it as eliminating interpretation costs. A more powerful specializer means eliminating more interpretation costs. Most of the time specialization itself is an optimization, but there are also optimizations that can be done in specialization time. In this project I try to have an optimizing specializer that does some transformations too.
Note that using a multi-stage programming it's sometimes possible to compile
object language programs to meta language programs, eliminating all the
interpretation costs. This is demonstrated in UnlambdaCompiler.ml
(OCaml
functions corresponding to Unlambda builtins are defined in UnlambdaFuns.ml
).
This is most powerful specialization possible, but since it's not an interpreter
specialization, it's outside of the scope of this paper.
Once one starts modifying interpreters for specialization it's not clear when to
stop to still have "specialized interpreter" and not a "compiler evaluated by
partial evaluator". If you change it too much, you end up implementing a
compiler, as I did in UnlambdaCompiler.ml
. (you can try it using -compile
parameter) In a multi-stage language, both versions similarly easy to use. In a
partial evaluator however, the story is little different, because you can't run
this "compiler" in runtime, so you need some kind of interpreter at hand at all
times.
At the time I did this writing, Idris was pretty unstable, and some of the bugs
actually effected the partial evaluator(see
#2234 as an example).
Also, printing partially evaluated programs in Idris is not easy. It doesn't
have any functions for that, so we need to load modules in REPL and inspect them
using :printdef
. outputs
directory in source repository keeps all the
outputs mentioned in the text.(I simplified them a lot and added some comments
to make them easier to read)
We start by implementing interpreters in both languages. OCaml implementations
is given in UnlambdaInterp.ml
, Idris implementation is eval
in
Unlambda.idr
.
This step is important for two reasons: 1) We make sure we get the semantics right 2) Because of the reasons described above, we need to be conservative in specializers. When we stop specializing, we need to call interpreter to continue evaluation. This is implementing using interpreters, e.g. we fall back to interpretation.
To make implementation of call/cc easier, we implement interpreters in continuation passing style, and save/restore interpreter continuations(which are represented as functions) when we see call/cc or continuation application.
Our interpreters have one state that persists even across continuation calls:
current character. @
function in Unlambda is used to read a character. That
character should then be saved. |
function is used to compare "current
character" with a given character, and branch accordingly. Thus we have an extra
argument for "current character" in our interpreters and specializers.
In the specializers we won't be only manipulating terms, we'll also be manipulating continuations. And when we want to fall back to the interpreter, we'll need to pass a continuation to it. This means we need to convert continuations of manipulated in specializers to continuations for interpreters.
This is tricky to do if we represent continuations as functions in our meta level operations. As a solution, we implement same interpreters using ADT implementation of continuations.
Aside: I remember having this problem before. Implementing call/cc with a CPS interpreter is easy enough, but sometimes you need something lower level, maybe you're writing a VM or compiling to C and sometimes it's not easy to see how many different continuations you need and how to compile/interpret them. In this project I came up with this solution: I already had CPS interpreter. I looked at the code, and looked at where did I create new continuations. There were 3 places where I created a new continuation, and no two of those were same continuations. Each continuation was capturing a value from the scope. So I ended up creating 3 continuation values like this:
data Continuation
= DelayGuard ExpS
| ApplyTo ExpS
| ApplyDelayed ExpS
Then in the interpreter each one of those handled with a code that looks almost identical to the continuation functions in CPS interpreter.
Now we can pass continuations from specializers to interpreters without any
issues. The interpreter and specializer will interpret continuations
differently. MetaOCaml implementation is in UnlambdaCont.ml
and Idris
implementation is eval_cont
in Unlambda.idr
.
Implementation of staged interpreter is given in UnlambdaStaged.ml
. It's based
on UnlambdaCont.ml
interpreter. Only differences are:
- We add brackets(
.< ... >.
) and escapes(.~
) for staging. - We add checks for optimizations parameters.
As an example, here's the code that evaluates an S
application in our
interpreter:
match e1 with
...
| S2_S (x, y) ->
eval (Backtick_S (Backtick_S (x, e2), Backtick_S (y, e2))) cc cont
...
And here's staged version:
match e1 with
...
| S2_S (x, y) ->
if opts.eval_S then
eval (Backtick_S (Backtick_S (x, e2), Backtick_S (y, e2))) cc cont
else
.< UnlambdaCont.eval (Backtick_S
(Backtick_S ( .~ (lift_exp_s x), .~ (lift_exp_s e2)),
Backtick_S ( .~ (lift_exp_s y), .~ (lift_exp_s e2))))
cc .~ (lift_conts cont) >.
...
If -eval-S
is given, then we continue specializing the term using staged
interpreter. Otherwise we generate a code that falls back to interpreter. This
case is exactly the same with our interpreter's handler for S
, except we add
brackets and lift
calls to be able to lift current terms to code values of
next stage.
We implement other specializations similarly. If we remove the tuning parameters, our staged interpreter exactly the same with non-staged version in most cases, we just wrap things with brackets, escapes and lifts5.
Some example executions:
-
Hello.unl
is a "hello world" program that prints "Hello world" a couple of times. If we run staged interpreter with-staged -eval-S
, and print generated code, our MetaOCaml implementations generates this code:let _ = Pervasives.print_char 'H' in let _ = Pervasives.print_char 'e' in let _ = Pervasives.print_char 'l' in let _ = Pervasives.print_char 'l' in let _ = Pervasives.print_char 'o' in let _ = Pervasives.print_char ' ' in let _ = Pervasives.print_char 'w' in let _ = Pervasives.print_char 'o' in let _ = Pervasives.print_char 'r' in let _ = Pervasives.print_char 'l' in let _ = Pervasives.print_char 'd' in let _ = Pervasives.print_char '!' in let _ = Pervasives.print_char '\n' in ... repeats a couple of times ... Syntax.I_S
With this specialization parameters, what we had in effect is that we compiled the Unlambda program to OCaml, we eliminated all the interpretation costs without actually writing a compiler. Another interesting thing is that we unrolled the loop, which we do all the time with
-eval-S
. (this is why it leads to loops in most programs) -
Trivial.unl
is a program that prints "Unlambda, c'est trivial!" forever. If we run it using-staged -eval-S
, it loops because the program is looping usingS
function. If we use-staged -eval-cc
, it generates the code listed inoutputs/trivial_metaocaml
. This code is not very different than-parse-only
generated code, because without-eval-S
, we just fall back to interpreter without trying to specialize arguments ofS
. I think this improvement can be implemented without too much trouble, but current version doesn't do this. -
eof-test.unl
is a program that tries to read a character. It prints 'T' when it succeeds and 'F' otherwise(e.g. EOF happens). With-staged -eval-S
it specializes the code a lot, but leaves the code with a conditional with non-specialized branches. We can use-eval-eof
parameter to specialize one branch of the conditional with the assumption that condition is true. Here's the generated code without-eval-eof
:UnlambdaCont.apply Syntax.I_S Syntax.V_S None [Syntax.ApplyTo (Syntax.S2_S ((Syntax.K1_S Syntax.C_S), (Syntax.S2_S ((Syntax.K1_S (Syntax.S1_S (Syntax.K1_S (Syntax.K1_S Syntax.K_S)))), (Syntax.S2_S (Syntax.S_S, (Syntax.K1_S (Syntax.K1_S (Syntax.K1_S Syntax.I_S))))))))); Syntax.DelayGuard (Syntax.Print_S 'F'); Syntax.DelayGuard (Syntax.Print_S 'T'); Syntax.DelayGuard Syntax.I_S; Syntax.ApplyTo (Syntax.Print_S '\n')]
It falls back to the interpreter. Here's the code generated with
-eval-eof
:let _ = Pervasives.print_char 'F' in let _ = Pervasives.print_char '\n' in Syntax.I_S
It evaluated the term in specialization time and generated OCaml code that prints 'F' and returns the value of the term.
To print generated code and run it -run
argument can be used. -parse-only
only parses the code in code-generation time. Generated program is an
interpreter and it's specialized for the given program, but no optimizations are
done.
Specializing Idris implementation of the interpreter is just a matter of adding
some [static]
annotations in interpreter functions. But adding parameters for
tuning specializations is a lot more tricky.
One might think that implementing a simple compilation like MetaOCaml
implementation's -parse-only
is easily possible by just adding [static]
to
parsing functions and not adding it to evaluator. However, for some reason,
doing that doesn't prevent Idris from partially evaluating the interpreter. For
example, in Unlambda.idr
, peHello
is compiled to a sequence of putchar
calls by the partial evaluator, even though we don't have [static]
in
eval
function. Similarly, peLoop
loops. I assume that this should have
worked and it's a bug in Idris.
Tuning the specializer is a lot tricky when compared with MetaOCaml
implementation: we just had to add a couple of conditionals and brackets. We
need to add some optimization functions that does the optimizations we want.
These functions should have type Exp -> (Exp, [Continuation])
and they have to
be terminating on all inputs. We then need to make sure those functions are
applied completely in compile time by the partial evaluator. This is the
approach used by Sperber and Thiemann in '96 paper(see 3).
For this purpose we implement eval_static
, which unlike the name doesn't
actually evaluate, it's an optimizer. It does the transformations depending on
the optimization parameters.
This approach of applying optimizers statically using partial evaluation may
look OK at first sight, but it's actually a lot less flexible. As an example,
see implementation of -eval-eof
optimization in apply_static
in
Unlambda.idr
. We can evaluate the branch statically, but incorporating the
result is tricky. We need to generate a term of object language that reads a
character, takes the optimized branch if it's failed, and takes the other branch
otherwise. Thankfully, in this case this is possible, because object language is
expressive enough for this. The problem is, evaluating that term takes more than
20 steps. If we didn't do a lot of optimizations in EOF branch, then we may end
up de-optimizing the program by replacing a small term with a big one that takes
a lot more to reduce. (see also comments in Idris file)
What we did in MetaOCaml is that we generated OCaml code that does this, instead of generating object level terms. Here is the generated code:
let c_1 = try Some (Pervasives.input_char Pervasives.stdin) with | _ -> None in
match c_1 with
| None ->
let _ = Pervasives.print_char 'F' in
let _ = Pervasives.print_char '\n' in Syntax.I_S
| Some _ ->
UnlambdaCont.apply Syntax.I_S Syntax.I_S c_1
[Syntax.ApplyTo
(Syntax.S2_S
((Syntax.K1_S Syntax.C_S),
(Syntax.S2_S
((Syntax.K1_S
(Syntax.S1_S (Syntax.K1_S (Syntax.K1_S Syntax.K_S)))),
(Syntax.S2_S
(Syntax.S_S,
(Syntax.K1_S (Syntax.K1_S (Syntax.K1_S Syntax.I_S)))))))));
Syntax.DelayGuard (Syntax.Print_S 'F');
Syntax.DelayGuard (Syntax.Print_S 'T');
Syntax.DelayGuard Syntax.I_S;
Syntax.ApplyTo (Syntax.Print_S '\n')]
But in partial evaluation our optimizers have to return object language terms. We're not generating Idris code that represents runs optimized procedures and returns same value that our object language term returns, like we did in MetaOCaml.
It should be possible to do exactly the same thing using partial evaluation, but it's not clear to me how to do this. In the worst case we can cheat and add a new term to the object language that does whatever we want, and then generate that syntax in optimizers.
Idris implementation has optimized and non-optimized versions of programs we described in MetaOCaml section. However, it optimizes less because of the limitation mentioned above, and generated code is very hard to read and understand, because unlike MetaOCaml, Idris is printing internal representation of Idris AST instead of programs written in concrete Idris syntax.
MetaOCaml implementation was almost trivial. After the initial interpreter I followed these steps:
- I implemented a multi-stage version by only adding some bracket/escapes to original interpreter.
- It worked fine, but interpreter was looping most of the time. I decided to stop staged interpreter and generate code that calls normal interpreter in some places to avoid looping.
- This wasn't possible because two interpreters were using different representations of continuations. Defined an alternative representation of continuations to be able to evaluate them in both interpreters.
- Added a couple of conditionals to tune optimizations.
Staging constructs leave no room for confusion -- it's trivial to see when we're falling back to the interpreter, when we're specializing sub-terms and combining results etc.
In the partial evaluation side, I had two major problems. First one is related with the idea itself. Partial evaluation is completely automated. It's not easy to see what will the generated code be, or when a partial evaluator loops.
For example, peHello
loops, even though I'd expect it to just parse in
compilation time and doesn't do any further processing.
To make sure partial evaluation is doing the work I'm expecting it to do, I need to print the code and make sure.
In the end, it seems to me like partial evaluation is too complicated to completely automate.
Second problem is that even though staged interpreters(as described in 3) help with getting optimizations done in partial evaluation, I think it's still more limited than multi-stage programming approach, as already demonstrated in the case of optimized branches. Even if it's possible to optimize this case like in MetaOCaml implementation, it's not clear how to do so, whereas in MetaOCaml it was trivial.
Footnotes
-
Mentioned in Intorduction to Supercompilation, Sørensen and Glück, without referencing Turchin's papers. It's not clear when Turchin realized the idea and whether he published it or not. EDIT: According to A Roadmap to Metacomputation by Supercompilation by Robert Glück and Morten Heine Sørensen, this is described in Turchin's book "Basic Refal and its implementation on computer". The name is a translation; it's published in Russian. The book is published in 1977. (see the paper for original name of the book) ↩
-
But it's possible to guide a partial evaluator for some program transformations, using staged interpreters. We'll explore this in this experiment and it's also described in Realistic Compilation by Partial Evaluation, Sperber and Thiemann. ↩ ↩2 ↩3
-
There are some alternatives to this approach. One example is "lightweight modular staging", described in Lightweight Modular Staging, A Pragmatic Approach to Runtime Code Generation and Compiled DSLs, Rompf and Odersky, which allows staging by changing type annotations. TODO: Improve this part. IIRC, LMS also needs some kind of overloading to be able to represent some terms as code. ↩
-
One thing to note here is that explicit lifting should not be necessary, and it's adding a lot of noise to the code. But in practice MetaOCaml is failing to print lifted(CSP) values. The whole story is long and a bit complicated, so let me just refer interested readers to the related Caml-list discussion: Starts here, and continues here. ↩