<br><br><br><br><br>

# Compiling and transpiling to another language

<br><br><br><br><br>

<br><br><br>

In the previous tutorial, we **interpreted** our little language, rather than **compiling** it.

The interpreter simply did in Python what was written in the new language—we filled the symbol table with functions like:

```python
builtins["+"] = lambda x, y: x + y
```

Have we accomplished anything?

<br><br><br>

<br><br>

What we'll see in this tutorial is that a compiler isn't much more substantial than a translator, either. Ultimately, it just replaces each `+` AST node with the machine instruction for `+`.

<br>

**Programming languages do not perform actions. They only express a user's intention in another layer of abstraction.**

<br>

In our interpreter, there were 6 layers of abstraction:

<center>transistor gates → machine code → C compiler → Python implementation (C code) → our interpreter → our little language</center>

<br><br>

**Interpreters vs compilers:**

   * An **interpreter** walks over the AST (or even parses the source code!) at runtime.
   * A **compiler** serializes the AST into a state machine or a sequence of instructions, virtual or physical.
   * A **transpiler** serializes the AST into code in another human-readable language. (Subjective: what's human-readable?)
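
The difference between the first two bullets can be sketched in a few lines. This is only an illustration under stated assumptions: the tuple-based AST, the `PUSH`/`LOAD`/`OP` instruction names, and the helper functions are hypothetical and are not part of the language built in this tutorial.

```python
import operator

# Hypothetical tuple-based AST: ("+", left, right), ("*", left, right), a name, or a number.
OPS = {"+": operator.add, "*": operator.mul}

def interpret(node, env):
    """Interpreter: walks the tree every time the expression is evaluated."""
    if isinstance(node, tuple):
        op, left, right = node
        return OPS[op](interpret(left, env), interpret(right, env))
    if isinstance(node, str):
        return env[node]          # variable lookup
    return node                   # literal number

def compile_to_stack(node, out=None):
    """Compiler: serializes the tree once into a flat list of instructions."""
    out = [] if out is None else out
    if isinstance(node, tuple):
        op, left, right = node
        compile_to_stack(left, out)
        compile_to_stack(right, out)
        out.append(("OP", op))
    elif isinstance(node, str):
        out.append(("LOAD", node))
    else:
        out.append(("PUSH", node))
    return out

def run(instructions, env):
    """A tiny virtual machine: a loop over instructions, with no tree in sight."""
    stack = []
    for kind, arg in instructions:
        if kind == "PUSH":
            stack.append(arg)
        elif kind == "LOAD":
            stack.append(env[arg])
        else:
            right, left = stack.pop(), stack.pop()
            stack.append(OPS[arg](left, right))
    return stack.pop()

tree = ("+", ("*", "x", "x"), ("*", "y", "y"))    # x*x + y*y
program = compile_to_stack(tree)                  # done once
print(program)
print(interpret(tree, {"x": 3, "y": 4}), run(program, {"x": 3, "y": 4}))
```

Both paths compute the same value; the compiled one pays for the tree walk once, however many times `run` is called.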

**Compilation targets:**

   * A **finite state machine** is a graph of executable steps _only_ (not a full interpreter). Regular expressions are often compiled to finite state machines. A non-recurrent neural network is also a finite state machine.
   * A **push-down machine** is a state machine with a stack of memory—parsers are almost always push-down machines.
   * A **virtual machine** is a complete processor+memory driven by a sequence of instructions, like a physical computer, but implemented in software. Python and Java are not pure interpreters: they _compile_ their source code to bytecode, which runs on a _virtual machine._
   * A **von Neumann machine** is a physical computer driven by a sequence of instructions.
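
The claim about Python can be checked from Python itself. The sketch below uses only the built-in `compile` and the standard `dis` module; the file name `"<example>"` is an arbitrary label, and the exact opcode names in the listing depend on the Python version.

```python
import dis

# Source code -> code object holding bytecode for CPython's virtual machine.
code = compile("x*x + y*y", "<example>", "eval")

print(type(code.co_code))               # the raw instruction bytes
dis.dis(code)                           # readable listing; opcode names vary by version

# The virtual machine executes the already-compiled instructions.
print(eval(code, {"x": 3, "y": 4}))
```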

**Other:**

   * **FPGA/ASIC:** physical computer consisting of raw gates, not instructions; Verilog isn't _compiled_ like C, it's _synthesized_.

<br><br><br>

The general flow of a compiler is

<center style="margin-top: 20px; margin-bottom: 20px"><b>linear (source code) → tree-like data structure (AST) → linear (instructions or other source code)</b></center>

In this notebook, we'll write a transpiler, converting our little language into C++.
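
Python's own standard library exposes the same linear → tree → linear flow, which makes for a quick illustration before we bring in ROOT and pycparser. This sketch assumes Python 3.9 or later, where `ast.unparse` is available; it round-trips Python to Python rather than translating between languages.

```python
import ast

source = "x*x + y*y"                     # linear: source code
tree = ast.parse(source)                 # tree-like: AST (a Module node)
print(ast.dump(tree.body[0].value))      # the expression's tree structure

regenerated = ast.unparse(tree)          # linear again: source code
print(regenerated)
```

The regenerated text is not character-for-character identical to the input (whitespace is normalized), but it parses to the same tree.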

<br><br><br>

In [1]:
# First of all, did you know that you can do this?
import ROOT

ROOT.gInterpreter.Declare("""
double new_function(double x, double y) {
    return sqrt(x*x + y*y);
}""")

ROOT.new_function(3, 4)

Welcome to JupyROOT 6.17/01


5.0

In [2]:
# And what about this?
import pycparser.c_parser, pycparser.c_generator    # pure Python C99 parser and code generator (not a compiler)

c_parser = pycparser.c_parser.CParser()
ast = c_parser.parse("double f(double x) { return x*x; }")
ast.show()

FileAST: 
  FuncDef: 
    Decl: f, [], [], []
      FuncDecl: 
        ParamList: 
          Decl: x, [], [], []
            TypeDecl: x, []
              IdentifierType: ['double']
        TypeDecl: f, []
          IdentifierType: ['double']
    Compound: 
      Return: 
        BinaryOp: *
          ID: x
          ID: x


In [3]:
c_generator = pycparser.c_generator.CGenerator()

ast = c_parser.parse("double f(double x) { return x*x; }")

print(c_generator.visit(ast))

double f(double x)
{
  return x * x;
}




We can compile and run C++ code in ROOT (Cling) and we have a general C99 AST in a Python library (pycparser). The compilation chain could look like this:

<center style="margin-top: 20px; margin-bottom: 20px"><b>our source language → our AST → C99 AST → C++ source code → compile and run in ROOT</b></center>

Why not output C++ strings directly from our AST?

It's hard to compose source code strings properly, and it's hard to debug them. ([CoffeeScript famously skipped this step.](https://www.kickstarter.com/projects/michaelficarra/make-a-better-coffeescript-compiler) :)

<img src="img/coffeescript-rise-and-fall.png" width="90%">
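
One concrete way string composition goes wrong is operator precedence. The sketch below uses Python's standard `ast` module as a stand-in for the C99 AST (assuming Python 3.9+ for `ast.unparse`); the helper names `mul_strings`, `mul_asts`, and `expr` are hypothetical.

```python
import ast

def mul_strings(left, right):
    # Naive string composition: silently changes meaning if an operand binds more loosely than "*".
    return left + " * " + right

def mul_asts(left, right):
    # AST composition: the tree records which operands belong together.
    return ast.BinOp(left=left, op=ast.Mult(), right=right)

def expr(code):
    # Small strings are still a convenient way to CREATE AST fragments.
    return ast.parse(code, mode="eval").body

env = {"a": 1, "b": 2, "c": 3}

from_strings = mul_strings("a + b", "c")
from_asts = ast.unparse(mul_asts(expr("a + b"), expr("c")))

print(from_strings, "=", eval(from_strings, env))
print(from_asts, "=", eval(from_asts, env))
```

The string version multiplies only `b` by `c`; the AST version inserts the parentheses that the structure requires when it is turned back into text.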

In [4]:
import lark        # a grammar for Dirac's bra-ket notation!
grammar = """
start:  term
term:   factor [term]
factor: bra "|" ket | bra "|" operators "|" ket
bra:    "<" value
ket:    value ">"

operators: operator [operators]
operator: spinor | operator "*" -> conjugate
spinor:   "σ₁" -> s1 | "sigma1" -> s1 | "σ₂" -> s2 | "sigma2" -> s2 | "σ₃" -> s3 | "sigma3" -> s3

value:   CNAME | complex complex
complex: NUMBER -> real | NUMBER "i" -> imag | "i" -> imag1
       | NUMBER "+" NUMBER "i" -> complex | NUMBER "+" "i" -> complex1

%import common.CNAME
%import common.NUMBER
%import common.WS
%ignore WS
"""
parser = lark.Lark(grammar)

In [5]:
# the cute thing about bra-ket notation is that it turns type errors into syntax errors
print(parser.parse("<0 1| σ₁* σ₂ |0 1>").pretty())

start
  term
    factor
      bra
        value
          real	0
          real	1
      operators
        conjugate
          operator
            s1
        operators
          operator
            s2
      ket
        value
          real	0
          real	1



In [6]:
# I said that it's unwise to turn our AST directly into strings of the output language.
# That's because strings are hard to COMPOSE. Don't avoid using strings to create or understand ASTs.

from pycparser.c_ast import *

def c_ast(c_code):
    return c_parser.parse("void f() {" + c_code + ";}").ext[0].body.block_items[0]

ast = c_ast("x * y")
ast.right.name = "z"
ast

BinaryOp(op='*',
         left=ID(name='x'
                 ),
         right=ID(name='z'
                  )
         )

In [7]:
def mul(x, y): return BinaryOp("*", x, y)                             # helper functions
def call(f, args): return FuncCall(f, ExprList(args))
def transpileall(args, names): return [transpile(x, names) for x in args]

def transpile(ast, names):
    "Converts a bra-ket parse tree into a C99 AST (skipping the parse tree → AST step, 'toast')."
    
    if (ast.data == "term" or ast.data == "operators") and len(ast.children) == 2:
        return mul(*transpileall(ast.children, names))

    elif ast.data == "factor":                                        # <bra|op|ket> or <bra|ket>
        scalar = mul(*transpileall(ast.children[:2], names))
        if len(ast.children) > 2:
            scalar = mul(scalar, transpile(ast.children[2], names))
        return call(scalar, [Constant("int", "0")] * 2)               # get scalar value (0, 0)

    elif ast.data == "bra" or ast.data == "ket":
        return call(ID(str(ast.data)), transpile(ast.children[0], names))

    elif ast.data == "conjugate":                                     # conjugate function
        return call(ID(str(ast.data)), [transpile(ast.children[0], names)])

    elif ast.data == "s1" or ast.data == "s2" or ast.data == "s3":    # Pauli spin matrix
        return ID(str(ast.data))

    elif ast.data == "value" and len(ast.children) == 1:              # one named value
        n = str(ast.children[0])
        if n not in names:                                            # declare each name only once
            names.append(n)
        return [call(ID("C"), [ID("r1_" + n), ID("i1_" + n)]), call(ID("C"), [ID("r2_" + n), ID("i2_" + n)])]
    elif ast.data == "value" and len(ast.children) == 2:              # two complex numbers
        return transpileall(ast.children, names)

    elif ast.data == "real":                                          # real number
        return call(ID("C"), [Constant("double", str(ast.children[0])), Constant("double", "0")])
    elif ast.data == "imag":                                          # pure imaginary number
        return call(ID("C"), [Constant("double", "0"), Constant("double", str(ast.children[0]))])
    elif ast.data == "imag1":                                         # pure imaginary number i
        return call(ID("C"), [Constant("double", "0"), Constant("double", "1")])
    elif ast.data == "complex":                                       # complex number
        return call(ID("C"), [Constant("double", str(ast.children[0])), Constant("double", str(ast.children[1]))])
    elif ast.data == "complex1":                                      # complex number with single i
        return call(ID("C"), [Constant("double", str(ast.children[0])), Constant("double", "1")])

    else:
        return transpile(ast.children[0], names)                      # pass-through structure

In [8]:
names = []
ast = transpile(parser.parse("<0 1| σ₁* σ₂ |0 1>"), names)

ast.show()
# print(c_generator.visit(ast))
names

FuncCall: 
  BinaryOp: *
    BinaryOp: *
      FuncCall: 
        ID: bra
        ExprList: 
          FuncCall: 
            ID: C
            ExprList: 
              Constant: double, 0
              Constant: double, 0
          FuncCall: 
            ID: C
            ExprList: 
              Constant: double, 1
              Constant: double, 0
      BinaryOp: *
        FuncCall: 
          ID: conjugate
          ExprList: 
            ID: s1
        ID: s2
    FuncCall: 
      ID: ket
      ExprList: 
        FuncCall: 
          ID: C
          ExprList: 
            Constant: double, 0
            Constant: double, 0
        FuncCall: 
          ID: C
          ExprList: 
            Constant: double, 1
            Constant: double, 0
  ExprList: 
    Constant: int, 0
    Constant: int, 0


[]

In [9]:
# Define some helper functions in the output language instead of complicating the generated output.

ROOT.gInterpreter.Declare("""
complex<double> C(double real, double imag) {
    return complex<double>(real, imag);
}
ROOT::Math::SMatrix<complex<double>, 1, 2> bra(complex<double> up, complex<double> down) {
    return ROOT::Math::SMatrix<complex<double>, 1, 2>((complex<double>[]){up, down}, 2);
}
ROOT::Math::SMatrix<complex<double>, 2, 1> ket(complex<double> up, complex<double> down) {
    return ROOT::Math::SMatrix<complex<double>, 2, 1>((complex<double>[]){up, down}, 2);
}
ROOT::Math::SMatrix<complex<double>, 2, 2> conjugate(ROOT::Math::SMatrix<complex<double>, 2, 2> S) {
    auto out = ROOT::Math::Transpose(S);
    out(0, 0) = conj(out(0, 0));
    out(0, 1) = conj(out(0, 1));
    out(1, 0) = conj(out(1, 0));
    out(1, 1) = conj(out(1, 1));
    return out;
}
ROOT::Math::SMatrix<complex<double>, 2, 2> matrix(complex<double> a, complex<double> b, complex<double> c, complex<double> d) {
    return ROOT::Math::SMatrix<complex<double>, 2, 2>((complex<double>[]){a, b, c, d}, 4);
}
auto s1 = matrix(C(0, 0), C(1,  0), C(1, 0), C( 0, 0));
auto s2 = matrix(C(0, 0), C(0, -1), C(0, 1), C( 0, 0));
auto s3 = matrix(C(1, 0), C(0,  0), C(0, 0), C(-1, 0));
""")

True

In [10]:
# Much of the work is actually rearranging inputs and outputs (matching Python types to C++ types).
def c_args(names):
    return ", ".join("double r1_{0}, double i1_{0}, double r2_{0}, double i2_{0}".format(x) for x in names)

def python_args(names, root_function):
    def prepare(kwargs):
        for n in names:
            x, y = complex(kwargs[n][0]), complex(kwargs[n][1])
            yield x.real
            yield x.imag
            yield y.real
            yield y.imag
    return lambda **kwargs: root_function(*prepare(kwargs))

def python_ret(root_complex):
    return complex(root_complex.real(), root_complex.imag())

print(c_args(["x", "y"]))
# python_args(["x", "y"], print)(x=[0, 1j], y=[1, 0])

double r1_x, double i1_x, double r2_x, double i2_x, double r1_y, double i1_y, double r2_y, double i2_y
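
To see what `python_args` hands to the compiled function, it can be pointed at a stand-in instead of a ROOT function. In this sketch `python_args` is repeated from the cell above so that the block runs on its own, and `record` is a hypothetical placeholder that just returns its arguments.

```python
def python_args(names, root_function):        # repeated from the cell above
    def prepare(kwargs):
        for n in names:
            x, y = complex(kwargs[n][0]), complex(kwargs[n][1])
            yield x.real
            yield x.imag
            yield y.real
            yield y.imag
    return lambda **kwargs: root_function(*prepare(kwargs))

def record(*args):                            # hypothetical stand-in for the ROOT function
    return args

# Each named value is a pair of complex numbers, flattened to four doubles
# in the same order that c_args declares them: r1, i1, r2, i2.
flattened = python_args(["x", "y"], record)(x=[0, 1j], y=[1, 0])
print(flattened)
```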


In [11]:
# Again, don't shy away from using strings of the output language when COMPOSING is not an issue.
def braket(code):
    names = []
    c_ast = transpile(parser.parse(code), names)

    braket.function_num += 1           # new function name each time a new function is compiled
    function_name = "braket_{0}".format(braket.function_num)

    ROOT.gInterpreter.Declare("""      // this amount of composing C++ is not hard (or unreadable!)
complex<double> {function_name}({args}) {{
    return {c_code};
}}""".format(
        function_name = function_name,
        args = c_args(names),
        c_code = c_generator.visit(c_ast)))
    root_function = getattr(ROOT, function_name)

    return lambda **kwargs: python_ret(python_args(names, root_function)(**kwargs))   # wrap args

braket.function_num = 0                # the function knows the number of times it's been called

In [12]:
compiled1 = braket("<0 1| σ₁* σ₂ |0 i>")
print(compiled1())

compiled2 = braket("<x| σ₁* σ₂ |y>")
print(compiled2(x=(0, 1), y=(0, 1j)))

(1+0j)
(1+0j)


## Exercise (on your own):

The ROOT function for transposing a matrix is `ROOT::Math::Transpose`. Add a transpose operator to the grammar:

```
operator: spinor
        | operator "*" -> conjugate
        | operator "ᵀ" -> transpose | operator "T" -> transpose
```

and propagate it through the transpiler to the final output so that `<0 1| σ₁ᵀ σ₂ |0 1>` returns `-1j`.
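
The target value can be checked by hand without ROOT. The following is a plain-Python sketch with hypothetical `matmul` and `transpose` helpers; it confirms the expected number only, and does not touch the grammar or the transpiler, which is the actual exercise.

```python
def matmul(a, b):
    "Multiply two matrices given as nested lists."
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

s1 = [[0, 1], [1, 0]]            # Pauli matrices, matching the C++ helper definitions
s2 = [[0, -1j], [1j, 0]]

bra = [[0, 1]]                   # <0 1| as a 1×2 row
ket = [[0], [1]]                 # |0 1> as a 2×1 column

# <0 1| σ₁ᵀ σ₂ |0 1>
result = matmul(matmul(matmul(bra, transpose(s1)), s2), ket)[0][0]
print(result)
```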

## Summary

Some of the work that an **interpreter** performs is unnecessarily repeated. Performing that part once to create a lean executable is **compilation**.

All **compilation** simply translates code from one language to another, though machine instructions are rather hard to read. Compilation to a human-readable language is called source-to-source compilation or **transpilation**.

When transpiling, it's recommended to convert the input language's AST into the output language's AST, rather than directly emitting strings in the output language, because strings of code are hard to compose. It's not a hard-and-fast rule to always avoid strings, but don't rely on _composing_ strings.