<br><br><br><br><br>

# Compiling and embedding DSLs

<br><br><br><br><br>

<br><br><br>

In the previous tutorial, we **interpreted** our little language, rather than **compiling** it.

The interpreter simply did in Python what was written in the new language—we filled the symbol table with functions like:

```python
builtins["+"] = lambda x, y: x + y
```

Have we accomplished anything?

<br><br><br>

<br>

What we'll see in this tutorial is that a compiler isn't much more substantial than that. Ultimately, it replaces a `+` AST node with the machine instruction for `+` and we've merely passed the buck to another layer.

<br><br>

**Programming languages do not perform actions. They only translate a user's intention to another level of abstraction.**

<br><br>

In our interpreter, there were 6 layers of abstraction:

transistor gates → machine code → C compiler → Python implementation (C code) → our interpreter → our little language

**Interpreters vs compilers:**

   * An **interpreter** walks over the AST (or even parses the source code!) at runtime.
   * A **compiler** serializes the AST into a state machine or a sequence of instructions, virtual or physical.
   * A **transpiler** serializes the AST into code in another human-readable language. (Subjective: what's human-readable?)
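The difference between the first two can be sketched with a toy AST of nested tuples. This is a hypothetical stand-in, not the AST classes from the previous tutorial: `interpret` walks the tree every time it is called, while `compile_to_stack` serializes the tree once into a flat list of instructions that a simple loop (`run`) can execute without ever seeing the tree.

```python
import operator

# Hypothetical toy AST: a number, a variable name, or a tuple (op, left, right).
OPS = {"+": operator.add, "*": operator.mul}

def interpret(node, env):
    """Interpreter: walk the tree at runtime."""
    if isinstance(node, tuple):
        op, left, right = node
        return OPS[op](interpret(left, env), interpret(right, env))
    if isinstance(node, str):
        return env[node]
    return node

def compile_to_stack(node, out=None):
    """Compiler: serialize the tree into postfix stack instructions."""
    if out is None:
        out = []
    if isinstance(node, tuple):
        op, left, right = node
        compile_to_stack(left, out)
        compile_to_stack(right, out)
        out.append(("apply", op))
    elif isinstance(node, str):
        out.append(("load", node))
    else:
        out.append(("push", node))
    return out

def run(instructions, env):
    """Execute the flat instruction list; no tree is involved anymore."""
    stack = []
    for kind, arg in instructions:
        if kind == "push":
            stack.append(arg)
        elif kind == "load":
            stack.append(env[arg])
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[arg](left, right))
    return stack.pop()

tree = ("+", ("*", "x", "x"), 1)
program = compile_to_stack(tree)
print(program)
print(interpret(tree, {"x": 3}), run(program, {"x": 3}))
```

Both paths compute the same value; the compiled form pays the cost of walking the tree only once.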

**Compilation targets:**

   * A **finite state machine** is a graph of executable steps _only_ (not a full interpreter). Regular expressions are often compiled to finite state machines. A non-recurrent neural network is also a finite state machine.
   * A **push-down machine** is a state machine with a stack of memory—parsers are almost always push-down machines.
   * A **virtual machine** is a computation engine driven by a sequence of instructions, like a physical computer, but implemented in software. Python and Java are not pure interpreters: they _compile_ their source code to bytecode, the instruction set of a _virtual machine._
   * A **Von Neumann machine** is a physical computer driven by a sequence of instructions.
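The claim about Python can be checked from Python itself. A minimal sketch using the standard library's `compile` and `dis`: the source expression is first compiled to a code object holding bytecode, and only then executed. Opcode names differ between CPython versions, so no expected listing is shown here.

```python
import dis

# Compile a source expression to a code object (CPython bytecode).
code = compile("x + y", "<example>", "eval")

# List the virtual-machine instructions; opcode names vary across versions.
opnames = [instr.opname for instr in dis.get_instructions(code)]
print(opnames)

# The virtual machine executes the bytecode, not the source text.
result = eval(code, {"x": 3, "y": 4})
print(result)
```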

**Other:**

   * **FPGA/ASIC:** physical computer consisting of raw gates, not instructions; Verilog isn't _compiled_ like C, it's _synthesized_.

<br><br><br>

The general flow of a compiler is

<center style="margin-top: 20px; margin-bottom: 20px"><b>linear (source code) → tree (AST) → linear (instructions or other source code)</b></center>
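This linear → tree → linear shape can be seen with Python's own `ast` module, used here only as an illustration (it is not the parser for our little language). The sketch assumes Python 3.9 or later, where `ast.unparse` is available.

```python
import ast

source = "x*x+y*y"                      # linear: source text
tree = ast.parse(source, mode="eval")   # tree: Python's AST for the expression
print(ast.dump(tree.body))              # nested BinOp nodes

regenerated = ast.unparse(tree)         # linear again: regenerated source text
print(regenerated)
```

The regenerated text need not match the input character for character; whitespace is discarded by the parser, and only the structure survives the round trip.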

In this notebook, we'll write a transpiler, converting our little language into C++.

<br><br><br>

```python
# First of all, did you know that you can do this?

import ROOT

ROOT.gInterpreter.Declare("""
double new_function(double x, double y) {
    return sqrt(x*x + y*y);
}""")

ROOT.new_function(3, 4)
```

```
Welcome to JupyROOT 6.17/01

5.0
```

```python
# And what about this?
import pycparser.c_parser, pycparser.c_generator
parser    = pycparser.c_parser.CParser()
generator = pycparser.c_generator.CGenerator()
ast = parser.parse("double f(double x) { return x*x; }")
ast.show()
```

```
FileAST: 
  FuncDef: 
    Decl: f, [], [], []
      FuncDecl: 
        ParamList: 
          Decl: x, [], [], []
            TypeDecl: x, []
              IdentifierType: ['double']
        TypeDecl: f, []
          IdentifierType: ['double']
    Compound: 
      Return: 
        BinaryOp: *
          ID: x
          ID: x
```

```python
ast = parser.parse("double f(double x) { return x*x; }")
print(generator.visit(ast))
```

```
double f(double x)
{
  return x * x;
}
```

We can compile and run C++ code in ROOT (Cling) and we have a general C99 AST in a Python library (pycparser).

The compilation chain could look like this:

<center style="margin-top: 20px; margin-bottom: 20px"><b>our source language → our AST → C99 AST → C++ source code → compile and run in ROOT</b></center>
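The middle of that chain, "our AST → C99 AST → C++ source code", might look like the sketch below. The nested-tuple AST and the name `to_c99` are hypothetical stand-ins for this tutorial's real AST; the `c_ast.BinaryOp`, `c_ast.ID`, and `c_ast.Constant` node classes come from pycparser. For brevity the function wrapper is pasted together as a string; a fuller version would build the `FuncDef` and `Decl` nodes as well.

```python
from pycparser import c_ast, c_generator

def to_c99(node):
    """Translate a toy AST (hypothetical nested tuples) into pycparser nodes."""
    if isinstance(node, tuple):
        op, left, right = node
        return c_ast.BinaryOp(op, to_c99(left), to_c99(right))
    if isinstance(node, str):
        return c_ast.ID(node)
    # Numeric literal: pycparser stores constant values as strings.
    return c_ast.Constant("double", repr(float(node)))

toy = ("+", ("*", "x", "x"), ("*", "y", "y"))

# C99 AST -> source text for the expression.
expr = c_generator.CGenerator().visit(to_c99(toy))

# Assumption: a hand-written wrapper, only to keep the sketch short.
cpp_source = "double toy(double x, double y) { return %s; }" % expr
print(cpp_source)
```

Whether the generator parenthesizes the two products depends on the pycparser version, so the exact output text is not shown. The resulting string could then be handed to `ROOT.gInterpreter.Declare`, as in the first cell.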


Why not output C++ strings directly from our AST? It's a hard-to-debug shortcut. ([CoffeeScript famously skipped that step.](https://www.kickstarter.com/projects/michaelficarra/make-a-better-coffeescript-compiler) :)

<img src="img/coffeescript-rise-and-fall.png" width="90%">