<br><br><br><br><br>

# Embedded DSLs

<br><br><br><br><br>

Do we need a new language?

Or more specifically, do we need a new syntax?

An **embedded DSL** is a domain-specific language that is implemented in another language. The difference between an embedded DSL and an ordinary library is subjective.

For example:

   * [NumPy](https://docs.scipy.org/doc/numpy/reference/) implements array programming idioms in Python, which had been language features in MATLAB, R, and APL.
   * My [awkward-array](https://github.com/scikit-hep/awkward-array) is an embedded DSL for physics: `(muons[:, 0] + muons[:, 1]).mass` → Z peak.
   * [Akka](https://akka.io/) implements the actor model of concurrency, which had been a language feature in Erlang.
   * [Boost.Proto](https://www.boost.org/doc/libs/1_58_0/doc/html/proto.html) provides tools to build DSLs in C++.
   * [Scala's syntax rules](https://scalac.io/encog-dsl-scala-part1) are flexible enough for some radical DSLs (next pages).

Libraries are more on the "embedded DSL" end of the spectrum if they consist of shallow functions that only do interesting things when combined, like the constructs of a programming language.

In Scala, a class's methods can be accessed through a dot `.` (like most languages) or a space ` `. Methods taking a single argument need not have parentheses `(` `)`, and they can be named with unicode characters.

The following are equivalent:

```scala
a.cross(b)      // a and b are ThreeVectors, which have a cross method
a cross b       // the dot and parentheses may be omitted
a × b           // cross can also be named ×
```

Scala also lets a function declare whether each of its arguments is evaluated eagerly or lazily (by-name parameters), and it has Lisp-like macros. This leads to some extreme DSLs, like [ScalaTest](http://www.scalatest.org/):

```scala
x should not equal 1

List("one", "two", "three") should contain ("two")

an [IndexOutOfBoundsException] should be thrownBy string.charAt(-1)
```

The [Chisel](https://chisel.eecs.berkeley.edu/) library specifies circuits for FPGAs several times more succinctly than Verilog. This is a vending machine:

```scala
class VendingMachine extends Component {
  val io = new Bundle {
    val nickel = Bool(INPUT)
    val dime   = Bool(INPUT)
    val ready  = Bool(OUTPUT)
  }
  val s_idle :: s_5 :: s_10 :: s_15 :: s_ok :: Nil = Enum(5){ UFix() }
  val state = Reg(resetVal = s_idle)
  switch (state) {
    is (s_idle) { when (io.nickel) { state := s_5 }  when (io.dime) { state := s_10 } }
    is (s_5)    { when (io.nickel) { state := s_10 } when (io.dime) { state := s_15 } }
    is (s_10)   { when (io.nickel) { state := s_15 } when (io.dime) { state := s_ok } }
    is (s_15)   { when (io.nickel) { state := s_ok } when (io.dime) { state := s_ok } }
    is (s_ok)   { state := s_idle }
  }
  io.ready := (state === s_ok)
}
```

<br>

But most physicists use Python for high-level analysis.

Python's syntax is not nearly as flexible as Scala's, nor does it give control over eager/lazy evaluation as Scala and C# do (unless you ask the user to always pass functions as arguments!), so options for embedded DSLs in Python are limited.
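
Within those limits, two Python mechanisms still do most of the work: operator overloading, which can record an expression instead of computing it, and explicit function arguments, which defer evaluation. The sketch below is only an illustration; `Expr`, `wrap`, and `lazy_or` are hypothetical names, not part of any library.

```python
class Expr:
    """Records arithmetic as text instead of computing a value."""
    def __init__(self, text):
        self.text = text

    def __add__(self, other):
        return Expr("add({0}, {1})".format(self, wrap(other)))

    def __pow__(self, other):
        return Expr("pow({0}, {1})".format(self, wrap(other)))

    def __str__(self):
        return self.text

def wrap(value):
    # Plain Python values become expression leaves.
    return value if isinstance(value, Expr) else Expr(repr(value))

x, y = Expr("x"), Expr("y")
print(x**2 + y**2)

def lazy_or(left, right):
    # Both arguments are zero-argument functions, so 'right' only runs if needed.
    return left() or right()

calls = []

def cheap():
    calls.append("cheap")
    return True

def expensive():
    calls.append("expensive")
    return True

lazy_or(cheap, expensive)
print(calls)   # 'expensive' was never called
```

Operator overloading cannot capture `and`, `or`, `not`, `if`, or assignment, which is why this route runs out quickly for anything beyond arithmetic.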

<br>

In all the demos I've shown so far, we've defined new grammars.

   * This has the _advantage_ of complete freedom in syntax and interpretation.
   * This has the _disadvantage_ that the language must always live in quoted strings or separate files.

That is fine for small snippets of the new language, like regular expressions or SQL queries, but for longer programs the user loses the support of their favorite editor, which does not understand the language: no syntax highlighting, auto-indentation, tab-completion, integrated documentation...

<br>

```python
# Best of both? Code in Python FUNCTIONS is syntax-checked but otherwise unevaluated.

def function(x, y):
    return x**2 + y**2

# The source code has been compiled for Python's virtual machine, which is another language,
# but that language can be parsed (not by humans).

print(function.__code__.co_code)
```

```
b'|\x00d\x01\x13\x00|\x01d\x01\x13\x00\x17\x00S\x00'
```
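
To make "can be parsed" concrete, the standard library's `dis` module decodes those bytes into named instructions. This is a small sketch for orientation only; the exact opcode names and offsets depend on the Python version, so no output is shown.

```python
import dis

def function(x, y):
    return x**2 + y**2

# Each instruction has an offset, a name, and (optionally) an argument.
# The instruction set changes between Python versions.
instructions = list(dis.get_instructions(function))
for instruction in instructions:
    print(instruction.offset, instruction.opname, instruction.argrepr)
```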


```python
import sys, uncompyle6.parser, uncompyle6.scanner

def parse(function):
    code = function.__code__
    # sys.version[0:3] is a string like "3.7"; the slice assumes a single-digit minor version.
    scanner = uncompyle6.scanner.get_scanner(float(sys.version[0:3]))
    parser = uncompyle6.parser.get_python_parser(float(sys.version[0:3]), compile_mode="exec")
    tokens, customize = scanner.ingest(code)
    return uncompyle6.parser.parse(parser, tokens, customize)

parse(lambda x, y: x**2 + y**2)
```

```
stmts
    sstmt
        stmt
            return (2)
                 0. ret_expr
                    expr
                        binary_expr (3)
                             0. expr
                                binary_expr (3)
                                     0. expr
                                        L.  10       0  LOAD_FAST                'x'
                                     1. expr
                                                     2  LOAD_CONST            2  2
                                     2. binary_op
                                                     4  BINARY_POWER
                             1. expr
                                binary_expr (3)
                                     0. expr
                                                     6  LOAD_FAST                'y'
                                     1. expr
                                                     8  LOAD_CONST            2  2
                                     2. bina
```

```python
# Make our own AST, not that parsing tree or Python's own AST. We want to control the interpretation.

class AST:
    _fields = ()
    def __init__(self, *args):
        for n, x in zip(self._fields, args):
            setattr(self, n, x)

class Literal(AST):
    _fields = ("value",)
    def __str__(self): return str(self.value)

class Symbol(AST):
    _fields = ("symbol",)
    def __str__(self): return self.symbol

class Call(AST):
    _fields = ("function", "arguments")
    def __str__(self):
        return "{0}({1})".format(str(self.function), ", ".join(str(x) for x in self.arguments))
```

```python
# Now we just need a toaster for uncompyle6's parsing tree.

def toast(ptnode):
    if ptnode.kind == "binary_expr":
        return Call(toast(ptnode[2]), [toast(x) for x in ptnode[:2]])
    elif ptnode.kind == "BINARY_ADD":
        return Symbol("add")
    elif ptnode.kind == "BINARY_POWER":
        return Symbol("pow")
    elif ptnode.kind == "LOAD_FAST":
        return Symbol(ptnode.pattr)
    elif ptnode.kind == "LOAD_CONST":
        return Literal(ptnode.pattr)
    else:
        # a lot of nodes are purely structural
        return toast(ptnode[0])

# It's incomplete, but it covers enough for our example.
print(toast(parse(lambda x, y: x**2 + y**2)))
```

```
add(pow(x, 2), pow(y, 2))
```
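
For comparison, the same target AST can be reached from Python's standard `ast` module, which parses source text rather than bytecode. This is a sketch of an alternative front end, not the approach used above: `toast_stdlib` and `BINARY_OPERATORS` are hypothetical names, the AST classes are redefined in reduced form so the block runs on its own, and it assumes Python 3.8 or later, where literals parse as `ast.Constant`.

```python
import ast

class Literal:
    def __init__(self, value): self.value = value
    def __str__(self): return str(self.value)

class Symbol:
    def __init__(self, symbol): self.symbol = symbol
    def __str__(self): return self.symbol

class Call:
    def __init__(self, function, arguments):
        self.function, self.arguments = function, arguments
    def __str__(self):
        return "{0}({1})".format(str(self.function), ", ".join(str(x) for x in self.arguments))

# Only the two operators needed for the running example.
BINARY_OPERATORS = {ast.Add: "add", ast.Pow: "pow"}

def toast_stdlib(node):
    if isinstance(node, ast.Expression):
        return toast_stdlib(node.body)
    elif isinstance(node, ast.BinOp):
        name = BINARY_OPERATORS[type(node.op)]
        return Call(Symbol(name), [toast_stdlib(node.left), toast_stdlib(node.right)])
    elif isinstance(node, ast.Name):
        return Symbol(node.id)
    elif isinstance(node, ast.Constant):
        return Literal(node.value)
    else:
        raise NotImplementedError(type(node).__name__)

result = toast_stdlib(ast.parse("x**2 + y**2", mode="eval"))
print(result)   # add(pow(x, 2), pow(y, 2))
```

The trade-off is that `ast.parse` needs the source text, which is not always available for a function object, whereas the bytecode always is.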


[Python's grammar](https://docs.python.org/3/reference/grammar.html) is not very large, so a full translation is not hard, at least for expressions (of its 86 rules, 36 are for expressions):

```
test:           or_test ['if' or_test 'else' test] | lambdef
test_nocond:    or_test | lambdef_nocond
lambdef:        'lambda' [varargslist] ':' test
lambdef_nocond: 'lambda' [varargslist] ':' test_nocond
or_test:        and_test ('or' and_test)*
and_test:       not_test ('and' not_test)*
not_test:       'not' not_test | comparison
comparison:     expr (comp_op expr)*
comp_op:        '<'|'>'|'=='|'>='|'<='|'<>'|'!='|'in'|'not' 'in'|'is'|'is' 'not'
star_expr:      '*' expr
expr:           xor_expr ('|' xor_expr)*
xor_expr:       and_expr ('^' and_expr)*
and_expr:       shift_expr ('&' shift_expr)*
shift_expr:     arith_expr (('<<'|'>>') arith_expr)*
arith_expr:     term (('+'|'-') term)*
term:           factor (('*'|'@'|'/'|'%'|'//') factor)*
factor:         ('+'|'-'|'~') factor | power
power:          atom_expr ['**' factor]
atom_expr:      ['await'] atom trailer*
atom:           ('(' [yield_expr|testlist_comp] ')' | '[' [testlist_comp] ']' |
                '{' [dictorsetmaker] '}' | NAME | NUMBER | STRING+ | '...' | 'None' | 'True' | 'False')
testlist_comp:  (test|star_expr) ( comp_for | (',' (test|star_expr))* [','] )
trailer:        '(' [arglist] ')' | '[' subscriptlist ']' | '.' NAME
subscriptlist:  subscript (',' subscript)* [',']
subscript:      test | [test] ':' [test] [sliceop]
sliceop:        ':' [test]
exprlist:       (expr|star_expr) (',' (expr|star_expr))* [',']
testlist:       test (',' test)* [',']
dictorsetmaker: (((test ':' test | '**' expr) (comp_for | (',' (test ':' test | '**' expr))* [','])) |
                ((test | star_expr) (comp_for | (',' (test | star_expr))* [','])))
arglist:        argument (',' argument)*  [',']
argument:       ( test [comp_for] | test '=' test | '**' test | '*' test )
comp_iter:      comp_for | comp_if
sync_comp_for:  'for' exprlist 'in' or_test [comp_iter]
comp_for:       ['async'] sync_comp_for
comp_if:        'if' test_nocond [comp_iter]
yield_expr:     'yield' [yield_arg]
yield_arg:      'from' test | testlist
```

```python
# And the whole apparatus can be wrapped up in a Python decorator.
# (Numba uses the same mechanism to specify that a function is to be compiled.)

def quote(function):                   # like Lisp's "quote"
    return toast(parse(function))

@quote
def sum_quadrature(x, y):
    return x**2 + y**2

print(sum_quadrature)
```

```
add(pow(x, 2), pow(y, 2))
```


With the ability to turn any Python function into an AST, we can add

   * lazy evaluation or a declarative interpretation;
   * type-checking, possibly with physics-motivated features, like refinement types;
   * compilation through C++ or directly through LLVM (though Numba already does that);
   * AST transformations, such as symbolic differentiation;
   * AST rewriting, such as algebraic simplifications;
   * ...
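
As a rough illustration of the transformation and rewriting items, the sketch below differentiates the `add`/`pow` AST from the running example and then simplifies the result. The helper names `call`, `derivative`, and `simplify` are hypothetical, only the rules needed for this example are implemented, and the AST classes are repeated so the block runs on its own.

```python
class AST:
    _fields = ()
    def __init__(self, *args):
        for n, x in zip(self._fields, args):
            setattr(self, n, x)

class Literal(AST):
    _fields = ("value",)
    def __str__(self): return str(self.value)

class Symbol(AST):
    _fields = ("symbol",)
    def __str__(self): return self.symbol

class Call(AST):
    _fields = ("function", "arguments")
    def __str__(self):
        return "{0}({1})".format(str(self.function), ", ".join(str(x) for x in self.arguments))

def call(name, *arguments):
    return Call(Symbol(name), list(arguments))

def derivative(node, wrt):
    """Differentiate an AST of add/pow calls with respect to the symbol named wrt."""
    if isinstance(node, Literal):
        return Literal(0)
    if isinstance(node, Symbol):
        return Literal(1 if node.symbol == wrt else 0)
    name = node.function.symbol
    if name == "add":
        return call("add", *[derivative(x, wrt) for x in node.arguments])
    if name == "pow" and isinstance(node.arguments[1], Literal):
        base, exponent = node.arguments
        # power rule with chain rule: d(u**n) = n * u**(n-1) * du
        return call("mul", Literal(exponent.value),
                    call("pow", base, Literal(exponent.value - 1)),
                    derivative(base, wrt))
    raise NotImplementedError(name)

def simplify(node):
    """Rewrites: u**1 -> u, drop factors of 1, product with 0 -> 0, drop terms of 0."""
    if not isinstance(node, Call):
        return node
    name = node.function.symbol
    args = [simplify(x) for x in node.arguments]
    is_literal = lambda x, v: isinstance(x, Literal) and x.value == v
    if name == "pow" and is_literal(args[1], 1):
        return args[0]
    if name == "mul":
        if any(is_literal(x, 0) for x in args):
            return Literal(0)
        args = [x for x in args if not is_literal(x, 1)]
    if name == "add":
        args = [x for x in args if not is_literal(x, 0)]
    if name in ("add", "mul"):
        if len(args) == 0:
            return Literal(0 if name == "add" else 1)
        if len(args) == 1:
            return args[0]
    return Call(node.function, args)

expression = call("add", call("pow", Symbol("x"), Literal(2)),
                         call("pow", Symbol("y"), Literal(2)))
print(simplify(derivative(expression, "x")))   # mul(2, x)
```

Both passes are plain recursive functions over the three node types, which is the main payoff of keeping the AST this small.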

Whether to use a new syntax or Python's syntax is a _separate question_ from the above features. It's also possible to have two syntaxes for the same language (different parsing steps produce the same ASTs).

Using Python as a syntax limits us to Python syntax rules and Pythonic expectations (users would be upset if we changed the _meaning_ of the operators), but it provides a better editing experience.

I started thinking along these lines in a project called [rejig](https://github.com/diana-hep/rejig).