
# Language basics: parsing and interpreting


**Parsing is the conversion of source code text into a tree representing relationships among tokens (words & symbols).**

<img src="https://web.archive.org/web/20180815032316im_/http://www.german-latin-english.com/diagram2.gif" width="50%">

Reports about medicines in newspapers and on television commonly contain little or no information about drugs' risks and cost, and often cite medical "experts" without disclosing their financial ties to the pharmaceutical industry, according to a new study.

   - Susan Okie, The Washington Post (published on June 1, 2000, in Louisville, KY, in The Courier-Journal, page A3)

<img src="https://web.archive.org/web/20181030174508im_/https://www.nltk.org/book/tree_images/ch08-tree-1.png" width="45%"><img src="https://web.archive.org/web/20181030174508im_/https://www.nltk.org/book/tree_images/ch08-tree-2.png" width="45%">

<br>
"One morning I shot an elephant in my pajamas. How he got into my pajamas, I'll never know." — Groucho Marx

**Grammar:** a list of rules to convert tokens into trees and trees into bigger trees.

```
sentence (S):              noun_phrase verb_phrase
prepositional_phrase (PP): preposition noun_phrase
verb_phrase (VP):          verb noun_phrase | verb noun_phrase prepositional_phrase
noun_phrase (NP):          "John" | "Mary" | "Bob"
                           | determiner noun | determiner noun prepositional_phrase
preposition (P):           "in" | "on" | "by" | "with"
verb (V):                  "saw" | "ate" | "walked"
determiner (Det):          "a" | "an" | "the" | "my"
noun (N):                  "man" | "dog" | "cat" | "telescope" | "park"
```
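To make the grammar concrete, here is a minimal sketch in plain Python (no Lark). It encodes the rules above as a dictionary, using the short names (S, NP, VP, ...) as nonterminals, and enumerates every parse tree of a sentence. It is a brute-force recursive parser. That approach terminates here only because this grammar has no left recursion; real parsing algorithms such as Earley handle the general case. `GRAMMAR` and `all_parses` are hypothetical names chosen for illustration. The grammar has no "I", "shot", or "elephant", so the ambiguous example is "Mary saw the man in the park": the prepositional phrase can attach to the verb or to "the man".

```python
from functools import lru_cache

# The grammar above. Any symbol that is not a key is a terminal word.
GRAMMAR = {
    "S":   [["NP", "VP"]],
    "PP":  [["P", "NP"]],
    "VP":  [["V", "NP"], ["V", "NP", "PP"]],
    "NP":  [["John"], ["Mary"], ["Bob"], ["Det", "N"], ["Det", "N", "PP"]],
    "P":   [["in"], ["on"], ["by"], ["with"]],
    "V":   [["saw"], ["ate"], ["walked"]],
    "Det": [["a"], ["an"], ["the"], ["my"]],
    "N":   [["man"], ["dog"], ["cat"], ["telescope"], ["park"]],
}

def all_parses(words, start="S"):
    """Return every parse tree (nested tuples) that derives exactly `words`."""
    words = tuple(words)

    @lru_cache(maxsize=None)
    def expand(symbol, i):
        # All (tree, end) pairs such that `symbol` derives words[i:end].
        if symbol not in GRAMMAR:                       # terminal: match one word
            return ((symbol, i + 1),) if i < len(words) and words[i] == symbol else ()
        results = []
        for alternative in GRAMMAR[symbol]:
            partial = [((), i)]                         # (children so far, next position)
            for part in alternative:
                partial = [(children + (tree,), end)
                           for children, pos in partial
                           for tree, end in expand(part, pos)]
            results.extend(((symbol,) + children, end) for children, end in partial)
        return tuple(results)

    return [tree for tree, end in expand(start, 0) if end == len(words)]

for tree in all_parses("Mary saw the man in the park".split()):
    print(tree)
```

Two trees are printed. Both are valid under the grammar, which is exactly the ambiguity the Groucho Marx joke exploits.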

<center><img src="https://web.archive.org/web/20181030174508im_/https://www.nltk.org/book/tree_images/ch08-tree-4.png" width="20%"><img src="https://web.archive.org/web/20181030174508im_/https://www.nltk.org/book/tree_images/ch08-tree-5.png" width="20%"></center>

<img src="https://web.archive.org/web/20181030174508im_/https://www.nltk.org/images/rdparser1-6.png" width="90%">

<table><tr><td width="20%">
<img src="https://web.archive.org/web/20190219060132im_/https://ruslanspivak.com/lsbasi-part7/lsbasi_part7_genastdot_01.png" width="100%">
</td><td width="80%">
<p style="font-size: 14px; margin-left: 10px; margin-bottom: 30px">Mathematical expressions and computer programs can be parsed the same way.</p>
<p style="font-size: 14px; margin-left: 10px; margin-bottom: 30px"><tt>7 + 3 * (10 / (12 / (3 + 1) - 1))</tt></p>
<p style="font-size: 14px; margin-left: 10px; margin-bottom: 30px">We definitely don't need to write the parsing algorithm—decades of computer science research has already gone into that.</p>
<p style="font-size: 14px; margin-left: 10px; margin-bottom: 30px">I used to have a favorite (PLY), but while preparing this demo, I found a better one (Lark).</p>
<p style="font-size: 14px; margin-left: 10px; margin-bottom: 30px"><b>Let's get started!</b></p>
</td></tr></table>
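For intuition only, here is a minimal hand-written recursive-descent sketch that parses and evaluates the expression above. The text's point still stands, and in practice we let a library do this. There is one method per precedence level (lowest first), mirroring the `arith`/`term`/`factor` rules used below. Unlike the grammar used below, which allows one binary operator per level, this sketch loops so that chains like `1 - 2 - 3` parse and group to the left. All names here (`tokenize`, `Parser`, `parse`, `evaluate`) are hypothetical.

```python
import operator
import re

NUMBER = re.compile(r"\d+(?:\.\d+)?")

def tokenize(text):
    """Split arithmetic text into numbers, operators, and parentheses."""
    # Any other non-space character becomes a token that the parser will reject.
    return re.findall(r"\d+(?:\.\d+)?|[-+*/()]|\S", text)

class Parser:
    """Recursive descent: one method per precedence level, lowest first."""

    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        token = self.peek()
        if token is None or (expected is not None and token != expected):
            raise SyntaxError(f"expected {expected!r}, found {token!r}")
        self.pos += 1
        return token

    def arith(self):   # arith: term (("+" | "-") term)*
        tree = self.term()
        while self.peek() in ("+", "-"):
            name = "add" if self.take() == "+" else "sub"
            tree = (name, tree, self.term())
        return tree

    def term(self):    # term: factor (("*" | "/") factor)*
        tree = self.factor()
        while self.peek() in ("*", "/"):
            name = "mul" if self.take() == "*" else "div"
            tree = (name, tree, self.factor())
        return tree

    def factor(self):  # factor: "-" factor | "(" arith ")" | NUMBER
        token = self.peek()
        if token == "-":
            self.take()
            return ("neg", self.factor())
        if token == "(":
            self.take("(")
            tree = self.arith()
            self.take(")")
            return tree
        if token is not None and NUMBER.fullmatch(token):
            self.take()
            return ("literal", float(token))
        raise SyntaxError(f"unexpected token {token!r}")

def parse(text):
    parser = Parser(text)
    tree = parser.arith()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree

OPERATIONS = {"add": operator.add, "sub": operator.sub,
              "mul": operator.mul, "div": operator.truediv}

def evaluate(tree):
    """Walk the tuple tree bottom-up to compute its value."""
    if tree[0] == "literal":
        return tree[1]
    if tree[0] == "neg":
        return -evaluate(tree[1])
    return OPERATIONS[tree[0]](evaluate(tree[1]), evaluate(tree[2]))

print(evaluate(parse("7 + 3 * (10 / (12 / (3 + 1) - 1))")))   # 22.0
```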

## Lark - a modern parsing library for Python

Parse any context-free grammar, FAST and EASY!

**Beginners**: Lark is not just another parser. It can parse any grammar you throw at it, no matter how complicated or ambiguous, and do so efficiently. It also constructs a parse-tree for you, without additional code on your part.

**Experts**: Lark implements both Earley(SPPF) and LALR(1), and several different lexers, so you can trade off power and speed according to your requirements. It also provides a variety of sophisticated features and utilities.

Lark can:

 - Parse all context-free grammars, and handle any ambiguity
 - Build a parse-tree automagically, no construction code required
 - Outperform all other Python libraries when using LALR(1) (Yes, including PLY)
 - Run on every Python interpreter (it's pure-python)
 - Generate a stand-alone parser (for LALR(1) grammars)

In [1]:
import lark

expression_grammar = """
arith:   term   | term "+" term     -> add | term "-" term     -> sub
term:    factor | factor "*" factor -> mul | factor "/" factor -> div
factor:  pow    | "+" factor        -> pos | "-" factor        -> neg
// "(...)?" rather than "[...]": in Lark >= 1.0, an unmatched [...] leaves a None child
pow:     call ("**" factor)?
call:    atom   | call trailer
atom:    "(" expression ")" | CNAME -> symbol | NUMBER -> literal
trailer: "(" arglist ")"
arglist: expression ("," expression)*

%import common.CNAME
%import common.NUMBER
%import common.WS
"""

grammar = "\n".join(["start: expression", "expression: arith", "%ignore WS"]) + expression_grammar

parser = lark.Lark(grammar)

In [2]:
print(parser.parse("2 + 2").pretty())

start
  expression
    add
      term
        factor
          pow
            call
              literal	2
      term
        factor
          pow
            call
              literal	2



If the prepositional phrase "in my pajamas" had a well-defined operator precedence in "I shot an elephant in my pajamas," there would be no ambiguity.

Alternatively, if we could use parentheses `(` `)` in English to denote nesting, there would be no ambiguity.

Building the operator precedence into the grammar created a lot of superfluous tree nodes, though.

<table><tr><td width="35%">
<img src="https://web.archive.org/web/20190219060132im_/https://ruslanspivak.com/lsbasi-part7/lsbasi_part7_ast_02.png" width="100%">
</td><td width="65%">
<p style="font-size: 14px; margin-left: 10px; margin-bottom: 30px">The parsing tree has too much detail because it includes nodes for rules even if they were just used to set up operator precedence.</p>
<p style="font-size: 14px; margin-left: 10px; margin-bottom: 30px">Let's reduce it to a tree that contains only what is necessary to understand the meaning of the program.</p>
<p style="font-size: 14px; margin-left: 10px; margin-bottom: 30px">Such a tree is called an <b>Abstract Syntax Tree</b> (AST).</p>
<p style="font-size: 14px; margin-left: 10px; margin-bottom: 30px">This is easy enough (and particular enough to our specific needs) that we should write it ourselves.</p>
</td></tr></table>
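The core of this reduction can be sketched generically. Assume a parse tree is written as `(label, *children)` tuples with tokens as plain strings. This is a hypothetical representation, not Lark's `Tree` class. Under that assumption, any node that merely wraps a single subtree is replaced by that subtree. The `toast` function below does more (it renames and restructures nodes), but removing precedence scaffolding is the main idea.

```python
def collapse(tree):
    """Remove single-child pass-through nodes from a (label, *children) tuple tree.

    Leaves (tokens) are plain strings. A node that wraps exactly one subtree
    is replaced by that subtree, so only nodes that carry meaning survive.
    """
    if isinstance(tree, str):
        return tree
    label, *children = tree
    children = [collapse(child) for child in children]
    if len(children) == 1 and not isinstance(children[0], str):
        return children[0]
    return (label, *children)

# The parsing tree of "2 + 2" shown above, as nested tuples.
operand = ("term", ("factor", ("pow", ("call", ("literal", "2")))))
parse_tree = ("start", ("expression", ("add", operand, operand)))
print(collapse(parse_tree))   # ('add', ('literal', '2'), ('literal', '2'))
```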

In [3]:
class AST:                                       # only three types (and a superclass to set them up)
    _fields = ()
    def __init__(self, *args, line=None):
        self.line = line
        for n, x in zip(self._fields, args):
            setattr(self, n, x)
            if self.line is None: self.line = x.line

class Literal(AST):                              # Literal: value that appears in the program text
    _fields = ("value",)
    def __str__(self): return str(self.value)

class Symbol(AST):                               # Symbol: value referenced by name
    _fields = ("symbol",)
    def __str__(self): return self.symbol

class Call(AST):                                 # Call: evaluate a function on arguments
    _fields = ("function", "arguments")
    def __str__(self):
        return "{0}({1})".format(str(self.function), ", ".join(str(x) for x in self.arguments))

In [4]:
def toast(ptnode):  # Recursively convert parsing tree (PT) into abstract syntax tree (AST).
    if ptnode.data in ("add", "sub", "mul", "div", "pos", "neg"):
        arguments = [toast(x) for x in ptnode.children]
        return Call(Symbol(str(ptnode.data), line=arguments[0].line), arguments)
    elif ptnode.data == "pow" and len(ptnode.children) == 2:
        arguments = [toast(ptnode.children[0]), toast(ptnode.children[1])]
        return Call(Symbol("pow", line=arguments[0].line), arguments)
    elif ptnode.data == "call" and len(ptnode.children) == 2:
        return Call(toast(ptnode.children[0]), toast(ptnode.children[1]))
    elif ptnode.data == "symbol":
        return Symbol(str(ptnode.children[0]), line=ptnode.children[0].line)
    elif ptnode.data == "literal":
        return Literal(float(ptnode.children[0]), line=ptnode.children[0].line)
    elif ptnode.data == "arglist":
        return [toast(x) for x in ptnode.children]
    else:
        return toast(ptnode.children[0])    # many other cases, all of them simple pass-throughs

print(toast(parser.parse("2 + 2")))

add(2.0, 2.0)
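CPython does the same thing for its own source code. The standard library's `ast` module exposes the abstract syntax tree Python builds. `2 + 2` becomes a single `BinOp` node with no precedence wrappers, and each node records its line number in `lineno`, much like the `line` field above. This is a small illustration, not part of the tutorial's language.

```python
import ast

# mode="eval" parses a single expression instead of a module of statements.
tree = ast.parse("2 + 2", mode="eval")
node = tree.body                       # the BinOp for "+"; no term/factor/pow wrappers

print(type(node).__name__, type(node.op).__name__, node.lineno)   # BinOp Add 1
print(ast.dump(node))                  # exact text varies slightly by Python version

# The AST can be compiled and run, just as our interpreter will run its own AST.
print(eval(compile(tree, "<ast>", "eval")))   # 4
```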


## Execution

The simplest way to run a program is to repeatedly walk over the AST, evaluating each step. This is an **interpreter**.

_Historical interlude:_

   * The first high-level programming language, [Short Code](https://www.computer.org/csdl/magazine/an/1988/01/man1988010007/13rRUxCitB8) ("Short Order Code"), was an interpreter.
   * Proposed by a physicist, John Mauchly, in 1949, and implemented on the UNIVAC I.
   * It ran 50× slower than the corresponding machine instructions.
   * His company hired Grace Hopper, who improved the situation by pioneering compilers (her A-0 system, 1952); her later FLOW-MATIC language was a major influence on COBOL (1959).

A **compiler** scans the AST to generate a sequence of machine instructions, natively recognized and executed by the computer.

<img src="https://web.archive.org/web/20190322182736im_/https://ruslanspivak.com/lsbasi-part1/lsbasi_part1_compiler_interpreter.png" width="80%">
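The difference can be sketched in miniature. Real compilers emit machine code, so as a stand-in this sketch "compiles" a tuple-based AST once into a flat list of instructions for a toy stack machine. It then runs that list, and the result is compared with walking the tree directly. The instruction names (`PUSH`, `APPLY`) and the tuple AST format are hypothetical.

```python
import operator

OPERATIONS = {"add": operator.add, "sub": operator.sub,
              "mul": operator.mul, "div": operator.truediv}

def interpret(node):
    """Tree-walking interpreter: evaluate the AST directly, every time it runs."""
    if node[0] == "literal":
        return node[1]
    return OPERATIONS[node[0]](interpret(node[1]), interpret(node[2]))

def compile_ast(node, code=None):
    """Toy compiler: translate the AST once into a flat list of stack instructions."""
    if code is None:
        code = []
    if node[0] == "literal":
        code.append(("PUSH", node[1]))
    else:
        compile_ast(node[1], code)         # left operand ends up on the stack
        compile_ast(node[2], code)         # then the right operand
        code.append(("APPLY", node[0]))    # then combine the top two values
    return code

def execute(code):
    """A tiny stack machine standing in for real hardware."""
    stack = []
    for instruction, argument in code:
        if instruction == "PUSH":
            stack.append(argument)
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append(OPERATIONS[argument](left, right))
    return stack.pop()

tree = ("add", ("literal", 2.0), ("mul", ("literal", 3.0), ("literal", 4.0)))
code = compile_ast(tree)
print(code)
print(interpret(tree), execute(code))   # 14.0 14.0
```

The compiled list no longer mentions the tree at all. That is what lets a compiler pay the cost of walking the AST once instead of on every run.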

In [5]:
def run(astnode, symbols):
    if isinstance(astnode, Literal):
        return astnode.value

    elif isinstance(astnode, Symbol):
        return symbols[astnode.symbol]

    elif isinstance(astnode, Call):
        function = run(astnode.function, symbols)
        arguments = [run(x, symbols) for x in astnode.arguments]
        return function(*arguments)

import math, operator
symbols = {"add": operator.add, "sub": operator.sub, "mul": operator.mul, "div": operator.truediv,
           "pos": operator.pos, "neg": operator.neg, "pow": math.pow, "sqrt": math.sqrt, "x": 5}
run(toast(parser.parse("2 + 2")), symbols)

4.0

This is the pattern we will use for the rest of this tutorial:

   * **Lark:** grammar → parsing tree
   * **toast:** parsing tree → AST
   * **interpreter:** AST + inputs → outputs

We will just be adding to the grammar, the AST, and the interpreter as we go.

## Error handling

If a bad condition is encountered at runtime, like `sqrt(-5)`, the interpreter stops because the underlying Python execution engine raises an exception.

When writing a language, we must distinguish between our own internal errors and the users' logic mistakes. In the latter case, we have to let them know that they can fix it and provide a hint about where to start.

Line numbers are the most useful hint—but only when they're lines in the user's code, not the execution engine itself. The parser knows about line numbers—we must propagate that information into the AST (for an interpreter) and the final executable (for a compiler with debugging symbols included).
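Once an error carries a line number from the user's source, reporting it is straightforward. The sketch below assumes a hypothetical `UserError` that stores `line` and `message`, which is richer than the bare class used in the exercise below. The hypothetical `format_user_error` helper prints the offending line of the user's program rather than a Python traceback.

```python
class UserError(Exception):
    """A mistake in the user's program, tagged with a line of *their* source."""

    def __init__(self, line, message):
        super().__init__(line, message)
        self.line = line
        self.message = message

def format_user_error(source, error):
    """Render a UserError as 'line N: message' plus the offending source line."""
    lines = source.splitlines()
    report = f"line {error.line}: {error.message}"
    if error.line is not None and 1 <= error.line <= len(lines):
        report += "\n    " + lines[error.line - 1].strip()
    return report

source = "x := 5\ny := sqrt(-x)\ny"
print(format_user_error(source, UserError(2, "math domain error")))
# line 2: math domain error
#     y := sqrt(-x)
```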

In [6]:
# We've already propagated line numbers from parsing tree tokens to all AST nodes.
def showline(ast):
    if isinstance(ast, list):
        for x in ast:
            showline(x)
    if isinstance(ast, AST):
        print("{0:5s} {1:10s} {2}".format(str(ast.line), type(ast).__name__, ast))
        for n in ast._fields:
            showline(getattr(ast, n))

print("{0:5s} {1:10s} {2}".format("line", "AST type", "expression"))
print("--------------------------------------------------------")
showline(toast(parser.parse("""sqrt(-5)""")))

line  AST type   expression
--------------------------------------------------------
1     Call       sqrt(neg(5.0))
1     Symbol     sqrt
1     Call       neg(5.0)
1     Symbol     neg
1     Literal    5.0


In [7]:
# Short exercise: change the line below to report UserErrors with line numbers from the source code.

class UserError(Exception): pass

def run(astnode, symbols):
    if isinstance(astnode, Literal):
        return astnode.value
    elif isinstance(astnode, Symbol):
        return symbols[astnode.symbol]
    elif isinstance(astnode, Call):
        function = run(astnode.function, symbols)
        arguments = [run(x, symbols) for x in astnode.arguments]
        try:
            return function(*arguments)
        except Exception as err:
            raise err   # CHANGE THIS LINE

run(toast(parser.parse("""sqrt(-5)""")), {**operator.__dict__, **math.__dict__})

ValueError: math domain error

## Assignments

So far, all we've implemented is a calculator. As a next step, let's extend the language to include assignments.

For clarity, we'll use `:=` as an assignment operator.

These are our first **statements**, which do not compose as **expressions** do. Whereas expressions can be nested in parentheses like mathematical formulae, statements form a sequence that can only be nested with some block-syntax. We'll use curly brackets: `{...}`.

A block of statements could be used as an expression if it has a value. We'll use a common convention (among functional languages) in which the last statement of a block is an expression, and its value is the value of the block.

In [8]:
assignment_grammar = """
statements: NEWLINE* (assignment (NEWLINE | ";"))* expression NEWLINE*
assignment: CNAME ":=" expression
          | CNAME ":=" "{" statements "}"

%import common.WS_INLINE
%import common.NEWLINE

%ignore WS_INLINE
"""

grammar = "\n".join(["start: statements", "expression: arith"]
                   ) + expression_grammar + assignment_grammar
parser = lark.Lark(grammar)

In [9]:
print(parser.parse("""
x := 5
x
""").pretty())

start
  statements
    

    assignment
      x
      expression
        arith
          term
            factor
              pow
                call
                  literal	5
    

    expression
      arith
        term
          factor
            pow
              call
                symbol	x
    




In [10]:
class Block(AST):                                # Block: sequence of statements ending in expression
    _fields = ("statements",)                    # (Doesn't have to be an AST element; could be a
    def __str__(self):                           # function that returns its last argument...)
        return "{" + "; ".join(str(x) for x in self.statements) + "}"

class Assign(AST):                               # Assign: put a value into a symbol
    _fields = ("symbol", "value")
    def __str__(self):
        return "{0} := {1}".format(str(self.symbol), str(self.value))

In [11]:
def toast(ptnode):
    if ptnode.data == "statements":
        statements = [toast(x) for x in ptnode.children if isinstance(x, lark.Tree)]   # skip NEWLINE tokens
        return Block(statements, line=statements[0].line)

    elif ptnode.data == "assignment":
        return Assign(str(ptnode.children[0]), toast(ptnode.children[1]), line=ptnode.children[0].line)

    ######################################### from this point onward, it's the same as before...
    elif ptnode.data in ("add", "sub", "mul", "div", "pos", "neg"):
        arguments = [toast(x) for x in ptnode.children]
        return Call(Symbol(str(ptnode.data), line=arguments[0].line), arguments)
    elif ptnode.data == "pow" and len(ptnode.children) == 2:
        arguments = [toast(ptnode.children[0]), toast(ptnode.children[1])]
        return Call(Symbol("pow", line=arguments[0].line), arguments)
    elif ptnode.data == "call" and len(ptnode.children) == 2:
        return Call(toast(ptnode.children[0]), toast(ptnode.children[1]))
    elif ptnode.data == "symbol":
        return Symbol(str(ptnode.children[0]), line=ptnode.children[0].line)
    elif ptnode.data == "literal":
        return Literal(float(ptnode.children[0]), line=ptnode.children[0].line)
    elif ptnode.data == "arglist":
        return [toast(x) for x in ptnode.children]
    else:
        return toast(ptnode.children[0])    # many other cases, all of them simple pass-throughs

In [12]:
print(toast(parser.parse("x := 5; x")))

{x := 5.0; x}


In [13]:
# Short exercise: change toast so that x := {5} produces the same AST as x := 5 (ignore extra {})

def toast(ptnode):
    if ptnode.data == "statements":
        statements = [toast(x) for x in ptnode.children if isinstance(x, lark.Tree)]   # skip NEWLINE tokens
        return Block(statements, line=statements[0].line)

    elif ptnode.data == "assignment":
        return Assign(str(ptnode.children[0]), toast(ptnode.children[1]), line=ptnode.children[0].line)

    ######################################### the change you need to make is ABOVE this line...
    elif ptnode.data in ("add", "sub", "mul", "div", "pos", "neg"):
        arguments = [toast(x) for x in ptnode.children]
        return Call(Symbol(str(ptnode.data), line=arguments[0].line), arguments)
    elif ptnode.data == "pow" and len(ptnode.children) == 2:
        arguments = [toast(ptnode.children[0]), toast(ptnode.children[1])]
        return Call(Symbol("pow", line=arguments[0].line), arguments)
    elif ptnode.data == "call" and len(ptnode.children) == 2:
        return Call(toast(ptnode.children[0]), toast(ptnode.children[1]))
    elif ptnode.data == "symbol":
        return Symbol(str(ptnode.children[0]), line=ptnode.children[0].line)
    elif ptnode.data == "literal":
        return Literal(float(ptnode.children[0]), line=ptnode.children[0].line)
    elif ptnode.data == "arglist":
        return [toast(x) for x in ptnode.children]
    else:
        return toast(ptnode.children[0])    # many other cases, all of them simple pass-throughs

print(toast(parser.parse("x := {5}; x")))

{x := {5.0}; x}


In [14]:
def run(astnode, symbols):
    if isinstance(astnode, Literal):
        return astnode.value
    elif isinstance(astnode, Symbol):
        return symbols[astnode.symbol]
    elif isinstance(astnode, Call):
        function = run(astnode.function, symbols)
        arguments = [run(x, symbols) for x in astnode.arguments]
        return function(*arguments)
    elif isinstance(astnode, Block):
        for statement in astnode.statements:
            last = run(statement, symbols)
        return last
    elif isinstance(astnode, Assign):
        symbols[astnode.symbol] = run(astnode.value, symbols)

symbols = {**operator.__dict__, **math.__dict__}
run(toast(parser.parse("x := 5; x + 2")), symbols)

7.0

In [15]:
# Should variables be accessible outside of the block where they're defined (leak)?

symbols = {**operator.__dict__, **math.__dict__}
run(toast(parser.parse("x := {y := 5; y + 2}; y")), symbols)

5.0

<img src="https://web.archive.org/web/20190218171218im_/https://ruslanspivak.com/lsbasi-part9/lsbasi_part9_ast_st01.png" width="100%">

<img src="https://web.archive.org/web/20190218171218im_/https://ruslanspivak.com/lsbasi-part11/lsbasi_part9_ast_st02.png" width="100%">

<img src="https://web.archive.org/web/20190219061247im_/https://ruslanspivak.com/lsbasi-part14/lsbasi_part14_img14.png" width="70%">

In [16]:
class SymbolTable:
    def __init__(self, parent=None, **symbols):
        self.parent = parent
        self.symbols = symbols

    def __getitem__(self, symbol):
        if symbol in self.symbols:
            return self.symbols[symbol]
        elif self.parent is not None:
            return self.parent[symbol]
        else:
            raise KeyError(symbol)

    def __setitem__(self, symbol, value):
        self.symbols[symbol] = value

builtins = SymbolTable(**{**operator.__dict__, **math.__dict__, "div": operator.truediv})  # Python 3's operator has no div
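The standard library has a close analogue of this class: `collections.ChainMap` looks a key up through a chain of mappings and writes only to the first one, which is the same behavior as `SymbolTable` above. The sketch below shows nested scopes built with `new_child`. The scope names are illustrative.

```python
from collections import ChainMap
import math

builtins = ChainMap({"pi": math.pi, "sqrt": math.sqrt})
outer = builtins.new_child()           # a scope whose parent is builtins
inner = outer.new_child({"y": 5})      # a nested scope, e.g. for a {...} block

inner["z"] = 7                         # writes go to the innermost scope only
print(inner["pi"], inner["y"], inner["z"])   # lookups fall through to parents
print("z" in outer)                          # False: the assignment did not leak
```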

In [17]:
# Short exercise: change the following so that scopes are nested.
# Bonus: what SHOULD reassigning a variable from a parent's scope do? Can you make that happen?
def run(astnode, symbols):
    if isinstance(astnode, Literal):
        return astnode.value
    elif isinstance(astnode, Symbol):
        return symbols[astnode.symbol]
    elif isinstance(astnode, Call):
        function = run(astnode.function, symbols)
        arguments = [run(x, symbols) for x in astnode.arguments]
        return function(*arguments)
    elif isinstance(astnode, Block):
        for statement in astnode.statements:
            last = run(statement, symbols)
        return last
    elif isinstance(astnode, Assign):
        symbols[astnode.symbol] = run(astnode.value, symbols)

run(toast(parser.parse("x := {y := 5; y + 2}; y")), SymbolTable(builtins))

5.0

## Branching

A calculator that can assign quantities is still just a calculator—though it may make the expressions easier to read.

The next level is to introduce control structures—if/then/else, while loops, and subroutines. For brevity, we'll just do if/then/else.

We could either allow variable definitions in then/else clauses to leak, or always require an else clause and let the if/then/else block have a value. The former leads to a state-changing imperative language; the latter to a functional one (and is easier to implement). We'll do the latter.

In [18]:
branching_grammar = """
block:      expression | "{" statements "}"
branch:     "if" expression "then" block "else" block

or:         and        | and "or" and
and:        not        | not "and" not
not:        comparison | "not" not -> not_test
comparison: arith | arith "==" arith -> eq | arith "!=" arith -> ne
                  | arith ">" arith -> gt  | arith ">=" arith -> ge
                  | arith "<" arith -> lt  | arith "<=" arith -> le
"""

grammar = "\n".join(["start: statements", "expression: or | branch"]
                   ) + expression_grammar + assignment_grammar + branching_grammar
parser = lark.Lark(grammar)

In [19]:
print(parser.parse("if x > 0 then 1 else -1").pretty())

start
  statements
    expression
      branch
        expression
          or
            and
              not
                gt
                  arith
                    term
                      factor
                        pow
                          call
                            symbol	x
                  arith
                    term
                      factor
                        pow
                          call
                            literal	0
        block
          expression
            or
              and
                not
                  comparison
                    arith
                      term
                        factor
                          pow
                            call
                              literal	1
        block
          expression
            or
              and
                not
                  comparison
                    arith
                      term
                        neg
                          factor
                            pow
                              call
                                literal	1

In [20]:
def toast(ptnode):
    if ptnode.data == "branch":
        predicate, consequent, alternate = [toast(x) for x in ptnode.children]
        return Call(Symbol("if", line=predicate.line), [predicate, consequent, alternate])

    elif ptnode.data in ("or", "and", "eq", "ne", "gt", "ge", "lt", "le") and len(ptnode.children) > 1:
        arguments = [toast(x) for x in ptnode.children]
        return Call(Symbol(str(ptnode.data), line=arguments[0].line), arguments)

    elif ptnode.data == "not_test":
        argument = toast(ptnode.children[0])
        return Call(Symbol("not", line=argument.line), [argument])

    ######################################### from this point onward, it's the same as before...
    elif ptnode.data == "statements":
        statements = [toast(x) for x in ptnode.children if isinstance(x, lark.Tree)]   # skip NEWLINE tokens
        if len(statements) == 1:
            return statements[0]
        else:
            return Block(statements, line=statements[0].line)
    elif ptnode.data == "assignment":
        return Assign(str(ptnode.children[0]), toast(ptnode.children[1]), line=ptnode.children[0].line)
    elif ptnode.data in ("add", "sub", "mul", "div", "pos", "neg"):
        arguments = [toast(x) for x in ptnode.children]
        return Call(Symbol(str(ptnode.data), line=arguments[0].line), arguments)
    elif ptnode.data == "pow" and len(ptnode.children) == 2:
        arguments = [toast(ptnode.children[0]), toast(ptnode.children[1])]
        return Call(Symbol("pow", line=arguments[0].line), arguments)
    elif ptnode.data == "call" and len(ptnode.children) == 2:
        return Call(toast(ptnode.children[0]), toast(ptnode.children[1]))
    elif ptnode.data == "symbol":
        return Symbol(str(ptnode.children[0]), line=ptnode.children[0].line)
    elif ptnode.data == "literal":
        return Literal(float(ptnode.children[0]), line=ptnode.children[0].line)
    elif ptnode.data == "arglist":
        return [toast(x) for x in ptnode.children]
    else:
        return toast(ptnode.children[0])    # many other cases, all of them simple pass-throughs

In [21]:
print(toast(parser.parse("if not x == 0 or y > 1 and z <= 2 then 2 + 2 else 111 * 9")))

if(or(not(eq(x, 0.0)), and(gt(y, 1.0), le(z, 2.0))), add(2.0, 2.0), mul(111.0, 9.0))


In [22]:
# The interpreter doesn't need any new features; all of these operators are just functions!

builtins["if"]  = lambda predicate, consequent, alternate: consequent if predicate else alternate
builtins["or"]  = lambda p, q: p or q
builtins["and"] = lambda p, q: p and q
builtins["not"] = lambda p: not p

run(toast(parser.parse("if not x == 0 or y > 1 and z <= 2 then 2 + 2 else 111 * 9")),
    SymbolTable(builtins, x = 0, y = 10, z = 1))

4.0

In [23]:
# However, both then/else clauses are evaluated, regardless of the predicate.
def show(x, y, f):
    print(f, x, y, f(x, y))
    return f(x, y)
builtins["add"] = lambda x, y: show(x, y, operator.add)
builtins["mul"] = lambda x, y: show(x, y, operator.mul)

run(toast(parser.parse("if x == x then 2 + 2 else 111 * 9")), SymbolTable(builtins, x = 0))

<built-in function add> 2.0 2.0 4.0
<built-in function mul> 111.0 9.0 999.0


4.0

In [24]:
# The same goes for and/or, which traditionally evaluate their second operand
# only if the first one does not already decide the result.

builtins["gt"] = lambda x, y: show(x, y, operator.gt)
builtins["lt"] = lambda x, y: show(x, y, operator.lt)

run(toast(parser.parse("x > 0 and x < 0")), SymbolTable(builtins, x = 0))

<built-in function gt> 0 0.0 False
<built-in function lt> 0 0.0 False


False

Much like the decision that if/then/else would return a value rather than change state, decisions about the order of evaluation have a subtle effect on what kinds of programs will be written in the language.

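The and/or operators are logically symmetric, so we could

(1) evaluate `q` in `p and q` only if `p` is true, or

(2) evaluate `p` in `p and q` only if `q` is true (and analogously for `p or q`, evaluating the operand checked second only if the first is false).

Both are mathematically valid, but (2) would break every bash script on Earth: idioms like `cd build && make` rely on the left command running first and the right one running only if it succeeds.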
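The two options can be sketched in Python by passing each operand as a zero-argument function (a "thunk"), so the operator itself decides what gets evaluated and when. `and_left_first`, `and_right_first`, and `operand` are hypothetical names. The log records which operands actually ran.

```python
def and_left_first(p, q):
    """Option (1): evaluate q only if p is true (Python, C, and bash all do this)."""
    left = p()
    return q() if left else left

def and_right_first(p, q):
    """Option (2): evaluate p only if q is true; same truth table, mirrored order."""
    right = q()
    return p() if right else right

log = []

def operand(name, value):
    """Wrap a value in a zero-argument function that records when it is evaluated."""
    def thunk():
        log.append(name)
        return value
    return thunk

print(and_left_first(operand("p", False), operand("q", True)), log)   # False ['p']
log.clear()
print(and_right_first(operand("p", True), operand("q", False)), log)  # False ['q']
```

Both functions always agree on the result; they differ only in which side effects happen.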

In [25]:
# Short exercise: customize if-handling to only evaluate consequent or alternate, but not both.

def run(astnode, symbols):
    if isinstance(astnode, Call) and astnode.function.symbol == "if":
        predicate  = run(astnode.arguments[0], symbols)
        consequent = run(astnode.arguments[1], symbols)
        alternate  = run(astnode.arguments[2], symbols)
        return consequent if predicate else alternate

    ######################################### the change you need to make is ABOVE this line...
    elif isinstance(astnode, Literal):
        return astnode.value
    elif isinstance(astnode, Symbol):
        return symbols[astnode.symbol]
    elif isinstance(astnode, Call):
        function = run(astnode.function, symbols)
        arguments = [run(x, symbols) for x in astnode.arguments]
        return function(*arguments)
    elif isinstance(astnode, Block):
        symboltable = SymbolTable(symbols)
        for statement in astnode.statements:
            last = run(statement, symboltable)
        return last
    elif isinstance(astnode, Assign):
        symbols[astnode.symbol] = run(astnode.value, symbols)

# only "add" or "mul" will be printed; not both
run(toast(parser.parse("if x == x then 2 + 2 else 111 * 9")), SymbolTable(builtins, x = 0))

<built-in function add> 2.0 2.0 4.0
<built-in function mul> 111.0 9.0 999.0


4.0

To evaluate only one of the two branches, we had to implement a special rule in the interpreter, because the general rule is "evaluate all function arguments before evaluating the function."

Some languages provide control over argument evaluation, such that users of the language can create control structures like if/then/else. Lisp has a general-purpose "quote" form:

```lisp
(setq consequent (quote (+ 2 2)))                 ; setq is assignment
(setq alternate  (quote (* 999 1)))
(if (> x 0) (eval consequent) (eval alternate))   ; eval is the opposite of quote
```

The `quote` form passes along the unevaluated ASTs of `(+ 2 2)` and `(* 999 1)`, so that they can be evaluated later (selectively).
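Python has no `quote` special form, but the standard library offers a rough analogue: `ast.parse` turns expression text into an AST that sits unevaluated until `compile` and `eval` run it. The helper names `quote` and `evaluate` are hypothetical, chosen to mirror the Lisp forms.

```python
import ast

def quote(source):
    """Parse expression text into an AST without evaluating it (a rough analogue of Lisp's quote)."""
    return ast.parse(source, mode="eval")

def evaluate(tree, variables=None):
    """Compile and run a quoted AST (a rough analogue of Lisp's eval)."""
    return eval(compile(tree, "<quoted>", "eval"), {}, dict(variables or {}))

consequent = quote("2 + 2")
alternate = quote("999 * 1")
never_run = quote("1 / 0")        # parsing succeeds; nothing is evaluated yet

x = 3
print(evaluate(consequent) if x > 0 else evaluate(alternate))   # 4
```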

We can also do this in a language that allows functions to be defined and passed around as objects.

In [26]:
function_grammar = """
paramlist: CNAME | "(" (CNAME ("," CNAME)*)? ")"
function: paramlist "=>" block
"""

grammar = "\n".join(["start: statements", "expression: or | branch | function"]
                   ) + expression_grammar + assignment_grammar + branching_grammar + function_grammar
parser = lark.Lark(grammar)

In [27]:
print(parser.parse("x => x**2").pretty())

start
  statements
    expression
      function
        paramlist	x
        block
          expression
            or
              and
                not
                  comparison
                    arith
                      term
                        factor
                          pow
                            call
                              symbol	x
                            factor
                              pow
                                call
                                  literal	2



In [28]:
class Function(AST):                             # Function: defines a new function (lambda expression)
    _fields = ("paramlist", "body")
    def __str__(self):
        return "({0}) => {1}".format(", ".join(self.paramlist), str(self.body))

def toast(ptnode):
    if ptnode.data == "function":
        paramlist = [str(x) for x in ptnode.children[0].children]
        body = toast(ptnode.children[1])
        return Function(paramlist, body, line=body.line)
    
    ######################################### from this point onward, it's the same as before...
    elif ptnode.data == "branch":
        predicate, consequent, alternate = [toast(x) for x in ptnode.children]
        return Call(Symbol("if", line=predicate.line), [predicate, consequent, alternate])
    elif ptnode.data in ("or", "and", "eq", "ne", "gt", "ge", "lt", "le") and len(ptnode.children) > 1:
        arguments = [toast(x) for x in ptnode.children]
        return Call(Symbol(str(ptnode.data), line=arguments[0].line), arguments)
    elif ptnode.data == "not_test":
        argument = toast(ptnode.children[0])
        return Call(Symbol("not", line=argument.line), [argument])
    elif ptnode.data == "statements":
        statements = [toast(x) for x in ptnode.children if isinstance(x, lark.Tree)]   # skip NEWLINE tokens
        if len(statements) == 1:
            return statements[0]
        else:
            return Block(statements, line=statements[0].line)
    elif ptnode.data == "assignment":
        return Assign(str(ptnode.children[0]), toast(ptnode.children[1]), line=ptnode.children[0].line)
    elif ptnode.data in ("add", "sub", "mul", "div", "pos", "neg"):
        arguments = [toast(x) for x in ptnode.children]
        return Call(Symbol(str(ptnode.data), line=arguments[0].line), arguments)
    elif ptnode.data == "pow" and len(ptnode.children) == 2:
        arguments = [toast(ptnode.children[0]), toast(ptnode.children[1])]
        return Call(Symbol("pow", line=arguments[0].line), arguments)
    elif ptnode.data == "call" and len(ptnode.children) == 2:
        return Call(toast(ptnode.children[0]), toast(ptnode.children[1]))
    elif ptnode.data == "symbol":
        return Symbol(str(ptnode.children[0]), line=ptnode.children[0].line)
    elif ptnode.data == "literal":
        return Literal(float(ptnode.children[0]), line=ptnode.children[0].line)
    elif ptnode.data == "arglist":
        return [toast(x) for x in ptnode.children]
    else:
        return toast(ptnode.children[0])    # many other cases, all of them simple pass-throughs

In [29]:
print(toast(parser.parse("""
f := x => x**2
f(y)
""")))

{f := (x) => pow(x, 2.0); f(y)}


In [30]:
def run(astnode, symbols):
    if isinstance(astnode, Function):
        def function(*args):
            if len(args) != len(astnode.paramlist):
                raise UserError(astnode.line, "wrong number of arguments")
            symboltable = SymbolTable(symbols, **dict(zip(astnode.paramlist, args)))
            return run(astnode.body, symboltable)
        return function

    ######################################### from this point onward, it's the same as before...
    elif isinstance(astnode, Literal):
        return astnode.value
    elif isinstance(astnode, Symbol):
        return symbols[astnode.symbol]
    elif isinstance(astnode, Call):
        function = run(astnode.function, symbols)
        arguments = [run(x, symbols) for x in astnode.arguments]
        return function(*arguments)
    elif isinstance(astnode, Block):
        symboltable = SymbolTable(symbols)
        for statement in astnode.statements:
            last = run(statement, symboltable)
        return last
    elif isinstance(astnode, Assign):
        symbols[astnode.symbol] = run(astnode.value, symbols)
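The new `Function` branch turns a function definition into a Python closure. `function` captures `symbols`, the table that was active where the definition was *evaluated*, so the body looks up free variables there and not in the caller's scope. That is lexical scope. The sketch below reproduces the chaining idea in isolation. `Scope` is a hypothetical stand-in for the notebook's `SymbolTable`, assuming it falls back to a parent table on a lookup miss. `call_dynamic` is included only to show what dynamic scope would do instead.

```python
class Scope:
    """Hypothetical stand-in for the notebook's SymbolTable: local bindings
    plus an optional parent table consulted on a lookup miss."""

    def __init__(self, parent=None, **bindings):
        self.parent = parent
        self.bindings = dict(bindings)

    def __getitem__(self, name):
        if name in self.bindings:
            return self.bindings[name]
        if self.parent is not None:
            return self.parent[name]
        raise KeyError(name)

    def __setitem__(self, name, value):
        # assignment always binds locally, never in a parent table
        self.bindings[name] = value


def make_function(paramlist, body, defining_scope):
    """Like run()'s Function branch; here body is a Python callable taking a Scope."""
    def function(*args):
        if len(args) != len(paramlist):
            raise TypeError("wrong number of arguments")  # the notebook raises UserError
        # the call's scope is chained to the scope where the function was DEFINED
        return body(Scope(defining_scope, **dict(zip(paramlist, args))))
    return function


def call_dynamic(paramlist, body, caller_scope, *args):
    """For contrast only: dynamic scope resolves free variables in the CALLER's scope."""
    return body(Scope(caller_scope, **dict(zip(paramlist, args))))


def body(s):
    # f := x => x + y, where y is a free variable
    return s["x"] + s["y"]


global_scope = Scope(y=100)
f = make_function(["x"], body, global_scope)

caller_scope = Scope(global_scope, y=-1)   # a caller that happens to have its own y
lexical = f(1)                             # uses the y from where f was defined
dynamic = call_dynamic(["x"], body, caller_scope, 1)
print(lexical, dynamic)                    # 101 0
```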

In [31]:
symboltable = SymbolTable(**{**operator.__dict__, **math.__dict__})

run(toast(parser.parse("""
f := x => 2*x
f(y)
""")), SymbolTable(symboltable, y = 5))

10.0

In [32]:
# Now we can define "if" as a plain function: it takes zero-argument then/else functions
# and calls only the one it needs, which controls the order of evaluation.

symboltable = SymbolTable(**{**operator.__dict__, **math.__dict__})

symboltable["if"] = (lambda predicate, consequent, alternate:
                         consequent() if predicate else alternate())

symboltable["add"] = lambda x, y: show(x, y, operator.add)
symboltable["mul"] = lambda x, y: show(x, y, operator.mul)

run(toast(parser.parse("if x == x then () => 2 + 2 else () => 111 * 9")),
    SymbolTable(symboltable, x = 0))

<built-in function add> 2.0 2.0 4.0


4.0
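Why must the branches be wrapped as `() => ...`? `run` evaluates every argument of a `Call` before calling the function, which is strict evaluation. An `if` function that received plain values would therefore compute *both* branches. The Python sketch below makes this visible. The hypothetical `trace` list records which branch expressions actually ran.

```python
trace = []

def branch(label, value):
    # record that this branch expression was evaluated
    trace.append(label)
    return value

def if_strict(predicate, consequent, alternate):
    # both arguments were already evaluated by the caller
    return consequent if predicate else alternate

def if_lazy(predicate, consequent, alternate):
    # arguments are zero-argument functions ("thunks"); call only the chosen one
    return consequent() if predicate else alternate()

strict_result = if_strict(True, branch("then", 4), branch("else", 999))
strict_trace = list(trace)

trace.clear()
lazy_result = if_lazy(True, lambda: branch("then", 4), lambda: branch("else", 999))
lazy_trace = list(trace)

print(strict_result, strict_trace)   # 4 ['then', 'else']
print(lazy_result, lazy_trace)       # 4 ['then']
```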

In [33]:
# Passing functions as arguments allows us to create control structures that don't otherwise exist.

symboltable["for"] = lambda n, f: [f(i) for i in range(int(n))]

run(toast(parser.parse("for(6, i => i**2)")), SymbolTable(symboltable))

[0.0, 1.0, 4.0, 9.0, 16.0, 25.0]
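The same trick can build a loop whose condition is re-checked on every pass. The sketch below defines `while_` from function objects and recursion alone, with no Python loop statement. `while_` is a hypothetical helper and is not part of the notebook. Because the state is passed along from call to call rather than reassigned, each step returns a new `(i, total)` tuple.

```python
def while_(condition, body, state):
    """Hypothetical loop built from function objects alone (no Python loop).

    condition(state) is re-called before every pass, which is only possible
    because it is passed as an unevaluated function, not as a value.
    """
    if not condition(state):
        return state
    return while_(condition, body, body(state))

# sum the integers 1..10; state is (next_i, running_total)
final = while_(lambda s: s[0] <= 10,
               lambda s: (s[0] + 1, s[1] + s[0]),
               (1, 0))
print(final)   # (11, 55)
```

One design caveat: Python does not eliminate tail calls. Long loops written this way will hit `sys.getrecursionlimit()`, which defaults to 1000. A language built around this style would want tail-call optimization.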

Recently, we have been talking about **Domain Specific Languages** (DSLs) but referring to them as "declarative programming."

What makes a language **declarative** is its **evaluation order**, the thing we've just been manipulating with `if` and `for`.

   * **Strictly Evaluated** languages evaluate expressions in a fixed order: all of a function's arguments are evaluated before the function is called (as `run` does for every `Call`).
   
   * **Lazily Evaluated** languages eventually produce the same results but give the program more flexibility in deciding _when_ or _where_ the code will run. There are several kinds of objects representing a calculation that has not yet been executed:
      * An **AST** is the most powerful (Lisp's [quote](https://www.gnu.org/software/emacs/manual/html_node/elisp/Quoting.html) and C#'s [expression tree](https://docs.microsoft.com/en-us/dotnet/csharp/programming-guide/concepts/expression-trees/index)): you can edit the code as data!
      * Passing **function objects** is the fundamental idea of functional programming—you don't need the language to have control structures because you can build them yourself.
      * **Promises/futures** are placeholders for calculations running elsewhere—another thread or another computer.
   
   
   * **Declarative** languages produce the same output as strictly or lazily evaluated languages, but hide the distinction between them: _the order in which the code runs is an implementation detail_.
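Python offers all three kinds of unevaluated calculation from the list above, so it allows a compact comparison. The sketch below uses only the standard library (`ast`, `concurrent.futures`). The expression `x * 2 + 1` and the `+`-to-`-` rewrite are arbitrary examples, chosen to show that an AST can be edited before it runs.

```python
import ast
from concurrent.futures import ThreadPoolExecutor

x = 10

# 1. AST: code as data. Parse, then rewrite every "+" into "-" before running it.
class PlusToMinus(ast.NodeTransformer):
    def visit_BinOp(self, node):
        self.generic_visit(node)          # rewrite nested operations first
        if isinstance(node.op, ast.Add):
            node.op = ast.Sub()
        return node

tree = ast.parse("x * 2 + 1", mode="eval")
tree = ast.fix_missing_locations(PlusToMinus().visit(tree))
ast_result = eval(compile(tree, "<ast>", "eval"), {"x": x})   # 10 * 2 - 1

# 2. Function object: the calculation runs only when (and if) it is called.
def thunk():
    return x * 2 + 1

thunk_result = thunk()

# 3. Future: the calculation is handed to another thread; .result() waits for it.
with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(thunk)
    future_result = future.result()

print(ast_result, thunk_result, future_result)   # 19 21 21
```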

<br><br>

Examples of **declarative evaluation** include:

   * rendering HTML: the order of elements determines their placement on the page, but not the sequence of graphics commands that draws them
   * execution of SQL: the user's queries are rewritten and optimized by a query planner before they run
   * commands in a Makefile: only executed if their targets are out of date
   * cached function calls ("memoized"): the function body only runs if the result is not already cached; e.g. a cached HTTP GET request
   * machine code instructions in a CPU: modern processors reorder instructions ([out-of-order execution](https://en.wikipedia.org/wiki/Out-of-order_execution)) and sometimes execute them speculatively, before knowing whether they are needed.
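The memoization case is easy to demonstrate with `functools.lru_cache` from Python's standard library. The caller writes an ordinary call, and the cache decides whether the body actually runs. In this sketch, `fetch` is a hypothetical stand-in for an expensive operation such as an HTTP GET. It only counts how often its body executes.

```python
import functools

body_runs = 0

@functools.lru_cache(maxsize=None)
def fetch(url):
    # stand-in for an expensive call; count real executions of the body
    global body_runs
    body_runs += 1
    return f"contents of {url}"

for url in ["a.html", "b.html", "a.html", "a.html"]:
    fetch(url)

print(body_runs)                  # 2: each distinct URL ran once
print(fetch.cache_info().hits)    # 2: the repeated "a.html" calls came from the cache
```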

One place we might consider declarative evaluation is in hiding the distinction between columnar and fused array operations (ask me later).

<br><br>

**We have now seen all the essential elements of a programming language.**

   * **Parsing:** conversion of source code text into a **Parsing Tree**, then an **Abstract Syntax Tree** (AST).
   * **Interpreter:** runs the program by walking over the AST at runtime.
   * **Compiler:** converts the program to another language, such as machine instructions, and runs that.
   * **Expressions:** nested elements that each evaluate to a value, like the parts of a mathematical formula.
      * **Literal:** values in the source code text.
      * **Symbol:** named reference to a value.
      * **Function Call:** application of a function to its arguments (including binary operators, which our AST represents as calls such as `add(x, y)`).
      * **Function Definition:** creation of a new function (named or unnamed).
   * **Statements:** sequential elements defining a process.
      * **Assign:** creation or replacement of a named reference (possibly to a function).
   * **Symbol Scope:** which parts of a program can see a given binding of a symbol to a value (we only considered **Lexical Scope**).
   * **Evaluation Order:** temporal order in which expressions and statements are evaluated.
      * **Strict/Eager Evaluation:** the usual order; the arguments of a function call are evaluated before the call itself.
      * **Lazy Evaluation:** pass unevaluated code (e.g. function definition) to let the called function decide when to evaluate.
      * **Declarative:** evaluation order is not visible to the programmer.
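The sketch below makes the interpreter/compiler distinction concrete. It *compiles* a tiny expression AST to Python source text and lets Python run it, instead of walking the tree at runtime. The nested-tuple AST format (`("call", function, args)`, `("lambda", params, body)`, and so on) is a hypothetical simplification and is not the notebook's AST classes. A real compiler would also validate symbol names rather than pasting them into the output.

```python
import math
import operator

def compile_expr(node):
    """Translate a tuple-based AST into a Python expression string."""
    kind = node[0]
    if kind == "literal":
        return repr(float(node[1]))
    elif kind == "symbol":
        return node[1]
    elif kind == "call":
        function, arguments = node[1], node[2]
        args = ", ".join(compile_expr(a) for a in arguments)
        return f"{compile_expr(function)}({args})"
    elif kind == "lambda":
        params, body = node[1], node[2]
        return f"(lambda {', '.join(params)}: {compile_expr(body)})"
    raise ValueError(f"unknown node kind: {kind}")

# AST for: (x => pow(x, 2))(y)
tree = ("call",
        ("lambda", ["x"], ("call", ("symbol", "pow"), [("symbol", "x"), ("literal", 2)])),
        [("symbol", "y")])

source = compile_expr(tree)
print(source)                     # (lambda x: pow(x, 2.0))(y)

# run the compiled code against the same kind of builtins table the notebook uses
namespace = {**operator.__dict__, **math.__dict__, "y": 5}
print(eval(source, namespace))    # 25.0
```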