In [None]:
from IPython.display import HTML
HTML(open('../style.css').read())

In [None]:
%load_ext nb_mypy

# An EBNF based Parser for Arithmetic Expressions

In this notebook we implement a recursive-descent parser for arithmetic expressions that is based on an <span style="font-variant:small-caps;">Ebnf</span> grammar.  The parser evaluates the expression while parsing it and implements the following <span style="font-variant:small-caps;">Ebnf</span> grammar:
$$
  \begin{eqnarray*}
  \mathrm{expr}    & \rightarrow & \mathrm{product}\;\;\bigl((\texttt{'+'}\;|\;\texttt{'-'})\;\; \mathrm{product}\bigr)^* \\[0.2cm]
  \mathrm{product} & \rightarrow & \mathrm{factor} \;\;\bigl((\texttt{'*'}\;|\;\texttt{'/'})\;\; \mathrm{factor}\bigr)^*  \\[0.2cm]   
  \mathrm{factor}  & \rightarrow & \texttt{'('} \;\;\mathrm{expr} \;\;\texttt{')'}                             \\
                   & \mid        & \texttt{NUMBER} 
  \end{eqnarray*}
$$
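
As a worked example (the input `2 + 3 * 4` is chosen here purely for illustration), the string can be derived as shown below, where each step combines several rule applications:
$$
  \begin{eqnarray*}
  \mathrm{expr} & \Rightarrow & \mathrm{product}\;\;\texttt{'+'}\;\;\mathrm{product} \\[0.2cm]
                & \Rightarrow & \mathrm{factor}\;\;\texttt{'+'}\;\;\mathrm{factor}\;\;\texttt{'*'}\;\;\mathrm{factor} \\[0.2cm]
                & \Rightarrow & \texttt{NUMBER}\;\;\texttt{'+'}\;\;\texttt{NUMBER}\;\;\texttt{'*'}\;\;\texttt{NUMBER}
  \end{eqnarray*}
$$
Since `3 * 4` is derived from a single $\mathrm{product}$, the multiplication is grouped before the addition.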

## The Scanner

In [None]:
import re

The function `tokenize` receives a string `s` as argument and returns a list of tokens.
The string `s` is supposed to represent an arithmetical expression. 

**Note:** 
 - We need to set the flag `re.VERBOSE` in our call of the function `findall`
   below because otherwise we would not be able to spread the regular expression `lexSpec` 
   over several lines or to add comments inside it.
 - Since the last alternative of `lexSpec` is empty, `findall` returns an empty string at every 
   position where no token starts, for example at white space.  Hence `tokenList` will contain 
   lots of empty strings.  These have to be removed.
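
The effect described in the second note can be observed in isolation.  The following is a minimal sketch: the names `LEX_SPEC`, `raw`, and `tokens` are chosen here for illustration, and the pattern is assumed to mirror the one used in `tokenize`.

```python
import re

# Illustrative copy of the pattern; the trailing '|' leaves an empty last alternative.
LEX_SPEC = r'''[1-9][0-9]*|0 |  # numbers
               [-+*/()]      |  # arithmetical operators and parentheses
            '''

# findall reports an empty match wherever no token starts, e.g. at a blank.
raw = re.findall(LEX_SPEC, '1 + 20', re.VERBOSE)
print(raw)

# Dropping the empty strings leaves only the real tokens.
tokens = [t for t in raw if t != '']
print(tokens)
```

The first list contains empty strings in addition to the tokens, while the second list contains only the tokens `'1'`, `'+'`, and `'20'`.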

In [None]:
def tokenize(s: str) -> list[str]:
    lexSpec = r'''[1-9][0-9]*|0 |  # numbers
                  [-+*/()]      |  # arithmetical operators and parentheses
               '''
    tokenList = re.findall(lexSpec, s, re.VERBOSE)
    return [t for t in tokenList if t != '']

In [None]:
print(tokenize('12 * 13 + 14 * 4 / 6 - 7'))

## Implementing the Recursive Descent Parser

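Since the functions `parseExpr`, `parseProduct`, and `parseFactor` are mutually recursive,
we declare their types in a forward declaration.  This way the type checker loaded via `nb_mypy` 
knows the signature of each function before it is used.  The stub bodies are replaced by the 
real definitions later in this notebook.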

In [None]:
def parseExpr(TL: list[str]) -> tuple[float, list[str]]:
    return None # type: ignore

def parseProduct(TL: list[str]) -> tuple[float, list[str]]:
    return None # type: ignore

def parseFactor(TL: list[str]) -> tuple[float, list[str]]:
    return None # type: ignore

The function `parse` takes a string `s` as input, parses this string according to the grammar
shown above, and returns the value of the expression.  If some tokens are left over after parsing,
the assertion fails.

In [None]:
def parse(s: str) -> float:
    TL           = tokenize(s)
    result, Rest = parseExpr(TL)
    assert Rest == [], f'Parse Error: could not parse {TL}, Rest = {Rest}'
    return result

The function `parseExpr(TL)` takes a list of tokens `TL` and tries to parse an expression according to the following
<span style="font-variant:small-caps;">Ebnf</span> grammar rule:
$$ \mathrm{expr} \;\rightarrow\; \mathrm{product}\;\;\bigl((\texttt{'+'}\;|\;\texttt{'-'})\;\; \mathrm{product}\bigr)^* $$
It returns the value of the expression and a list of all the tokens that have not been consumed during parsing.

In [None]:
def parseExpr(TL: list[str]) -> tuple[float, list[str]]:
    result, Rest = parseProduct(TL)
    while len(Rest) >= 2 and Rest[0] in {'+', '-'}: 
        operator = Rest[0]
        arg, Rest = parseProduct(Rest[1:])
        if operator == '+': 
            result += arg
        else:             # operator == '-': 
            result -= arg
    return result, Rest
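
The `while` loop accumulates the result from left to right, so `+` and `-` are treated as left-associative.  The following self-contained sketch illustrates this on a flat token list; the helper `eval_sum` is hypothetical and only handles numbers separated by `+` and `-`.

```python
def eval_sum(tokens: list[str]) -> float:
    """Evaluate a flat token list such as ['8', '-', '3', '-', '2'] from left to right."""
    result = float(tokens[0])
    rest   = tokens[1:]
    # Same loop shape as in parseExpr: consume one operator and one operand per iteration.
    while len(rest) >= 2 and rest[0] in {'+', '-'}:
        operator, operand = rest[0], float(rest[1])
        rest = rest[2:]
        if operator == '+':
            result += operand
        else:
            result -= operand
    return result

print(eval_sum(['8', '-', '3', '-', '2']))  # 3.0, i.e. (8 - 3) - 2
```

A right-associative reading, `8 - (3 - 2)`, would yield `7.0` instead.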

The function `parseProduct(TL)` takes a list of tokens `TL` and tries to parse a product according to the following
<span style="font-variant:small-caps;">Ebnf</span> grammar rule:
$$ \mathrm{product} \;\rightarrow\; \mathrm{factor} \;\;\bigl((\texttt{'*'}\;|\;\texttt{'/'})\;\; \mathrm{factor}\bigr)^*  $$
It returns the value of the product and a list of all the tokens that have not been consumed during parsing.

In [None]:
def parseProduct(TL: list[str]) -> tuple[float, list[str]]:
    result, Rest = parseFactor(TL)
    while len(Rest) >= 2 and Rest[0] in {'*', '/'}:
        operator = Rest[0]
        arg, Rest = parseFactor(Rest[1:])
        if operator == '*':
            result *= arg
        else:             # operator == '/':
            result /= arg
    return result, Rest

The function `parseFactor` implements the following grammar rules:
$$
  \begin{eqnarray*}
  \mathrm{factor}      & \;\rightarrow\; & \texttt{'('} \;\;\mathrm{expr} \;\;\texttt{')'}                \\
                       & \;\mid          & \;\texttt{NUMBER} 
  \end{eqnarray*}
$$

It takes one argument:
- `TL` is the list of tokens that still need to be consumed.

It returns a pair of the form `(value, Rest)` where
- `value` is the result of evaluating the factor
  that is found at the beginning of `TL` and
- `Rest` is a list of those tokens that have not been consumed while trying to parse a factor.
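
The convention of returning a pair `(value, Rest)` can be sketched with a simplified helper.  The function `parse_number` is hypothetical and assumes that the first token is a number; it is not part of the parser itself.

```python
def parse_number(tokens: list[str]) -> tuple[float, list[str]]:
    """Consume the first token as a number and return the tokens that are left."""
    return float(tokens[0]), tokens[1:]

value, rest = parse_number(['12', '*', '13'])
print(value)  # 12.0
print(rest)   # ['*', '13']
```

The caller continues parsing with `rest`, which is how the unconsumed tokens are passed from one parsing function to the next.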

In [None]:
def parseFactor(TL: list[str]) -> tuple[float, list[str]]:
    assert TL != [], 'ERROR: unexpected end of input'
    if TL[0] == '(': 
        expr, Rest = parseExpr(TL[1:])
        # Rest[:1] is safe even if Rest is empty, unlike Rest[0]
        assert Rest[:1] == [')'], f"ERROR: ')' expected, got {Rest[:1]}"
        return expr, Rest[1:]
    else:
        return float(TL[0]), TL[1:]

## Testing

In [None]:
def test(s: str) -> float:
    r1 = parse(s)
    r2 = eval(s)
    assert r1 == r2
    return r1

In [None]:
parse('12 * 13 + 14 * 4 / 6 - 7')

In [None]:
test('11+22*(33-44)/(5-10*5/(4-3))')

In [None]:
test('0*11+22*(33-44)/(5-10*5/(4-3))')