In [None]:
from IPython.display import HTML
HTML(open('../style.css').read())

In [None]:
%load_ext nb_mypy

# A Recursive Parser for Arithmetic Expressions

In this notebook we implement a simple *recursive descent* parser for arithmetic expressions.
This parser will implement the following grammar:
$$
  \begin{eqnarray*}
  \mathrm{expr}        & \rightarrow & \mathrm{product}\;\;\mathrm{exprRest}            \\[0.2cm]
  \mathrm{exprRest}    & \rightarrow & \texttt{'+'} \;\;\mathrm{product}\;\;\mathrm{exprRest}   \\
                       & \mid        & \texttt{'-'} \;\;\mathrm{product}\;\;\mathrm{exprRest}   \\
                       & \mid        & \lambda                                      \\[0.2cm]
  \mathrm{product}     & \rightarrow & \mathrm{factor}\;\;\mathrm{productRest}          \\[0.2cm]
  \mathrm{productRest} & \rightarrow & \texttt{'*'} \;\;\mathrm{factor}\;\;\mathrm{productRest} \\
                       & \mid        & \texttt{'/'} \;\;\mathrm{factor}\;\;\mathrm{productRest} \\
                       & \mid        & \lambda                                      \\[0.2cm]
  \mathrm{factor}      & \rightarrow & \texttt{'('} \;\;\mathrm{expr} \;\;\texttt{')'}                \\
                       & \mid        & \texttt{NUMBER} 
  \end{eqnarray*}
$$
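
To see how the grammar encodes operator precedence, here is a leftmost derivation of the token sequence for `1 + 2 * 3`, written out as a worked example. The product `2 * 3` is generated entirely inside the second $\mathrm{product}$, which is why it binds more tightly than the addition.
$$
  \begin{eqnarray*}
  \mathrm{expr} & \Rightarrow & \mathrm{product}\;\;\mathrm{exprRest} \\
                & \Rightarrow & \mathrm{factor}\;\;\mathrm{productRest}\;\;\mathrm{exprRest} \\
                & \Rightarrow & \texttt{NUMBER}\;\;\mathrm{productRest}\;\;\mathrm{exprRest} \\
                & \Rightarrow & \texttt{NUMBER}\;\;\mathrm{exprRest} \\
                & \Rightarrow & \texttt{NUMBER}\;\;\texttt{'+'}\;\;\mathrm{product}\;\;\mathrm{exprRest} \\
                & \Rightarrow & \texttt{NUMBER}\;\;\texttt{'+'}\;\;\mathrm{factor}\;\;\mathrm{productRest}\;\;\mathrm{exprRest} \\
                & \Rightarrow & \texttt{NUMBER}\;\;\texttt{'+'}\;\;\texttt{NUMBER}\;\;\mathrm{productRest}\;\;\mathrm{exprRest} \\
                & \Rightarrow & \texttt{NUMBER}\;\;\texttt{'+'}\;\;\texttt{NUMBER}\;\;\texttt{'*'}\;\;\mathrm{factor}\;\;\mathrm{productRest}\;\;\mathrm{exprRest} \\
                & \Rightarrow & \texttt{NUMBER}\;\;\texttt{'+'}\;\;\texttt{NUMBER}\;\;\texttt{'*'}\;\;\texttt{NUMBER}\;\;\mathrm{productRest}\;\;\mathrm{exprRest} \\
                & \Rightarrow & \texttt{NUMBER}\;\;\texttt{'+'}\;\;\texttt{NUMBER}\;\;\texttt{'*'}\;\;\texttt{NUMBER}\;\;\mathrm{exprRest} \\
                & \Rightarrow & \texttt{NUMBER}\;\;\texttt{'+'}\;\;\texttt{NUMBER}\;\;\texttt{'*'}\;\;\texttt{NUMBER}
  \end{eqnarray*}
$$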

## Implementing a Scanner

We implement a scanner with the help of the module `re`.

In [None]:
import re

The function `tokenize` receives a string `s` as argument and returns a list of tokens.
The string `s` is supposed to represent an arithmetical expression. 

**Note:** 
1. We need to set the flag `re.VERBOSE` in our call of the function `findall`
   below.  Without this flag we could neither spread the regular expression `lexSpec`
   over several lines nor add comments inside it.
2. The regular expression `lexSpec` contains no parenthesized groups.  Therefore,
   `findall` returns a list of strings, one for each match.  Since the last alternative
   of `lexSpec` is empty, some of these strings are empty.  They are skipped together with the white space.
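
The shape of the list returned by `findall` depends on the number of groups in the pattern. The following sketch illustrates this with two toy patterns that are made up for this example and are not the `lexSpec` used in this notebook.

```python
import re

# No groups: findall returns the matched substrings themselves.
plain = re.findall(r'[0-9]+|[-+]', '1+2')

# Two groups: findall returns one 2-tuple per match; the group that
# did not take part in the match contributes an empty string.
grouped = re.findall(r'([0-9]+)|([-+])', '1+2')

print(plain)    # ['1', '+', '2']
print(grouped)  # [('1', ''), ('', '+'), ('2', '')]
```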

In [None]:
def tokenize(s: str) -> list[str]:
    '''Transform the string s into a list of tokens.  The string s
       is supposed to represent an arithmetic expression.
    '''
    lexSpec = r'''[ \t]+        |  # blanks and tabs
                  [1-9][0-9]*|0 |  # numbers
                  [-+*/()]      |  # arithmetical operators and parentheses
               '''
    tokenList = re.findall(lexSpec, s, re.VERBOSE)
    result: list[str] = []
    for token in tokenList:
        if token == '' or token[0] in [' ', '\t']:        # skip empty matches, blanks, and tabs
            continue
        result.append(token)
    return result

In [None]:
tokenize('123 + (234 +  345 - 2**0)/7')
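
Because the last alternative in `lexSpec` is empty, a character that matches none of the other alternatives produces an empty match and is dropped silently. If such characters should be reported instead, one possible approach is to scan the string position by position. The function `tokenize_strict` and its group names below are hypothetical and only sketch this idea; they are not part of the notebook.

```python
import re

def tokenize_strict(s: str) -> list[str]:
    '''Hypothetical variant of tokenize that rejects unknown characters.'''
    lex_spec = r'''(?P<blank>  [ \t]+        ) |  # blanks and tabs
                   (?P<number> [1-9][0-9]*|0 ) |  # numbers
                   (?P<op>     [-+*/()]      )    # operators and parentheses
                '''
    scanner = re.compile(lex_spec, re.VERBOSE)
    result: list[str] = []
    pos = 0
    while pos < len(s):
        m = scanner.match(s, pos)       # try to match a token starting at pos
        if m is None:
            raise ValueError(f'unexpected character {s[pos]!r} at position {pos}')
        if m.lastgroup != 'blank':      # white space is matched but not kept
            result.append(m.group())
        pos = m.end()                   # every alternative consumes at least one character
    return result

print(tokenize_strict('12 + (3*4)'))  # ['12', '+', '(', '3', '*', '4', ')']
```

With this sketch, a call like `tokenize_strict('1 $ 2')` raises a `ValueError` instead of returning `['1', '2']`.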

## Implementing the Recursive Descent Parser

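The parsing functions defined below are mutually recursive.  The next cell contains stub declarations so that `mypy` knows their signatures before the real definitions are given.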

In [None]:
def parseExpr(TL: list[str]) -> tuple[float, list[str]]:
    return None # type: ignore

def parseExprRest(Sum: float, TL: list[str]) -> tuple[float, list[str]]:
    return None # type: ignore

def parseProduct(TL: list[str]) -> tuple[float, list[str]]:
    return None # type: ignore

def parseProductRest(product: float, TL: list[str]) -> tuple[float, list[str]]:
    return None # type: ignore

def parseFactor(TL: list[str]) -> tuple[float, list[str]]:
    return None # type: ignore
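
At run time, Python itself does not require such stubs: a name used inside a function body is looked up only when the call is executed. The stubs presumably serve the type checker, which sees the notebook one cell at a time. The small sketch below uses the made-up functions `is_even` and `is_odd` to illustrate that mutual recursion works without forward declarations.

```python
def is_even(n: int) -> bool:
    # is_odd is not defined yet when this def statement runs.  That is fine,
    # because the name is looked up only when the call is executed.
    return True if n == 0 else is_odd(n - 1)

def is_odd(n: int) -> bool:
    return False if n == 0 else is_even(n - 1)

print(is_even(10))  # True
```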

The function `parse` takes a string `s` as input and parses this string according to the grammar
shown above.  The function returns the floating point number that results from evaluating the expression given in the string `s`. 

In [None]:
def parse(s: str) -> float:
    TL           = tokenize(s)
    result, Rest = parseExpr(TL)
    assert Rest == [], f'Parse Error: could not parse {TL}'
    return result

The function `parseExpr` implements the following grammar rule:
$$ \mathrm{expr} \rightarrow \;\mathrm{product}\;\;\mathrm{exprRest} $$
It takes a token list `TL` as its input and returns a pair of the form `(value, Rest)` where
- `value` is the result of evaluating the arithmetical expression
  that is represented by `TL` and
- `Rest` is a list of those tokens that have not been consumed during the parse process. 

In [None]:
def parseExpr(TL: list[str]) -> tuple[float, list[str]]:
    product, Rest = parseProduct(TL)
    return parseExprRest(product, Rest)

The function `parseExprRest` implements the following grammar rules:
$$
  \begin{eqnarray*}
  \mathrm{exprRest}    & \rightarrow & \texttt{'+'} \;\;\mathrm{product}\;\;\mathrm{exprRest}   \\
                       & \mid        & \texttt{'-'} \;\;\mathrm{product}\;\;\mathrm{exprRest}   \\
                       & \mid        & \;\lambda                                     \\[0.2cm]
  \end{eqnarray*}
$$
It takes two arguments:
- `Sum` is the value that has already been computed from the tokens parsed so far,
- `TL` is the list of tokens that still need to be consumed.

It returns a pair of the form `(value, Rest)` where
- `value` is the result of adding to `Sum`, or subtracting from it, the products
  that are parsed from the front of `TL` and
- `Rest` is a list of those tokens that have not been consumed during the parse process. 

In [None]:
def parseExprRest(Sum: float, TL: list[str]) -> tuple[float, list[str]]:
    match TL:
        case []:
            return Sum, []
        case '+', *RL:
            product, Rest = parseProduct(RL)
            return parseExprRest(Sum + product, Rest)
        case '-', *RL:
            product, Rest = parseProduct(RL)
            return parseExprRest(Sum - product, Rest)
        case _:
            return Sum, TL

The function `parseProduct` implements the following grammar rule:
$$ \mathrm{product} \rightarrow \;\mathrm{factor}\;\;\mathrm{productRest} $$

It takes one argument:
- `TL` is the list of tokens that need to be consumed.

It returns a pair of the form `(value, Rest)` where
- `value` is the result of evaluating the arithmetical expression
  that is represented by `TL` and
- `Rest` is a list of those tokens that have not been consumed while trying to parse a product.

In [None]:
def parseProduct(TL: list[str]) -> tuple[float, list[str]]:
    factor, Rest = parseFactor(TL)
    return parseProductRest(factor, Rest)

The function `parseProductRest` implements the following grammar rules:
$$
  \begin{eqnarray*}
  \mathrm{productRest} & \rightarrow & \texttt{'*'} \;\;\mathrm{factor}\;\;\mathrm{productRest} \\
                       & \mid        & \texttt{'/'} \;\;\mathrm{factor}\;\;\mathrm{productRest} \\
                       & \mid        & \;\lambda    \\                                  
  \end{eqnarray*}
$$

It takes two arguments:
- `product` is the value that has already been computed from the tokens parsed so far,
- `TL` is the list of tokens that still need to be consumed.

It returns a pair of the form `(value, Rest)` where
- `value` is the result of multiplying or dividing `product` by the factors
  that are parsed from the front of `TL` and
- `Rest` is a list of those tokens that have not been consumed while trying to parse the rest of a product.

In [None]:
def parseProductRest(product: float, TL: list[str]) -> tuple[float, list[str]]:
    match TL:
        case []:
            return product, []
        case '*', *RL: 
            factor, Rest = parseFactor(RL)
            return parseProductRest(product * factor, Rest)
        case '/', *RL:
            factor, Rest = parseFactor(RL)
            return parseProductRest(product / factor, Rest)
        case _:
            return product, TL

The function `parseFactor` implements the following grammar rules:
$$
  \begin{eqnarray*}
  \mathrm{factor}      & \rightarrow & \texttt{'('} \;\;\mathrm{expr} \;\;\texttt{')'}                \\
                       & \mid        & \;\texttt{NUMBER} 
  \end{eqnarray*}
$$

It takes one argument:
- `TL` is the list of tokens that still need to be consumed.

It returns a pair of the form `(value, Rest)` where
- `value` is the result of evaluating the arithmetical expression
  that is represented by `TL` and
- `Rest` is a list of those tokens that have not been consumed while trying to parse a factor.

In [None]:
def parseFactor(TL: list[str]) -> tuple[float, list[str]]:
    match TL:
        case '(', *RL: 
            expr, Rest = parseExpr(RL)
            assert Rest[:1] == [')'], 'Parse Error: expected ")"'  # also handles an empty Rest
            return expr, Rest[1:]
        case _: 
            return float(TL[0]), TL[1:]

## Testing

In [None]:
def test(s: str) -> float:
    r1 = parse(s)
    r2 = eval(s)
    assert r1 == r2
    return r1

In [None]:
test('11+22*(33-44)/(5-10*5/(4-3))')

In [None]:
test('0*11+22*(33-44)/(5-10*5/(4-3))')

In [None]:
test('2-3-4')