In [None]:
from IPython.core.display import HTML
with open("../style.css", "r") as file:
    css = file.read()
HTML(css)

# A Recursive Parser for Arithmetic Expressions

In this notebook we implement a simple *recursive descent* parser for arithmetic expressions.
This parser will implement the following grammar:
$$
  \begin{eqnarray*}
  \mathrm{expr}        & \rightarrow & \mathrm{product}\;\;\mathrm{exprRest}            \\[0.2cm]
  \mathrm{exprRest}    & \rightarrow & \texttt{'+'} \;\;\mathrm{product}\;\;\mathrm{exprRest}   \\
                       & \mid        & \texttt{'-'} \;\;\mathrm{product}\;\;\mathrm{exprRest}   \\
                       & \mid        & \varepsilon                                      \\[0.2cm]
  \mathrm{product}     & \rightarrow & \mathrm{factor}\;\;\mathrm{productRest}          \\[0.2cm]
  \mathrm{productRest} & \rightarrow & \texttt{'*'} \;\;\mathrm{factor}\;\;\mathrm{productRest} \\
                       & \mid        & \texttt{'/'} \;\;\mathrm{factor}\;\;\mathrm{productRest} \\
                       & \mid        & \varepsilon                                      \\[0.2cm]
  \mathrm{factor}      & \rightarrow & \texttt{'('} \;\;\mathrm{expr} \;\;\texttt{')'}                \\
                       & \mid        & \texttt{NUMBER} 
  \end{eqnarray*}
$$
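
To see how the rules fit together, here is a derivation of the token sequence `NUMBER '+' NUMBER` (for example `1 + 2`). It is only an illustration worked out by hand from the grammar above; every step replaces the leftmost variable.
$$
  \begin{eqnarray*}
  \mathrm{expr} & \Rightarrow & \mathrm{product}\;\;\mathrm{exprRest} \\
                & \Rightarrow & \mathrm{factor}\;\;\mathrm{productRest}\;\;\mathrm{exprRest} \\
                & \Rightarrow & \texttt{NUMBER}\;\;\mathrm{productRest}\;\;\mathrm{exprRest} \\
                & \Rightarrow & \texttt{NUMBER}\;\;\mathrm{exprRest} \\
                & \Rightarrow & \texttt{NUMBER}\;\;\texttt{'+'}\;\;\mathrm{product}\;\;\mathrm{exprRest} \\
                & \Rightarrow & \texttt{NUMBER}\;\;\texttt{'+'}\;\;\mathrm{factor}\;\;\mathrm{productRest}\;\;\mathrm{exprRest} \\
                & \Rightarrow & \texttt{NUMBER}\;\;\texttt{'+'}\;\;\texttt{NUMBER}\;\;\mathrm{productRest}\;\;\mathrm{exprRest} \\
                & \Rightarrow & \texttt{NUMBER}\;\;\texttt{'+'}\;\;\texttt{NUMBER}\;\;\mathrm{exprRest} \\
                & \Rightarrow & \texttt{NUMBER}\;\;\texttt{'+'}\;\;\texttt{NUMBER}
  \end{eqnarray*}
$$
The steps that make a variable disappear use the rules $\mathrm{productRest} \rightarrow \varepsilon$ and $\mathrm{exprRest} \rightarrow \varepsilon$.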

## Implementing a Scanner

We implement a scanner with the help of the module `re`.

In [None]:
import re

The function `tokenize` receives a string `s` as argument and returns a list of tokens.
The string `s` is supposed to represent an arithmetical expression. 

**Note:** 
1. The flag `re.VERBOSE` tells `findall` to ignore whitespace and `#` comments inside the
   pattern (except inside character classes).  Without this flag we could not spread the
   regular expression `lexSpec` over several commented lines.
2. The regular expression `lexSpec` contains 5 parenthesized groups.  Therefore,
   `findall` returns a list of 5-tuples where the 5 components correspond to the 5
   groups of the regular expression.  Since the groups are alternatives, exactly one
   component of each tuple is non-empty.
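
The following small sketch illustrates both points on a pattern with only two groups. The names `pattern` and `matches` are chosen for this illustration and are not used elsewhere in the notebook.

```python
import re

# Thanks to re.VERBOSE, the whitespace and the comments inside the
# pattern are ignored by the regular expression engine.
pattern = r'''([0-9]+)  |  # group 1: a number
              ([-+*/])     # group 2: an arithmetical operator
           '''

# With two groups, findall returns a list of pairs.  In every pair
# only the component of the group that matched is non-empty.
matches = re.findall(pattern, '12+3', re.VERBOSE)
print(matches)  # [('12', ''), ('', '+'), ('3', '')]
```

The function `tokenize` relies on this: it checks which component of each tuple is non-empty to decide what kind of token it has found.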

In [None]:
def tokenize(s):
    '''Transform the string s into a list of tokens.  The string s
       is supposed to represent an arithmetic expression.
    '''
    lexSpec = r'''([ \t]+)        |  # blanks and tabs
                  ([1-9][0-9]*|0) |  # number
                  ([()])          |  # parentheses 
                  ([-+*/])        |  # arithmetical operators
                  (.)                # unrecognized character
               '''
    tokenList = re.findall(lexSpec, s, re.VERBOSE)
    result    = []
    # print(tokenList)
    for ws, number, parenthesis, operator, error in tokenList:
        if ws:        # skip blanks and tabs
            continue
        elif number:
            result.append(number)
        elif parenthesis:
            result.append(parenthesis)
        elif operator:
            result.append(operator)
        elif error:
            result.append(f'ERROR({error})')
    return result

In [None]:
tokenize('1 + (2 + @ 34 - 2**0)/7')

## Implementing the Recursive Descent Parser

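The function `parse` takes a string `s` as input, parses it according to the grammar
shown above, and returns the value of the arithmetic expression.  If some tokens are left
over after parsing, the assertion fails with a parse error.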

In [None]:
def parse(s):
    TL           = tokenize(s)
    result, Rest = parseExpr(TL)
    assert Rest == [], f'Parse Error: could not parse {TL}'
    return result

The function `parseExpr` implements the following grammar rule:
$$ \mathrm{expr} \rightarrow \;\mathrm{product}\;\;\mathrm{exprRest} $$
It takes a token list `TL` as its input and returns a pair of the form `(value, Rest)` where
- `value` is the result of evaluating the arithmetical expression
  that is represented by `TL` and
- `Rest` is a list of those tokens that have not been consumed during the parse process. 
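
All parsing functions in this notebook follow this `(value, Rest)` convention, which is what allows them to be chained. As a minimal sketch of the idea, the following two helpers handle only the token sequence `NUMBER '+' NUMBER`. Both `parse_number` and `parse_sum_of_two` are hypothetical names introduced for this illustration; they are not part of the parser developed here.

```python
def parse_number(tokens):
    """Hypothetical helper: consume one NUMBER token from the front."""
    return int(tokens[0]), tokens[1:]

def parse_sum_of_two(tokens):
    """Hypothetical helper: parse NUMBER '+' NUMBER by chaining the pairs."""
    left, rest = parse_number(tokens)       # rest starts after the first number
    assert rest[:1] == ['+'], "expected '+'"
    right, rest = parse_number(rest[1:])    # skip the '+' and continue
    return left + right, rest

value, rest = parse_sum_of_two(['1', '+', '2', ')'])
print(value, rest)  # 3 [')']
```

The token `')'` is not consumed, so it is handed back to the caller, who decides what to do with it.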

In [None]:
def parseExpr(TL):
    product, Rest = parseProduct(TL)
    return parseExprRest(product, Rest)

The function `parseExprRest` implements the following grammar rules:
$$
  \begin{eqnarray*}
  \mathrm{exprRest}    & \rightarrow & \texttt{'+'} \;\;\mathrm{product}\;\;\mathrm{exprRest}   \\
                       & \mid        & \texttt{'-'} \;\;\mathrm{product}\;\;\mathrm{exprRest}   \\
                       & \mid        & \;\varepsilon                                      \\[0.2cm]
  \end{eqnarray*}
$$
It takes two arguments:
- `sum` is the value that has already been parsed,
- `TL` is the list of tokens that still need to be consumed.

It returns a pair of the form `(value, Rest)` where
- `value` is the result of adding to `sum`, or subtracting from `sum`, 
  the products that are parsed from the beginning of `TL` and
- `Rest` is a list of those tokens that have not been consumed during the parse process. 
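
Passing the value parsed so far as an argument makes the operators left-associative. The following sketch shows this for subtraction only and for operands that are plain numbers. The function `sub_rest` is a hypothetical, simplified stand-in written for this illustration.

```python
def sub_rest(acc, tokens):
    """Hypothetical helper: handle a sequence of the form '-' NUMBER '-' NUMBER ..."""
    if tokens and tokens[0] == '-':
        # Subtract the next number from the accumulated value first,
        # then continue with the remaining tokens.
        return sub_rest(acc - int(tokens[1]), tokens[2:])
    return acc, tokens

value, rest = sub_rest(10, ['-', '4', '-', '3'])
print(value, rest)  # 3 []
```

The result is `(10 - 4) - 3 = 3` and not `10 - (4 - 3) = 9`, which agrees with the usual convention for subtraction.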

In [None]:
def parseExprRest(sum, TL):
    if TL == []:
        return sum, []
    elif TL[0] == '+':
        product, Rest = parseProduct(TL[1:])
        return parseExprRest(sum + product, Rest)
    elif TL[0] == '-':
        product, Rest = parseProduct(TL[1:])
        return parseExprRest(sum - product, Rest)
    else:
        return sum, TL

The function `parseProduct` implements the following grammar rule:
$$ \mathrm{product} \rightarrow \;\mathrm{factor}\;\;\mathrm{productRest} $$

It takes one argument:
- `TL` is the list of tokens that need to be consumed.

It returns a pair of the form `(value, Rest)` where
- `value` is the result of evaluating the arithmetical expression
  that is represented by `TL` and
- `Rest` is a list of those tokens that have not been consumed while trying to parse a product.

In [None]:
def parseProduct(TL):
    factor, Rest = parseFactor(TL)
    return parseProductRest(factor, Rest)

The function `parseProductRest` implements the following grammar rules:
$$
  \begin{eqnarray*}
  \mathrm{productRest} & \rightarrow & \texttt{'*'} \;\;\mathrm{factor}\;\;\mathrm{productRest} \\
                       & \mid        & \texttt{'/'} \;\;\mathrm{factor}\;\;\mathrm{productRest} \\
                       & \mid        & \;\varepsilon    \\                                  
  \end{eqnarray*}
$$

It takes two arguments:
- `product` is the value that has already been parsed,
- `TL` is the list of tokens that still need to be consumed.

It returns a pair of the form `(value, Rest)` where
- `value` is the result of multiplying or dividing `product` by
  the factors that are parsed from the beginning of `TL` and
- `Rest` is a list of those tokens that have not been consumed while trying to parse the rest of a product.
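
Note that division is carried out with Python's operator `/`, which is true division. Therefore, the value computed for an expression that contains a division is a floating point number, even if the division leaves no remainder. The short sketch below only demonstrates this behaviour of Python; the variable names are chosen for the illustration.

```python
quotient = 6 / 3        # true division always yields a float
print(quotient, type(quotient).__name__)  # 2.0 float

floor_quotient = 7 // 2  # floor division, shown only for contrast
print(floor_quotient)    # 3
```

Since `eval` interprets `/` in the same way, a comparison against `eval` is not affected by this.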

In [None]:
def parseProductRest(product, TL):
    if TL == []:
        return product, []
    elif TL[0] == '*': 
        factor, Rest = parseFactor(TL[1:])
        return parseProductRest(product * factor, Rest)
    elif TL[0] == '/':
        factor, Rest = parseFactor(TL[1:])
        return parseProductRest(product / factor, Rest)
    else:
        return product, TL

The function `parseFactor` implements the following grammar rules:
$$
  \begin{eqnarray*}
  \mathrm{factor}      & \rightarrow & \texttt{'('} \;\;\mathrm{expr} \;\;\texttt{')'}                \\
                       & \mid        & \;\texttt{NUMBER} 
  \end{eqnarray*}
$$

It takes one argument:
- `TL` is the list of tokens that still need to be consumed.

It returns a pair of the form `(value, Rest)` where
- `value` is the result of evaluating the arithmetical expression
  that is represented by `TL` and
- `Rest` is a list of those tokens that have not been consumed while trying to parse a factor.

In [None]:
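def parseFactor(TL):
    # An empty token list cannot start a factor.
    assert TL != [], 'Parse Error: unexpected end of input'
    if TL[0] == '(': 
        expr, Rest = parseExpr(TL[1:])
        # Make sure a token is left before inspecting it.
        assert Rest != [] and Rest[0] == ')', 'Parse Error: expected ")"'
        return expr, Rest[1:]
    else: 
        return int(TL[0]), TL[1:]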

## Testing

In [None]:
def test(s):
    r1 = parse(s)
    r2 = eval(s)
    assert r1 == r2
    return r1

In [None]:
test('11+22*(33-44)/(5-10*5/(4-3))')

In [None]:
test('0*11+22*(33-44)/(5-10*5/(4-3))')