In [None]:
from IPython.display import HTML
HTML(open('../style.css').read())

# A Shift-Reduce Parser for Arithmetic Expressions

In this notebook we implement a generic *shift-reduce parser*.  The parse table that we use 
implements the following grammar for arithmetic expressions:
$$
  \begin{eqnarray*}
  \mathrm{expr}        & \rightarrow & \mathrm{expr}\;\;\texttt{'+'}\;\;\mathrm{product}   \\
                       & \mid        & \mathrm{expr}\;\;\texttt{'-'}\;\;\mathrm{product}   \\
                       & \mid        & \mathrm{product}                                    \\[0.2cm]
  \mathrm{product}     & \rightarrow & \mathrm{product}\;\;\texttt{'*'}\;\;\mathrm{factor} \\
                       & \mid        & \mathrm{product}\;\;\texttt{'/'}\;\;\mathrm{factor} \\
                       & \mid        & \mathrm{factor}                                     \\[0.2cm]
  \mathrm{factor}      & \rightarrow & \texttt{'('} \;\;\mathrm{expr} \;\;\texttt{')'}     \\
                       & \mid        & \texttt{NUMBER} 
  \end{eqnarray*}
$$

In [None]:
%load_ext nb_mypy

## Implementing a Scanner

To parse, we need a scanner.  We use one that is similar to the scanner of the *top-down parser* discussed earlier in Chapter 6.

In [None]:
import re

The function `tokenize` scans the string `s` into a list of tokens using Python's regular expressions.  The scanner distinguishes between
* whitespace, which is discarded,
* numbers,
* arithmetical operators and parentheses,
* all remaining characters, which are treated as lexical errors.

See below for an example.
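The following sketch illustrates the mechanism the scanner builds on: when a pattern contains several capture groups, `re.findall` returns one tuple per match, and only the group that actually matched is non-empty.  The pattern and the input string are simplified stand-ins chosen for illustration.

```python
import re

# Simplified pattern with three capture groups: whitespace, number, other character.
pattern = r'''([ \t\n]+)      |  # group 1: whitespace
              ([1-9][0-9]*|0) |  # group 2: number without leading zeros
              (.)                # group 3: any other single character
           '''

# One tuple per match; exactly one component of each tuple is non-empty.
matches = re.findall(pattern, '12 +07', re.VERBOSE)
for ws, number, other in matches:
    print(repr(ws), repr(number), repr(other))
```

Since the number pattern does not permit leading zeros, the input `07` is not a single number: it produces two separate matches, one for `0` and one for `7`.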

In [None]:
def tokenize(s: str) -> list[str]:
    '''Transform the string s into a list of tokens.  The string s
       is supposed to represent an arithmetic expression.
    '''
    lexSpec = r'''([ \t\n]+)      |  # blanks, tabs, and newlines
                  ([1-9][0-9]*|0) |  # number
                  ([-+*/()])      |  # arithmetical operators and parentheses
                  (.)                # unrecognized character
               '''
    tokenList = re.findall(lexSpec, s, re.VERBOSE)
    result    = []
    for ws, number, operator, error in tokenList:
        if ws:        # skip whitespace
            continue
        elif number:
            result += [ 'NUMBER' ]
        elif operator:
            result += [ operator ]
        else:
            result += [ f'ERROR({error})']
    return result

In [None]:
tokenize('11 + 22 * (33 - 45) / 007')

In [None]:
Token       = str
Variable    = str
Symbol      = Token | Variable
State       = str
Rule        = tuple[Variable, tuple[Symbol, ...]]
Action      = str | tuple[str, State] | tuple[str, Rule]
ActionTable = dict[tuple[State, Token], Action]
GotoTable   = dict[tuple[State, Variable], State]

Assume a grammar $G = \langle V, T, R, S \rangle$ is given.  A *shift-reduce parser*
is defined as a 4-tuple
$$P = \langle Q, q_0, \texttt{action}, \texttt{goto} \rangle$$
where
- $Q$ is the set of *states* of the shift-reduce parser.  

  For the purpose of the shift-reduce parser, states are purely abstract.
- $q_0 \in Q$ is the *start state*.
- $\texttt{action}$ is a function taking two arguments. The first argument is a state $q \in Q$
  and the second argument is a token $t \in T$.  The result of this function is an element from the set
  $$\texttt{Action} :=
       \bigl\{ \langle\texttt{shift}, q\rangle  \mid q \in Q \bigr\}               \cup 
       \bigl\{ \langle\texttt{reduce}, r\rangle \mid r \in R \bigr\} \cup 
       \bigl\{ \texttt{accept} \bigr\}                        \cup
       \bigl\{ \texttt{error}  \bigr\}.                         
  $$
  Here `shift`, `reduce`, `accept`, and `error` are strings that serve to
  distinguish the different kinds of results returned by the function 
  `action`.  The function `action` therefore has the signature
  $$\texttt{action}: Q \times T \rightarrow \texttt{Action}.$$
- `goto` is a function that takes a state $q \in Q$ and a syntactical variable
  $v \in V$ and computes a new state:
  $$\texttt{goto}: Q \times V \rightarrow Q.$$

The class `ShiftReduceParser` maintains two tables that are implemented as dictionaries:
- `mActionTable` encodes the function $\texttt{action}: Q \times T \rightarrow \texttt{Action}$.
- `mGotoTable` encodes the function $\texttt{goto}: Q \times V \rightarrow Q$.

The constructor takes these tables as arguments and stores them in the member variables `mActionTable` and `mGotoTable`.

In [None]:
class ShiftReduceParser:
    def __init__(self, actionTable: ActionTable, gotoTable: GotoTable) -> None:
        self.mActionTable: ActionTable = actionTable
        self.mGotoTable  : GotoTable   = gotoTable
        
    def parse(self, TL: list[str]) -> bool:
        # Placeholder: the actual implementation is attached to the class below.
        return None # type: ignore

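The method `parse` takes a list of tokens `TL` as its argument.  It returns `True` if the token list can be parsed successfully and `False` otherwise.  
The algorithm that is applied is known as *shift/reduce parsing*: in every step the parser looks up the action for the state on top of the state stack and the next token.  A `shift` pushes the token and a new state; a `reduce` with a rule whose body has $n$ symbols pops $n$ symbols and $n$ states and then pushes the head of the rule together with the state given by `goto`.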
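To make the interplay of the state stack and the two tables concrete, here is a minimal, self-contained sketch that traces a shift-reduce parse.  It uses a hypothetical toy grammar $S \rightarrow \texttt{'('}\;S\;\texttt{')'} \mid \texttt{'x'}$ with a hand-built table instead of the arithmetic grammar; the function name `trace` and all state names are assumptions made for this illustration.

```python
# Hand-built SLR table for the toy grammar  S -> '(' S ')' | 'x'.
r1 = ('S', ('(', 'S', ')'))
r2 = ('S', ('x',))
action = {
    ('s0', '('):   ('shift', 's2'),
    ('s0', 'x'):   ('shift', 's3'),
    ('s1', 'EOF'): 'accept',
    ('s2', '('):   ('shift', 's2'),
    ('s2', 'x'):   ('shift', 's3'),
    ('s3', ')'):   ('reduce', r2),
    ('s3', 'EOF'): ('reduce', r2),
    ('s4', ')'):   ('shift', 's5'),
    ('s5', ')'):   ('reduce', r1),
    ('s5', 'EOF'): ('reduce', r1),
}
goto = {('s0', 'S'): 's1', ('s2', 'S'): 's4'}

def trace(tokens: list[str]) -> tuple[bool, list[str]]:
    '''Parse tokens and return the verdict together with a log of the steps.'''
    tokens = tokens + ['EOF']      # copy, so the caller's list is not changed
    states = ['s0']                # stack of states
    index  = 0                     # position of the next unread token
    log: list[str] = []
    while True:
        entry = action.get((states[-1], tokens[index]), 'error')
        if entry == 'error':
            log.append('error')
            return False, log
        if entry == 'accept':
            log.append('accept')
            return True, log
        kind, arg = entry
        if kind == 'shift':
            states.append(arg)
            index += 1
            log.append(f'shift {arg}')
        else:                      # reduce with rule arg = (head, body)
            head, body = arg
            del states[len(states) - len(body):]   # pop one state per body symbol
            states.append(goto[(states[-1], head)])
            log.append(f'reduce {head} -> {" ".join(body)}')

ok, steps = trace(['(', 'x', ')'])
print(ok, steps)
```

The sketch keeps only the state stack, since the symbol stack is not needed to decide acceptance; the log shows that a reduction consumes no input, so the same token is inspected again in the following step.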

In [None]:
def parse(self, TL: list[Token]) -> bool:
    index                 = 0      # points to next token
    Symbols: list[Symbol] = []     # stack of symbols, i.e. tokens or variables
    States:  list[State]  = ['s0'] # stack of states, s0 is start state
    TL = TL + ['EOF']              # work on a copy so the caller's list is unchanged
    while True:
        q = States[-1]
        t = TL[index]
        # Any undefined table entries are interpreted as error entries.
        match self.mActionTable.get((q, t), 'error'):
            case 'error': 
                return False
            case 'accept':
                return True
            case 'shift', s: 
                Symbols += [t]
                States  += [s]
                index   += 1
            case 'reduce', (head, body):
                n          = len(body)
                # Slice from the front: Symbols[:-n] would empty the stack if n == 0.
                Symbols    = Symbols[:len(Symbols) - n]
                States     = States [:len(States)  - n] 
                Symbols    = Symbols + [head]
                state      = States[-1]
                States    += [ self.mGotoTable[state, head] ]
            
ShiftReduceParser.parse = parse # type: ignore
del parse

## Testing

In [None]:
%run Parse-Table.ipynb

In [None]:
def test(s: str) -> None:
    parser = ShiftReduceParser(actionTable, gotoTable) # type: ignore
    TL     = tokenize(s)
    if parser.parse(TL):
        print('Parse successful!')
    else:
        print('Parse failed!')

In [None]:
test('(1 + 2) * 3')

In [None]:
test('1 * 2 + 3 * (4 - 5) / 2')

In [None]:
test('11+22*(33-44)/(5-10*5/(4-3))')

In [None]:
test('1+2*3-')

In [None]:
test('1+2*3-007')