In [None]:
%%HTML
<style>
.container { width:100% } 
</style>

# The Shunting Yard Algorithm (Operator Precedence Parsing)

The function $\texttt{toInt}(s)$ tries to convert the string $s$ to an integer.  If this works out, the integer is returned.  Otherwise, the string $s$ is returned unchanged.

In [None]:
def toInt(s):
    try:
        return int(s)   
    except ValueError:
        return s

In [None]:
toInt('123')

In [None]:
toInt('**')

The module `re` provides support for <a href='https://en.wikipedia.org/wiki/Regular_expression'>regular expressions</a>.  These are needed for
<em style="color:blue;">tokenizing</em> a string.

In [None]:
import re

The function $\texttt{tokenize}(s)$ takes a string $s$ representing an arithmetic expression and splits this string into a list of tokens.  Numbers are converted to integers with $\texttt{toInt}$.  The list is returned in reverse order, so the first token of $s$ is the last element of the list.
The string `regExp` in the implementation below is interpreted as follows:

  - The `r` in front of the opening quote `'` specifies that the regular expression is a <em style="color:blue;">raw string</em>.  In a *raw string* the backslash is treated as a literal character, so it does not have to be escaped.
  - The regular expression consists of three alternatives, which are separated by the character `|`.  Since `re.findall` searches for matches, characters that match none of the alternatives, such as blanks, are skipped.
      1. `[0-9]+` matches a natural number.  For example, it matches `0` or `123`.  It would also match a string like `007`.
         The `+` at the end of `[0-9]+` specifies one or more characters in the range `[0-9]`.
      2. `\*\*` matches the operator `**`.  This alternative has to precede the next one because alternatives are tried from left to right; otherwise `**` would be split into two tokens `*`.
      3. `[()+*%/-]` matches a parenthesis or an arithmetical operator.  Note that we have 
         to put the symbol `-` last in this character class, as otherwise this symbol would be 
         interpreted as a range operator.
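To see why `\*\*` has to be tried before the single-character class, the following sketch runs `re.findall` on the same alternatives (without the enclosing group) and on a hypothetical variant `bad` in which the order is swapped.  At each position, the first alternative that matches wins.

```python
import re

# The alternatives used in tokenize, without the enclosing group.
good = r'[0-9]+|\*\*|[()+*%/-]'
# Hypothetical variant: the single-character class is tried before \*\*.
bad  = r'[()+*%/-]|[0-9]+|\*\*'

print(re.findall(good, '2 ** 3'))  # ['2', '**', '3']
print(re.findall(bad,  '2 ** 3'))  # ['2', '*', '*', '3']
print(re.findall(good, '3.5'))     # ['3', '5']
```

The last line shows that characters matching no alternative are skipped silently: `'3.5'` yields the two tokens `3` and `5` without any error.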

In [None]:
def tokenize(s):
    regExp = r'([0-9]+|\*\*|[()+*%/-])'
    L = [ toInt(t) for t in re.findall(regExp, s) ]
    return list(reversed(L))

In [None]:
tokenize('12+34*56/3-(17+2**4)')

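The function $\texttt{evalBefore}(\texttt{o}_1, \texttt{o}_2)$ receives two strings representing arithmetical operators.  It returns `True` if the operator $\texttt{o}_1$ should be evaluated before the operator $\texttt{o}_2$ in an arithmetical expression of the form $a \;\texttt{o}_1\; b \;\texttt{o}_2\; c$.  In the implementation, $\texttt{o}_1$ is `stackOp`, the operator on top of the operator stack, and $\texttt{o}_2$ is `nextOp`, the operator that has just been read.  An operator with higher precedence is evaluated first.  If both operators have the same precedence, $\texttt{o}_1$ is evaluated first unless both are `**`, since `**` is right-associative while all other operators are left-associative.  An opening parenthesis on the stack is never evaluated.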
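The case analysis in `evalBefore` can be condensed into a single condition: evaluate the operator on the stack first if it has higher precedence, or equal precedence and is not `**`.  The sketch below restates this under the hypothetical name `eval_before`.  It is equivalent to `evalBefore` for the six operators in the precedence table.  The sketch then checks the rule against Python's own reading of a few expressions.

```python
PRECEDENCE = {'+': 1, '-': 1, '*': 2, '/': 2, '%': 2, '**': 3}

def eval_before(stack_op, next_op):
    """Hypothetical compact restatement of evalBefore."""
    if stack_op == '(':
        return False              # '(' is only removed by a matching ')'
    p, q = PRECEDENCE[stack_op], PRECEDENCE[next_op]
    # Higher precedence wins; on a tie every operator except '**'
    # is left-associative, so the operator on the stack goes first.
    return p > q or (p == q and stack_op != '**')

# a - b - c is read as (a - b) - c:    8 - 3 - 2 == 3
assert eval_before('-', '-') and 8 - 3 - 2 == (8 - 3) - 2 == 3
# a ** b ** c is read as a ** (b ** c): 2 ** 3 ** 2 == 512
assert not eval_before('**', '**') and 2 ** 3 ** 2 == 2 ** (3 ** 2) == 512
# '*' binds tighter than '+', in both orders
assert eval_before('*', '+') and not eval_before('+', '*')
```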

In [None]:
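def evalBefore(stackOp, nextOp):
    # An opening parenthesis on the stack is only removed by a matching ')'.
    if stackOp == '(':
        return False
    Precedence = { '+': 1, '-': 1, '*': 2, '/': 2, '%': 2, '**' : 3 }
    if Precedence[stackOp] > Precedence[nextOp]:
        return True
    elif Precedence[stackOp] == Precedence[nextOp]:
        if stackOp == nextOp:
            # '**' is right-associative, all other operators are left-associative.
            return stackOp in { '+', '-', '*', '/', '%' }
        else:
            # Different operators of equal precedence, e.g. '+' and '-'.
            return True
    else:
        return False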

In [None]:
import stack   # local module (not shown here) providing Stack and createStack

The class `Calculator` has three member variables:
  - the token stack `mTokens` 
  - the operator stack `mOperators`
  - the argument stack `mArguments`
  
The constructor takes a string that is tokenized and pushes the tokens onto the token stack such that the first token is on top of the token stack.
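The module `stack` is not part of this notebook.  The sketch below is a minimal stand-in that assumes only the interface the code here actually uses: `Stack()`, `push`, `pop`, `top`, `isEmpty`, conversion with `str`, and `createStack(L)`.  The sketch further assumes that `createStack(L)` pushes the elements of `L` from left to right.  Under that assumption, the reversed list produced by `tokenize` puts the first token on top.  Here `pop` simply discards the top element, because the calculator always reads it with `top` first.

```python
class Stack:
    """Hypothetical stand-in for stack.Stack, backed by a Python list."""
    def __init__(self):
        self.mElements = []            # the top of the stack is the end of the list

    def push(self, e):
        self.mElements.append(e)

    def pop(self):
        self.mElements.pop()           # discard the top; read it with top() first

    def top(self):
        return self.mElements[-1]

    def isEmpty(self):
        return self.mElements == []

    def __str__(self):
        return str(self.mElements)

def createStack(L):
    """Push the elements of L from left to right, so L[-1] ends up on top."""
    S = Stack()
    for e in L:
        S.push(e)
    return S

# tokenize('1+2') returns [2, '+', 1], so the first token 1 ends up on top.
S = createStack([2, '+', 1])
print(S.top())   # 1
```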

In [None]:
class Calculator:
    def __init__(self, s):
        self.mTokens    = stack.createStack(tokenize(s))
        self.mOperators = stack.Stack()
        self.mArguments = stack.Stack()    

The method `__str__` is used to convert an object of class `Calculator` to a string.
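This notebook defines each method in its own cell as a plain function and then attaches it to the class.  This works because Python looks up special methods such as `__str__` on the class when they are called, not when the class is created.  The following sketch illustrates the pattern with a hypothetical class `Point`.

```python
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

# A plain function whose first parameter plays the role of self ...
def toString(self):
    return f'Point({self.x}, {self.y})'

# ... installed as a method after the class has been created.
Point.__str__ = toString
del toString   # the module-level name is no longer needed

print(Point(1, 2))   # Point(1, 2)
```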

In [None]:
def toString(self):
    return '\n'.join(['_'*50, 
                      'Tokens:    ' + str(self.mTokens), 
                      'Arguments: ' + str(self.mArguments), 
                      'Operators: ' + str(self.mOperators), 
                      '_'*50])

Calculator.__str__ = toString
del toString

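The method $\texttt{evaluate}(\texttt{self})$ evaluates the expression that is given by the tokens on the token stack `mTokens`.  Numbers are pushed onto the argument stack.  An operator is pushed onto the operator stack unless the operator on top of that stack has to be evaluated first.  In that case, the top operator is evaluated with `popAndEvaluate` and the current token is pushed back onto the token stack so that it is examined again.  A closing parenthesis forces evaluation until the matching opening parenthesis is on top of the operator stack, and then both parentheses are discarded.  When all tokens are consumed, the remaining operators are evaluated.  Before every step the state of the calculator is printed, so the output is a trace of the algorithm.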
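Because the module `stack` is not shown, the following is a self-contained sketch of the same algorithm.  It uses plain Python lists as stacks, with the top at the end of the list.  The names `calculate`, `eval_before`, `PRECEDENCE` and `OPERATIONS` are hypothetical.  The control flow mirrors `evaluate`, but the trace output is omitted.

```python
import operator
import re

PRECEDENCE = {'+': 1, '-': 1, '*': 2, '/': 2, '%': 2, '**': 3}
OPERATIONS = {'+': operator.add, '-': operator.sub, '*': operator.mul,
              '/': operator.floordiv, '%': operator.mod, '**': operator.pow}

def eval_before(stack_op, next_op):
    if stack_op == '(':
        return False
    p, q = PRECEDENCE[stack_op], PRECEDENCE[next_op]
    return p > q or (p == q and stack_op != '**')

def calculate(s):
    # Reverse the tokens so that tokens[-1], the top, is the first token.
    found  = re.findall(r'[0-9]+|\*\*|[()+*%/-]', s)
    tokens = [int(t) if t.isdigit() else t for t in reversed(found)]
    operators, arguments = [], []

    def pop_and_evaluate():
        rhs = arguments.pop()              # pushed last: right operand
        lhs = arguments.pop()
        arguments.append(OPERATIONS[operators.pop()](lhs, rhs))

    while tokens:
        next_op = tokens.pop()
        if isinstance(next_op, int):
            arguments.append(next_op)
        elif not operators or next_op == '(':
            operators.append(next_op)
        elif operators[-1] == '(' and next_op == ')':
            operators.pop()                # matching parentheses cancel
        elif next_op == ')' or eval_before(operators[-1], next_op):
            pop_and_evaluate()
            tokens.append(next_op)         # examine the same token again
        else:
            operators.append(next_op)
    while operators:                       # evaluate the remaining operators
        pop_and_evaluate()
    return arguments[-1]

print(calculate('2*(3+4)**2'))   # 98
```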

In [None]:
def evaluate(self):
    while not self.mTokens.isEmpty():
        print(self)                        # trace the current state
        nextOp = self.mTokens.top(); self.mTokens.pop()
        if isinstance(nextOp, int):        # numbers go to the argument stack
            self.mArguments.push(nextOp)
            continue
        if self.mOperators.isEmpty() or nextOp == "(":
            self.mOperators.push(nextOp)
            continue
        stackOp = self.mOperators.top()
        if stackOp == "(" and nextOp == ")":
            self.mOperators.pop()          # matching parentheses cancel
        elif nextOp == ")" or evalBefore(stackOp, nextOp):
            self.popAndEvaluate()          # evaluate the operator on the stack first ...
            self.mTokens.push(nextOp)      # ... and examine nextOp again afterwards
        else:
            self.mOperators.push(nextOp)
    while not self.mOperators.isEmpty():   # evaluate the remaining operators
        print(self)
        self.popAndEvaluate()
    return self.mArguments.top()

Calculator.evaluate = evaluate
del evaluate

The method $\texttt{popAndEvaluate}(\texttt{self})$ removes the two topmost numbers $\texttt{rhs}$ and $\texttt{lhs}$ from the argument stack and 
removes the topmost operator $\texttt{op}$ from the operator stack.  Since $\texttt{rhs}$ was pushed last, it is removed first.  The method computes the value
$$ \texttt{lhs} \;\texttt{op}\; \texttt{rhs} $$
and pushes this value onto the argument stack.  Because the calculator works with integers, `/` is implemented as integer division `//`.
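The chain of `if` statements in `popAndEvaluate` can also be written as a lookup table.  The sketch below uses hypothetical names and plain lists instead of the `stack` module.  It maps each operator to a function from Python's standard `operator` module, and `'/'` is mapped to `operator.floordiv` to match `lhs // rhs`.  It also shows why the order of the two pops matters for non-commutative operators such as `-`.

```python
import operator

# Hypothetical table equivalent to the if-chain in popAndEvaluate.
OPERATIONS = {
    '+':  operator.add,
    '-':  operator.sub,
    '*':  operator.mul,
    '/':  operator.floordiv,   # integer division, like lhs // rhs
    '%':  operator.mod,
    '**': operator.pow,
}

def pop_and_evaluate(arguments, operators):
    """Pop two arguments and one operator, push the result (lists as stacks)."""
    rhs = arguments.pop()      # pushed last, so it is the right operand
    lhs = arguments.pop()
    op  = operators.pop()
    if op not in OPERATIONS:
        raise ValueError(f'unknown operator {op!r}')
    arguments.append(OPERATIONS[op](lhs, rhs))

args, ops = [7, 2], ['-']
pop_and_evaluate(args, ops)
print(args)   # [5], that is 7 - 2 and not 2 - 7
```

Raising a `ValueError` instead of using `assert` keeps the check active even when Python runs with the `-O` flag, which removes assertions.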

In [None]:
def popAndEvaluate(self):
    # The right operand was pushed last, so it is on top of the argument stack.
    rhs = self.mArguments.top(); self.mArguments.pop()
    lhs = self.mArguments.top(); self.mArguments.pop()
    op  = self.mOperators.top(); self.mOperators.pop()
    result = None
    if op == '+':
        result = lhs + rhs
    elif op == '-':
        result = lhs - rhs
    elif op == '*':
        result = lhs * rhs
    elif op == '/':
        result = lhs // rhs   # integer (floor) division
    elif op == '%':
        result = lhs % rhs
    elif op == '**':
        result = lhs ** rhs
    assert result is not None, f'ERROR: *** Unknown Operator *** "{op}"'
    self.mArguments.push(result)
    
Calculator.popAndEvaluate = popAndEvaluate
del popAndEvaluate

In [None]:
C = Calculator('2*(3+4)**2')

In [None]:
C.evaluate()