In [1]:
from IPython.core.display import HTML
with open('../../style.css', 'r') as file:
    css = file.read()
HTML(css)

# Implementing an SLR-Table-Generator

## A Grammar for Grammars

As the goal is to implement an *SLR-table-generator*, we first need to implement a parser for context-free grammars.
The file `arith.g` contains an example grammar that describes arithmetic expressions.

In [2]:
!cat Examples/arith.g

expr: expr '+' product
    | expr '-' product
    | product
    ;
 
product: product '*' factor
       | product '/' factor
       | factor
       ;
       
factor: '(' expr ')'
      | NUMBER
      ;


We use <span style="font-variant:small-caps;">Antlr</span> to develop a parser for context-free grammars.  The pure grammar used to parse context-free grammars is stored in the file `Pure.g4`.  It is similar to the grammar that we have already used to implement *Earley's algorithm*, but additionally allows the operator `|`, so that all grammar rules defining the same variable can be combined into a single rule.

In [3]:
!cat Pure.g4

grammar Pure;

start: grmrl+;

grmrl: VARIABLE ':' body ('|' body)* ';';

body: item* ;
 
item : VARIABLE 
     | TOKEN  
     | LITERAL
     ;

VARIABLE: [a-z][a-zA-Z_]*;
TOKEN   : [A-Z][a-zA-Z_]*;
LITERAL : '\'' ~('\'')+ '\'';
        
WS      : [ \t\n\r]     -> skip;
COMMENT : '//' ~('\n')* -> skip;


The annotated grammar is stored in the file `Grammar.g4`.
The parser will return a list of grammar rules, where each rule of the form
$$ a \rightarrow \beta $$
is stored as the tuple `(a,) + 𝛽`.
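
To make the tuple encoding concrete, here is a minimal sketch in plain Python.  The names `head`, `body`, `rule`, and `epsilon_rule` are chosen for illustration only and do not occur in `Grammar.g4`.

```python
# Illustrative names only: encode the rule  expr -> expr '+' product  as a flat tuple.
head = 'expr'
body = ('expr', "'+'", 'product')
rule = (head,) + body          # the head comes first, the body symbols follow

# Star-unpacking splits the tuple back into the head and a list of body symbols.
variable, *symbols = rule

# A rule with an empty body is encoded as a one-element tuple.
epsilon_rule = ('a',) + ()
```

With this encoding, the first component of every tuple is the variable defined by the rule, and an empty body needs no special case.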

In [4]:
!cat -n Grammar.g4

     1	grammar Grammar;
     2	
     3	start returns [g]
     4	    : {Rules = []}
     5	      (grmrl {Rules += $grmrl.rl})+
     6	      {$g = Rules}
     7	    ;
     8	
     9	grmrl returns [rl]
    10	    : {RuleList = []}
    11	      v=VARIABLE ':' b1=body {RuleList.append(($v.text,) + $b1.il)}
    12	      ('|' b2=body {RuleList.append(($v.text,) + $b2.il)})* ';' 
    13	      {$rl = RuleList}
    14	    ;
    15	
    16	body returns [il]
    17	    : {Body = []} (i=item {Body.append($i.atom)})*
    18	      {$il = tuple(Body)}
    19	    ;
    20	
    21	item returns [atom]
    22	    : v=VARIABLE {$atom = $v.text}
    23	    | t=TOKEN    {$atom = $t.text}
    24	    | l=LITERAL  {$atom = $l.text}
    25	    ;
    26	
    27	VARIABLE: [a-z][a-zA-Z_]*;
    28	TOKEN   : [A-Z][a-zA-Z_]*;
    29	LITERAL : '\'' ~('\'')+ '\'';
    30	        
    31	WS      : [ \t\n\r]     -> skip ;
    32	COMMENT : '//' ~('\n')* -> skip ;


We start by generating both scanner and parser.  

In [5]:
!antlr4 -Dlanguage=Python3 Grammar.g4

In [6]:
from GrammarLexer  import GrammarLexer
from GrammarParser import GrammarParser

In [7]:
%load_ext nb_mypy

Version 1.0.5


We define a few *type aliases* in order to make the types more readable.

In [8]:
Variable = str
Token    = str
Symbol   = Variable | Token
Symbols  = tuple[Symbol, ...]

## The Class `GrammarRule`

The class `GrammarRule` is used to store a single grammar rule.  As we have to use objects of type `GrammarRule` as *keys* in a dictionary later, we have to provide the methods `__eq__`, `__ne__`, and `__hash__`.

In [9]:
class GrammarRule:
    def __init__(self, variable: Variable, body: Symbols) -> None:
        self.mVariable: Variable = variable
        self.mBody    : Symbols  = tuple(body)
        
    def __eq__(self, other) -> bool:
        return isinstance(other, GrammarRule)    and \
               self.mVariable == other.mVariable and \
               self.mBody     == other.mBody
    
    def __ne__(self, other) -> bool:
        return not self.__eq__(other)
    
    def __hash__(self) -> int:
        return hash(self.__repr__())
    
    def __repr__(self) -> str:
        return f'{self.mVariable} → {" ".join(self.mBody)}'
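
The following sketch illustrates why `__eq__` and `__hash__` are needed for dictionary keys.  It uses a hypothetical stand-in class `Rule` rather than `GrammarRule` itself, and it hashes a tuple of the fields instead of the string representation; this is one possible design, not the one used in this notebook.

```python
class Rule:
    """Hypothetical stand-in for GrammarRule, for illustration only."""
    def __init__(self, variable, body):
        self.variable = variable
        self.body     = tuple(body)

    def __eq__(self, other):
        return isinstance(other, Rule) and \
               (self.variable, self.body) == (other.variable, other.body)

    def __hash__(self):
        # Objects that compare equal must have equal hashes.
        return hash((self.variable, self.body))

names = { Rule('expr', ['product']): 'r2' }
# A second object with the same content finds the entry.
found = names[Rule('expr', ('product',))]
```

Without these two methods, Python would compare and hash the objects by identity, so the lookup with a freshly constructed but equal rule would raise a `KeyError`.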

The function `parse_grammar` takes a string `filename` as its argument and returns the grammar that is stored in the specified file.  The parser generated from `Grammar.g4` returns each rule as a tuple; `parse_grammar` converts these tuples into objects of class `GrammarRule` and returns them as a list.  The example below will clarify this structure.

In [10]:
import antlr4

In [11]:
def parse_grammar(filename: str) -> list[GrammarRule]:
    input_stream  = antlr4.FileStream(filename, encoding="utf-8")
    lexer         = GrammarLexer(input_stream)       # type: ignore
    token_stream  = antlr4.CommonTokenStream(lexer)  
    parser        = GrammarParser(token_stream)      # type: ignore
    grammar       = parser.start()
    return [GrammarRule(head, body) for head, *body in grammar.g]

In [12]:
grammar = parse_grammar('Examples/arith.g')
grammar

[expr → expr '+' product,
 expr → expr '-' product,
 expr → product,
 product → product '*' factor,
 product → product '/' factor,
 product → factor,
 factor → '(' expr ')',
 factor → NUMBER]

Given a string `name`, which is either a *variable*, a *token*, or a *literal*, the function `is_var` checks whether `name` is a variable.  The function assumes that variable names contain no upper case letters, while token names start with an upper case letter and literals start with the character "`'`".

In [13]:
def is_var(name: Symbol) -> bool:
    return name[0] != "'" and name.islower()

**Fun Fact:** The invocation of `"'return'".islower()` returns `True`.  Therefore, we have to test that
`name` does not start with a `"'"` character; otherwise, keywords like `'return'` or `'while'` appearing in a grammar would be mistaken for variables.

In [14]:
"'return'".islower()

True
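
The following sketch repeats the definition of `is_var` so that it can be run on its own and applies it to a few sample names.  The name `exprList` is a hypothetical example: the lexer rule `VARIABLE: [a-z][a-zA-Z_]*` would accept it, but `is_var` rejects it because `islower` requires all cased characters to be lower case.

```python
def is_var(name: str) -> bool:
    # Same test as above: not a quoted literal and no upper case letters.
    return name[0] != "'" and name.islower()

samples    = ['expr', 'expr_list', 'NUMBER', "'return'", "'+'", 'exprList']
classified = { s: is_var(s) for s in samples }
```

Hence grammars processed by this notebook should use variable names without upper case letters, as `arith.g` does.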

Given a list `Rules` of `GrammarRules`, the function `collect_variables(Rules)` returns the set of all *variables* occurring in `Rules`.

In [15]:
def collect_variables(Rules: list[GrammarRule]) -> set[Variable]:
    Variables: set[str] = set()
    for rule in Rules:
        Variables.add(rule.mVariable)
        for item in rule.mBody:
            if is_var(item):
                Variables.add(item)
    return Variables

In [16]:
collect_variables(grammar)

{'expr', 'factor', 'product'}

Given a list `Rules` of `GrammarRules`, the function `collect_tokens(Rules)` returns the set of all *tokens* and *literals* occurring in `Rules`.

In [17]:
def collect_tokens(Rules: list[GrammarRule]) -> set[Token]:
    Tokens: set[str] = set()
    for rule in Rules:
        for item in rule.mBody:
            if not is_var(item):
                Tokens.add(item)
    return Tokens

In [18]:
collect_tokens(grammar)

{"'('", "')'", "'*'", "'+'", "'-'", "'/'", 'NUMBER'}

## Marked Rules

The class `MarkedRule` stores a single *marked rule* of the form
$$ v \rightarrow \alpha \bullet \beta $$
where the *variable* $v$ is stored in the member variable `mVariable`, while $\alpha$ and $\beta$ are stored in the variables `mAlpha` and `mBeta`, respectively.  These variables are assumed to contain tuples of *grammar symbols*.  A *grammar symbol* is either
- a *variable*,
- a *token*, or
- a *literal*, i.e. a string enclosed in single quotes.


Later, we need to maintain sets of *marked rules* to represent *states*.  Therefore, we have to define the methods `__eq__`, `__ne__`, and `__hash__`.

In [19]:
class MarkedRule():
    def __init__(self, variable: Variable, alpha: Symbols, beta: Symbols) -> None:
        self.mVariable: Variable = variable
        self.mAlpha   : Symbols  = alpha
        self.mBeta    : Symbols  = beta
        
    def __eq__(self, other) -> bool:
        return isinstance(other, MarkedRule)     and \
               self.mVariable == other.mVariable and \
               self.mAlpha    == other.mAlpha    and \
               self.mBeta     == other.mBeta
    
    def __ne__(self, other) -> bool:
        return not self.__eq__(other)
    
    def __hash__(self) -> int:
        return hash(self.__repr__())
    
    def __repr__(self) -> str:
        alphaStr = ' '.join(self.mAlpha)
        betaStr  = ' '.join(self.mBeta)
        return f'{self.mVariable} → {alphaStr} • {betaStr}'

Given a *marked rule* `self`, the function `is_complete` checks whether `self` has the form
$$ c \rightarrow \alpha\; \bullet,$$
i.e. it checks whether the $\bullet$ is at the end of the grammar rule.

In [20]:
def is_complete(self: MarkedRule) -> bool:
    return len(self.mBeta) == 0

MarkedRule.is_complete = is_complete # type: ignore
del is_complete

Given a *marked rule* `self` of the form
$$ c \rightarrow \alpha \bullet X\, \delta, $$
the function `symbol_after_dot` returns the *symbol* $X$. If there is no symbol after the $\bullet$, the method returns `None`.

In [21]:
def symbol_after_dot(self: MarkedRule) -> Symbol | None:
    if len(self.mBeta) > 0:
        return self.mBeta[0]
    return None

MarkedRule.symbol_after_dot = symbol_after_dot # type: ignore
del symbol_after_dot

Given a marked rule of the form
$$ c \rightarrow \alpha \bullet b \delta, $$
the function `next_var` returns the variable $b$ following the dot.  If the symbol following the dot is not a variable or if there is no such symbol, the function returns `None`.  

In [22]:
def next_var(self: MarkedRule) -> Variable | None:
    if len(self.mBeta) > 0:
        var = self.mBeta[0]
        if is_var(var):
            return var
    return None

MarkedRule.next_var = next_var # type: ignore
del next_var

The function `move_dot(self)` transforms a *marked rule*  of the form 
$$ c \rightarrow \alpha \bullet X\, \beta $$
into a *marked rule* of the form
$$ c \rightarrow \alpha\, X \bullet \beta, $$
i.e. the $\bullet$ is moved over the next symbol.  Invocation of this method assumes that there is a symbol
following the $\bullet$.

In [23]:
def move_dot(self: MarkedRule) -> MarkedRule:
    return MarkedRule(self.mVariable, 
                      self.mAlpha + (self.mBeta[0],), 
                      self.mBeta[1:])

MarkedRule.move_dot = move_dot # type: ignore
del move_dot
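
As a small illustration, the following sketch moves the dot across the rule `factor → '(' expr ')'` until the marked rule is complete.  It uses a hypothetical frozen dataclass `Item` as a stand-in for `MarkedRule`; a frozen dataclass generates `__eq__` and `__hash__` automatically, which is an alternative to defining them by hand.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Item:
    """Hypothetical stand-in for MarkedRule, for illustration only."""
    variable: str
    alpha   : tuple[str, ...]   # symbols before the dot
    beta    : tuple[str, ...]   # symbols after the dot

    def move_dot(self) -> 'Item':
        # Assumes that beta is not empty, just like MarkedRule.move_dot.
        return Item(self.variable, self.alpha + (self.beta[0],), self.beta[1:])

item  = Item('factor', (), ("'('", 'expr', "')'"))
steps = [item]
while item.beta:                # stop once the dot has reached the end
    item = item.move_dot()
    steps.append(item)
```

The list `steps` then contains the four marked rules that belong to this grammar rule, one for each possible position of the dot.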

The function `to_rule(self)` turns the *marked rule* `self` into  a `GrammarRule`, i.e. the *marked rule*
$$ c \rightarrow \alpha \bullet \beta $$
is turned into the grammar rule
$$ c \rightarrow \alpha\, \beta. $$

In [24]:
def to_rule(self: MarkedRule) -> GrammarRule:
    return GrammarRule(self.mVariable, self.mAlpha + self.mBeta)

MarkedRule.to_rule = to_rule # type: ignore
del to_rule

## SLR-Table-Generation

The class `Grammar` represents a context free grammar.  It stores a list of the `GrammarRules` of the given grammar.
Each grammar rule is of the form
$$ a \rightarrow \beta $$
where $\beta$ is a tuple of variables, tokens, and literals.
The start symbol is assumed to be the variable on the left hand side of the first rule. The grammar is *augmented* with the rule
$$ \widehat{s} \rightarrow s\, \$. $$
Here $s$ is the start variable of the given grammar and $\widehat{s}$ is a new variable that is the start variable of the *augmented grammar*. The symbol `$` denotes the end of input.  The non-obvious member variables of the class `Grammar` have the following interpretation:
- `mStates` is the set of all states of the *SLR-parser*.  These states are sets of *marked rules*.
- `mStateNames` is a dictionary assigning names of the form `s0`, `s1`, $\cdots$, `sn` to the states stored in 
  `mStates`.  The functions `action` and `goto` will be defined for *state names*, not for *states*, because 
  otherwise the table representing these functions would become both huge and unreadable.
- `mConflicts` is a Boolean variable that will be set to true if the table generation discovers 
  *shift/reduce conflicts* or *reduce/reduce conflicts*.

In [29]:
class Grammar():
    def __init__(self, Rules: list[GrammarRule]):
        self.mRules     : list[GrammarRule] = Rules
        self.mStart     : Variable          = Rules[0].mVariable
        self.mVariables : set[Variable]     = collect_variables(Rules)
        self.mTokens    : set[Token]        = collect_tokens(Rules)
        self.mStates    : set[frozenset[MarkedRule]] = set()
        self.mConflicts : bool              = False
        self.mStateNames: dict[frozenset[MarkedRule], str] = {}
        self.mVariables.add('ŝ')
        self.mTokens.add('$') # short for EOF
        self.mRules.append(GrammarRule('ŝ', (self.mStart, '$'))) # augment the grammar
        self.mRuleNames: dict[GrammarRule, str] = {} 
        self.compute_tables()                                    # type: ignore

Given a set of `Variables`, the function `initialize_dictionary` returns a dictionary that assigns the empty set to every variable.
This function is needed to initialize the member variables `mFirst` and `mFollow`, which are dictionaries storing the *first-sets* and
*follow-sets* of the syntactical variables.

In [30]:
def initialize_dictionary(Variables: set[Variable]) -> dict[Variable, set[Token]]:
    return { a: set() for a in Variables }
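
The dictionary comprehension creates a separate empty set for every variable.  The sketch below contrasts this with `dict.fromkeys`, which evaluates its second argument only once; the variable names `separate` and `shared` are illustrative.

```python
variables = {'expr', 'product', 'factor'}

# One fresh set per key, as in initialize_dictionary.
separate = { v: set() for v in variables }
separate['expr'].add('NUMBER')

# dict.fromkeys stores the very same set object under every key.
shared = dict.fromkeys(variables, set())
shared['expr'].add('NUMBER')
```

After running this code, `separate['product']` is still empty, while `shared['product']` contains `'NUMBER'`.  For the fixed-point iterations below, only the first behaviour is correct.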

Given a `Grammar`, the function `compute_tables` computes
- the sets `First(v)` and `Follow(v)` for every variable `v`,
- the set of all *states* of the *SLR-Parser*,
- the *action table*, and
- the *goto table*. 

Given a grammar `g`,
- `g.mFirst` is a dictionary such that `g.mFirst[a] = First(a)` and
- `g.mFollow` is a dictionary such that `g.mFollow[a] = Follow(a)` for all variables `a`.

In [31]:
def compute_tables(self: Grammar) -> None:
    self.mFirst  = initialize_dictionary(self.mVariables) # type: ignore
    self.mFollow = initialize_dictionary(self.mVariables) # type: ignore
    self.compute_first()         # type: ignore
    self.compute_follow()        # type: ignore
    self.compute_rule_names()    # type: ignore
    self.all_states()            # type: ignore
    self.compute_action_table()  # type: ignore
    self.compute_goto_table()    # type: ignore
    
Grammar.compute_tables = compute_tables # type: ignore
del compute_tables

The function `compute_rule_names` assigns a unique name to each *rule* of the grammar.  These names are used later
to represent *reduce actions* in the *action table*.

In [32]:
def compute_rule_names(self: Grammar) -> None:
    counter = 0
    for rule in self.mRules:
        self.mRuleNames[rule] = 'r' + str(counter)
        counter += 1
        
Grammar.compute_rule_names = compute_rule_names # type: ignore
del compute_rule_names

The function `compute_first(self)` computes the sets $\texttt{First}(c)$ for all variables $c$ and stores them in the dictionary `mFirst`.  Abstractly, given a variable $c$, the set $\texttt{First}(c)$ contains all tokens that can start a string that is derived from $c$:
$$\texttt{First}(\texttt{c}) := 
  \Bigl\{ t \in T \Bigm| \exists \gamma \in (V \cup T)^*: \texttt{c} \Rightarrow^* t\,\gamma \Bigr\}.
$$
The function $\texttt{First}()$ is extended to a function $\texttt{FirstList}()$ that is defined for strings from $(V \cup T)^*$ as follows:
- $\texttt{FirstList}(\varepsilon) = \{\}$.
- $\texttt{FirstList}(t \beta) = \{ t \}$  if $t \in T$.
- $\texttt{FirstList}(\texttt{a} \beta) = \left\{
       \begin{array}[c]{ll}
         \texttt{First}(\texttt{a}) \cup \texttt{FirstList}(\beta) & \mbox{if $\texttt{a} \Rightarrow^* \varepsilon$;} \\
         \texttt{First}(\texttt{a})                                & \mbox{otherwise.}
       \end{array}
       \right.
      $ 

If $\texttt{a}$ is a variable of $G$ and the rules defining $\texttt{a}$ are given as 
$$\texttt{a} \rightarrow \alpha_1 \mid \cdots \mid \alpha_n, $$
then we have
$$\texttt{First}(\texttt{a}) = \bigcup\limits_{i=1}^n \texttt{FirstList}(\alpha_i). $$
The dictionary `mFirst` that stores this function is computed via a *fixed point iteration*.

In [34]:
def compute_first(self: Grammar) -> None:
    change = True
    while change:
        change = False
        for rule in self.mRules:
            a, body = rule.mVariable, rule.mBody
            first_body = self.first_list(body)      # type: ignore
            if not (first_body <= self.mFirst[a]):  # type: ignore
                change = True
                self.mFirst[a] |= first_body        # type: ignore   
    print('First sets:')
    for v in self.mVariables:
        print(f'First({v}) = {self.mFirst[v]}')     # type: ignore
        
Grammar.compute_first = compute_first               # type: ignore
del compute_first
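
The fixed-point iteration can also be tried out in isolation.  The following sketch is a simplified, self-contained variant that represents rules as plain `(head, body)` pairs instead of `GrammarRule` objects.  The three-variable grammar `s → a b`, `a → 'x' a | ε`, `b → 'y'` is a made-up example, and the empty string `''` is used to mark nullable variables, which is the convention that the notebook's `first_list` follows as well.

```python
def is_var(name):
    return name[0] != "'" and name.islower()

def first_list(first, alpha):
    "First set of the symbol tuple alpha; '' marks that alpha is nullable."
    if len(alpha) == 0:
        return {''}
    head, rest = alpha[0], alpha[1:]
    if not is_var(head):
        return {head}
    result = set(first[head])
    if '' in result:                  # head is nullable: look at the rest
        tail = first_list(first, rest)
        if '' not in tail:
            result.discard('')
        result |= tail
    return result

def compute_first(rules):
    first   = { head: set() for head, _ in rules }
    changed = True
    while changed:                    # iterate until no set grows any more
        changed = False
        for head, body in rules:
            new = first_list(first, body)
            if not new <= first[head]:
                first[head] |= new
                changed = True
    return first

rules = [ ('s', ('a', 'b')),
          ('a', ("'x'", 'a')),
          ('a', ()),
          ('b', ("'y'",)) ]
first = compute_first(rules)
```

In the first pass nothing is known about `a`, so `First(s)` stays empty; only in the second pass does the nullability of `a` let the token `'y'` from `b` enter `First(s)`.  This is why a single pass over the rules is not sufficient.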

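Given a tuple of variables and tokens `alpha`, the function `first_list(alpha)` computes the set $\texttt{FirstList}(\alpha)$ that has been defined above.  In contrast to the abstract definition, the implementation also records nullability: if `alpha` is *nullable*, then the result additionally contains the empty string `''`, which is written as $\lambda$ below.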

In [36]:
def first_list(self: Grammar, alpha: Symbols) -> set[Token]:
    if len(alpha) == 0:
        return { '' }
    elif is_var(alpha[0]): 
        v, *r = alpha
        return eps_union(self.mFirst[v], self.first_list(r)) # type: ignore
    else:
        t = alpha[0]
        return { t }
    
Grammar.first_list = first_list                              # type: ignore
del first_list

The arguments `S` and `T` of `eps_union` are sets that contain tokens and, additionally, they might contain the empty string $\lambda$.  The specification of `eps_union` is:
$$ \texttt{eps_union}(S, T) = \left\{ \begin{array}{ll}
                                       S          & \mbox{if $\lambda \not\in S$} \\
                                       S \cup T   & \mbox{if $\lambda \in S \wedge \lambda \in T$} \\
                                       S \cup T - \{\lambda \} & \mbox{if $\lambda \in S \wedge \lambda \not\in T$}
                                      \end{array}
                              \right.
$$

In [38]:
def eps_union(S: set[Token], T: set[Token]) -> set[Token]:
    if '' in S: 
        if '' in T: 
            return S | T
        return (S - { '' }) | T
    return S
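
The three cases of the specification can be checked with a few small sets.  The sketch repeats the definition of `eps_union` so that it is self-contained; the token names `'a'` and `'b'` are arbitrary.

```python
def eps_union(S, T):
    if '' in S:
        if '' in T:
            return S | T
        return (S - { '' }) | T
    return S

case1 = eps_union({'a'},     {'b'})       # '' not in S: T is ignored
case2 = eps_union({'a', ''}, {'b', ''})   # '' in S and in T: '' is kept
case3 = eps_union({'a', ''}, {'b'})       # '' in S only: '' is dropped
```

The results are `{'a'}`, `{'a', 'b', ''}`, and `{'a', 'b'}`, respectively.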

Given an augmented grammar $G = \langle V,T,R\cup\{\widehat{s} \rightarrow s\,\$\}, \widehat{s}\rangle$ 
and a variable $a$, the set of tokens that might follow $a$ is defined as:
$$\texttt{Follow}(a) := 
 \bigl\{ t \in \widehat{T} \,\bigm|\, \exists \beta,\gamma \in (V \cup \widehat{T})^*: 
                           \widehat{s} \Rightarrow^* \beta \,a\, t\, \gamma 
  \bigr\}.
$$
The function `compute_follow` computes the sets $\texttt{Follow}(a)$ for all variables $a$ via a *fixed-point iteration*.

In [41]:
def compute_follow(self: Grammar) -> None:
    self.mFollow[self.mStart] = { '$' }                           # type: ignore
    change = True
    while change:
        change = False
        for rule in self.mRules:
            a, body = rule.mVariable, rule.mBody
            for i in range(len(body)):
                if is_var(body[i]):
                    yi        = body[i]
                    Tail      = self.first_list(body[i+1:])       # type: ignore
                    firstTail = eps_union(Tail, self.mFollow[a])  # type: ignore
                    if not (firstTail <= self.mFollow[yi]):       # type: ignore
                        change = True
                        self.mFollow[yi] |= firstTail             # type: ignore
    print('Follow sets (note that "$" denotes the end of file):')
    for v in self.mVariables:
        print(f'Follow({v}) = {self.mFollow[v]}')                 # type: ignore
        
Grammar.compute_follow = compute_follow                           # type: ignore
del compute_follow
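
For a grammar without $\varepsilon$-rules the computation of the follow sets becomes simpler, since no symbol is nullable.  The following sketch makes exactly this assumption and is therefore not a replacement for `compute_follow`.  It represents rules as `(head, body)` pairs and takes the first sets of `arith.g` from the output shown further below.

```python
def is_var(name):
    return name[0] != "'" and name.islower()

RULES = [ ('expr',    ('expr', "'+'", 'product')),
          ('expr',    ('expr', "'-'", 'product')),
          ('expr',    ('product',)),
          ('product', ('product', "'*'", 'factor')),
          ('product', ('product', "'/'", 'factor')),
          ('product', ('factor',)),
          ('factor',  ("'('", 'expr', "')'")),
          ('factor',  ('NUMBER',)) ]
FIRST = { v: {"'('", 'NUMBER'} for v in ('expr', 'product', 'factor') }

def compute_follow(rules, first, start):
    "Assumes that no variable is nullable."
    follow        = { head: set() for head, _ in rules }
    follow[start] = {'$'}
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            for i, symbol in enumerate(body):
                if not is_var(symbol):
                    continue
                if i + 1 == len(body):      # symbol ends the body
                    new = follow[head]
                elif is_var(body[i+1]):     # a variable follows
                    new = first[body[i+1]]
                else:                       # a token follows
                    new = {body[i+1]}
                if not new <= follow[symbol]:
                    follow[symbol] |= new
                    changed = True
    return follow

follow = compute_follow(RULES, FIRST, 'expr')
```

The resulting sets agree with the follow sets that the notebook prints for `expr`, `product`, and `factor`.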

If $\mathcal{M}$ is a set of *marked rules*, then the *closure* of $\mathcal{M}$ is the smallest set $\mathcal{K}$ such that
we have the following:
- $\mathcal{M} \subseteq \mathcal{K}$,
- If $a \rightarrow \beta \bullet c\, \delta$ is a *marked rule* from 
  $\mathcal{K}$ where $c$ is a variable, and if
  $c \rightarrow \gamma$ is a grammar rule,
  then the marked rule $c \rightarrow \bullet \gamma$
  is an element of $\mathcal{K}$:
  $$(a \rightarrow \beta \bullet c\, \delta) \in \mathcal{K} 
         \;\wedge\; 
         (c \rightarrow \gamma) \in R
         \;\Rightarrow\; (c \rightarrow \bullet \gamma) \in \mathcal{K}
  $$

We define $\texttt{closure}(\mathcal{M}) := \mathcal{K}$.  The function `cmp_closure` computes this closure for a given set of *marked rules* via a *fixed-point iteration*.

In [45]:
def cmp_closure(self, Marked_Rules: set[MarkedRule]) -> frozenset[MarkedRule]:
    All_Rules = set(Marked_Rules)  # copy, so that the argument is not modified
    New_Rules = set(Marked_Rules)
    while True:
        More_Rules = set()
        for marked_rule in New_Rules:
            c = marked_rule.next_var()                           # type: ignore
            if c is None:
                continue
            for rule in self.mRules:
                head, alpha = rule.mVariable, rule.mBody         # type: ignore
                if c == head:
                    More_Rules |= { MarkedRule(head, (), alpha) }
        if More_Rules <= All_Rules:
            return frozenset(All_Rules)
        New_Rules  = More_Rules - All_Rules
        All_Rules |= New_Rules

Grammar.cmp_closure = cmp_closure                                 # type: ignore
del cmp_closure
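
The closure can be explored without the classes defined above.  The following sketch represents a marked rule as a triple `(head, alpha, beta)` and a grammar rule as a pair `(head, body)`.  It uses a worklist instead of the round-based iteration of `cmp_closure`, so it is a variant of the technique, not the implementation used in this notebook.

```python
def is_var(name):
    return name[0] != "'" and name.islower()

RULES = [ ('expr',    ('expr', "'+'", 'product')),
          ('expr',    ('expr', "'-'", 'product')),
          ('expr',    ('product',)),
          ('product', ('product', "'*'", 'factor')),
          ('product', ('product', "'/'", 'factor')),
          ('product', ('factor',)),
          ('factor',  ("'('", 'expr', "')'")),
          ('factor',  ('NUMBER',)) ]

def closure(items, rules):
    result = set(items)
    todo   = list(items)              # marked rules not yet inspected
    while todo:
        _, _, beta = todo.pop()
        if beta and is_var(beta[0]):  # a variable follows the dot
            for head, body in rules:
                if head == beta[0]:
                    item = (head, (), body)
                    if item not in result:
                        result.add(item)
                        todo.append(item)
    return frozenset(result)

start_state = closure({ ('ŝ', (), ('expr', '$')) }, RULES)
after_slash = closure({ ('product', ('product', "'/'"), ('factor',)) }, RULES)
```

For `arith.g`, the start state contains nine marked rules, namely the marked start rule and one marked rule for each of the eight grammar rules.  The closure of `product → product '/' • factor` only adds the two rules for `factor`.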

Given a set of *marked rules* $\mathcal{M}$ and a *grammar symbol* $X$, the function $\texttt{goto}(\mathcal{M}, X)$ 
is defined as follows:
$$\texttt{goto}(\mathcal{M}, X) := \texttt{closure}\Bigl( \bigl\{ 
   a \rightarrow \beta\, X \bullet \delta \bigm| (a \rightarrow \beta \bullet X\, \delta) \in \mathcal{M} 
   \bigr\} \Bigr).
$$

In [47]:
def goto(self, Marked_Rules, x):
    Result = set()
    for mr in Marked_Rules:
        if mr.symbol_after_dot() == x:
            Result.add(mr.move_dot())
    return self.cmp_closure(Result)

Grammar.goto = goto  # type: ignore
del goto

The function `all_states` computes the set of all states of an *SLR-parser*.  The function starts with the state
$$ \texttt{closure}\bigl(\{ \widehat{s} \rightarrow \bullet s \, \$\}\bigr) $$
and then tries to compute new states by using the function `goto`.  This computation proceeds via a 
*fixed-point iteration*.  Once all states have been computed, the function assigns names to these states.
This association is stored in the dictionary `mStateNames`.

In [50]:
def all_states(self) -> None: 
    start_state  = self.cmp_closure({ MarkedRule('ŝ', (), (self.mStart, '$')) })
    self.mStates = { start_state }
    New_States   = self.mStates
    while True:
        More_States = set()
        for Rule_Set in New_States:
            for mr in Rule_Set: 
                if not mr.is_complete():
                    x = mr.symbol_after_dot()
                    if x != '$':
                        More_States |= { self.goto(Rule_Set, x) }
        if More_States <= self.mStates:
            break
        New_States = More_States - self.mStates
        self.mStates |= New_States
    print("All SLR-states:")
    counter = 1
    self.mStateNames[start_state] = 's0'
    print(f's0 = {set(start_state)}')
    for state in self.mStates - { start_state }:
        self.mStateNames[state] = f's{counter}'
        print(f's{counter} = {set(state)}')
        counter += 1

Grammar.all_states = all_states # type: ignore
del all_states

The function `compute_action_table` computes the *action table*.  For a state $\mathcal{M}$ and a token $t$, the function $\texttt{action}$ is defined as follows:
- If $\mathcal{M}$ contains a *marked rule* of the form $a \rightarrow \beta \bullet t\, \delta$
  then we have
  $$\texttt{action}(\mathcal{M},t) := \langle \texttt{shift}, \texttt{goto}(\mathcal{M},t) \rangle.$$
- If $\mathcal{M}$ contains a marked rule of the form $a \rightarrow \beta\, \bullet$ and we have
  $t \in \texttt{Follow}(a)$, then we define
  $$\texttt{action}(\mathcal{M},t) := \langle \texttt{reduce}, a \rightarrow \beta \rangle.$$
- If $\mathcal{M}$ contains the marked rule $\widehat{s} \rightarrow s \bullet \$ $, then we define 
  $$\texttt{action}(\mathcal{M},\$) := \texttt{accept}. $$
- Otherwise, we have
  $$\texttt{action}(\mathcal{M},t) := \texttt{error}. $$

In [51]:
def compute_action_table(self):
    self.mActionTable = {}
    print('\nAction Table:')
    for state in self.mStates:
        stateName = self.mStateNames[state]
        actionTable = {}
        # compute shift actions
        for token in self.mTokens:
            if token != '$':
                newState  = self.goto(state, token)
                if newState != set():
                    newName = self.mStateNames[newState]
                    actionTable[token] = ('shift', newName)
                    self.mActionTable[stateName, token] = ('shift', newName)
                    print(f'action("{stateName}", {token}) = ("shift", {newName})')
        # compute reduce actions
        for mr in state:
            if mr.is_complete():
                for token in self.mFollow[mr.mVariable]:
                    action1 = actionTable.get(token)
                    action2 = ('reduce', mr.to_rule())
                    if action1 is None:
                        actionTable[token] = action2  
                        r = self.mRuleNames[mr.to_rule()]
                        self.mActionTable[stateName, token] = ('reduce', r)
                        print(f'action("{stateName}", {token}) = {action2}')
                    elif action1 != action2: 
                        self.mConflicts = True
                        print('')
                        print(f'conflict in state {stateName}:')
                        print(f'{stateName} = {state}')
                        print(f'action("{stateName}", {token}) = {action1}')     
                        print(f'action("{stateName}", {token}) = {action2}')
                        print('')
        for mr in state:
            if mr == MarkedRule('ŝ', (self.mStart,), ('$',)):
                actionTable['$'] = 'accept'
                self.mActionTable[stateName, '$'] = 'accept'
                print(f'action("{stateName}", $) = accept')

Grammar.compute_action_table = compute_action_table # type: ignore
del compute_action_table
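
The way in which shift and reduce actions are combined, and how conflicts arise, can be seen on a single state.  The following sketch is a simplified variant: marked rules are triples `(head, alpha, beta)`, a shift action is recorded as the string `'shift'` without a target state, and the helper `actions_for_state` is a hypothetical name that does not occur in this notebook.

```python
def is_var(name):
    return name[0] != "'" and name.islower()

def actions_for_state(items, follow):
    actions, conflicts = {}, []
    def propose(token, action):
        old = actions.setdefault(token, action)
        if old != action:             # two different actions for one token
            conflicts.append((token, old, action))
    for head, alpha, beta in items:
        if beta:                      # dot is not at the end: maybe shift
            if beta[0] != '$' and not is_var(beta[0]):
                propose(beta[0], 'shift')
        else:                         # complete rule: reduce on Follow(head)
            for token in follow[head]:
                propose(token, ('reduce', head, alpha))
    return actions, conflicts

# The state containing  expr → expr '-' product •  from arith.g.
state  = [ ('product', ('product',), ("'/'", 'factor')),
           ('product', ('product',), ("'*'", 'factor')),
           ('expr',    ('expr', "'-'", 'product'), ()) ]
follow = { 'expr': {"'+'", "'-'", "')'", '$'} }
actions, conflicts = actions_for_state(state, follow)

# A state of the ambiguous grammar  expr → expr '+' expr | NUMBER.
ambiguous = [ ('expr', ('expr', "'+'", 'expr'), ()),
              ('expr', ('expr',), ("'+'", 'expr')) ]
_, clashes = actions_for_state(ambiguous, { 'expr': {"'+'", '$'} })
```

The first state is free of conflicts because the tokens `'*'` and `'/'` are not elements of `Follow(expr)`.  In the second state the token `'+'` calls for both a shift and a reduce, which is a *shift/reduce conflict*.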

The function `compute_goto_table` computes the *goto table*.

In [53]:
def compute_goto_table(self) -> None:
    self.mGotoTable = {}
    print('\nGoto Table:')
    for state in self.mStates:
        for var in self.mVariables:
            newState = self.goto(state, var)
            if newState != set():
                stateName = self.mStateNames[state]
                newName   = self.mStateNames[newState]
                self.mGotoTable[stateName, var] = newName
                print(f'goto({stateName}, {var}) = {newName}')

Grammar.compute_goto_table = compute_goto_table # type: ignore
del compute_goto_table

In [54]:
grammar

[expr → expr '+' product,
 expr → expr '-' product,
 expr → product,
 product → product '*' factor,
 product → product '/' factor,
 product → factor,
 factor → '(' expr ')',
 factor → NUMBER]

In [55]:
%%time
g = Grammar(grammar)

First sets:
First(factor) = {"'('", 'NUMBER'}
First(expr) = {"'('", 'NUMBER'}
First(ŝ) = {"'('", 'NUMBER'}
First(product) = {"'('", 'NUMBER'}
Follow sets (note that "$" denotes the end of file):
Follow(factor) = {"')'", '$', "'/'", "'+'", "'*'", "'-'"}
Follow(expr) = {"'+'", '$', "'-'", "')'"}
Follow(ŝ) = set()
Follow(product) = {"')'", '$', "'/'", "'+'", "'*'", "'-'"}
All SLR-states:
s0 = {ŝ →  • expr $, expr →  • expr '-' product, product →  • factor, expr →  • product, expr →  • expr '+' product, product →  • product '*' factor, factor →  • '(' expr ')', factor →  • NUMBER, product →  • product '/' factor}
s1 = {product → product • '/' factor, product → product • '*' factor, expr → expr '-' product • }
s2 = {factor →  • '(' expr ')', factor →  • NUMBER, product → product '/' • factor}
s3 = {product → product • '/' factor, expr → expr '+' product • , product → product • '*' factor}
s4 = {product → product '/

In [56]:
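def strip_quotes(t: str) -> str:
    "Remove the enclosing single quotes from a literal; other names are returned unchanged."
    if t[0] == "'" and t[-1] == "'":
        return t[1:-1]
    return t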

In [58]:
def dump_parse_table(self, file: str) -> None:
    "Write the grammar rules, the action table, and the goto table as Python code to the given file."
    with open(file, 'w', encoding="utf-8") as handle:
        handle.write('# Grammar rules:\n')
        for rule in self.mRules:
            rule_name = self.mRuleNames[rule] 
            handle.write(f'{rule_name} = ("{rule.mVariable}", {rule.mBody})\n')
        handle.write('\n# Action table:\n')
        handle.write('actionTable = {}\n')
        for s, t in self.mActionTable:
            action = self.mActionTable[s, t]
            t = strip_quotes(t)
            if action[0] == 'reduce':
                rule_name = action[1]
                handle.write(f"actionTable['{s}', '{t}'] = ('reduce', {rule_name})\n")
            elif action == 'accept':
                handle.write(f"actionTable['{s}', '{t}'] = 'accept'\n")
            else:
                handle.write(f"actionTable['{s}', '{t}'] = {action}\n")
        handle.write('\n# Goto table:\n')
        handle.write('gotoTable = {}\n')
        for s, v in self.mGotoTable:
            state = self.mGotoTable[s, v]
            handle.write(f"gotoTable['{s}', '{v}'] = '{state}'\n")
        
Grammar.dump_parse_table = dump_parse_table # type: ignore
del dump_parse_table

In [60]:
g.dump_parse_table('parse-table.py') # type: ignore

In [61]:
!cat parse-table.py

# Grammar rules:
r0 = ("expr", ('expr', "'+'", 'product'))
r1 = ("expr", ('expr', "'-'", 'product'))
r2 = ("expr", ('product',))
r3 = ("product", ('product', "'*'", 'factor'))
r4 = ("product", ('product', "'/'", 'factor'))
r5 = ("product", ('factor',))
r6 = ("factor", ("'('", 'expr', "')'"))
r7 = ("factor", ('NUMBER',))
r8 = ("ŝ", ('expr', '$'))

# Action table:
actionTable = {}
actionTable['s8', '('] = ('shift', 's12')
actionTable['s8', 'NUMBER'] = ('shift', 's10')
actionTable['s9', ')'] = ('reduce', r5)
actionTable['s9', '$'] = ('reduce', r5)
actionTable['s9', '/'] = ('reduce', r5)
actionTable['s9', '+'] = ('reduce', r5)
actionTable['s9', '*'] = ('reduce', r5)
actionTable['s9', '-'] = ('reduce', r5)
actionTable['s1', '/'] = ('shift', 's2')
actionTable['s1', '*'] = ('shift', 's5')
actionTable['s1', '+'] = ('reduce', r1)
actionTable['s1', '$'] = ('reduce', r1)
actionTable['s1', '-'] = ('reduce', r1)
actionTable['s1', ')'] = ('reduce', r1)
actionTable['s2', '

In [62]:
!rm GrammarLexer.* GrammarParser.* Grammar.tokens GrammarListener.py Grammar.interp 
!rm -r __pycache__

In [63]:
!ls

Examples                       SLR-Table-Generator.ipynb
Grammar.g4                     Shift-Reduce-Parser-Pure.ipynb
Parse-Table.ipynb              Shift-Reduce-Parser.ipynb
Pure.g4                        parse-table.py
