In [None]:
class DeterministicNone:
    def __repr__(self): return "NONE"
    def __bool__(self): return False
    def __hash__(self): return 52604
NONE = DeterministicNone() # This is small hack to have reproducible jupyter outputs

# LR parser
This notebook contains both the theory and an implementation of an LR(0) parser according to the
[Dragon Book](https://en.wikipedia.org/wiki/Compilers:_Principles,_Techniques,_and_Tools).

An LR parser is a bottom-up parser that can parse deterministic context-free languages in linear time.
It reads input tokens and combines them into AST nodes in the hope that
at the end the whole input will collapse into one big AST node, which will be the AST itself.
If you read this notebook in hopes of understanding LR parsers,
please make sure you already understand what a
[CFG](https://en.wikipedia.org/wiki/Context-free_grammar) and an
[AST](https://en.wikipedia.org/wiki/Abstract_syntax_tree) are,
since I won't cover them here.


### LR(0) parser
LR(0) parser is the simplest one. It is closely related to SLR (Simple LR), which uses the same states but additionally peeks at the next token before reducing.
* S - Simple. But I can't call it simple: it's much more complex than PEG or LL.
* L - Left-to-right: the parser reads the input from left to right without peeking at the end of the input.
* R - Rightmost derivation in reverse: the parser builds the tree by operating on the right end of the list of nodes.

The zero in parentheses is the number of lookahead tokens: an LR(0) parser decides whether to reduce without looking at the next token, while an LR(1) parser looks at one.

This notebook contains the theory and the code of an LR(0) parser
split into a bunch of sections. Every section consists of a **header**, a description and `the code itself`.
But before that I will tell you the core idea of the LR parser:
> To build an LR parser one should take the finite automaton of an LL parser with conflicts and
resolve them by transforming this nondeterministic finite automaton into a deterministic one.
The obtained finite automaton is the LR parser.

I don't expect anyone to understand what I just wrote,
but for me that description of the parser divided my life into before and after:
before I understood the thing and after. So I had to include it in this notebook.

### Example context-free grammar
Before doing any kind of experiments with grammars, we need a lab rat.
For that purpose I have adapted the rules of a context-free grammar from [wikipedia](https://en.wikipedia.org/wiki/Context-free_grammar#Well-formed_parentheses).
But I don't force you to use it:
you can change the variable `grammar_source` to your own lab rat and see if the experiment gives the same outcome.

In [None]:
grammar_source = """
    S → S U
    S → U
    U → ( U )
    U → ( )
"""

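These rules correspond to a context-free grammar
that describes a context-free language containing these sentences:
    
    (), (()), ()(), ((())), (())(), ()(()), ()()()()()(), (())((()))(((())))

In other words, this is a grammar of sequences of nested parentheses.
It covers only a part of the well-formed parentheses: `U` can wrap only a single `U`,
so a sentence like `(()())` cannot be derived from these rules.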
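
If you want to check which sentences belong to the language, brute force is enough for such a small lab rat. Here is a minimal sketch that is not part of the parser: it assumes the rules are hard-coded in a made-up `RULES` dict and keeps expanding the leftmost variable until only terminals are left. The names `RULES` and `sentences` are my assumptions for this example only.

```python
from collections import deque

# The lab rat grammar, hard-coded for this sketch: variable -> list of bodies.
RULES = {
    "S": [("S", "U"), ("U",)],
    "U": [("(", "U", ")"), ("(", ")")],
}

def sentences(max_len):
    """Return all sentences of at most max_len terminals derivable from S."""
    found, seen = set(), set()
    queue = deque([("S",)])
    while queue:
        form = queue.popleft()
        # Index of the leftmost variable, or None if the form is all terminals.
        index = next((i for i, s in enumerate(form) if s in RULES), None)
        if index is None:
            found.add("".join(form))
            continue
        for body in RULES[form[index]]:
            new_form = form[:index] + body + form[index + 1:]
            # No rule has an empty body, so forms never shrink:
            # anything longer than max_len can be dropped safely.
            if len(new_form) <= max_len and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return sorted(found, key=lambda s: (len(s), s))

print(sentences(6))
```

With the limit of 6 characters the sketch finds seven sentences, and `(()())` is not among them.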

### Representing grammar rules
Plain text rules are cool, but we need to represent them with some kind of data structure.
I will use a named tuple for that purpose.

In [None]:
from collections import namedtuple
Rule = namedtuple("Rule", ["head", "body"])
Rule.__str__ = lambda rule: rule[0] + " → " + " ".join(rule[1])
print(Rule("S", ("S", "U")))

S → S U


### Parser of rules
Everyone knows that to write a parser you have to write a parser.
So here is the code of a parser of grammar rules.
We need this parser to translate our lab rat into list of rules.

In [None]:
def parse_rules(source):
    for rule in source.strip().split("\n"):
        head, body = rule.strip().split(" → ")
        yield Rule(head, tuple(body.split(" ")))
rules = tuple(parse_rules(grammar_source))
rules

(Rule(head='S', body=('S', 'U')),
 Rule(head='S', body=('U',)),
 Rule(head='U', body=('(', 'U', ')')),
 Rule(head='U', body=('(', ')')))

### Derive variables, terminals, start symbol from rules
Mathematically speaking we [should](https://en.wikipedia.org/wiki/Context-free_grammar#Formal_definitions) specify variables, terminals, rules and a start symbol in order to call it a grammar,
but we (programmers) are too lazy for this.
Why write all this when you can simply extract all the information you need solely from the rules of the grammar?
So instead of specifying all these things I wrote a function `derive_symbols()`
to derive terminals and variables from grammar rules.
The idea of the derivation is based on the fact that every rule head is a variable,
while a terminal may occur only in a body.

In [None]:
################################################################################
def derive_symbols(rules):
    variables = {variable for variable, body in rules}
    terminals = {t for _, body in rules for t in body if t not in variables}
    return frozenset(variables), frozenset(terminals)
variables, terminals = derive_symbols(rules)
print("variables: " + ", ".join(map(repr, variables)))
print("terminals: " + ", ".join(map(repr, terminals)))

variables: 'U', 'S'
terminals: ')', '('


I will assume the head of the first rule to be the start symbol.
This assumption is based solely on the fact that it is true for **my** lab rat.

In [None]:
start_symbol = rules[0].head
start_symbol

'S'

### Grammar representation

I am too lazy to bring all four variables (variables, terminals, rules, start symbol) everywhere I need them,
so it makes sense to implement a `Grammar` class according to its [mathematical definition](https://en.wikipedia.org/wiki/Context-free_grammar#Formal_definitions).
However I am also too lazy to write the whole class myself, so instead I will once again use a named tuple.

In [None]:
class Grammar(namedtuple("Grammar", "variables terminals rules start_symbol")):
    @staticmethod
    def from_source(source):
        rules = tuple(parse_rules(source))
        variables, terminals = derive_symbols(rules)
        start_symbol = rules[0].head
        return Grammar(variables, terminals, rules, start_symbol)

Grammar(variables, terminals, rules, start_symbol)

Grammar(variables=frozenset({'U', 'S'}), terminals=frozenset({')', '('}), rules=(Rule(head='S', body=('S', 'U')), Rule(head='S', body=('U',)), Rule(head='U', body=('(', 'U', ')')), Rule(head='U', body=('(', ')'))), start_symbol='S')

Grammar is the most important class in this notebook.
Thus we should grant it pretty `__str__` and `_repr_pretty_` methods.

In [None]:
def grammar_to_str(grammar):
    return ("start:\t" + grammar.start_symbol
        + "\nvariables: " + " ".join(map(str, grammar.variables))
        + "\nterminals: " + " ".join(map(str, grammar.terminals))
        + "\nrules:\t" + "\n\t".join(map(str, grammar.rules)))
Grammar.__str__ = grammar_to_str
Grammar._repr_pretty_ = lambda grammar, p, _: p.text(str(grammar))

Grammar(variables, terminals, rules, start_symbol)

start:	S
variables: U S
terminals: ) (
rules:	S → S U
	S → U
	U → ( U )
	U → ( )

# THEORY
In this section I will try to explain the theory behind LR parsers.
I will try really hard to explain the thing to you. Please, be dead serious.\
P.S. English is not my native language, so I don't know who is "dead serious", I know only "dead morose".

Imagine that you are a raccoon.
You know... the one that lives in a coffee machine and makes coffee.
It's hard for me to imagine, since I have never seen a coffee machine.
Let's assume that you (a raccoon) have decided to quit.
You no longer make coffee, now you are a parser.
You parse for a living.
Parsing is not what your mother wished for you,
but it's a well-paying job and you like it.\
A client silently gives you a **grammar** and a stack of green banknotes.
A few days later owls bring you **tokens**: one per owl.
You're sitting at home and building the **tree** out of tokens.
You should be fast, because there are more and more owls.
The last one of them will give you the last token and take the tree from you.

One day you get a grammar with rules `S → + T S`, `S → * T S`, `S → T`, `T → 0`, `T → 1`.\
You know right away what it is.
It is a grammar of [reverse reverse Polish notation](https://en.wikipedia.org/wiki/Polish_notation).
There is no one at home to appreciate such an ingenious insight, but you are ok with it.
You are already used to freelancing: you are alone at home while your family is somewhere else... \
**Owls are approaching!**\
You prepare yourself. You know the job is to build the node that corresponds to the start symbol, which is `S`.
And you know what rules can be used to build it: `S → + T S`, `S → * T S`, `S → T`.\
You receive the first token: it's `+`.
So you know that you are not going to use `S → * T S` or `S → T`, since they don't have `+` at the beginning.
You will use the rule `S → + T S`, which tells you that after `+` you should receive something
that can be used to assemble `T`: `0` or `1`. \
You receive the second token: it's `1`. You use it to build the AST node corresponding to `T`.
And you put it (the node) on your desk near the `+`.
Now you are expecting something to build the `S` node. \
Aaand... you receive the last token: it's `0`. You use it to build the AST node corresponding to `T`.
Then you use the node `T` to build the AST node corresponding to `S`.
You put `S` near `T` and `+` to fold them using the rule `S → + T S`. After folding you have the AST:

      S
     /|\
    + T S
      | |
      1 T
        |
        0

After each owl arrival you know exactly what rule you should apply. You are precise and untroubled.
And nothing can disturb you. Nothing except the next grammar: 

    S → S S
    S → ( S )
    S → ( )

When you receive the first token `(` you know nothing.
You don't know what rule you should apply because every one of them can start with `(`.
Maybe you should apply the third rule and expect the closing `)`:

    S → (•)
         └── you are here

Or maybe you should apply the second rule and expect `(` as the next token:

    S → (•S )
         └── you are here

Also it's possible that you are inside the first `S` of the body of the first rule:

    S → S S
        └── you are inside that S
        S → (•S )
             ├── you are here or there
        S → (•)

You are confused since you don't know what rule to choose.
So you decide to write down the received token and
all possible places (inside rules) where you may be right now:
    
    "("  S → (•S )  S → (•)

After receiving a few more `(`, you have written this line:

    "(" S→(•S),S→(•)    "(" S→(•S),S→(•)    "(" S→(•S),S→(•)

You receive `)` and now you can clearly see that from places like `S→(•S)` and `S→(•)`
you could get only into `S → ( )•`.
So now you know precisely that you are here: `S → ( )•`.
Thus you use the rule `S → ( )` to build symbol `S` and get into this state:

    "(" S→(•S),S→(•)    "(" S→(•S),S→(•)    S  S→(S•)
                                                   └── we got here from  S→(•S)

A few minutes later you receive `)`, so you use it to build a new `S` out of `(`, `S`, `)`:

    "(" S→(•S),S→(•)     S  S→(S•)

Finally you receive the last token `)` and use it to put everything into one big `S`.

**TO BE CONTINUED...**
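
The first job from the story, the one where the first token always tells you which rule to apply, can be sketched as a tiny predictive parser. This is only my illustration of that easy case and not the LR parser of this notebook; the function names `parse_s` and `parse_t` and the tuple shape of the nodes are assumptions made for this example.

```python
def parse_s(tokens, i=0):
    """S → + T S | * T S | T. Return (node, index of the next unread token)."""
    if i < len(tokens) and tokens[i] in ("+", "*"):
        # The first token picks the rule: S → + T S or S → * T S.
        operator = tokens[i]
        t_node, i = parse_t(tokens, i + 1)
        s_node, i = parse_s(tokens, i)
        return ("S", operator, t_node, s_node), i
    # No operator ahead, so the only remaining rule is S → T.
    t_node, i = parse_t(tokens, i)
    return ("S", t_node), i

def parse_t(tokens, i):
    """T → 0 | 1. Return (node, index of the next unread token)."""
    if i < len(tokens) and tokens[i] in ("0", "1"):
        return ("T", tokens[i]), i + 1
    raise SyntaxError(f"expected 0 or 1 at position {i}")

tree, consumed = parse_s("+10")
print(tree)
```

For the input `+10` the result has the same shape as the tree from the story. The same trick fails for the parentheses grammar, because there the first token `(` fits every rule.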

### LR Items
Positions within the rules are called LR items.
In other words LR items are just rules with a dot in the body, e.g. `E → E •+ B`, `S → S •S`, `S → ( )•`.
An item indicates that the parser has recognized a string corresponding to the part of the rule before the dot,
e.g. `E → E * •B` means that the parser has recognized `E` and `*` on the input and now expects to read `B`.

In [None]:
class Item(namedtuple("Item", "rule dot_position")):
    def __str__(item):
        body = list(item.rule.body)
        body.insert(item.dot_position, "•")
        return str(item.rule.head) + " → " + " ".join(map(str, body))

print(Item(rule=rules[0], dot_position=1))

S → S • U


The operation of taking the next symbol after the dot will be very useful for us in later stages.
So it makes sense to implement it as a property of the `Item` class.

In [None]:
def get_next_symbol_after_the_dot(item):
    if item.dot_position < len(item.rule.body):
        return item.rule.body[item.dot_position]
    return None
Item.next_symbol = property(get_next_symbol_after_the_dot)

### Closure of items
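The closure of a set of items is the set itself combined with the items that can be obtained
by pushing the dot into rule bodies: for every item with the dot right before a variable
we add all rules of that variable with the dot at the very beginning of the body,
and we repeat this until no new items appear,
e.g.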
    
    closure of S → ( •S ) =
        S → ( •S )
        S → •S S
        S → •( S )
        S → •( )


In [None]:
def close(grammar, items):
    closure, rules = set(items), grammar.rules
    for variable in (item.next_symbol for item in items):
        closure |= {Item(rule, 0) for rule in rules if rule.head == variable}
    return close(grammar, closure) if closure > items else frozenset(closure)
                    
grammar = Grammar(variables, terminals, rules, start_symbol)
item = Item(rule=rules[0], dot_position=1)
closure = close(grammar, {item})
print(f"    closure of {item} =\n\t"+"\n\t".join(map(str, closure)))
################################################################################

    closure of S → S • U =
	S → S • U
	U → • ( )
	U → • ( U )
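
The function above finds the closure recursively. The same fixed point can also be reached with a plain worklist, which may be easier to follow. Below is a minimal self-contained sketch under my own assumptions: it redefines `Rule` and `Item` locally, uses the `S → S S` grammar from the theory part, and the names `next_symbol` and `close_iteratively` are made up for this example.

```python
from collections import namedtuple

Rule = namedtuple("Rule", "head body")
Item = namedtuple("Item", "rule dot_position")

def next_symbol(item):
    """Symbol right after the dot, or None if the dot is at the end."""
    body = item.rule.body
    return body[item.dot_position] if item.dot_position < len(body) else None

def close_iteratively(rules, items):
    """Closure of items computed with a worklist instead of recursion."""
    closure, worklist = set(items), list(items)
    while worklist:
        symbol = next_symbol(worklist.pop())
        for rule in rules:
            new_item = Item(rule, 0)
            # Only rules of the variable after the dot are pulled in,
            # and every item is put on the worklist at most once.
            if rule.head == symbol and new_item not in closure:
                closure.add(new_item)
                worklist.append(new_item)
    return frozenset(closure)

rules = (
    Rule("S", ("S", "S")),
    Rule("S", ("(", "S", ")")),
    Rule("S", ("(", ")")),
)
start = Item(rules[1], 1)  # S → ( •S )
closure = close_iteratively(rules, {start})
print(len(closure))
```

The closure of `S → ( •S )` holds the item itself plus one item with the dot at position zero for each of the three rules of `S`, just like in the example from the description.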


Btw, it's quite convenient to have this functionality as part of the `Grammar` class.
So I will bind it to the class as a method.

In [None]:
Grammar.close = close

### States (sets of items)
The core idea of the LR parser is that its states are just sets of possible items.
When the parser has already read something from the input, it doesn't "know" yet what
rule it is going to apply and what AST node it is going to build,
but it does know what items correspond to the already read symbols.
Actually all possible items corresponding to some state fully specify this state.
And the set of all possible sets of items is finite.
Thus the number of states is finite.
And we are going to precompute all the states.
But before all that we need a class for a set of items.

To save memory, a set of items can be represented by the items that
can't be computed as the closure of other items in this set.
For example the set {`E → E * •B`, `B → •1`, `B → •0`} can be represented by the item
`E → E * •B` alone, since the items `B → •1`, `B → •0` can be obtained by finding the closure of `E → E * •B`.
So the item `E → E * •B` is a **kernel item** of the set {`E → E * •B`, `B → •1`, `B → •0`}.

How to understand which items are kernel items?\
In the general case it's quite complex and requires a topological sort of rules...
However, according to the Dragon Book, only a small subset of item sets appears during parsing,
and all of them have kernel items with a non-zero dot position.
In other words we can just compare the dot position with zero to find out if an item is a kernel item.

Here is the class `ItemSet` that implements a memory-efficient representation of item sets.

In [None]:
class ItemSet(namedtuple("ItemSet", "grammar kernel_items")):
    @staticmethod
    def from_items(grammar, items):
        kernel_items = {item for item in items if item.dot_position > 0}
        return ItemSet(grammar, frozenset(kernel_items))
    
    def __iter__(self):
        yield from self.grammar.close(self.kernel_items)
        
    def __str__(self):
        return "{" + ", ".join(sorted(map(str, self))) + "}"

    def __bool__(self):
        return bool(self.kernel_items)
################################################################################
items = ItemSet.from_items(grammar, {item})
print(f"Set {items} has kernel items", ", ".join(map(str, items.kernel_items)))

Set {S → S • U, U → • ( ), U → • ( U )} has kernel items S → S • U


### GOTO

    GOTO(current_parser_state, next_symbol) -> next_parser_state
 
The GOTO function computes the next parser state (item set)
based on its current state (item set) and the symbol that has just been recognized.
Since a state is just a set of items, the function is pretty straightforward:
assuming `next_symbol=Y`, for every item `W → X •Y Z` from the current set of items
add `W → X Y •Z` into the next state.

In [None]:
def goto(grammar, items, next_symbol):
    next_items = set()
    for item in items:
        if item.next_symbol == next_symbol:
            next_items.add(Item(item.rule, item.dot_position + 1))
    return ItemSet.from_items(grammar, next_items)
################################################################################
Grammar.goto = goto
print(f'goto({items}, "(")  =\n\t', items2 := grammar.goto(items, "("))
print(f'goto({items2}, ")")  =\n\t', grammar.goto(items2, ")"))

goto({S → S • U, U → • ( ), U → • ( U )}, "(")  =
	 {U → ( • ), U → ( • U ), U → • ( ), U → • ( U )}
goto({U → ( • ), U → ( • U ), U → • ( ), U → • ( U )}, ")")  =
	 {U → ( ) •}


### The states precomputed
We can use the function `goto()` to precompute all reachable states of the parser.
For that we need a starting state, and therefore a starting item.
We need a rule that will contain the whole input in its body.
There are several options:

1. `NEW_START_SYMBOL → • OLD_START_SYMBOL $`\
This is the starting item used in the Dragon Book.
`NEW_START_SYMBOL` is some random name that's not going to be used anywhere.
And `$` is the end of input.
This option has a disadvantage compared to option #2:
the dot of this item is at position zero, so it becomes an exception to the way we find kernel items...


2. `NONE → NONE •START_SYMBOL NONE`\
This is the item I am going to use.
I understand it like this:
a rule without a head contains the whole tree (`START_SYMBOL`) with no terminals before or after it.
`NONE` in the body denotes the start and the end of input.

So we augment our grammar with such a rule:
its head is `NONE` and its body is `NONE START_SYMBOL NONE`,
where `NONE` in the body denotes the start or the end of input.

In [None]:
################################################################################
def reachable_states(grammar):
    symbols = grammar.variables | grammar.terminals
    start_item = Item(Rule(NONE, (NONE, grammar.start_symbol, NONE)), dot_position=1)
    start_state = ItemSet.from_items(grammar, {start_item})
    states, unprocessed_states = set(), {start_state}
    while unprocessed_states:
        state = unprocessed_states.pop()
        states.add(state)
        yield state
        for new_state in (grammar.goto(state, symbol) for symbol in symbols):
            if new_state and new_state not in states:
                unprocessed_states.add(new_state)

states = list(reachable_states(grammar))
print("\n".join(f"{i}: {state}" for i, state in enumerate(states)))

0: {NONE → NONE • S NONE, S → • S U, S → • U, U → • ( ), U → • ( U )}
1: {NONE → NONE S • NONE, S → S • U, U → • ( ), U → • ( U )}
2: {S → U •}
3: {U → ( • ), U → ( • U ), U → • ( ), U → • ( U )}
4: {S → S U •}
5: {U → ( U • )}
6: {U → ( ) •}
7: {U → ( U ) •}
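
The numbering of the states above depends on the iteration order of Python sets. If you want numbers that are the same on every run, one option is to explore the states breadth-first and to try the symbols in sorted order. Here is a minimal self-contained sketch of that idea. It is my assumption of how it could be done and not the code of this notebook: items are plain `(head, body, dot_position)` tuples, `None` stands in for `NONE`, and the names `closure`, `goto` and `numbered_states` are made up for the example.

```python
from collections import deque

RULES = (("S", ("S", "U")), ("S", ("U",)),
         ("U", ("(", "U", ")")), ("U", ("(", ")")))
SYMBOLS = sorted({head for head, _ in RULES}
                 | {symbol for _, body in RULES for symbol in body})
START_ITEM = (None, (None, "S", None), 1)  # NONE → NONE •S NONE

def closure(items):
    """Add items with the dot at zero until nothing new appears."""
    items = set(items)
    while True:
        wanted = {body[dot] for _, body, dot in items if dot < len(body)}
        new = {(head, body, 0) for head, body in RULES if head in wanted} - items
        if not new:
            return frozenset(items)
        items |= new

def goto(state, symbol):
    """Move the dot over symbol wherever possible and close the result."""
    return closure({(head, body, dot + 1) for head, body, dot in state
                    if dot < len(body) and body[dot] == symbol})

def numbered_states():
    """Map every reachable state to its number in breadth-first order."""
    start = closure({START_ITEM})
    numbers, queue = {start: 0}, deque([start])
    while queue:
        state = queue.popleft()
        for symbol in SYMBOLS:
            target = goto(state, symbol)
            # An empty target means there is no transition on this symbol.
            if target and target not in numbers:
                numbers[target] = len(numbers)
                queue.append(target)
    return numbers

print(len(numbered_states()))
```

The sketch finds the same eight states, only the numbers assigned to them differ from the output above.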


States are useful enough to bind them as a property of `Grammar`.
Also they are needed too often to be left uncached.
Thus I use `functools.cache()` to cache them.

In [None]:
import functools
get_states = lambda grammar: tuple(reachable_states(grammar))
Grammar.states = property(functools.cache(get_states))
################################################################################
print("\n".join(f"{i}: {state}" for i, state in enumerate(grammar.states)))

0: {NONE → NONE • S NONE, S → • S U, S → • U, U → • ( ), U → • ( U )}
1: {NONE → NONE S • NONE, S → S • U, U → • ( ), U → • ( U )}
2: {S → U •}
3: {U → ( • ), U → ( • U ), U → • ( ), U → • ( U )}
4: {S → S U •}
5: {U → ( U • )}
6: {U → ( ) •}
7: {U → ( U ) •}


### Gotos precomputed
We precomputed the states, why not precompute goto(...) results?
During runtime we need goto(...) results for nonterminals to know what state to go to after reducing something.
I will use numbers of states as keys and results instead of the states themselves,
since during runtime I don't want to store the item sets in memory.
I will save memory by storing only indexes of sets, not the sets themselves.

In [None]:
################################################################################
def precompute_goto(grammar):
    indexes = {state:i for i, state in enumerate(grammar.states)}
    symbols = grammar.terminals | grammar.variables
    gotos = {}
    for i, state in enumerate(grammar.states):
        for symbol in symbols:
            next_state = grammar.goto(state, symbol)
            if next_state in indexes:
                gotos[i, symbol] = indexes[next_state]
    return gotos

Grammar.gotos = property(functools.cache(precompute_goto))
print("\n".join(f"goto {i} {s} -> {j}" for (i, s), j in grammar.gotos.items()))

goto 0 U -> 2
goto 0 S -> 1
goto 0 ( -> 3
goto 1 U -> 4
goto 1 ( -> 3
goto 3 ) -> 6
goto 3 U -> 5
goto 3 ( -> 3
goto 5 ) -> 7


### Actions precomputed
We have states and gotos precomputed.
Cool! Now let's precompute for each state the action that should be executed.
For each possible state and each possible terminal on input we will compute the desired action.

LR parser supports these types of actions:
1. SHIFT: push the terminal from input onto the stack and enter another state.\
We use this action when the next token can lead us to some valid state.
2. REDUCE: pack a few symbols from the stack into an AST node and go to another state according to goto().\
This action should be applied when our current state (set of items)
contains only one kernel item and this item has the dot at its end.
3. ACCEPT: accept the current stack as a successfully built AST.\
This action should be performed when we have read all the input and
have `NONE → NONE S • NONE` in the current set of items.
4. DIE: raise an exception if there is no reasonable action.\
This action should be performed when there are no other actions available.
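
When a state asks for two different actions at once, the grammar is not LR(0). The textbook names for these situations are shift/reduce and reduce/reduce conflicts. Below is a minimal sketch that names the conflicts of one closed item set. It follows the textbook definitions and is a bit finer than the check I use in this notebook; the function name `lr0_conflicts` and the plain `(head, body, dot_position)` tuples are assumptions made for this example.

```python
def lr0_conflicts(items, terminals):
    """Name the LR(0) conflicts of a closed set of (head, body, dot) items."""
    # Items with the dot at the end want to reduce.
    complete = [item for item in items if item[2] == len(item[1])]
    # Items with a terminal right after the dot want to shift.
    can_shift = any(dot < len(body) and body[dot] in terminals
                    for _, body, dot in items)
    conflicts = []
    if complete and can_shift:
        conflicts.append("shift/reduce")
    if len(complete) > 1:
        conflicts.append("reduce/reduce")
    return conflicts

TERMINALS = {"(", ")"}

# State reached after reading "S S" with the grammar S → S S | ( S ) | ( ).
ambiguous_state = {
    ("S", ("S", "S"), 2),        # S → S S •   wants to reduce
    ("S", ("S", "S"), 1),        # S → S •S
    ("S", ("S", "S"), 0),        # S → •S S
    ("S", ("(", "S", ")"), 0),   # S → •( S )  wants to shift "("
    ("S", ("(", ")"), 0),        # S → •( )    wants to shift "("
}
calm_state = {("U", ("(", ")"), 2)}  # U → ( ) •

print(lr0_conflicts(ambiguous_state, TERMINALS))
print(lr0_conflicts(calm_state, TERMINALS))
```

The first state comes from the `S → S S` grammar of the theory part: it wants to reduce and to shift `(` at the same time, so that grammar can't be handled by an LR(0) parser. The second state has a single complete item and no conflicts.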

Let's define a function that computes actions for the specified state:

In [None]:
################################################################################
def get_actions(grammar, state_index):
    state = grammar.states[state_index]
    if any(i.dot_position == len(i.rule.body) for i in state.kernel_items):
        if len(state.kernel_items) != 1:
            raise ValueError(f"CONFLICT DETECTED: {state}")
        item = next(iter(state.kernel_items))
        action = ("reduce", item.rule.head, len(item.rule.body))
        return {term: action for term in grammar.terminals} | {NONE: action}
    actions = {}
    for next_terminal in grammar.terminals:
        next_state = grammar.gotos.get((state_index, next_terminal), None)
        if next_state is not None:
            actions[next_terminal] = ("shift", next_state)
    if any(i.next_symbol == NONE for i in state.kernel_items):
        actions[NONE] = "accept"
    return actions

And here is the function that precomputes the actions for all reachable states:

In [None]:
################################################################################
def precompute_actions(grammar):
    actions = {}
    for i, state in enumerate(grammar.states):
        actions.update({(i, t): a for t, a in get_actions(grammar, i).items()})
    return actions

Grammar.actions = property(functools.cache(precompute_actions))
for (i, symbol), action in grammar.actions.items():
    state = str(grammar.states[i])
    print(f"{i} {symbol}".ljust(8) + f"->  {action}".ljust(28) + state)

0 (     ->  ('shift', 3)            {NONE → NONE • S NONE, S → • S U, S → • U, U → • ( ), U → • ( U )}
1 (     ->  ('shift', 3)            {NONE → NONE S • NONE, S → S • U, U → • ( ), U → • ( U )}
1 NONE  ->  accept                  {NONE → NONE S • NONE, S → S • U, U → • ( ), U → • ( U )}
2 )     ->  ('reduce', 'S', 1)      {S → U •}
2 (     ->  ('reduce', 'S', 1)      {S → U •}
2 NONE  ->  ('reduce', 'S', 1)      {S → U •}
3 )     ->  ('shift', 6)            {U → ( • ), U → ( • U ), U → • ( ), U → • ( U )}
3 (     ->  ('shift', 3)            {U → ( • ), U → ( • U ), U → • ( ), U → • ( U )}
4 )     ->  ('reduce', 'S', 2)      {S → S U •}
4 (     ->  ('reduce', 'S', 2)      {S → S U •}
4 NONE  ->  ('reduce', 'S', 2)      {S → S U •}
5 )     ->  ('shift', 7)            {U → ( U • )}
6 )     ->  ('reduce', 'U', 2)      {U → ( ) •}
6 (     ->  ('reduce', 'U', 2)      {U → ( ) •}
6 NONE  ->  ('reduce', 'U', 2)      {U → ( ) •}
7 )     ->  ('reduce', 'U', 3)      {U → ( U ) •}
7 (     ->  (

### LR table
The parser table is a table that contains all the information needed during runtime.
In our case the table should contain all the precomputed actions and some of the goto(...) results.
We need goto(...) results for nonterminals,
since we will use them to find out which state to go to after reducing something.

In [None]:
gotos = grammar.gotos.items()
gotos = {(s, t): ns for (s, t), ns in gotos if t in grammar.variables}

### Runtime of the parser
Using all the precomputed information we can now write the parser that uses only numbers/indexes of states, not the states themselves.

In [None]:
################################################################################
def parse(actions, gotos, tokens, i=0, get_token_type=lambda s: s):
    # Every stack entry is a pair: (token or AST node, state entered after it).
    stack = [(NONE, 0)]
    while True:
        # NONE plays the role of the end-of-input marker.
        token = get_token_type(tokens[i] if i < len(tokens) else NONE)
        previous_token, state = stack[-1]
        action = actions.get((state, token), "die")
        match action:
            case "shift", next_state:
                # Move the token from the input onto the stack.
                stack.append((tokens[i], next_state))
                i += 1
            case "reduce", head, body_size:
                # Fold the top body_size entries into one node (head, *children).
                body = tuple(node for node, _ in stack[-body_size:])
                del stack[-body_size:]
                # The state uncovered by the fold decides where to go next.
                stack.append(((head,) + body, gotos[stack[-1][1], head]))
            case "accept":
                return stack[-1][0]
            case "die":
                error_msg = f':( Died at token #{i} "{token}" in state {state}'
                raise Exception(error_msg)

ast = parse(grammar.actions, gotos, "(())")
ast

('S', ('U', '(', ('U', '(', ')'), ')'))
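
To see what the parser does step by step, here is a minimal self-contained sketch that replays the same algorithm and records the actions instead of building the tree. The tables are copied by hand from the printed outputs of this notebook, so they are valid only for the lab rat grammar and for that numbering of states. Since an LR(0) state reduces no matter what the next token is, the sketch keeps reductions in a per-state dict. The names `SHIFT`, `REDUCE`, `GOTO`, `ACCEPT_STATE`, `END` and `trace` are assumptions made for this example.

```python
END = None  # end-of-input marker (the notebook uses its NONE object for this)

# (state, terminal) -> next state
SHIFT = {(0, "("): 3, (1, "("): 3, (3, "("): 3, (3, ")"): 6, (5, ")"): 7}
# state -> (head, body size); LR(0) reduces regardless of the next token
REDUCE = {2: ("S", 1), 4: ("S", 2), 6: ("U", 2), 7: ("U", 3)}
# (state, variable) -> next state
GOTO = {(0, "S"): 1, (0, "U"): 2, (1, "U"): 4, (3, "U"): 5}
ACCEPT_STATE = 1

def trace(tokens):
    """Return the list of actions the parser performs on tokens."""
    stack, i, steps = [0], 0, []  # the stack keeps only state numbers
    while True:
        state = stack[-1]
        token = tokens[i] if i < len(tokens) else END
        if state in REDUCE:
            head, size = REDUCE[state]
            del stack[-size:]
            stack.append(GOTO[stack[-1], head])
            steps.append(f"reduce {head}")
        elif (state, token) in SHIFT:
            stack.append(SHIFT[state, token])
            i += 1
            steps.append(f"shift {token}")
        elif state == ACCEPT_STATE and token is END:
            steps.append("accept")
            return steps
        else:
            raise SyntaxError(f"unexpected {token!r} at position {i} in state {state}")

print("\n".join(trace("(())")))
```

For `(())` the trace is three shifts, `reduce U`, one more shift, `reduce U`, `reduce S` and `accept`. The input `(()())` dies in state 5, because that state can only shift `)`.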

As you can see, the parser parsed the parentheses and returned many more parentheses.
The parser loves to parse parentheses, but human brains don't.
Let's write a small helper function to debug our AST.

In [None]:
################################################################################
def print_ast(ast, offset=0, token_printer=None):
    match ast:
        case tuple((head, *children)):
            print("│ " * (offset - 1) + "├─" * bool(offset) + str(head))
            for child in children:
                print_ast(child, offset + 1, token_printer)
        case token:
            if token_printer:
                token_printer(token, offset = offset * 2)
            else:
                print("│ " * (offset - 1) + "├─" + repr(token))
print_ast(ast)

S
├─U
│ ├─'('
│ ├─U
│ │ ├─'('
│ │ ├─')'
│ ├─')'


In [None]:
print_ast(parse(grammar.actions, gotos, "()()((()))"))

S
├─S
│ ├─S
│ │ ├─U
│ │ │ ├─'('
│ │ │ ├─')'
│ ├─U
│ │ ├─'('
│ │ ├─')'
├─U
│ ├─'('
│ ├─U
│ │ ├─'('
│ │ ├─U
│ │ │ ├─'('
│ │ │ ├─')'
│ │ ├─')'
│ ├─')'
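
Another cheap way to debug an AST is to read its leaves from left to right: they should give back exactly the tokens that were parsed. Here is a minimal sketch, assuming the AST is made of nested tuples with the head at index zero, as returned by `parse()`. The name `leaves` is made up for this example, and the AST is hard-coded from the output above to keep the sketch self-contained.

```python
def leaves(ast):
    """Yield the tokens of an AST from left to right."""
    if isinstance(ast, tuple):
        # The first element is the rule head, the rest are children.
        for child in ast[1:]:
            yield from leaves(child)
    else:
        yield ast

ast = ("S", ("U", "(", ("U", "(", ")"), ")"))
print("".join(leaves(ast)))
```

Just like `print_ast()`, the sketch treats every tuple as a node, so it would be confused by tokens that are tuples themselves.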


# THE PARSER
So the parser is just executing precomputed actions, which it takes from the precomputed table.
Taking a value from the table or executing an action takes constant time.
Therefore parsing is pretty fast.

In [None]:
def parser(grammar, get_token_type=lambda s:s):
    actions = grammar.actions
    gotos = grammar.gotos.items()
    gotos = {(s, t): ns for (s, t), ns in gotos if t in grammar.variables}
    return lambda tokens, i=0: parse(actions, gotos, tokens, i, get_token_type)

In [None]:
parentheses_parser = parser(grammar)
print_ast(parentheses_parser("()()"))

S
├─S
│ ├─U
│ │ ├─'('
│ │ ├─')'
├─U
│ ├─'('
│ ├─')'


In [None]:
print_ast(parentheses_parser("(())()()"))

S
├─S
│ ├─S
│ │ ├─U
│ │ │ ├─'('
│ │ │ ├─U
│ │ │ │ ├─'('
│ │ │ │ ├─')'
│ │ │ ├─')'
│ ├─U
│ │ ├─'('
│ │ ├─')'
├─U
│ ├─'('
│ ├─')'
