# LR parser
An LR parser is a bottom-up parser that can parse deterministic context-free languages in linear time,
i.e. it reads input tokens left to right and combines them into AST nodes, building the tree from the leaves up to the root.
This notebook contains an implementation of an LR(0) parser following the Dragon Book.

### Source context-free grammar
I use this CFG as an example (taken from [Wikipedia](https://en.wikipedia.org/wiki/LR_parser#Additional_example_1+1)):

In [1]:
grammar_source = """
    E → E * B
    E → E + B
    E → B
    B → 0
    B → 1
"""

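This context-free grammar describes a context-free language that contains sentences such as:

    0, 1, 0*1, 1+0, 1*1, 0+0, 1+1*1, 1+0*0+1+0*0*0*1

It is fine to change `grammar_source` to another grammar: the code does not depend on this particular one. Keep one rule per line in the form `A → X Y Z`, with single spaces between symbols, because that is the format `parse_rules()` expects.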
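
To make "bottom-up" concrete, here is a minimal sketch of a greedy shift-reduce recognizer for the example grammar. It is not the LR(0) algorithm implemented in this notebook: it has no states or parse table, it simply reduces whenever the top of the stack matches a rule body (longest body first), which happens to be enough for this small grammar. The names `DEMO_RULES` and `naive_accepts` are made up for this illustration.

```python
# Hypothetical demo data: the example grammar as (head, body) pairs.
DEMO_RULES = [
    ("E", ("E", "*", "B")),
    ("E", ("E", "+", "B")),
    ("E", ("B",)),
    ("B", ("0",)),
    ("B", ("1",)),
]


def naive_accepts(sentence, rules=DEMO_RULES, start="E"):
    """Greedy shift-reduce recognizer; NOT an LR(0) parser."""
    # Try longer bodies first so that `E + B` wins over `B` on the stack top.
    by_length = sorted(rules, key=lambda rule: len(rule[1]), reverse=True)
    stack = []
    for token in sentence.replace(" ", ""):  # every character is one token
        stack.append(token)  # shift
        reduced = True
        while reduced:  # reduce while some rule body matches the stack top
            reduced = False
            for head, body in by_length:
                if tuple(stack[-len(body):]) == body:
                    del stack[-len(body):]
                    stack.append(head)
                    reduced = True
                    break
    # The input is accepted if everything collapsed into the start symbol.
    return stack == [start]


for sentence in ["0", "1+1*1", "1+0*0+1+0*0*0*1", "1+", "11"]:
    print(repr(sentence), naive_accepts(sentence))
```

The greedy strategy can pick the wrong reduction or loop forever on other grammars, which is exactly the problem that LR item sets and parse tables solve.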

### Parse rules

In [2]:
def parse_rules(source):
    rules = []
    for rule in source.strip().split("\n"):
        variable, body = rule.strip().split(" → ")
        rules.append((variable, tuple(body.split(" "))))
    return rules
rules = parse_rules(grammar_source)
print("\n".join(f"{variable} → {' '.join(body)}" for variable, body in rules))

E → E * B
E → E + B
E → B
B → 0
B → 1
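
`parse_rules()` assumes a clean source: a blank line between rules or a missing arrow makes the tuple unpacking fail with a rather cryptic error. If you plan to edit the grammar by hand, a more forgiving variant might look like the sketch below. The name `parse_rules_tolerant` and its `arrow` parameter are hypothetical and are not used elsewhere in this notebook. The sketch also assumes that every rule has a non-empty body.

```python
def parse_rules_tolerant(source, arrow="→"):
    """Like `parse_rules`, but skips blank lines and reports malformed ones."""
    rules = []
    for number, line in enumerate(source.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # allow blank lines between rules
        head, separator, body = line.partition(arrow)
        # A valid line has exactly one head symbol, an arrow and a non-empty body.
        if not separator or len(head.split()) != 1 or not body.split():
            raise ValueError(
                f"line {number}: expected 'A {arrow} X Y Z', got {line!r}"
            )
        # split() without arguments tolerates repeated spaces and tabs.
        rules.append((head.strip(), tuple(body.split())))
    return rules


demo_source = """
    S → ( S )

    S →   x
"""
print(parse_rules_tolerant(demo_source))
```

The result has the same shape as the output of `parse_rules()`: a list of `(variable, body)` pairs, where `body` is a tuple of symbols.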


### Derive variables, terminals, start symbol from rules
Mathematically speaking, we [should](https://en.wikipedia.org/wiki/Context-free_grammar#Formal_definitions) specify the variables, terminals, rules and start symbol in order to call it a grammar,
but I am too lazy for that. So instead I wrote a function `derive_symbols()` that derives the rest from the rules: every rule head is a variable, every other symbol is a terminal, and the head of the first rule is the start symbol.

In [3]:
def derive_symbols(rules):
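    # Every symbol that occurs anywhere in the rules, heads and bodies alike.
    symbols = set()
    for variable, body in rules:
        symbols.add(variable)
        symbols.update(body)
    # Variables are exactly the rule heads; everything else is a terminal.
    variables = {variable for variable, body in rules}
    terminals = symbols - variables
    # By convention, the head of the first rule is the start symbol.
    start = rules[0][0]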
    return variables, terminals, start
variables, terminals, start = derive_symbols(rules)
print(f"{variables=}\n{terminals=}\n{start=}")

variables={'E', 'B'}
terminals={'0', '*', '+', '1'}
start='E'
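
Note that `variables` and `terminals` are sets of strings, and Python randomizes string hashes per process by default, so the order in which they are printed can differ from run to run. If you prefer reproducible output, one option is to keep first-seen order with `dict.fromkeys`, as in this sketch. The name `derive_symbols_ordered` is hypothetical, and the rules are repeated here as `demo_rules` only to keep the example self-contained.

```python
def derive_symbols_ordered(rules):
    """Variant of `derive_symbols` that returns lists in first-seen order."""
    # dict.fromkeys keeps insertion order and drops duplicates.
    variables = list(dict.fromkeys(head for head, _ in rules))
    symbols = dict.fromkeys(
        symbol for head, body in rules for symbol in (head, *body)
    )
    terminals = [symbol for symbol in symbols if symbol not in variables]
    return variables, terminals, rules[0][0]


demo_rules = [
    ("E", ("E", "*", "B")),
    ("E", ("E", "+", "B")),
    ("E", ("B",)),
    ("B", ("0",)),
    ("B", ("1",)),
]
print(derive_symbols_ordered(demo_rules))
# (['E', 'B'], ['*', '+', '0', '1'], 'E')
```

The trade-off is that membership tests on lists are linear, which does not matter for grammars of this size.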


On second thought, I am too lazy to pass all four components (variables, terminals, rules, start) around everywhere,
so it makes sense to implement a `Grammar` class according to its [mathematical definition](https://en.wikipedia.org/wiki/Context-free_grammar#Formal_definitions).

In [4]:
import dataclasses
@dataclasses.dataclass(frozen=True)
class Grammar:
    variables: set[str]
    terminals: set[str]
    rules: list[tuple[str, tuple[str, ...]]]
    start: str
        
    def __str__(self):
        s = "start symbol: " + self.start + "\n"
        s += "variables: " + ", ".join(map(str, self.variables)) + "\n"
        s += "terminals: " + ", ".join(map(repr, self.terminals)) + "\n"
        rules = [f"{var} → {' '.join(body)}" for var, body in self.rules]
        return s + "rules:\t" + "\n\t".join(rules) + "\n"
    
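    def __hash__(self):
        # Hash by identity: the set and list fields are unhashable, so the
        # field-based hash that a frozen dataclass would generate cannot be used.
        return id(self)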
grammar = Grammar(variables, terminals, rules, start)
print(grammar)

start symbol: E
variables: E, B
terminals: '0', '*', '+', '1'
rules:	E → E * B
	E → E + B
	E → B
	B → 0
	B → 1
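
One consequence of hashing by `id(self)` is that two `Grammar` objects built from the same rules compare equal (the dataclass generates `__eq__`) but have different hashes, so they behave as different dictionary keys. If that ever becomes a problem, an alternative is to store only immutable field types and let the frozen dataclass generate a field-based hash. The sketch below shows the idea; `HashableGrammar` and `make_demo_grammar` are hypothetical names and are not part of this notebook.

```python
import dataclasses


@dataclasses.dataclass(frozen=True)
class HashableGrammar:
    # Hypothetical variant of `Grammar`: only immutable, hashable field types,
    # so the generated __eq__ and __hash__ agree with each other.
    variables: frozenset[str]
    terminals: frozenset[str]
    rules: tuple[tuple[str, tuple[str, ...]], ...]
    start: str


def make_demo_grammar():
    rules = (
        ("E", ("E", "*", "B")),
        ("E", ("E", "+", "B")),
        ("E", ("B",)),
        ("B", ("0",)),
        ("B", ("1",)),
    )
    return HashableGrammar(frozenset({"E", "B"}), frozenset("01*+"), rules, "E")


first, second = make_demo_grammar(), make_demo_grammar()
print(first is second)              # False: two separate objects
print(first == second)              # True: same field values
print(hash(first) == hash(second))  # True: hash is derived from the fields
```

The identity-based hash used above is cheaper to compute and is perfectly adequate as long as only one `Grammar` instance is created per grammar.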

