A Hacker School project
Very much a work in progress, I’d like to write a simple shift-reduce parser from the ground up. Instead of writing a parser generator that generates a state machine, I’d like to evaluate the language’s rules as we go.
The idea is to optimize for clarity and code length — nothing else.
Pretty well, actually. We’ve got a lexer that takes grammar like this:
parsetoy.Lexer(
LexRule(compile(r'\s+'), lambda match: None),
LexRule(compile(r'(?:\.[0-9]+|[0-9]+(?:\.[0-9]+)?)'), lambda match: Token('number', float(match))),
LexRule(compile(r'\+'), lambda match: Token('add', None)),
LexRule(compile(r'-'), lambda match: Token('subtract', None)),
LexRule(compile(r'\*'), lambda match: Token('multiply', None)),
LexRule(compile(r'/'), lambda match: Token('divide', None)),
LexRule(compile(r'\^'), lambda match: Token('pow', None)),
LexRule(compile(r'\('), lambda match: Token('(', None)),
LexRule(compile(r'\)'), lambda match: Token(')', None))
)
…and a parser that takes rules and operator precedence like this:
parsetoy.Parser(
(
ParseRule(None, ( 'number', ), lambda n: Token('expression', n.value)),
ParseRule(None, ( '(', 'expression', ')' ), lambda l, ex, r: ex ),
ParseRule('add', ( 'expression', 'add', 'expression' ), lambda l, op, r: Token('expression', l.value + r.value) ),
ParseRule('subtract', ( 'expression', 'subtract', 'expression' ), lambda l, op, r: Token('expression', l.value - r.value) ),
ParseRule('multiply', ( 'expression', 'multiply', 'expression' ), lambda l, op, r: Token('expression', l.value * r.value) ),
ParseRule('divide', ( 'expression', 'divide', 'expression' ), lambda l, op, r: Token('expression', l.value / r.value) ),
ParseRule('pow', ( 'expression', 'pow', 'expression' ), lambda l, op, r: Token('expression', pow(l.value, r.value)) )
), (
OpPrecedence('left', { 'add', 'subtract' }),
OpPrecedence('left', { 'multiply', 'divide' }),
OpPrecedence('right', { 'pow' })
)
)
calculator.py
is an interactive calculator that uses this grammar:
$ ./calculator.py
> 1 + 1
Lexer results:
[Token(type='number', value=1.0), Token(type='add', value=None), Token(type='number', value=1.0)]
Parser results:
[Token(type='expression', value=2.0)]
Value:
2.0
> 5 + 3 * 5 - 2 ^ 2
Lexer results:
[Token(type='number', value=5.0), Token(type='add', value=None), Token(type='number', value=3.0), Token(type='multiply', value=None), Token(type='number', value=5.0), Token(type='subtract', value=None), Token(type='number', value=2.0), Token(type='pow', value=None), Token(type='number', value=2.0)]
Parser results:
[Token(type='expression', value=16.0)]
Value:
16.0
>
Try it out!