jsparagus - A parser generator

This toy parser generator spits out Python 3 code.

The input to the parser generator looks like this:

grammar = {
    'expr': [
        ['term'],
        ['expr', '+', 'term'],
        ['expr', '-', 'term'],
    ],
    'term': [
        ['unary'],
        ['term', '*', 'unary'],
        ['term', '/', 'unary'],
    ],
    'unary': [
        ['prim'],
        ['-', 'unary'],
    ],
    'prim': [
        ['NUMBER'],
        ['NAME'],
        ['(', 'expr', ')'],
    ],
}

representing this grammar:

expr ::= term
       | expr "+" term
       | expr "-" term

term ::= unary
       | term "*" unary
       | term "/" unary

unary ::= prim
        | "-" unary

prim ::= NUMBER
       | NAME
       | "(" expr ")"

It generates a table-driven shift-reduce parser:

import pgen_runtime
from pgen_runtime import Apply

actions = [
    {'-': 1, 'NUMBER': 2, 'NAME': 3, '(': 4},
    {'-': 1, 'NUMBER': 2, 'NAME': 3, '(': 4},
    {None: -9, '*': -9, '/': -9, '+': -9, '-': -9, ')': -9},
    {None: -10, '*': -10, '/': -10, '+': -10, '-': -10, ')': -10},
    {'-': 1, 'NUMBER': 2, 'NAME': 3, '(': 4},
    {'+': 11, '-': 12, None: -9223372036854775807},
    {'*': 13, '/': 14, None: -1, '+': -1, '-': -1, ')': -1},
    {None: -4, '*': -4, '/': -4, '+': -4, '-': -4, ')': -4},
    {None: -7, '*': -7, '/': -7, '+': -7, '-': -7, ')': -7},
    {None: -8, '*': -8, '/': -8, '+': -8, '-': -8, ')': -8},
    {')': 15, '+': 11, '-': 12},
    {'-': 1, 'NUMBER': 2, 'NAME': 3, '(': 4},
    {'-': 1, 'NUMBER': 2, 'NAME': 3, '(': 4},
    {'-': 1, 'NUMBER': 2, 'NAME': 3, '(': 4},
    {'-': 1, 'NUMBER': 2, 'NAME': 3, '(': 4},
    {None: -11, '*': -11, '/': -11, '+': -11, '-': -11, ')': -11},
    {'*': 13, '/': 14, None: -2, '+': -2, '-': -2, ')': -2},
    {'*': 13, '/': 14, None: -3, '+': -3, '-': -3, ')': -3},
    {None: -5, '*': -5, '/': -5, '+': -5, '-': -5, ')': -5},
    {None: -6, '*': -6, '/': -6, '+': -6, '-': -6, ')': -6},
]

ctns = [
    {'expr': 5, 'term': 6, 'unary': 7, 'prim': 8},
    {'unary': 9, 'prim': 8},
    {},
    {},
    {'expr': 10, 'term': 6, 'unary': 7, 'prim': 8},
    {},
    {},
    {},
    {},
    {},
    {},
    {'term': 16, 'unary': 7, 'prim': 8},
    {'term': 17, 'unary': 7, 'prim': 8},
    {'unary': 18, 'prim': 8},
    {'unary': 19, 'prim': 8},
    {},
    {},
    {},
    {},
    {},
]

reductions = [
    ('expr', 1, lambda builder, x0: x0),
    ('expr', 3, lambda builder, x0, x1, x2: builder.expr_P1(x0, x1, x2)),
    ('expr', 3, lambda builder, x0, x1, x2: builder.expr_P2(x0, x1, x2)),
    ('term', 1, lambda builder, x0: x0),
    ('term', 3, lambda builder, x0, x1, x2: builder.term_P1(x0, x1, x2)),
    ('term', 3, lambda builder, x0, x1, x2: builder.term_P2(x0, x1, x2)),
    ('unary', 1, lambda builder, x0: x0),
    ('unary', 2, lambda builder, x0, x1: builder.unary_P1(x0, x1)),
    ('prim', 1, lambda builder, x0: x0),
    ('prim', 1, lambda builder, x0: x0),
    ('prim', 3, lambda builder, x0, x1, x2: builder.prim_P2(x0, x1, x2)),
]

class DefaultBuilder:
    def expr_P1(self, x0, x1, x2): return ('expr 1', x0, x1, x2)
    def expr_P2(self, x0, x1, x2): return ('expr 2', x0, x1, x2)
    def term_P1(self, x0, x1, x2): return ('term 1', x0, x1, x2)
    def term_P2(self, x0, x1, x2): return ('term 2', x0, x1, x2)
    def unary_P1(self, x0, x1): return ('unary 1', x0, x1)
    def prim_P2(self, x0, x1, x2): return ('prim 2', x0, x1, x2)

parse_expr = pgen_runtime.make_parse_fn(actions, ctns, reductions, 0, DefaultBuilder)
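
If you're wondering how these tables drive a parse, here is a sketch of the kind of shift-reduce loop involved. This is not the actual pgen_runtime code; the (kind, text) token format, the error handling, and the name of the accept sentinel are assumptions made for illustration. Positive entries in actions mean "shift and go to that state", negative entries mean "reduce by reductions[-action - 1]", and the very large negative value means "accept".

ACCEPT = -9223372036854775807  # the sentinel that appears in the actions table above

def parse(tokens, actions, ctns, reductions, start_state, builder):
    # `tokens` yields (kind, text) pairs, e.g. ('NUMBER', '2') or ('*', '*'),
    # ending with (None, None) at end of input.
    tokens = iter(tokens)
    stack = [start_state]                  # interleaved: state, value, state, value, ..., state
    kind, text = next(tokens)
    while True:
        action = actions[stack[-1]][kind]  # a missing entry (KeyError) is a syntax error
        if action >= 0:                    # shift: push the token value, enter the new state
            stack.append(text)
            stack.append(action)
            kind, text = next(tokens)
        elif action == ACCEPT:
            return stack[-2]               # the finished parse value
        else:                              # reduce by reductions[-action - 1]
            nt, length, reduce_fn = reductions[-action - 1]
            values = stack[len(stack) - 2 * length::2]  # the `length` values on top of the stack
            del stack[len(stack) - 2 * length:]
            stack.append(reduce_fn(builder, *values))
            stack.append(ctns[stack[-2]][nt])           # goto: consult the ctns table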

And the result of parsing the input 2 * ( x + y ) looks like this:

('term 1',
    '2',
    '*',
    ('prim 2',
        '(',
        ('expr 1', 'x', '+', 'y'),
        ')'
    )
)

The parser spits out tuples by default. It's easy to customize the method names of the DefaultBuilder, and it's easy to replace it with your own builder class that produces the AST you actually want.
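
For example, here is a hypothetical builder (not part of the repository) that produces dicts instead of tuples. Only the method names from DefaultBuilder above are assumed:

class DictBuilder:
    # One method per multi-symbol production, same names as DefaultBuilder.
    def expr_P1(self, left, op, right): return {'op': '+', 'left': left, 'right': right}
    def expr_P2(self, left, op, right): return {'op': '-', 'left': left, 'right': right}
    def term_P1(self, left, op, right): return {'op': '*', 'left': left, 'right': right}
    def term_P2(self, left, op, right): return {'op': '/', 'left': left, 'right': right}
    def unary_P1(self, op, operand): return {'op': 'neg', 'operand': operand}
    def prim_P2(self, open_paren, expr, close_paren): return expr  # drop the parentheses

parse_expr = pgen_runtime.make_parse_fn(actions, ctns, reductions, 0, DictBuilder)

With this builder, parsing 2 * ( x + y ) would produce {'op': '*', 'left': '2', 'right': {'op': '+', 'left': 'x', 'right': 'y'}} instead of nested tuples.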

jsparagus has a few more features geared toward parsing JavaScript, which has an idiosyncratic syntax. See js_parser/README.md for details.

Getting started

First, the usual dance:

python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt

Then you can:

  • Generate parser tables for JavaScript (!):

    make js_parser/parser_tables.py

    Note: This takes about 3 minutes to run on my laptop. jsparagus is slow.

    If you don't have make, you can run the commands by hand:

    python3 -m js_parser.generate_js_parser_tables --progress -o js_parser/parser_generated.jsparagus_dump
    python3 -m js_parser.generate_js_parser_tables js_parser/parser_generated.jsparagus_dump -o js_parser/parser_tables.py
  • Run make check (or ./test.sh) to make sure things are working.

  • To see your parser run, make jsdemo (or python -m js_parser.try_it).

  • To see the Rust project in action, follow these steps.

    First generate the parser:

    python3 -m js_parser.generate_js_parser_tables --progress -o js_parser/parser_generated.jsparagus_dump
    python3 -m js_parser.generate_js_parser_tables js_parser/parser_generated.jsparagus_dump -o rust/parser/src/parser_generated.rs

    Then build all the Rust crates and run the driver crate, which lets you test the JS parser and bytecode emitter:

    cd rust
    cargo run
    

Limitations

It's all limitations, but I'll try to list the ones that are relevant to parsing JS.

  • Lookahead assertions are limited to one token. (The JS grammar contains an occasional [lookahead != `let [`] and even [lookahead != `async [no LineTerminator here] function`].)

  • There is nothing like the level of weirdness that would be needed to implement restricted productions ("[no LineTerminator here]").

  • Error messages are poor: the input ( produces "expected one of {'(', 'VAR', 'NUM', '-'}, got None"; the input ) produces "expected one of {'(', 'VAR', 'NUM', '-'}, got None"; and the input a b produces "expected one of {'-', '/', '+', '*', ')', None}, got 'VAR'".

  • Rust support is extremely rudimentary.

  • Support for feedback from syntactic context to lexical analysis (selecting the lexical goal symbol) is present in the Python version only, not yet in Rust.

    This is needed for resolving the conflict in the lexical grammar between RegExps and division operators, at least. It might also be used for conditional keywords and for parsing TemplateTail. (See the sketch after this list for the general idea.)

  • No table compaction or table optimization. I think there's plenty of low-hanging fruit there.

  • Documentation. There isn't any.
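
As promised above, here is a toy illustration of feedback from the parser to the lexer. It is not the jsparagus API: the goal-symbol names are borrowed from the ECMAScript spec, and everything else is made up for the example. The point is that the same / character must be tokenized differently depending on which lexical goal the parser selects.

import re

# Two lexical goals: right after an operand, '/' is a division operator;
# where an operand is expected, '/' may start a RegExp literal.
TOKEN_RULES = {
    'InputElementDiv': re.compile(r'\s*(\d+|[A-Za-z_]\w*|/=?|[-+*=()])'),
    'InputElementRegExp': re.compile(r'\s*(\d+|[A-Za-z_]\w*|/(?:[^/\\]|\\.)+/\w*|[-+*=()])'),
}

def next_token(src, pos, goal):
    # The parser chooses `goal` based on its current syntactic context.
    m = TOKEN_RULES[goal].match(src, pos)
    return (m.group(1), m.end()) if m else (None, pos)

print(next_token('b / c', 1, 'InputElementDiv'))        # ('/', 3): a division operator
print(next_token('x = /ab/g', 3, 'InputElementRegExp')) # ('/ab/g', 9): one RegExp token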
