nitcc, a parser and lexer generator for Nit
nitcc is a simple LR generator for Nit programs. It features a small subset of the functionalities of SableCC 3 and 4.
How to compile
Have a valid compiler in
make in the
How to run
nitcc generates a bunch of control files, a lexer, a parser, and a tester.
To compile and run the tester:
    nitc file_test_parser.nit
    ./file_test_parser an_input_file_to_parse
Examples and regression tests
examples/ contains simple grammars and interpreters.
tests/ contains regression tests.
Features (aka TODO list)
- command line tool (
- Grammar syntax of SableCC4 (with pieces of SableCC3)
- Generates a Lexer
- Generates a SLR parser
- Generates a LALR parser
- Generates classes for the AST and utils
For the tool (and the code)
- bootstrap itself (see
For the lexer (and regexp, NFA, and DFA)
- interval of characters and subtraction of characters
- implicit priorities (by inclusion of languages)
- Except and And
- Shortest and Longest (but dummy semantics without lookahead)
- efficient implementation of intervals
- DFA minimization
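To make the "interval of characters and subtraction of characters" item concrete, here is a minimal Python sketch of one way to represent a character class as sorted intervals and subtract one class from another. It is an illustration under stated assumptions, not nitcc's actual implementation: the function name `subtract` and the tuple representation are hypothetical.

```python
# Illustrative sketch only; not taken from the nitcc sources.
# Assumption: a character class is a sorted list of disjoint,
# inclusive (lo, hi) code-point intervals.

def subtract(a, b):
    """Return the intervals of `a` minus every code point covered by `b`."""
    result = []
    for lo, hi in a:
        start = lo  # first code point of [lo, hi] not yet handled
        for blo, bhi in b:
            if bhi < start:
                continue  # this b-interval ends before the remaining part
            if blo > hi:
                break  # b is sorted, so nothing further can overlap
            if blo > start:
                result.append((start, blo - 1))  # keep the part before the hole
            start = bhi + 1  # resume just after the hole
            if start > hi:
                break
        if start <= hi:
            result.append((start, hi))  # keep what remains after the last hole
    return result


letters = [(ord("a"), ord("z"))]
vowels = [(ord(c), ord(c)) for c in "aeiou"]
consonants = subtract(letters, vowels)
```

Here `consonants` holds five intervals (b-d, f-h, j-n, p-t, v-z) instead of twenty-one single characters, which is why an interval representation keeps lexer automata small.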
For the parser (and grammar and LR)
- Modifiers (
- Empty (but not mandatory)
- Dangling (automatic, which mitigates the SLR limitations)
- simple transformation (unchecked)
- simple inlining (non automatic, except for
For the AST (generated classes, utils and their API)
- Common runtime-library
- Terminal nodes; see
- Heterogeneous non-terminal nodes with named fields; see
- Homogeneous non-terminal nodes for lists (
- Visitor design pattern; see
- Syntactic and lexical errors; see
- positions of tokens in the input stream; see
- positions of non-terminal nodes.
- API for the input source
- sane API to invoke/initialize the parser (and the lexer)
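The AST items above (terminal nodes, non-terminal nodes with named fields, and a visitor) can be pictured with a small Python sketch. All class and method names here (`Node`, `Token`, `Add`, `Visitor`, `TokenCollector`) are hypothetical and chosen for illustration; they are not the classes nitcc generates.

```python
# Illustrative sketch only; the generated Nit classes differ.

class Node:
    def children(self):
        return []

    def accept(self, visitor):
        # Dispatch on the node's class name, falling back to a generic visit.
        name = "visit_" + type(self).__name__
        method = getattr(visitor, name, visitor.visit_default)
        return method(self)


class Token(Node):
    """Terminal node: carries its text and its position in the input."""
    def __init__(self, text, line, column):
        self.text, self.line, self.column = text, line, column


class Add(Node):
    """Heterogeneous non-terminal node with named fields."""
    def __init__(self, left, op, right):
        self.left, self.op, self.right = left, op, right

    def children(self):
        return [self.left, self.op, self.right]


class Visitor:
    def visit_default(self, node):
        # Default behaviour: recurse into the children, left to right.
        for child in node.children():
            child.accept(self)


class TokenCollector(Visitor):
    """Example visitor that gathers the text of every token."""
    def __init__(self):
        self.texts = []

    def visit_Token(self, node):
        self.texts.append(node.text)


tree = Add(Token("1", 1, 1), Token("+", 1, 3), Token("2", 1, 5))
collector = TokenCollector()
tree.accept(collector)
```

After the traversal, `collector.texts` lists the token texts in source order. A visitor only overrides the node kinds it cares about and inherits the default recursion for the rest.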
BUGS and limitations
- Limited error checking; bad grammars can produce uncompilable or, worse, buggy Nit code.
- The SLR automaton is not very expressive; do not expect to parse big and complex languages like Nit or Java.
- The generated Nit code is inefficient and large; even if you get an acceptable grammar, do not expect to efficiently parse big and complex languages like Nit or Java.
- No real Unicode support.
- Advanced features of SableCC4 are not planned.