nitcc, a parser and lexer generator for Nit

nitcc is a simple LR parser generator for Nit programs. It implements a small subset of the features of SableCC 3 and 4.

How to compile

Have a valid compiler in bin/, then just run make in the contrib/nitcc/ directory.

How to run


nitcc file.sablecc

nitcc generates a bunch of control files, a lexer, a parser, and a tester.

To compile and run the tester:

nitc file_test_parser.nit
./file_test_parser an_input_file_to_parse

Examples and regression tests

The sub-directory examples/ contains simple grammars and interpreters.

The sub-directory tests/ contains regression tests.

Features (aka TODO list)

  • command line tool (nitcc)
  • Grammar syntax of SableCC4 (with pieces of SableCC3)
  • Generates a Lexer
  • Generates an SLR parser
  • Generates a LALR parser
  • Generates classes for the AST and utils

For the tool (and the code)

  • usable
  • bootstrap itself (see nitcc.sablecc)

For the lexer (and regexp, NFA, and DFA)

  • Any
  • interval of characters and subtraction of characters
  • implicit priorities (by inclusion of languages)
  • Except and And
  • Shortest and Longest (but with dummy semantics, without lookahead)
  • efficient implementation of intervals
  • DFA minimization
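
As an illustration of what "interval of characters and subtraction of characters" means, here is a minimal sketch in Python. It is not nitcc's implementation: the names Interval, subtract and chars are hypothetical, and the representation (sorted lists of disjoint inclusive code-point ranges) is an assumption chosen for the example.

```python
from typing import List, Tuple

# Hypothetical representation: an inclusive range of code points (lo, hi).
Interval = Tuple[int, int]


def subtract(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Return the code points of `a` that are not in `b`.

    Both inputs are assumed to be sorted lists of disjoint inclusive intervals.
    """
    result: List[Interval] = []
    for lo, hi in a:
        start = lo  # first code point of `a` not yet classified
        for blo, bhi in b:
            if bhi < start:
                continue  # this cut lies entirely before the remaining part
            if blo > hi:
                break  # this cut lies entirely after; nothing more to remove
            if blo > start:
                result.append((start, blo - 1))  # keep the gap before the cut
            start = bhi + 1
            if start > hi:
                break
        if start <= hi:
            result.append((start, hi))  # keep what remains after the last cut
    return result


def chars(lo: str, hi: str) -> Interval:
    """Build an interval from two characters."""
    return (ord(lo), ord(hi))


# 'a'..'z' minus the vowels leaves five intervals of consonants.
letters = [chars("a", "z")]
vowels = [chars(c, c) for c in "aeiou"]
consonants = subtract(letters, vowels)
print([(chr(lo), chr(hi)) for lo, hi in consonants])
```

Working on intervals rather than on individual characters keeps the cost proportional to the number of ranges, which is the usual motivation for an interval-based representation in a lexer.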

For the parser (and grammar and LR)

  • Modifiers (?, *, +)
  • Ignored
  • Rejected
  • Empty (but not mandatory)
  • Opportunistic
  • Precedence
  • Separator
  • Dangling (automatic, which mitigates the SLR limitations)
  • simple transformation (unchecked)
  • simple inlining (non-automatic, except for ? and *)

For the AST (generated classes, utils and their API)

  • Common runtime-library nitcc_runtime.nit
  • Terminal nodes; see NToken.
  • Heterogeneous non-terminal nodes with named fields; see NProd.
  • Homogeneous non-terminal nodes for lists (+ and *); see Nodes.
  • Visitor design pattern; see Visitor.
  • Syntactic and lexical errors; see NError.
  • positions of tokens in the input stream; see Position.
  • positions of non-terminal nodes.
  • API for the input source
  • sane API to invoke/initialize the parser (and the lexer)
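
The node kinds listed above (terminal nodes, heterogeneous nodes with named fields, homogeneous list nodes) and the visitor pattern can be sketched as follows. This is a language-neutral illustration in Python, not the API of nitcc_runtime.nit: the classes Token, Add, NodeList and TokenCollector are hypothetical stand-ins, only loosely analogous to NToken, NProd, Nodes and Visitor.

```python
from dataclasses import dataclass, field
from typing import List


class Node:
    """Base class of all AST nodes in this sketch."""

    def children(self) -> List["Node"]:
        return []


@dataclass
class Token(Node):
    """Terminal node: carries the matched text and has no children."""
    text: str


@dataclass
class Add(Node):
    """Heterogeneous non-terminal node: children are accessed by named fields."""
    left: Node
    op: Token
    right: Node

    def children(self) -> List[Node]:
        return [self.left, self.op, self.right]


@dataclass
class NodeList(Node):
    """Homogeneous non-terminal node: a plain list of children (for + and *)."""
    items: List[Node] = field(default_factory=list)

    def children(self) -> List[Node]:
        return list(self.items)


class Visitor:
    """Dispatch on the node class; by default, recurse into the children."""

    def visit(self, node: Node) -> None:
        method = getattr(self, "visit_" + type(node).__name__, self.generic_visit)
        method(node)

    def generic_visit(self, node: Node) -> None:
        for child in node.children():
            self.visit(child)


class TokenCollector(Visitor):
    """Example visitor: collect the text of every token, in source order."""

    def __init__(self) -> None:
        self.texts: List[str] = []

    def visit_Token(self, node: Token) -> None:
        self.texts.append(node.text)


tree = NodeList([Add(Token("1"), Token("+"), Token("2")), Token("x")])
collector = TokenCollector()
collector.visit(tree)
print(collector.texts)  # ['1', '+', '2', 'x']
```

A concrete visitor only overrides the methods for the node classes it cares about; everything else falls through to the default traversal.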

BUGS and limitations

  • Limited error checking; bad grammars can produce uncompilable or, worse, buggy Nit code.
  • The SLR automaton is not very expressive; do not expect to parse big and complex languages like Nit or Java.
  • The generated Nit code is inefficient and large; even if you get an acceptable grammar, do not expect to efficiently parse big and complex languages like Nit or Java.
  • No real Unicode support.
  • Advanced features of SableCC4 are not planned.