Skip to content

MiniParse How To

Ian Kjos edited this page Jan 25, 2019 · 1 revision

Quick Start:

  1. Import the miniparse module.
  2. Construct a miniparse.MiniParse object. The constructor accepts one or more "start" symbols.
  3. Use the .left(...), .right(...), and .nonassoc(...) methods to assign precedence levels (see note, below) and associativity as needed to terminal symbols.
  4. Declare production rules using the .rule(...) and .renaming(...) methods. (See "On Rules" below.)
  5. Call .parse(...) on a stream of tokens, which should be pairs consisting of (token_type, attribute).

No special support exists for location tracking, but you can certainly include that kind of information in the attribute cell from the scanner.

On Rules:

Each production rule is associated with a combining function of some form. By default, it will take one parameter for each element of the right-hand side. However, you can limit attention to a specific selection of right-hand symbols by prepending . to the relevant symbols, .like .this. That may enable you to simplify your coding significantly.

Examples:

There is a complete worked example at example\JSON.py. The first majority of that file is a scanner definition, but the parser definition is well-marked at the bottom.

The file at tests/test_miniparse defines a four-function calculator using the operator precedence features and Python's operator module as rule actions.

The miniscan module defines a grammar for regular expressions. It's in the middle; search for the identifier rex.

Note regarding precedence:

  • Precedence declarations are highest-first, which makes more sense to me than the YACC/BISON style of lowest-first. This could be made configurable if there is demand.
  • It is possible to declare the precedence of a rule independently, but by default it is that of its first nonterminal for which precedence has been declared.
  • If a shift-reduce conflict occurs on a symbol with a precedence declaration, it will be resolved accordingly. Otherwise, the default behavior is to shift and emit a warning.
  • There is no "expect" directive: The "dangling else problem" may be solved by declaring else to be right-associative.

Avenues for Growth:

I might like to add the canonical error-recovery algorithm. On the other hand, many modern languages stop parsing at the first error: This makes sense because often the "recovery" leaves too much to be desired. At any event, semantic analysis is shot if the syntax doesn't even parse.