A tiny toolkit for writing recursive-descent parsers in Ruby

A tiny toolkit for writing recursive-descent parsers in Ruby. This is not a parser-generator; it's for those circumstances where hand-rolling is quicker than bringing out the heavy machinery.

Two components:

  • a generic tokenizer that is seeded with a list of regular expressions and simple transformations to perform whenever a match is encountered. This was shamelessly inspired by Python's re.Scanner, gleaned during a recent sniff around the Lamson sources.
  • a skeleton recursive descent parser implementing a few utility methods like accept and stop.
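
To give a feel for the accept/expect style such a skeleton encourages, here's a hand-rolled recursive-descent evaluator for sums and products. It's an illustration of the technique only — the method names on SimpleParser's actual skeleton class may differ:

```ruby
# A hand-rolled recursive-descent parser in the accept/expect style.
# Tokens are [type, value] pairs, as produced by the tokenizer.
class ExprParser
  def initialize(tokens)
    @tokens = tokens
    @pos = 0
  end

  # Consume and return the next token's value if it has the given
  # type; otherwise consume nothing and return nil.
  def accept(type)
    tok = @tokens[@pos]
    return nil unless tok && tok[0] == type
    @pos += 1
    tok[1]
  end

  def expect(type)
    accept(type) or raise "expected #{type}"
  end

  # expression := term ('+' term)*
  def expression
    value = term
    value += term while accept(:plus)
    value
  end

  # term := number ('*' number)*
  def term
    value = expect(:number)
    value *= expect(:number) while accept(:times)
    value
  end
end
```

Feeding it the token stream for 1 + 2 * 3 evaluates to 7; operator precedence falls naturally out of the grammar's structure.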

Using the tokenizer

Writing tokenizers is the grubbiest part of hacking together a parser, so this is where SimpleParser helps you the most.

Initialize a tokenizer thusly:

tokenizer = SimpleParser::Tokenizer.new([
  # token rules
])

Where each rule obeys the format:

[ string_or_regexp, (optional) param_1, (optional) param_2 ]

string_or_regexp defines the token to be matched, and any regex must be unanchored. The tokenizer will try each rule in order until a match is found, so it's possible to control precedence through careful arrangement of your rules.
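
For instance, ordering decides whether "if" is classified as a keyword or swallowed by a catch-all identifier rule. The rules below are hypothetical and the matching is done with plain Ruby, just to show the first-match-wins behaviour:

```ruby
# Hypothetical rules: the keyword rule is listed first, so it wins.
# Patterns themselves stay unanchored; the \A below simulates the
# tokenizer anchoring each pattern at the current scan position.
rules = [
  [/if\b/, :keyword],
  [/[a-z]+/, :ident],
]

tok = rules.each do |re, type|
  m = /\A#{re}/.match("if")
  break [type, m[0]] if m
end
# With the two rules reversed, the same input would come out as
# an :ident token instead.
```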

The optional parameters to each rule define how a raw match is transformed to an output value. Supported combinations are:

 Param 1     | Param 2     | Result
 not present | not present | (token is ignored)
 symbol      | not present | [symbol, token_string]
 block       | not present | block.call(token_string)
 symbol_1    | symbol_2    | [symbol_1, token_string.send(symbol_2)]
 symbol      | block       | [symbol, block.call(token_string)]
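
To make the table concrete, here's a minimal sketch of the matching loop built on Ruby's StringScanner. It illustrates the rule semantics above; it is not SimpleParser's actual implementation:

```ruby
require 'strscan'

# Sketch of a rule-driven tokenizer: try each rule in order at the
# current position, apply its transformation, repeat until the end
# of input. StringScanner#scan only matches at the current position,
# which is why the rule patterns must be left unanchored.
def tokenize(text, rules)
  s = StringScanner.new(text)
  tokens = []
  until s.eos?
    matched = rules.find do |pattern, action, extra|
      re = pattern.is_a?(Regexp) ? pattern : Regexp.new(Regexp.escape(pattern))
      str = s.scan(re)
      next false unless str
      case
      when action.nil?         then nil  # no params: token ignored
      when action.is_a?(Proc)  then tokens << action.call(str)
      when extra.is_a?(Symbol) then tokens << [action, str.send(extra)]
      when extra.is_a?(Proc)   then tokens << [action, extra.call(str)]
      else                          tokens << [action, str]
      end
      true
    end
    raise "no rule matches at offset #{s.pos}" unless matched
  end
  tokens
end

# One rule per combination from the table:
rules = [
  [/\s+/],                      # ignored
  [/\d+/, :number, :to_i],      # symbol + symbol
  [/[a-z]+/, :word],            # symbol only
  [/\+/, ->(t) { [:op, t] }],   # block only
]
tokenize("1 + foo", rules)
# => [[:number, 1], [:op, "+"], [:word, "foo"]]
```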

Once the tokenizer has been instantiated, a bunch of methods are available:

# scan text and return array of tokens

# re-initialize tokenizer with text

# returns true if no more tokens remain

# read the next token

# iterate (does not reset before iterating)
tokenizer.each do |t|
  # do some funky shit
end


There's a small example, utilising both the tokenizer and parser components, in calculator.rb.