A tiny toolkit for writing recursive-descent parsers in Ruby. This is not a parser generator; it's for those circumstances where hand-rolling is quicker than bringing out the heavy machinery. It comprises two pieces:
- a generic tokenizer, seeded with a list of regular expressions and
  simple transformations to perform whenever a match is encountered. This was
  shamelessly inspired by Python's
  re.Scanner, spotted during a recent sniff around the Lamson sources.
- a skeleton recursive descent parser implementing a few utility methods for building your own.
## Using the tokenizer
Writing tokenizers is the grubbiest part of hacking together a parser, so this is
where SimpleParser helps you the most.
Initialize a tokenizer thusly:
```ruby
tokenizer = SimpleParser::Tokenizer.new([
  # token rules
])
```
Where each rule obeys the format:
```
[ string_or_regexp, (optional) param_1, (optional) param_2 ]
```
`string_or_regexp` defines the token to be matched, and any regex must be unanchored.
The tokenizer will try each rule in order until a match is found, so it's possible
to control precedence through careful arrangement of your rules.
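For example, to stop keywords from being swallowed by a more general identifier rule, list them first. A hypothetical sketch (the optional second element of each rule is explained below):

```ruby
rules = [
  [/if\b/,     :if],         # tried first, so "if" tokenizes as :if ...
  [/[a-z]\w*/, :identifier]  # ... instead of falling through to :identifier
]
```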
The optional parameters to each rule define how a raw match is transformed to an output value. Supported combinations are:
| Param 1     | Param 2     | Result                                    |
|-------------|-------------|-------------------------------------------|
| not present | not present | (token is ignored)                        |
| symbol      | not present | `[symbol, token_string]`                  |
| block       | not present | `block.call(token_string)`                |
| symbol_1    | symbol_2    | `[symbol_1, token_string.send(symbol_2)]` |
| symbol      | block       | `[symbol, block.call(token_string)]`      |
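Putting those combinations together, a rule list for a toy expression language might look something like this. These rules are purely illustrative, and they assume a "block" is passed as an ordinary Proc/lambda, since a literal block can't be stored in an array:

```ruby
rules = [
  [/\s+/],                                        # no params: whitespace is ignored
  [/\d+/,      :number, :to_i],                   # "42"  => [:number, 42]
  [/[a-z]\w*/, :word],                            # "foo" => [:word, "foo"]
  [/"[^"]*"/,  :string, lambda { |s| s[1..-2] }], # strip the surrounding quotes
  [/[+*]/,     :op]                               # single-character operators
]

tokenizer = SimpleParser::Tokenizer.new(rules)
```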
Once the tokenizer has been instantiated, a bunch of methods are available:
```ruby
# scan text and return array of tokens
tokenizer.scan(text)

# re-initialize tokenizer with text
tokenizer.reset(text)

# returns true if no more tokens remain
tokenizer.eos?

# read the next token
tokenizer.next_token

# iterate (does not reset before iterating)
tokenizer.each do |t|
  # do some funky shit
end
```
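Tying it together, scanning a small input with the hypothetical rules above might yield something like this (assuming the transformations described in the table):

```ruby
tokenizer.scan("foo + 42")
# => [[:word, "foo"], [:op, "+"], [:number, 42]]
```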
There's a small example, utilising both the tokenizer and parser components, in