Rule Construction in Scala

kiedouglas edited this page Nov 25, 2013 · 7 revisions
Clone this wiki locally

A PEG consists of an arbitrary number of rules (also called expressions or productions) that are compositions of other rules, terminals (essentially characters or strings) and the seven primitive rules in the following table (where a and b denote other parsing rules).

Name Common Notation parboiled Primitive
Sequence a b a ~ b
Ordered Choice a / b a | b
Zero-or-more a * zeroOrMore(a)
One-or-more a + oneOrMore(a)
Optional a ? optional(a)
And-predicate & a &(a)
Not-predicate ! a !a

As you can see the PEG primitives are directly mapped to parboiled counterparts, which makes transcribing existing PEG grammars very easy. The parboiled primitives are methods defined in the various Rule derivatives as well as the Parser trait.
Typically you import org.parboiled.scala._ and define a custom parser class mixing in the Parser trait and defining rule methods. A rule method returns one of the various Rule implementations and mostly consists of a simple call to the method “rule” mixed in by the Parser trait. The function block passed as a parameter to the “rule” method contains the actual rule defining code.

The following grammar example will match any string containing a number of ‘a’ characters followed by the same number of ‘b’ characters:

Expression ← ‘a’ Expression* ‘b’

The parboiled for Scala parser for this language would look like this:

class ABExpressionParser extends Parser {
    def Expression = rule { "a" ~ zeroOrMore(Expression) ~ "b" }

At first this might look like resulting in an infinite recursion. However, the “rule” method prevents infinite recursions by inserting caching code and proxy objects where required.

Additionally to the primitive methods listed in the table above the following primitives are also available.

Method/Field Description
ANY Matches any single character except EOI
NOTHING Matches nothing, always fails
EMPTY Matches nothing, always succeeds
EOI Matches the special EOI character
ch(Char) Explicitly creates a rule matching one single character
{String} − {String} Matches a character from a given range (the two strings must be single character strings)
anyOf(String) Matches any one of the given strings characters
ignoreCase(Char) Matching one single character case-independently
ignoreCase(String) Matching a given string case-independently
str(String) Explicitly creates a rule matching a given string
nTimes(Int, Rule) Creates a rule matching a given sub rule exactly n times

The Parser trait also defines three implicit conversions from String, Array[Char] and Symbol to the respective rules, so you can generally use these directly without explicit wrappers.
Note that there is no implicit conversion from Char because parboiled for Scala discourages the use of Char literals in parser rules. Since Scalas Predef object defines implicit conversions from Char to Int (and other primitive widenings from Char) some constructs like the char range operator ‘-’ would not work on Chars. Since using strings is just as concise, allows for nice whitespace handling (see Handling Whitespace) and results in identical rule structures (one char strings are automatically converted to fast single char rules) there is no need to directly support Char literals in rule expressions.

Additionally to the rule primitives listed in this section there are a couple more that are used to support Parser Actions in Scala.