Rule Construction in Java

sirthias edited this page Oct 21, 2011 · 6 revisions

A PEG consists of an arbitrary number of rules (also called expressions or productions) that are compositions of other rules, terminals (essentially characters or strings) and the seven primitive rules in the following table (where a and b denote other parsing rules).

Name             Common Notation   parboiled Primitive
Sequence         a b               Sequence(a, b)
Ordered Choice   a / b             FirstOf(a, b)
Zero-or-more     a*                ZeroOrMore(a)
One-or-more      a+                OneOrMore(a)
Optional         a?                Optional(a)
And-predicate    &a                Test(a)
Not-predicate    !a                TestNot(a)

The parboiled primitive rules are simple instance methods of the abstract BaseParser class, which is the required base class of every custom parboiled for Java parser. You define your own grammar by deriving a custom class from BaseParser; any method with the return type Rule then serves as a grammar rule definition.

The following grammar, for example, matches strings in which ‘a’ and ‘b’ characters nest like opening and closing brackets inside one outer pair, such as "ab", "aabb" or "aababb":

Expression ← ‘a’ Expression* ‘b’

The parboiled parser for this language would look like this:

import org.parboiled.BaseParser;
import org.parboiled.Rule;

class ABExpressionParser extends BaseParser<Object> {
    // Expression ← 'a' Expression* 'b'
    Rule Expression() {
        return Sequence('a', ZeroOrMore(Expression()), 'b');
    }
}

Your IDE might warn you that this method causes an infinite recursion. With normal Java code this would be the case, but parboiled prevents infinite recursion by inserting caching code and proxy objects where required. You can therefore ignore the warning or disable the respective IDE inspection for the class. Note that, by default, parboiled only instruments parameterless rule-creating methods in this way. If you define rule-creating methods with parameters, you have to either apply the @Cached annotation or prevent infinite recursion yourself.

One requirement that results from this parser class instrumentation is that you must not create your parser instance by calling its constructor directly. Use the Parboiled.createParser method instead. You can still define constructors with arbitrary parameters; their arguments are passed to Parboiled.createParser.
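As a minimal sketch of the construction step described above, the following class creates the example parser through Parboiled.createParser and runs it against a few inputs. The wrapper class ABExpressionDemo, the InputLine rule and the matches helper are names made up for this illustration. ReportingParseRunner is one of the parse runners shipped with parboiled, and the added EOI makes the entry rule reject trailing input.

```java
import org.parboiled.BaseParser;
import org.parboiled.Parboiled;
import org.parboiled.Rule;
import org.parboiled.parserunners.ReportingParseRunner;
import org.parboiled.support.ParsingResult;

public class ABExpressionDemo {

    // The grammar from above, nested here only to keep the sketch in one file.
    public static class ABExpressionParser extends BaseParser<Object> {

        // Hypothetical entry rule: requires the whole input to be consumed.
        Rule InputLine() {
            return Sequence(Expression(), EOI);
        }

        Rule Expression() {
            return Sequence('a', ZeroOrMore(Expression()), 'b');
        }
    }

    // Hypothetical helper: true if the complete input matches InputLine.
    public static boolean matches(String input) {
        // Not "new ABExpressionParser()": createParser returns the instrumented subclass.
        ABExpressionParser parser = Parboiled.createParser(ABExpressionParser.class);
        ParsingResult<Object> result =
                new ReportingParseRunner<Object>(parser.InputLine()).run(input);
        return result.matched;
    }

    public static void main(String[] args) {
        for (String input : new String[] {"ab", "aabb", "aababb", "aab", "ba"}) {
            System.out.println(input + " -> " + matches(input));
        }
    }
}
```

Creating the parser once and reusing it would usually be preferable; the helper creates a fresh instance per call only to keep the sketch short.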

When you look at the signatures of the primitive rule creation methods from the table above, you will see that they take one or more Object arguments. Each argument can be one of the following:

  • A Rule instance (most often created by a call to another rule method)
  • A character literal
  • A string literal
  • A char array
  • An action expression (see Parser Action Expressions)
  • An instance of a class implementing the Action interface
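The argument kinds listed above can be sketched in a single rule as follows. GreetingParser, its rule names and the parseName helper are hypothetical and exist only for this illustration. The sketch assumes that a char array argument is matched like the equivalent string, and it uses the push and match helpers inherited from BaseActions for the action expression.

```java
import org.parboiled.Action;
import org.parboiled.BaseParser;
import org.parboiled.Context;
import org.parboiled.Parboiled;
import org.parboiled.Rule;
import org.parboiled.parserunners.ReportingParseRunner;
import org.parboiled.support.ParsingResult;

public class ArgumentKindsDemo {

    public static class GreetingParser extends BaseParser<String> {

        Rule Greeting() {
            return Sequence(
                    "hello",                 // string literal
                    ' ',                     // character literal
                    Name(),                  // Rule instance created by another rule method
                    push(match()),           // action expression: push the text matched by Name()
                    new char[] {'!', '!'},   // char array, assumed to match like the string "!!"
                    NotNobody(),             // rule wrapping an explicit Action instance
                    EOI);
        }

        Rule Name() {
            return OneOrMore(CharRange('a', 'z'));
        }

        // An Action instance is a plain object; returning false makes the rule fail.
        Rule NotNobody() {
            return Sequence(EMPTY, new Action<String>() {
                public boolean run(Context<String> context) {
                    return !"nobody".equals(context.getValueStack().peek());
                }
            });
        }
    }

    // Hypothetical helper: returns the greeted name, or null if the input does not match.
    public static String parseName(String input) {
        GreetingParser parser = Parboiled.createParser(GreetingParser.class);
        ParsingResult<String> result =
                new ReportingParseRunner<String>(parser.Greeting()).run(input);
        return result.matched ? result.resultValue : null;
    }

    public static void main(String[] args) {
        System.out.println(parseName("hello world!!"));
        System.out.println(parseName("hello nobody!!"));
    }
}
```

The explicit Action lives in its own rule method here purely to keep it visually separate from the action expression.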

In addition to the primitive methods listed in the table above, the following primitives are available:

Method/Field            Description
ANY                     Matches any single character except EOI
NOTHING                 Matches nothing, always fails
EMPTY                   Matches nothing, always succeeds
EOI                     Matches the special EOI (end of input) character
Ch(char)                Explicitly creates a rule matching one single character
CharRange(char, char)   Matches a character from a given range
AnyOf(string)           Matches any one of the given string's characters
NoneOf(string)          Matches any character except those of the given string and EOI
IgnoreCase(char)        Matches one single character case-independently
IgnoreCase(string)      Matches a given string case-independently
String(string)          Explicitly creates a rule matching a given string
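To illustrate how several of these primitives combine, here is a small sketch of a parser for a made-up "print" statement that accepts either a quoted string or a number. The grammar, the class names PrimitivesDemo and PrintStatementParser, and the matches helper are assumptions chosen for the example, not part of parboiled.

```java
import org.parboiled.BaseParser;
import org.parboiled.Parboiled;
import org.parboiled.Rule;
import org.parboiled.parserunners.ReportingParseRunner;
import org.parboiled.support.ParsingResult;

public class PrimitivesDemo {

    public static class PrintStatementParser extends BaseParser<Object> {

        // Statement ← "print" (any case) Space+ (QuotedString / Number) EOI
        Rule Statement() {
            return Sequence(
                    IgnoreCase("print"),
                    OneOrMore(AnyOf(" \t")),
                    FirstOf(QuotedString(), Number()),
                    EOI);
        }

        // NoneOf never matches EOI, so an unterminated string fails at the closing quote.
        Rule QuotedString() {
            return Sequence(Ch('"'), ZeroOrMore(NoneOf("\"")), Ch('"'));
        }

        Rule Number() {
            return OneOrMore(CharRange('0', '9'));
        }
    }

    // Hypothetical helper: true if the complete input is a valid statement.
    public static boolean matches(String input) {
        PrintStatementParser parser = Parboiled.createParser(PrintStatementParser.class);
        ParsingResult<Object> result =
                new ReportingParseRunner<Object>(parser.Statement()).run(input);
        return result.matched;
    }

    public static void main(String[] args) {
        System.out.println(matches("print \"hi\""));
        System.out.println(matches("PRINT\t42"));
        System.out.println(matches("print \"hi"));
    }
}
```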

Your parser also inherits a number of helper methods from the BaseActions class. These are used primarily within Parser Action Expressions.