Working with the Value Stack

Irit Katriel edited this page May 12, 2015 · 5 revisions
Clone this wiki locally

In any non-trivial parsing project your parser actions will want to create custom objects corresponding in some way to the structure of the parsed input text. parboiled for Java gives you two tools to support you in the management of the custom objects created by your parser rules:

The value stack is a simple stack construct that serves as temporary storage for your custom objects. Your parser actions can push objects onto the stack, pop them off the stack, peek and poke into the stack, swap objects around, and so on. The value stack implementation is hidden behind the ValueStack interface, which defines a number of methods for operating on the stack.

All parser actions generally have access to the current value stack instance through the getValueStack method of the current Context. In order to make operations on the value stack less verbose the BaseActions class (which is the parent class of the BaseParser and therefore an ancestor of your parser class) additionally defines shortcuts for most value stack operations that can directly be used without prefix in inline Parser Action Expressions.

The way the value stack is typically used by the parser rules is as follows:

  • Rules matching separators, whitespace, and other “helper structures” generally do not affect the value stack
  • Certain bottom level rules (with regard to your custom object structure) create basic objects from matched input and push them onto the stack
  • Higher-level rules call one or more lower-level rules, pop their value objects off the stack, create higher-level objects from them and push these back onto the stack
  • The root rule is the highest-level rule which typically creates the root object your custom object structure

Most of the time, after a rule has completely matched, it has not pushed more than one object onto the stack (even though it might have done so temporarily during its execution). This way you can think of a rule as “creating a certain type of object” on the value stack if matched (As already pointed out in the general discussion of The Value Stack a rule will never affect the value stack if it did not match!)

Things to keep in mind during rule design

Since in parboiled for Java you work directly with the value stack without further structural support (which is difficult to achieve under the constraints of the Java syntax) it is important to keep a few things in mind, when designing your rules and parser actions.
The most important principle is that a rule should always supply a stable behavior with regard to its value stack operations, no matter what input it is confronted with. So, if a rule adds one value object of a certain type to the value stack, it should do so for every possible input. If this wasn’t the case outside rules referencing the rule would not be able to make assumptions on the state of the value stack after the rule has matched, which would greatly complicate action design.
The following discussion looks at the various PEG primitives and what to pay attention to when using them with parser actions affecting the value stack.

Sequence rules

Since they do not offer any optional components Sequence rules are rather straight-forward with regard to value stack operations. Their end result is inherently stable and consists simply of the concatenation of all their sub operations.

FirstOf rules

FirstOf rules offer several alternative sub rule matches. In order to provide a stable “output” to the outside it is important that all alternatives exhibit compatible value stack behavior.
Consider this example:

Rule R() {
    return FirstOf(A(), B(), C());
}

If sub rule A adds one value object to the stack sub rules B and C should too. If, for example, a match of rule B resulted in two more value objects on the stack any rule calling R would not be sure, whether the stack now has one or two additional objects, a clear violation of the “stable behavior” principle!

Optional rules

The sub rules of optional rules generally should not add or remove objects from the stack. Since an Optional rule matches always, even when its sub rule doesn’t, any change to the number of objects on the value stack would violate the “stable behavior” principle.
However, Optional rules can very well transform the contents of the stack without resulting in unstable behavior!
Consider the following example in a hypothetical BaseParser derived parser:

Rule R() {
    return Sequence(
        Number(), // number adds an Integer object to the stack
        Optional(
            '+',
            Number(), // another Integer object on the stack
            push(pop() + pop()) // pop two and repush one Integer
        )
    );
}

The end result of rule R is stable since it always adds one Integer to the stack, no matter what input it is confronted with.

ZeroOrMore and OneOrMore rules

Similar to Optional rules ZeroOrMore and OneOrMore rules are only stable with regard to their value stack behavior if their sub rules do not add or remove value stack elements. And as with Optional rules this does not mean that they are not allowed to alter existing stack values. The example in the discussion of the Optional rules above would work just as well when the Optional rule is replaced with either a ZeroOrMore or a OneOrMore rule.