Parser Actions in Scala

ebruchez edited this page Aug 19, 2013 · 10 revisions

In parboiled for Java you specify parser actions as boolean expressions, which are automatically turned into parser action rules. Actions carry no further typing that would allow parboiled for Java to differentiate between actions that push one, push two or pop five values off the parser value stack. This means that the Java developer cannot rely on the compiler to validate the consistency of the value stack operations performed by these parser actions. As a result, parboiled for Java demands more discipline during rule and action design (as outlined in Working with the Value Stack).

In parboiled for Scala, Scala’s type inference capabilities make it feasible to support a higher level of abstraction for parser actions than in parboiled for Java. Parser actions in parboiled for Scala do not operate directly on the value stack but are specified as functions. As such, they are not simply opaque blocks of code but have a clearly defined type.

Depending on which parser actions a rule contains, the actual class type of the rule changes. A rule without any effect on the parser value stack has the type Rule0. A rule that results in one more value object of type A being pushed onto the parser value stack has type Rule1[A]. A rule that results in two more value objects of types A and B being pushed onto the parser value stack has type Rule2[A,B]. A rule that results in one value object of type Z being popped off the stack has type PopRule1[Z].
Currently there are a total of 15 different concrete rule types defined underneath the common, abstract super type Rule.
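
As a rough illustration of how the rule type tracks value stack effects, the following sketch annotates each rule with its type. It assumes the parboiled for Scala 1.x API from the org.parboiled.scala package; the class RuleTypesDemo, its rule names and the sum helper are made up for this example.

```scala
import org.parboiled.scala._

// Hypothetical demo parser; the rule names are not part of parboiled.
class RuleTypesDemo extends Parser {
  // Only matches input and pushes nothing: Rule0.
  def Digits: Rule0 = rule { oneOrMore("0" - "9") }

  // Pushes one Int created from the matched text: Rule1[Int].
  def Number: Rule1[Int] = rule { Digits ~> (_.toInt) }

  // Two Rule1[Int] rules in sequence push two values: Rule2[Int, Int].
  def Pair: Rule2[Int, Int] = rule { Number ~ "," ~ Number }

  // Pops both Ints and pushes their sum, so the type is Rule1[Int] again.
  def Sum: Rule1[Int] = rule { Pair ~~> ((a: Int, b: Int) => a + b) }
}

object RuleTypesDemo {
  // Hypothetical helper: runs the Sum rule and returns None if nothing was produced.
  def sum(input: String): Option[Int] =
    ReportingParseRunner(new RuleTypesDemo().Sum).run(input).result
}
```

If one of the annotations did not agree with the actions inside the rule, for example declaring Pair as Rule1[Int], the compiler would be expected to reject the definition rather than let the mismatch surface at parse time.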

This somewhat elaborate class structure allows parboiled for Scala to encode in the type of the rule how exactly the rule affects the parser’s value stack and to make sure that all parser actions work together properly to produce the parser’s end result value. Note that this does not impose any restrictions on the type of your value objects!

There are three different ways for your parser actions, i.e. action functions, to be linked into the rule structure:

  1. The action operators
  2. The push, test and run methods
  3. Stand-alone actions

The Action Operators

parboiled for Scala defines a number of action operators. All of these link an action function into the grammar’s rule structure but differ in the type and semantics of their action function arguments. The following table gives an overview, with the action result in the rows and the action argument(s) in the columns:

| Action Result | String | Value Object (pop) | Value Object (peek) | Char | IndexRange |
| --- | --- | --- | --- | --- | --- |
| Value Object | ~> | ~~> | ~~~> | ~:> | ~>> |
| Boolean | ~? | ~~? | ~~~? | | |
| Unit | ~% | ~~% | ~~~% | | |

The operators beginning with a single ‘~’ character are the way in which parser actions commonly receive matched input text. Their parameter is a function of type (String ⇒ …). The operator internally creates a new action rule that, when run, passes the input text matched by the immediately preceding peer rule as an argument to the given function.
The operators beginning with either “~~” or “~~~” take one or more value objects as arguments. They differ in whether their argument objects are popped off the stack or merely looked up without removal (peeked).

The operators ending with a ‘>’ character create one new value object, which is pushed onto the parser’s value stack after their action function has run. The type of the action’s result value is encoded in the return type of the operator. The operators ending with a ‘?’ character take a function returning a Boolean, which acts as a semantic predicate. If the action function returns false, the evaluation of the current rule sequence stops as unmatched and the parser is forced to backtrack and look for other matching alternatives.
Finally, the operators ending with a ‘%’ character allow you to run arbitrary logic without any direct influence on the parsing process. Their action functions return Unit and are simply run whenever the parser “comes across them” during a parsing run.
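
The sketch below shows one rule per operator family to make the differences concrete. It assumes the parboiled for Scala 1.x API; the class ActionOperatorDemo, its rule names and the seen field are hypothetical and exist only for illustration. Function parameters are typed explicitly where an operator may be overloaded.

```scala
import org.parboiled.scala._

// Hypothetical demo parser showing one action operator per rule.
class ActionOperatorDemo extends Parser {
  var seen: List[String] = Nil // filled by the "~%" action below

  def Digits: Rule0 = rule { oneOrMore("0" - "9") }

  // "~>" passes the matched text to the function and pushes the result.
  def Number: Rule1[Int] = rule { Digits ~> (_.toInt) }

  // "~?" uses the matched text as a semantic predicate; nothing is pushed.
  def ShortDigits: Rule0 = rule { Digits ~? ((s: String) => s.length <= 3) }

  // "~%" runs a side effect on the matched text and pushes nothing.
  def Recorded: Rule0 = rule { Digits ~% ((s: String) => seen = s :: seen) }

  // "~~>" pops the Int and pushes a new value computed from it.
  def Doubled: Rule1[Int] = rule { Number ~~> ((n: Int) => n * 2) }

  // "~~?" pops the Int and tests it, so the resulting rule is a Rule0.
  def Even: Rule0 = rule { Number ~~? ((n: Int) => n % 2 == 0) }

  // "~~~?" only peeks at the Int, so the value stays on the stack.
  def Positive: Rule1[Int] = rule { Number ~~~? ((n: Int) => n > 0) }
}
```

A rule can be tried with ReportingParseRunner(new ActionOperatorDemo().Doubled).run("21"), whose result field holds the produced value as an Option.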

The push, test and run methods

The action operators discussed in the previous section all allow your actions to “link into” the current parsing process by taking either the matched input text or produced value stack elements as parameters. Sometimes, however, your actions do not need any input since their position in the rule structure is all the context they need. In these cases you can use the “push”, “test” and “run” methods provided by the Parser trait to achieve the same results as the “…>”, “…?” and “…%” operators (respectively) discussed in the previous section.

The action rules created by these methods can be chained into the rule structure like any other rule. As an example here is one basic rule from the JsonParser1 example:

def JsonTrue = rule { "true " ~ push(True) }
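
To show all three methods side by side, here is a minimal sketch, assuming the parboiled for Scala 1.x API and that push, test and run take their arguments by name. The class PushTestRunDemo, its rule names and its two fields are hypothetical and serve only as illustration.

```scala
import org.parboiled.scala._

// Hypothetical demo parser; names and fields exist only for illustration.
class PushTestRunDemo extends Parser {
  var literalsSeen = 0  // updated by a run(...) action
  var allowFalse = true // consulted by a test(...) predicate

  // push(...) creates a Rule1 from a value that needs no input.
  def TrueLiteral: Rule1[Boolean] = rule { "true" ~ push(true) }

  // test(...) acts as a semantic predicate without arguments.
  def FalseLiteral: Rule1[Boolean] = rule {
    test(allowFalse) ~ "false" ~ push(false)
  }

  // run(...) executes arbitrary code; it neither pushes nor pops.
  def Literal: Rule1[Boolean] = rule {
    (TrueLiteral | FalseLiteral) ~ run(literalsSeen += 1)
  }
}
```

Because none of these actions takes the matched text or a stack value as input, the only context they have is their position in the rule sequence.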

Stand-alone Actions

As a last option you can specify parser actions as “Stand-alone Actions”. Stand-alone actions are functions directly taking a Context object as parameter. They can be used just like normal rules since the Parser trait provides the following two implicit conversions:

| Method | Semantics |
| --- | --- |
| toRunAction(f: (Context[Any]) ⇒ Unit): Rule0 | General non-predicate action |
| toTestAction(f: (Context[Any]) ⇒ Boolean): Rule0 | General semantic predicate action |

The current parsing Context gives these general actions full access to the state of the parser (similar to the actions in parboiled for Java). They could even manipulate the parser’s value stack by directly accessing the instance returned by the context’s getValueStack method. However, doing so is normally not recommended since it would circumvent the Scala compiler’s ability to validate the parser’s value stack operations for consistency.
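
A small sketch of stand-alone actions that only read from the Context might look as follows. It assumes the parboiled for Scala 1.x API together with the getCurrentIndex method of org.parboiled.Context; the class StandAloneActionDemo and all names in it are hypothetical.

```scala
import org.parboiled.Context
import org.parboiled.scala._

// Hypothetical demo parser; field and rule names are illustrative only.
class StandAloneActionDemo extends Parser {
  var lastEndIndex = -1

  // A plain function of the Context; toRunAction turns it into a Rule0.
  val recordEnd: Context[Any] => Unit =
    ctx => { lastEndIndex = ctx.getCurrentIndex }

  // A Boolean-returning function becomes a semantic predicate via toTestAction.
  val atMostFiveChars: Context[Any] => Boolean =
    ctx => ctx.getCurrentIndex <= 5

  def Word: Rule0 = rule { oneOrMore("a" - "z") }

  // Both functions appear in rule position like any other Rule0.
  def ShortWord: Rule0 = rule { Word ~ atMostFiveChars ~ recordEnd }
}
```

Neither function touches the value stack, so the type of ShortWord stays Rule0 and the compiler’s view of the stack remains accurate.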

“withContext” Actions

As another convenience tool, the Parser trait provides a number of “withContext” methods that you can use to wrap an action function before passing it to an action operator. The “withContext” methods allow your action function to receive, in addition to its “regular” parameters, the current parsing Context.

For example there is a “withContext” method with a signature similar to the following:

def withContext[A, B, R](f: (A, B, Context[_]) => R): ((A, B) => R)

So, an action function wrapped with this method will appear to the outside like a function that, for example, pops two value objects off the stack and produces one new one. However, internally your action can also receive an instance of the current Context, which allows it, for example, to take a look at the current input position with its line number.
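
As a hedged sketch of this pattern, the rule below wraps a three-parameter function with “withContext” and passes it to the “~~>” operator. It assumes the parboiled for Scala 1.x API, a “withContext” overload matching the signature quoted above, and the getCurrentIndex method of org.parboiled.Context; the class WithContextDemo and its rule names are hypothetical. For simplicity it reads the current input index rather than a line number.

```scala
import org.parboiled.Context
import org.parboiled.scala._

// Hypothetical demo parser; rule names are illustrative only.
class WithContextDemo extends Parser {
  def Number: Rule1[Int] = rule { oneOrMore("0" - "9") ~> (_.toInt) }

  // To the "~~>" operator the wrapped function looks like (Int, Int) => R:
  // it pops two Ints and pushes one result. Internally it also receives
  // the Context and reads the current input index from it.
  def SumWithIndex: Rule1[(Int, Int)] = rule {
    Number ~ "," ~ Number ~~> withContext(
      (a: Int, b: Int, ctx: Context[_]) => (a + b, ctx.getCurrentIndex)
    )
  }
}
```

The rule type is still derived from the two “regular” parameters and the result alone, so the extra Context parameter does not weaken the static checking of the value stack.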

To understand how these different action types work together to enable a concise and statically type-checked rule design, take a look at the JSON Parser example.