The Big Picture
parboiled provides a recursive descent PEG parser implementation that operates on PEG rules you specify.
Your grammar specification can include parser actions, which perform arbitrary logic at any given point during the parsing process, for example in order to augment input recognition with custom conditions or to construct an Abstract Syntax Tree (AST).
Your code interacts with parboiled in two phases. In the first phase, the “rule construction” phase, parboiled builds a tree (or rather a directed graph) of parser rules in the way your Java or Scala DSL code specifies. This phase is independent of the actual input and only has to be performed once during the lifetime of a JVM instance (i.e. the built rule tree is reusable).
The second phase is the actual parsing phase, in which your rules are run against a specific input text (the “rule execution” phase). The end result of that phase is the following information:
- Boolean flag indicating whether the input matched or did not match the root rule
- List of potentially encountered parse errors
- One or more value object(s) constructed by your parser actions
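The two phases can be illustrated with a small, self-contained sketch. It does not use parboiled's API: `SketchRule`, `SketchResult` and `TwoPhaseSketch` are hypothetical names invented for this illustration, and value objects are left out to keep it short. The point is only that the rule graph is built once, without any input, and then run against as many inputs as needed.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a parser rule (not parboiled's Rule type). */
interface SketchRule {
    /** Returns the index after a successful match, or -1 on a mismatch. */
    int match(String input, int pos);
}

/** Hypothetical result holder: the match flag plus collected error messages. */
final class SketchResult {
    final boolean matched;
    final List<String> errors;

    SketchResult(boolean matched, List<String> errors) {
        this.matched = matched;
        this.errors = errors;
    }
}

final class TwoPhaseSketch {
    // Phase 1 building blocks: these only create rule objects, no input is read.
    static SketchRule ch(char c) {
        return (in, pos) -> (pos < in.length() && in.charAt(pos) == c) ? pos + 1 : -1;
    }

    static SketchRule digit() {
        return (in, pos) -> (pos < in.length() && Character.isDigit(in.charAt(pos))) ? pos + 1 : -1;
    }

    static SketchRule endOfInput() {
        return (in, pos) -> pos == in.length() ? pos : -1;
    }

    static SketchRule sequence(SketchRule... rules) {
        return (in, pos) -> {
            int p = pos;
            for (SketchRule rule : rules) {
                p = rule.match(in, p);
                if (p < 0) return -1;   // one failing sub-rule fails the sequence
            }
            return p;
        };
    }

    static SketchRule oneOrMore(SketchRule rule) {
        return (in, pos) -> {
            int p = rule.match(in, pos);
            if (p < 0) return -1;
            // Keep matching while the sub-rule succeeds and makes progress.
            for (int next = rule.match(in, p); next > p; next = rule.match(in, p)) {
                p = next;
            }
            return p;
        };
    }

    /** Phase 1: build the rule graph for Digit+ '+' Digit+ EOI once. */
    static SketchRule buildAdditionRule() {
        SketchRule number = oneOrMore(digit());   // one instance, referenced twice
        return sequence(number, ch('+'), number, endOfInput());
    }

    /** Phase 2: run an already built rule against one concrete input. */
    static SketchResult run(SketchRule root, String input) {
        List<String> errors = new ArrayList<>();
        boolean matched = root.match(input, 0) >= 0;
        if (!matched) errors.add("input does not match the root rule");
        return new SketchResult(matched, errors);
    }

    public static void main(String[] args) {
        SketchRule root = buildAdditionRule();            // built once
        System.out.println(run(root, "12+34").matched);   // prints true
        System.out.println(run(root, "12+").matched);     // prints false
    }
}
```

Because `number` is a single object referenced from two places, the structure is a graph rather than a strict tree, which mirrors the "directed graph" remark above.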
Rule construction happens by executing the rule-defining code in your Java or Scala sources. parboiled provides two separate DSLs (one for Java, one for Scala) that try to make rule definition as “comfortable” as possible under the constraints of the respective source language.
In Java you derive a custom class from BaseParser, the required base class of all parboiled for Java parsers, and define methods returning Rule instances. These methods construct a rule instance from other rules, terminals, predefined primitives and action expressions. Because the Java syntax is somewhat restrictive, parboiled employs a process called “Parser extension” (see Parser Extension in Detail) to support more concise rule construction code than would otherwise be possible.
Because Scala is much more expressive by itself, parboiled for Scala does not need a separate parser extension step. In Scala your parser rule “tree” is built directly from the parboiled for Scala language elements.
For your parser to be more than just a “recognizer” (i.e. a piece of code determining whether a given input conforms to the language defined by your grammar), it needs to include parser actions. Parser actions are snippets of custom code that are executed at specific points during rule execution. Apart from inspecting the parser state (e.g. by looking at matched input text segments), parser actions typically construct parser “values” (e.g. AST nodes) and can actively influence the parsing process by acting as semantic predicates.
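The following sketch shows the two roles of an action described above: building a value and acting as a semantic predicate. It is a conceptual illustration only and does not use parboiled's API. `ActionContext`, `ActionRule` and `ActionSketch` are hypothetical names, and the convention that an action returning `false` fails the enclosing sequence is an assumption of this sketch.

```java
import java.util.function.Predicate;

/** Hypothetical parsing state visible to actions (not parboiled's Context type). */
final class ActionContext {
    final String input;
    int matchStart;   // start of the most recently matched segment
    int matchEnd;     // end (exclusive) of the most recently matched segment
    Integer value;    // value constructed by an action, if any

    ActionContext(String input) { this.input = input; }

    String lastMatch() { return input.substring(matchStart, matchEnd); }
}

/** Hypothetical rule type: returns the index after the match, or -1. */
interface ActionRule {
    int match(ActionContext ctx, int pos);
}

final class ActionSketch {
    /** Matches one or more digits and records the matched segment. */
    static ActionRule digits() {
        return (ctx, pos) -> {
            int p = pos;
            while (p < ctx.input.length() && Character.isDigit(ctx.input.charAt(p))) p++;
            if (p == pos) return -1;
            ctx.matchStart = pos;
            ctx.matchEnd = p;
            return p;
        };
    }

    /** Wraps custom code as a rule; it consumes no input and may veto the match. */
    static ActionRule action(Predicate<ActionContext> body) {
        return (ctx, pos) -> body.test(ctx) ? pos : -1;
    }

    static ActionRule sequence(ActionRule... rules) {
        return (ctx, pos) -> {
            int p = pos;
            for (ActionRule rule : rules) {
                p = rule.match(ctx, p);
                if (p < 0) return -1;
            }
            return p;
        };
    }

    /** Digits, then an action building a value, then a semantic predicate. */
    static ActionRule byteValue() {
        return sequence(
            digits(),
            action(ctx -> {
                String text = ctx.lastMatch();
                if (text.length() > 3) return false;   // keeps parseInt within int range
                ctx.value = Integer.parseInt(text);    // value construction
                return true;
            }),
            action(ctx -> ctx.value <= 255));          // semantic predicate
    }

    /** Returns the constructed value, or null if the input is rejected. */
    static Integer parse(String input) {
        ActionContext ctx = new ActionContext(input);
        int end = byteValue().match(ctx, 0);
        return end == input.length() ? ctx.value : null;
    }
}
```

With this sketch, `ActionSketch.parse("200")` yields 200, while `"300"` is rejected even though it is syntactically a number, because the predicate action vetoes the match.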
The Value Stack
During the rule execution phase your parser actions can make use of the “value stack” to organize the construction of custom objects such as AST nodes. The value stack is a simple stack construct that serves as temporary storage for custom objects. The way you use the value stack depends on whether you are using parboiled for Java or parboiled for Scala.
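To make the idea concrete, here is a minimal hand-written sketch of actions using a stack as temporary storage while building an AST. It does not use parboiled's value stack API: the stack is a plain `java.util.ArrayDeque`, and `AstNode`, `Num`, `Add` and `ValueStackSketch` are hypothetical names for this illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical AST node types for the sketch. */
interface AstNode {}

final class Num implements AstNode {
    final int value;
    Num(int value) { this.value = value; }
    @Override public String toString() { return String.valueOf(value); }
}

final class Add implements AstNode {
    final AstNode left;
    final AstNode right;
    Add(AstNode left, AstNode right) { this.left = left; this.right = right; }
    @Override public String toString() { return "(" + left + " + " + right + ")"; }
}

final class ValueStackSketch {
    private final Deque<AstNode> valueStack = new ArrayDeque<>();
    private final String input;
    private int pos;

    ValueStackSketch(String input) { this.input = input; }

    // Expression <- Number ('+' Number)*
    boolean expression() {
        if (!number()) return false;
        while (pos < input.length() && input.charAt(pos) == '+') {
            int save = pos;
            pos++;
            if (!number()) {        // no operand after '+': undo and stop repeating
                pos = save;
                break;
            }
            // Action: replace the two topmost values with one combined node.
            AstNode right = valueStack.pop();
            AstNode left = valueStack.pop();
            valueStack.push(new Add(left, right));
        }
        return true;
    }

    // Number <- [0-9]+   (assumes the digits fit into an int)
    boolean number() {
        int start = pos;
        while (pos < input.length() && Character.isDigit(input.charAt(pos))) pos++;
        if (pos == start) return false;
        // Action: push a leaf node for the matched text.
        valueStack.push(new Num(Integer.parseInt(input.substring(start, pos))));
        return true;
    }

    /** Returns the AST left on top of the stack, or null if the input is rejected. */
    static AstNode parse(String input) {
        ValueStackSketch parser = new ValueStackSketch(input);
        boolean ok = parser.expression() && parser.pos == input.length();
        return ok ? parser.valueStack.peek() : null;
    }
}
```

For the input `1+2+3` the stack holds a single node at the end, which prints as `((1 + 2) + 3)`. Intermediate nodes only ever live on the stack, which is what makes it convenient as scratch storage.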
The Parse Tree
During the rule execution phase parboiled can optionally construct a parse tree, whose nodes correspond to the recognized rules. Each parse tree Node contains a reference to the Matcher of the rule it was constructed from, the position of the matched input text, and the current element at the top of the value stack. The parse tree can be viewed as a record of which rules matched a given input and is particularly useful during debugging.
The ParseRunner is responsible for “supervising” a parsing run and optionally applying additional logic, most importantly the handling of input characters that are illegal according to the grammar, i.e. parse errors. When you perform a parsing run with parboiled you can choose from these five predefined ParseRunners:
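As a rough illustration of what such a tree records, the sketch below builds one node per matched rule and renders the result as indented text. It is not parboiled's Node or Matcher API: `SketchNode` and `ParseTreeSketch` are hypothetical names, a plain rule label stands in for the Matcher reference, and the value stack element is omitted.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical parse tree node: rule label, matched range and child nodes. */
final class SketchNode {
    final String label;
    final int start;
    final int end;   // exclusive
    final List<SketchNode> children;

    SketchNode(String label, int start, int end, List<SketchNode> children) {
        this.label = label;
        this.start = start;
        this.end = end;
        this.children = children;
    }

    String text(String input) { return input.substring(start, end); }
}

final class ParseTreeSketch {
    // Digit <- [0-9]
    static SketchNode digit(String in, int pos) {
        if (pos < in.length() && Character.isDigit(in.charAt(pos))) {
            return new SketchNode("Digit", pos, pos + 1, new ArrayList<>());
        }
        return null;   // mismatch: no node is created
    }

    // Number <- Digit+
    static SketchNode number(String in, int pos) {
        List<SketchNode> children = new ArrayList<>();
        int p = pos;
        SketchNode d;
        while ((d = digit(in, p)) != null) {
            children.add(d);
            p = d.end;
        }
        return children.isEmpty() ? null : new SketchNode("Number", pos, p, children);
    }

    /** Renders a node and its subtree, one line per node, indented by depth. */
    static String render(SketchNode node, String in, String indent) {
        StringBuilder sb = new StringBuilder();
        sb.append(indent).append('[').append(node.label).append("] '")
          .append(node.text(in)).append("'\n");
        for (SketchNode child : node.children) {
            sb.append(render(child, in, indent + "  "));
        }
        return sb.toString();
    }
}
```

Rendering the tree for the input `42` produces a `[Number] '42'` line followed by two indented `[Digit]` lines, which shows at a glance which rules matched which part of the input.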
- BasicParseRunner, the fastest and most basic ParseRunner, performs no error handling
- ReportingParseRunner, creates a proper InvalidInputError object for the first parse error in the input
- RecoveringParseRunner, the most complex ParseRunner, reports all parse errors in the input and tries to intelligently recover from them (see Parse Error Handling)
- TracingParseRunner, selectively prints tracing statements for each rule that matched and/or mismatched (see Grammar and Parser Debugging)
- ProfilingParseRunner, a special ParseRunner producing detailed statistics about how your parser digested one or more inputs (see The ProfilingParseRunner)
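The difference between a runner that only answers "matched or not" and one that also reports the first error can be sketched as follows. This is a simplified illustration and not the implementation of parboiled's runners: `RunnerSketch`, `TrackedRule`, `runBasic` and `runReporting` are hypothetical names, and locating the error at the farthest input index where a terminal failed is an assumption of this sketch.

```java
import java.util.function.IntPredicate;

final class RunnerSketch {
    /** Hypothetical rule type; failures record the farthest input index reached. */
    interface TrackedRule {
        int match(String in, int pos, int[] farthestFailure);
    }

    static TrackedRule terminal(IntPredicate accepts) {
        return (in, pos, far) -> {
            if (pos < in.length() && accepts.test(in.charAt(pos))) return pos + 1;
            far[0] = Math.max(far[0], pos);   // remember where matching got stuck
            return -1;
        };
    }

    static TrackedRule endOfInput() {
        return (in, pos, far) -> {
            if (pos == in.length()) return pos;
            far[0] = Math.max(far[0], pos);
            return -1;
        };
    }

    static TrackedRule sequence(TrackedRule... rules) {
        return (in, pos, far) -> {
            int p = pos;
            for (TrackedRule rule : rules) {
                p = rule.match(in, p, far);
                if (p < 0) return -1;
            }
            return p;
        };
    }

    static TrackedRule oneOrMore(TrackedRule rule) {
        return (in, pos, far) -> {
            int p = rule.match(in, pos, far);
            if (p < 0) return -1;
            for (int next = rule.match(in, p, far); next > p; next = rule.match(in, p, far)) {
                p = next;
            }
            return p;
        };
    }

    /** Digit+ '+' Digit+ EOI */
    static TrackedRule addition() {
        TrackedRule number = oneOrMore(terminal(Character::isDigit));
        return sequence(number, terminal(c -> c == '+'), number, endOfInput());
    }

    /** Basic-style run: only answers whether the input matched. */
    static boolean runBasic(TrackedRule root, String input) {
        return root.match(input, 0, new int[] {0}) >= 0;
    }

    /** Reporting-style run: describes the first error, or returns null on success. */
    static String runReporting(TrackedRule root, String input) {
        int[] farthest = {0};
        if (root.match(input, 0, farthest) >= 0) return null;
        return "Invalid input at index " + farthest[0];
    }
}
```

In this sketch both runners execute the same rule graph; only the amount of bookkeeping and the shape of the result differ. A recovering runner would go one step further and attempt to continue after the reported position.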