Parsing Performance Tuning

Mathias edited this page Sep 14, 2010 · 2 revisions
Clone this wiki locally

In certain applications, where parsing performance is of the essence, you might want to increase the parsing speed of your grammar to the maximum possible with parboileds parsing engine. There are two basic approaches to optimizing parsing performance: Optimizing the parsing grammar itself and optimizing
parboileds “execution” of the grammar.

Optimizing the parsing grammar

One good way of optimizing performance is to think about how parboileds recursive descent parser will match common input in your applications language and identify “problematic” areas.

Left-Factoring

Consider the following rule:

Rule SomeRule() {
    return FirstOf(
        Sequence(A(), X()),
        Sequence(A(), B(), Y()),
        Sequence(A(), B(), Z())
    );
}

Even though this rule construct is perfectly legal it is not the most efficient for a recursive descent parser that does not perform match memoization (which parboiled not currently supports). It order to successfully match the third alternative of the FirstOf rule in this example parboiled will have to match rule A three times and rule B twice, since the first alternative will only fail after A has already been successfully matched and parboiled will have to backtrack to the input position before the A in order to try the second alternative (which in turn will only fail after A and B have been successfully matched).

Rewriting the rule to this construct:

Rule SomeRule() {
    return Sequence(
        A(),
        FirstOf(
            X(),
            Sequence(
                B(),
                FirstOf(
                    Y(),
                    Z()
                )
            )
        )
    );
}

can speed up parsing performance considerably, especially if the third alternative is very common in the input and the rules A and B are rather complex. Of course, complete left-factoring in the way of this example is not always possible or convenient since it does require changing the rule structure in a way that might not conform to the semantic logic of the grammar or complicate action design. However, it is helpful to keep left-factoring in mind when designing a grammar, especially for the “costly” rules higher-up in your rule tree .

Reordering of FirstOf-Alternatives

When optimizing a grammar for parsing performance it is also helpful to look at the statistical properties of the expected parsing input, especially with regard to the probability of occurrence of certain FirstOf alternatives.
Consider this simple example:

Rule SomeRule() {
    return FirstOf(
        A(),
        B(),
        C()
    );
}

If you already know that alternative C is much more likely to match than alternatives A and B in most inputs you expect your parser to see you might want to reorder the alternatives to this:

Rule SomeRule() {
    return FirstOf(
        C(),
        A(),
        B()
    );
}

This way parboiled will always try alternative C first and spend less time with trying and failing with the other two alternatives. However, this optimization is only safe when the alternatives being reordered are completely independent. If, for example, A, B, and C are defined to match a 4, 2 or 1 digit number as follows

Rule A() {
    return Sequence(Digit(), Digit(), Digit(), Digit());
}

Rule B() {
    return Sequence(Digit(), Digit());
}

Rule C() {
    return Digit();
}

you cannot simply move C first. If you did C would always match and A and B would never match, since C is a prefix of A and B.

Optimizing grammar execution

Apart from optimizing the grammar itself for increased parsing performance you can also tweak the way in which parboiled matches the various rules of your grammar.

Not building a parse-tree

Since building a parse tree can be quite expensive (time- and memory-wise) it is better to not enable parse-tree building when parsing performance is important.

Profiling your grammar with the ProfilingParseRunner

Many times it is not that easy to completely foresee, how a large and possibly complicated parser digests common input. Profiling common scenarios with The ProfilingParseRunner can yield valuable insights into which areas of your grammar are particularly stressed and where grammar optimization might be most beneficial.

@MemoMismatches

Currently parboiled only features limited “packratting” support, which is the technique of memoizing the results of previously matched rules for certain input locations to curb the effect of excessive backtracking. However, since “dumb” memoization comes with significant time and memory costs itself its effects on parsing performance heavily depend on the actual grammar, input and implementation details. In-depth testing of various parboiled parsers has shown that the effects of memoization are rarely beneficial to bottom-line parsing speed in somewhat optimized, real-world grammars.

One memoization option that parboiled currently supports is the memoization of rule mismatches for consecutive applications of a rule at the same input location. This memoization can be implemented with very little overhead and can help reduce the number of re-mismatches as reported by The ProfilingParseRunner.