A simple chart parsing library for Haskell. The parser is generic over what it parses into, so it can produce parse trees, evaluated results, or any other type of value.
Chart parsing consists of iteratively adding more and more information to a parsing chart until the chart is "completed". Chart parsing is ideal for parsing ambiguous grammars, as it can produce all possible parses.
A chart is a graph where the edges are labeled by either unparsed tokens or by parsed items. The nodes of the chart represent the inter-token positions in the input, and an edge labeled l from node i to node j represents that the input from position i to j can be parsed as l.
For example, consider this ambiguous BNF grammar for arithmetic expressions:
<exp> ::= "x" | "y" | "z" | <exp> "+" <exp> | <exp> "-" <exp>
Prior to adding parse edges, the expression
x + y - z
corresponds to the chart
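In data form, that token-only chart could be sketched as a plain list of edges. The types and names below (`Label`, `Edge`, `Chart`, `tokenChart`) are hypothetical and chosen only for illustration; they are not the library's actual API.

```haskell
-- Hypothetical types for illustration; not the library's actual API.

-- | An edge label: either an unparsed token or a parsed item of type a.
data Label a = Token String | Item a
  deriving (Eq, Show)

-- | An edge between two inter-token positions, carrying a label.
data Edge a = Edge { from :: Int, to :: Int, label :: Label a }
  deriving (Eq, Show)

-- | A chart is a collection of edges; the nodes are implicit integers.
type Chart a = [Edge a]

-- | The chart before any parsing: token k spans node k to node k + 1.
tokenChart :: [String] -> Chart a
tokenChart ts = [Edge i (i + 1) (Token t) | (i, t) <- zip [0 ..] ts]

main :: IO ()
main = mapM_ print (tokenChart (words "x + y - z") :: Chart ())
```

For the five tokens of x + y - z this yields five token edges over the nodes 0 through 5, one edge per adjacent pair of nodes.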
We parse the expression by progressively adding labels to the chart according to the grammar. For instance, we can add an <exp> label from 0 to 1 because the token x can be parsed as an <exp>.
After adding some more edges, we will have recognized that both spans x + y and y - z can be parsed as <exp>.
This sort of ambiguity is fine in a chart parser. In this particular chart, we're labeling with just non-terminal symbols (specifically, just <exp>), so the end result will not distinguish the two parses, but a richer label type, such as an AST, could be chosen instead.
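To make the idea of a richer label concrete, here is a minimal sketch that uses a hypothetical `Exp` AST type and enumerates every parse of a token list by brute force. The `Exp` type and the `parses` function are assumptions for illustration only. The enumeration tries every operator as the top-level split, which is not the strategy this library uses, but it shows what the two distinct labels would look like.

```haskell
-- Hypothetical AST label type; the names are illustrative only.
data Exp
  = Var String
  | Add Exp Exp
  | Sub Exp Exp
  deriving (Eq, Show)

-- | Every way to parse a token list as an <exp>, found by trying each
-- position as the top-level operator (brute force, for illustration).
parses :: [String] -> [Exp]
parses [t] | t `elem` ["x", "y", "z"] = [Var t]
parses ts =
  [ op l r
  | (ls, o : rs) <- splits ts      -- o is the candidate operator token
  , Just op <- [operator o]        -- skip splits where o is not + or -
  , l <- parses ls
  , r <- parses rs
  ]
  where
    splits xs = [splitAt k xs | k <- [1 .. length xs - 1]]
    operator "+" = Just Add
    operator "-" = Just Sub
    operator _ = Nothing

main :: IO ()
main = mapM_ print (parses (words "x + y - z"))
```

Run on x + y - z, this produces two ASTs: one that groups y - z under the addition, and one that groups x + y under the subtraction. With AST labels, the edge spanning the whole input would carry both.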
When we're done adding edges to the chart, it looks like this:
Because there's an edge from the beginning of the chart to the end of the chart, we know that the whole of the input parses. In this case, it's an <exp>.
In general, chart parsers can add edges to the chart however they like. The chart parser in this library, however, uses a left-corner technique to parse the input one word at a time. We start out in a parse state with an empty chart, and a list of tokens waiting to be added to the chart.
We then read a token by adding an edge to the graph labeled by that token.
We then add to the chart all the edges we can that end after the newly added token. In this library, this is called saturating the chart.
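The read-then-saturate loop can be sketched roughly as follows for the example grammar. This is a simplified illustration under stated assumptions, not the library's implementation: the names `step`, `saturate`, and `readToken` are hypothetical, the grammar is hard-coded, and the sketch only combines complete edges rather than tracking the partial items a left-corner parser would use.

```haskell
import Data.List (nub)

-- Hypothetical types; not the library's actual API.
data Label = Tok String | Exp
  deriving (Eq, Show)

-- (start node, end node, label)
type Edge = (Int, Int, Label)
type Chart = [Edge]

-- | Edges ending at node j that the hard-coded grammar justifies:
--   <exp> ::= "x" | "y" | "z" | <exp> "+" <exp> | <exp> "-" <exp>
step :: Int -> Chart -> [Edge]
step j chart =
  [ (i, j, Exp)
  | (i, j', Tok t) <- chart, j' == j, t `elem` ["x", "y", "z"]
  ]
  ++
  [ (i, j, Exp)
  | (m, j', Exp) <- chart, j' == j                         -- right operand ends at j
  , (k, m', Tok o) <- chart, m' == m, o `elem` ["+", "-"]  -- operator before it
  , (i, k', Exp) <- chart, k' == k                         -- left operand before that
  ]

-- | Keep adding edges that end at node j until nothing new appears.
saturate :: Int -> Chart -> Chart
saturate j chart
  | null new = chart
  | otherwise = saturate j (chart ++ new)
  where
    new = nub [e | e <- step j chart, e `notElem` chart]

-- | Read one token: add its edge after the last node, then saturate.
readToken :: Chart -> String -> Chart
readToken chart t = saturate (n + 1) (chart ++ [(n, n + 1, Tok t)])
  where
    n = maximum (0 : [j | (_, j, _) <- chart])

main :: IO ()
main = mapM_ print (scanl readToken [] (words "x + y - z"))
```

Here `scanl` prints the chart after each token is read, starting from the empty chart. In the final chart, the edges `(0, 3, Exp)` and `(2, 5, Exp)` correspond to the spans x + y and y - z, and `(0, 5, Exp)` spans the whole input.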
Saturating like this ensures that the chart has as much information as possible after reading each token, so that parsing happens incrementally. After reading a few more tokens, the chart looks like this:
And finally we end up with the final chart we got before. The evolution of the parse state looks like this: