.. currentmodule:: parsy
.. function:: generate
``generate`` converts a generator function (one that uses the ``yield`` keyword)
into a parser. The generator function must yield parsers. These parsers are
applied successively and their results are sent back to the generator using the
``.send()`` protocol. The generator function should return the final result of
the parsing. Alternatively it can return another parser, which is equivalent to
applying it and returning its result.
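
The ``.send()`` round trip described above can be sketched in plain Python. This is a toy driver written for illustration only, not parsy's implementation: ``literal``, ``digits`` and ``run`` are hypothetical helpers, a "parser" here is simply a function taking ``(text, index)`` and returning ``(value, new_index)``, and the case where the generator returns another parser is not handled.

```python
# Toy illustration of the .send() protocol; NOT how parsy is implemented.
# Assumption: a "parser" is a function (text, index) -> (value, new_index)
# that raises ValueError on failure.

def literal(expected):
    """Hypothetical helper: build a parser that matches a fixed string."""
    def parse(text, index):
        if text.startswith(expected, index):
            return expected, index + len(expected)
        raise ValueError(f"expected {expected!r} at index {index}")
    return parse


def digits(text, index):
    """Hypothetical helper: parse one or more digits into an int."""
    end = index
    while end < len(text) and text[end].isdigit():
        end += 1
    if end == index:
        raise ValueError(f"expected digits at index {index}")
    return int(text[index:end]), end


def run(gen_fn, text):
    """Drive a generator function: apply each yielded parser, send back its result."""
    gen = gen_fn()
    index = 0
    value = None
    try:
        while True:
            # The first send(None) starts the generator; later sends deliver
            # the previous parser's result to the pending `yield` expression.
            parser = gen.send(value)
            value, index = parser(text, index)
    except StopIteration as stop:
        # The generator's `return` value becomes the overall result.
        return stop.value


def pair():
    left = yield digits
    yield literal(",")
    right = yield digits
    return (left, right)


result = run(pair, "12,34")
```

Here ``result`` is the tuple built by the ``return`` statement of ``pair``, assembled from the values that were sent back into the generator.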
Constructing parsers by using combinators and :class:`Parser` methods to make
larger parsers works well for many simpler cases. However, for more complex
cases the ``generate`` function decorator is both more readable and more
powerful. (For those coming from Haskell/Parsec, this method provides an
acceptable substitute for ``do`` notation.)
The first example shows a different way of building a parser that could easily
have been built using combinators:

.. code-block:: python

   from parsy import generate

   # lparen, rparen and expr are assumed to be parsers defined elsewhere.

   @generate("form")
   def form():
       """
       Parse an s-expression form, like (a b c).
       An equivalent to lparen >> expr.many() << rparen
       """
       yield lparen
       exprs = yield expr.many()
       yield rparen
       return exprs

In the example above, the parser was given a string name ``"form"``, which does
the same as :meth:`Parser.desc`. This is not required, as per the examples below.

Note that there is no guarantee that the entire function is executed: if any of
the yielded parsers fails, the function will not complete, and parsy will try to
backtrack to an alternative parser if there is one.
The second example shows how you can use multiple parse results to build up a
complex object:

.. code-block:: python

   from datetime import date

   from parsy import generate, regex, string

   # Named date_parser so that it does not shadow datetime.date, which is
   # called in the return statement below.
   @generate
   def date_parser():
       """
       Parse a date in the format YYYY-MM-DD
       """
       year = yield regex("[0-9]{4}").map(int)
       yield string("-")
       month = yield regex("[0-9]{2}").map(int)
       yield string("-")
       day = yield regex("[0-9]{2}").map(int)
       return date(year, month, day)

This could also have been achieved using :func:`seq` and :meth:`Parser.combine`.
The third example shows how we can use an earlier parsed value to influence the
subsequent parsing. This example parses Hollerith constants. Hollerith constants
are a way of specifying an arbitrary set of characters by first writing the
integer that specifies the length, followed by the character H, followed by the
set of characters. For example, ``pancakes`` would be written ``8Hpancakes``.

.. code-block:: python

   from parsy import generate, regex, string, any_char

   @generate
   def hollerith():
       num = yield regex(r'[0-9]+').map(int)
       yield string('H')
       return any_char.times(num).concat()

(You may want to compare this with an implementation of Hollerith constants that uses pyparsing, originally by John Shipman from his pyparsing docs.)
There are also more complex examples in the :ref:`tutorial
<using-previous-values>` of using the ``generate`` decorator to create parsers
where there is logic that is conditional upon earlier parsed values.
A fourth example shows how you can use this syntax for grammars that you would
like to define recursively (or mutually recursively).

Say we want to be able to parse an s-expression-like syntax which uses
parentheses for grouping items into a tree structure, like the following::

   (0 1 (2 3) (4 5 6) 7 8)

A naive approach would be:

.. code-block:: python

   simple = regex('[0-9]+').map(int)
   group = string('(') >> expr.sep_by(string(' ')) << string(')')
   expr = simple | group

The problem is that the second line will get a ``NameError`` because ``expr`` is
not defined yet.

One way to solve this is to use :ref:`forward-declarations`. Another is to use
``@generate``.

Using the ``@generate`` syntax will introduce a level of laziness in resolving
``expr`` that allows things to work:
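
.. code-block:: python

   from parsy import generate, regex, string

   simple = regex('[0-9]+').map(int)

   @generate
   def group():
       # expr is only looked up when group runs, by which time it is defined
       return (yield string('(') >> expr.sep_by(string(' ')) << string(')'))

   expr = simple | group

.. code-block:: python

   >>> expr.parse("(0 1 (2 3) (4 5 6) 7 8)")
   [0, 1, [2, 3], [4, 5, 6], 7, 8]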