dine is a parser combinator library targeting Python 3.10 or later.
- Python 3.10 or later
The project has not yet been published to PyPI. You can instead install it directly from GitHub as follows:
$ python -m pip install git+https://github.com/nathan-wien/dine.git
Using the dine.parser.Parser class, you can create parsing functors. Functors are objects that can be called like functions.
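To make the term "functor" concrete, here is a minimal sketch in plain Python. It does not use dine at all: the `StartsWith` class is a hypothetical name invented for illustration, and it only shows how an object becomes callable by defining `__call__`.

```python
class StartsWith:
    """Hypothetical functor: stores a prefix and can be called like a function."""

    def __init__(self, prefix: str) -> None:
        self.prefix = prefix

    def __call__(self, text: str) -> bool:
        # Runs when the instance is used with call syntax, e.g. starts_with_a("abc")
        return text.startswith(self.prefix)


starts_with_a = StartsWith("a")
print(callable(starts_with_a))   # True
print(starts_with_a("abc"))      # True
print(starts_with_a("xyz"))      # False
```

Because the object carries its own state (here, the prefix), it can be configured once and then called many times, which is the property a parser object relies on.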
When called, each parsing functor (or parser for short) can accept either:
- a builtin str object, or
- a dine.parser.Stream object.
The parser then returns an object of base class dine.result.ParseResult, which can be either:
- A ParseSuccess(loc, val, rs) object if the parser parses successfully, where:
  - loc: dine.stream.Location is the parsed location (line and column) in the initial stream,
  - val is the parsed value, and
  - rs: dine.stream.Stream is the remaining stream after applying the parser.
- A ParseFailure(loc, label, msg) object if the parser fails to parse, where:
  - loc: dine.stream.Location is the location (line and column) in the initial stream where the parser (first) fails to parse,
  - label: str is the label of the parser that fails to parse, and
  - msg: str is the error message.
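The shape of these two result types can be sketched with plain dataclasses. This is an illustration only, not dine's implementation: `Location`, `Success`, `Failure`, and `parse_digit` are hypothetical stand-ins, the remaining input is a plain `str` rather than a Stream object, and the location is fixed at line 1, column 1 for simplicity.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Location:
    line: int
    col: int


@dataclass(frozen=True)
class Success:
    loc: Location   # where parsing started
    val: Any        # the parsed value
    rs: str         # remaining input (assumption: a str instead of a Stream)


@dataclass(frozen=True)
class Failure:
    loc: Location   # where parsing failed
    label: str      # name of the parser that failed
    msg: str        # human-readable error message


def parse_digit(text: str) -> Success | Failure:
    """Hypothetical single-digit parser that always starts at line 1, column 1."""
    start = Location(line=1, col=1)
    if text and text[0] in "0123456789":
        return Success(loc=start, val=text[0], rs=text[1:])
    msg = f"unexpected character {text[0]!r}" if text else "unexpected end of input"
    return Failure(loc=start, label="digit", msg=msg)


print(parse_digit("42"))
print(parse_digit("hi"))
```

Returning a value for both outcomes, rather than raising an exception on failure, is what lets combinators inspect a failed result and try an alternative parser.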
>>> from dine.parser import Parser
# functor that parses a digit
>>> digit_parser = Parser.digit()
>>> digit_parser('42')
ParseSuccess(
loc=(line=1,col=1),
val='4',
rs=Stream("2")
)
>>> digit_parser('hi')
ParseFailure(
loc=(line=1,col=1),
label='digit',
msg="unexpected character 'h'"
)
# functor that parses a lowercase ASCII character
>>> lowercase_parser = Parser.ascii_lowercase()
>>> lowercase_parser('abc')
ParseSuccess(
loc=(line=1,col=1),
val='a',
rs=Stream("bc")
)
>>> lowercase_parser('ABC')
ParseFailure(
loc=(line=1,col=1),
label='dine.parser.Parser.ascii_lowercase',
msg="unexpected character 'A'"
)
A parser combinator library would be incomplete without a way to combine parsers into more complex ones.
The following examples show some of the combinators that dine offers. For an exhaustive list of combinators, please refer to the documentation.
>>> from dine.parser import Parser
# apply one parser after another
>>> Parser.char('a').and_then(Parser.char('b'))('ab$')
ParseSuccess(
loc=(line=1,col=1),
val=('a', 'b'),
rs=Stream("$")
)
# alternatively
>>> (Parser.char('a') & Parser.char('b'))('ab$')
ParseSuccess(
loc=(line=1,col=1),
val=('a', 'b'),
rs=Stream("$")
)
# apply another parser if the first one fails
>>> Parser.char('a').or_else(Parser.char('b'))('ab$')
ParseSuccess(
loc=(line=1,col=1),
val='a',
rs=Stream("b$")
)
# alternatively
>>> (Parser.char('a') | Parser.char('b'))('ab$')
ParseSuccess(
loc=(line=1,col=1),
val='a',
rs=Stream("b$")
)
# parse 1 or more digits
>>> digits_parser = Parser.digit().many1()
>>> digits_parser('123abc')
ParseSuccess(
loc=(line=1,col=1),
val=['1', '2', '3'],
rs=Stream("abc")
)
# You can convert the parsed value (the `val` field in a `ParseSuccess` object)
# to anything you want using the `map` method. For example:
>>> num_parser = digits_parser.map(lambda digit_list: int("".join(digit_list)))
>>> num_parser('123abc')
ParseSuccess(
loc=(line=1,col=1),
val=123,
rs=Stream("abc")
)
# Parser that sequences a bunch of parsers, one after the other
>>> abc_parser = Parser.sequence(
... [Parser.char('a'), Parser.char('b'), Parser.char('c')]
... ).set_label('abc_parser')
>>> abc_parser('abc$')
ParseSuccess(
loc=(line=1,col=1),
val=['a', 'b', 'c'],
rs=Stream("$")
)
>>> abc_parser('$')
ParseFailure(
loc=(line=1,col=1),
label='abc_parser',
msg="unexpected character '$'"
)
# Parser that parses a bunch of alternatives
>>> oneof_abc_parser = Parser.choice(
... [Parser.char('a'), Parser.char('b'), Parser.char('c')]
... ).set_label('oneof_abc_parser')
>>> oneof_abc_parser('c$')
ParseSuccess(
loc=(line=1,col=1),
val='c',
rs=Stream("$")
)
>>> oneof_abc_parser('d$')
ParseFailure(
loc=(line=1,col=1),
label='oneof_abc_parser',
msg="unexpected character 'd'"
)
# Parsers that throw away things
>>> Parser.char('b').preceded_by(Parser.string("@"))("@b$")
ParseSuccess(
loc=(line=1,col=1),
val='b',
rs=Stream("$")
)
>>> Parser.char('b').succeeded_by(Parser.string("@"))("b@$")
ParseSuccess(
loc=(line=1,col=1),
val='b',
rs=Stream("$")
)
# Parser that parses a list of numbers separated by commas
>>> comma_parser = Parser.char(',')
>>> num_list_parser = num_parser.many1_sep_by(comma_parser)
>>> num_list_parser('5,15,250,1000')
ParseSuccess(
loc=(line=1,col=1),
val=[5, 15, 250, 1000],
rs=Stream("")
)
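For readers curious how combinators like these can work internally, the following is a minimal standalone sketch. It is not dine's implementation and does not use its API: parsers here are plain functions that return a `(value, remaining_text)` tuple on success or `None` on failure, and every name (`satisfy`, `char`, `and_then`, `or_else`, `many1`, `fmap`) is chosen for illustration.

```python
from typing import Any, Callable, Optional, Tuple

# Assumption: a parser is a function from text to (value, rest), or None on failure.
Result = Optional[Tuple[Any, str]]
ParserFn = Callable[[str], Result]


def satisfy(pred: Callable[[str], bool]) -> ParserFn:
    """Parse one character that satisfies pred."""
    def parse(text: str) -> Result:
        if text and pred(text[0]):
            return (text[0], text[1:])
        return None
    return parse


def char(c: str) -> ParserFn:
    return satisfy(lambda ch: ch == c)


def and_then(p1: ParserFn, p2: ParserFn) -> ParserFn:
    """Run p1, then p2 on what is left; succeed only if both succeed."""
    def parse(text: str) -> Result:
        r1 = p1(text)
        if r1 is None:
            return None
        v1, rest1 = r1
        r2 = p2(rest1)
        if r2 is None:
            return None
        v2, rest2 = r2
        return ((v1, v2), rest2)
    return parse


def or_else(p1: ParserFn, p2: ParserFn) -> ParserFn:
    """Try p1; if it fails, try p2 on the same input."""
    def parse(text: str) -> Result:
        r1 = p1(text)
        return r1 if r1 is not None else p2(text)
    return parse


def many1(p: ParserFn) -> ParserFn:
    """Apply p one or more times, collecting the values in a list."""
    def parse(text: str) -> Result:
        values, rest = [], text
        while (r := p(rest)) is not None:
            value, new_rest = r
            values.append(value)
            if new_rest == rest:   # stop if p consumed nothing, to avoid looping forever
                break
            rest = new_rest
        return (values, rest) if values else None
    return parse


def fmap(p: ParserFn, f: Callable[[Any], Any]) -> ParserFn:
    """Transform the parsed value with f, leaving the remaining text untouched."""
    def parse(text: str) -> Result:
        r = p(text)
        if r is None:
            return None
        value, rest = r
        return (f(value), rest)
    return parse


digit = satisfy(lambda ch: ch in "0123456789")
number = fmap(many1(digit), lambda ds: int("".join(ds)))

print(and_then(char("a"), char("b"))("ab$"))   # (('a', 'b'), '$')
print(or_else(char("a"), char("b"))("b$"))     # ('b', '$')
print(number("123abc"))                        # (123, 'abc')
```

Each combinator takes parsers and returns a new parser, so the results compose freely. This sketch omits location tracking, labels, and error messages.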
The full documentation can be found here. The documentation will be updated with more details and examples in the future.
- Why is 3.10 the minimum Python version compatible with this library?
  - The implementation of this library makes heavy use of structural pattern matching (a.k.a. the match statement), which is only available in Python 3.10 or later.
- The COMP4403 course (Compilers and Interpreters) at the University of Queensland.
- Scott Wlaschin's talk on parser combinators and his blog posts on the topic.
- Max Bo's Parser Combinator Talk at UQCS.