{{ message }}

pyrocat101 / opal Public

Self-contained monadic parser combinators for OCaml

Switch branches/tags
Nothing to show

Latest commit

`8d8a1f3`

Files

Failed to load latest commit information.
Type
Name
Commit time
Jan 17, 2015
Mar 6, 2017
Mar 9, 2015
Jun 22, 2016
Mar 6, 2017
Jun 22, 2016
Apr 3, 2020
Apr 3, 2020
Apr 3, 2020

Opal: Monadic Parser Combinators for OCaml

Opal is a minimum collection of useful parsers and combinators (~150 loc of OCaml) that makes writing parsers easier. It is designed to be small, self-contained, pure-functional, and only includes most essential parsers, so that one could include single file in the project or just embed it in other OCaml source code files.

I find myself writing lots of recursive-descent parsers from scratch in OCaml when I was solving Hackerrank FP challenges. That's why I wrote opal: to include it in the source code and build parsers on top of parser combinators easily.

Example

Trivial arithmetic calculator:

```open Opal

let parens = between (exactly '(') (exactly ')')
let integer = many1 digit => implode % int_of_string
let add = exactly '+' >> return ( + )
let sub = exactly '-' >> return ( - )
let mul = exactly '*' >> return ( * )
let div = exactly '/' >> return ( / )

let rec expr input = chainl1 term (add <|> sub) input
and term input = chainl1 factor (mul <|> div) input
and factor input = (parens expr <|> integer) input

let () =
let input = LazyStream.of_channel stdin in
match parse expr input with
| Some ans -> Printf.printf "%d\n" ans
| None -> print_endline "ERROR!"```

For non-trivial examples, see Hackerrank challenge solutions using opal in `examples/`.

Documentation

The expressiveness of parser combinators are attributed to higher-order functions and the extensive use of currying. However, due to lack of `do` syntax, the bind operation of monad would not be as succinct as that in Haskell.

A parser monad is either `None` (indicates failure), or `Some` pair of result and unconsumed input, where the result is a user-defined value. The input is a lazy stream of arbitrary token type. A parser is a function that accepts an input and returns a parser monad. Although most parsers in opal is polymorphic over token type and result type, some useful parsers only accepts `char` as input token type.

Since combinators in opal are roughly based on Haskell's Parsec. The following documentation is somehow a rip-off of Parsec's doc.

Lazy Stream

`type 'a LazyStream t`

Polymorphic lazy stream type.

`val LazyStream.of_stream : 'a Stream.t -> 'a LazyStream.t`

Build a lazy stream from stream.

`val LazyStream.of_function : (unit -> 'a) -> 'a LazyStream.t`

Build a lazy stream from a function `f`. The elements in the stream is populated by calling `f ()`.

`val LazyStream.of_string : string -> char LazyStream.t`

Build a char lazy stream from string.

`val LazyStream.of_channel : in_channel -> char LazyStream.t`

Build a char lazy stream from a input channel.

Utilities

`val implode : char list -> bytes`

Implode character list into a string. Useful when used with `many`.

`val explode : bytes -> char list`

Explode a string into a character list.

`val ( % ) : ('a -> 'b) -> ('b -> 'c) -> 'a -> 'c`

Infix operator for left-to-right function composition. `(f % g % h) x` is equivalent to `h (g (f x))`.

`val parse : ('token, 'result) parser -> 'token LazyStream -> 'result option`

`parse parser input` parses `input` with `parser`, and returns `Some result` if succeed, or `None` on failure.

Primitives

`type 'token input = 'token LazyStream.t`

`type ('token, 'result) monad = ('result * 'token input) option`

`type ('token, 'result) parser = 'token input -> ('token, 'result) monad`

A parser is a function that accepts an input and returns either `None` on failure, or `Some (result, input')`, where `result` is user-defined value and `input'` is unconsumed input after parsing.

`val return : 'result -> 'token input -> ('token, 'result) monad`

Accepts a value and an input, and returns a monad.

`val ( >>= ) : ('t, 'r) parser -> ('r -> ('t, 'r) monad) -> ('t, 'r) parser`

`x >>= f` returns a new parser that if parser `x` succeeds, applies function `f` on monad produced by `x`, and produces a new monad (a.k.a. `bind`).

`val ( let* ) : ('t, 'r) parser -> ('r -> ('t, 'r) monad) -> ('t, 'r) parser`

This operator is the same as `>>=` but using the `let` notation. It is usefull to avoid ugly sequences of bindings. For exemple, `p >>= fun x -> f x` can be rewritten `let* x = p in f x`. Combined with the `return` function, we can define complex parsers :

```let tuple_parser =
let* x = digit in
let* _ = exactly ',' in
let* y = digit in
return (x, y)```

`val ( <|> ) : ('t, 'r) parser -> ('t, 'r) parser -> ('t, 'r) parser`

Choice combinator. The parser `p <|> q` first applies `p`. If it succeeds, the value of `p` is returned. If `p` fails, parser `q` is tried.

`val mzero : 'a -> ('t, 'r) monad`

A parser that always fails.

`val any : ('t, 'r) parser`

The parser succeeds for any token in the input. Consumes a token and returns it.

`val satisfy : ('t, 'r) parser -> ('t -> bool) -> ('t, 'r) parser`

The parser `satisfy test` succeeds for any token for which the supplied function `test` returns `true`. Returns the token that is actually parsed.

`val eof : 'a -> ('t, 'a) parser`

The parser `eof x` succeeds if the input is exhausted. Returns value `x`.

Derived

`val ( => ) : ('t, 'r) parser -> ('r -> 'a) -> ('t, 'a) parser`

Map combinator. `x => f` parses `x`. If it succeeds, returns the value of `x` applied with `f`.

`val ( >> ) : ('t, 'a) parser -> ('t, 'b) parser -> ('t, 'b) parser`

Ignore-left combinator. `x >> y` parses `x` and then `y`. Returns the value returned by `y`.

`val ( << ) : ('t, 'a) parser -> ('t, 'b) parser -> ('t, 'a) parser`

Ignore-right combinator. `x >> y` parses `x` and then `y`. Returns the value returned by `x`.

`val ( <~> ) : ('t, 'r) parser -> ('t, 'r list) parser -> 't input -> ('t, 'r list) monad`

Cons combinator. `x <~> y` parses `x` and then `y`. Returns the value of `x` prepended to the value of `y` (a list).

`let ident = letter <~> many alpha_num`

`val choice : ('t, 'r) parser list -> ('t, 'r) parser`

`choice ps` tries to apply the parsers in the list `ps` in order, until one of them succeeds. Returns the value of the succeeding parser.

`val count : int -> ('t, 'r) parser -> 't input -> ('t, 'r list) monad`

`count n` parses `n` occurrences of `p`. If `n` is smaller or equal to zero, the parser equals to `return []`. Returns a list of `n` values returned by `p`.

`between : ('t, 'a) parser -> ('t, 'b) parser -> ('t, 'r) parser -> ('t, 'r) parser`

`between open close p` parses `open`, followed by `p` and `close`. Returns the value returned by `p`.

`let braces = between (exactly '{') (exactly '}')`

`val option : 'r -> ('t, 'r) parser -> ('t, 'r) parser`

`option default p` tries to apply parser `p`. If `p` fails, it returns the value `default`, otherwise the value returned by `p`.

`let priority = option 0 (digit => String.make 1 % int_of_string)`

`val optional : 'r -> ('t, 'r) parser -> ('t, unit) parser`

`optional p` tries to apply parser `p`. It will parse `p` or nothing. It only fails if `p` fails. Discard the result of `p`.

`val skip_many : ('t, 'r) parser -> ('t, unit) parser`

`skip_many p` applies `p` zero or more times, skipping its result.

`let spaces = skip_many space`

`val skip_many1 : ('t, 'r) parser -> ('t, unit) parser`

`skip_many1 p` applies `p` one or more times, skipping its result.

`val many : ('t, 'r) parser -> 't input -> ('t, 'r list) monad`

`many p` applies the parser `p` zero or more times. Returns a list of returned values of `p`.

`val many1 : ('t, 'r) parser -> 't input -> ('t, 'r list) monad`

`many1 p` applies the parser `p` one or more times. Returns a list of returned values of `p`.

`val sep_by : ('t, 'r) parser -> ('t, 'a) parser -> 't input -> ('t, 'r list) monad`

`sep_by p sep` parses zero or more occurrences of `p`, separated by `sep`. Returns a list of values returned by `p`.

`let comma_sep p = sep_by p (token ",")`

`val sep_by1 : ('t, 'r) parser -> ('t, 'a) parser -> 't input -> ('t, 'r list) monad`

`sep_by1 p sep` parses one or more occurrences of `p`, separated by `sep`. Returns a list of values returned by `p`.

`val end_by: ('t, 'r) parser -> ('t, 'a) parser -> ('t, 'r) parser`

`end_by p sep` parses zero or more ocurrences of `p`, separated and ended by `sep`. Returns a list of values returned by `p`.

`let statements = end_by statement (token ";")`

`val end_by1: ('t, 'r) parser -> ('t, 'a) parser -> ('t, 'r) parser`

`end_by1 p sep` parses one or more ocurrences of `p`, separated and ended by `sep`. Returns a list of values returned by `p`.

`val chainl : ('t, 'r) parser -> ('t, 'r -> 'r -> 'r) parser -> 'r -> ('t, 'r) parser`

`chainl p op default` parses zero or more occurrences of `p`, separated by `op`. Returns a value obtained by a left associative application of all functions by `op` to the values returned by `p`. If there are zero occurences of `p`, the value `default` is returned.

`val chainl1 : ('t, 'r) parser -> ('t, 'r -> 'r -> 'r) parser -> ('t, 'r) parser`

`chainl1 p op` parses one or more occurrences of `p`, separate by `op`. Returns a value obtained by a left associative application of all functions returned by `op` to the values returned by `p`. This parser can be used to eliminate left recursion which typically occurs in expression grammars. See the arithmetic caculator example above.

`val chainr : ('t, 'r) parser -> ('t, 'r -> 'r -> 'r) parser -> 'r -> ('t, 'r) parser`

`chainr p op default` parses zero or more occurrences of `p`, separated by `op`. Returns a value obtained by right associative application of all functions returned by `op` to the values returned by `p`. If there are no occurrences of `p`, the value `x` is returned.

`val chainr1 : ('t, 'r) parser -> ('t, 'r -> 'r -> 'r) parser -> ('t, 'r) parser`

`chainr p op` parses one or more occurrences of `p`, separated by `op`. Returns a value obtained by right associative application of all functions returned by `op` to the values returned by `p`.

Singletons

`val exactly : 'r -> ('t, 'r) parser`

`exactly x` parses a single token `x`. Returns the parsed token (i.e. `x`).

`let semi_colon = exactly ';'`

`val one_of : 'r list -> ('t, 'r) parser`

`one_of xs` succeeds if the current token is in the supplied list of tokens `xs`. Returns the parsed token.

`let vowel = one_of ['a'; 'e'; 'i'; 'o'; 'u']`

`val none_of : 'r list -> ('t, 'r) parser`

As the dual of `one_of`, `none_of xs` succeeds if the current token not in the supplied list of tokens `xs`. Returns the parsed token.

`let consonant = none_of ['a'; 'e'; 'i'; 'o'; 'u']`

`val range : 'r -> 'r -> ('t, 'r) parser`

`range low high` succeeds if the current token is in the range between `low` and `high` (inclusive). Returns the parsed token.

Char Parsers

`val space = (char, char) parser`

Parses a white space character (`'\s\t\r\n'`). Returns the parsed character.

`val spaces = (char, unit) parser`

Skip zero or more white spaces characters.

`val newline = (char, char) parser`

Parses a newline character (`'\n'`). Returns a newline character.

`val tab = (char, char) parser`

Parses a tab character (`'\t'`). Returns a tab character.

`val upper = (char, char) parser`

Parses an upper case letter (a character between 'A' and 'Z'). Returns the parsed character.

`val lower = (char, char) parser`

Parses a lower case letter (a character between 'a' and 'z'). Returns the parsed character.

`val digit : (char, char) parser`

Parses a digit. Returns the parsed character.

`val letter = (char, char) parser`

Parses a letter (an upper case or lower case letter). Returns the parsed character.

`val alpha_num = (char, char) parser`

Parses a letter or digit. Returns the parser character.

`val hex_digit = (char, char) parser`

Parses a hexadecimal digit (a digit or a letter between 'a' and 'f' or 'A' and 'F'). Returns the parsed character.

`val oct_digit = (char, char) parser`

Parses an octal digit (a character between '0' and '7'). Returns the parsed character.

Lex Helper

`val lexeme : (char, 'r) parser -> (char, 'r) parser`

`lexeme p` first applies `skip_many space` and then parser `p`. Returns the value returned by `p`.

`val token : string -> char input -> (char, char list) monad`

`token s` skips leading white spaces and parses a sequence of characters given by string `s`. Returns the parsed character sequence as a list.

`div_or_mod = token "div" <|> token "mod"`

Self-contained monadic parser combinators for OCaml

2 tags

Packages 0

No packages published

•
•
•
•