![Parsley](https://github.com/plioi/parsley/raw/master/parsley.png)
# Parsley

Parsley is a monadic parser combinator library inspired by Haskell's [Parsec](http://www.haskell.org/haskellwiki/Parsec) and F#'s [FParsec](http://www.quanttec.com/fparsec/). It can parse context-sensitive, infinite look-ahead grammars, but it performs best on predictive (LL(1)) grammars.

Unlike Parsec/FParsec, Parsley provides separate lexer/parser phases. The lexer phase is usually performed with a prioritized list of regex patterns, and parser grammars are expressed in terms of the tokens produced by the lexer.

## Installation

First, [install NuGet](http://docs.nuget.org/docs/start-here/installing-nuget). Then, install Parsley from the package manager console:

    PM> Install-Package Parsley

## Lexer Phase (Tokenization)

Strings being parsed are represented with a `Text` instance, which tracks the original string as well as the current parsing position:

    var text = new Text("some input to parse");

The lexer phase is implemented by anything that produces an `IEnumerable<Token>`. The default implementation, `Lexer`, builds the series of tokens when given a prioritized series of `TokenKind` token recognizers. The most common `TokenKind` implementation is `Pattern`, which recognizes tokens via regex patterns. A `TokenKind` can be skippable, if you want it to be recognized but discarded:

    var text = new Text("1 2 3 a b c");

    // Token kinds are listed in priority order; whitespace is recognized but discarded.
    // Note: as written, `text` is never handed to the lexer; check the Lexer API
    // of your Parsley version for how the input is supplied.
    var lexer = new Lexer(new Pattern("letter", @"[a-z]"),
                          new Pattern("number", @"[0-9]+"),
                          new Pattern("whitespace", @"\s+", skippable: true));

    Token[] tokens = lexer.ToArray();

Above, the array `tokens` will contain 6 `Token` objects. Each `Token` contains the literal ("1", "a", etc.), the `TokenKind` that matched it, and the `Position` (line/column number) where the token was found.
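
To make the idea of prioritized, skippable token kinds concrete, here is a minimal self-contained sketch. It is not Parsley's implementation: `SketchToken`, `SketchPattern`, and `SketchLexer` are hypothetical names invented for this illustration, and the real `Lexer` may work differently. It only assumes the behavior described above (first matching pattern wins, skippable kinds are dropped, each token records its line and column).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical stand-ins for illustration; these are NOT Parsley's Token, Pattern, or Lexer.
public sealed class SketchToken
{
    public SketchToken(string kind, string literal, int line, int column)
    {
        Kind = kind; Literal = literal; Line = line; Column = column;
    }
    public string Kind { get; }
    public string Literal { get; }
    public int Line { get; }
    public int Column { get; }
}

public sealed class SketchPattern
{
    public SketchPattern(string name, string expression, bool skippable = false)
    {
        Name = name;
        // \G anchors the match at the position where the search starts.
        Matcher = new Regex(@"\G(?:" + expression + ")");
        Skippable = skippable;
    }
    public string Name { get; }
    public Regex Matcher { get; }
    public bool Skippable { get; }
}

public static class SketchLexer
{
    public static List<SketchToken> Tokenize(string input, params SketchPattern[] patterns)
    {
        var tokens = new List<SketchToken>();
        int index = 0, line = 1, column = 1;

        while (index < input.Length)
        {
            // The first pattern (in priority order) with a non-empty match wins.
            SketchPattern winner = null;
            Match match = null;
            foreach (var pattern in patterns)
            {
                match = pattern.Matcher.Match(input, index);
                if (match.Success && match.Length > 0) { winner = pattern; break; }
            }

            if (winner == null)
                throw new FormatException("Unrecognized input at line " + line + ", column " + column + ".");

            // Skippable kinds are recognized but not emitted.
            if (!winner.Skippable)
                tokens.Add(new SketchToken(winner.Name, match.Value, line, column));

            // Advance past the literal, tracking line and column.
            foreach (char c in match.Value)
            {
                if (c == '\n') { line++; column = 1; }
                else column++;
            }
            index += match.Length;
        }
        return tokens;
    }
}
```

Calling `SketchLexer.Tokenize` on `"1 2 3 a b c"` with the same three patterns as above yields six tokens, since the whitespace matches are recognized and then dropped.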

The collection of `Token` objects produced by the lexer phase is wrapped in a `TokenStream`, which allows the rest of the system to traverse the collection of tokens in an immutable fashion.
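
One way to picture immutable traversal is a stream whose "advance" operation returns a new view instead of moving a cursor in place. The sketch below is an assumption about the general technique, not Parsley's `TokenStream`; `SketchStream<T>` and its members are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration of immutable traversal; NOT Parsley's TokenStream.
public sealed class SketchStream<T>
{
    private readonly IReadOnlyList<T> items;
    private readonly int index;

    public SketchStream(IReadOnlyList<T> items) : this(items, 0) { }

    private SketchStream(IReadOnlyList<T> items, int index)
    {
        this.items = items;
        this.index = index;
    }

    // True once every item has been consumed.
    public bool AtEnd => index >= items.Count;

    // The item at the current position; throws if nothing remains.
    public T Current => AtEnd
        ? throw new InvalidOperationException("No tokens remain.")
        : items[index];

    // Advancing never mutates this instance; it returns a new view over the same list.
    public SketchStream<T> Advance() => AtEnd ? this : new SketchStream<T>(items, index + 1);
}
```

Because an earlier stream value is unaffected by later calls to `Advance`, a parser that fails partway through can simply hand back the stream it started with, which is convenient for trying alternatives.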

## Parser Functions

A parser of thingies is anything that consumes a `TokenStream` and produces a parsed-thingy:

    public interface Parser<out T>
    {
        Reply<T> Parse(TokenStream tokens);
    }

A `Reply<T>` describes whether or not the parser succeeded, the parsed-thingy (on success), a possibly-empty error message list, and a reference to a `TokenStream` representing the remaining unparsed tokens.
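
As a rough illustration of the four pieces of information a reply carries, here is a self-contained sketch. `SketchReply<T>` and `SketchParsers.Number` are hypothetical names, not Parsley's API, and a plain list of strings stands in for the token stream to keep the example short.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape for illustration; NOT Parsley's Reply<T>.
public sealed class SketchReply<T>
{
    private SketchReply(bool success, T value, IReadOnlyList<string> errors, IReadOnlyList<string> remaining)
    {
        Success = success; Value = value; Errors = errors; Remaining = remaining;
    }

    public bool Success { get; }
    public T Value { get; }                          // meaningful only on success
    public IReadOnlyList<string> Errors { get; }     // possibly empty
    public IReadOnlyList<string> Remaining { get; }  // tokens not yet consumed

    public static SketchReply<T> Parsed(T value, IReadOnlyList<string> remaining) =>
        new SketchReply<T>(true, value, new string[0], remaining);

    public static SketchReply<T> Failed(string error, IReadOnlyList<string> remaining) =>
        new SketchReply<T>(false, default(T), new[] { error }, remaining);
}

public static class SketchParsers
{
    // Parses one integer token from the front of the list.
    // On failure, the remaining tokens are returned untouched.
    public static SketchReply<int> Number(IReadOnlyList<string> tokens)
    {
        int number;
        if (tokens.Count > 0 && int.TryParse(tokens[0], out number))
            return SketchReply<int>.Parsed(number, tokens.Skip(1).ToList());

        return SketchReply<int>.Failed("number expected", tokens);
    }
}
```

A caller checks `Success` first, then reads either `Value` and `Remaining` to continue parsing, or `Errors` to report what went wrong.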

## Grammars

Grammars should inherit from `Grammar` to take advantage of several `Parser` primitives. Grammars should define each grammar rule in terms of these primitives, ultimately exposing the start rule as some `Parser<T>`. Grammar rule bodies may consist of LINQ queries, which allow you to glue together other grammar rules in sequence.
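
The reason LINQ queries can sequence rules is that C# query syntax works on any type with suitable `Select` and `SelectMany` methods. The sketch below shows that mechanism with a tiny hypothetical combinator set; `MiniParser<T>`, `MiniGrammar`, `Token`, and `Pair` are invented for this illustration and are not Parsley's `Grammar` primitives. A plain list of strings again stands in for the token stream.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical mini-combinators; NOT Parsley's Grammar or Parser<T>.
// A parser returns (success flag, parsed value, remaining tokens).
public delegate (bool Ok, T Value, IReadOnlyList<string> Rest) MiniParser<T>(IReadOnlyList<string> tokens);

public static class MiniGrammar
{
    // Matches a single token that satisfies the predicate.
    public static MiniParser<string> Token(Func<string, bool> predicate)
    {
        return tokens =>
        {
            if (tokens.Count > 0 && predicate(tokens[0]))
                return (true, tokens[0], tokens.Skip(1).ToList());
            return (false, null, tokens);
        };
    }

    // Select lets a query transform a parsed value.
    public static MiniParser<U> Select<T, U>(this MiniParser<T> parser, Func<T, U> map)
    {
        return tokens =>
        {
            var reply = parser(tokens);
            if (!reply.Ok) return (false, default(U), tokens);
            return (true, map(reply.Value), reply.Rest);
        };
    }

    // SelectMany runs one parser, then the next on whatever the first left over.
    // If either fails, the whole sequence fails without consuming input.
    public static MiniParser<V> SelectMany<T, U, V>(
        this MiniParser<T> first, Func<T, MiniParser<U>> next, Func<T, U, V> project)
    {
        return tokens =>
        {
            var a = first(tokens);
            if (!a.Ok) return (false, default(V), tokens);
            var b = next(a.Value)(a.Rest);
            if (!b.Ok) return (false, default(V), tokens);
            return (true, project(a.Value, b.Value), b.Rest);
        };
    }

    // Rule: name ":" digits, glued together in sequence by a LINQ query.
    public static readonly MiniParser<KeyValuePair<string, int>> Pair =
        from key in Token(t => t.Length > 0 && t.All(char.IsLetter))
        from colon in Token(t => t == ":")
        from value in Token(t => t.Length > 0 && t.All(c => c >= '0' && c <= '9'))
        select new KeyValuePair<string, int>(key, int.Parse(value));
}
```

Each `from` clause names the result of one rule, and the `select` clause builds the final value from those names. The compiler rewrites the query into nested `SelectMany` calls, so the rules run left to right over the remaining tokens.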

See the integration tests for a [sample JSON grammar](https://github.com/plioi/parsley/tree/master/src/Parsley.Test/IntegrationTests/Json).

Finally, we can put all these pieces together to parse some text:

    const string input = "{\"zero\" : 0, \"one\" : 1, \"two\" : 2}";
    var tokens = new JsonLexer(input);
    var jsonDictionary = (Dictionary<string, object>) JsonGrammar.Json.Parse(tokens).Value;