more doc
cgrand committed Aug 5, 2011
1 parent 51d4736 commit ad0f377
Showing 1 changed file with 147 additions and 46 deletions.
193 changes: 147 additions & 46 deletions README.asciidoc
@@ -12,7 +12,9 @@ time* (worst case: it behaves like a restartable parser).

Parsley parsers have *no separate lexer*; this allows for better
compositionality of grammars.

For now Parsley uses the same technique (for lexer-less parsing) as described
in this paper:
Context-Aware Scanning for Parsing Extensible Languages
http://www.umsec.umn.edu/publications/Context-Aware-Scanning-Parsing-Extensible-Language

@@ -28,9 +30,9 @@ A keyword and another value form a production rule.
A simple grammar is:

----
:expr #{"x" ["(" :expr* ")"]}
----

`x` `()` `(xx)` `((x)())` are recognized by this grammar.

By default the main production of a grammar is the first one.
@@ -55,23 +57,23 @@ These two grammars specify the same language but the resulting parse-trees will
be different (additional `:expr-rep` nodes):

----
:expr #{"x" ["(" :expr* ")"]}
----

----
:expr #{"x" :expr-rep}
:expr-rep ["(" :expr* ")"]
----

These two grammars specify the same language and the same parse-trees:

----
:expr #{"x" ["(" :expr* ")"]}
----

----
:expr #{"x" :expr-rep}
:expr-rep- ["(" :expr* ")"]
----


@@ -80,36 +82,36 @@ These two grammars specify the same language and the same parse-trees:
A parser is created using the `parser` or `make-parser` functions.

----
(require '[net.cgrand.parsley :as p])
(use '[clojure.pprint :only [pprint]])
; `def`, not `defn`: `p/parser` already returns the parsing function
(def p (p/parser :expr #{"x" ["(" :expr* ")"]}))
(pprint (p "(x(x))"))
{:tag :net.cgrand.parsley/root,
 :content
 [{:tag :expr,
   :content
   ["("
    {:tag :expr, :content ["x"]}
    {:tag :expr, :content ["(" {:tag :expr, :content ["x"]} ")"]}
    ")"]}]}
; running on malformed input with garbage
(pprint (p "a(zldxn(dez)"))
{:tag :net.cgrand.parsley/unfinished,
 :content
 [{:tag :net.cgrand.parsley/unexpected, :content ["a"]}
  {:tag :net.cgrand.parsley/unfinished,
   :content
   ["("
    {:tag :net.cgrand.parsley/unexpected, :content ["zld"]}
    {:tag :expr, :content ["x"]}
    {:tag :net.cgrand.parsley/unexpected, :content ["n"]}
    {:tag :expr,
     :content
     ["("
      {:tag :net.cgrand.parsley/unexpected, :content ["dez"]}
      ")"]}]}]}
----


@@ -118,14 +120,14 @@ A parser is created using the `parser` or `make-parser` functions.
Creating a buffer, editing it and getting its resulting parse-tree:

----
(-> p p/incremental-buffer (p/edit 0 0 "(") (p/edit 1 0 "(x)") p/parse-tree pprint)
{:tag :net.cgrand.parsley/unfinished,
 :content
 [{:tag :net.cgrand.parsley/unfinished,
   :content
   ["("
    {:tag :expr, :content ["(" {:tag :expr, :content ["x"]} ")"]}]}]}
----
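
The calls above suggest that `edit` takes an offset, a length and a replacement
string. As a rough mental model only (a plain-string sketch, not Parsley's
implementation; `apply-edit` is a hypothetical helper that is not part of the
library), the two edits transform the text of the buffer as follows:

----
;; Hypothetical helper: replaces `len` characters of `s`, starting at
;; `offset`, with `text`. It models the buffer's text only, not its parse-tree.
(defn apply-edit [s offset len text]
  (str (subs s 0 offset) text (subs s (+ offset len))))

(-> ""
    (apply-edit 0 0 "(")
    (apply-edit 1 0 "(x)"))
;; "((x)"
----

The resulting text has an unclosed parenthesis, which is consistent with the
`unfinished` tags in the parse-tree shown above.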

Incremental parsing at work:
@@ -148,6 +150,105 @@ nil
Hence, *reparsing the buffer only took a fraction of the original time* despite
the buffer having been modified at the start and at the end.


== Incremental parsing ==

The input string is split into _chunks_ (lines by default) and chunks are always
reparsed as a whole, so don't experiment with incremental parsing with 1-line
inputs!

Let's look at a slightly more complex example:

----
=> (def p (p/parser
            {:main :expr*
             :space :ws?
             :make-node (fn [tag content] {:tag tag :content content :id (gensym)})}
            :ws #"\s+"
            :expr #{#"\w+" ["(" :expr* ")"]}))
----

This example introduces the option map: if the first arg to `parser` is a map
(instead of a keyword), it's a map of options. See <<options>> for more.

The important option here is `:make-node`, which redefines how nodes of the
parse-tree are constructed: we add a unique identifier to each node.

Now let's create a 3-line input and parse it:

----
=> (def buf (-> p p/incremental-buffer (p/edit 0 0 "((a)\n(b)\n(c))")))
=> (-> buf p/parse-tree pprint)
nil
{:tag :net.cgrand.parsley/root,
 :content
 [{:tag :expr,
   :content
   ["("
    {:tag :expr,
     :content ["(" {:tag :expr, :content ["a"], :id G__1806} ")"],
     :id G__1807}
    {:tag :ws, :content ["\n"], :id G__1808}
    {:tag :expr,
     :content ["(" {:tag :expr, :content ["b"], :id G__1809} ")"],
     :id G__1810}
    {:tag :ws, :content ["\n"], :id G__1811}
    {:tag :expr,
     :content ["(" {:tag :expr, :content ["c"], :id G__1812} ")"],
     :id G__1813}
    ")"],
   :id G__1814}],
 :id G__1815}
----

Now, let's change this "b" into "BOO" and parse the buffer again:

----
=> (-> buf (p/edit 6 1 "BOO") p/parse-tree pprint)
nil
{:tag :net.cgrand.parsley/root,
 :content
 [{:tag :expr,
   :content
   ["("
    {:tag :expr,
     :content ["(" {:tag :expr, :content ["a"], :id G__1806} ")"],
     :id G__1807}
    {:tag :ws, :content ["\n"], :id G__1818}
    {:tag :expr,
     :content ["(" {:tag :expr, :content ["BOO"], :id G__1819} ")"],
     :id G__1820}
    {:tag :ws, :content ["\n"], :id G__1811}
    {:tag :expr,
     :content ["(" {:tag :expr, :content ["c"], :id G__1812} ")"],
     :id G__1813}
    ")"],
   :id G__1821}],
 :id G__1822}
----

We can spot that 5 out of the 10 nodes are shared with the previous parse-tree.
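
Rather than spotting shared nodes by eye, the identifiers can be compared
programmatically. The following is a minimal sketch in plain Clojure, assuming
nodes are maps with `:id` and `:content` keys as produced by the `:make-node`
function above; `node-ids` is a hypothetical helper, and the two small trees
are hand-written stand-ins rather than real parser output:

----
(require '[clojure.set :as set])

;; Hypothetical helper: collects the :id of every map node in a tree whose
;; nodes are maps with a :content vector (strings are treated as leaves).
(defn node-ids [tree]
  (->> (tree-seq map? :content tree)
       (filter map?)
       (map :id)
       set))

;; Hand-written stand-ins for two successive parse-trees.
(def before
  {:tag :root :id 'G__3
   :content [{:tag :expr :id 'G__1 :content ["a"]}
             {:tag :expr :id 'G__2 :content ["b"]}]})

(def after
  {:tag :root :id 'G__5
   :content [{:tag :expr :id 'G__1 :content ["a"]}
             {:tag :expr :id 'G__4 :content ["BOO"]}]})

(set/intersection (node-ids before) (node-ids after))
;; #{G__1}
----

Applied to the two parse-trees printed above, the same comparison would list
the identifiers that appear in both outputs.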


[[options]]
== Options ==

`:main` specifies the root production; by default this is the first
production of the grammar.

`:root-tag` specifies the tag name to use for the root node
(`:net.cgrand.parsley/root` by default).

`:space` specifies a production which will be interspersed between every symbol
(terminal or not) *except in a sequence created with `unspaced`.*

`:make-node` specifies a function whose arglist is `[tag children-vec]` and
which returns a new node. By default it creates instances of the Node record
with keys `tag` and `content`.

`:make-unexpected` specifies a 1-arg function which converts a string (of
unexpected characters) to a node. By default it delegates to `:make-node`.

`:make-leaf` specifies a 1-arg function which converts a string (token) to a
node; by default it behaves like `identity`.
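
To illustrate the shapes these options expect, here is a sketch of the three
node-building functions written as plain Clojure functions and gathered into an
option map. The function names and the `:leaf`, `:unexpected` and `:root` tags
are hypothetical choices for this example; only the option keys and argument
lists come from the descriptions above.

----
;; [tag children-vec] -> node, as described for :make-node
(defn my-make-node [tag children]
  {:tag tag :content children})

;; string of unexpected characters -> node, as described for :make-unexpected
(defn my-make-unexpected [s]
  (my-make-node :unexpected [s]))

;; token string -> node, as described for :make-leaf
(defn my-make-leaf [s]
  {:tag :leaf :content [s]})

;; An option map of the kind passed as the first argument to `parser`.
(def options
  {:main :expr*
   :space :ws?
   :root-tag :root
   :make-node my-make-node
   :make-unexpected my-make-unexpected
   :make-leaf my-make-leaf})

(my-make-leaf "x")
;; {:tag :leaf, :content ["x"]}
----

With a `:make-leaf` like this one, tokens would appear in the tree as maps
instead of bare strings.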
