+ Enabling the rest of the docu
1 parent b9ac13e commit d7b0ef7ad33adfed97f60682a32932139118ea29 @kschiess committed May 25, 2012
@@ -1,6 +1,5 @@
---
title: Overview
-layout: default
---
Parslet is a library with a clear philosophy: It makes parser writing easy and
@@ -1,46 +1,45 @@
---
title: Parser construction
-layout: default
---
A parser is nothing more than a class that derives from
<code>Parslet::Parser</code>. The simplest parser that one could write would
look like this:
-{% highlight ruby %}
-class SimpleParser < Parslet::Parser
- rule(:a_rule) { str('simple_parser') }
- root(:a_rule)
-end
-{% endhighlight %}
+<pre class="sh_ruby"><code>
+ class SimpleParser < Parslet::Parser
+ rule(:a_rule) { str('simple_parser') }
+ root(:a_rule)
+ end
+</code></pre>
The language recognized by this parser is simply the string "simple_parser".
Parser rules do look a lot like methods and are defined by
-{% highlight ruby %}
-rule(name) { definition_block }
-{% endhighlight %}
+<pre class="sh_ruby"><code>
+ rule(name) { definition_block }
+</code></pre>
Behind the scenes, this really defines a method that returns whatever you
return from it.
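A minimal plain-Ruby sketch of what "defines a method" can mean here. `TinyParserBase` and `TinyParser` are hypothetical names used only for illustration, the rule body returns a plain string rather than a parser atom, and this is not Parslet's actual implementation:

```ruby
# Hypothetical stand-in for Parslet::Parser, for illustration only.
class TinyParserBase
  # Defines an instance method `name` that returns whatever the block returns.
  def self.rule(name, &definition_block)
    define_method(name) { instance_eval(&definition_block) }
  end
end

class TinyParser < TinyParserBase
  rule(:a_rule) { 'simple_parser' }
end

parser = TinyParser.new
puts parser.respond_to?(:a_rule) # the rule is an ordinary instance method
puts parser.a_rule               # the value returned by the block
```

Because the rule ends up as an ordinary method, other rules and helper methods can call it by name.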
Every parser has a root. This designates where parsing should start. It is like
an entry point to your parser. With a root defined like this:
-{% highlight ruby %}
-root(:my_root)
-{% endhighlight %}
+<pre class="sh_ruby"><code>
+ root(:my_root)
+</code></pre>
you create a <code>#parse</code> method in your parser that will start parsing
by calling the <code>#my_root</code> method. You'll also have a <code>#root</code>
(instance) method that is an alias of the root method. The following things are
really one and the same:
-{% highlight ruby %}
-SimpleParser.new.parse(string)
-SimpleParser.new.root.parse(string)
-SimpleParser.new.a_rule.parse(string)
-{% endhighlight %}
+<pre class="sh_ruby"><code>
+ SimpleParser.new.parse(string)
+ SimpleParser.new.root.parse(string)
+ SimpleParser.new.a_rule.parse(string)
+</code></pre>
Knowing these things gives you a lot of flexibility; I'll explain why at the
end of the chapter. For now, just let me point out that because all of this is
@@ -54,17 +53,17 @@ A parser is constructed from parser atoms (or parslets, hence the name). The
atoms are what appear inside your rules (and maybe elsewhere). We've already
encountered an atom, the string atom:
-{% highlight ruby %}
-str('simple_parser')
-{% endhighlight %}
+<pre class="sh_ruby"><code>
+ str('simple_parser')
+</code></pre>
This returns a <code>Parslet::Atoms::Str</code> instance. These parser atoms
all derive from <code>Parslet::Atoms::Base</code> and have essentially just
one method you can call: <code>#parse</code>. So this works:
-{% highlight ruby %}
-str('foobar').parse('foobar') # => 'foobar'@0
-{% endhighlight %}
+<pre class="sh_ruby"><code title="parser atoms">
+ str('foobar').parse('foobar') # => "foobar"@0
+</code></pre>
The atoms are small parsers that can recognize languages and throw errors, just
like real <code>Parslet::Parser</code> subclasses.
@@ -74,9 +73,9 @@ h3. Matching character ranges
The second parser atom you will have to know about allows you to match
character ranges:
-{% highlight ruby %}
-match('[0-9a-f]')
-{% endhighlight %}
+<pre class="sh_ruby"><code>
+ match('[0-9a-f]')
+</code></pre>
The above atom would match the numbers zero through nine and the letters 'a'
to 'f' - yeah, you guessed right - hexadecimal numbers for example. The inside
@@ -85,28 +84,28 @@ a single character of input. Because we'll be using ranges so much with
<code>#match</code> and because typing ('[]') is tiresome, here's another way
to write the above <code>#match</code> atom:
-{% highlight ruby %}
-match['0-9a-f']
-{% endhighlight %}
+<pre class="sh_ruby"><code>
+ match['0-9a-f']
+</code></pre>
Character matches are instances of <code>Parslet::Atoms::Re</code>. Here are
some more examples of character ranges:
-{% highlight ruby %}
-match['[:alnum:]'] # letters and numbers
-match['\\n'] # newlines
-match('\\w') # word characters
-match('.') # any character
-{% endhighlight %}
+<pre class="sh_ruby"><code>
+ match['[:alnum:]'] # letters and numbers
+ match['\\n'] # newlines
+ match('\\w') # word characters
+ match('.') # any character
+</code></pre>
h3. The wild wild <code>#any</code>
The last example above corresponds to the regular expression <code>/./</code> that matches
any one character. There is a special atom for that:
-{% highlight ruby %}
-any
-{% endhighlight %}
+<pre class="sh_ruby"><code>
+ any
+</code></pre>
h2. Composition of Atoms
@@ -117,9 +116,9 @@ h3. Simple Sequences
Match 'foo' and then 'bar':
-{% highlight ruby %}
-str('foo') >> str('bar') # same as str('foobar')
-{% endhighlight %}
+<pre class="sh_ruby"><code>
+ str('foo') >> str('bar') # same as str('foobar')
+</code></pre>
Sequences correspond to instances of the class
<code>Parslet::Atoms::Sequence</code>.
@@ -128,36 +127,36 @@ h3. Repetition and its Special Cases
To model atoms that can be repeated, you should use <code>#repeat</code>:
-{% highlight ruby %}
-str('foo').repeat
-{% endhighlight %}
+<pre class="sh_ruby"><code>
+ str('foo').repeat
+</code></pre>
This will allow foo to repeat any number of times, including zero. If you
look at the signature for <code>#repeat</code> in <code>Parslet::Atoms::Base</code>,
you'll see that it really has two arguments: _min_ and _max_. So the following
code all makes sense:
-{% highlight ruby %}
-str('foo').repeat(1) # match 'foo' at least once
-str('foo').repeat(1,3) # at least once and at most 3 times
-str('foo').repeat(0, nil) # the default: same as str('foo').repeat
-{% endhighlight %}
+<pre class="sh_ruby"><code>
+ str('foo').repeat(1) # match 'foo' at least once
+ str('foo').repeat(1,3) # at least once and at most 3 times
+ str('foo').repeat(0, nil) # the default: same as str('foo').repeat
+</code></pre>
Repetition has a special case that is used frequently: Matching something
once or not at all can be achieved by <code>repeat(0,1)</code>, but also
through the prettier:
-{% highlight ruby %}
-str('foo').maybe # same as str('foo').repeat(0,1)
-{% endhighlight %}
+<pre class="sh_ruby"><code>
+ str('foo').maybe # same as str('foo').repeat(0,1)
+</code></pre>
These all map to <code>Parslet::Atoms::Repetition</code>. Please note this
little twist to <code>#maybe</code>:
-{% highlight ruby %}
-str('foo').maybe.as(:f).parse('') # => {:f=>nil}
-str('foo').repeat(0,1).as(:f).parse('') # => {:f=>[]}
-{% endhighlight %}
+<pre class="sh_ruby"><code title="maybes twist">
+ str('foo').maybe.as(:f).parse('') # => {:f=>nil}
+ str('foo').repeat(0,1).as(:f).parse('') # => {:f=>[]}
+</code></pre>
The 'nil'-value of <code>#maybe</code> is nil. This is catering to the
intuition that <code>foo.maybe</code> either gives me <code>foo</code> or
@@ -169,9 +168,9 @@ The most important composition method for grammars is alternation. Without
it, your grammars would only vary in the amount of things matched, but not
in content. Here's how this looks:
-{% highlight ruby %}
-str('foo') | str('bar') # matches 'foo' OR 'bar'
-{% endhighlight %}
+<pre class="sh_ruby"><code>
+ str('foo') | str('bar') # matches 'foo' OR 'bar'
+</code></pre>
This reads naturally as "'foo' or 'bar'".
@@ -181,10 +180,10 @@ The operators we have chosen for parslet atom combination have the operator
precedence that you would expect. No parentheses are needed to express
alternation of sequences:
-{% highlight ruby %}
-str('s') >> str('equence') |
- str('se') >> str('quence')
-{% endhighlight %}
+<pre class="sh_ruby"><code>
+ str('s') >> str('equence') |
+ str('se') >> str('quence')
+</code></pre>
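The grouping comes from Ruby itself: `>>` binds more tightly than `|`. A plain-Ruby sketch that makes the grouping visible; `Node` is a hypothetical stand-in that only records how the operators were applied and is not a Parslet class:

```ruby
# Hypothetical stand-in for a parser atom; it records operator grouping only.
class Node
  attr_reader :label

  def initialize(label)
    @label = label
  end

  # Sequence operator: wraps both operands in one group.
  def >>(other)
    Node.new("(#{label} >> #{other.label})")
  end

  # Alternation operator: wraps both operands in one group.
  def |(other)
    Node.new("(#{label} | #{other.label})")
  end
end

a, b, c, d = %w[a b c d].map { |name| Node.new(name) }

grouped = a >> b | c >> d
puts grouped.label # prints ((a >> b) | (c >> d))
```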
h3. And more
@@ -193,29 +192,29 @@ However, there seems to be a different kind of aesthetic about them; they
are pure Ruby and integrate well with the rest of your environment. Have a
look at this:
-{% highlight ruby %}
-# Also consumes the space after important things like ';' or ':'. Call this
-# giving the character you want to match as argument:
-#
-# arg >> (spaced(',') >> arg).repeat
-#
-def spaced(character)
- str(character) >> match["\s"]
-end
-{% endhighlight %}
+<pre class="sh_ruby"><code>
+ # Also consumes the space after important things like ';' or ':'. Call this
+ # giving the character you want to match as argument:
+ #
+ # arg >> (spaced(',') >> arg).repeat
+ #
+ def spaced(character)
+ str(character) >> match["\s"]
+ end
+</code></pre>
or even this:
-{% highlight ruby %}
-# Turns any atom into an expression that matches a left parenthesis, the
-# atom and then a right parenthesis.
-#
-# bracketed(sum)
-#
-def bracketed(atom)
- spaced('(') >> atom >> spaced(')')
-end
-{% endhighlight %}
+<pre class="sh_ruby"><code>
+ # Turns any atom into an expression that matches a left parenthesis, the
+ # atom and then a right parenthesis.
+ #
+ # bracketed(sum)
+ #
+ def bracketed(atom)
+ spaced('(') >> atom >> spaced(')')
+ end
+</code></pre>
You might say that because parslet is just plain old Ruby objects itself (PORO
(tm)), it allows for very tight code. Module inclusion, class inheritance, ...
@@ -228,10 +227,10 @@ Parslet will not generate a parser for you and neither will it generate your
abstract syntax tree for you. The method <code>#as(name)</code> allows you
to specify exactly what you want your tree to look like:
-{% highlight ruby %}
-str('foo').parse('foo') # => 'foo'@0
-str('foo').as(:bar).parse('foo') # => {:bar => 'foo'@0}
-{% endhighlight %}
+<pre class="sh_ruby"><code title="using as">
+ str('foo').parse('foo') # => "foo"@0
+ str('foo').as(:bar).parse('foo') # => {:bar=>"foo"@0}
+</code></pre>
So you think: <code>#as(name)</code> allows me to create a hash, big deal.
That's not all. You'll notice that annotating everything that you want to keep
@@ -241,22 +240,22 @@ has a set of clever rules that merge the annotated output from your atoms into
a tree. Here are some more examples, with the atom on the left and the resulting
tree (assuming a successful parse) on the right:
-{% highlight ruby %}
-# Normal strings just map to strings
-str('a').repeat "aaa"@0
+<pre class="sh_ruby"><code>
+ # Normal strings just map to strings
+ str('a').repeat "aaa"@0
-# Arrays capture repetition of non-strings
-str('a').repeat.as(:b) {:b=>"aaa"@0}
-str('a').as(:b).repeat [{:b=>"a"@0}, {:b=>"a"@1}, {:b=>"a"@2}]
+ # Arrays capture repetition of non-strings
+ str('a').repeat.as(:b) {:b=>"aaa"@0}
+ str('a').as(:b).repeat [{:b=>"a"@0}, {:b=>"a"@1}, {:b=>"a"@2}]
-# Subtrees get merged - unlabeled strings discarded
-str('a').as(:a) >> str('b').as(:b) {:a=>"a"@0, :b=>"b"@1}
-str('a') >> str('b').as(:b) >> str('c') {:b=>"b"@1}
+ # Subtrees get merged - unlabeled strings discarded
+ str('a').as(:a) >> str('b').as(:b) {:a=>"a"@0, :b=>"b"@1}
+ str('a') >> str('b').as(:b) >> str('c') {:b=>"b"@1}
-# #maybe will return nil, not the empty array
-str('a').maybe.as(:a) {:a=>"a"@0}
-str('a').maybe.as(:a) {:a=>nil}
-{% endhighlight %}
+ # #maybe will return nil, not the empty array
+ str('a').maybe.as(:a) {:a=>"a"@0}
+ str('a').maybe.as(:a) {:a=>nil}
+</code></pre>
h2. And more
@@ -1,6 +1,5 @@
---
title: Transformation
-layout: default
---
Parslet parsers output deep nested hashes. Those are nice for printing, but
@@ -1,6 +1,5 @@
---
title: Tricks for common situations
-layout: default
---
h2. Matching EOF (End Of File)