<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html lang="en" xml:lang="en" xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta content="text/html;charset=UTF-8" http-equiv="Content-type" />
<title>
parslet -
Tricks for common situations
</title>
<meta content="Kaspar Schiess (http://absurd.li)" name="author" />
<link href="imgs/favicon3.ico" rel="shortcut icon" />
<link href="css/site.css" rel="stylesheet" type="text/css" />
<link href="css/pygments.css" rel="stylesheet" type="text/css" />
</head>
<body>
<div id="everything">
<div class="main_menu">
<img alt="Parslet Logo" src="imgs/parsley_logo.png" />
<ul>
<li><a href="index.html">about</a></li>
<li><a href="get-started.html">get started</a></li>
<li><a href="install.html">install</a></li>
<li><a href="documentation.html">documentation</a></li>
<li><a href="contribute.html">contribute</a></li>
</ul>
</div>
<div class="content">
<h1>Tricks for common situations</h1>
<h2>Matching <span class="caps">EOF</span> (End Of File)</h2>
<p>Ahh Sir, you&#8217;ll be needin what us parsers call <em>epsilon</em>:</p>
<div class="highlight"><pre><code class="ruby">
<span class="n">rule</span><span class="p">(</span><span class="ss">:eof</span><span class="p">)</span> <span class="p">{</span> <span class="n">any</span><span class="o">.</span><span class="n">absent?</span> <span class="p">}</span>
</code></pre>
</div><p>Of course, most of us don&#8217;t use this at all, since any parser has <span class="caps">EOF</span> as
implicit last input.</p>
<h2>Matching Strings Case-Insensitively</h2>
<p>Parslet is fully hackable: you can use plain Ruby code to build parsers. Here&#8217;s
how I would match a string in a case-insensitive manner:</p>
<div class="highlight"><pre><code class="ruby">
<span class="nb">require</span> <span class="s1">&#39;parslet&#39;</span>
<span class="nb">include</span> <span class="no">Parslet</span> <span class="c1"># makes match, str and any available here</span>
<span class="k">def</span> <span class="nf">stri</span><span class="p">(</span><span class="n">str</span><span class="p">)</span>
  <span class="n">key_chars</span> <span class="o">=</span> <span class="n">str</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="sr">//</span><span class="p">)</span>
  <span class="n">key_chars</span><span class="o">.</span>
    <span class="n">collect!</span> <span class="p">{</span> <span class="o">|</span><span class="n">char</span><span class="o">|</span> <span class="n">match</span><span class="o">[</span><span class="s2">&quot;</span><span class="si">#{</span><span class="n">char</span><span class="o">.</span><span class="n">upcase</span><span class="si">}#{</span><span class="n">char</span><span class="o">.</span><span class="n">downcase</span><span class="si">}</span><span class="s2">&quot;</span><span class="o">]</span> <span class="p">}</span><span class="o">.</span>
    <span class="n">reduce</span><span class="p">(</span><span class="ss">:&gt;&gt;</span><span class="p">)</span>
<span class="k">end</span>
<span class="c1"># Constructs a parser using a Parsing Expression Grammar</span>
<span class="n">stri</span><span class="p">(</span><span class="s1">&#39;keyword&#39;</span><span class="p">)</span><span class="o">.</span><span class="n">parse</span> <span class="s2">&quot;kEyWoRd&quot;</span> <span class="c1"># =&gt; &quot;kEyWoRd&quot;</span>
</code></pre>
</div><h2>Testing</h2>
<p>Parslet helps you to create parsers that are in turn created out of many small
parsers. It is really turtles all the way down. Imagine you have a complex
parser:</p>
<div class="highlight"><pre><code class="ruby">
<span class="k">class</span> <span class="nc">ComplexParser</span> <span class="o">&lt;</span> <span class="no">Parslet</span><span class="o">::</span><span class="no">Parser</span>
  <span class="n">root</span> <span class="ss">:lots_of_stuff</span>
  <span class="n">rule</span><span class="p">(</span><span class="ss">:lots_of_stuff</span><span class="p">)</span> <span class="p">{</span> <span class="o">.</span><span class="n">.</span><span class="o">.</span> <span class="p">}</span>
  <span class="c1"># and many lines later: </span>
  <span class="n">rule</span><span class="p">(</span><span class="ss">:simple_rule</span><span class="p">)</span> <span class="p">{</span> <span class="n">str</span><span class="p">(</span><span class="s1">&#39;a&#39;</span><span class="p">)</span> <span class="p">}</span>
<span class="k">end</span>
</code></pre>
</div><p>Also imagine that the parser (as a whole) fails to consume the &#8216;a&#8217; that
<code>simple_rule</code> is talking about.</p>
<p>This kind of problem can very often be fixed by bisecting it into two possible
problems. Either:</p>
<ol>
<li>the <code>lots_of_stuff</code> rule somehow doesn&#8217;t place <code>simple_rule</code>
in the right context or</li>
<li>the <code>simple_rule</code> simply (hah!) fails to match its input.</li>
</ol>
<p>I find it very useful in this situation to eliminate 2. from our options:</p>
<div class="highlight"><pre><code class="ruby">
<span class="nb">require</span> <span class="s1">&#39;parslet/rig/rspec&#39;</span>
<span class="n">describe</span> <span class="no">ComplexParser</span> <span class="k">do</span>
  <span class="n">let</span><span class="p">(</span><span class="ss">:parser</span><span class="p">)</span> <span class="p">{</span> <span class="no">ComplexParser</span><span class="o">.</span><span class="n">new</span> <span class="p">}</span>
  <span class="n">context</span> <span class="s2">&quot;simple_rule&quot;</span> <span class="k">do</span>
    <span class="n">it</span> <span class="s2">&quot;should consume &#39;a&#39;&quot;</span> <span class="k">do</span>
      <span class="n">parser</span><span class="o">.</span><span class="n">simple_rule</span><span class="o">.</span><span class="n">should</span> <span class="n">parse</span><span class="p">(</span><span class="s1">&#39;a&#39;</span><span class="p">)</span>
    <span class="k">end</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre>
</div><p>Parslet parsers have one method per rule. These methods return valid parsers
for a subset of your grammar.</p>
<h2>Error reports</h2>
<p>If your grammar fails and you&#8217;re aching to know why, here&#8217;s a bit of exception
handling code that will help you out:</p>
<div class="highlight"><pre><code class="ruby">
<span class="k">begin</span>
  <span class="n">parser</span><span class="o">.</span><span class="n">parse</span><span class="p">(</span><span class="n">some_input</span><span class="p">)</span>
<span class="k">rescue</span> <span class="no">Parslet</span><span class="o">::</span><span class="no">ParseFailed</span> <span class="o">=&gt;</span> <span class="n">error</span>
  <span class="nb">puts</span> <span class="n">error</span><span class="p">,</span> <span class="n">parser</span><span class="o">.</span><span class="n">root</span><span class="o">.</span><span class="n">error_tree</span>
<span class="k">end</span>
</code></pre>
</div><p>This should print something akin to:</p>
<div class="highlight"><pre><code class="text">
Parsing 1++2: Don&#39;t know what to do with ++2 at line 1 char 2.
`- Unknown error in SUM / INTEGER
   |- Failed to match sequence (INTEGER OPERATOR EXPRESSION) at line 1 char 3.
   |  `- Unknown error in [0-9]{1, } SPACE?
   |     `- Expected at least 1 of \\s at line 1 char 2.
   |        `- Failed to match \\s at line 1 char 3.
   `- Unknown error in [0-9]{1, } SPACE?
      `- Expected at least 1 of \\s at line 1 char 2.
         `- Failed to match \\s at line 1 char 3.
</code></pre>
</div><p>These error reports are probably the fastest way to know exactly where you
went wrong (or where your input is wrong, which is equivalent).</p>
<p>And since this is such a common idiom, we provide you with a shortcut. To
have the error tree printed whenever a parse fails, just:</p>
<div class="highlight"><pre><code class="ruby">
<span class="nb">require</span> <span class="s1">&#39;parslet/convenience&#39;</span>
<span class="n">parser</span><span class="o">.</span><span class="n">parse_with_debug</span><span class="p">(</span><span class="n">input</span><span class="p">)</span>
</code></pre>
</div><h2>Line numbers from parser output</h2>
<p>A traditional parser would parse and then perform several checking phases,
like for example verifying all type constraints are respected in the input.
During this checking phase, you will most likely want to report screens full
of type errors back to the user (&#8216;cause that&#8217;s what types are for, right?).
Now where did that &#8216;int&#8217; come from?</p>
<p>Parslet gives you slices (Parslet::Slice) of input as part of your tree. These
are essentially strings with line numbers. Here&#8217;s how to print that error
message:</p>
<div class="highlight"><pre><code class="ruby">
<span class="c1"># assume that type == &quot;int&quot;@0 - a piece from your parser output</span>
<span class="n">line</span><span class="p">,</span> <span class="n">col</span> <span class="o">=</span> <span class="n">type</span><span class="o">.</span><span class="n">line_and_column</span>
<span class="nb">puts</span> <span class="s2">&quot;Sorry. Can&#39;t have </span><span class="si">#{</span><span class="n">type</span><span class="si">}</span><span class="s2"> at </span><span class="si">#{</span><span class="n">line</span><span class="si">}</span><span class="s2">:</span><span class="si">#{</span><span class="n">col</span><span class="si">}</span><span class="s2">!&quot;</span>
</code></pre>
</div>
</div>
<div class="copyright">
<p><span class="caps">MIT</span> License, 2010, &#169; <a href="http://absurd.li">Kaspar Schiess</a><br/>
Logo by <a href="http://floere.github.com">Florian Hanke</a>, <a href="http://creativecommons.org/licenses/by/1.0/">CC Attribution</a> license</p>
</div>
<script type="text/javascript">
//<![CDATA[
var _gaq = _gaq || [];
_gaq.push(['_setAccount', 'UA-16365074-2']);
_gaq.push(['_trackPageview']);
(function() {
var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
})();
//]]>
</script>
</div>
</body>
</html>