Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with
or
.
Download ZIP
branch: master
Fetching contributors…

Cannot retrieve contributors at this time

168 lines (123 sloc) 7.512 kB
<?php include('common-header.inc'); ?>
<title>ParseKit Objective-C Parser Generation via Grammars</title>
<?php include('common-nav.inc'); ?>
<h1>ParseKit Documentation</h1>
<div id="content">
<h2>Objective-C Parser Generation via Grammars</h2>
<ul>
<li><a href="#basic-syntax">Basic Grammar Syntax</a></li>
<li><a href="#production-syntax">Individual Grammar Production Syntax</a></li>
<li><a href="#grouping">Grouping</a></li>
<li><a href="#instantiating">Instantiating Grammar Parsers in Objective-C</a></li>
</ul>
<a name="basic-syntax"></a>
<h3>Basic Grammar Syntax</h3>
<p>ParseKit allows users to build parsers for custom languages from a declarative, BNF-style grammar without writing any code (well, ok.. a <em>single line</em> of code). Under the hood, grammar support is implemented using the ParseKit Objective-C API, so the grammar syntax closely mirrors the features of the Objective-C API.</p>
<p>The grammar below describes a simple toy language called <b>Cold Beer</b> and will serve as a quick introduction to the ParseKit grammar syntax. The rules of the <b>Cold Beer</b> language are as follows. The language consists of a sequence of one or more sentences beginning with the word <tt>&laquo;cold&raquo;</tt> followed by a repetition of either <tt>&laquo;cold&raquo;</tt> or <tt>&laquo;freezing&raquo;</tt> followed by <tt>&laquo;beer&raquo;</tt> and terminated by the symbol <tt>&laquo;.&raquo;</tt>.</p>
<p>For example, each of the following lines are valid instances of the <b>Cold Beer</b> language (as is the example as a whole):</p>
<div class="code">
<pre>
cold cold cold freezing cold freezing cold beer.
cold cold freezing cold beer.
cold freezing beer.
cold beer.</pre>
</div>
<p>The following lines are <b><em>not</em></b> valid <b>Cold Beer</b> statements:</p>
<div class="code">
<pre>
freezing cold beer.
cold freezing beer
beer.</pre>
</div>
<p>Here is a complete ParseKit grammar for the <b>Cold Beer</b> language.</p>
<div class="code">
<pre>
@start = sentence+;
sentence = adjectives 'beer' '.';
adjectives = cold adjective*;
adjective = cold | freezing;
cold = 'cold';
freezing = 'freezing';</pre>
</div>
<p>As shown above, the ParseKit grammar syntax consists of individual language production declarations separated by <tt>&laquo;;&raquo;</tt>. Whitespace is ignored, so the productions can be formatted liberally with whitespace as the programmer prefers. Comments are also allowed and resemble the comment style of Objective-C. So a commented <b>Cold Beer</b> grammar may appear as:</p>
<div class="code">
<pre>
/*
A Grammar for the Cold Beer Language
by Todd Ditchendorf
*/
@start = sentence+; // outermost production
sentence = adjectives 'beer' '.';
adjectives = cold adjective*;
adjective = cold | 'freezing';
cold = 'cold';
freezing = 'freezing';</pre>
</div>
<a name="production-syntax"></a>
<h3>Individual Grammar Production Syntax</h3>
<p>Every ParseKit grammar <b>must</b> contain <b>one and only one</b> production named <tt>@start</tt>. This will be the <em>highest-level</em> or <em>outermost</em> production rule in the language. For <b>Cold Beer</b>, the outermost production is:</p>
<div class="code">
<pre>
@start = sentence+;</pre>
</div>
<p>Which states that the outermost production of this language consists of a <em>sequence</em> of one or more (<tt>&laquo;+&raquo;</tt>) instances of the <tt>sentence</tt> production.</p>
<div class="code">
<pre>
sentence = adjectives 'beer' '.';</pre>
</div>
<p>The <tt>sentence</tt> production states that sentences are a <em>sequence</em> of the <tt>adjective</tt> production followed by the literal strings <tt>beer</tt> and <tt>.</tt></p>
<div class="code">
<pre>
adjectives = cold adjective*;</pre>
</div>
<p>In turn, <tt>adjectives</tt> is a <em>sequence</em> of a single instance of the <tt>cold</tt> production followed by a <em>repetition</em> (<tt>&laquo;*&raquo;</tt> read as 'zero or more') of the <tt>adjective</tt> production.</p>
<div class="code">
<pre>
adjective = cold | freezing;
cold = 'cold';
freezing = 'freezing';</pre>
</div>
<p>The <tt>adjective</tt> production is an <em>alternation</em> of either an instance of the <tt>cold</tt> or the <tt>freezing</tt> production. The <tt>cold</tt> production is the literal string <tt>cold</tt> and <tt>freezing</tt> the literal string <tt>freezing</tt>.</p>
<a name="grouping"></a>
<h3>Grouping</h3>
<p>A language may be expressed in many different, yet equivalent grammars. Productions may be referenced in any order (even before they are defined) and grouped using parentheses (<tt>&laquo;(&raquo;</tt> and <tt>&laquo;)&raquo;</tt>).</p>
<p>For example, the <b>Cold Beer</b> language could also be represented by the following grammar:</p>
<div class="code">
<pre>
@start = ('cold' ('cold' | 'freezing')* 'beer' '.')+;</pre>
</div>
<a name="instantiating"></a>
<h3>Instantiating Grammar Parsers in Objective-C</h3>
<p>Create an Objective-C <tt>PKParser</tt> object by providing the grammar as an <tt>NSString</tt> and an <tt>assembler</tt> object (in this example, <tt>self</tt>).</p>
<div class="code">
<pre>
NSString *g = ... // fetch your grammar from a file on disk
PKParser *<b>parser</b> = nil;
parser = [[PKParserFactory factory] parserFromGrammar:g assembler:<b>self</b>];
NSString *s = @"cold freezing cold beer.";
<b>[parser parse:s]</b>;</pre>
</div>
<p>The provided <tt>assembler</tt> object will receive callbacks whenever one of the language grammar productions is matched -- if the callback method is implemented in the assembler object.</p>
<p>For example, an assembler for the <b>Cold Beer</b> language grammar above will receive the following callbacks if implemented:</p>
<div class="code">
<pre>
- (void)didMatchSentence:(PKAssembly *)a;
- (void)didMatchAdjectives:(PKAssembly *)a;
- (void)didMatchAdjective:(PKAssembly *)a;
- (void)didMatchCold:(PKAssembly *)a;
- (void)didMatchFreezing:(PKAssembly *)a;</pre>
</div>
<p>The <tt>PKAssembly</tt> argument will have the most-recently matched tokens on the top of its stack.</p>
<p>To prevent leaks when releasing a <tt>PKParser</tt> created via <tt>PKParserFactory</tt>, one final step is required:</p>
<div class="code">
<pre>
PKParser *<b>parser</b> = nil;
parser = [[PKParserFactory factory] parserFromGrammar:g assembler:self];
... // do parsing
// when you are done with the parser, this function must be called before releasing
<b>PKReleaseSubparserTree(parser);</b>
[p release]</pre>
</div>
<p>Most non-trivial language grammars define circular relationships in their production rules. The call to <tt>PKReleaseSubparserTree()</tt> is required to prevent Objective-C retain cycle leaks where the <tt>PKParser</tt> objects representing different rules in your grammar have strong references to one another.</p>
</div>
<?php include('common-footer.inc'); ?>
Jump to Line
Something went wrong with that request. Please try again.