+ Post on Treetop stack traces

commit 84a444034a4ec8db1e0cfddad26f82c692740c90 1 parent b60206c
@kschiess authored
149 _posts/2011-01-11-stack_traces_in_parslet.textile
@@ -0,0 +1,149 @@
+---
+title: Stack Traces in Parslet
+tags:
+ - stacktrace
+ - parslet
+ - parser
+ - error reporting
+ - design
+---
+
+I get a lot of interesting puzzles from users of
+"parslet":http://github.com/kschiess/parslet. You may think you're asking me
+about a feature, but generally, if you have to ask, I haven't yet written about
+it in "parslet's documentation":http://kschiess.github.com/parslet/. And I
+usually need to, since it is obviously an issue.
+
+This post explains why parslet doesn't do any compilation; or rather, why
+parslet doesn't need to do any compilation.
+
+h2. Why the compilation phase?
+
+One of the reasons other parser frameworks ('generators') generate parser code
+for you is that when the parse fails, the exception will contain a stack trace
+useful to you. Let's look at a small sample in the
+"Treetop":http://treetop.rubyforge.org/ dialect:
+
+<pre class="sh_sourceCode"><code>
+ grammar Simple
+ rule addition
+ integer (plus integer)*
+ end
+
+ rule plus
+ '+' space?
+ end
+
+ rule integer
+ [0-9]+ space?
+ end
+
+ rule space
+ ' '+
+ end
+ end
+</code></pre>
+
+When you apply this to the text '<code>1++2</code>', Treetop returns
+<code>nil</code> and inspection of the associated failure reason yields this
+error message:
+
+bq. Expected at line 1, column 3 (byte 3) after +
+
+This means that Treetop was expecting a ' ' (space) at byte 3, but saw a
+'+' (plus). What Treetop could do at this point (but currently doesn't) is
+raise an exception at the exact place the parse fails. To simulate that, I've
+inserted the above error message as
+
+<pre class="sh_ruby"><code>
+fail "Expected at line 1, column 3 (byte 3) after +"
+</code></pre>
+
+at the location the parse fails. This gets me the following stack trace:
+
+<pre class="sh_sourceCode"><code>
+(...)/simple.rb:202:in `block in _nt_space': Expected at line 1, column 3 (byte 3) after + (RuntimeError)
+ from (...)/simple.rb:197:in `loop'
+ from (...)/simple.rb:197:in `_nt_space'
+ from (...)/simple.rb:164:in `_nt_integer'
+ from (...)/simple.rb:41:in `_nt_addition'
+ from (...)/treetop-1.4.9/lib/treetop/runtime/compiled_parser.rb:18:in `parse'
+ from driver.rb:8:in `<main>'
+</code></pre>
+
+Since Treetop compiles your grammar down to Ruby, the stack trace contains all
+rules as method names and indicates where in your grammar the parse collided
+with the input provided. So *the stack trace is for debugging*. Or rather it
+could be, if <code>#terminal_parse_failure</code> in Treetop were overridden
+to raise an error.
+
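The effect described above can be reproduced without Treetop. What follows is a minimal sketch, not Treetop's generated code: a hand-written parser with one method per rule that raises right where a terminal fails, so the rule names show up in the backtrace. The names `MiniParser`, `parse_addition`, `parse_integer` and `terminal_failure` are made up for illustration, and whitespace handling is left out to keep it short.

```ruby
# Hypothetical stand-in for a compiled grammar: one method per rule.
class MiniParser
  class ParseFailure < StandardError; end

  def initialize(input)
    @input = input
    @pos = 0
  end

  # addition <- integer ('+' integer)*
  # Returns the position reached; does not check for end of input.
  def parse_addition
    parse_integer
    while @input[@pos] == '+'
      @pos += 1
      parse_integer
    end
    @pos
  end

  # integer <- [0-9]+
  def parse_integer
    start = @pos
    @pos += 1 while @pos < @input.size && @input[@pos].match?(/[0-9]/)
    terminal_failure('[0-9]') if @pos == start
  end

  # Raise right where the terminal fails instead of recording the failure.
  # Byte positions are reported 1-based, as in the message quoted above.
  def terminal_failure(expected)
    raise ParseFailure, "Expected #{expected} at byte #{@pos + 1}"
  end
end

begin
  MiniParser.new('1++2').parse_addition
rescue MiniParser::ParseFailure => e
  puts e.message
  puts e.backtrace.first(3) # innermost frames: one per rule method
end
```

The three innermost frames name `terminal_failure`, `parse_integer` and `parse_addition`, which is the kind of information the Treetop trace above carries.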
+h2. Limitations
+
+The stack trace reflects the state of the call stack when the error happens.
+This stack is linear, and so must be the stack trace. But PEG parsing isn't
+linear: you might encounter an alternative ('<code>a|b</code>') and ultimately
+fail in '<code>b</code>' because neither one matches.
+
+Let's assume that '<code>a</code>' should have matched the input and contains
+an imperfection. We'll be interested in all the reasons '<code>a</code>'
+didn't match the input. _But the stack trace will show only why
+'<code>b</code>' didn't match_, since it was tried last.
+
+A linear stack trace will never capture why a grammar didn't consume a
+particular input. If we could design the ideal kind of error message, we would
+try to *include all alternatives* that have failed.
+
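The difference between reporting only the last alternative and reporting all of them can be sketched in a few lines. This is an illustration under simple assumptions, not parslet's implementation: `match_str` and `match_any` are hypothetical helpers, and the alternatives are plain literal strings.

```ruby
# Returns nil on success, or a message describing why the literal did not match.
def match_str(expected, input)
  return nil if input.start_with?(expected)
  "expected #{expected.inspect}, got #{input[0, expected.size].inspect}"
end

# Ordered choice over literal strings. Returns [] as soon as one alternative
# matches; otherwise returns the failure reason of every alternative tried,
# in the order they were tried.
def match_any(alternatives, input)
  reasons = []
  alternatives.each do |alt|
    reason = match_str(alt, input)
    return [] if reason.nil?
    reasons << reason
  end
  reasons
end

puts match_any(%w[foo bar], 'baz')
```

An exception raised inside the last alternative would only ever carry the final entry of that list; the first entry, which is the one you need when `foo` was the branch that should have matched, would be lost.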
+What else would we wish for? There is one last property of stack traces that
+renders them unsuited for parser error reports: They capture the whole history
+of that process. If you give Treetop a rule like this one:
+
+<pre class="sh_sourceCode"><code>
+ rule rec
+ '.' rec / ''
+ end
+</code></pre>
+
+which matches any number of '.'s (dots), your stack trace will grow with the
+number of dots in the input. Yet all those recursive calls won't add to the
+informational content of the error report. All that it will say is that you
+were expecting a dot, but got something else. Ideally, this report should only
+show that the rule 'rec' failed, and what it encountered instead of a dot. *It
+should not repeat*.
+
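How the trace grows with the input is easy to measure. The sketch below is a plain Ruby method standing in for the compiled `rec` rule, with a `raise` placed where the '.' terminal fails; `rec` and `trace_depth` are illustrative names, not Treetop output.

```ruby
# rec <- '.' rec / ''   (with a raise where the '.' terminal fails)
def rec(input, pos = 0)
  raise "Expected '.' at byte #{pos + 1}" unless input[pos] == '.'
  rec(input, pos + 1)
end

# Number of backtrace frames produced for an input of n dots.
def trace_depth(n)
  rec('.' * n)
rescue RuntimeError => e
  e.backtrace.size
end

puts trace_depth(5)
puts trace_depth(50)
```

The absolute numbers depend on how the script is run, but the second is larger than the first by exactly 45: one extra frame per extra dot, none of which says anything new.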
+h2. Not Stack Traces, Error Trees
+
+The error reports "parslet":http://kschiess.github.com/parslet generates have
+both properties we wished for above. They preserve all relevant error
+information, not just the last alternative tried, and they do not repeat
+themselves for every level of recursion.
+
+Here's how that might look (taken from
+"Getting Started":http://kschiess.github.com/parslet/getting_started.html):
+
+<pre class="sh_sourceCode"><code>
+ Parsing 1++2: Don't know what to do with ++2 at line 1 char 2.
+ `- Unknown error in SUM / INTEGER
+ |- Failed to match sequence (INTEGER OPERATOR EXPRESSION) at line 1 char 3.
+ | `- Unknown error in [0-9]{1, } SPACE?
+ | `- Expected at least 1 of \\s at line 1 char 2.
+ | `- Failed to match \\s at line 1 char 3.
+ `- Unknown error in [0-9]{1, } SPACE?
+ `- Expected at least 1 of \\s at line 1 char 2.
+ `- Failed to match \\s at line 1 char 3.
+</code></pre>
+
+This illustrates both points very well:
+
+* The alternative <code>SUM / INTEGER</code> produces two error reports,
+ from left to right. The ASCII tree makes these reports easier to read.
+* All parts are readable and refer to our grammar directly. Line
+ numbers cease to mean much when we're dealing with PEGs.
+
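A tree like the one above needs very little machinery. The following is a sketch of the general idea only, assuming each failure records a message and the failures nested under it; `ErrorNode`, `to_lines` and `child_lines` are hypothetical names and say nothing about how parslet builds or prints its reports.

```ruby
# Hypothetical error-tree node: a message plus the causes nested under it.
ErrorNode = Struct.new(:message, :children) do
  # Returns the tree as an array of lines, drawn with `- and |- connectors.
  def to_lines
    [message] + child_lines('')
  end

  # Lines for all descendants; the last child of a node gets a `- connector.
  def child_lines(prefix)
    children.each_with_index.flat_map do |child, i|
      last = (i == children.size - 1)
      connector = last ? '`- ' : '|- '
      indent = last ? '   ' : '|  '
      ["#{prefix}#{connector}#{child.message}"] + child.child_lines(prefix + indent)
    end
  end
end

tree = ErrorNode.new('Unknown error in SUM / INTEGER', [
  ErrorNode.new('Failed to match sequence (INTEGER OPERATOR EXPRESSION)', [
    ErrorNode.new('Expected at least 1 of \s', [])
  ]),
  ErrorNode.new('Unknown error in [0-9]{1, } SPACE?', [])
])

puts tree.to_lines
```

Because an alternative simply contributes one child per branch, nothing has to be thrown away when the second branch fails after the first.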
+h2. And beyond
+
+There are many more cool parts to parslet; I suggest you check it out. The
+upcoming 1.1 release will improve execution speed vastly; of course it
+also features the cool error reports you've just been shown.
+
+It also features tree transformations to be able to act on your grammar, but
+that is a topic for another post.
12 samples/2011-01-11-treetop/driver.rb
@@ -0,0 +1,12 @@
+
+require 'treetop'
+
+$:.unshift '.'
+require 'simple'
+
+parser = SimpleParser.new
+result = parser.parse('1++2')
+
+if !result
+ puts parser.failure_reason
+end
228 samples/2011-01-11-treetop/simple.rb
@@ -0,0 +1,228 @@
+# Autogenerated from a Treetop grammar. Edits may be lost.
+
+
+
+module Simple
+ include Treetop::Runtime
+
+ def root
+ @root ||= :addition
+ end
+
+ module Addition0
+ def plus
+ elements[0]
+ end
+
+ def integer
+ elements[1]
+ end
+ end
+
+ module Addition1
+ def integer
+ elements[0]
+ end
+
+ end
+
+ def _nt_addition
+ start_index = index
+ if node_cache[:addition].has_key?(index)
+ cached = node_cache[:addition][index]
+ if cached
+ cached = SyntaxNode.new(input, index...(index + 1)) if cached == true
+ @index = cached.interval.end
+ end
+ return cached
+ end
+
+ i0, s0 = index, []
+ r1 = _nt_integer
+ s0 << r1
+ if r1
+ s2, i2 = [], index
+ loop do
+ i3, s3 = index, []
+ r4 = _nt_plus
+ s3 << r4
+ if r4
+ r5 = _nt_integer
+ s3 << r5
+ end
+ if s3.last
+ r3 = instantiate_node(SyntaxNode,input, i3...index, s3)
+ r3.extend(Addition0)
+ else
+ @index = i3
+ r3 = nil
+ end
+ if r3
+ s2 << r3
+ else
+ break
+ end
+ end
+ r2 = instantiate_node(SyntaxNode,input, i2...index, s2)
+ s0 << r2
+ end
+ if s0.last
+ r0 = instantiate_node(SyntaxNode,input, i0...index, s0)
+ r0.extend(Addition1)
+ else
+ @index = i0
+ r0 = nil
+ end
+
+ node_cache[:addition][start_index] = r0
+
+ r0
+ end
+
+ module Plus0
+ end
+
+ def _nt_plus
+ start_index = index
+ if node_cache[:plus].has_key?(index)
+ cached = node_cache[:plus][index]
+ if cached
+ cached = SyntaxNode.new(input, index...(index + 1)) if cached == true
+ @index = cached.interval.end
+ end
+ return cached
+ end
+
+ i0, s0 = index, []
+ if has_terminal?('+', false, index)
+ r1 = instantiate_node(SyntaxNode,input, index...(index + 1))
+ @index += 1
+ else
+ terminal_parse_failure('+')
+ r1 = nil
+ end
+ s0 << r1
+ if r1
+ r3 = _nt_space
+ if r3
+ r2 = r3
+ else
+ r2 = instantiate_node(SyntaxNode,input, index...index)
+ end
+ s0 << r2
+ end
+ if s0.last
+ r0 = instantiate_node(SyntaxNode,input, i0...index, s0)
+ r0.extend(Plus0)
+ else
+ @index = i0
+ r0 = nil
+ end
+
+ node_cache[:plus][start_index] = r0
+
+ r0
+ end
+
+ module Integer0
+ end
+
+ def _nt_integer
+ start_index = index
+ if node_cache[:integer].has_key?(index)
+ cached = node_cache[:integer][index]
+ if cached
+ cached = SyntaxNode.new(input, index...(index + 1)) if cached == true
+ @index = cached.interval.end
+ end
+ return cached
+ end
+
+ i0, s0 = index, []
+ s1, i1 = [], index
+ loop do
+ if has_terminal?('\G[0-9]', true, index)
+ r2 = true
+ @index += 1
+ else
+ r2 = nil
+ end
+ if r2
+ s1 << r2
+ else
+ break
+ end
+ end
+ if s1.empty?
+ @index = i1
+ r1 = nil
+ else
+ r1 = instantiate_node(SyntaxNode,input, i1...index, s1)
+ end
+ s0 << r1
+ if r1
+ r4 = _nt_space
+ if r4
+ r3 = r4
+ else
+ r3 = instantiate_node(SyntaxNode,input, index...index)
+ end
+ s0 << r3
+ end
+ if s0.last
+ r0 = instantiate_node(SyntaxNode,input, i0...index, s0)
+ r0.extend(Integer0)
+ else
+ @index = i0
+ r0 = nil
+ end
+
+ node_cache[:integer][start_index] = r0
+
+ r0
+ end
+
+ def _nt_space
+ start_index = index
+ if node_cache[:space].has_key?(index)
+ cached = node_cache[:space][index]
+ if cached
+ cached = SyntaxNode.new(input, index...(index + 1)) if cached == true
+ @index = cached.interval.end
+ end
+ return cached
+ end
+
+ s0, i0 = [], index
+ loop do
+ if has_terminal?(' ', false, index)
+ r1 = instantiate_node(SyntaxNode,input, index...(index + 1))
+ @index += 1
+ else
+ fail "Expected at line 1, column 3 (byte 3) after +"
+ terminal_parse_failure(' ')
+ r1 = nil
+ end
+ if r1
+ s0 << r1
+ else
+ break
+ end
+ end
+ if s0.empty?
+ @index = i0
+ r0 = nil
+ else
+ r0 = instantiate_node(SyntaxNode,input, i0...index, s0)
+ end
+
+ node_cache[:space][start_index] = r0
+
+ r0
+ end
+
+end
+
+class SimpleParser < Treetop::Runtime::CompiledParser
+ include Simple
+end
18 samples/2011-01-11-treetop/simple.tt
@@ -0,0 +1,18 @@
+
+grammar Simple
+ rule addition
+ integer (plus integer)*
+ end
+
+ rule plus
+ '+' space?
+ end
+
+ rule integer
+ [0-9]+ space?
+ end
+
+ rule space
+ ' '+
+ end
+end