Lazy unconsumed input #68

Closed
wants to merge 3 commits into from

3 participants

@jmettraux

see spec/parslet/eager_unconsumed_error_spec.rb for the motivation.

@jmettraux

The last commit gets rid of Source#waterline and instead looks at the position of the reparsed cause to decide whether raising UnconsumedInput makes sense. Much closer to the 1.4.0 behaviour, but with the "deepest" flexibility kicking in at will.
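
To make that decision rule concrete, here is a minimal sketch in plain Ruby. It does not use Parslet: `SketchCause` and `pick_error` are hypothetical names invented for illustration, and the only thing taken from the patch is the comparison between the reparsed cause's position and the position where the quick parse stopped.

```ruby
# Hypothetical stand-in for an error cause; not a Parslet class.
SketchCause = Struct.new(:message, :pos)

# Decide what to report once a successful parse has left input behind.
#
# stopped_at - offset where the successful (prefix) parse ended
# cause      - failure recorded by the error-reporting reparse, or nil
# leftover   - the input that was not consumed
def pick_error(stopped_at, cause, leftover)
  if cause && cause.pos > stopped_at
    # The reparse failed further into the input than the prefix match
    # reached, so its message says more than "unconsumed input" would.
    [:parse_failed, cause.message]
  else
    # Nothing deeper is known: fall back to the generic message,
    # quoting at most 10 characters of the leftover input.
    [:unconsumed_input,
     "Don't know what to do with #{leftover[0, 10].inspect}"]
  end
end

p pick_error(9, SketchCause.new('Expected "end"', 40), "begin b\n")
p pick_error(1, nil, ".")
```

The strict `>` means a cause sitting at the same position as the leftover input still yields the generic unconsumed-input message.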

@kschiess
Owner

We've agreed on IRC that I would implement a different strategy so that we can compare results. This is currently pending.

@kschiess kschiess referenced this pull request from a commit
@kschiess + Rough postfix handling implementation
Relates to issue #68 where jmettraux proposes a new way of handling
input that is not consumed because the parser only matches a prefix.
This commit explores a different strategy where extra input after a
successful parse is treated as parse error in the parse branch it
happens. This generates nice error messages so far.

The implementation is very rough and will probably become neater as time
goes on - but the idea seems to be valid.
4037846
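
As a rough illustration of the idea in that commit message (leftover input reported as an error by the parse branch that stopped matching), here is a sketch in plain Ruby using only the standard library's StringScanner. It is not the Parslet implementation: `repeat_a`, its `consume_all` flag, and the message wording are assumptions made up for this example.

```ruby
require 'strscan'

# Hypothetical mini-parser, not Parslet code: matches one or more "a".
# When consume_all is true, the repetition itself reports leftover input,
# including what it expected and what it found instead.
def repeat_a(input, consume_all)
  scanner = StringScanner.new(input)
  count = 0
  count += 1 while scanner.scan(/a/)

  return [false, "Expected at least 1 of 'a' at char 1"] if count.zero?

  if consume_all && !scanner.eos?
    # The error is raised where the repetition stopped, not by a generic
    # check at the top level after the whole parse has finished.
    return [false,
            "Extra input after last repetition at char #{scanner.pos + 1}: " \
            "expected 'a', got #{scanner.peek(1).inspect}"]
  end

  [true, 'a' * count]
end

p repeat_a('a.', false)  # prefix match is accepted
p repeat_a('a.', true)   # leftover "." is reported by the repetition
```

The point of the design is that the failing branch knows what it was trying to match, so the message can name the expectation rather than only quoting the unconsumed text.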
@kschiess kschiess referenced this pull request from a commit
@kschiess + Adapted the regression spec from #68
This is more or less the regression spec from #68. It proves that the
last change might be ok.
bf52472
@kschiess
Owner

John, can you please take a look at this (postfix_handling branch) and verify? I've run your spec against this implementation successfully, but I'm guessing that you have other ways of verifying it. If it works out OK, I'll clean up the implementation and merge.

@jmettraux

Hello Kaspar,

I tested it on our application. It works great. Many thanks!

@floere
Collaborator

@jmettraux The Technology Astronauts (ie. me and @kschiess) will be looking forward to your incredibly generous donation to support further development ;)

@jmettraux

Are code donations OK?

@floere
Collaborator

We prefer cold hard cash, but I guess code donations work too ;) Especially if they remove code while still greenlighting the tests.

@jmettraux

You can very easily cut down on the size of Parslet code files by removing trailing white spaces ;-) Funny suggestion for a Welsch nee?

@floere
Collaborator

Heh, thanks, but no. Kaspar is not there yet, I think. He might or might not arrive there.
I think it's very courteous of people wanting to help the poor Ruby lexer by adding parentheses, so I can sort of understand these people *winks to @jmettraux*.

@kschiess
Owner

guys, take this discussion elsewhere.

@kschiess
Owner

The recent commits on master solve this issue. Again, this has improved parslet a bunch, thanks jmettraux!!

@kschiess kschiess closed this
@jmettraux

Thanks Kaspar, integrated that here, works nicely so far!

Commits on Sep 12, 2012
  1. @jmettraux

    + add [failing] spec

    jmettraux authored
  2. @jmettraux
  3. @jmettraux

    . switch from waterline to deepest.pos
    when determining if an UnconsumedInput should be raised or not

    jmettraux authored
57 lib/parslet/atoms/base.rb
@@ -21,47 +21,54 @@ class Parslet::Atoms::Base
# tree
#
def parse(io, options={})
- source = io.respond_to?(:line_and_column) ?
- io :
+ source = io.respond_to?(:line_and_column) ?
+ io :
Parslet::Source.new(io)
- # Try to cheat. Assuming that we'll be able to parse the input, don't
- # run error reporting code.
+ # Try to cheat. Assuming that we'll be able to parse the input, don't
+ # run error reporting code.
success, value = setup_and_apply(source, nil)
-
- # If we didn't succeed the parse, raise an exception for the user.
+
+ # If we didn't succeed the parse, raise an exception for the user.
# Stack trace will be off, but the error tree should explain the reason
# it failed.
- unless success
- # Cheating has not paid off. Now pay the cost: Rerun the parse,
- # gathering error information in the process.
- reporter = options[:reporter] || Parslet::ErrorReporter::Tree.new
- success, value = setup_and_apply(source, reporter)
-
- fail "Assertion failed: success was true when parsing with reporter" \
- if success
-
- # Value is a Parslet::Cause, which can be turned into an exception:
- value.raise
-
- fail "NEVER REACHED"
- end
-
+ reparse(source, options).raise unless success
+
# assert: success is true
-
+
# If we haven't consumed the input, then the pattern doesn't match. Try
# to provide a good error message (even asking down below)
if !options[:prefix] && source.chars_left > 0
+
old_pos = source.pos
+ cause = reparse(source, options)
+
+ cause.raise if cause && cause.pos > old_pos
+
Parslet::Cause.format(
- source, old_pos,
+ source, old_pos,
"Don't know what to do with #{source.consume(10).to_s.inspect}").
raise(Parslet::UnconsumedInput)
end
-
+
return flatten(value)
end
-
+
+ # The initial (quick) parse was not a success (so we need a reparse) with
+ # the error reporter on.
+ # Or, the initial parse was successful but we have a challenged
+ # unconsumed input... (the parser was able to consume everything, almost).
+ #
+ def reparse(source, options)
+
+ source.pos = 0
+
+ reporter = options[:reporter] || Parslet::ErrorReporter::Tree.new
+ success, value = setup_and_apply(source, reporter)
+
+ value.respond_to?(:raise) ? value : reporter.cause
+ end
+
# Creates a context for parsing and applies the current atom to the input.
# Returns the parse result.
#
8 lib/parslet/error_reporter/deepest.rb
@@ -69,7 +69,11 @@ def deepest(cause)
return deepest_cause
end
-
+
+ # Let's respond to #cause (reparse uses it).
+ #
+ alias cause deepest_cause
+
private
# Returns the leaf from a given error tree with the biggest rank.
#
@@ -92,4 +96,4 @@ def deepest_child(cause, rank=0)
end
end
end
-end
+end
12 lib/parslet/error_reporter/tree.rb
@@ -22,6 +22,12 @@ module ErrorReporter
# tree.
#
class Tree
+ attr_reader :cause
+
+ def initialize
+ @cause = nil
+ end
+
# Produces an error cause that combines the message at the current level
# with the errors that happened at a level below (children).
#
@@ -34,7 +40,7 @@ class Tree
#
def err(atom, source, message, children=nil)
position = source.pos
- Cause.format(source, position, message, children)
+ @cause = Cause.format(source, position, message, children)
end
# Produces an error cause that combines the message at the current level
@@ -50,8 +56,8 @@ def err(atom, source, message, children=nil)
#
def err_at(atom, source, message, pos, children=nil)
position = pos
- Cause.format(source, position, message, children)
+ @cause = Cause.format(source, position, message, children)
end
end
end
-end
+end
23 lib/parslet/source.rb
@@ -9,14 +9,14 @@ module Parslet
class Source
def initialize(str)
raise ArgumentError unless str.respond_to?(:to_str)
-
+
@pos = 0
@str = str
-
+
@line_cache = LineCache.new
@line_cache.scan_for_line_endings(0, @str)
end
-
+
# Checks if the given pattern matches at the current input position.
#
# @param pattern [Regexp, String] pattern to check for
@@ -26,27 +26,28 @@ def matches?(pattern)
@str.index(pattern, @pos) == @pos
end
alias match matches?
-
+
# Consumes n characters from the input, returning them as a slice of the
- # input.
+ # input.
#
def consume(n)
slice_str = @str.slice(@pos, n)
slice = Parslet::Slice.new(
- slice_str,
- pos,
+ slice_str,
+ @pos,
@line_cache)
-
+
@pos += slice_str.size
+
return slice
end
-
+
# Returns how many chars remain in the input.
#
def chars_left
@str.size - @pos
end
-
+
# Position of the parse as a character offset into the original string.
# @note: Encodings...
attr_accessor :pos
@@ -59,4 +60,4 @@ def line_and_column(position=nil)
@line_cache.line_and_column(position || self.pos)
end
end
-end
+end
2  spec/acceptance/regression_spec.rb
@@ -209,4 +209,4 @@ class TwoCharLanguage < Parslet::Parser
}.to raise_error(Parslet::UnconsumedInput)
end
end
-end
+end
12 spec/parslet/atoms_spec.rb
@@ -257,19 +257,19 @@ def src(str); Parslet::Source.new str; end
let (:parslet) { (str('a')).repeat(1) }
attr_reader :exception
before(:each) do
- begin
+ begin
parslet.parse('a.')
rescue => @exception
end
end
it "raises Parslet::UnconsumedInput" do
- exception.should be_kind_of(Parslet::UnconsumedInput)
- end
+ exception.should be_kind_of(Parslet::ParseFailed)
+ end
it "has the correct error message" do
- exception.message.should == \
+ exception.message.should ==
"Don't know what to do with \".\" at line 1 char 2."
- end
+ end
end
end
@@ -425,4 +425,4 @@ def call(val)
end
end
end
-end
+end
118 spec/parslet/eager_unconsumed_error_spec.rb
@@ -0,0 +1,118 @@
+
+require 'spec_helper'
+
+
+describe 'Parslet and unconsumed input' do
+
+ class MyParser < Parslet::Parser
+
+ rule(:nl) { match('[\s]').repeat(1) }
+ rule(:nl?) { nl.maybe }
+ rule(:sp) { str(' ').repeat(1) }
+ rule(:sp?) { str(' ').repeat(0) }
+ rule(:line) { sp >> str('line') }
+ rule(:body) { ((line | block) >> nl).repeat(0) }
+ rule(:block) { sp? >> str('begin') >> sp >> match('[a-z]') >> nl >>
+ body >> sp? >> str('end') }
+ rule(:blocks) { nl? >> block >> (nl >> block).repeat(0) >> nl? }
+
+ root(:blocks)
+ end
+
+ it 'parses a block' do
+
+ MyParser.new.parse(%q{
+ begin a
+ end
+ })
+ end
+
+ it 'parses nested blocks' do
+
+ MyParser.new.parse(%q{
+ begin a
+ begin b
+ end
+ end
+ })
+ end
+
+ it 'parses successive blocks' do
+
+ MyParser.new.parse(%q{
+ begin a
+ end
+ begin b
+ end
+ })
+ end
+
+ it 'fails gracefully on a missing end' do
+
+ lambda {
+ MyParser.new.parse(%q{
+ begin a
+ begin b
+ end
+ })
+ }.should raise_error(
+ Parslet::ParseFailed,
+ 'Failed to match sequence (NL? BLOCK (NL BLOCK){0, } NL?) at line 2 char 9.'
+ )
+ end
+
+ it 'fails gracefully on a missing end (2)' do
+
+ lambda {
+ MyParser.new.parse(%q{
+ begin a
+ end
+ begin b
+ begin c
+ end
+ })
+ }.should raise_error(
+ Parslet::UnconsumedInput,
+ "Don't know what to do with \"begin b\\n \" at line 4 char 9."
+ )
+ end
+
+ it 'fails gracefully on a missing end (3)' do
+
+ lambda {
+ MyParser.new.parse(%q{
+ begin a
+ end
+ begin b
+ begin c
+ li
+ end
+ end
+ })
+ }.should raise_error(
+ Parslet::UnconsumedInput,
+ "Don't know what to do with \"begin b\\n \" at line 4 char 9."
+ )
+ end
+
+ it 'fails gracefully on a missing end (4 deepest reporter)' do
+
+ lambda {
+ MyParser.new.parse(
+ %q{
+ begin a
+ end
+ begin b
+ begin c
+ li
+ end
+ end
+ },
+ :reporter => Parslet::ErrorReporter::Deepest.new)
+ }.should raise_error(
+ Parslet::ParseFailed,
+ 'Expected "end", but got "li\n" at line 6 char 15.'
+ )
+ end
+end
+