
Added support for occurrence ranges

1 parent 4d36d24 commit 3aaf1dfed606ec6dd08914f4ecec6bcf9d152dd3 @cjheath cjheath committed Apr 16, 2010
58 README.md
@@ -112,7 +112,16 @@ Any item in a rule may be followed by a '+' or a '*' character, signifying one-o
end
end
-The 'a'* will always eat up any 'a's that follow, and the subsequent 'a' will find none there, so the whole rule will fail. You might need to use lookahead to avoid matching too much.
+The 'a'* will always eat up any 'a's that follow, and the subsequent 'a' will find none there, so the whole rule will fail. You might need to use lookahead to avoid matching too much. Alternatively, you can use an occurrence range:
+
+ # toogreedy.treetop
+ grammar TooGreedy
+ rule two_to_four_as
+ 'a' 2..4
+ end
+ end
+
+In an occurrence range, you may omit either the minimum count or the maximum count, so that "0.." works like '*' and "1.." works like '+'.
Negative Lookahead
------------------
@@ -141,24 +150,33 @@ Positive lookahead
Sometimes you want an item to match, but only if the *following* text would match some pattern. You don't want to consume that following text, but if it's not there, you want this rule to fail. You can add a positive lookahead to a rule by appending the lookahead rule preceded by an & character.
+Semantic predicates
+-------------------
+
+Warning: This is an advanced feature. You need to understand the way a packrat parser operates to use it correctly. The result of computing a rule containing a semantic predicate will be memoized, even if the same rule, applied later at the same location in the input, would work differently due to a semantic predicate returning a different value. If you don't understand the previous sentence yet still use this feature, you're on your own; test carefully!
+
+Sometimes, you need to run external Ruby code to decide whether this syntax rule should continue or should fail. You can do this using either positive or negative semantic predicates. These are Ruby code blocks (lambdas) which are called when the parser reaches that location. For this rule to succeed, the value must be true for a positive predicate (a block like &{ ... }), or false for a negative predicate (a block like !{ ... }).
+The block is called with one argument, the array containing the preceding syntax nodes in the current sequence. Within the block, you cannot use node names or labels for the preceding nodes, as the node for the current rule does not yet exist. You must refer to preceding nodes using their position in the sequence.
+
+ grammar Keywords
+ rule sequence_of_reserved_and_nonreserved_words
+ ( reserved / word )*
+ end
+
+ rule reserved
+ word &{ |s| symbol_reserved?(s[0].text_value) }
+ end
+
+ rule word
+ ([a-zA-Z]+ [ \t]+)
+ end
+ end
+
+One case where it is always safe to use a semantic predicate is to invoke the Ruby debugger, but don't forget to return true so the rule succeeds! Assuming you have required the 'ruby-debug' module somewhere, it looks like this:
+
+ rule problems
+ word &{ |s| debugger; true }
+ end
-Features to cover in the talk
-=============================
-
-* Treetop files
-* Grammar definition
-* Rules
-* Loading a grammar
-* Compiling a grammar with the `tt` command
-* Accessing a parser for the grammar from Ruby
-* Parsing Expressions of all kinds
-? Left recursion and factorization
- - Here I can talk about function application, discussing how the operator
- could be an arbitrary expression
-* Inline node class eval blocks
-* Node class declarations
-* Labels
-* Use of super within within labels
-* Grammar composition with include
-* Use of super with grammar composition
+When the debugger stops here, you can inspect the contents of the SyntaxNode for "word" by looking at s[0], and the stack trace will show how you got there.
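
The memoization warning in the semantic predicates section can be illustrated without Treetop. The following is a minimal sketch in plain Ruby, assuming a hypothetical `TinyPackrat` class (not part of Treetop) that caches the result of one rule per input position, roughly as a packrat parser does. The `reserved_words` list stands in for whatever external state a predicate might consult.

```ruby
# Hypothetical illustration only: TinyPackrat is not part of Treetop.
class TinyPackrat
  attr_accessor :reserved_words

  def initialize(input)
    @input = input
    @cache = {}            # position => memoized result of the `reserved` rule
    @reserved_words = []
  end

  # Matches a word at `index`, then applies a positive semantic predicate.
  # The result (the word, or nil on failure) is memoized per position.
  def reserved(index)
    return @cache[index] if @cache.key?(index)
    word = @input[index..-1][/\A[a-zA-Z]+/]
    result = (word && @reserved_words.include?(word)) ? word : nil
    @cache[index] = result
  end
end

parser = TinyPackrat.new("end of input")
first = parser.reserved(0)        # predicate is false; the failure is memoized
parser.reserved_words = ["end"]
second = parser.reserved(0)       # predicate would now pass, but the cache wins
puts first.inspect
puts second.inspect
```

Both calls return `nil`: the second call never re-runs the predicate, which is the behavior the warning describes.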
159 lib/treetop/compiler/metagrammar.rb
@@ -575,7 +575,7 @@ module DeclarationSequence2
def declarations
[head] + tail
end
-
+
def tail
super.elements.map { |elt| elt.declaration }
end
@@ -955,11 +955,11 @@ module Choice2
def alternatives
[head] + tail
end
-
+
def tail
super.elements.map {|elt| elt.alternative}
end
-
+
def inline_modules
(alternatives.map {|alt| alt.inline_modules }).flatten
end
@@ -1076,17 +1076,17 @@ module Sequence2
def sequence_elements
[head] + tail
end
-
+
def tail
super.elements.map {|elt| elt.labeled_sequence_primary }
end
-
+
def inline_modules
(sequence_elements.map {|elt| elt.inline_modules}).flatten +
[sequence_element_accessor_module] +
node_class_declarations.inline_modules
end
-
+
def inline_module_name
node_class_declarations.inline_module_name
end
@@ -1199,15 +1199,15 @@ module Primary1
def compile(address, builder, parent_expression=nil)
prefix.compile(address, builder, self)
end
-
+
def prefixed_expression
atomic
end
-
+
def inline_modules
atomic.inline_modules
end
-
+
def inline_module_name
nil
end
@@ -1253,19 +1253,19 @@ module Primary5
def compile(address, builder, parent_expression=nil)
suffix.compile(address, builder, self)
end
-
+
def optional_expression
atomic
end
-
+
def node_class_name
node_class_declarations.node_class_name
end
-
+
def inline_modules
atomic.inline_modules + node_class_declarations.inline_modules
end
-
+
def inline_module_name
node_class_declarations.inline_module_name
end
@@ -1285,15 +1285,15 @@ module Primary7
def compile(address, builder, parent_expression=nil)
atomic.compile(address, builder, self)
end
-
+
def node_class_name
node_class_declarations.node_class_name
end
-
+
def inline_modules
atomic.inline_modules + node_class_declarations.inline_modules
end
-
+
def inline_module_name
node_class_declarations.inline_module_name
end
@@ -1422,11 +1422,11 @@ module LabeledSequencePrimary1
def compile(lexical_address, builder)
sequence_primary.compile(lexical_address, builder)
end
-
+
def inline_modules
sequence_primary.inline_modules
end
-
+
def label_name
if label.name
label.name
@@ -1585,15 +1585,15 @@ module SequencePrimary1
def compile(lexical_address, builder)
prefix.compile(lexical_address, builder, self)
end
-
+
def prefixed_expression
elements[1]
end
-
+
def inline_modules
atomic.inline_modules
end
-
+
def inline_module_name
nil
end
@@ -1635,15 +1635,15 @@ module SequencePrimary5
def compile(lexical_address, builder)
suffix.compile(lexical_address, builder, self)
end
-
+
def node_class_name
nil
end
-
+
def inline_modules
atomic.inline_modules
end
-
+
def inline_module_name
nil
end
@@ -1808,15 +1808,15 @@ module NodeClassDeclarations1
def node_class_name
node_class_expression.node_class_name
end
-
+
def inline_modules
trailing_inline_module.inline_modules
end
-
+
def inline_module
trailing_inline_module.inline_module
end
-
+
def inline_module_name
inline_module.module_name if inline_module
end
@@ -1886,8 +1886,13 @@ def _nt_repetition_suffix
if r2
r0 = r2
else
- @index = i0
- r0 = nil
+ r3 = _nt_occurrence_range
+ if r3
+ r0 = r3
+ else
+ @index = i0
+ r0 = nil
+ end
end
end
@@ -1896,6 +1901,94 @@ def _nt_repetition_suffix
r0
end
+ module OccurrenceRange0
+ def min
+ elements[1]
+ end
+
+ def max
+ elements[3]
+ end
+ end
+
+ def _nt_occurrence_range
+ start_index = index
+ if node_cache[:occurrence_range].has_key?(index)
+ cached = node_cache[:occurrence_range][index]
+ if cached
+ cached = SyntaxNode.new(input, index...(index + 1)) if cached == true
+ @index = cached.interval.end
+ end
+ return cached
+ end
+
+ i0, s0 = index, []
+ r2 = _nt_space
+ if r2
+ r1 = r2
+ else
+ r1 = instantiate_node(SyntaxNode,input, index...index)
+ end
+ s0 << r1
+ if r1
+ s3, i3 = [], index
+ loop do
+ if has_terminal?('\G[0-9]', true, index)
+ r4 = true
+ @index += 1
+ else
+ r4 = nil
+ end
+ if r4
+ s3 << r4
+ else
+ break
+ end
+ end
+ r3 = instantiate_node(SyntaxNode,input, i3...index, s3)
+ s0 << r3
+ if r3
+ if has_terminal?('..', false, index)
+ r5 = instantiate_node(SyntaxNode,input, index...(index + 2))
+ @index += 2
+ else
+ terminal_parse_failure('..')
+ r5 = nil
+ end
+ s0 << r5
+ if r5
+ s6, i6 = [], index
+ loop do
+ if has_terminal?('\G[0-9]', true, index)
+ r7 = true
+ @index += 1
+ else
+ r7 = nil
+ end
+ if r7
+ s6 << r7
+ else
+ break
+ end
+ end
+ r6 = instantiate_node(SyntaxNode,input, i6...index, s6)
+ s0 << r6
+ end
+ end
+ end
+ if s0.last
+ r0 = instantiate_node(OccurrenceRange,input, i0...index, s0)
+ r0.extend(OccurrenceRange0)
+ else
+ @index = i0
+ r0 = nil
+ end
+
+ node_cache[:occurrence_range][start_index] = r0
+
+ r0
+ end
+
def _nt_prefix
start_index = index
if node_cache[:prefix].has_key?(index)
@@ -2816,7 +2909,7 @@ module TrailingInlineModule1
def inline_modules
[inline_module]
end
-
+
def inline_module_name
inline_module.module_name
end
@@ -2826,11 +2919,11 @@ module TrailingInlineModule2
def inline_modules
[]
end
-
+
def inline_module
- nil
+ nil
end
-
+
def inline_module_name
nil
end
88 lib/treetop/compiler/metagrammar.treetop
@@ -8,11 +8,11 @@ module Treetop
end
}
end
-
+
rule require_statement
prefix:space? "require" [ \t]+ [^\n\r]+ [\n\r]
end
-
+
rule module_declaration
prefix:('module' space [A-Z] alphanumeric_char* space) module_contents:(module_declaration / grammar) suffix:(space 'end') {
def compile
@@ -34,7 +34,7 @@ module Treetop
def declarations
[head] + tail
end
-
+
def tail
super.elements.map { |elt| elt.declaration }
end
@@ -45,11 +45,11 @@ module Treetop
end
}
end
-
+
rule declaration
parsing_rule / include_declaration
end
-
+
rule include_declaration
'include' space [A-Z] (alphanumeric_char / '::')* {
def compile(builder)
@@ -71,11 +71,11 @@ module Treetop
def alternatives
[head] + tail
end
-
+
def tail
super.elements.map {|elt| elt.alternative}
end
-
+
def inline_modules
(alternatives.map {|alt| alt.inline_modules }).flatten
end
@@ -87,17 +87,17 @@ module Treetop
def sequence_elements
[head] + tail
end
-
+
def tail
super.elements.map {|elt| elt.labeled_sequence_primary }
end
-
+
def inline_modules
(sequence_elements.map {|elt| elt.inline_modules}).flatten +
[sequence_element_accessor_module] +
node_class_declarations.inline_modules
end
-
+
def inline_module_name
node_class_declarations.inline_module_name
end
@@ -113,15 +113,15 @@ module Treetop
def compile(address, builder, parent_expression=nil)
prefix.compile(address, builder, self)
end
-
+
def prefixed_expression
atomic
end
-
+
def inline_modules
atomic.inline_modules
end
-
+
def inline_module_name
nil
end
@@ -143,19 +143,19 @@ module Treetop
def compile(address, builder, parent_expression=nil)
suffix.compile(address, builder, self)
end
-
+
def optional_expression
atomic
end
-
+
def node_class_name
node_class_declarations.node_class_name
end
-
+
def inline_modules
atomic.inline_modules + node_class_declarations.inline_modules
end
-
+
def inline_module_name
node_class_declarations.inline_module_name
end
@@ -165,15 +165,15 @@ module Treetop
def compile(address, builder, parent_expression=nil)
atomic.compile(address, builder, self)
end
-
+
def node_class_name
node_class_declarations.node_class_name
end
-
+
def inline_modules
atomic.inline_modules + node_class_declarations.inline_modules
end
-
+
def inline_module_name
node_class_declarations.inline_module_name
end
@@ -185,11 +185,11 @@ module Treetop
def compile(lexical_address, builder)
sequence_primary.compile(lexical_address, builder)
end
-
+
def inline_modules
sequence_primary.inline_modules
end
-
+
def label_name
if label.name
label.name
@@ -201,7 +201,7 @@ module Treetop
end
}
end
-
+
rule label
(alpha_char alphanumeric_char*) ':' {
def name
@@ -221,15 +221,15 @@ module Treetop
def compile(lexical_address, builder)
prefix.compile(lexical_address, builder, self)
end
-
+
def prefixed_expression
elements[1]
end
-
+
def inline_modules
atomic.inline_modules
end
-
+
def inline_module_name
nil
end
@@ -251,53 +251,57 @@ module Treetop
def compile(lexical_address, builder)
suffix.compile(lexical_address, builder, self)
end
-
+
def node_class_name
nil
end
-
+
def inline_modules
atomic.inline_modules
end
-
+
def inline_module_name
nil
end
}
/
atomic
end
-
+
rule suffix
repetition_suffix / optional_suffix
end
-
+
rule optional_suffix
'?' <Optional>
end
-
+
rule node_class_declarations
node_class_expression trailing_inline_module {
def node_class_name
node_class_expression.node_class_name
end
-
+
def inline_modules
trailing_inline_module.inline_modules
end
-
+
def inline_module
trailing_inline_module.inline_module
end
-
+
def inline_module_name
inline_module.module_name if inline_module
end
}
end
rule repetition_suffix
- '+' <OneOrMore> / '*' <ZeroOrMore>
+ '+' <OneOrMore> / '*' <ZeroOrMore> / occurrence_range
+ end
+
+ rule occurrence_range
+ space? min:([0-9])* '..' max:([0-9])* <OccurrenceRange>
end
rule prefix
@@ -327,15 +331,15 @@ module Treetop
rule terminal
quoted_string / character_class / anything_symbol
end
-
+
rule quoted_string
(single_quoted_string / double_quoted_string) {
def string
super.text_value
end
}
end
-
+
rule double_quoted_string
'"' string:(!'"' ("\\\\" / '\"' / .))* '"' <Terminal>
end
@@ -375,7 +379,7 @@ module Treetop
def inline_modules
[inline_module]
end
-
+
def inline_module_name
inline_module.module_name
end
@@ -385,11 +389,11 @@ module Treetop
def inline_modules
[]
end
-
+
def inline_module
- nil
+ nil
end
-
+
def inline_module_name
nil
end
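
The `occurrence_range` rule added to the metagrammar accepts optional space, optional digits, `..`, and optional digits. As a rough sketch of what that accepts, here is a plain Ruby regex version. The helper name `parse_occurrence_range` is hypothetical and not part of Treetop, and `\s*` is a simplification of the metagrammar's `space` rule. Treating an omitted minimum as 0 and an omitted maximum as unbounded follows the checks in `repetition.rb` in this commit.

```ruby
# Hypothetical helper, not part of Treetop: mirrors the shape of the rule
#   space? min:([0-9])* '..' max:([0-9])*
RANGE_SUFFIX = /\A\s*([0-9]*)\.\.([0-9]*)/

def parse_occurrence_range(text)
  m = RANGE_SUFFIX.match(text)
  return nil unless m
  min = m[1].empty? ? 0 : m[1].to_i     # omitted minimum behaves like 0
  max = m[2].empty? ? nil : m[2].to_i   # omitted maximum means unbounded
  [min, max]
end

p parse_occurrence_range(" 2..4")   # => [2, 4]
p parse_occurrence_range("..2")     # => [0, 2]
p parse_occurrence_range("2..")     # => [2, nil]
p parse_occurrence_range("+")       # => nil
```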
44 lib/treetop/compiler/node_classes/repetition.rb
@@ -16,28 +16,37 @@ def compile(address, builder, parent_expression)
builder.else_ do
builder.break
end
+ if max && !max.empty?
+ builder.if_ "#{accumulator_var}.size == #{max.text_value}" do
+ builder.break
+ end
+ end
end
end
-
+
def inline_module_name
parent_expression.inline_module_name
end
-
+
def assign_and_extend_result
assign_result "instantiate_node(#{node_class_name},input, #{start_index_var}...index, #{accumulator_var})"
extend_result_with_inline_module
end
end
-
+
class ZeroOrMore < Repetition
def compile(address, builder, parent_expression)
super
assign_and_extend_result
end_comment(parent_expression)
end
+
+ def max
+ nil
+ end
end
-
+
class OneOrMore < Repetition
def compile(address, builder, parent_expression)
super
@@ -50,6 +59,31 @@ def compile(address, builder, parent_expression)
end
end_comment(parent_expression)
end
+
+ def max
+ nil
+ end
+ end
+
+ class OccurrenceRange < Repetition
+ def compile(address, builder, parent_expression)
+ super
+
+ if min.empty? || min.text_value.to_i == 0
+ assign_and_extend_result
+ else
+ # We got some, but fewer than we wanted. There'll be a failure reported already
+ builder.if__ "#{accumulator_var}.size < #{min.text_value}" do
+ reset_index
+ assign_failure start_index_var
+ end
+ builder.else_ do
+ assign_and_extend_result
+ end
+ end
+ end_comment(parent_expression)
+ end
end
+
end
-end
+end
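
The control flow that `OccurrenceRange#compile` emits (a greedy loop that stops at the maximum, followed by a minimum check that resets the index on failure) can be sketched in plain Ruby. This is an illustration under stated assumptions, not Treetop's generated code: `match_range` and its parameters are hypothetical names, and it only repeats a literal string.

```ruby
# Hypothetical sketch, not Treetop's generated code: greedy repetition of a
# literal with an optional upper bound and a minimum check after the loop.
def match_range(input, index, literal, min, max)
  start = index
  matches = []
  loop do
    if input[index, literal.size] == literal
      matches << literal
      index += literal.size
    else
      break
    end
    break if max && matches.size == max   # stop once the maximum is reached
  end
  # Fewer than the minimum: reset to the start position and fail.
  return [nil, start] if matches.size < min
  [matches, index]
end

p match_range("foofoo", 0, "foo", 2, 4)            # two matches, index 6
p match_range("foo", 0, "foo", 2, 4)               # fails, index stays 0
p match_range("foofoofoofoofoo", 0, "foo", 2, 4)   # four matches, index 12
```

In the last call the fifth "foo" is left unconsumed. That matches the spec in this commit, where parsing five repetitions against `2..4` fails because the whole input must be consumed.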
2 lib/treetop/version.rb
@@ -2,7 +2,7 @@ module Treetop #:nodoc:
module VERSION #:nodoc:
MAJOR = 1
MINOR = 4
- TINY = 5
+ TINY = 6
STRING = [MAJOR, MINOR, TINY].join('.')
end
191 spec/compiler/occurrence_range_spec.rb
@@ -0,0 +1,191 @@
+require 'spec_helper'
+require 'ruby-debug'
+Debugger.start
+
+module OccurrenceRangeSpec
+ class Foo < Treetop::Runtime::SyntaxNode
+ end
+
+ describe "zero to two of a terminal symbol followed by a node class declaration and a block" do
+ testing_expression '"foo"..2 <OccurrenceRangeSpec::Foo> { def a_method; end }'
+
+ it "successfully parses epsilon, reporting a failure" do
+ parse('') do |result|
+ result.should_not be_nil
+ result.should be_an_instance_of(Foo)
+ result.should respond_to(:a_method)
+
+ terminal_failures = parser.terminal_failures
+ terminal_failures.size.should == 1
+ failure = terminal_failures.first
+ failure.index.should == 0
+ failure.expected_string.should == 'foo'
+ end
+ end
+
+ it "successfully parses epsilon, returning an instance declared node class and recording a terminal failure" do
+ parse('') do |result|
+ result.should_not be_nil
+ result.should be_an_instance_of(Foo)
+ result.should respond_to(:a_method)
+
+ terminal_failures = parser.terminal_failures
+ terminal_failures.size.should == 1
+ failure = terminal_failures.first
+ failure.index.should == 0
+ failure.expected_string.should == 'foo'
+ end
+ end
+
+ it "successfully parses one of that terminal, returning an instance of the declared node class and recording a terminal failure" do
+ parse("foo") do |result|
+ result.should_not be_nil
+ result.should be_an_instance_of(Foo)
+ result.should respond_to(:a_method)
+
+ terminal_failures = parser.terminal_failures
+ terminal_failures.size.should == 1
+ failure = terminal_failures.first
+ failure.index.should == 3
+ failure.expected_string.should == 'foo'
+ end
+ end
+
+ it "successfully parses two of that terminal, returning an instance of the declared node class and reporting no failure" do
+ parse("foofoo") do |result|
+ result.should_not be_nil
+ result.should be_an_instance_of(Foo)
+ result.should respond_to(:a_method)
+
+ terminal_failures = parser.terminal_failures
+ terminal_failures.size.should == 0
+ end
+ end
+
+ it "fails to parse three of that terminal, returning nil and reporting no failure" do
+ parse("foofoofoo") do |result|
+ result.should be_nil
+
+ terminal_failures = parser.terminal_failures
+ terminal_failures.size.should == 0
+ end
+ end
+ end
+
+ describe "two to four of a terminal symbol followed by a node class declaration and a block" do
+ testing_expression '"foo" 2..4 <OccurrenceRangeSpec::Foo> { def a_method; end }'
+
+ it "fails to parse epsilon, reporting a failure" do
+ parse('') do |result|
+ result.should be_nil
+ terminal_failures = parser.terminal_failures
+ terminal_failures.size.should == 1
+ failure = terminal_failures.first
+ failure.index.should == 0
+ failure.expected_string.should == 'foo'
+ end
+ end
+
+ it "fails to parse one of that terminal, returning nil and recording a terminal failure" do
+ parse("foo") do |result|
+ result.should be_nil
+
+ terminal_failures = parser.terminal_failures
+ terminal_failures.size.should == 1
+ failure = terminal_failures.first
+ failure.index.should == 3
+ failure.expected_string.should == 'foo'
+ end
+ end
+
+ it "successfully parses two of that terminal, returning an instance of the declared node class and recording a terminal failure for the third attempt" do
+ parse("foofoo") do |result|
+ result.should_not be_nil
+ result.should be_an_instance_of(Foo)
+ result.should respond_to(:a_method)
+
+ terminal_failures = parser.terminal_failures
+ terminal_failures.size.should == 1
+ failure = terminal_failures.first
+ failure.index.should == 6
+ failure.expected_string.should == 'foo'
+ end
+ end
+
+ it "successfully parses four of that terminal, returning an instance of the declared node class and reporting no failure" do
+ parse("foofoofoofoo") do |result|
+ result.should_not be_nil
+ result.should be_an_instance_of(Foo)
+ result.should respond_to(:a_method)
+
+ terminal_failures = parser.terminal_failures
+ terminal_failures.size.should == 0
+ end
+ end
+
+ it "fails to parse five of that terminal, returning nil and reporting no failure" do
+ parse("foofoofoofoofoo") do |result|
+ result.should be_nil
+
+ terminal_failures = parser.terminal_failures
+ terminal_failures.size.should == 0
+ end
+ end
+ end
+
+ describe "two to any number of a terminal symbol followed by a node class declaration and a block" do
+ testing_expression '"foo" 2.. <OccurrenceRangeSpec::Foo> { def a_method; end }'
+
+ it "fails to parse epsilon, reporting a failure" do
+ parse('') do |result|
+ result.should be_nil
+ terminal_failures = parser.terminal_failures
+ terminal_failures.size.should == 1
+ failure = terminal_failures.first
+ failure.index.should == 0
+ failure.expected_string.should == 'foo'
+ end
+ end
+
+ it "fails to parse one of that terminal, returning nil and recording a terminal failure" do
+ parse("foo") do |result|
+ result.should be_nil
+
+ terminal_failures = parser.terminal_failures
+ terminal_failures.size.should == 1
+ failure = terminal_failures.first
+ failure.index.should == 3
+ failure.expected_string.should == 'foo'
+ end
+ end
+
+ it "successfully parses two of that terminal, returning an instance of the declared node class and recording a terminal failure for the third attempt" do
+ parse("foofoo") do |result|
+ result.should_not be_nil
+ result.should be_an_instance_of(Foo)
+ result.should respond_to(:a_method)
+
+ terminal_failures = parser.terminal_failures
+ terminal_failures.size.should == 1
+ failure = terminal_failures.first
+ failure.index.should == 6
+ failure.expected_string.should == 'foo'
+ end
+ end
+
+ it "successfully parses four of that terminal, returning an instance of the declared node class and reporting a failure on the fifth" do
+ parse("foofoofoofoo") do |result|
+ result.should_not be_nil
+ result.should be_an_instance_of(Foo)
+ result.should respond_to(:a_method)
+
+ terminal_failures = parser.terminal_failures
+ terminal_failures.size.should == 1
+ failure = terminal_failures.first
+ failure.index.should == 12
+ failure.expected_string.should == 'foo'
+ end
+ end
+ end
+
+end
6 spec/compiler/zero_or_more_spec.rb
@@ -6,7 +6,7 @@ class Foo < Treetop::Runtime::SyntaxNode
describe "zero or more of a terminal symbol followed by a node class declaration and a block" do
testing_expression '"foo"* <ZeroOrMoreSpec::Foo> { def a_method; end }'
-
+
it "successfully parses epsilon, returning an instance declared node class and recording a terminal failure" do
parse('') do |result|
result.should_not be_nil
@@ -20,7 +20,7 @@ class Foo < Treetop::Runtime::SyntaxNode
failure.expected_string.should == 'foo'
end
end
-
+
it "successfully parses two of that terminal in a row, returning an instance of the declared node class and recording a failure representing the third attempt " do
parse("foofoo") do |result|
result.should_not be_nil
@@ -37,7 +37,7 @@ class Foo < Treetop::Runtime::SyntaxNode
describe "Zero or more of a sequence" do
testing_expression '("foo" "bar")*'
-
+
it "resets the index appropriately following partially matching input" do
parse('foobarfoo', :consume_all_input => false) do |result|
result.should_not be_nil
