Added support for Left Recursive Grammars, using the algorithm from the Ometa paper #19

Open
wants to merge 9 commits into from

4 participants

@joachimm

Also added some stuff in the TextMate bundle

@mjackson
Owner

Wow! This is awesome. Thanks for this work. I'll be reviewing these commits as soon as I get the chance.

@davidkellis

Is http://www.vpri.org/pdf/tr2007002_packrat.pdf [1] the Ometa paper? I didn't know what Ometa was until I started searching for it. I ask because I'm reading through http://tratt.net/laurie/research/publications/papers/tratt__direct_left_recursive_parsing_expression_grammars.pdf [2], and the author, Tratt, observes that Warth's Ometa tool, which implements the technique described in [1], incorrectly parses the expression 1-2-3 as (1-(2-(3))) given the grammar:
expr <- expr "-" expr / num
num <- [0-9]+
joachimm, does your implementation of Warth's technique produce the same right-associative parse as Warth's Ometa tool? I'm thinking that Tratt's alteration to Warth's algorithm might parse a broader class of languages. On the other hand, this might not be an issue at all. I'm just bringing it up because I haven't tinkered around enough with your implementation to know, and reading through Tratt's paper brought the issue to my attention.
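
To make the difference between the two parses concrete, here is a minimal Ruby sketch that evaluates a chain of subtractions under each grouping. The helpers `eval_left` and `eval_right` are hypothetical names made up for this illustration; they are not part of Citrus or Ometa.

```ruby
# Hypothetical helpers for illustration only; not part of Citrus or Ometa.

# Left-associative grouping: ((a - b) - c)
def eval_left(nums)
  nums.reduce { |acc, n| acc - n }
end

# Right-associative grouping: (a - (b - c))
def eval_right(nums)
  head, *rest = nums
  rest.empty? ? head : head - eval_right(rest)
end

puts eval_left([1, 2, 3])   # (1 - 2) - 3
puts eval_right([1, 2, 3])  # 1 - (2 - 3)
```

The two groupings give -4 and 2 respectively, so a test that checks the computed value (rather than only the matched string) is enough to tell which parse an implementation produces.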

Thanks!

@mjackson
Owner

David, thanks for researching this. As I mentioned above, I'd love to be able to support left recursion in Citrus, but I haven't had the time to thoroughly test the algorithm myself yet. It's an important change, and it deserves careful consideration.

@joachimm

David, yes, [1] is the Ometa paper. According to this test, I believe left associativity works as intended: https://github.com/joachimm/citrus/blob/master/test/indirect_left_recursion_test.rb

@joachimm closed this
@joachimm reopened this

@davidkellis

I just tried implementing the special case (a grammar production is both left-recursive and right-recursive) that Tratt documented in his paper [2].

The grammar is:
expr <- expr "-" expr / num
num <- [0-9]+

In left_recursion_test.rb, I slightly modified your LR grammar:

  grammar :LR do
    rule :expr do
      any(all(label(:expr, 'lhs'), '-', label(:expr, 'rhs')){lhs.value - rhs.value},
          :num)
    end

    rule :num do
      ext(/[0-9]+/){to_i}
    end
  end

  def test_lr
    match = LR.parse("3-4-5", {:leftrecursive=>true})
    assert(match)
    assert_equal("3-4-5", match)
    assert_equal(-6, match.value)
  end

Unless I've done something wrong, which is not unlikely, this test fails because 3-4-5 is parsed as (3 - (4 - 5)) and produces the result 4.

Can you verify that this is in fact an issue? I'm not 100% sure that I've set up a proper test to check whether Tratt's observation applies to this implementation.

Thanks.

@joachimm

You are right. Making a rule both left-recursive and right-recursive seems to give unwanted results. Thanks for the link to Tratt's paper. I will see if I can fit his idea into the algorithm.
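
For anyone who wants to reproduce the behaviour outside Citrus, the following is a stand-alone sketch of seed growing for direct left recursion only. It is an assumption-laden toy, not the code in this pull request: the class `SeedParser`, its `rhs_rule` parameter, and the `evaluate` helper are all hypothetical names. Passing `:num` models `expr <- expr "-" num / num`, and passing `:expr` models the grammar from Tratt's paper, `expr <- expr "-" expr / num`.

```ruby
# Toy seed-growing parser; hypothetical, not the Citrus implementation.
class SeedParser
  # rhs_rule: :num  models  expr <- expr "-" num  / num
  #           :expr models  expr <- expr "-" expr / num
  def initialize(input, rhs_rule)
    @input = input
    @rhs_rule = rhs_rule
    @memo = {} # start position => [tree, end_position], or nil for failure
  end

  # Returns the parse tree if the whole input matches, otherwise nil.
  def parse
    tree, stop = expr(0)
    stop == @input.size ? tree : nil
  end

  private

  def expr(pos)
    return @memo[pos] if @memo.key?(pos)
    @memo[pos] = nil # initial seed: the left-recursive call fails
    loop do
      attempt = expr_body(pos)
      best = @memo[pos]
      # Stop as soon as re-evaluating the body consumes no more input.
      break if attempt.nil? || (best && attempt[1] <= best[1])
      @memo[pos] = attempt # grow the seed
    end
    @memo[pos]
  end

  def expr_body(pos)
    lhs = expr(pos) # memo hit: returns the current seed
    if lhs && @input[lhs[1]] == '-'
      rhs = send(@rhs_rule, lhs[1] + 1)
      return [[:sub, lhs[0], rhs[0]], rhs[1]] if rhs
    end
    num(pos)
  end

  def num(pos)
    m = /\G[0-9]+/.match(@input, pos)
    m && [m[0].to_i, m.end(0)]
  end
end

# Evaluates a tree of the form Integer or [:sub, left, right].
def evaluate(tree)
  return tree if tree.is_a?(Integer)
  _, left, right = tree
  evaluate(left) - evaluate(right)
end

p SeedParser.new("3-4-5", :num).parse   # tree for expr "-" num
p SeedParser.new("3-4-5", :expr).parse  # tree for expr "-" expr
```

In this sketch the `:num` variant yields the tree `[:sub, [:sub, 3, 4], 5]`, while the `:expr` variant yields `[:sub, 3, [:sub, 4, 5]]`: the right-hand `expr` call starts its own seed growth at a later position and swallows the rest of the input before the outer seed gets a chance to grow. That is consistent with the failing test above, though the actual Citrus code path is more involved because it also handles indirect recursion.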

@CMCDragonkai

Hey @joachimm, have you made any progress on this? I'm interested in whether Warth's implementation is correct.

1  README
@@ -1,5 +1,4 @@
-
~* Citrus *~
Parsing Expressions for Ruby
9 extras/Citrus.tmbundle/Syntaxes/Citrus Grammar.tmLanguage
@@ -297,6 +297,15 @@
<string>((/[eimnosux]*))</string>
<key>name</key>
<string>string.regexp.citrus</string>
+ <key>patterns</key>
+ <array>
+ <dict>
+ <key>match</key>
+ <string>\\.</string>
+ <key>name</key>
+ <string>constant.character.escape.citrus</string>
+ </dict>
+ </array>
</dict>
<key>rule-reference</key>
<dict>
150 lib/citrus.rb
@@ -3,6 +3,7 @@
require 'strscan'
require 'pathname'
require 'citrus/version'
+require 'set'
# Citrus is a compact and powerful parsing library for Ruby that combines the
# elegance and expressiveness of the language with the simplicity and power of
@@ -326,6 +327,144 @@ def apply_rule(rule, position, events) # :nodoc:
end
end
+ class LeftRecursiveInput < MemoizedInput
+ def initialize(string)
+ super(string)
+ @heads = {}
+ @lrstack = nil
+ end
+ class LR
+ attr_accessor :seed, :rule, :head, :next
+ def initialize(seed, rule, head, n)
+ @seed = seed
+ @rule = rule
+ @head = head
+ @next = n
+ end
+ end
+
+ class Head
+ attr_accessor :rule, :involved_set, :eval_set
+ def initialize(rule)
+ @rule = rule
+ @involved_set = Set.new
+ @eval_set = Set.new
+ end
+
+ def inspect
+ "<#Head rule=#{@rule} involved=[#{involved_set.to_a.join(", ")}] eval=[#{eval_set.to_a.join(", ")}]"
+ end
+ end
+
+ private
+ def apply_rule(rule, position, events) # :nodoc:
+ c = recall(rule, position)
+ if c
+ @cache_hits += 1
+ # if identical parent rule has already been here
+ # flag it
+ if c.kind_of?(LR)
+ setup_lr(rule,c)
+ c = c.seed
+ end
+ unless c.empty?
+ events.concat(c)
+ self.pos += events[-1]
+ end
+ else
+ index = events.size
+ lr = LR.new([], rule, nil, @lrstack)
+ @lrstack = lr
+ # Memoize lr, then evaluate R
+ memo = @cache[rule]
+ memo[position] = lr
+ rule.exec(self, events)
+ # pop lr off the rule invocation stack
+ @lrstack = @lrstack.next
+ # Memoize the result so we can use it next time this same rule is
+ # executed at this position.
+ memo[position] = events.slice(index, events.size)
+ # if this rule has been seen before, (same rule was accessed as child)
+ # start growing the seed
+ if lr.head
+ lr.seed = events.slice(index, events.size)
+ ans = lr_answer(rule, position, lr)
+ events.slice!(index, events.size)
+ events.concat(ans)
+ end
+ end
+
+ events
+ end
+
+ def grow_lr(rule, position, mem, head)
+ @heads[position]=head
+ progress = position
+ # Don't memoize while growing the seed
+ while true
+ head.eval_set = head.involved_set.dup
+ self.pos = position
+ events = []
+ rule.exec(self,events)
+ break if events.size==0 || pos <= progress
+ @cache[rule][position] = events
+ progress = pos
+ end
+ # set the last successful parse value
+ @heads[position]=nil
+ self.pos = progress
+ @cache[rule][position]
+ end
+
+
+ def setup_lr(rule, lr)
+ if lr.head == nil
+ lr.head = Head.new(rule)
+ end
+ s = @lrstack
+ while s.head != lr.head
+ s.head = lr.head
+ lr.head.involved_set << s.rule
+ s = s.next
+ end
+ end
+
+ def lr_answer(rule, position, mem)
+ head = mem.head
+ if head.rule != rule
+ return mem.seed
+ else
+ @cache[rule][position] = mem.seed
+ if mem.seed == []
+ return []
+ else
+ return grow_lr(rule,position,mem, head)
+ end
+ end
+ end
+
+
+ def recall(rule, position)
+ memo = @cache[rule] ||= {}
+ head = @heads[position]
+ unless head
+ return memo[position]
+ end
+
+ if memo[position] == nil && rule != head.rule && !head.involved_set.include?(rule)
+ return []
+ end
+ if head.eval_set.include? rule
+ head.eval_set.delete(rule)
+ events = []
+ rule.exec(self, events)
+ self.pos = position
+ return events
+ end
+ memo[position]
+ end
+ end
+
# Inclusion of this module into another extends the receiver with the grammar
# helper methods in GrammarMethods. Although this module does not actually
# provide any methods, constants, or variables to modules that include it, the
@@ -625,8 +764,15 @@ def default_options # :nodoc:
# to 0.
def parse(string, options={})
opts = default_options.merge(options)
+
+ if opts[:memoize]
+ input = MemoizedInput.new(string)
+ elsif opts[:leftrecursive]
+ input = LeftRecursiveInput.new(string)
+ else
+ input = Input.new(string)
+ end
- input = (opts[:memoize] ? MemoizedInput : Input).new(string)
input.pos = opts[:offset] if opts[:offset] > 0
events = input.exec(self)
@@ -1337,7 +1483,7 @@ def ==(other)
end
def inspect
- @string.inspect
+ "#<#{self.class.name} rule={#{@events.first}} #{captures.map{|k,v| "#{k}:\"#{v}\"" }.join(" ")} matches=#{matches.inspect}>"
end
# Prints the entire subtree of this match using the given +indent+ to
68 test/indirect_left_recursion_test.rb
@@ -0,0 +1,68 @@
+require File.expand_path('../helper', __FILE__)
+
+class IndirectLeftRecursionTest < Test::Unit::TestCase
+ def test_leftrecursived?
+ assert MemoizedInput.new('').memoized?
+ end
+
+ grammar :ILR do
+ rule :x do
+ :expr
+ end
+
+ rule :expr do
+ any(all(:x, '-', :num),:num)
+ end
+
+ rule :num do
+ /[0-9]+/
+ end
+ root :x
+ end
+
+ def test_ilr
+ match = ILR.parse("3-4-5", {:leftrecursive=>true})
+ end
+
+ grammar :BigILR do
+ rule :t do
+ ext(:term)
+ end
+
+ rule :term do
+ any(all(:t, '+', :f){t.value + f.value},
+ all(:t, '-', :f){t.value - f.value},
+ :f)
+ end
+
+ rule :f do
+ ext(:fa)
+ end
+ rule :fa do
+ ext(:fact)
+ end
+
+ rule :fact do
+ any(all(:f, '*', :num){f.value * num.value},
+ all(:f, '/', :num){f.value / num.value},
+ :num)
+ end
+
+ rule :num do
+ ext(/[0-9]+/){to_i}
+ end
+ end
+
+ def test_big_ilr
+ match = BigILR.parse("3-4-5", {:leftrecursive=>true})
+ assert(match)
+ assert_equal("3-4-5", match)
+ assert_equal(-6, match.value)
+
+ match = BigILR.parse("5*4*2-5*9*2", {:leftrecursive=>true})
+ assert(match)
+ assert_equal("5*4*2-5*9*2", match)
+ assert_equal(-50, match.value)
+ end
+
+end
47 test/left_recursion_test.rb
@@ -0,0 +1,47 @@
+require File.expand_path('../helper', __FILE__)
+
+class LeftRecursionTest < Test::Unit::TestCase
+ grammar :LR do
+ rule :expr do
+ any(all(:expr, '-', :num),:num)
+ end
+
+ rule :num do
+ /[0-9]+/
+ end
+ end
+
+ def test_lr
+ match = LR.parse("3-4-5", {:leftrecursive=>true})
+ assert(match)
+ assert_equal("3-4-5", match)
+ end
+
+ grammar :BigLR do
+ rule :term do
+ any(all(:term, '+', :fact),
+ all(:term, '-', :fact),
+ :fact)
+ end
+
+ rule :fact do
+ any(all(:fact, '*', :num),
+ all(:fact, '/', :num),
+ :num)
+ end
+
+ rule :num do
+ /[0-9]+/
+ end
+ end
+
+ def test_big_lr
+ match = BigLR.parse("3-4-5", {:leftrecursive=>true})
+ assert(match)
+ assert_equal("3-4-5", match)
+
+ match = BigLR.parse("5*4-5", {:leftrecursive=>true})
+ assert(match)
+ assert_equal("5*4-5", match)
+ end
+end