
Matches no longer contain an offset.

Calculating the offset every time a match is created can slow down the
parse dramatically. Also, the actual offset of a match is not usually
very useful during interpretation.

The functionality was moved into citrus/debug.rb to preserve the ability
to inspect the offset of a match when debugging.
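The opt-in approach described above can be sketched outside of Citrus. This is a minimal illustration, not the library's code: `TinyMatch` and `TinyTerminal` are hypothetical stand-ins for `Citrus::Match` and `Citrus::Terminal`, and the input is assumed to be a plain `StringScanner`. The first half is the fast path with no offset bookkeeping; the second half is what a debug file might add once it is required.

```ruby
require 'strscan'

# Hypothetical stand-in for Citrus::Match: a String with no offset by default.
class TinyMatch < String
end

# Hypothetical stand-in for Citrus::Terminal.
class TinyTerminal
  def initialize(pattern)
    @pattern = pattern
  end

  # Returns a TinyMatch on success, nil otherwise. No offset is computed here.
  def match(input)
    text = input.scan(@pattern)
    TinyMatch.new(text) if text
  end
end

# --- What a debug file could add when it is loaded (assumed layout) ---

class TinyMatch
  # Only exists once the debug code has been loaded.
  attr_accessor :offset
end

class TinyTerminal
  alias original_match match

  # Wraps the original method and records where the match started.
  # After a successful scan, input.pos sits just past the matched text,
  # so subtracting the match length gives its starting position.
  # (StringScanner#pos counts bytes, so this assumes single-byte input.)
  def match(input)
    m = original_match(input)
    m.offset = input.pos - m.length if m
    m
  end
end

input = StringScanner.new("foo bar")
input.scan(/foo /)                       # consume a prefix first
m = TinyTerminal.new(/bar/).match(input)
puts "#{m} @ #{m.offset}"                # prints "bar @ 4"
```

Because the wrapper is installed only when the debug code is loaded, a normal parse never pays for the subtraction or the extra attribute.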
1 parent 994c5ce commit 2d464eec8802e96661831f7ccf7bd81e34703028 @mjackson committed Oct 24, 2010
Showing with 61 additions and 38 deletions.
  1. +9 −10 README
  2. +9 −10 doc/background.markdown
  3. +8 −13 lib/citrus.rb
  4. +34 −4 lib/citrus/debug.rb
  5. +1 −1 test/helper.rb
19 README
@@ -56,9 +56,9 @@ A [Rule](api/classes/Citrus/Rule.html) is an object that specifies some matching
behavior on a string. There are two types of rules: terminals and non-terminals.
Terminals can be either Ruby strings or regular expressions that specify some
input to match. For example, a terminal created from the string "end" would
-match any sequence of the characters "e", "n", and "d", in that order. A
-terminal created from a regular expression uses Ruby's regular expression engine
-to attempt to create a match.
+match any sequence of the characters "e", "n", and "d", in that order. Terminals
+created from regular expressions may match any sequence of characters that can
+be generated from that expression.
Non-terminals are rules that may contain other rules but do not themselves match
directly on the input. For example, a Repeat is a non-terminal that may contain
@@ -85,10 +85,10 @@ similar to Ruby's super keyword.
## Matches
Matches are created by rule objects when they match on the input. A
-[Match](api/classes/Citrus/Match.html) in Citrus is actually a
-[String](http://ruby-doc.org/core/classes/String.html) with some extra
-information attached such as the name(s) of the rule(s) which generated the
-match as well as its offset in the original input string.
+[Match](api/classes/Citrus/Match.html) is actually a
+[String](http://ruby-doc.org/core/classes/String.html) object with some extra
+information attached such as the name(s) of the rule(s) from which it was
+generated and any submatches it may contain.
During a parse, matches are arranged in a tree structure where any match may
contain any number of other matches. This structure is determined by the way in
@@ -97,9 +97,8 @@ match that is created from a non-terminal rule that contains several other
terminals will likewise contain several matches, one for each terminal.
Match objects may be extended with semantic information in the form of methods.
-These methods can interpret the text of a match using the wealth of information
-available to them including the text of the match, its position in the input,
-and any submatches.
+These methods should provide various interpretations for the semantic value of a
+match.
# Syntax
19 doc/background.markdown
@@ -29,9 +29,9 @@ A [Rule](api/classes/Citrus/Rule.html) is an object that specifies some matching
behavior on a string. There are two types of rules: terminals and non-terminals.
Terminals can be either Ruby strings or regular expressions that specify some
input to match. For example, a terminal created from the string "end" would
-match any sequence of the characters "e", "n", and "d", in that order. A
-terminal created from a regular expression uses Ruby's regular expression engine
-to attempt to create a match.
+match any sequence of the characters "e", "n", and "d", in that order. Terminals
+created from regular expressions may match any sequence of characters that can
+be generated from that expression.
Non-terminals are rules that may contain other rules but do not themselves match
directly on the input. For example, a Repeat is a non-terminal that may contain
@@ -58,10 +58,10 @@ similar to Ruby's super keyword.
## Matches
Matches are created by rule objects when they match on the input. A
-[Match](api/classes/Citrus/Match.html) in Citrus is actually a
-[String](http://ruby-doc.org/core/classes/String.html) with some extra
-information attached such as the name(s) of the rule(s) which generated the
-match as well as its offset in the original input string.
+[Match](api/classes/Citrus/Match.html) is actually a
+[String](http://ruby-doc.org/core/classes/String.html) object with some extra
+information attached such as the name(s) of the rule(s) from which it was
+generated and any submatches it may contain.
During a parse, matches are arranged in a tree structure where any match may
contain any number of other matches. This structure is determined by the way in
@@ -70,6 +70,5 @@ match that is created from a non-terminal rule that contains several other
terminals will likewise contain several matches, one for each terminal.
Match objects may be extended with semantic information in the form of methods.
-These methods can interpret the text of a match using the wealth of information
-available to them including the text of the match, its position in the input,
-and any submatches.
+These methods should provide various interpretations for the semantic value of a
+match.
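The revised documentation text describes a match as a String carrying rule names and submatches, optionally extended with semantic methods. A rough sketch of that shape follows. `SketchMatch` and `IntegerValue` are hypothetical names chosen for illustration and do not reproduce the actual `Citrus::Match` implementation.

```ruby
# Hypothetical stand-in for Citrus::Match: a String plus names and submatches.
class SketchMatch < String
  attr_reader :names, :matches

  # Accepts either plain text or an array of submatches. For an array, the
  # text of the new match is the concatenation of the submatch texts.
  def initialize(data)
    if data.is_a?(Array)
      super(data.join)
      @matches = data
    else
      super(data)
      @matches = []
    end
    @names = []
  end
end

# A semantic method, mixed into a single match object with Object#extend.
module IntegerValue
  def value
    to_i
  end
end

digits = SketchMatch.new("42")
digits.names << :int
digits.extend(IntegerValue)

sum = SketchMatch.new([digits, SketchMatch.new("+"), SketchMatch.new("1")])

puts sum                       # prints "42+1"
puts sum.matches.length        # prints 3
puts sum.matches.first.value   # prints 42
```

Note that nothing in this structure needs the position of the match in the input, which is consistent with the offset being dropped from the default path.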
21 lib/citrus.rb
@@ -539,10 +539,8 @@ def extend_match(match, name)
match
end
- def create_match(data, input)
- match = Match.new(data)
- match.offset = input.pos - match.length
- extend_match(match, name)
+ def create_match(data)
+ extend_match(Match.new(data), name)
end
end
@@ -671,7 +669,7 @@ def initialize(rule='')
# Returns the Match for this rule on +input+, +nil+ if no match can be made.
def match(input)
m = input.scan(@rule)
- create_match(m, input) if m
+ create_match(m) if m
end
# Returns the Citrus notation of this rule as a string.
@@ -725,7 +723,7 @@ class AndPredicate
# Returns the Match for this rule on +input+, +nil+ if no match can be made.
def match(input)
- create_match('', input) if input.match(rule)
+ create_match('') if input.match(rule)
end
# Returns the Citrus notation of this rule as a string.
@@ -745,7 +743,7 @@ class NotPredicate
# Returns the Match for this rule on +input+, +nil+ if no match can be made.
def match(input)
- create_match('', input) unless input.match(rule)
+ create_match('') unless input.match(rule)
end
# Returns the Citrus notation of this rule as a string.
@@ -774,7 +772,7 @@ def match(input)
matches << m
end
# Create a single match from the aggregate text value of all submatches.
- create_match(matches.join, input) if matches.any?
+ create_match(matches.join) if matches.any?
end
# Returns the Citrus notation of this rule as a string.
@@ -856,7 +854,7 @@ def match(input)
break unless m
matches << m
end
- create_match(matches, input) if @range.include?(matches.length)
+ create_match(matches) if @range.include?(matches.length)
end
# The minimum number of times this rule must match.
@@ -937,7 +935,7 @@ def match(input)
break unless m
matches << m
end
- create_match(matches, input) if matches.length == rules.length
+ create_match(matches) if matches.length == rules.length
end
# Returns the Citrus notation of this rule as a string.
@@ -963,9 +961,6 @@ def initialize(data)
end
end
- # The offset in the input at which this match occurred.
- attr_accessor :offset
-
# An array of all names of this match. A name is added to a match object
# for each rule that returns that object when matching. These names can then
# be used to determine which rules were satisfied by a given match.
38 lib/citrus/debug.rb
@@ -3,6 +3,16 @@
module Citrus
class Match
+ # The offset at which this match was found in the input.
+ attr_accessor :offset
+
+ def debug_attrs
+ { "names" => names.join(','),
+ "text" => to_s,
+ "offset" => offset
+ }
+ end
+
# Creates a Builder::XmlMarkup object from this match. Useful when
# inspecting a nested match. The +xml+ argument may be a Hash of
# Builder::XmlMarkup options.
@@ -13,12 +23,10 @@ def to_markup(xml={})
xml.instruct!
end
- attrs = { "names" => names.join(','), "text" => to_s, "offset" => offset }
-
if matches.empty?
- xml.match(attrs)
+ xml.match(debug_attrs)
else
- xml.match(attrs) do
+ xml.match(debug_attrs) do
matches.each {|m| m.to_markup(xml) }
end
end
@@ -36,4 +44,26 @@ def inspect # :nodoc:
to_xml
end
end
+
+ # Hijack all classes that use Rule#create_match to create matches. Now, when
+ # matches are created they will also record their offset to help debugging.
+ # This functionality is included in this file because calculating the offset
+ # of every match as it is created can slow things down quite a bit.
+ [ Terminal,
+ AndPredicate,
+ NotPredicate,
+ ButPredicate,
+ Repeat,
+ Sequence
+ ].each do |rule_class|
+ rule_class.class_eval do
+ alias original_match match
+
+ def match(input)
+ m = original_match(input)
+ m.offset = input.pos - m.length if m
+ m
+ end
+ end
+ end
end
2 test/helper.rb
@@ -57,7 +57,7 @@ def initialize(value)
end
def match(input)
- create_match(@value.to_s.dup, input) if @value.to_s == input.string
+ create_match(@value.to_s.dup) if @value.to_s == input.string
end
end
