Matches no longer contain an offset.
Calculating the offset every time a match is created can slow down the
parse dramatically. Also, the actual offset of a match is not usually
very useful during interpretation.

The functionality was moved into citrus/debug.rb to preserve the ability
to inspect the offset of a match when debugging.
mjackson committed Oct 24, 2010
1 parent 994c5ce commit 2d464ee
Showing 5 changed files with 61 additions and 38 deletions.
19 changes: 9 additions & 10 deletions README
@@ -56,9 +56,9 @@ A [Rule](api/classes/Citrus/Rule.html) is an object that specifies some matching
 behavior on a string. There are two types of rules: terminals and non-terminals.
 Terminals can be either Ruby strings or regular expressions that specify some
 input to match. For example, a terminal created from the string "end" would
-match any sequence of the characters "e", "n", and "d", in that order. A
-terminal created from a regular expression uses Ruby's regular expression engine
-to attempt to create a match.
+match any sequence of the characters "e", "n", and "d", in that order. Terminals
+created from regular expressions may match any sequence of characters that can
+be generated from that expression.

 Non-terminals are rules that may contain other rules but do not themselves match
 directly on the input. For example, a Repeat is a non-terminal that may contain
@@ -85,10 +85,10 @@ similar to Ruby's super keyword.
 ## Matches

 Matches are created by rule objects when they match on the input. A
-[Match](api/classes/Citrus/Match.html) in Citrus is actually a
-[String](http://ruby-doc.org/core/classes/String.html) with some extra
-information attached such as the name(s) of the rule(s) which generated the
-match as well as its offset in the original input string.
+[Match](api/classes/Citrus/Match.html) is actually a
+[String](http://ruby-doc.org/core/classes/String.html) object with some extra
+information attached such as the name(s) of the rule(s) from which it was
+generated and any submatches it may contain.

 During a parse, matches are arranged in a tree structure where any match may
 contain any number of other matches. This structure is determined by the way in
@@ -97,9 +97,8 @@ match that is created from a non-terminal rule that contains several other
 terminals will likewise contain several matches, one for each terminal.

 Match objects may be extended with semantic information in the form of methods.
-These methods can interpret the text of a match using the wealth of information
-available to them including the text of the match, its position in the input,
-and any submatches.
+These methods should provide various interpretations for the semantic value of a
+match.


 # Syntax
19 changes: 9 additions & 10 deletions doc/background.markdown
@@ -29,9 +29,9 @@ A [Rule](api/classes/Citrus/Rule.html) is an object that specifies some matching
 behavior on a string. There are two types of rules: terminals and non-terminals.
 Terminals can be either Ruby strings or regular expressions that specify some
 input to match. For example, a terminal created from the string "end" would
-match any sequence of the characters "e", "n", and "d", in that order. A
-terminal created from a regular expression uses Ruby's regular expression engine
-to attempt to create a match.
+match any sequence of the characters "e", "n", and "d", in that order. Terminals
+created from regular expressions may match any sequence of characters that can
+be generated from that expression.

 Non-terminals are rules that may contain other rules but do not themselves match
 directly on the input. For example, a Repeat is a non-terminal that may contain
@@ -58,10 +58,10 @@ similar to Ruby's super keyword.
 ## Matches

 Matches are created by rule objects when they match on the input. A
-[Match](api/classes/Citrus/Match.html) in Citrus is actually a
-[String](http://ruby-doc.org/core/classes/String.html) with some extra
-information attached such as the name(s) of the rule(s) which generated the
-match as well as its offset in the original input string.
+[Match](api/classes/Citrus/Match.html) is actually a
+[String](http://ruby-doc.org/core/classes/String.html) object with some extra
+information attached such as the name(s) of the rule(s) from which it was
+generated and any submatches it may contain.

 During a parse, matches are arranged in a tree structure where any match may
 contain any number of other matches. This structure is determined by the way in
@@ -70,6 +70,5 @@ match that is created from a non-terminal rule that contains several other
 terminals will likewise contain several matches, one for each terminal.

 Match objects may be extended with semantic information in the form of methods.
-These methods can interpret the text of a match using the wealth of information
-available to them including the text of the match, its position in the input,
-and any submatches.
+These methods should provide various interpretations for the semantic value of a
+match.
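
The revised documentation above describes a match as a String carrying rule names and submatches, optionally extended with interpretation methods. A minimal sketch of that idea follows. `SketchMatch` and `IntegerValue` are hypothetical names chosen for illustration and are not the Citrus classes; the real `Citrus::Match` may differ in its details.

```ruby
# Hypothetical sketch only: SketchMatch is NOT Citrus::Match.
# It is a String that also remembers rule names and submatches.
class SketchMatch < String
  attr_reader :names, :matches

  def initialize(data)
    if data.is_a?(Array)
      # A non-terminal match: its text is the concatenation of its submatches.
      super(data.join)
      @matches = data
    else
      # A terminal match: plain text, no submatches.
      super(data)
      @matches = []
    end
    @names = []
  end
end

# A module that gives a match a semantic interpretation (assumed name).
module IntegerValue
  def value
    to_i
  end
end

digits = SketchMatch.new("42")
digits.names << :integer
digits.extend(IntegerValue)

sum = SketchMatch.new([digits, SketchMatch.new("+"), SketchMatch.new("1")])
```

Here `digits.value` interprets the matched text as an integer, while `sum` behaves as the string `"42+1"` and still exposes its three submatches through `sum.matches`. Note that neither object needs to know its offset in the input.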
21 changes: 8 additions & 13 deletions lib/citrus.rb
@@ -539,10 +539,8 @@ def extend_match(match, name)
       match
     end

-    def create_match(data, input)
-      match = Match.new(data)
-      match.offset = input.pos - match.length
-      extend_match(match, name)
+    def create_match(data)
+      extend_match(Match.new(data), name)
     end
   end

@@ -671,7 +669,7 @@ def initialize(rule='')
     # Returns the Match for this rule on +input+, +nil+ if no match can be made.
     def match(input)
       m = input.scan(@rule)
-      create_match(m, input) if m
+      create_match(m) if m
     end

     # Returns the Citrus notation of this rule as a string.
@@ -725,7 +723,7 @@ class AndPredicate

     # Returns the Match for this rule on +input+, +nil+ if no match can be made.
     def match(input)
-      create_match('', input) if input.match(rule)
+      create_match('') if input.match(rule)
     end

     # Returns the Citrus notation of this rule as a string.
@@ -745,7 +743,7 @@ class NotPredicate

     # Returns the Match for this rule on +input+, +nil+ if no match can be made.
     def match(input)
-      create_match('', input) unless input.match(rule)
+      create_match('') unless input.match(rule)
     end

     # Returns the Citrus notation of this rule as a string.
@@ -774,7 +772,7 @@ def match(input)
         matches << m
       end
       # Create a single match from the aggregate text value of all submatches.
-      create_match(matches.join, input) if matches.any?
+      create_match(matches.join) if matches.any?
     end

     # Returns the Citrus notation of this rule as a string.
@@ -856,7 +854,7 @@ def match(input)
         break unless m
         matches << m
       end
-      create_match(matches, input) if @range.include?(matches.length)
+      create_match(matches) if @range.include?(matches.length)
     end

     # The minimum number of times this rule must match.
@@ -937,7 +935,7 @@ def match(input)
         break unless m
         matches << m
       end
-      create_match(matches, input) if matches.length == rules.length
+      create_match(matches) if matches.length == rules.length
     end

     # Returns the Citrus notation of this rule as a string.
@@ -963,9 +961,6 @@ def initialize(data)
       end
     end

-    # The offset in the input at which this match occurred.
-    attr_accessor :offset
-
     # An array of all names of this match. A name is added to a match object
     # for each rule that returns that object when matching. These names can then
     # be used to determine which rules were satisfied by a given match.
38 changes: 34 additions & 4 deletions lib/citrus/debug.rb
@@ -3,6 +3,16 @@

 module Citrus
   class Match
+    # The offset at which this match was found in the input.
+    attr_accessor :offset
+
+    def debug_attrs
+      { "names" => names.join(','),
+        "text" => to_s,
+        "offset" => offset
+      }
+    end
+
     # Creates a Builder::XmlMarkup object from this match. Useful when
     # inspecting a nested match. The +xml+ argument may be a Hash of
     # Builder::XmlMarkup options.
@@ -13,12 +23,10 @@ def to_markup(xml={})
         xml.instruct!
       end

-      attrs = { "names" => names.join(','), "text" => to_s, "offset" => offset }
-
       if matches.empty?
-        xml.match(attrs)
+        xml.match(debug_attrs)
       else
-        xml.match(attrs) do
+        xml.match(debug_attrs) do
           matches.each {|m| m.to_markup(xml) }
         end
       end
@@ -36,4 +44,26 @@ def inspect # :nodoc:
       to_xml
     end
   end
+
+  # Hijack all classes that use Rule#create_match to create matches. Now, when
+  # matches are created they will also record their offset to help debugging.
+  # This functionality is included in this file because calculating the offset
+  # of every match as it is created can slow things down quite a bit.
+  [ Terminal,
+    AndPredicate,
+    NotPredicate,
+    ButPredicate,
+    Repeat,
+    Sequence
+  ].each do |rule_class|
+    rule_class.class_eval do
+      alias original_match match
+
+      def match(input)
+        m = original_match(input)
+        m.offset = input.pos - m.length if m
+        m
+      end
+    end
+  end
 end
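
The alias-and-wrap technique used in debug.rb can be tried in isolation. The following is a minimal sketch under stated assumptions: `ToyMatch` and `ToyTerminal` are hypothetical stand-ins rather than the Citrus classes, and the input is assumed to be a `StringScanner` holding single-byte (ASCII) text, so that the scan position minus the match length gives the start of the match.

```ruby
require 'strscan'

# Hypothetical stand-ins for Citrus::Match and a terminal rule.
class ToyMatch < String
end

class ToyTerminal
  def initialize(pattern)
    @pattern = pattern
  end

  # Returns a ToyMatch, or nil if the pattern does not match at the scan pointer.
  def match(input)
    text = input.scan(@pattern)
    ToyMatch.new(text) if text
  end
end

# Everything below plays the role of the optional debug file: it is only
# evaluated when offsets are wanted, so normal parsing never pays for it.
class ToyMatch
  attr_accessor :offset
end

ToyTerminal.class_eval do
  # Keep the original implementation reachable under a new name.
  alias_method :original_match, :match

  # Wrap it: delegate, then record where the match started.
  def match(input)
    m = original_match(input)
    m.offset = input.pos - m.length if m
    m
  end
end

input = StringScanner.new("begin end")
input.pos = 6
found = ToyTerminal.new(/end/).match(input)
missed = ToyTerminal.new(/xyz/).match(StringScanner.new("abc"))
```

In this sketch `found` is the string `"end"` with an offset of 6, and `missed` is `nil`, so the wrapper leaves failed matches untouched. One design consequence worth noting: because the wrapper reads `input.pos` after the original method returns, it depends on the rule having advanced the scanner by exactly the length of the match.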
2 changes: 1 addition & 1 deletion test/helper.rb
@@ -57,7 +57,7 @@ def initialize(value)
     end

     def match(input)
-      create_match(@value.to_s.dup, input) if @value.to_s == input.string
+      create_match(@value.to_s.dup) if @value.to_s == input.string
     end
   end
