
Make whitespace normalization Unicode-aware. Fixes #767

Also move normalize_whitespace and to_regexp into Capybara::Helpers, and
improve method documentation.

I'm not adding tests for the helpers. I think they're so simple that we
really only care whether they work in conjunction with other methods,
like Node#text.
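For context on the fix: in Ruby, the `\s` regexp shorthand matches only ASCII whitespace. The old `gsub(/\s+/, ' ')` therefore left characters such as U+00A0 (the no-break space that `&nbsp;` decodes to) in place. The sketch below compares the old pattern with the character class this commit adds. The constant names and the sample string are illustrative only.

```ruby
# Old pattern: Ruby's \s matches ASCII whitespace only.
ASCII_WHITESPACE = /\s+/

# Character class copied from this commit's Capybara::Helpers.normalize_whitespace.
UNICODE_WHITESPACE = /[\s\u0085\u00a0\u1680\u180e\u2000-\u200a\u2028\u2029\u202f\u205f\u3000]+/

# Hypothetical sample: a no-break space (U+00A0) followed by a newline and indentation.
sample = "text with\u00a0\n  whitespace"

old_result = sample.gsub(ASCII_WHITESPACE, ' ').strip
new_result = sample.gsub(UNICODE_WHITESPACE, ' ').strip

puts old_result.include?("\u00a0") # true: the no-break space survives the old pattern
puts new_result                    # text with whitespace
```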
Parent: 33cc2ca. Commit: b2767a84135df7120522fb5191f92706d0e0e3b2. Committed by @joliss on Aug 1, 2012.
3 History.txt
@@ -3,6 +3,9 @@
### Fixed
* `has_text` (`has_content`) now accepts non-string arguments, like numbers.
+ [Jo Liss]
+* `has_text` and `text` now correctly normalize Unicode whitespace, such as
+ `&nbsp;`. [Jo Liss]
# Version 2.0.0.beta2
1 lib/capybara.rb
@@ -308,6 +308,7 @@ def session_pool
autoload :Selector, 'capybara/selector'
autoload :Query, 'capybara/query'
autoload :Result, 'capybara/result'
+ autoload :Helpers, 'capybara/helpers'
autoload :VERSION, 'capybara/version'
module Node
33 lib/capybara/helpers.rb
@@ -0,0 +1,33 @@
+module Capybara
+ module Helpers
+ class << self
+ ##
+ #
+ # Normalizes whitespace by stripping leading and trailing
+ # whitespace and replacing sequences of whitespace characters
+ # with a single space.
+ #
+ # @param [String] text Text to normalize
+ # @return [String] Normalized text
+ #
+ def normalize_whitespace(text)
+ # http://en.wikipedia.org/wiki/Whitespace_character#Unicode
+ # We should have a better reference.
+ # See also http://stackoverflow.com/a/11758133/525872
+ text.to_s.gsub(/[\s\u0085\u00a0\u1680\u180e\u2000-\u200a\u2028\u2029\u202f\u205f\u3000]+/, ' ').strip
+ end
+
+ ##
+ #
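+ # Normalizes whitespace and escapes any characters that would have
+ # special meaning in a regexp, unless text is already a Regexp.
+ #
+ # @param [String, Regexp] text Text to escape, or a Regexp to pass through
+ # @return [String, Regexp] Escaped text, or the Regexp unchanged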
+ #
+ def to_regexp(text)
+ text.is_a?(Regexp) ? text : Regexp.escape(normalize_whitespace(text))
+ end
+ end
+ end
+end
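To illustrate how the two helpers behave, here is a standalone copy of them with a few calls. The copy sits in a hypothetical `HelpersSketch` module so it runs without Capybara loaded. For non-Regexp input, `to_regexp` returns an escaped String, and callers rely on `String#match` converting that String into a Regexp. The `to_s` call in `normalize_whitespace` is what lets numbers through, consistent with the History.txt entry about non-string arguments.

```ruby
# Standalone copy of the helpers added in lib/capybara/helpers.rb,
# under a hypothetical module name so it runs without Capybara.
module HelpersSketch
  class << self
    def normalize_whitespace(text)
      text.to_s.gsub(/[\s\u0085\u00a0\u1680\u180e\u2000-\u200a\u2028\u2029\u202f\u205f\u3000]+/, ' ').strip
    end

    def to_regexp(text)
      text.is_a?(Regexp) ? text : Regexp.escape(normalize_whitespace(text))
    end
  end
end

# Regexps pass through unchanged.
pattern = HelpersSketch.to_regexp(/\d+ items/)

# Strings are normalized first, then regexp metacharacters (and spaces) are escaped.
escaped = HelpersSketch.to_regexp("  1+1 \u00a0 = 2 ")

# Non-strings such as numbers are converted with to_s before escaping.
number = HelpersSketch.to_regexp(42)

# Page text normalized the same way Node#text now does.
page_text = HelpersSketch.normalize_whitespace("Total:\u00a042 items, 1+1 = 2")

puts page_text.match(pattern)[0]    # 42 items
puts !page_text.match(escaped).nil? # true
puts !page_text.match(number).nil?  # true
```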
44 lib/capybara/node/matchers.rb
@@ -193,16 +193,17 @@ def has_no_css?(path, options={})
# Checks if the page or current node has the given text content,
# ignoring any HTML tags and normalizing whitespace.
#
- # Unlike has_content this only matches displayable text and specifically
- # excludes text contained within non-display nodes such as script or head tags.
+ # This only matches displayable text and specifically excludes text
+ # contained within non-display nodes such as script or head tags.
#
# @param [String] content The text to check for
# @return [Boolean] Whether it exists
#
def has_text?(content)
synchronize do
- normalize_whitespace(text).match(to_regexp(content)) or
- raise ExpectationNotMet
+ unless Capybara::Helpers.normalize_whitespace(text).match(Capybara::Helpers.to_regexp(content))
+ raise ExpectationNotMet
+ end
end
return true
rescue Capybara::ExpectationNotMet
@@ -215,16 +216,17 @@ def has_text?(content)
# Checks if the page or current node does not have the given text
# content, ignoring any HTML tags and normalizing whitespace.
#
- # Unlike has_content this only matches displayable text and specifically
- # excludes text contained within non-display nodes such as script or head tags.
+ # This only matches displayable text and specifically excludes text
+ # contained within non-display nodes such as script or head tags.
#
# @param [String] content The text to check for
- # @return [Boolean] Whether it exists
+ # @return [Boolean] Whether it doesn't exist
#
def has_no_text?(content)
synchronize do
- !normalize_whitespace(text).match(to_regexp(content)) or
- raise ExpectationNotMet
+ if Capybara::Helpers.normalize_whitespace(text).match(Capybara::Helpers.to_regexp(content))
+ raise ExpectationNotMet
+ end
end
return true
rescue Capybara::ExpectationNotMet
@@ -458,30 +460,6 @@ def ==(other)
private
- ##
- #
- # Normalizes whitespace by stripping leading and trailing
- # whitespace and replacing sequences of whitespace characters
- # with a single space.
- #
- # @param [String] text Text to normalize
- # @return [String] Normalized text
- #
- def normalize_whitespace(text)
- text.to_s.gsub(/\s+/, ' ').strip
- end
-
- ##
- #
- # Escapes any characters that would have special meaning in a regexp
- # if text is not a regexp
- #
- # @param [String] text Text to escape
- # @return [String] Escaped text
- #
- def to_regexp(text)
- text.is_a?(Regexp) ? text : Regexp.escape(normalize_whitespace(text))
- end
end
end
end
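Both matchers raise `ExpectationNotMet` inside `synchronize` when the check fails. Roughly, `synchronize` re-runs the block until it stops raising or the wait time elapses, and the final failure is rescued into `false`. `has_no_text?` follows the same shape with the condition inverted. The sketch below models that flow for the positive case. It uses a simplified, hypothetical `retry_until` in place of Capybara's `synchronize`, a local `ExpectationNotMet` class, and a hypothetical `text_present?` method, so it is not Capybara's implementation.

```ruby
class ExpectationNotMet < StandardError; end

# Hypothetical, simplified stand-in for Capybara's synchronize: re-run the
# block until it stops raising ExpectationNotMet or the timeout elapses.
def retry_until(timeout: 0.5, interval: 0.05)
  deadline = Time.now + timeout
  begin
    yield
  rescue ExpectationNotMet
    raise if Time.now >= deadline # give up: re-raise the last failure
    sleep interval
    retry
  end
end

UNICODE_WS_CLASS = /[\s\u0085\u00a0\u1680\u180e\u2000-\u200a\u2028\u2029\u202f\u205f\u3000]+/

def normalize(text)
  text.to_s.gsub(UNICODE_WS_CLASS, ' ').strip
end

# Mirrors the structure of has_text?: raise inside the retried block,
# then convert the final failure into false.
def text_present?(current_text, content)
  retry_until do
    unless normalize(current_text.call).match(Regexp.escape(normalize(content)))
      raise ExpectationNotMet
    end
  end
  true
rescue ExpectationNotMet
  false
end

calls = 0
# Hypothetical page whose text only appears on the third poll.
page = -> { calls += 1; calls < 3 ? "Loading..." : "Saved\u00a0successfully" }

saved = text_present?(page, "Saved successfully")
missing = text_present?(-> { "Loading..." }, "Saved")
puts saved   # true
puts missing # false
```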
2 lib/capybara/rack_test/node.rb
@@ -1,6 +1,6 @@
class Capybara::RackTest::Node < Capybara::Driver::Node
def text
- unnormalized_text.strip.gsub(/\s+/, ' ')
+ Capybara::Helpers.normalize_whitespace(unnormalized_text)
end
def [](name)
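The RackTest change also reorders the operations. `String#strip` removes only ASCII whitespace, so the old `strip` followed by an ASCII-only `gsub` left a leading or trailing no-break space in place. The helper substitutes first and strips afterwards. Unicode whitespace at either end becomes a plain space, which `strip` can then remove. A minimal comparison, with hypothetical lambda names standing in for the old and new `text` bodies:

```ruby
# Old RackTest::Node#text logic: strip first, then collapse ASCII whitespace.
old_text = ->(raw) { raw.strip.gsub(/\s+/, ' ') }

# New logic (same class as Capybara::Helpers.normalize_whitespace):
# collapse Unicode whitespace to ' ' first, then strip.
UNICODE_WS = /[\s\u0085\u00a0\u1680\u180e\u2000-\u200a\u2028\u2029\u202f\u205f\u3000]+/
new_text = ->(raw) { raw.to_s.gsub(UNICODE_WS, ' ').strip }

# Hypothetical text with a no-break space at each end and a thin space (U+2009) inside.
raw = "\u00a0 Price:\u2009100\u00a0"

old_result = old_text.call(raw)
new_result = new_text.call(raw)

puts old_result == raw # true: strip stops at U+00A0 and \s skips U+2009
puts new_result        # Price: 100
```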
3 lib/capybara/selenium/node.rb
@@ -1,6 +1,7 @@
class Capybara::Selenium::Node < Capybara::Driver::Node
def text
- native.text
+ # Selenium doesn't normalize Unicode whitespace.
+ Capybara::Helpers.normalize_whitespace(native.text)
end
def [](name)
2 lib/capybara/spec/views/with_html.erb
@@ -21,7 +21,7 @@
Duis aute irure dolor in reprehenderit in voluptate velit esse cillum
dolore eu fugiat <a href="/redirect" id="red">Redirect</a> pariatur. Excepteur sint occaecat cupidatat non proident,
sunt in culpa qui officia
- text with
+ text with &nbsp;
whitespace
id est laborum.
</p>
