
treat fullwidth whitespace as a blank character

1 parent 704ee0d, commit 9c60860322d58e1a2fcd42f1a21ec86ec2caa50b, amatsuda committed Jul 13, 2011
@@ -86,14 +86,18 @@ class Hash
end
class String
+ # 0x3000: fullwidth whitespace
+ NON_WHITESPACE_REGEXP = %r![^\s#{[0x3000].pack("U")}]!

markburns Jul 14, 2011

Does this cover only one additional whitespace character?

I.e., should it also handle the other Unicode whitespace characters?

From Wikipedia:
U+0009–U+000D (control characters, containing Tab, CR and LF)
U+0020 SPACE
U+0085 NEL (control character next line)
U+00A0 NBSP (NO-BREAK SPACE)
U+1680 OGHAM SPACE MARK
U+180E MONGOLIAN VOWEL SEPARATOR
U+2000–U+200A (different sorts of spaces)
U+2028 LS (LINE SEPARATOR)
U+2029 PS (PARAGRAPH SEPARATOR)
U+202F NNBSP (NARROW NO-BREAK SPACE)
U+205F MMSP (MEDIUM MATHEMATICAL SPACE)
U+3000 IDEOGRAPHIC SPACE

See also http://en.wikipedia.org/wiki/Space_(punctuation)#Spaces_in_Unicode
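
To make the question concrete, here is a minimal sketch comparing `\s` with the POSIX bracket class `[[:space:]]` on a few characters from the list above. It assumes Ruby 1.9 or later and UTF-8 strings; the `CANDIDATES` hash and the `matched_by` helper are hypothetical names used only for illustration, and nothing here is taken from the commit itself.

```ruby
# A few of the characters listed above, written with \u escapes so the
# strings are UTF-8 regardless of the editor in use.
CANDIDATES = {
  "SPACE"             => "\u0020",
  "NO-BREAK SPACE"    => "\u00A0",
  "EM SPACE"          => "\u2003",
  "LINE SEPARATOR"    => "\u2028",
  "IDEOGRAPHIC SPACE" => "\u3000"
}

# Hypothetical helper: names of the candidates that the given regexp matches.
def matched_by(regexp)
  CANDIDATES.select { |_name, char| char =~ regexp }.keys
end

ascii_only    = matched_by(/\s/)          # \s covers ASCII whitespace only
unicode_aware = matched_by(/[[:space:]]/) # POSIX bracket is Unicode-aware for UTF-8 strings

puts "\\s matches:          #{ascii_only.join(', ')}"
puts "[[:space:]] matches: #{unicode_aware.join(', ')}"
```

In this sketch `\s` matches only SPACE, while `[[:space:]]` matches all five entries, which suggests a character class may cover the list more broadly than enumerating code points one by one.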


pixeltrix Jul 14, 2011

Owner

See the discussion on the ticket: #2052

+
# A string is blank if it's empty or contains whitespaces only:
#
# "".blank? # => true
# " ".blank? # => true
+ # "　".blank? # => true
# " something here ".blank? # => false
#
def blank?
- self !~ /\S/
+ self !~ NON_WHITESPACE_REGEXP
end

spastorino Jul 18, 2011

Owner

This raises Encoding::CompatibilityError when self is not compatible with UTF-8. It also makes this test fail, since compressed is ASCII-8BIT: https://github.com/rails/rails/blob/master/activesupport/test/gzip_test.rb#L16
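
A minimal sketch of the failure described here, assuming Ruby 1.9 or later. The `compressed` bytes are a hypothetical stand-in for real gzip output, and `blank_string?` is an illustrative helper showing one possible encoding-neutral check, not necessarily the fix the project adopted.

```ruby
# Regexp from the diff. Interpolating U+3000 gives it a fixed UTF-8 encoding.
NON_WHITESPACE_REGEXP = %r![^\s#{[0x3000].pack("U")}]!

# Hypothetical sample: a few non-ASCII bytes standing in for gzip output.
# pack("C*") returns an ASCII-8BIT (binary) string.
compressed = [0x1F, 0x8B, 0x08].pack("C*")

# Matching a fixed-encoding UTF-8 regexp against binary data with
# non-ASCII bytes raises instead of returning true or false.
error = begin
  compressed !~ NON_WHITESPACE_REGEXP
  nil
rescue Encoding::CompatibilityError => e
  e
end
puts "raised: #{error.class}" if error

# Hypothetical alternative: an ASCII-only regexp source has no fixed
# encoding, so it can be matched against strings in any encoding.
def blank_string?(string)
  string !~ /[^[:space:]]/
end

puts blank_string?(compressed)   # binary data is not blank
puts blank_string?(" \u3000\n")  # fullwidth space still counts as blank
```

The trade-off in this sketch is that `[[:space:]]` is only Unicode-aware when the string itself is in a Unicode encoding; for binary strings it falls back to ASCII whitespace, which avoids the exception.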

end
@@ -2,7 +2,7 @@
require 'active_support/core_ext/object/blank'
class BlankTest < Test::Unit::TestCase
- BLANK = [ EmptyTrue.new, nil, false, '', ' ', " \n\t \r ", [], {} ]
+ BLANK = [ EmptyTrue.new, nil, false, '', ' ', " \n\t \r ", '　', [], {} ]
NOT = [ EmptyFalse.new, Object.new, true, 0, 1, 'a', [nil], { nil => 0 } ]
def test_blank
