treat fullwidth whitespace as a blank character

commit 9c60860322d58e1a2fcd42f1a21ec86ec2caa50b (1 parent: 704ee0d)
@amatsuda authored
6 activesupport/lib/active_support/core_ext/object/blank.rb
@@ -86,14 +86,18 @@ class Hash
end
class String
+ # 0x3000: fullwidth whitespace
+ NON_WHITESPACE_REGEXP = %r![^\s#{[0x3000].pack("U")}]!

@markburns

Does this cover only one extra whitespace character?

That is, should it also handle the other possibilities?

From Wikipedia:
U+0009–U+000D (control characters, including Tab, LF and CR)
U+0020 SPACE
U+0085 NEL (control character next line)
U+00A0 NBSP (NO-BREAK SPACE)
U+1680 OGHAM SPACE MARK
U+180E MONGOLIAN VOWEL SEPARATOR
U+2000–U+200A (different sorts of spaces)
U+2028 LS (LINE SEPARATOR)
U+2029 PS (PARAGRAPH SEPARATOR)
U+202F NNBSP (NARROW NO-BREAK SPACE)
U+205F MMSP (MEDIUM MATHEMATICAL SPACE)
U+3000 IDEOGRAPHIC SPACE

See also http://en.wikipedia.org/wiki/Space_(punctuation)#Spaces_in_Unicode

@pixeltrix Owner

See the discussion on the ticket: #2052
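As a rough sketch of one way to cover more of the list quoted above (this is not what the commit does), Ruby's POSIX bracket `[[:space:]]` matches Unicode whitespace in UTF-8 strings, whereas `\s` matches only ASCII whitespace. The helper names `unicode_blank?` and `ascii_blank?` are hypothetical, and the sketch assumes Ruby 2.4+ for `Regexp#match?`.

```ruby
# Hypothetical helper (not ActiveSupport's API): a string is blank when it is
# empty or contains only whitespace.
#
# In Ruby's regexp engine, \s matches only ASCII whitespace ([ \t\r\n\f\v]),
# whereas the POSIX bracket [[:space:]] also matches Unicode whitespace
# (e.g. U+00A0, U+2000..U+200A, U+3000) in Unicode-encoded strings.
UNICODE_BLANK_RE = /\A[[:space:]]*\z/

def unicode_blank?(str)
  UNICODE_BLANK_RE.match?(str)
end

# The pre-commit check: blank unless some non-\s character is present.
def ascii_blank?(str)
  str !~ /\S/
end

samples = ["", " \t\n", "\u00A0", "\u2003", "\u3000", " a "]
samples.each do |s|
  puts format("%-12s ascii_blank?=%-5s unicode_blank?=%s",
              s.dump, ascii_blank?(s), unicode_blank?(s))
end
```

The anchored `\A...\z` form asks "is every character whitespace?" directly, rather than negating a search for a non-whitespace character.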

+
# A string is blank if it's empty or contains whitespaces only:
#
# "".blank? # => true
# " ".blank? # => true
+ # "　".blank? # => true
# " something here ".blank? # => false
#
def blank?
- self !~ /\S/
+ self !~ NON_WHITESPACE_REGEXP
end
@spastorino Owner

This raises Encoding::CompatibilityError when self is not compatible with UTF-8. It also makes this test fail, since compressed is ASCII-8BIT: https://github.com/rails/rails/blob/master/activesupport/test/gzip_test.rb#L16
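A minimal reproduction of the problem described above, assuming Ruby 2.0+ (for `String#b`). The regexp is built the same way as NON_WHITESPACE_REGEXP in this commit; the byte string is only a stand-in for compressed data (0x1f 0x8b is the gzip magic number) and is not taken from gzip_test.rb.

```ruby
# Same construction as the commit's NON_WHITESPACE_REGEXP: interpolating the
# UTF-8 character U+3000 fixes the regexp's encoding to UTF-8.
NON_WHITESPACE_REGEXP = %r![^\s#{[0x3000].pack("U")}]!

# Stand-in for compressed data: an ASCII-8BIT (binary) string that
# contains non-ASCII bytes.
compressed = "\x1f\x8b\x08\x00\xff".b

puts NON_WHITESPACE_REGEXP.encoding        # UTF-8
puts NON_WHITESPACE_REGEXP.fixed_encoding? # true
puts compressed.encoding

begin
  compressed !~ NON_WHITESPACE_REGEXP
rescue Encoding::CompatibilityError => e
  # A fixed-encoding UTF-8 regexp cannot match a binary string
  # that contains non-ASCII bytes.
  puts "#{e.class}: #{e.message}"
end

# The original pattern /\S/ has an ASCII-only source and no fixed encoding,
# so matching it against the binary string does not raise.
puts(compressed !~ /\S/) # false: the string contains non-whitespace bytes
```

A binary string made only of ASCII bytes would not trigger the error, which is why the failure shows up with real compressed output rather than with ordinary text.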

end
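For UTF-8 strings, the effect of this change can be checked in isolation. The sketch below copies the commit's regexp into a standalone script; `commit_blank?` and `old_blank?` are hypothetical names that mirror the patched and original String#blank? without reopening String.

```ruby
# Mirrors the regexp added in this commit: any character that is neither
# ASCII whitespace (\s) nor U+3000 IDEOGRAPHIC SPACE counts as "non-blank".
NON_WHITESPACE_REGEXP = %r![^\s#{[0x3000].pack("U")}]!

# Hypothetical standalone version of the patched String#blank?.
def commit_blank?(str)
  str !~ NON_WHITESPACE_REGEXP
end

# Pre-commit behaviour, for comparison.
def old_blank?(str)
  str !~ /\S/
end

samples = ["", "   ", " \n\t \r ", "\u3000", "\u3000 \u3000",
           " something here ", "\u00A0"]
samples.each do |s|
  puts format("%-22s old=%-5s new=%s", s.dump, old_blank?(s), commit_blank?(s))
end
```

Note that U+00A0 (NO-BREAK SPACE) is still reported as non-blank, since the commit adds only U+3000 to the whitespace set.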
2 activesupport/test/core_ext/blank_test.rb
@@ -2,7 +2,7 @@
require 'active_support/core_ext/object/blank'
class BlankTest < Test::Unit::TestCase
- BLANK = [ EmptyTrue.new, nil, false, '', ' ', " \n\t \r ", [], {} ]
+ BLANK = [ EmptyTrue.new, nil, false, '', ' ', " \n\t \r ", '　', [], {} ]
NOT = [ EmptyFalse.new, Object.new, true, 0, 1, 'a', [nil], { nil => 0 } ]
def test_blank