ActiveSupports implementation of String#blank? is inconsistent with Ruby MRI #9992

SamSaffron opened this Issue Mar 29, 2013 · 6 comments

4 participants


I have been looking at some perf issues around a large amount of blank? calls we are seeing at Discourse. see: #9958

I decided just to rewrite blank? in c for now, while we sort out what we do in AR and Active Support.

While writing this I noticed the blank? implementation is not consistent with Ruby MRI.

[1] pry(main)> "\u2001".blank?
=> true
[2] pry(main)> "\u2001".strip.length
=> 1

[3] pry(main)> "\r\n\v\f\s".strip.length
=> 0
[4] pry(main)> "\r\n\v\f\s".blank?
=> true

Meaning that my naive implementation that is consistent with strip etc, and could possibly be accepted into Ruby core is not a proper replacement for .blank?

Interestingly onigaruma neglects a few unicode spaces, making it even inconsistent with itself.

 # weird ass unicode crap
    # 1680, 180e, 200B, FEFF is missing from blank?
=> true

Is this unicode maddness really needed in blank?

Ruby on Rails member

I think so. Also I don't think we can change this without breaking a lot of applications.


@rafaelfranca one concern I had from testing is:

(16*16*16*16).times do |i|
  c = i.chr('UTF-8') rescue nil
  if c
    puts ("strip: #{i.to_s(16)}") if c.strip.length == 0
    puts ("blank: #{i.to_s(16)}") if c !~ /[^[:space:]]/


ruby test.rb 
strip: 0
strip: 9
blank: 9
strip: a
blank: a
strip: b
blank: b
strip: c
blank: c
strip: d
blank: d
strip: 20
blank: 20
blank: 85
blank: a0
blank: 1680
blank: 180e
blank: 2000
blank: 2001
blank: 2002
blank: 2003
blank: 2004
blank: 2005
blank: 2006
blank: 2007
blank: 2008
blank: 2009
blank: 200a
blank: 2028
blank: 2029
blank: 202f
blank: 205f
blank: 3000

For some reason

"\u0000".blank? is not considered blank, whereas strip would have stripped it. This could potentially affect a bunch of people .I would consider a null terminator to be blank, its also a likely bug in various spots, double null terminating strings etc.


Another slight concern is considering non-breaking spaces as blank, it just feels odd.

Ruby on Rails member

/cc @fxn

Ruby on Rails member
fxn commented Feb 5, 2014

The problem here is defining whitespace.

Once we accept blank strings may have length > 0, the next step is to define what is considered to be blank. The idea would be "zero or more regular characters that have no visible glyph" I guess, because you don't want a username to be empty, but also you don't want it to consist of four spaces or one tab. That suggests the notion of whitespace.

In an ASCII-centric world, we would accept the non-Unicode aware character class \s as the definition of whitespace, but those days are gone and we would wonder why we leave out the whitespace characters as defined by the modern Unicode aware \s.

There was a form not long time ago that required that I filled a "job title" field. I dislike job titles, so refused to enter anything. Solution? I entered a non-breaking space and it passed the validation. My job title is apparently empty in my profile.

I don't claim you cannot still bypass the Unicode validation with some non-visible character, I have not tried it but I bet you could send a control character or something that would pass. That is not the point though because blank is well-defined and doesn't cover those hacks. In any case, Unicode whitespace seems to me to be the most natural modern non-visible notion to base blank on. And that is the rationale behind the current implementation.

I wanted to explain that without resorting to backwards compat, just the rationale. But of course, regardless of the reasoning, backwards compat would forbid touching the semantics of the method.

Since fast_blank is known, and its trade-offs are known, I believe we have to leave things as they are. Users that prefer the speed and semantics of fast_blank can use the gem.

@fxn fxn closed this Feb 5, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment