~3.5x speedup of String#blank? for empty strings · rails/rails@697384d

kaspth · 2016-04-20T14:51:35Z

BOOM 💪

DefV · 2016-04-20T14:57:12Z

jeremy · 2016-04-20T17:07:54Z

🙇

schneems · 2016-04-20T19:50:25Z

Really interesting idea. I think it's unique, especially since the method is so small, but by adding non-regex logic we can get some speed boosts. Thanks for this write-up.

I profiled codetriage, and it looks like the majority of calls to blank? on strings are on non-empty strings. This is a case where the performance depends largely on the samples. I ran some benchmarks with some randomly generated strings:

require 'benchmark/ips'

EMPTYREGEX = /\A[[:space:]]*\z/

def string_generate
  str = " abcdefghijklmnopqrstuvwxyz\t".freeze
  str[rand(0..(str.length - 1))] * rand(0..23)
end

strings = 100.times.map { string_generate }

Benchmark.ips do |x|
  x.report('regexp           ') { strings.each {|str| EMPTYREGEX === str } }
  x.report('empty            ') { strings.each {|str| str.empty? || EMPTYREGEX === str } }
  x.compare!
end

# Warming up --------------------------------------
#    regexp                1.941k i/100ms
#    empty                 1.982k i/100ms
# Calculating -------------------------------------
#    regexp                19.170k (± 8.2%) i/s -     97.050k
#    empty                 18.259k (±14.6%) i/s -     91.172k

# Comparison:
#    regexp           :    19170.2 i/s
#    empty            :    18258.6 i/s - same-ish: difference falls within error

Even while "randomly" generating the strings, I'm able to tweak the numbers to favor one or the other, but both fall in the same ballpark.

I think it is a good idea: if we separate out empty? strings earlier, we can use extra logic to avoid the regex in even more places.

Running with it

Since the majority of cases are strings that are not solely composed of whitespace characters, and the slow part of the check is the regex, we could skip the regex if we know the first character of the string is not a whitespace character.

If we did that, we could see about a 2x speed boost (on my sample set).

require 'benchmark/ips'

EMPTYREGEX = /\A[[:space:]]*\z/

def string_generate
  str = " abcdefghijklmnopqrstuvwxyz\t".freeze
  str[rand(0..(str.length - 1))] * rand(0..23)
end

strings = 100.times.map { string_generate }

SPACE_HASH = {}
SPACE_HASH[" "]  = true
SPACE_HASH["\t"] = true

Benchmark.ips do |x|
  x.report('regexp           ') { strings.each {|str| EMPTYREGEX === str } }
  x.report('empty            ') { strings.each {|str| str.empty? || EMPTYREGEX === str } }
  x.report('first char       ') { strings.each {|str| (str.empty? || SPACE_HASH[str[0]]) ? EMPTYREGEX === str : false } }
  x.compare!
end

# Warming up --------------------------------------
#    regexp                1.868k i/100ms
#    empty                 1.920k i/100ms
#    first char            3.423k i/100ms
# Calculating -------------------------------------
#    regexp                19.244k (± 7.5%) i/s -     97.136k
#    empty                 19.247k (± 7.3%) i/s -     97.920k
#    first char            35.327k (± 6.4%) i/s -    177.996k

# Comparison:
#    first char       :    35327.1 i/s
#    empty            :    19247.2 i/s - 1.84x slower
#    regexp           :    19243.8 i/s - 1.84x slower

Which is pretty good. Unfortunately, this would mean we need to know all the characters that Ruby considers /[[:space:]]/. I checked the source and, well... https://github.com/ruby/ruby/blob/20cd25c86fd28eb1b5068d0db607e6aa33107f65/enc/unicode/name2ctype.h#L2794-L2807 I would need help translating that into Ruby strings, but it looks like it's worth the legwork.
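One possible way to avoid hand-translating the name2ctype.h table is to let Ruby's regex engine enumerate the set once at load time. This is a hypothetical sketch, not what Rails does: SINGLE_SPACE_RE, SPACE_CHARS and first_char_blank? are made-up names, and it assumes every [[:space:]] character lies in the Basic Multilingual Plane (U+0000 to U+FFFF).

```ruby
# Regex from the benchmark above: a string is blank if it is all [[:space:]].
BLANK_RE = /\A[[:space:]]*\z/
# Hypothetical helper regex: matches exactly one [[:space:]] character.
SINGLE_SPACE_RE = /\A[[:space:]]\z/

# Build a lookup table of every single-character string that the regex
# engine treats as [[:space:]]. Assumption: none lie outside the BMP,
# so only U+0000..U+FFFF is scanned.
SPACE_CHARS = {}
(0..0xFFFF).each do |cp|
  # Surrogate code points cannot be encoded on their own in UTF-8.
  next if cp.between?(0xD800, 0xDFFF)
  ch = cp.chr(Encoding::UTF_8)
  SPACE_CHARS[ch] = true if SINGLE_SPACE_RE === ch
end
SPACE_CHARS.freeze

# Hypothetical fast path: run the full regex only when the first
# character could start a blank string.
def first_char_blank?(str)
  return true if str.empty?
  return false unless SPACE_CHARS[str[0]]
  BLANK_RE === str
end
```

Because the table is derived from the same character class, it should agree with BLANK_RE === str for UTF-8 strings under the BMP assumption; whether the hash lookup actually beats the regex would still need to be measured with the benchmark above.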

Make the regex faster

If we don't want to do that, we could speed up the regex a little by using + instead of *, since we know the string has at least one character (it is not empty?). We could also reverse the logic and use a regex that detects a non-whitespace character, !(/\S/ === str), which would be slightly faster than +, as it would stop at the first non-whitespace character instead of continuing to iterate. However, I don't know if that's a technically correct solution (i.e., is /\S/ the technical opposite of /\A[[:space:]]*\z/?).
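Since empty? already handles the zero-length case, tightening * to + should not change any result. A minimal sketch of that variant; BLANK_PLUS_RE and blank_with_plus? are hypothetical names, and any speed difference between the two quantifiers is unmeasured here.

```ruby
# Original pattern: zero or more [[:space:]] characters, anchored at both ends.
BLANK_RE = /\A[[:space:]]*\z/
# Hypothetical variant: one or more, safe once empty strings are excluded.
BLANK_PLUS_RE = /\A[[:space:]]+\z/

def blank_with_plus?(str)
  # empty? short-circuits, so the regex only ever sees non-empty strings,
  # where `*` and `+` accept exactly the same inputs.
  str.empty? || BLANK_PLUS_RE === str
end
```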

fxn · 2016-04-20T20:18:27Z

Need to digest your comment (having dinner over here!), but just a quick answer to the last question in case it helps you continue investigating.

The difference is that [[:space:]] is a Unicode character class, whereas \s is an ASCII character class so to speak (a strict subset of the former). The complement of the Unicode class would be [[:^space:]].
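The distinction can be checked directly. A sketch with hypothetical method names that searches for any non-space character instead of anchoring the whole string: the [[:^space:]] complement should agree with the original regex, while the ASCII \S misclassifies Unicode whitespace such as U+0085.

```ruby
BLANK_RE = /\A[[:space:]]*\z/

# Blank means "contains no character outside [[:space:]]"; the search
# can stop at the first non-space character it finds.
def blank_via_unicode_complement?(str)
  !(/[[:^space:]]/ === str)
end

# Same idea with \S, the complement of the ASCII-only \s class.
def blank_via_ascii_complement?(str)
  !(/\S/ === str)
end

samples = ["", " \t\n", "abc", "\u0085", "\u00a0\u3000"]
samples.each do |s|
  puts "#{s.inspect}: anchored=#{BLANK_RE === s} " \
       "unicode=#{blank_via_unicode_complement?(s)} " \
       "ascii=#{blank_via_ascii_complement?(s)}"
end
```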

schneems · 2016-04-20T20:51:27Z

Thank you, please enjoy your meal :).

I see that \S would return the wrong value for "\u0085". I submitted a pull request with the new regex logic: #24658

I still think we could speed up the check with a hash of "space" characters, but we need to be careful. Even if we do that, we'll still benefit from a faster regex.

@@ -112,7 +112,12 @@ class String
         #
         # @return [true, false]
         def blank?
-          BLANK_RE === self
+          # In practice, the majority of blank strings are empty. As of this writing
+          # checking for empty? is about 3.5x faster than matching against the regexp
+          # in MRI, so we call the predicate first, and then fallback.
+          #
+          # The penalty for blank strings with whitespace or present ones is marginal.
+          empty? || BLANK_RE === self
         end
       end
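Outside of Active Support's core extension, the committed change amounts to the check below. A sketch using hypothetical free-standing methods (Active Support itself defines String#blank?); it illustrates that putting empty? first only changes how quickly the answer arrives, not the answer.

```ruby
BLANK_RE = /\A[[:space:]]*\z/

# Before the commit: every call runs the regex.
def string_blank_regex_only?(str)
  BLANK_RE === str
end

# After the commit: the cheap length check handles "", and only
# non-empty strings fall back to the regex.
def string_blank?(str)
  str.empty? || BLANK_RE === str
end
```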


6 comments on commit `697384d`