Speed up String#blank? Regex #24658

schneems · 2016-04-20T20:40:57Z

Follow up on 697384d#commitcomment-17184696.

The regex to detect a blank string /\A[[:space:]]*\z/ will loop through every character in the string to ensure that all of them are a :space: type. We can invert this logic and instead look for any non-:space: characters. When that happens, we would return on the first character found and the regex engine does not need to keep looking.

Thanks @nellshamrell for the regex talk at LSRC.

By defining a "blank" string as any string that does not have a non-whitespace character (yes, double negative) we can get a substantial speed bump.

Also an inline regex is (barely) faster than a regex in a constant, since it skips the constant lookup. A regex literal is frozen by default.
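To make the double negative concrete, here is a minimal sketch checking that the two formulations agree on a handful of inputs. The helper names blank_star? and blank_negated? are made up for illustration and are not part of Active Support.

```ruby
ALL_WHITESPACE_STAR = /\A[[:space:]]*\z/

# Original formulation: the whole string consists of whitespace.
def blank_star?(str)
  str.empty? || ALL_WHITESPACE_STAR === str
end

# Proposed formulation: there is no non-whitespace character anywhere.
def blank_negated?(str)
  str.empty? || !(/[[:^space:]]/ === str)
end

samples = ["", " ", "\t\n ", "a", "  a  ", "\u00a0"]
samples.each do |s|
  puts format("%-12p star=%-5s negated=%s", s, blank_star?(s), blank_negated?(s))
end
```

Both predicates should print the same answer for every sample; only the way the regex engine gets there differs.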

require 'benchmark/ips'

def string_generate
  str = " abcdefghijklmnopqrstuvwxyz\t".freeze
  str[rand(0..(str.length - 1))] * rand(0..23)
end

strings = 100.times.map { string_generate }

ALL_WHITESPACE_STAR = /\A[[:space:]]*\z/

Benchmark.ips do |x|
  x.report('current regex            ') { strings.each {|str| str.empty? || ALL_WHITESPACE_STAR === str } }
  x.report('not a non-whitespace char') { strings.each {|str| str.empty? || !(/[[:^space:]]/ === str) } }
  x.compare!
end

# Warming up --------------------------------------
# current regex
#                          1.744k i/100ms
# not a non-whitespace char
#                          2.264k i/100ms
# Calculating -------------------------------------
# current regex
#                          18.078k (± 8.9%) i/s -     90.688k
# not a non-whitespace char
#                          23.580k (± 7.1%) i/s -    117.728k

# Comparison:
# not a non-whitespace char:    23580.3 i/s
# current regex            :    18078.2 i/s - 1.30x slower

This makes the method roughly 30% faster (23.580 - 18.078)/18.078 * 100.
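The percentage follows directly from the two reported rates. A quick check in Ruby, using only the numbers from the comparison output above (the variable names are illustrative):

```ruby
old_ips = 18_078.2 # "current regex" iterations per second
new_ips = 23_580.3 # "not a non-whitespace char" iterations per second

speedup_pct = (new_ips - old_ips) / old_ips * 100
ratio       = new_ips / old_ips

puts format("%.1f%% faster (%.2fx)", speedup_pct, ratio)
```

Note that both measurements carry an error of roughly 7 to 9 percent, so the figure is an estimate rather than a precise gain.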

cc/ @fxn


See the rationale in the comment in this patch.

To benchmark this I ran a number of variations, ultimately narrowing to

require 'benchmark/ips'

str = ''
regexp = /\A[[:space:]]*\z/

Benchmark.ips do |x|
  x.report('regexp') { regexp === str }
  x.report('empty') { str.empty? || regexp === str }
  x.compare!
end

This benchmark has consistently reported speedups around 3.5x:

Calculating -------------------------------------
              regexp    69.197k i/100ms
               empty   115.468k i/100ms
-------------------------------------------------
              regexp      2.768M (± 6.3%) i/s -     13.839M
               empty      9.643M (± 8.8%) i/s -     47.804M

Comparison:
               empty:  9642607.6 i/s
              regexp:  2768351.9 i/s - 3.48x slower

Sometimes even reaching 4x.

Running the same benchmark on strings of 10 or 100 characters (with whitespace or present) has shown a slowdown of just about 1.01/1.02. Marginal, we seem to have a worthwhile trade-off here.

tenderlove · 2016-04-20T21:20:00Z

activesupport/lib/active_support/core_ext/object/blank.rb

-    empty? || BLANK_RE === self
+    # Regex check is slow, only check non-empty strings.
+    # A string not blank if it contains a single non-space string.
+    empty? || !(/[[:^space:]]/ === self)


Can you do !~ and remove the !? Should be fewer instructions.

Oh sorry, merged without seeing this remark. Let's refine if needed!

I tried benching, and it's in the same ballpark, it keeps on ending up as slower for me. I'm not sure why

not a non-whitespace char:    21258.3 i/s
!~                       :    18617.7 i/s - same-ish: difference falls within error

Which is weird because it is actually fewer instructions

irb(main):162:0* puts RubyVM::InstructionSequence.disasm -> { /[[:^space:]]/ !~ str }
== disasm: #<ISeq:block in irb_binding@(irb)>===========================
== catch table
| catch type: redo   st: 0002 ed: 0013 sp: 0000 cont: 0002
| catch type: next   st: 0002 ed: 0013 sp: 0000 cont: 0013
|------------------------------------------------------------------------
0000 trace            256                                             ( 162)
0002 trace            1
0004 putobject        /[[:^space:]]/
0006 putself
0007 opt_send_without_block <callinfo!mid:str, argc:0, FCALL|VCALL|ARGS_SIMPLE>, <callcache>
0010 opt_send_without_block <callinfo!mid:!~, argc:1, ARGS_SIMPLE>, <callcache>
0013 trace            512
0015 leave
=> nil
irb(main):163:0> puts RubyVM::InstructionSequence.disasm -> { !(/[[:^space:]]/ === str) }
== disasm: #<ISeq:block in irb_binding@(irb)>===========================
== catch table
| catch type: redo   st: 0002 ed: 0016 sp: 0000 cont: 0002
| catch type: next   st: 0002 ed: 0016 sp: 0000 cont: 0016
|------------------------------------------------------------------------
0000 trace            256                                             ( 163)
0002 trace            1
0004 putobject        /[[:^space:]]/
0006 putself
0007 opt_send_without_block <callinfo!mid:str, argc:0, FCALL|VCALL|ARGS_SIMPLE>, <callcache>
0010 opt_send_without_block <callinfo!mid:===, argc:1, ARGS_SIMPLE>, <callcache>
0013 opt_not          <callinfo!mid:!, argc:0, ARGS_SIMPLE>, <callcache>
0016 trace            512
0018 leave
=> nil
irb(main):164:0>

fxn · 2016-04-20T21:30:41Z

Interesting.

In theory both versions could be equally performant. Let's take a present string: In the current regexp, the regexp engine should be able to halt and return false in the leftmost non-space character because of the \A anchor. The alternative does exactly the same, iterating left to right and halt as soon as you find a non-space character.

Then, for blank strings with whitespace both regexps need to exhaust the string to determine all are spaces, or that no non-space was found.

On the other hand, the [[:space:]] class is much smaller than its complement (a handful of characters versus thousands and thousands). But again, the regexp engine would probably optimize that behind the scenes to check membership in the most efficient way, negating if needed.

But, the measures show in practice there is a difference, and I wonder if the quantifier explains it. That is, the work the engine does if a quantifier is involved is maybe more complicated internally than the really straightforward test in this patch. For starters it needs possibly to maintain backtracking points.

In! 😄 🚀

SamSaffron · 2016-04-20T21:49:19Z

or just bundle fast_blank or simply add a comment to the generated config :)

https://github.com/SamSaffron/fast_blank

tenderlove · 2016-04-20T22:04:42Z

@fxn I think the greedy operator (*) in this case will start from the far right side and backtrack (IIRC non-greedy stars are Kleene stars). I think that as the size of the string grows, /\A[[:space:]]*\z/ will get slower than /[[:^space:]]/ due to the greediness.

schneems · 2016-04-20T22:05:04Z

@SamSaffron and @tenderlove while you're here 😉 i've got a question. If we can generate a list of all :space: strings in a hash, then I think we can double current AS String#blank? speeds. I talked about that a bit more in my linked comment.

To do that we'll need to decode this list and represent them in Ruby (https://github.com/ruby/ruby/blob/20cd25c86fd28eb1b5068d0db607e6aa33107f65/enc/unicode/name2ctype.h#L2794-L2807.) any tips on the best way to do that?

SamSaffron · 2016-04-20T22:11:00Z

Have you seen:

https://github.com/SamSaffron/fast_blank/blob/master/ext/fast_blank/fast_blank.c#L29-L79

and

https://github.com/SamSaffron/fast_blank/blob/master/spec/fast_blank_spec.rb#L17-L32

schneems · 2016-04-20T22:13:53Z

I have not, thanks 👏 😄

tenderlove · 2016-04-20T22:14:08Z

@schneems next time please don't use random length strings in benchmarks. It makes the test data non-portable and unpredictable. It's OK to generate random strings, but do it once and throw it in the DATA section at the end of the script. 🙇
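One way to get repeatable input without pasting a DATA section is to seed the random number generator so every run builds the same strings. This is a substitute technique, not the DATA approach suggested above, and unlike literal data it may not yield identical strings across different Ruby implementations. The fixed_strings helper and the seed value are arbitrary choices for illustration.

```ruby
ALPHABET = " abcdefghijklmnopqrstuvwxyz\t".freeze

# Build the same list of strings on every run by using a fixed seed.
def fixed_strings(count: 100, seed: 24_658)
  rng = Random.new(seed)
  Array.new(count) do
    char = ALPHABET[rng.rand(ALPHABET.length)] # one character from the alphabet
    char * rng.rand(0..23)                     # repeated 0 to 23 times
  end
end

strings = fixed_strings
puts strings.first(5).inspect
```

Printing the generated list once and storing it after __END__ would then match the suggestion exactly.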

SamSaffron · 2016-04-20T22:45:57Z

@schneems I recall trying to optimise this regex in the past, you make one change that improves perf for one class of strings and unfortunately some other string is slower. That is what triggered me to make fast_blank.

Ruby core have no plans to add a blank protocol, and the lack of parity with strip has always bugged me quite a lot. Why can the same string be both blank and not length 0 when stripped? It's just weird.
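The mismatch is easy to reproduce with a non-ASCII whitespace character. A small sketch, assuming a UTF-8 string and using U+00A0 (NO-BREAK SPACE) as the example; BLANK_RE here is just a local copy of the regex discussed in this thread:

```ruby
BLANK_RE = /\A[[:space:]]*\z/

nbsp = "\u00a0" # NO-BREAK SPACE, a non-ASCII whitespace character

blank_by_regex = BLANK_RE === nbsp     # [[:space:]] matches Unicode whitespace
empty_by_strip = nbsp.strip.empty?     # strip only removes ASCII whitespace and NUL

puts "blank by regex:       #{blank_by_regex}"
puts "empty after strip:    #{empty_by_strip}"
puts "length after strip:   #{nbsp.strip.length}"
```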

That stuff triggered me to end up writing fast_blank; it is used in production at both GitHub and Discourse. fast_blank's biggest problem is that people are not aware of it. In many apps it ends up giving you a 5% bump, especially for people that are heavy on the if person.name.present? checks which Rails devs appear to be in ❤️ with.

I feel like the best approach is simply to plug that this gem exists (and fast_xs for that matter) in the default generated Gemfile, but really not my call. It is just frustrating sometimes that "Rails... but fast" is a TOP SECRET that is stored in @tmm1's brains and a few other members of the illuminati.

SamSaffron · 2016-04-20T22:50:21Z

Long term probably the best thing to do is lobby Ruby to get

#strip and family corrected/improved to handle unicode blanks AND pull in String#blank (not the entire blank protocol)

That would be the best thing here.

rafaelfranca · 2016-04-20T22:57:54Z

To add more people to that list Shopify also use fast_blank in production.

fxn · 2016-04-20T23:07:49Z

@tenderlove AFAIK it still goes left-to-right so you can fail fast. That is, the engine does not blindly go to the far right, but just matches as much as it can (as long as characters match).

So, when the quantifier is greedy you first eat as much as possible matching, and then check if the rest of the regexp matches. If not, backtrack. When the quantifier is not greedy, you match as little as possible, and if the rest matches you are done, otherwise advance.

In theory, though, if you have a bunch of whitespace and there is a non-space character, the engine reaches it and then backtracks only to fail, because there's no way to match \z in such situation. Perl's for example does backtrack, as this example shows (in Perl regexps (?{ ... }) executes Perl code when the engine reaches that point):

fxn@yeager:~ $ perl -le '"    foo" =~ /\A[[:space:]]*(?{print length $&})\z/'
4
3
2
1
0

That would certainly explain the difference in performance, since the vanilla character class has no backtracking going on.
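The greedy versus non-greedy distinction described in this comment can be observed from Ruby as well. The sketch below only shows what each quantifier ends up matching; it does not trace the engine's backtracking steps the way the Perl one-liner does.

```ruby
str = "    foo"

greedy = str[/\A[[:space:]]*/]   # takes as many leading spaces as it can
lazy   = str[/\A[[:space:]]*?/]  # takes as few as it can

puts "greedy matched #{greedy.length} characters"
puts "lazy matched #{lazy.length} characters"

# With \z required, neither variant can match, because "foo" follows the spaces.
greedy_anchored = /\A[[:space:]]*\z/ =~ str
lazy_anchored   = /\A[[:space:]]*?\z/ =~ str

puts greedy_anchored.inspect
puts lazy_anchored.inspect
```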

SamSaffron · 2016-04-20T23:33:33Z

I opened this issue to see if we can get this implemented in Ruby

https://bugs.ruby-lang.org/issues/12306


SamSaffron · 2016-04-21T02:31:20Z

@schneems @fxn @tenderlove help convince Matz that there is real world use of String#blank? and we want it to be a thing on https://bugs.ruby-lang.org/issues/12306

nellshamrell · 2016-04-21T19:27:44Z

So, so, so glad the "Beneath the Surface: Regular Expressions in Ruby" was helpful!

fxn · 2016-04-21T19:56:18Z

I have benchmarked both regexps against a non-blank string of length 1:

require 'benchmark/ips'

str      = 'a'
positive = /\A[[:space:]]*\z/
negative = /[[:^space:]]/

Benchmark.ips do |x|
  x.report('positive') { str =~ positive }
  x.report('negative') { str !~ negative }
  x.compare!
end

There is no significant backtracking going on because \A[[:space:]]* only matches at position 0. Then \z fails, done. The result is still slower in magnitudes similar to those found by @schneems:

fxn@yeager:~/tmp $ ruby foo.rb
Calculating -------------------------------------
            positive    59.972k i/100ms
            negative    62.500k i/100ms
-------------------------------------------------
            positive      1.915M (± 5.7%) i/s -      9.596M
            negative      2.456M (± 6.6%) i/s -     12.250M

Comparison:
            negative:  2456491.7 i/s
            positive:  1915258.1 i/s - 1.28x slower

I changed the quantifier to + to skip even that null match, and still seeing 1.30x factors.

So, albeit when the string has a large prefix with whitespace backtracking could have perhaps a cost (speculation), these tests seem to indicate it does not per se explain that 30%.

My hunch is that the engine just does more work due to the complication in the regexp, whereas the other one is very straightforward (but don't really know it).

Further investigation seems to disprove that backtracking is the reason why the positive variant is slower, see #24658 (comment) so, just say nothing about it, only assert it is slower.

fxn · 2016-04-22T09:40:06Z

@k-takata, just for curiosity, could you shed some light about what really explains the difference in performance seen in the benchmark in the previous comment? That would be awesome to understand!

nurse · 2016-04-24T12:47:44Z

=~ is faster than === for Regexp match because it is optimized into opt_regexpmatch1 in YARV.

require 'benchmark/ips'

def string_generate
  str = " abcdefghijklmnopqrstuvwxyz\t".freeze
  str[rand(0..(str.length - 1))] * rand(0..23)
end

strings = 100.times.map { string_generate }

ALL_WHITESPACE_STAR = /\A[[:space:]]*\z/
NON_SPACE = /[[:^space:]]/

Benchmark.ips do |x|
  x.report('old regex            ') { strings.each {|str| str.empty? || ALL_WHITESPACE_STAR === str } }
  x.report('current regex constant ===') { strings.each {|str| str.empty? || !(NON_SPACE === str) } }
  x.report('current regex constant =~') { strings.each {|str| str.empty? || !(NON_SPACE =~ str) } }
  x.report('current regex literal ===') { strings.each {|str| str.empty? || !(/[[:^space:]]/ === str) } }
  x.report('current regex literal =~') { strings.each {|str| str.empty? || !(/[[:^space:]]/ =~ str) } }
  x.report('current regex literal !~') { strings.each {|str| str.empty? || (/[[:^space:]]/ !~ str) } }
  x.compare!
end

Warming up --------------------------------------
old regex
                         1.825k i/100ms
current regex constant ===
                         2.065k i/100ms
current regex constant =~
                         1.829k i/100ms
current regex literal ===
                         1.914k i/100ms
current regex literal =~
                         2.239k i/100ms
current regex literal !~
                         1.880k i/100ms
Calculating -------------------------------------
old regex
                         17.430k (±16.3%) i/s -     85.775k in   5.132384s
current regex constant ===
                         20.696k (±17.3%) i/s -     99.120k in   5.009458s
current regex constant =~
                         20.835k (±14.3%) i/s -    102.424k in   5.053427s
current regex literal ===
                         19.547k (±14.5%) i/s -     95.700k in   5.022457s
current regex literal =~
                         21.587k (±19.8%) i/s -    102.994k in   5.064171s
current regex literal !~
                         18.080k (±18.7%) i/s -     86.480k in   5.023627s

Comparison:
current regex literal =~:    21587.0 i/s
current regex constant =~:    20835.1 i/s - same-ish: difference falls within error
current regex constant ===:    20696.1 i/s - same-ish: difference falls within error
current regex literal ===:    19547.0 i/s - same-ish: difference falls within error
current regex literal !~:    18079.8 i/s - same-ish: difference falls within error
old regex            :    17429.8 i/s - same-ish: difference falls within error

% ruby -e'$><<RubyVM::InstructionSequence.new(p %q{ !(/[[:^space:]]*/ === str) }).disasm'
" !(/[[:^space:]]*/ === str) "
== disasm: #<ISeq:<compiled>@<compiled>>================================
0000 trace            1                                               (   1)
0002 putobject        /[[:^space:]]*/
0004 putself
0005 opt_send_without_block <callinfo!mid:str, argc:0, FCALL|VCALL|ARGS_SIMPLE>, <callcache>
0008 opt_send_without_block <callinfo!mid:===, argc:1, ARGS_SIMPLE>, <callcache>
0011 opt_not          <callinfo!mid:!, argc:0, ARGS_SIMPLE>, <callcache>
0014 leave
% ruby -e'$><<RubyVM::InstructionSequence.new(p %q{ /[[:^space:]]*/ !~ str }).disasm'
" /[[:^space:]]*/ !~ str "
== disasm: #<ISeq:<compiled>@<compiled>>================================
0000 trace            1                                               (   1)
0002 putobject        /[[:^space:]]*/
0004 putself
0005 opt_send_without_block <callinfo!mid:str, argc:0, FCALL|VCALL|ARGS_SIMPLE>, <callcache>
0008 opt_send_without_block <callinfo!mid:!~, argc:1, ARGS_SIMPLE>, <callcache>
0011 leave
% ruby -e'$><<RubyVM::InstructionSequence.new(p %q{ !(/[[:^space:]]*/ =~ str) }).disasm'
" !(/[[:^space:]]*/ =~ str) "
== disasm: #<ISeq:<compiled>@<compiled>>================================
0000 trace            1                                               (   1)
0002 putself
0003 opt_send_without_block <callinfo!mid:str, argc:0, FCALL|VCALL|ARGS_SIMPLE>, <callcache>
0006 opt_regexpmatch1 /[[:^space:]]*/
0008 opt_not          <callinfo!mid:!, argc:0, ARGS_SIMPLE>, <callcache>
0011 leave

SamSaffron · 2016-04-27T07:43:28Z

I would strongly advise caution here, @schneems it looks like the change you added actually makes stuff significantly slower for longer strings. I added a more updated bench here:

https://github.com/SamSaffron/fast_blank/blob/master/benchmark

In particular you got to benchmark for strings of various length, my bench does 0,6,14,24,136 ... you probably want to add a super long string as well to ensure there is no pathological case
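A rough sketch of such a length sweep, using only the standard library so it runs without extra gems. The lengths 0, 6, 14, 24 and 136 come from the comment above; the 1,000-character case, the two input shapes, the iteration count and all helper names are assumptions made for illustration, and this is not the linked fast_blank benchmark.

```ruby
OLD_RE = /\A[[:space:]]*\z/
NEW_RE = /[[:^space:]]/

def old_blank?(str)
  str.empty? || OLD_RE === str
end

def new_blank?(str)
  str.empty? || !(NEW_RE === str)
end

# Wall-clock seconds for `iterations` calls of the block, using a while loop.
def time_it(iterations)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  i = 0
  while i < iterations
    yield
    i += 1
  end
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

LENGTHS = [0, 6, 14, 24, 136, 1_000].freeze

# Assumed input shapes: all spaces (blank) and all letters (present).
SHAPES = { "blank" => " ", "present" => "x" }.freeze

results = SHAPES.flat_map do |label, char|
  LENGTHS.map do |len|
    str = (char * len).freeze
    old_t = time_it(5_000) { old_blank?(str) }
    new_t = time_it(5_000) { new_blank?(str) }
    [label, len, old_t, new_t]
  end
end

results.each do |label, len, old_t, new_t|
  puts format("%-7s len=%-5d old=%.4fs new=%.4fs", label, len, old_t, new_t)
end
```

Timings from a loop like this are noisy, so they are best read as a hint about which lengths deserve a proper benchmark-ips run.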

fast_blank remains significantly faster (except for the one tiny case that is shortcutted using empty? which it is only 10% faster.)

cc @nurse

schneems · 2016-04-27T15:19:49Z

On my machine i'm still seeing that the new method is an improvement over the old

require 'benchmark/ips'

LONG_STRING = "   this is a longer test
      this is a longer test
      this is a longer test
      this is a longer test
      this is a longer test"


Benchmark.ips do |x|
  x.report('old') { LONG_STRING.empty? || /\A[[:space:]]*\z/ === LONG_STRING }
  x.report('new') { LONG_STRING.empty? || !(/[[:^space:]]/ === LONG_STRING) }
  x.compare!
end


# Warming up --------------------------------------
#                  old   124.610k i/100ms
#                  new   154.021k i/100ms
# Calculating -------------------------------------
#                  old      1.899M (± 7.2%) i/s -      9.470M in   5.014725s
#                  new      2.674M (± 9.4%) i/s -     13.400M in   5.070622s

# Comparison:
#                  new:  2673836.1 i/s
#                  old:  1899274.9 i/s - 1.41x slower

It's actually slightly faster with the longer string than in my original test with strings of variable length up to 80 characters.

schneems · 2016-04-27T15:24:03Z

Btw I'm in favor of adding this gem to rails but I think we need to work with jruby out of the box too.

Ideally we could get this in MRI proper.

SamSaffron · 2016-04-27T20:56:22Z

@schneems I am not sure IPS is being invoked right there, you should use the while loop

x.report("New Slow Blank") do |times|
  i = 0
  while i < times
    s.new_slow_blank?
    i += 1
  end
end

What Ruby version are you on?

schneems · 2016-04-27T21:49:06Z

Ruby 2.3.1

irb(main):031:0* Benchmark.ips do |x|
irb(main):032:1*   x.report('old') { |t| t.times.each { LONG_STRING.empty? || /\A[[:space:]]*\z/ === LONG_STRING } }
irb(main):033:1>   x.report('new') { |t| t.times.each { LONG_STRING.empty? || !(/[[:^space:]]/ === LONG_STRING) } }
irb(main):034:1>   x.compare!
irb(main):035:1> end
Warming up --------------------------------------
                 old    71.399k i/100ms
                 new    81.750k i/100ms
Calculating -------------------------------------
                 old      1.946M (±11.0%) i/s -      9.639M in   5.019133s
                 new      2.828M (±12.0%) i/s -     13.979M in   5.021530s

Comparison:
                 new:  2827559.9 i/s
                 old:  1945769.3 i/s - 1.45x slower

SamSaffron · 2016-04-27T22:56:30Z

wow... Ruby performance is hard, will ping @ko1 on this cause it is super confusing

https://gist.github.com/SamSaffron/d1a9cc8e141e7415e06306369fdedfe5

SamSaffron · 2016-04-27T23:58:42Z

OK I think I know why this is slower in the less artificial benchmark

New regex allocates more stuff, a lot more stuff.

https://gist.github.com/SamSaffron/f73fd0395e050e927d1a3137373eeaee

449 bytes in the new version, 169 in the old version. Once extracted into a method, regex magic is causing less stuff to be reused in globals.

nurse · 2016-04-28T07:19:26Z

@schneems @SamSaffron t.times.each is slow because block invocation is slow, and it disables Integer#times' optimization. Use while loop instead for such micro benchmark.

schneems · 2016-04-28T13:55:13Z

This is by far the most comments i've ever gotten on a 1 line code change. This is all pretty weird.

I re-tried with the while method of looping and get similar results

require 'benchmark/ips'

LONG_STRING = "   this is a longer test
      this is a longer test
      this is a longer test
      this is a longer test
      this is a longer test"


Benchmark.ips do |x|
  x.report('old') do |times| 
    i = 0
    while i < times
      LONG_STRING.empty? || /\A[[:space:]]*\z/ === LONG_STRING
      i += 1
    end
  end
  x.report('new') do |times| 
    i = 0
    while i < times
      LONG_STRING.empty? || !(/[[:^space:]]/ === LONG_STRING)
      i += 1
    end
  end
  x.compare!
end
# Warming up --------------------------------------
#                  old   124.919k i/100ms
#                  new   149.075k i/100ms
# Calculating -------------------------------------
#                  old      2.258M (± 7.0%) i/s -     11.243M in   5.006437s
#                  new      3.200M (±10.1%) i/s -     15.951M in   5.044622s

# Comparison:
#                  new:  3199827.3 i/s
#                  old:  2257599.3 i/s - 1.42x slower

However when I put those regexes into a method I do see the slow down and the "new" method is 2x slower than the old method. I was under the impression that regex literals are essentially frozen. Assigning it to a constant doesn't help.

It looks like the memory use is larger because a character is matched and a MatchData object is created. If you make the string blank instead, e.g. LONG_STRING = " ", then the old regex will generate and return a match, which will also allocate a MatchData.

When I repeated the benchmark with the regular expressions in methods with a long string that matches the old regex, they are roughly the same speed.

I'm a bit more confused now than I was before.

schneems · 2016-04-28T14:05:56Z

Here are my two benchmarks side by side https://gist.github.com/schneems/330efedbe310e59ad2f0f3e35358d3d5

fxn · 2016-04-28T14:08:32Z

To add more entropy to the party, this new regexp is basically a revert of @tmm1's 30ba7ee (except for the operator and the negation of the character class) 😄.

SamSaffron · 2016-04-29T06:30:05Z

@schneems the results make sense this all boils down to yet another issue that has been open for over 3 years in Ruby core

https://bugs.ruby-lang.org/issues/8110

Basically, Oniguruma supports a no-backref mode, but there is no way to invoke it from Ruby. Carrying around the backrefs and the other mountains of globals for EVERY regex makes it impossible to write super fast regular expressions in Ruby. I am willing to bet that Ruby has some optimisations to maintain some of the allocated stuff when invoked in a loop and drops some reuse when methods are invoked.

Hopefully @matz can accept some mechanism in Ruby to tap Oniguruma without backrefs and regexes without globals.
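As a note from outside this thread: Ruby 2.4 and later provide Regexp#match?, which is documented as not updating $~. The sketch below only demonstrates that difference in side effects between === and match?; it makes no claim about how much time or memory is saved, and the method names are illustrative.

```ruby
NON_SPACE = /[[:^space:]]/

# $~ is local to each method call, so it starts out nil in both methods.
def backref_after_case_eq(str)
  NON_SPACE === str
  $~ # set to a MatchData when the match succeeds
end

def backref_after_match_p(str)
  NON_SPACE.match?(str)
  $~ # left untouched by match?
end

puts backref_after_case_eq("a").inspect
puts backref_after_match_p("a").inspect
```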

schneems · 2016-04-29T13:50:46Z

I'm still confused about why this only happens in a method. You would think we would be creating all those globals when we're at the main context just as when we're in a method. You would also think that in both cases we're using a regex. The old regex on the surface looks more complicated than the new regex, so while I understand that there is MatchData created, it doesn't fully explain the speed difference. I tried to make a case where the "old" method is slower and...

I tried setting LONG_STRING to something that would match the "old" string and not the new string, you would expect that to create MatchData and be equally slow but the two ended up roughly the same speeds https://gist.github.com/schneems/0c9b81e1542f457092c8bdaeacb01312

It looks like we should revert my change for speed, i'm happy to do that. But I do want to fully understand the minutiae here for future optimizations.

fxn · 2016-04-29T14:26:08Z

Agree, this is the kind of thing that needs a clear benefit. We could perhaps revert and still be open to revisit if further research shows an undebatable speedup.

@SamSaffron regarding your gem, I personally use it in my projects. But to include it in the generated Gemfile, even commented out, is a different story.

One minor point that I am sure we could solve somehow is that we need blank_as? as blank? transparently.

The more fundamental con, however, is that we would be committing to a substitute outside the realm of the project. Separate repos, separate test suites... I think everything is more ordered if people just opt-in by themselves.

And in any case, it is our duty to provide the best pure Ruby implementation we can think of regardless.

This commit undoes 54243fe. Reason: Further investigation has shown the benefit is not so clear generally speaking. There is a long discussion and several benchmarks in the PR #24658 if you are interested in the details.

SamSaffron · 2016-05-02T00:38:04Z

I think everything is more ordered if people just opt-in by themselves.

I totally understand the sentiment, but the same argument can apply to uglifyjs or coffeescript or mime types. Plenty of dependencies come from the outside; as it stands we are giving an edge to the blessed few that know about fast_xs and fast_blank. This knowledge is tribal, and there is no way to get the word out.

That said, I totally respect whatever is decided here. Ultimately @ko1 agrees with me that we need a mechanism to tap Oniguruma without forcing backrefs and globals for every regex test. Once that is implemented we could achieve significant perf improvements to this blank regex and many other spots in Rails.

nurse · 2016-05-24T13:11:31Z

Anyway your discussion seems to be derived from tmm1's profile.
As far as it is still true now, implementing String#present? with regexp instead of calling String#blank? can speed up by one method call without side effect.
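A minimal sketch of that suggestion, written as plain methods taking the string as an argument rather than as a patch to String. All three method names and the local BLANK_RE constant are hypothetical and only illustrate the difference in call depth.

```ruby
BLANK_RE = /\A[[:space:]]*\z/

def blank_str?(str)
  str.empty? || BLANK_RE === str
end

# present? defined through blank?: two method calls per check.
def present_via_blank?(str)
  !blank_str?(str)
end

# present? with the regexp test inlined: one method call per check.
def present_inline?(str)
  !(str.empty? || BLANK_RE === str)
end

["", "  ", "name", " x "].each do |s|
  puts format("%-8p via_blank=%-5s inline=%s", s, present_via_blank?(s), present_inline?(s))
end
```

Both versions return the same result; whether the saved method call is measurable would need a benchmark of its own.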


maclover7 added the activesupport label Apr 20, 2016

tenderlove reviewed Apr 20, 2016

fxn merged commit 115efeb into rails:master Apr 20, 2016

kamipo added a commit to kamipo/rails that referenced this pull request Apr 21, 2016

Remove unused BLANK_RE

c7617c7

Follow up to rails#24658.

kamipo mentioned this pull request Apr 21, 2016

Remove unused BLANK_RE #24663

Merged

schneems referenced this pull request Apr 28, 2016

Expand double-negative in String#blank? regex

30ba7ee

lbennett-stacki pushed a commit to gitlabhq/gitlabhq that referenced this pull request Jan 18, 2018

Use the fast_blank gem

70ac55b

See rails/rails#24658 (comment)
