Speed up String#blank? Regex #24658

Merged
merged 1 commit into from Apr 20, 2016

8 participants
@schneems
Member

schneems commented Apr 20, 2016

Follow up on 697384d#commitcomment-17184696.

The regex to detect a blank string, /\A[[:space:]]*\z/, has to walk every character in the string to ensure that all of them are of the :space: type. We can invert this logic and instead look for any non-:space: character. As soon as one is found we can return, and the regex engine does not need to keep looking.

Thanks @nellshamrell for the regex talk at LSRC.

By defining a "blank" string as any string that does not have a non-whitespace character (yes, double negative) we can get a substantial speed bump.
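
As a minimal sketch of that double negative, the snippet below checks that the two definitions agree on a handful of strings. The helper names `blank_old?` and `blank_new?` and the sample list are made up for illustration; only the two regexes come from this PR.

```ruby
# Hypothetical helper names; only the two regexes are taken from the PR.
OLD_BLANK_RE = /\A[[:space:]]*\z/

# Old definition: every character is whitespace.
def blank_old?(str)
  str.empty? || OLD_BLANK_RE === str
end

# New definition: there is no non-whitespace character.
def blank_new?(str)
  str.empty? || !(/[[:^space:]]/ === str)
end

# Made-up sample strings, assumed to be valid UTF-8.
SAMPLES = ["", " ", "\t\n ", "\u00a0", "a", "  a  ", " \t x"].freeze

SAMPLES.each do |s|
  puts format("%-10s old=%-5s new=%-5s", s.inspect, blank_old?(s), blank_new?(s))
end
```

Both helpers keep the `empty?` short-circuit so the regex only runs on non-empty strings.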

Also an inline regex is (barely) faster than a regex in a constant, since it skips the constant lookup. A regex literal is frozen by default.

require 'benchmark/ips'

def string_generate
  str = " abcdefghijklmnopqrstuvwxyz\t".freeze
  str[rand(0..(str.length - 1))] * rand(0..23)
end

strings = 100.times.map { string_generate }

ALL_WHITESPACE_STAR = /\A[[:space:]]*\z/

Benchmark.ips do |x|
  x.report('current regex            ') { strings.each {|str| str.empty? || ALL_WHITESPACE_STAR === str } }
  x.report('not a non-whitespace char') { strings.each {|str| str.empty? || !(/[[:^space:]]/ === str) } }
  x.compare!
end

# Warming up --------------------------------------
# current regex
#                          1.744k i/100ms
# not a non-whitespace char
#                          2.264k i/100ms
# Calculating -------------------------------------
# current regex
#                          18.078k (± 8.9%) i/s -     90.688k
# not a non-whitespace char
#                          23.580k (± 7.1%) i/s -    117.728k

# Comparison:
# not a non-whitespace char:    23580.3 i/s
# current regex            :    18078.2 i/s - 1.30x slower

This makes the method roughly 30% faster (23.580 - 18.078)/18.078 * 100.
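
The same arithmetic as a small script, using the unrounded figures from the Comparison output. The constant names are made up for illustration.

```ruby
# Iterations per second, copied from the "Comparison" section of the output.
OLD_IPS = 18_078.2
NEW_IPS = 23_580.3

SPEEDUP_PERCENT = (NEW_IPS - OLD_IPS) / OLD_IPS * 100
SLOWDOWN_FACTOR = NEW_IPS / OLD_IPS

# prints "30.4% faster, old regex 1.30x slower"
puts format("%.1f%% faster, old regex %.2fx slower", SPEEDUP_PERCENT, SLOWDOWN_FACTOR)
```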

cc/ @fxn

schneems referenced this pull request Apr 20, 2016

~3.5x speedup of String#blank? for empty strings
See the rationale in the comment in this patch.

To benchmark this I ran a number of variations, ultimately narrowing to

    require 'benchmark/ips'

    str = ''
    regexp = /\A[[:space:]]*\z/

    Benchmark.ips do |x|
      x.report('regexp') { regexp === str }
      x.report('empty')  { str.empty? || regexp === str }
      x.compare!
    end

This benchmark has consistently reported speedups around 3.5x:

    Calculating -------------------------------------
                  regexp    69.197k i/100ms
                   empty   115.468k i/100ms
    -------------------------------------------------
                  regexp      2.768M (± 6.3%) i/s -     13.839M
                   empty      9.643M (± 8.8%) i/s -     47.804M

    Comparison:
                   empty:  9642607.6 i/s
                  regexp:  2768351.9 i/s - 3.48x slower

Sometimes even reaching 4x.

Running the same benchmark on strings of 10 or 100 characters (with
whitespace or present) has shown a slowdown of just about 1.01/1.02.
Marginal, we seem to have a worthwhile trade-off here.
- empty? || BLANK_RE === self
+ # Regex check is slow, only check non-empty strings.
+ # A string not blank if it contains a single non-space string.
+ empty? || !(/[[:^space:]]/ === self)

@tenderlove

tenderlove Apr 20, 2016

Member

Can you do !~ and remove the !? Should be fewer instructions.
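
For reference, a sketch of the two spellings side by side, checking only that they return the same answers (it says nothing about speed). The method names are hypothetical; the regex is the one from the patch.

```ruby
# Hypothetical names; both bodies use the regex from the patch.
def blank_with_not?(str)
  str.empty? || !(/[[:^space:]]/ === str)
end

def blank_with_not_match?(str)
  # Object#!~ returns the negation of =~, so no separate `!` call is needed.
  str.empty? || /[[:^space:]]/ !~ str
end

["", "   ", "\t", "x", "  x "].each do |s|
  puts "#{s.inspect}: #{blank_with_not?(s)} / #{blank_with_not_match?(s)}"
end
```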

@fxn

fxn Apr 20, 2016

Member

Oh sorry, merged without seeing this remark. Let's refine if needed!

@schneems

schneems Apr 20, 2016

Member

I tried benchmarking it, and it's in the same ballpark, but it keeps ending up slower for me. I'm not sure why.

not a non-whitespace char:    21258.3 i/s
!~                       :    18617.7 i/s - same-ish: difference falls within error

Which is weird, because it is actually fewer instructions:

irb(main):162:0* puts RubyVM::InstructionSequence.disasm -> { /[[:^space:]]/ !~ str }
== disasm: #<ISeq:block in irb_binding@(irb)>===========================
== catch table
| catch type: redo   st: 0002 ed: 0013 sp: 0000 cont: 0002
| catch type: next   st: 0002 ed: 0013 sp: 0000 cont: 0013
|------------------------------------------------------------------------
0000 trace            256                                             ( 162)
0002 trace            1
0004 putobject        /[[:^space:]]/
0006 putself
0007 opt_send_without_block <callinfo!mid:str, argc:0, FCALL|VCALL|ARGS_SIMPLE>, <callcache>
0010 opt_send_without_block <callinfo!mid:!~, argc:1, ARGS_SIMPLE>, <callcache>
0013 trace            512
0015 leave
=> nil
irb(main):163:0> puts RubyVM::InstructionSequence.disasm -> { !(/[[:^space:]]/ === str) }
== disasm: #<ISeq:block in irb_binding@(irb)>===========================
== catch table
| catch type: redo   st: 0002 ed: 0016 sp: 0000 cont: 0002
| catch type: next   st: 0002 ed: 0016 sp: 0000 cont: 0016
|------------------------------------------------------------------------
0000 trace            256                                             ( 163)
0002 trace            1
0004 putobject        /[[:^space:]]/
0006 putself
0007 opt_send_without_block <callinfo!mid:str, argc:0, FCALL|VCALL|ARGS_SIMPLE>, <callcache>
0010 opt_send_without_block <callinfo!mid:===, argc:1, ARGS_SIMPLE>, <callcache>
0013 opt_not          <callinfo!mid:!, argc:0, ARGS_SIMPLE>, <callcache>
0016 trace            512
0018 leave
=> nil
irb(main):164:0>
@fxn

fxn Apr 20, 2016

Member

Interesting.

In theory both versions could be equally performant. Let's take a present string: in the current regexp, the engine should be able to halt and return false at the leftmost non-space character because of the \A anchor. The alternative does exactly the same, iterating left to right and halting as soon as it finds a non-space character.

Then, for blank strings with whitespace, both regexps need to exhaust the string to determine that all characters are spaces, or that no non-space was found.

On the other hand, the [[:space:]] class is much smaller than its complement (a handful of characters versus thousands and thousands). But again, the regexp engine would probably optimize that behind the scenes to check membership in the most efficient way, negating if needed.

But the measurements show that in practice there is a difference, and I wonder if the quantifier explains it. That is, the work the engine does when a quantifier is involved may be more complicated internally than the really straightforward test in this patch. For starters, it possibly needs to maintain backtracking points.

In! 😄 🚀
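
One way to probe the cases described above is to time both regexps on fixed inputs: a present string, a present string behind a whitespace prefix, and an all-whitespace string. This is only a rough sketch. The helper `time_per_call`, the constant names and the inputs are made up for illustration, and the timings will vary by machine and Ruby version.

```ruby
# Hypothetical timing helper; Process.clock_gettime is part of Ruby core.
def time_per_call(iterations = 200_000)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  iterations.times { yield }
  (Process.clock_gettime(Process::CLOCK_MONOTONIC) - started) / iterations
end

POSITIVE = /\A[[:space:]]*\z/ # "everything is whitespace"
NEGATIVE = /[[:^space:]]/     # "there is a non-whitespace character"

# Fixed, made-up inputs covering the cases discussed in the comment.
CASES = {
  "present, no leading space" => "a" * 20,
  "present, leading spaces"   => (" " * 19) + "a",
  "blank, spaces only"        => " " * 20
}.freeze

CASES.each do |label, str|
  positive = time_per_call { POSITIVE === str }
  negative = time_per_call { NEGATIVE === str }
  puts format("%-26s positive=%.1fns negative=%.1fns", label, positive * 1e9, negative * 1e9)
end
```

A wall-clock loop like this is much cruder than benchmark-ips, so treat the numbers as a hint rather than a measurement.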

@fxn fxn merged commit 115efeb into rails:master Apr 20, 2016

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed

@SamSaffron

SamSaffron Apr 20, 2016

Contributor

or just bundle fast_blank or simply add a comment to the generated config :)

https://github.com/SamSaffron/fast_blank

@tenderlove

tenderlove Apr 20, 2016

Member

@fxn I think the greedy operator (*) in this case will start from the far right side and backtrack (IIRC non-greedy stars are Kleene stars). I think that as the size of the string grows, /\A[[:space:]]*\z/ will get slower than /[[:^space:]]/ due to the greediness.

@schneems

schneems Apr 20, 2016

Member

@SamSaffron and @tenderlove while you're here 😉 I've got a question. If we can generate a list of all :space: strings in a hash, then I think we can double current AS String#blank? speeds. I talked about that a bit more in my linked comment.

To do that we'll need to decode this list and represent it in Ruby (https://github.com/ruby/ruby/blob/20cd25c86fd28eb1b5068d0db607e6aa33107f65/enc/unicode/name2ctype.h#L2794-L2807). Any tips on the best way to do that?
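
One possible way to get such a list without decoding name2ctype.h by hand is to ask the regex engine itself, by brute force over every code point. This is only a sketch under assumptions: `SPACE_CHARS` and `blank_by_lookup?` are made-up names, the input is assumed to be valid UTF-8, and whether a lookup like this would actually beat the regex is not measured here.

```ruby
require 'set'

# Hypothetical constant: every character that /[[:space:]]/ matches,
# found by brute force instead of decoding name2ctype.h.
SPACE_CHARS = Set.new
(0..0x10FFFF).each do |codepoint|
  next if (0xD800..0xDFFF).cover?(codepoint) # surrogates are not valid characters
  char = [codepoint].pack('U')               # UTF-8 string for this code point
  SPACE_CHARS << char if /[[:space:]]/ === char
end
SPACE_CHARS.freeze

# Hypothetical lookup-based check: blank if every character is in the set.
def blank_by_lookup?(str)
  str.each_char.all? { |char| SPACE_CHARS.include?(char) }
end

puts SPACE_CHARS.map { |c| format("U+%04X", c.ord) }.join(" ")
```

Building the set scans about 1.1 million code points once at load time, so in practice the resulting list would be computed ahead of time and hard-coded.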

@schneems

schneems Apr 20, 2016

Member

I have not, thanks 👏 😄

@tenderlove

tenderlove Apr 20, 2016

Member

@schneems next time please don't use random length strings in benchmarks. It makes the test data non-portable and unpredictable. It's OK to generate random strings, but do it once and throw it in the DATA section at the end of the script. 🙇
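
A minimal sketch of fixed benchmark input. It uses a frozen constant instead of the DATA section mentioned above, which is a different mechanism, chosen here only so the snippet stays self-contained. The strings and the helper `blank_count` are made up for illustration.

```ruby
# Made-up but fixed inputs: the same strings on every run and every machine.
# Each string repeats one character, like the generator in the PR description.
FIXED_STRINGS = [
  "",
  " " * 7,
  "\t" * 23,
  "a" * 12,
  "q" * 1,
  "z" * 18
].freeze

# Hypothetical helper: counts blank strings using the check from the patch.
def blank_count(strings)
  strings.count { |str| str.empty? || !(/[[:^space:]]/ === str) }
end

# prints "3 of 6 strings are blank"
puts "#{blank_count(FIXED_STRINGS)} of #{FIXED_STRINGS.size} strings are blank"
```

With a fixed list, two people running the same script are comparing the same work.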

@SamSaffron

SamSaffron Apr 20, 2016

Contributor

@schneems I recall trying to optimise this regex in the past, you make one change that improves perf for one class of strings and unfortunately some other string is slower. That is what triggered me to make fast_blank.

Ruby core has no plans to add a blank protocol, and the lack of parity with strip has always bugged me quite a lot. Why can the same string be both blank and not length 0 when stripped? It's just weird.

That stuff triggered me to end up writing fast_blank, which is used in production at both GitHub and Discourse. fast_blank's biggest problem is that people are not aware of it. In many apps it ends up giving you a 5% bump, especially for people that are heavy on the if person.name.present? checks which Rails devs appear to be in ❤️ with.

I feel like the best approach is simply to plug that this gem exists (and fast_xs for that matter) in the default generated Gemfile, but really not my call. It is just frustrating sometimes that "Rails... but fast" is a TOP SECRET that is stored in @tmm1's brains and a few other members of the illuminati.
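
The strip mismatch can be reproduced with a no-break space. This is a small sketch: `blank_like_rails?` is a hypothetical stand-in that uses the same check as the patched String#blank?, so it runs without ActiveSupport.

```ruby
NBSP = "\u00a0" # NO-BREAK SPACE

# Same check as the patched String#blank?, written as a hypothetical helper.
def blank_like_rails?(str)
  str.empty? || !(/[[:^space:]]/ === str)
end

puts blank_like_rails?(NBSP) # true: [[:space:]] covers Unicode whitespace
puts NBSP.strip.empty?       # false: String#strip leaves U+00A0 in place
```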

@SamSaffron

SamSaffron Apr 20, 2016

Contributor

Long term probably the best thing to do is lobby Ruby to get

#strip and family corrected/improved to handle unicode blanks AND pull in String#blank (not the entire blank protocol)

That would be the best thing here.

@rafaelfranca

rafaelfranca Apr 20, 2016

Member

To add more people to that list Shopify also use fast_blank in production.

@fxn

fxn Apr 20, 2016

Member

@tenderlove AFAIK it still goes left-to-right so you can fail fast. That is, the engine does not blindly go to the far right, but just matches as much as it can (as long as characters match).

So, when the quantifier is greedy you first eat as much as possible, and then check if the rest of the regexp matches. If not, backtrack. When the quantifier is not greedy, you match as little as possible, and if the rest matches you are done, otherwise advance.

In theory, though, if you have a bunch of whitespace and there is a non-space character, the engine reaches it and then backtracks only to fail, because there's no way to match \z in such a situation. Perl's engine, for example, does backtrack, as this example shows (in Perl regexps (?{ ... }) executes Perl code when the engine reaches that point):

fxn@yeager:~ $ perl -le '"    foo" =~ /\A[[:space:]]*(?{print length $&})\z/'
4
3
2
1
0

That would certainly explain the difference in performance, since the vanilla character class has no backtracking going on.
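
Ruby regexps have no (?{ ... }) hook, so the trace above cannot be reproduced directly. The following is only a toy model of a naive backtracking matcher for \A[[:space:]]*\z, written in plain Ruby; it is not how Ruby's regexp engine is implemented, and the function name is made up.

```ruby
# Toy model (hypothetical): lists how many characters the greedy star holds
# each time "\z" is attempted, for the pattern /\A[[:space:]]*\z/.
def greedy_star_trace(str)
  # Greedy phase: take as many leading whitespace characters as possible.
  max = str[/\A[[:space:]]*/].length
  trace = []
  # Backtracking phase: give back one character at a time and retry "\z".
  max.downto(0) do |consumed|
    trace << consumed
    break if consumed == str.length # "\z" matches here, so the search stops
  end
  trace
end

p greedy_star_trace("    foo") # => [4, 3, 2, 1, 0]
p greedy_star_trace("   ")     # => [3]
```

The first call mirrors the Perl output; the second shows that an all-whitespace string succeeds on the first attempt in this model.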

@SamSaffron

SamSaffron Apr 20, 2016

Contributor

I opened this issue to see if we can get this implemented in Ruby

https://bugs.ruby-lang.org/issues/12306

kamipo added a commit to kamipo/rails that referenced this pull request Apr 21, 2016

@kamipo kamipo referenced this pull request Apr 21, 2016

Merged

Remove unused `BLANK_RE` #24663

@SamSaffron

SamSaffron Apr 21, 2016

Contributor

@schneems @fxn @tenderlove help convince Matz that there is real world use of String#blank? and we want it to be a thing on https://bugs.ruby-lang.org/issues/12306

@nellshamrell

nellshamrell Apr 21, 2016

So, so, so glad the "Beneath the Surface: Regular Expressions in Ruby" was helpful!

@fxn

fxn Apr 21, 2016

Member

I have benchmarked both regexps against a non-blank string of length 1:

require 'benchmark/ips'

str      = 'a'
positive = /\A[[:space:]]*\z/
negative = /[[:^space:]]/

Benchmark.ips do |x|
  x.report('positive') { str =~ positive }
  x.report('negative') { str !~ negative }
  x.compare!
end

There is no significant backtracking going on because \A[[:space:]]* only matches at position 0. Then \z fails, done. The result is still slower in magnitudes similar to those found by @schneems:

fxn@yeager:~/tmp $ ruby foo.rb
Calculating -------------------------------------
            positive    59.972k i/100ms
            negative    62.500k i/100ms
-------------------------------------------------
            positive      1.915M (± 5.7%) i/s -      9.596M
            negative      2.456M (± 6.6%) i/s -     12.250M

Comparison:
            negative:  2456491.7 i/s
            positive:  1915258.1 i/s - 1.28x slower

I changed the quantifier to + to skip even that null match, and still seeing 1.30x factors.

So, although backtracking could perhaps have a cost when the string has a large whitespace prefix (speculation), these tests seem to indicate it does not per se explain that 30%.

My hunch is that the engine just does more work due to the complexity of the regexp, whereas the other one is very straightforward (but I don't really know).

fxn added a commit that referenced this pull request Apr 21, 2016

just say nothing about why this regexp is slower [ci skip]
Further investigation seems to disprove that backtracking is the
reason why the positive variant is slower, see

    #24658 (comment)

so, just say nothing about it, only assert it is slower.
@fxn

fxn Apr 22, 2016

Member

@k-takata, just out of curiosity, could you shed some light on what really explains the difference in performance seen in the benchmark in the previous comment? That would be awesome to understand!

@nurse

nurse Apr 24, 2016

=~ is faster than === for Regexp match because it is optimized into opt_regexpmatch1 in YARV.

require 'benchmark/ips'

def string_generate
  str = " abcdefghijklmnopqrstuvwxyz\t".freeze
  str[rand(0..(str.length - 1))] * rand(0..23)
end

strings = 100.times.map { string_generate }

ALL_WHITESPACE_STAR = /\A[[:space:]]*\z/
NON_SPACE = /[[:^space:]]/

Benchmark.ips do |x|
  x.report('old regex            ') { strings.each {|str| str.empty? || ALL_WHITESPACE_STAR === str } }
  x.report('current regex constant ===') { strings.each {|str| str.empty? || !(NON_SPACE === str) } }
  x.report('current regex constant =~') { strings.each {|str| str.empty? || !(NON_SPACE =~ str) } }
  x.report('current regex literal ===') { strings.each {|str| str.empty? || !(/[[:^space:]]/ === str) } }
  x.report('current regex literal =~') { strings.each {|str| str.empty? || !(/[[:^space:]]/ =~ str) } }
  x.report('current regex literal !~') { strings.each {|str| str.empty? || (/[[:^space:]]/ !~ str) } }
  x.compare!
end
Warming up --------------------------------------
old regex
                         1.825k i/100ms
current regex constant ===
                         2.065k i/100ms
current regex constant =~
                         1.829k i/100ms
current regex literal ===
                         1.914k i/100ms
current regex literal =~
                         2.239k i/100ms
current regex literal !~
                         1.880k i/100ms
Calculating -------------------------------------
old regex
                         17.430k (±16.3%) i/s -     85.775k in   5.132384s
current regex constant ===
                         20.696k (±17.3%) i/s -     99.120k in   5.009458s
current regex constant =~
                         20.835k (±14.3%) i/s -    102.424k in   5.053427s
current regex literal ===
                         19.547k (±14.5%) i/s -     95.700k in   5.022457s
current regex literal =~
                         21.587k (±19.8%) i/s -    102.994k in   5.064171s
current regex literal !~
                         18.080k (±18.7%) i/s -     86.480k in   5.023627s

Comparison:
current regex literal =~:    21587.0 i/s
current regex constant =~:    20835.1 i/s - same-ish: difference falls within error
current regex constant ===:    20696.1 i/s - same-ish: difference falls within error
current regex literal ===:    19547.0 i/s - same-ish: difference falls within error
current regex literal !~:    18079.8 i/s - same-ish: difference falls within error
old regex            :    17429.8 i/s - same-ish: difference falls within error
% ruby -e'$><<RubyVM::InstructionSequence.new(p %q{ !(/[[:^space:]]*/ === str) }).disasm'
" !(/[[:^space:]]*/ === str) "
== disasm: #<ISeq:<compiled>@<compiled>>================================
0000 trace            1                                               (   1)
0002 putobject        /[[:^space:]]*/
0004 putself
0005 opt_send_without_block <callinfo!mid:str, argc:0, FCALL|VCALL|ARGS_SIMPLE>, <callcache>
0008 opt_send_without_block <callinfo!mid:===, argc:1, ARGS_SIMPLE>, <callcache>
0011 opt_not          <callinfo!mid:!, argc:0, ARGS_SIMPLE>, <callcache>
0014 leave
% ruby -e'$><<RubyVM::InstructionSequence.new(p %q{ /[[:^space:]]*/ !~ str }).disasm'
" /[[:^space:]]*/ !~ str "
== disasm: #<ISeq:<compiled>@<compiled>>================================
0000 trace            1                                               (   1)
0002 putobject        /[[:^space:]]*/
0004 putself
0005 opt_send_without_block <callinfo!mid:str, argc:0, FCALL|VCALL|ARGS_SIMPLE>, <callcache>
0008 opt_send_without_block <callinfo!mid:!~, argc:1, ARGS_SIMPLE>, <callcache>
0011 leave
% ruby -e'$><<RubyVM::InstructionSequence.new(p %q{ !(/[[:^space:]]*/ =~ str) }).disasm'
" !(/[[:^space:]]*/ =~ str) "
== disasm: #<ISeq:<compiled>@<compiled>>================================
0000 trace            1                                               (   1)
0002 putself
0003 opt_send_without_block <callinfo!mid:str, argc:0, FCALL|VCALL|ARGS_SIMPLE>, <callcache>
0006 opt_regexpmatch1 /[[:^space:]]*/
0008 opt_not          <callinfo!mid:!, argc:0, ARGS_SIMPLE>, <callcache>
0011 leave
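
To repeat this comparison on another interpreter, a small script along these lines can print the three listings in one go. It is a sketch that assumes CRuby, since RubyVM is implementation-specific. The constant names are made up, and instruction names and counts differ between Ruby versions, so the output may not match the listings above.

```ruby
# RubyVM is specific to CRuby; instruction names vary between Ruby versions.
SNIPPETS = {
  "===" => "!(/[[:^space:]]/ === str)",
  "!~"  => "/[[:^space:]]/ !~ str",
  "=~"  => "!(/[[:^space:]]/ =~ str)"
}.freeze

# Compile each snippet without running it and keep its disassembly.
DISASM = SNIPPETS.transform_values { |source|
  RubyVM::InstructionSequence.compile(source).disasm
}.freeze

DISASM.each do |operator, listing|
  puts "--- #{operator} ---"
  puts listing
end
```

The snippets are only compiled, never executed, so the undefined `str` is not a problem.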

nurse commented Apr 24, 2016

=~ is faster than === for Regexp match because it is optimized into opt_regexpmatch1 in YARV.

require 'benchmark/ips'

def string_generate
  str = " abcdefghijklmnopqrstuvwxyz\t".freeze
  str[rand(0..(str.length - 1))] * rand(0..23)
end

strings = 100.times.map { string_generate }

ALL_WHITESPACE_STAR = /\A[[:space:]]*\z/
NON_SPACE = /[[:^space:]]/

Benchmark.ips do |x|
  x.report('old regex            ') { strings.each {|str| str.empty? || ALL_WHITESPACE_STAR === str } }
  x.report('current regex constant ===') { strings.each {|str| str.empty? || !(NON_SPACE === str) } }
  x.report('current regex constant =~') { strings.each {|str| str.empty? || !(NON_SPACE =~ str) } }
  x.report('current regex literal ===') { strings.each {|str| str.empty? || !(/[[:^space:]]/ === str) } }
  x.report('current regex literal =~') { strings.each {|str| str.empty? || !(/[[:^space:]]/ =~ str) } }
  x.report('current regex literal !~') { strings.each {|str| str.empty? || (/[[:^space:]]/ !~ str) } }
  x.compare!
end
Warming up --------------------------------------
old regex
                         1.825k i/100ms
current regex constant ===
                         2.065k i/100ms
current regex constant =~
                         1.829k i/100ms
current regex literal ===
                         1.914k i/100ms
current regex literal =~
                         2.239k i/100ms
current regex literal !~
                         1.880k i/100ms
Calculating -------------------------------------
old regex
                         17.430k (±16.3%) i/s -     85.775k in   5.132384s
current regex constant ===
                         20.696k (±17.3%) i/s -     99.120k in   5.009458s
current regex constant =~
                         20.835k (±14.3%) i/s -    102.424k in   5.053427s
current regex literal ===
                         19.547k (±14.5%) i/s -     95.700k in   5.022457s
current regex literal =~
                         21.587k (±19.8%) i/s -    102.994k in   5.064171s
current regex literal !~
                         18.080k (±18.7%) i/s -     86.480k in   5.023627s

Comparison:
current regex literal =~:    21587.0 i/s
current regex constant =~:    20835.1 i/s - same-ish: difference falls within error
current regex constant ===:    20696.1 i/s - same-ish: difference falls within error
current regex literal ===:    19547.0 i/s - same-ish: difference falls within error
current regex literal !~:    18079.8 i/s - same-ish: difference falls within error
old regex            :    17429.8 i/s - same-ish: difference falls within error
% ruby -e'$><<RubyVM::InstructionSequence.new(p %q{ !(/[[:^space:]]*/ === str) }).disasm'
" !(/[[:^space:]]*/ === str) "
== disasm: #<ISeq:<compiled>@<compiled>>================================
0000 trace            1                                               (   1)
0002 putobject        /[[:^space:]]*/
0004 putself
0005 opt_send_without_block <callinfo!mid:str, argc:0, FCALL|VCALL|ARGS_SIMPLE>, <callcache>
0008 opt_send_without_block <callinfo!mid:===, argc:1, ARGS_SIMPLE>, <callcache>
0011 opt_not          <callinfo!mid:!, argc:0, ARGS_SIMPLE>, <callcache>
0014 leave
% ruby -e'$><<RubyVM::InstructionSequence.new(p %q{ /[[:^space:]]*/ !~ str }).disasm'
" /[[:^space:]]*/ !~ str "
== disasm: #<ISeq:<compiled>@<compiled>>================================
0000 trace            1                                               (   1)
0002 putobject        /[[:^space:]]*/
0004 putself
0005 opt_send_without_block <callinfo!mid:str, argc:0, FCALL|VCALL|ARGS_SIMPLE>, <callcache>
0008 opt_send_without_block <callinfo!mid:!~, argc:1, ARGS_SIMPLE>, <callcache>
0011 leave
% ruby -e'$><<RubyVM::InstructionSequence.new(p %q{ !(/[[:^space:]]*/ =~ str) }).disasm'
" !(/[[:^space:]]*/ =~ str) "
== disasm: #<ISeq:<compiled>@<compiled>>================================
0000 trace            1                                               (   1)
0002 putself
0003 opt_send_without_block <callinfo!mid:str, argc:0, FCALL|VCALL|ARGS_SIMPLE>, <callcache>
0006 opt_regexpmatch1 /[[:^space:]]*/
0008 opt_not          <callinfo!mid:!, argc:0, ARGS_SIMPLE>, <callcache>
0011 leave
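
The three listings above differ mainly in whether the match goes through a dedicated regexp instruction or an ordinary method send. A minimal sketch for checking this on your own interpreter follows. It assumes CRuby (RubyVM is not available elsewhere), the helper name is hypothetical, and because instruction names change between Ruby versions it only looks for the `opt_regexpmatch` prefix rather than the exact `opt_regexpmatch1` shown above.

```ruby
# CRuby-only sketch: RubyVM::InstructionSequence is an implementation detail.
# The snippet is compiled but never executed, so `str` does not need to exist.
def uses_regexpmatch_insn?(source)
  RubyVM::InstructionSequence.compile(source).disasm.include?("opt_regexpmatch")
end

{
  "=~"  => "!(/[[:^space:]]/ =~ str)",
  "===" => "!(/[[:^space:]]/ === str)",
  "!~"  => "/[[:^space:]]/ !~ str"
}.each do |operator, source|
  puts "#{operator.ljust(3)} dedicated regexp instruction: #{uses_regexpmatch_insn?(source)}"
end
```

Whether the dedicated instruction is measurably faster is a separate question; the benchmark above puts all variants within the error margin.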

SamSaffron Apr 27, 2016

Contributor

I would strongly advise caution here. @schneems, it looks like the change you added actually makes things significantly slower for longer strings. I added an updated bench here:

https://github.com/SamSaffron/fast_blank/blob/master/benchmark

In particular, you have to benchmark strings of various lengths; my bench does 0, 6, 14, 24, 136... You probably want to add a super long string as well to ensure there is no pathological case.

fast_blank remains significantly faster (except for the one tiny case that is short-circuited by empty?, where it is only 10% faster).

cc @nurse
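
The advice about string lengths could be sketched as follows. This is not the linked fast_blank benchmark: the method names, the sample shapes, and the extra 10,000-character case are assumptions made for illustration, and a plain monotonic clock stands in for benchmark-ips so the sketch has no gem dependency. Both predicates live in methods, and each length is tried with leading-whitespace, all-whitespace and no-whitespace input.

```ruby
LEN_OLD_RE = /\A[[:space:]]*\z/
LEN_NEW_RE = /[[:^space:]]/

def old_blank?(str)
  str.empty? || LEN_OLD_RE === str
end

def new_blank?(str)
  str.empty? || !(LEN_NEW_RE === str)
end

# Lengths from the comment above, plus one hypothetical very long case.
LENGTHS = [0, 6, 14, 24, 136, 10_000]

# Three shapes per length: whitespace then one letter, all whitespace, all letters.
def sample_strings(len)
  return [""] if len.zero?
  [(" " * (len - 1)) + "x", " " * len, "x" * len]
end

# Wall-clock seconds for `iterations` calls of the block, using a while loop.
def seconds_for(iterations = 1_000)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  i = 0
  while i < iterations
    yield
    i += 1
  end
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

LENGTHS.each do |len|
  sample_strings(len).each do |str|
    old_t = seconds_for { old_blank?(str) }
    new_t = seconds_for { new_blank?(str) }
    puts format("len=%-6d blank=%-5s old=%.5fs new=%.5fs", len, old_blank?(str), old_t, new_t)
  end
end
```

The timings are only indicative; for numbers comparable to the ones in this thread, the same methods can be dropped into a Benchmark.ips block.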

schneems Apr 27, 2016

Member

On my machine I'm still seeing that the new method is an improvement over the old:

require 'benchmark/ips'

LONG_STRING = "   this is a longer test
      this is a longer test
      this is a longer test
      this is a longer test
      this is a longer test"


Benchmark.ips do |x|
  x.report('old') { LONG_STRING.empty? || /\A[[:space:]]*\z/ === LONG_STRING }
  x.report('new') { LONG_STRING.empty? || !(/[[:^space:]]/ === LONG_STRING) }
  x.compare!
end


# Warming up --------------------------------------
#                  old   124.610k i/100ms
#                  new   154.021k i/100ms
# Calculating -------------------------------------
#                  old      1.899M (± 7.2%) i/s -      9.470M in   5.014725s
#                  new      2.674M (± 9.4%) i/s -     13.400M in   5.070622s

# Comparison:
#                  new:  2673836.1 i/s
#                  old:  1899274.9 i/s - 1.41x slower

It's actually slightly faster with the longer string than in my original test with strings of variable length up to 80 characters.

schneems Apr 27, 2016

Member

Btw, I'm in favor of adding this gem to Rails, but I think we need to work with JRuby out of the box too.

Ideally we could get this in MRI proper.

SamSaffron Apr 27, 2016

Contributor

@schneems I am not sure IPS is being invoked correctly there; you should use the while loop:

x.report("New Slow Blank") do |times|
  i = 0
  while i < times
    s.new_slow_blank?
    i += 1
  end
end

What Ruby version are you on?

schneems Apr 27, 2016

Member

Ruby 2.3.1

irb(main):031:0* Benchmark.ips do |x|
irb(main):032:1*   x.report('old') { |t| t.times.each { LONG_STRING.empty? || /\A[[:space:]]*\z/ === LONG_STRING } }
irb(main):033:1>   x.report('new') { |t| t.times.each { LONG_STRING.empty? || !(/[[:^space:]]/ === LONG_STRING) } }
irb(main):034:1>   x.compare!
irb(main):035:1> end
Warming up --------------------------------------
                 old    71.399k i/100ms
                 new    81.750k i/100ms
Calculating -------------------------------------
                 old      1.946M (±11.0%) i/s -      9.639M in   5.019133s
                 new      2.828M (±12.0%) i/s -     13.979M in   5.021530s

Comparison:
                 new:  2827559.9 i/s
                 old:  1945769.3 i/s - 1.45x slower

SamSaffron Apr 27, 2016

Contributor

Wow... Ruby performance is hard. I will ping @ko1 on this because it is super confusing.

https://gist.github.com/SamSaffron/d1a9cc8e141e7415e06306369fdedfe5

SamSaffron Apr 27, 2016

Contributor

OK, I think I know why this is slower in the less artificial benchmark.

The new regex allocates more stuff, a lot more stuff.

https://gist.github.com/SamSaffron/f73fd0395e050e927d1a3137373eeaee

449 bytes in the new version versus 169 in the old version. Once extracted into a method, regex magic is causing less stuff to be reused in globals.
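
One rough way to probe the allocation claim on your own Ruby is sketched below. It is not the code from the gist: it counts allocated objects via GC.stat instead of the bytes quoted above, the method names are hypothetical, and the exact numbers will depend on the Ruby version.

```ruby
ALLOC_OLD_RE = /\A[[:space:]]*\z/
ALLOC_NEW_RE = /[[:^space:]]/

# Both checks are wrapped in methods, since that is where the slowdown showed up.
def alloc_old_blank?(str)
  str.empty? || ALLOC_OLD_RE === str
end

def alloc_new_blank?(str)
  str.empty? || !(ALLOC_NEW_RE === str)
end

# Number of Ruby objects allocated while the block runs `iterations` times.
def allocated_objects(iterations = 10_000)
  before = GC.stat(:total_allocated_objects)
  i = 0
  while i < iterations
    yield
    i += 1
  end
  GC.stat(:total_allocated_objects) - before
end

text = "   this is a longer test".freeze

# Warm up once so one-time setup is not charged to either variant.
alloc_old_blank?(text)
alloc_new_blank?(text)

old_count = allocated_objects { alloc_old_blank?(text) }
new_count = allocated_objects { alloc_new_blank?(text) }
puts "old regex: #{old_count} objects, new regex: #{new_count} objects"
```

For this non-blank input the new regex finds a match on every call while the old one never does, so any per-match bookkeeping is paid only by the new version.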

nurse Apr 28, 2016

@schneems @SamSaffron t.times.each is slow because block invocation is slow, and it disables the Integer#times optimization. Use a while loop instead for such micro-benchmarks.

schneems Apr 28, 2016

Member

This is by far the most comments I've ever gotten on a one-line code change. This is all pretty weird.

I retried with the while method of looping and got similar results:

require 'benchmark/ips'

LONG_STRING = "   this is a longer test
      this is a longer test
      this is a longer test
      this is a longer test
      this is a longer test"


Benchmark.ips do |x|
  x.report('old') do |times| 
    i = 0
    while i < times
      LONG_STRING.empty? || /\A[[:space:]]*\z/ === LONG_STRING
      i += 1
    end
  end
  x.report('new') do |times| 
    i = 0
    while i < times
      LONG_STRING.empty? || !(/[[:^space:]]/ === LONG_STRING)
      i += 1
    end
  end
  x.compare!
end
# Warming up --------------------------------------
#                  old   124.919k i/100ms
#                  new   149.075k i/100ms
# Calculating -------------------------------------
#                  old      2.258M (± 7.0%) i/s -     11.243M in   5.006437s
#                  new      3.200M (±10.1%) i/s -     15.951M in   5.044622s

# Comparison:
#                  new:  3199827.3 i/s
#                  old:  2257599.3 i/s - 1.42x slower

However, when I put those regexes into a method I do see the slowdown, and the "new" method is 2x slower than the old method. I was under the impression that regex literals are essentially frozen. Assigning the regex to a constant doesn't help.

It looks like the memory use is larger because a character is matched and a MatchData object is created. If you make the string blank, e.g. LONG_STRING = " ", then the old regex will match instead, which allocates a MatchData.

When I repeated the benchmark with the regular expressions in methods and a long string that matches the old regex, they were roughly the same speed.

I'm a bit more confused now than I was before.

fxn Apr 28, 2016

Member

To add more entropy to the party, this new regexp is basically a revert of @tmm1's 30ba7ee (except for the operator and the negation of the character class) 😄.

SamSaffron Apr 29, 2016

Contributor

@schneems the results make sense. This all boils down to yet another issue that has been open for over 3 years in Ruby core:

https://bugs.ruby-lang.org/issues/8110

Basically, Oniguruma supports a no-backref mode, but there is no way to invoke it from Ruby. Carrying around the backrefs and the other mountains of globals for EVERY regex makes it impossible to write super fast regular expressions in Ruby. I am willing to bet that Ruby has some optimisations that keep some of the allocated stuff around when invoked in a loop, and that it drops some of that reuse when methods are invoked.

Hopefully @matz can accept some mechanism in Ruby to tap Oniguruma without backrefs and run regexes without globals.
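
For readers on Ruby 2.4 or later (newer than the 2.3.1 used in this thread), Regexp#match? returns only true or false and does not set $~. A minimal sketch of the difference follows; the helper names are hypothetical and this is not presented as the Rails implementation or as the mechanism requested above.

```ruby
BACKREF_NON_SPACE = /[[:^space:]]/

# Regexp#=== records the match in $~, the frame-local backref.
def blank_with_backref?(str)
  str.empty? || !(BACKREF_NON_SPACE === str)
end

# Regexp#match? (Ruby 2.4+) only answers true or false.
def blank_without_backref?(str)
  str.empty? || !BACKREF_NON_SPACE.match?(str)
end

# Each helper runs one match and reports whether $~ was populated afterwards.
def backref_set_after_eqq?(str)
  BACKREF_NON_SPACE === str
  !$~.nil?
end

def backref_set_after_match_p?(str)
  BACKREF_NON_SPACE.match?(str)
  !$~.nil?
end

puts "=== sets $~:    #{backref_set_after_eqq?('abc')}"
puts "match? sets $~: #{backref_set_after_match_p?('abc')}"
```

Both predicates return the same answers; only the side effect on $~ differs.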

schneems Apr 29, 2016

Member

I'm still confused about why this only happens in a method. You would think we would be creating all those globals at the main context just as we do in a method. In both cases we're using a regex. The old regex on the surface looks more complicated than the new regex, so while I understand that there is MatchData created, it doesn't fully explain the speed difference. I tried to make a case where the "old" method is slower and...

I tried setting LONG_STRING to something that would match the "old" regex and not the new one. You would expect that to create MatchData and be equally slow, but the two ended up at roughly the same speed: https://gist.github.com/schneems/0c9b81e1542f457092c8bdaeacb01312

It looks like we should revert my change for speed; I'm happy to do that. But I do want to fully understand the minutiae here for future optimizations.

fxn Apr 29, 2016

Member

Agreed, this is the kind of thing that needs a clear benefit. We could perhaps revert and still be open to revisiting if further research shows an indisputable speedup.

@SamSaffron regarding your gem, I personally use it in my projects. But to include it in the generated Gemfile, even commented out, is a different story.

One minor point that I am sure we could solve somehow is that we need blank_as? as blank? transparently.

The more fundamental con, however, is that we would be committing to a substitute outside the realm of the project. Separate repos, separate test suites... I think everything is more ordered if people just opt-in by themselves.

And in any case, it is our duty to provide the best pure Ruby implementation we can think of regardless.

fxn added a commit that referenced this pull request Apr 29, 2016

restores the regexp used in String#blank?
This commit undoes 54243fe.

Reason: Further investigation has shown the benefit is not so clear
generally speaking.

There is a long discussion and several benchmarks in the PR #24658
if you are interested in the details.

SamSaffron May 2, 2016

Contributor

"I think everything is more ordered if people just opt-in by themselves."

I totally understand the sentiment, but the same argument can apply to uglifyjs or coffeescript or mime types. Plenty of dependencies come from the outside. As it stands, we are giving an edge to the blessed few that know about fast_xs and fast_blank. This knowledge is tribal; there is no way to get the word out.

That said, I totally respect whatever is decided here. Ultimately, @ko1 agrees with me that we need a mechanism to tap Oniguruma without forcing backrefs and globals for every regex test. Once that is implemented, we could achieve significant perf improvements to this blank regex and many other spots in Rails.

Neodelf added a commit to Neodelf/rails that referenced this pull request May 5, 2016


nurse May 24, 2016

Anyway, your discussion seems to be derived from tmm1's profile.
If that is still true now, implementing String#present? with a regexp directly, instead of calling String#blank?, can save one method call without side effects.
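
The suggestion could look roughly like this. It is a sketch only: the module and method names are hypothetical stand-ins for String#blank? and String#present?, chosen so the example does not reopen String, and it uses the restored regexp from this thread.

```ruby
# Hypothetical module; in Rails these would be instance methods on String.
module PresenceSketch
  BLANK_RE = /\A[[:space:]]*\z/

  def self.blank?(str)
    str.empty? || BLANK_RE === str
  end

  # Indirect version: every call pays for an extra method call into blank?.
  def self.present_via_blank?(str)
    !blank?(str)
  end

  # Direct version: the same check inlined, saving one method call.
  def self.present_direct?(str)
    !(str.empty? || BLANK_RE === str)
  end
end

["", "   ", "\t\n", "rails", "  rails  "].each do |s|
  puts "#{s.inspect}: #{PresenceSketch.present_direct?(s)}"
end
```

The two present? variants return the same results; the direct one simply duplicates the condition instead of delegating.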

LukeeeeBennett pushed a commit to gitlabhq/gitlabhq that referenced this pull request Jan 18, 2018
