-
Notifications
You must be signed in to change notification settings - Fork 21.9k
Speed up String#blank? Regex #24658
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Speed up String#blank? Regex #24658
Conversation
Follow up on rails@697384d#commitcomment-17184696. The regex to detect a blank string `/\A[[:space:]]*\z/` will loop through every character in the string to ensure that all of them are a `:space:` type. We can invert this logic and instead look for any non-`:space:` characters. When that happens, we would return on the first character found and the regex engine does not need to keep looking. Thanks @nellshamrell for the regex talk at LSRC. By defining a "blank" string as any string that does not have a non-whitespace character (yes, double negative) we can get a substantial speed bump. Also an inline regex is (barely) faster than a regex in a constant, since it skips the constant lookup. A regex literal is frozen by default. ```ruby require 'benchmark/ips' def string_generate str = " abcdefghijklmnopqrstuvwxyz\t".freeze str[rand(0..(str.length - 1))] * rand(0..23) end strings = 100.times.map { string_generate } ALL_WHITESPACE_STAR = /\A[[:space:]]*\z/ Benchmark.ips do |x| x.report('current regex ') { strings.each {|str| str.empty? || ALL_WHITESPACE_STAR === str } } x.report('+ instead of * ') { strings.each {|str| str.empty? || /\A[[:space:]]+\z/ === str } } x.report('not a non-whitespace char') { strings.each {|str| str.empty? || !(/[[:^space:]]/ === str) } } x.compare! end # Warming up -------------------------------------- # current regex # 1.744k i/100ms # not a non-whitespace char # 2.264k i/100ms # Calculating ------------------------------------- # current regex # 18.078k (± 8.9%) i/s - 90.688k # not a non-whitespace char # 23.580k (± 7.1%) i/s - 117.728k # Comparison: # not a non-whitespace char: 23580.3 i/s # current regex : 18078.2 i/s - 1.30x slower ``` This makes the method roughly 30% faster `(23.580 - 18.078)/18.078 * 100`. cc/ @fxn
See the rationale in the comment in this patch. To benchmark this I ran a number of variations, ultimately narrowing to require 'benchmark/ips' str = '' regexp = /\A[[:space:]]*\z/ Benchmark.ips do |x| x.report('regexp') { regexp === str } x.report('empty') { str.empty? || regexp === str } x.compare! end This benchmark has consistently reported speedups around 3.5x: Calculating ------------------------------------- regexp 69.197k i/100ms empty 115.468k i/100ms ------------------------------------------------- regexp 2. 6.3%) i/s - 13.839M empty 9. 8.8%) i/s - 47.804M Comparison: empty: 9642607.6 i/s regexp: 2768351.9 i/s - 3.48x slower Sometimes even reaching 4x. Running the same bechmark on strings of 10 or 100 characters (with whitespace or present) has shown a slowdown of just about 1.01/1.02. Marginal, we seem to have a worthwhile trade-off here.
empty? || BLANK_RE === self | ||
# Regex check is slow, only check non-empty strings. | ||
# A string not blank if it contains a single non-space string. | ||
empty? || !(/[[:^space:]]/ === self) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you do !~
and remove the !
? Should be fewer instructions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh sorry, merged without seeing this remark. Let's refine if needed!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried benching, and it's in the same ballpark, it keeps on ending up as slower for me. I'm not sure why
not a non-whitespace char: 21258.3 i/s
!~ : 18617.7 i/s - same-ish: difference falls within error
Which is weird because it is actually fewer instructions
irb(main):162:0* puts RubyVM::InstructionSequence.disasm -> { /[[:^space:]]/ !~ str }
== disasm: #<ISeq:block in irb_binding@(irb)>===========================
== catch table
| catch type: redo st: 0002 ed: 0013 sp: 0000 cont: 0002
| catch type: next st: 0002 ed: 0013 sp: 0000 cont: 0013
|------------------------------------------------------------------------
0000 trace 256 ( 162)
0002 trace 1
0004 putobject /[[:^space:]]/
0006 putself
0007 opt_send_without_block <callinfo!mid:str, argc:0, FCALL|VCALL|ARGS_SIMPLE>, <callcache>
0010 opt_send_without_block <callinfo!mid:!~, argc:1, ARGS_SIMPLE>, <callcache>
0013 trace 512
0015 leave
=> nil
irb(main):163:0> puts RubyVM::InstructionSequence.disasm -> { !(/[[:^space:]]/ === str) }
== disasm: #<ISeq:block in irb_binding@(irb)>===========================
== catch table
| catch type: redo st: 0002 ed: 0016 sp: 0000 cont: 0002
| catch type: next st: 0002 ed: 0016 sp: 0000 cont: 0016
|------------------------------------------------------------------------
0000 trace 256 ( 163)
0002 trace 1
0004 putobject /[[:^space:]]/
0006 putself
0007 opt_send_without_block <callinfo!mid:str, argc:0, FCALL|VCALL|ARGS_SIMPLE>, <callcache>
0010 opt_send_without_block <callinfo!mid:===, argc:1, ARGS_SIMPLE>, <callcache>
0013 opt_not <callinfo!mid:!, argc:0, ARGS_SIMPLE>, <callcache>
0016 trace 512
0018 leave
=> nil
irb(main):164:0>
Interesting. In theory both versions could be equally performant. Let's take a present string: In the current regexp, the regexp engine should be able to halt and return false in the leftmost non-space character because of the Then, for blank strings with whitespace both regexps need to exhaust the string to determine all are spaces, or that no non-space was found. On the other hand the But, the measures show in practice there is a difference, and I wonder if the quantifier explains it. That is, the work the engine does if a quantifier is involved is maybe more complicated internally than the really straightforward test in this patch. For starters it needs possibly to maintain backtracking points. In! 😄 🚀 |
or just bundle fast_blank or simply add a comment to the generated config :) |
@fxn I think the greedy operator ( |
@SamSaffron and @tenderlove while you're here 😉 i've got a question. If we can generate a list of all :space: strings in a hash, then I think we can double current AS String#blank? speeds. I talked about that a bit more in my linked comment. To do that we'll need to decode this list and represent them in Ruby (https://github.com/ruby/ruby/blob/20cd25c86fd28eb1b5068d0db607e6aa33107f65/enc/unicode/name2ctype.h#L2794-L2807.) any tips on the best way to do that? |
I have not, thanks 👏 😄 |
@schneems next time please don't use random length strings in benchmarks. It makes the test data non-portable and unpredictable. It's OK to generate random strings, but do it once and throw it in the DATA section at the end of the script. 🙇 |
@schneems I recall trying to optimise this regex in the past, you make one change that improves perf for one class of strings and unfortunately some other string is slower. That is what triggered me to make fast_blank. Ruby core have no plans to add a That stuff triggered me to end up writing fast blank, it is used in production at both GitHub and Discourse. fast blanks biggest problem is that people are not aware of it, in many apps it ends up giving you a 5% bump, especially for people that are heavy on the I feel like the best approach is simply to plug that this gem exists (and fast_xs for that matter) in the default generated Gemfile, but really not my call, it is just frustrating sometimes that "Rails... but fast" it a TOP SECRET, that is stored in @tmm1's brains and a few other members of the illuminati. |
Long term probably the best thing to do is lobby Ruby to get
That would be the best thing here. |
To add more people to that list Shopify also use fast_blank in production. |
@tenderlove AFAIK it still goes left-to-right so you can fail fast. That is, the engine does not blindly go to the far right, but just matches as much as it can (as long as characters match). So, when the quantifier is greedy you first eat as much as possible matching, and then check if the rest of the regexp matches. If not, backtrack. When the quantifier is not greedy, you match as less as possible, and if the rest matches done, otherwise advance. In theory, though, if you have a bunch of whitespace and there is a non-space character, the engine reaches it and then backtracks only to fail, because there's no way to match
That would certainly explain the difference in performance, since the vanilla character class has no backtracking going on. |
I opened this issue to see if we can get this implemented in Ruby |
Follow up to rails#24658.
@schneems @fxn @tenderlove help convince Matz that there is real world use of |
So, so, so glad the "Beneath the Surface: Regular Expressions in Ruby" was helpful! |
I have benchmarked both regexps against a non-blank string of length 1: require 'benchmark/ips'
str = 'a'
positive = /\A[[:space:]]*\z/
negative = /[[:^space:]]/
Benchmark.ips do |x|
x.report('positive') { str =~ positive }
x.report('negative') { str !~ negative }
x.compare!
end There is no significant backtracking going on because
I changed the quantifier to So, albeit when the string has a large prefix with whitespace backtracking could have perhaps a cost (speculation), these tests seem to indicate it does not per se explain that 30%. My hunch is that the engine just does more work due to the complication in the regexp, whereas the other one is very straightforward (but don't really know it). |
Further investigation seems to disprove that backtracking is the reason why the positive variant is slower, see #24658 (comment) so, just say nothing about it, only assert it is slower.
@k-takata, just for curiosity, could you shed some light about what really explains the difference in performance seen in the benchmark in the previous comment? That would be awesome to understand! |
require 'benchmark/ips'
def string_generate
str = " abcdefghijklmnopqrstuvwxyz\t".freeze
str[rand(0..(str.length - 1))] * rand(0..23)
end
strings = 100.times.map { string_generate }
ALL_WHITESPACE_STAR = /\A[[:space:]]*\z/
NON_SPACE = /[[:^space:]]/
Benchmark.ips do |x|
x.report('old regex ') { strings.each {|str| str.empty? || ALL_WHITESPACE_STAR === str } }
x.report('current regex constant ===') { strings.each {|str| str.empty? || !(NON_SPACE === str) } }
x.report('current regex constant =~') { strings.each {|str| str.empty? || !(NON_SPACE =~ str) } }
x.report('current regex literal ===') { strings.each {|str| str.empty? || !(/[[:^space:]]/ === str) } }
x.report('current regex literal =~') { strings.each {|str| str.empty? || !(/[[:^space:]]/ =~ str) } }
x.report('current regex literal !~') { strings.each {|str| str.empty? || (/[[:^space:]]/ !~ str) } }
x.compare!
end
|
I would strongly advise caution here, @schneems it looks like the change you added actually makes stuff significantly slower for longer strings. I added a more updated bench here: https://github.com/SamSaffron/fast_blank/blob/master/benchmark In particular you got to benchmark for strings of various length, my bench does 0,6,14,24,136 ... you probably want to add a super long string as well to ensure there is no pathological case fast_blank remains significantly faster (except for the one tiny case that is shortcutted using empty? which it is only 10% faster.) cc @nurse |
On my machine i'm still seeing that the new method is an improvement over the old require 'benchmark/ips'
LONG_STRING = " this is a longer test
this is a longer test
this is a longer test
this is a longer test
this is a longer test"
Benchmark.ips do |x|
x.report('old') { LONG_STRING.empty? || /\A[[:space:]]*\z/ === LONG_STRING }
x.report('new') { LONG_STRING.empty? || !(/[[:^space:]]/ === LONG_STRING) }
x.compare!
end
# Warming up --------------------------------------
# old 124.610k i/100ms
# new 154.021k i/100ms
# Calculating -------------------------------------
# old 1.899M (± 7.2%) i/s - 9.470M in 5.014725s
# new 2.674M (± 9.4%) i/s - 13.400M in 5.070622s
# Comparison:
# new: 2673836.1 i/s
# old: 1899274.9 i/s - 1.41x slower It's actually slightly faster with the longer string my original test with strings of variable length up to 80 characters. |
Btw I'm in favor of adding this gem to rails but I think we need to work with jruby out of the box too. Ideally we could get this in MRI proper. |
@schneems I am not sure IPS is being invoked right there, you should use the while loop
What Ruby version are you on? |
Ruby 2.3.1
|
wow... Ruby performance is hard, will ping @ko1 on this cause it is super confusing https://gist.github.com/SamSaffron/d1a9cc8e141e7415e06306369fdedfe5 |
OK I think I know why this is slower in the less artificial benchmark New regex allocates more stuff, a lot more stuff. https://gist.github.com/SamSaffron/f73fd0395e050e927d1a3137373eeaee 449 bytes in new version 169 in old version, once extracted into a method regex magic is causing less stuff to be reused in globals |
@schneems @SamSaffron |
This is by far the most comments i've ever gotten on a 1 line code change. This is all pretty weird. I re-tried with the require 'benchmark/ips'
LONG_STRING = " this is a longer test
this is a longer test
this is a longer test
this is a longer test
this is a longer test"
Benchmark.ips do |x|
x.report('old') do |times|
i = 0
while i < times
LONG_STRING.empty? || /\A[[:space:]]*\z/ === LONG_STRING
i += 1
end
end
x.report('new') do |times|
i = 0
while i < times
LONG_STRING.empty? || !(/[[:^space:]]/ === LONG_STRING)
i += 1
end
end
x.compare!
end
# Warming up --------------------------------------
# old 124.919k i/100ms
# new 149.075k i/100ms
# Calculating -------------------------------------
# old 2.258M (± 7.0%) i/s - 11.243M in 5.006437s
# new 3.200M (±10.1%) i/s - 15.951M in 5.044622s
# Comparison:
# new: 3199827.3 i/s
# old: 2257599.3 i/s - 1.42x slower However when I put those regexes into a method I do see the slow down and the "new" method is 2x slower than the old method. I was under the impression that regex literals are essentially frozen. Assigning it to a constant doesn't help. It looks like the memory use is larger because a character is matched and a MatchData object is created, if you make the string empty When I repeated the benchmark with the regular expressions in methods with a long string that matches the old regex, they are roughly the same speed. I'm a bit more confused now than I was before. |
Here are my two benchmarks side by side https://gist.github.com/schneems/330efedbe310e59ad2f0f3e35358d3d5 |
@schneems the results make sense this all boils down to yet another issue that has been open for over 3 years in Ruby core https://bugs.ruby-lang.org/issues/8110 basically onigaruma supports a no backref mode but there is no way to invoke it from Ruby. carrying around the backrefs and the other mountains of globals for EVERY regex makes it impossible to write super fast regular expressions in Ruby. I am willing to bet that Ruby has some optimisations to maintain some of the allocated stuff when invoked in a loop and drops some reuse when methods are invoked. Hopefully @matz can accept some mechanism in Ruby to tap onigaruma without backrefs and regexes without globals. |
I'm still confused about why this only happens in a method? You would think we would be creating all those globals when we're at the main context just as when we're in a method. You would also think that in both cases were using a regex. The old regex on the surface looks more complicated than the new regex, so while I understand that there is matchdata created, it doesn't fully explain the speed difference. I tried to make a case where the "old" method is slower and... I tried setting It looks like we should revert my change for speed, i'm happy to do that. But I do want to fully understand the minutiae here for future optimizations. |
Agree, this is the kind of thing that needs a clear benefit. We could perhaps revert and still be open to revisit if further research shows an undebatable speedup. @SamSaffron regarding your gem, I personally use it in my projects. But to include it in the generated Gemfile, even commented out, is a different story. One minor point that I am sure we could solve somehow is that we need The more fundamental con, however, is that we would be committing to a substitute outside the realm of the project. Separate repos, separate test suites... I think everything is more ordered if people just opt-in by themselves. And in any case, it is our duty to provide the best pure Ruby implementation we can think of regardless. |
This commit undoes 54243fe. Reason: Further investigation has shown the benefit is not so clear generally speaking. There is a long discussion and several benchmarks in the PR #24658 if you are interested in the details.
I totally understand the sentiment, but the same argument can apply to uglifyjs or coffeescript or mime types. Plenty of dependencies come from the outside, as it stands we are giving an edge to the blessed few that know about fast_xs and fast_blank. This knowledge is tribal there is no way to get the word out. That said, I totally respect whatever is decided here, ultimately @ko1 agrees with me that we need a mechanism to tap onigaruma without forcing backref's and globals for every regex test, once that is implemented we could achieve significant perf improvements to this blank regex and many other spots in rails. |
Anyway your discussion seems to be derived from tmm1's profile. |
Follow up on 697384d#commitcomment-17184696.
The regex to detect a blank string
/\A[[:space:]]*\z/
will loop through every character in the string to ensure that all of them are a:space:
type. We can invert this logic and instead look for any non-:space:
characters. When that happens, we would return on the first character found and the regex engine does not need to keep looking.Thanks @nellshamrell for the regex talk at LSRC.
By defining a "blank" string as any string that does not have a non-whitespace character (yes, double negative) we can get a substantial speed bump.
Also an inline regex is (barely) faster than a regex in a constant, since it skips the constant lookup. A regex literal is frozen by default.
This makes the method roughly 30% faster
(23.580 - 18.078)/18.078 * 100
.cc/ @fxn