Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP

Loading…

Optimize Splitter in the matching delimiter regexp case #2285

Merged
merged 1 commit into from

3 participants

@leocassarani
Collaborator

Inspired by @MSNexploder's profiling on #985, I did some digging around the Splitter class to see if I could optimise its execution time.

The main change is that we now avoid calling MatchData#captures in a tight loop unless there are in fact any captures.

I've also made a few micro-optimisations, like using Array#concat instead of Array#push with a splat, and favouring Array#at(-1) over Array#last when we know the array is not empty.

I used the benchmark in benchmark/core/string/bench_split.rb to guide my changes. In particular, the relevant benchmark is "matching delimiter regexp", which I ran against MRI.

Before:

Comparing benchmark/core/string/bench_split.rb:matching delimiter regexp:
   ruby:     138829 i/s
bin/rbx:      48619 i/s - 2.86x slower

After:

Comparing benchmark/core/string/bench_split.rb:matching delimiter regexp:
   ruby:     139696 i/s
bin/rbx:     138014 i/s - 1.01x slower
@leocassarani leocassarani Optimize Splitter in the matching delimiter regexp case
Avoid calling MatchData#captures in a tight loop unless there are in
fact any captures.

Also includes a few micro-optimisations, like using Array#concat instead
of Array#push with a splat, and favouring Array#at(-1) over Array#last
when we know the array is not empty.

Overall, this shows an almost three-fold improvement in the relevant
benchmark.
01593ae
@dbussink
Owner

Thanks! I think there's probably more room for the case that CSV uses actually here, because it calls split with a string, not a regexp. We probably should be able to handle it as a string then, instead of having to go through the regexp path at all for CSV.

@dbussink
Owner

Also, is at(-1) faster than .last? That really surprises me, maybe .last isn't as optimal as it should be then :).

@dbussink dbussink merged commit 52211bf into rubinius:master
@leocassarani
Collaborator

@dbussink: I benchmarked at(-1) vs last, and the former was slightly faster than the latter, mostly because last has to check the size of the array and return nil if it's empty. Since we were calling last at a point where the array couldn't possibly be empty, skipping that check gives us a tiny speed boost.

@burningTyger
Collaborator

@leocassarani did you add a benchmark to the repo?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Commits on Apr 17, 2013
  1. @leocassarani

    Optimize Splitter in the matching delimiter regexp case

    leocassarani authored
    Avoid calling MatchData#captures in a tight loop unless there are in
    fact any captures.
    
    Also includes a few micro-optimisations, like using Array#concat instead
    of Array#push with a splat, and favouring Array#at(-1) over Array#last
    when we know the array is not empty.
    
    Overall, this shows an almost three-fold improvement in the relevant
    benchmark.
This page is out of date. Refresh to see the latest.
Showing with 7 additions and 3 deletions.
  1. +7 −3 kernel/common/splitter.rb
View
10 kernel/common/splitter.rb
@@ -76,7 +76,11 @@ def self.split(string, pattern, limit)
unless collapsed && (match.full.at(0) == last_match_end)
ret << match.pre_match_from(last_match_end)
- ret.push(*match.captures.compact)
+
+ # length > 1 means there are captures
+ if match.length > 1
+ ret.concat(match.captures.compact)
+ end
end
if collapsed
@@ -98,8 +102,8 @@ def self.split(string, pattern, limit)
end
# Trim from end
- if !ret.empty? and (limit.equal?(undefined) || limit == 0)
- while s = ret.last and s.empty?
+ if limit.equal?(undefined) || limit == 0
+ while s = ret.at(-1) and s.empty?
ret.pop
end
end
Something went wrong with that request. Please try again.