Optimize Splitter in the matching delimiter regexp case #2285

Merged
merged 1 commit into from Apr 17, 2013

Conversation

Projects
None yet
3 participants
Member

leocassarani commented Apr 17, 2013

Inspired by @MSNexploder's profiling on #985, I did some digging around the Splitter class to see if I could optimise its execution time.

The main change is that we now avoid calling MatchData#captures in a tight loop unless there are in fact any captures.

I've also made a few micro-optimisations, like using Array#concat instead of Array#push with a splat, and favouring Array#at(-1) over Array#last when we know the array is not empty.

I used the benchmark in benchmark/core/string/bench_split.rb to guide my changes. In particular, the relevant benchmark is "matching delimiter regexp", which I ran against MRI.

Before:

Comparing benchmark/core/string/bench_split.rb:matching delimiter regexp:
   ruby:     138829 i/s
bin/rbx:      48619 i/s - 2.86x slower

After:

Comparing benchmark/core/string/bench_split.rb:matching delimiter regexp:
   ruby:     139696 i/s
bin/rbx:     138014 i/s - 1.01x slower
Optimize Splitter in the matching delimiter regexp case
Avoid calling MatchData#captures in a tight loop unless there are in
fact any captures.

Also includes a few micro-optimisations, like using Array#concat instead
of Array#push with a splat, and favouring Array#at(-1) over Array#last
when we know the array is not empty.

Overall, this shows an almost three-fold improvement in the relevant
benchmark.
Owner

dbussink commented Apr 17, 2013

Thanks! I think there's probably more room for the case that CSV uses actually here, because it calls split with a string, not a regexp. We probably should be able to handle it as a string then, instead of having to go through the regexp path at all for CSV.

Owner

dbussink commented Apr 17, 2013

Also, is at(-1) faster than .last? That really surprises me, maybe .last isn't as optimal as it should be then :).

dbussink added a commit that referenced this pull request Apr 17, 2013

Merge pull request #2285 from leocassarani/splitter-optimization
Optimize Splitter in the matching delimiter regexp case

@dbussink dbussink merged commit 52211bf into rubinius:master Apr 17, 2013

1 check failed

default The Travis build failed
Details
Member

leocassarani commented Apr 17, 2013

@dbussink: I benchmarked at(-1) vs last, and the former was slightly faster than the latter, mostly because last has to check the size of the array and return nil if it's empty. Since we were calling last at a point where the array couldn't possibly be empty, skipping that check gives us a tiny speed boost.

Member

burningTyger commented Apr 17, 2013

@leocassarani did you add a benchmark to the repo?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment