Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add benchmark for Array#reject vs Array#delete vs Array#delete_if vs Array-[nil] #101

Closed
wants to merge 5 commits into from

Conversation

gbalbuena
Copy link

When you need to cleanup the array and extract all the nils there are at least 4 different ways

These are the results:

$ ruby -v code/array/array-reject-vs-delete-vs-delete_if-vs-subtract.rb 
ruby 2.2.3p173 (2015-08-18 revision 51636) [x86_64-darwin14]
Warming up --------------------------------------
 A.reject{|o|o.nil?}     1.649k i/100ms
       A.delete(nil)   142.214k i/100ms
 A.delete_if(&:nil?)   109.865k i/100ms
           A - [nil]    77.019k i/100ms
Calculating -------------------------------------
 A.reject{|o|o.nil?}      5.360M (±15.0%) i/s -     25.878M
       A.delete(nil)      6.289M (± 8.9%) i/s -     31.287M
 A.delete_if(&:nil?)      3.690M (± 8.9%) i/s -     18.347M
           A - [nil]      1.712M (± 8.8%) i/s -      8.549M

Comparison:
       A.delete(nil):  6288900.3 i/s
 A.reject{|o|o.nil?}:  5360405.0 i/s - same-ish: difference falls within error
 A.delete_if(&:nil?):  3689880.1 i/s - 1.70x slower
           A - [nil]:  1712284.8 i/s - 3.67x slower

@Arcovion Arcovion added the Array label Mar 9, 2016
@nateberkopec
Copy link
Collaborator

Definitely need to add compact

@gbalbuena
Copy link
Author

Here you have it, @nateberkopec good catch!

@gbalbuena
Copy link
Author

$ ruby code/array/array-reject-vs-delete-vs-delete_if-vs-subtract-vs-compact.rb 
Warming up --------------------------------------
           A.compact    65.961k i/100ms
 A.reject{|o|o.nil?}     1.771k i/100ms
       A.delete(nil)   144.815k i/100ms
 A.delete_if(&:nil?)   138.235k i/100ms
           A - [nil]    89.647k i/100ms
Calculating -------------------------------------
           A.compact      6.211M (± 8.5%) i/s -     30.804M
 A.reject{|o|o.nil?}      6.625M (±11.1%) i/s -     32.415M
       A.delete(nil)      7.677M (± 7.9%) i/s -     38.231M
 A.delete_if(&:nil?)      6.587M (± 6.4%) i/s -     32.900M
           A - [nil]      1.930M (± 5.2%) i/s -      9.682M

Comparison:
       A.delete(nil):  7677121.9 i/s
 A.reject{|o|o.nil?}:  6625399.5 i/s - same-ish: difference falls within error
 A.delete_if(&:nil?):  6586659.2 i/s - 1.17x slower
           A.compact:  6211063.3 i/s - 1.24x slower
           A - [nil]:  1929668.5 i/s - 3.98x slower

@Arcovion
Copy link
Collaborator

Arcovion commented Mar 9, 2016

There are some problems with this:

  • Methods like delete mutate ARRAY, invalidating the benchmark as the following runs just process an empty array.
  • There are high standard deviations ±11.1% which leads incorrect rankings (reject should not fall within error).
  • It only tests the best case scenario of every element being nil, rather than randomly interspersing other values.
  • Symbol#to_proc is used for delete_if but not for reject, which may have some small impact.

This should be more accurate, hopefully I didn't overlook anything else:

require 'benchmark/ips'

ARRAY = Array.new(100_000){ rand(2).nonzero? }

ARRAY1 = ARRAY.dup
ARRAY2 = ARRAY.dup
ARRAY3 = ARRAY.dup
ARRAY4 = ARRAY.dup
ARRAY5 = ARRAY.dup
ARRAY6 = ARRAY.dup
ARRAY7 = ARRAY.dup
ARRAY8 = ARRAY.dup

def compact!
  ARRAY1.compact!
end

def compact
  ARRAY2.compact
end

def reject!
  ARRAY3.reject! { |o| o.nil? }
end

def reject
  ARRAY4.reject { |o| o.nil? }
end

def delete
  ARRAY5.delete(nil)
end

def delete_if
  ARRAY6.delete_if { |o| o.nil? }
end

def subtract
  ARRAY7 - [nil]
end

def grep_v
  ARRAY8.grep_v(nil)
end

Benchmark.ips do |x|
  x.report('A.compact!') { compact! }
  x.report('A.compact') { compact }
  x.report('A.reject{|o|o.nil?}') { reject }
  x.report('A.reject!{|o|o.nil?}') { reject! }
  x.report('A.delete(nil)') { delete }
  x.report('A.delete_if{|o|o.nil?}') { delete_if }
  x.report('A - [nil]') { subtract }
  x.report('A.grep_v(nil)') { grep_v }
  x.compare!
end
Warming up --------------------------------------
          A.compact!     3.310k i/100ms
           A.compact   335.000  i/100ms
 A.reject{|o|o.nil?}    15.000  i/100ms
A.reject!{|o|o.nil?}    47.000  i/100ms
       A.delete(nil)    30.000  i/100ms
A.delete_if{|o|o.nil?}
                        46.000  i/100ms
           A - [nil]    43.000  i/100ms
       A.grep_v(nil)    14.000  i/100ms
Calculating -------------------------------------
          A.compact!     35.754k (± 5.6%) i/s -    178.740k
           A.compact      2.346k (± 4.9%) i/s -     11.725k
 A.reject{|o|o.nil?}    160.969  (± 3.1%) i/s -    810.000 
A.reject!{|o|o.nil?}    461.897  (± 5.8%) i/s -      2.679k
       A.delete(nil)    304.818  (± 4.9%) i/s -      1.530k
A.delete_if{|o|o.nil?}
                        462.620  (± 5.0%) i/s -      2.346k
           A - [nil]    424.620  (± 4.7%) i/s -      2.150k
       A.grep_v(nil)    151.899  (± 3.3%) i/s -    770.000 

Comparison:
          A.compact!:    35754.1 i/s
           A.compact:     2346.0 i/s - 15.24x slower
A.delete_if{|o|o.nil?}:      462.6 i/s - 77.29x slower
A.reject!{|o|o.nil?}:      461.9 i/s - 77.41x slower
           A - [nil]:      424.6 i/s - 84.20x slower
       A.delete(nil):      304.8 i/s - 117.30x slower
 A.reject{|o|o.nil?}:      161.0 i/s - 222.12x slower
       A.grep_v(nil):      151.9 i/s - 235.38x slower



def reject
ARRAY.reject { |o| o.nil? }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not reject(&:nil?)?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ixti looks good to me use that syntax, will take on mind when made the changes proposed.

I didn't think on the mutation of the array, that was a good lesson to keep in mind.

@gbalbuena
Copy link
Author

@Arcovion Thanks for your review, will do the changes, this was something I found in my daily work and this was the repo to document my founds.

@gbalbuena
Copy link
Author

@Arcovion @ixti

Results after code review:

$ ruby code/array/array-reject-vs-delete-vs-delete_if-vs-subtract-vs-compact.rb 
Warming up --------------------------------------
          A.compact!     2.904k i/100ms
           A.compact   135.000  i/100ms
    A.reject(&:nil?)    16.000  i/100ms
   A.reject!(&:nil?)    34.000  i/100ms
       A.delete(nil)    33.000  i/100ms
A.delete_if{|o|o.nil?}
                        33.000  i/100ms
           A - [nil]    32.000  i/100ms
       A.grep_v(nil)    13.000  i/100ms
Calculating -------------------------------------
          A.compact!     30.704k (± 4.9%) i/s -    153.912k
           A.compact      1.393k (± 3.9%) i/s -      7.020k
    A.reject(&:nil?)    160.195  (± 3.7%) i/s -    800.000 
   A.reject!(&:nil?)    336.717  (± 4.5%) i/s -      1.700k
       A.delete(nil)    332.311  (± 3.3%) i/s -      1.683k
A.delete_if{|o|o.nil?}
                        342.727  (± 4.7%) i/s -      1.716k
           A - [nil]    326.369  (± 4.0%) i/s -      1.632k
       A.grep_v(nil)    131.573  (± 5.3%) i/s -    663.000 

Comparison:
          A.compact!:    30704.3 i/s
           A.compact:     1392.8 i/s - 22.05x slower
A.delete_if{|o|o.nil?}:      342.7 i/s - 89.59x slower
   A.reject!(&:nil?):      336.7 i/s - 91.19x slower
       A.delete(nil):      332.3 i/s - 92.40x slower
           A - [nil]:      326.4 i/s - 94.08x slower
    A.reject(&:nil?):      160.2 i/s - 191.67x slower
       A.grep_v(nil):      131.6 i/s - 233.36x slower

@gbalbuena
Copy link
Author

I'm thinking about (&:nil?) instead of {|o| o.nil?} that will force a to_proc call, would it be better to use the traditional notation for that and skips the casting

Thoughts?

@Arcovion
Copy link
Collaborator

Thanks! LGTM.
Would you mind adding the benchmark results to README.md like the others?

@ixti
Copy link
Collaborator

ixti commented Mar 11, 2016

After thinking a bit, I'm not sure how "good" comparison with bang versions are.
What I mean is in case of [1, 2, 3].compact! versus [1, 2, 3].compact we don't actually test anything but creation of new array in case of bang-less version.

@nateberkopec
Copy link
Collaborator

Perhaps we could do two separate benchmarks in the same file - one comparing the bang/modify-in-place versions, one without?

@ixti
Copy link
Collaborator

ixti commented Mar 11, 2016

I was thinking about that too. But after first iteration compact!will be doing nothing...

@Arcovion
Copy link
Collaborator

@ixti Oh wow, I completely forgot that. For some reason I was thinking that only mattered between each run of #report, so I made 8 arrays... lol.

@Arcovion
Copy link
Collaborator

How about:

require 'benchmark/ips'

ARRAY = Array.new(1000){ rand(2).nonzero? }

def compact
  ARRAY.compact
end

def reject
  ARRAY.reject(&:nil?)
end

def delete
  ARRAY.dup.delete(nil)
end

def delete_if
  ARRAY.dup.delete_if(&:nil?)
end

def subtract
  ARRAY - [nil]
end

def grep_v
  ARRAY.grep_v(nil)
end

Benchmark.ips do |x|
  x.report('A.compact') { compact }
  x.report('A.reject(&:nil?)') { reject }
  x.report('A.dup.delete(nil)') { delete }
  x.report('A.dup.delete_if(&:nil?)') { delete_if }
  x.report('A - [nil]') { subtract }
  x.report('A.grep_v(nil)') { grep_v }
  x.compare!
end
Warming up --------------------------------------
           A.compact    32.003k i/100ms
    A.reject(&:nil?)     1.708k i/100ms
   A.dup.delete(nil)     2.285k i/100ms
A.dup.delete_if(&:nil?)
                         1.566k i/100ms
           A - [nil]     3.983k i/100ms
       A.grep_v(nil)     1.452k i/100ms
Calculating -------------------------------------
           A.compact    447.529k (± 7.3%) i/s -      2.240M
    A.reject(&:nil?)     17.602k (± 5.2%) i/s -     88.816k
   A.dup.delete(nil)     24.027k (± 5.7%) i/s -    121.105k
A.dup.delete_if(&:nil?)
                         17.840k (± 3.4%) i/s -     89.262k
           A - [nil]     40.462k (± 6.0%) i/s -    207.116k
       A.grep_v(nil)     15.068k (± 3.7%) i/s -     75.504k

Comparison:
           A.compact:   447529.4 i/s
           A - [nil]:    40461.6 i/s - 11.06x slower
   A.dup.delete(nil):    24027.1 i/s - 18.63x slower
A.dup.delete_if(&:nil?):    17840.2 i/s - 25.09x slower
    A.reject(&:nil?):    17601.9 i/s - 25.43x slower
       A.grep_v(nil):    15068.4 i/s - 29.70x slower

@ixti
Copy link
Collaborator

ixti commented Mar 12, 2016

Yeah. Looks good.

@mblumtritt
Copy link

Hi guys - I have two remarks:

  • The code above benchmarks #dup for #delete and #delete_if too. That's not fair.
  • You mix two categories of functions. The first changes the Array in place and the second creates a complete new Array (with different content). I think that should not be mixed.

@ixti
Copy link
Collaborator

ixti commented Mar 13, 2016

I tend to agree that probably it worth to avoid #dup + (#delete|#delete_if) from the benchmarks.

@gbalbuena
Copy link
Author

require 'benchmark/ips'

ARRAY = Array.new(1000){ rand(2).nonzero? }

def compact
  ARRAY.compact
end

def reject
  ARRAY.reject(&:nil?)
end

def subtract
  ARRAY - [nil]
end

Benchmark.ips do |x|
  x.report('A.compact') { compact }
  x.report('A.reject(&:nil?)') { reject }
  x.report('A - [nil]') { subtract }
  x.compare!
end
devbox:fast-ruby GBalbuena$ ruby code/array/compact-vs-reject-vs-subtract.rb 
Warming up --------------------------------------
           A.compact    33.598k i/100ms
    A.reject(&:nil?)     1.596k i/100ms
           A - [nil]     3.281k i/100ms
Calculating -------------------------------------
           A.compact    346.781k (±20.0%) i/s -      1.680M
    A.reject(&:nil?)     15.690k (± 6.3%) i/s -     78.204k
           A - [nil]     33.054k (± 4.6%) i/s -    167.331k

Comparison:
           A.compact:   346781.0 i/s
           A - [nil]:    33054.3 i/s - 10.49x slower
    A.reject(&:nil?):    15689.7 i/s - 22.10x slower

@gbalbuena
Copy link
Author

require 'benchmark/ips'

ARRAY = Array.new(1000){ rand(2).nonzero? }
ARRAY1 = ARRAY.dup
ARRAY2 = ARRAY.dup
ARRAY3 = ARRAY.dup

def grep_v
  ARRAY1.grep_v(nil)
end

def delete
  ARRAY2.delete(nil)
end

def delete_if
  ARRAY3.delete_if(&:nil?)
end

Benchmark.ips do |x|
  x.report('A.grep_v(nil)') { grep_v }
  x.report('A.delete(nil)') { delete }
  x.report('A.delete_if(&:nil?)') { delete_if }
  x.compare!
end
devbox:fast-ruby GBalbuena$ ruby code/array/grep-vs-delete-vs-delete_if.rb 
Warming up --------------------------------------
       A.grep_v(nil)     1.327k i/100ms
       A.delete(nil)     3.190k i/100ms
 A.delete_if(&:nil?)     3.383k i/100ms
Calculating -------------------------------------
       A.grep_v(nil)     13.916k (± 2.3%) i/s -     70.331k
       A.delete(nil)     32.228k (± 3.2%) i/s -    162.690k
 A.delete_if(&:nil?)     35.120k (± 3.4%) i/s -    175.916k

Comparison:
 A.delete_if(&:nil?):    35119.5 i/s
       A.delete(nil):    32227.6 i/s - 1.09x slower
       A.grep_v(nil):    13916.3 i/s - 2.52x slower

@Arcovion
Copy link
Collaborator

@gbalbuena I'm sorry you've had to make so many changes for what was such a simple PR.
Thing is, we have no way to test in-place methods without #dup. The second benchmark doesn't work; as @ixti pointed out ARRAY.delete(nil) will remove all nils on the first iteration making the test useless.

I wondered if we could test stuff by adding #dup to every test (even if the test doesn't use in-place methods) as if it would cancel out. But my local testing shows it sort of equalises the results and removes a lot of the variance, leading me to believe there is a large overhead for #dup.

What we know for sure is that #compact! will always be the fastest way to remove nils, but why focus on nils?
I tried making a generic benchmark that works for removing any item, for me the most surprising result in all these benchmarks is the speed of subtracting with ARRAY - [item].
If you do just want to delete items from an array, I honestly can't tell if ARRAY.delete(item) is faster than ARRAY - [item], I have to just assume that if you're not using #dup and don't mind mutation then ARRAY.delete(item) is faster.
Most people would probably use ARRAY.reject{ |x| x == item }, but did you know that ARRAY.dup.reject!{ |x| x == item } is faster if you benchmark it this way? Madness.
We also saw that ARRAY.dup.delete(nil) is faster than ARRAY.reject(&nil?), that sort of information is still nice to keep on the benchmark.
Anyway, I'm still just going to use ARRAY.grep_v(item)...
What have we learned here? Really? :D

I think having a benchmark for removing nils is a bit pointless to be honest, #compact! is C code written for that exact purpose. As far as removing any one item from an array, my results are kinda inconclusive. We have no way of testing in-place methods at all.

Anyone else got any thoughts or ideas? Help me out...

@gbalbuena
Copy link
Author

nils was the use case that led me to the findings, that can be changed to a different kind of benchmark and target different cases, for example testing functionals methods over array for example: select, collect, map, reduce, etc.

But I think the cleanup of nils in an array case is a common problem, so if its out of the scope of the benchmarks lets focus on other thing and update the PR.

@nateberkopec
Copy link
Collaborator

I think having a benchmark for removing nils is a bit pointless to be honest, #compact! is C code written for that exact purpose.

To be fair, a good portion of the benchmarks we have with Enumerable boil down to "use this C-optimized method instead of ". Which is fine - I think it's easy to reach for the Enumerable method because you use them so often, rather than specialized methods like compact!.

@nirvdrum
Copy link
Collaborator

It's also good to not rely too heavily on MRI's internals. Those assumptions naturally do not hold for other Ruby implementations and possibly might not hold for future versions of MRI.

@gbalbuena gbalbuena closed this May 27, 2016
@jonathanhefner
Copy link
Collaborator

I was disappointed to see this closed without resolution, so I created a gem to help with these kind of benchmarks: benchmark-inputs. (Feedback welcome!)

Here is the benchmark code comparing all cases:

require 'benchmark/inputs'

NO_NILS = [1] * 100
SOME_NILS = [*1..9, nil] * 10
JUST_NIL = [nil]

Benchmark.inputs([NO_NILS, SOME_NILS]) do |x|
  x.dup_inputs = true  # enabled for destructive operations

  x.report('A.compact') {|a| a.compact }
  x.report('A.compact!') {|a| a.compact! }
  x.report('A.reject(&:nil?)') {|a| a.reject(&:nil?) }
  x.report('A.reject!(&:nil?)') {|a| a.reject!(&:nil?) }
  x.report('A.delete(nil)') {|a| a.delete(nil) }
  x.report('A.delete_if(&:nil?)') {|a| a.delete_if(&:nil?) }
  x.report('A - [nil]'){|a| a - JUST_NIL }
  #x.report('A.grep_v(nil)'){|a| a.grep_v(nil) }  # only in Ruby 2.3+

  x.compare!
end

And here are the results I get:

A.compact
  955052.5 i/s (±0.83%)
A.compact!
  1415897.0 i/s (±3.93%)
A.reject(&:nil?)
  73797.2 i/s (±0.30%)
A.reject!(&:nil?)
  108205.2 i/s (±0.29%)
A.delete(nil)
  60839.9 i/s (±0.97%)
A.delete_if(&:nil?)
  106545.5 i/s (±0.46%)
A - [nil]
  103218.4 i/s (±0.41%)

Comparison:
           A.compact!:   1415897.0 i/s
            A.compact:    955052.5 i/s - 1.48x slower
    A.reject!(&:nil?):    108205.2 i/s - 13.09x slower
  A.delete_if(&:nil?):    106545.5 i/s - 13.29x slower
            A - [nil]:    103218.4 i/s - 13.72x slower
     A.reject(&:nil?):     73797.2 i/s - 19.19x slower
        A.delete(nil):     60839.9 i/s - 23.27x slower

The standard deviation of compact! is surprising, but I think tolerable? The worse performance of all the &:nil? methods makes sense, because a Proc is created on each call. (I thought about storing :nil?.to_proc in a constant, but that wouldn't align with real-world usage.) The astonishing observation is how slow delete(nil) is. So slow that I question its correctness, but it is consistent, and using benchmark-ips shows a similar result (accounting for the overhead of dup):

Benchmark.ips do |x|
  x.report('A.dup.compact!') { SOME_NILS.dup.compact! }
  x.report('A.dup.delete(nil)') { SOME_NILS.dup.delete(nil) }
  x.compare!
end
Warming up --------------------------------------
      A.dup.compact!    36.997k i/100ms
   A.dup.delete(nil)     5.451k i/100ms
Calculating -------------------------------------
      A.dup.compact!    592.561k (± 3.4%) i/s -      2.960M in   5.000114s
   A.dup.delete(nil)     56.339k (± 1.8%) i/s -    283.452k in   5.032793s

Comparison:
      A.dup.compact!:   592561.2 i/s
   A.dup.delete(nil):    56339.1 i/s - 10.52x slower

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants