Use forwarded block rather than yield.

This appears to be faster in our benchmarks, and is consistent with what @yujinakayama has said.
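
As a minimal sketch of the two styles being compared (the helper names below are hypothetical and chosen only for illustration; they are not the names used in the codebase), both methods should return the same result and differ only in how the caller's block reaches `flat_map`:

```ruby
# Hypothetical helper names, for illustration only.

# Style 1: wrap the caller's block in a new block and re-yield each item.
def flat_map_with_yield(array)
  array.flat_map { |item| yield item }
end

# Style 2: capture the caller's block and forward it unchanged.
def flat_map_with_forwarded_block(array, &block)
  array.flat_map(&block)
end

words = %w[foo bar]

via_yield = flat_map_with_yield(words) { |w| w.chars }
via_block = flat_map_with_forwarded_block(words) { |w| w.chars }

# Both styles produce the same flattened array.
puts via_yield == via_block
```

This only checks that the two styles behave the same; which one is faster depends on the Ruby version and on how many times the block is invoked, as the benchmarks in this commit measure.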
rspec · Jan 4, 2015 · 940b7ab
1 parent cd52601

Showing 3 changed files with 207 additions and 22 deletions.
diff --git a/benchmarks/capture_block_vs_yield.rb b/benchmarks/capture_block_vs_yield.rb
@@ -12,6 +12,8 @@ def capture_block_and_call(&block)
   block.call
 end
 
+puts "Using the block directly"
+
 Benchmark.ips do |x|
   x.report("yield                  ") do
     yield_control { }
@@ -26,25 +28,181 @@ def capture_block_and_call(&block)
   end
 end
 
+puts "Forwarding the block to another method"
+
+def tap_with_yield
+  5.tap { |i| yield i }
+end
+
+def tap_with_forwarded_block(&block)
+  5.tap(&block)
+end
+
+Benchmark.ips do |x|
+  x.report("tap { |i| yield i }") do
+    tap_with_yield { |i| }
+  end
+
+  x.report("tap(&block)        ") do
+    tap_with_forwarded_block { |i| }
+  end
+end
+
+def yield_n_times(n)
+  n.times { yield }
+end
+
+def forward_block_to_n_times(n, &block)
+  n.times(&block)
+end
+
+def call_block_n_times(n, &block)
+  n.times { block.call }
+end
+
+[10, 25, 50, 100, 1000, 10000].each do |count|
+  puts "Invoking the block #{count} times"
+
+  Benchmark.ips do |x|
+    x.report("#{count}.times { yield }     ") do
+      yield_n_times(count) { }
+    end
+
+    x.report("#{count}.times(&block)       ") do
+      forward_block_to_n_times(count) { }
+    end
+
+    x.report("#{count}.times { block.call }") do
+      call_block_n_times(count) { }
+    end
+  end
+end
+
 __END__
 
-This benchmark demonstrates that `yield` is much, much faster
-than capturing `&block` and calling it. In fact, the simple act
-of capturing `&block`, even if we don't later reference `&block`,
-incurs most of the cost, so we should avoid capturing blocks unless
-we absolutely need to.
+This benchmark demonstrates that capturing a block (e.g. `&block`) has
+a high constant cost, taking about 5x longer than a single `yield`
+(even if the block is never used!).
+
+However, forwarding a captured block can be faster than using `yield`
+if the block is used many times (the breakeven point is at about 20-25
+invocations), so it appears that the per-invocation cost of `yield`
+is higher than that of a captured-and-forwarded block.
+
+Note that there is no circumstance where using `block.call` is faster.
 
+See also `flat_map_vs_inject.rb`, which appears to contradict these
+results a little bit.
+
+Using the block directly
 Calculating -------------------------------------
 yield
-                        93.104k i/100ms
+                        91.539k i/100ms
 capture block and yield
-                        52.682k i/100ms
+                        50.945k i/100ms
 capture block and call
-                        51.115k i/100ms
+                        50.923k i/100ms
 -------------------------------------------------
 yield
-                          5.161M (±10.6%) i/s -     25.231M
+                          4.757M (± 6.0%) i/s -     23.709M
 capture block and yield
-                          1.141M (±22.0%) i/s -      5.426M
+                          1.112M (±20.7%) i/s -      5.349M
 capture block and call
-                          1.027M (±21.8%) i/s -      4.856M
+                        964.475k (±20.3%) i/s -      4.634M
+Forwarding the block to another method
+Calculating -------------------------------------
+ tap { |i| yield i }    74.620k i/100ms
+ tap(&block)            51.382k i/100ms
+-------------------------------------------------
+ tap { |i| yield i }      3.213M (± 6.3%) i/s -     16.043M
+ tap(&block)            970.418k (±18.6%) i/s -      4.727M
+Invoking the block 10 times
+Calculating -------------------------------------
+10.times { yield }
+                        49.151k i/100ms
+10.times(&block)
+                        40.682k i/100ms
+10.times { block.call }
+                        27.576k i/100ms
+-------------------------------------------------
+10.times { yield }
+                        908.673k (± 4.9%) i/s -      4.571M
+10.times(&block)
+                        674.565k (±16.1%) i/s -      3.336M
+10.times { block.call }
+                        385.056k (±10.3%) i/s -      1.930M
+Invoking the block 25 times
+Calculating -------------------------------------
+25.times { yield }
+                        29.874k i/100ms
+25.times(&block)
+                        30.934k i/100ms
+25.times { block.call }
+                        17.119k i/100ms
+-------------------------------------------------
+25.times { yield }
+                        416.342k (± 3.6%) i/s -      2.091M
+25.times(&block)
+                        446.108k (±10.6%) i/s -      2.227M
+25.times { block.call }
+                        201.264k (± 7.2%) i/s -      1.010M
+Invoking the block 50 times
+Calculating -------------------------------------
+50.times { yield }
+                        17.690k i/100ms
+50.times(&block)
+                        21.760k i/100ms
+50.times { block.call }
+                         9.961k i/100ms
+-------------------------------------------------
+50.times { yield }
+                        216.195k (± 5.7%) i/s -      1.079M
+50.times(&block)
+                        280.217k (± 9.9%) i/s -      1.393M
+50.times { block.call }
+                        112.754k (± 5.6%) i/s -    567.777k
+Invoking the block 100 times
+Calculating -------------------------------------
+100.times { yield }
+                        10.143k i/100ms
+100.times(&block)
+                        13.688k i/100ms
+100.times { block.call }
+                         5.551k i/100ms
+-------------------------------------------------
+100.times { yield }
+                        111.700k (± 3.6%) i/s -    568.008k
+100.times(&block)
+                        163.638k (± 7.7%) i/s -    821.280k
+100.times { block.call }
+                         58.472k (± 5.6%) i/s -    294.203k
+Invoking the block 1000 times
+Calculating -------------------------------------
+1000.times { yield }
+                         1.113k i/100ms
+1000.times(&block)
+                         1.817k i/100ms
+1000.times { block.call }
+                       603.000  i/100ms
+-------------------------------------------------
+1000.times { yield }
+                         11.156k (± 8.4%) i/s -     56.763k
+1000.times(&block)
+                         18.551k (±10.1%) i/s -     92.667k
+1000.times { block.call }
+                          6.206k (± 3.5%) i/s -     31.356k
+Invoking the block 10000 times
+Calculating -------------------------------------
+10000.times { yield }
+                       113.000  i/100ms
+10000.times(&block)
+                       189.000  i/100ms
+10000.times { block.call }
+                        61.000  i/100ms
+-------------------------------------------------
+10000.times { yield }
+                          1.150k (± 3.6%) i/s -      5.763k
+10000.times(&block)
+                          1.896k (± 6.9%) i/s -      9.450k
+10000.times { block.call }
+                        624.401  (± 3.0%) i/s -      3.172k
diff --git a/benchmarks/flat_map_vs_inject.rb b/benchmarks/flat_map_vs_inject.rb
@@ -1,8 +1,15 @@
 require 'benchmark/ips'
-require 'rspec/core/flat_map'
 
 words = %w[ foo bar bazz big small medium large tiny less more good bad mediocre ]
 
+def flat_map_using_yield(array)
+  array.flat_map { |item| yield item }
+end
+
+def flat_map_using_block(array, &block)
+  array.flat_map(&block)
+end
+
 Benchmark.ips do |x|
   x.report("flat_map") do
     words.flat_map(&:codepoints)
@@ -16,13 +23,33 @@
     words.inject([]) { |a, w| a.concat w.codepoints }
   end
 
-  x.report("FlatMap.flat_map") do
-    RSpec::Core::FlatMap.flat_map(words, &:codepoints)
+  x.report("flat_map_using_yield") do
+    flat_map_using_yield(words, &:codepoints)
+  end
+
+  x.report("flat_map_using_block") do
+    flat_map_using_block(words, &:codepoints)
   end
 end
 
 __END__
-        flat_map    136.445k (± 5.8%) i/s -    682.630k
-      inject (+)     99.557k (±10.0%) i/s -    496.368k
- inject (concat)    120.902k (±14.6%) i/s -    598.400k
-FlatMap.flat_map    121.461k (± 8.5%) i/s -    608.826k
+
+Surprisingly, `flat_map(&block)` appears to be faster than
+`flat_map { yield }` in spite of the fact that our array here
+is smaller than the break-even point of 20-25 measured in the
+`capture_block_vs_yield.rb` benchmark. In fact, the forwarded-block
+version remains faster in my benchmarks here no matter how much
+I shrink the `words` array. I'm not sure why!
+
+Calculating -------------------------------------
+            flat_map    10.594k i/100ms
+          inject (+)     8.357k i/100ms
+     inject (concat)    10.404k i/100ms
+flat_map_using_yield    10.081k i/100ms
+flat_map_using_block    11.683k i/100ms
+-------------------------------------------------
+            flat_map    136.442k (±10.4%) i/s -    678.016k
+          inject (+)     98.024k (± 9.7%) i/s -    493.063k
+     inject (concat)    119.822k (±10.5%) i/s -    593.028k
+flat_map_using_yield    112.284k (± 9.7%) i/s -    564.536k
+flat_map_using_block    134.533k (± 6.3%) i/s -    677.614k
diff --git a/lib/rspec/core/flat_map.rb b/lib/rspec/core/flat_map.rb
@@ -3,12 +3,12 @@ module Core
     # @private
     module FlatMap
       if [].respond_to?(:flat_map)
-        def flat_map(array)
-          array.flat_map { |item| yield item }
+        def flat_map(array, &block)
+          array.flat_map(&block)
         end
       else # for 1.8.7
-        def flat_map(array)
-          array.map { |item| yield item }.flatten(1)
+        def flat_map(array, &block)
+          array.map(&block).flatten(1)
         end
       end