Use forwarded block rather than yield.
This appears to be faster in our benchmarks, and
based also on what @yujinakayama has said.
myronmarston committed Jan 4, 2015
1 parent cd52601 commit 940b7ab
Showing 3 changed files with 207 additions and 22 deletions.
180 changes: 169 additions & 11 deletions benchmarks/capture_block_vs_yield.rb
@@ -12,6 +12,8 @@ def capture_block_and_call(&block)
block.call
end

puts "Using the block directly"

Benchmark.ips do |x|
x.report("yield ") do
yield_control { }
@@ -26,25 +28,181 @@ def capture_block_and_call(&block)
end
end

puts "Forwarding the block to another method"

def tap_with_yield
5.tap { |i| yield i }
end

def tap_with_forwarded_block(&block)
5.tap(&block)
end

Benchmark.ips do |x|
x.report("tap { |i| yield i }") do
tap_with_yield { |i| }
end

x.report("tap(&block) ") do
tap_with_forwarded_block { |i| }
end
end

def yield_n_times(n)
n.times { yield }
end

def forward_block_to_n_times(n, &block)
n.times(&block)
end

def call_block_n_times(n, &block)
n.times { block.call }
end

[10, 25, 50, 100, 1000, 10000].each do |count|
puts "Invoking the block #{count} times"

Benchmark.ips do |x|
x.report("#{count}.times { yield } ") do
yield_n_times(count) { }
end

x.report("#{count}.times(&block) ") do
forward_block_to_n_times(count) { }
end

x.report("#{count}.times { block.call }") do
call_block_n_times(count) { }
end
end
end

__END__

This benchmark demonstrates that `yield` is much, much faster
than capturing `&block` and calling it. In fact, the simple act
of capturing `&block`, even if we don't later reference `&block`,
incurs most of the cost, so we should avoid capturing blocks unless
we absolutely need to.
This benchmark demonstrates that capturing a block (e.g. `&block`) has
a high constant cost, taking about 5x longer than a single `yield`
(even if the block is never used!).

However, forwarding a captured block can be faster than using `yield`
if the block is used many times (the breakeven point is at about 20-25
invocations), so it appears that the per-invocation cost of `yield`
is higher than that of a captured-and-forwarded block.

Note that there is no circumstance where using `block.call` is faster.

See also `flat_map_vs_inject.rb`, which appears to contradict these
results a little bit.

Using the block directly
Calculating -------------------------------------
yield
93.104k i/100ms
91.539k i/100ms
capture block and yield
52.682k i/100ms
50.945k i/100ms
capture block and call
51.115k i/100ms
50.923k i/100ms
-------------------------------------------------
yield
5.161M (±10.6%) i/s - 25.231M
4.757M (± 6.0%) i/s - 23.709M
capture block and yield
1.141M (±22.0%) i/s - 5.426M
1.112M (±20.7%) i/s - 5.349M
capture block and call
1.027M (±21.8%) i/s - 4.856M
964.475k (±20.3%) i/s - 4.634M
Forwarding the block to another method
Calculating -------------------------------------
tap { |i| yield i } 74.620k i/100ms
tap(&block) 51.382k i/100ms
-------------------------------------------------
tap { |i| yield i } 3.213M (± 6.3%) i/s - 16.043M
tap(&block) 970.418k (±18.6%) i/s - 4.727M
Invoking the block 10 times
Calculating -------------------------------------
10.times { yield }
49.151k i/100ms
10.times(&block)
40.682k i/100ms
10.times { block.call }
27.576k i/100ms
-------------------------------------------------
10.times { yield }
908.673k (± 4.9%) i/s - 4.571M
10.times(&block)
674.565k (±16.1%) i/s - 3.336M
10.times { block.call }
385.056k (±10.3%) i/s - 1.930M
Invoking the block 25 times
Calculating -------------------------------------
25.times { yield }
29.874k i/100ms
25.times(&block)
30.934k i/100ms
25.times { block.call }
17.119k i/100ms
-------------------------------------------------
25.times { yield }
416.342k (± 3.6%) i/s - 2.091M
25.times(&block)
446.108k (±10.6%) i/s - 2.227M
25.times { block.call }
201.264k (± 7.2%) i/s - 1.010M
Invoking the block 50 times
Calculating -------------------------------------
50.times { yield }
17.690k i/100ms
50.times(&block)
21.760k i/100ms
50.times { block.call }
9.961k i/100ms
-------------------------------------------------
50.times { yield }
216.195k (± 5.7%) i/s - 1.079M
50.times(&block)
280.217k (± 9.9%) i/s - 1.393M
50.times { block.call }
112.754k (± 5.6%) i/s - 567.777k
Invoking the block 100 times
Calculating -------------------------------------
100.times { yield }
10.143k i/100ms
100.times(&block)
13.688k i/100ms
100.times { block.call }
5.551k i/100ms
-------------------------------------------------
100.times { yield }
111.700k (± 3.6%) i/s - 568.008k
100.times(&block)
163.638k (± 7.7%) i/s - 821.280k
100.times { block.call }
58.472k (± 5.6%) i/s - 294.203k
Invoking the block 1000 times
Calculating -------------------------------------
1000.times { yield }
1.113k i/100ms
1000.times(&block)
1.817k i/100ms
1000.times { block.call }
603.000 i/100ms
-------------------------------------------------
1000.times { yield }
11.156k (± 8.4%) i/s - 56.763k
1000.times(&block)
18.551k (±10.1%) i/s - 92.667k
1000.times { block.call }
6.206k (± 3.5%) i/s - 31.356k
Invoking the block 10000 times
Calculating -------------------------------------
10000.times { yield }
113.000 i/100ms
10000.times(&block)
189.000 i/100ms
10000.times { block.call }
61.000 i/100ms
-------------------------------------------------
10000.times { yield }
1.150k (± 3.6%) i/s - 5.763k
10000.times(&block)
1.896k (± 6.9%) i/s - 9.450k
10000.times { block.call }
624.401 (± 3.0%) i/s - 3.172k
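
One behavioural difference between the variants benchmarked above is worth keeping in mind when swapping `yield` for a forwarded block: `n.times(&block)` hands the block whatever `Integer#times` yields (the iteration index), whereas `n.times { yield }` invokes it with no arguments. A minimal sketch, reusing the method names from the benchmark; the `|*args|` recording blocks are illustrative additions and not part of the commit:

```ruby
# Same definitions as in benchmarks/capture_block_vs_yield.rb.
def yield_n_times(n)
  n.times { yield }
end

def forward_block_to_n_times(n, &block)
  n.times(&block)
end

# Illustrative only: record the arguments each variant passes to the block.
yielded = []
yield_n_times(3) { |*args| yielded << args }

forwarded = []
forward_block_to_n_times(3) { |*args| forwarded << args }

p yielded    # the bare `yield` passes no arguments
p forwarded  # Integer#times passes the index to the forwarded block
```

For blocks that ignore their arguments, as the empty blocks in the benchmark do, the two forms are interchangeable.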
41 changes: 34 additions & 7 deletions benchmarks/flat_map_vs_inject.rb
@@ -1,8 +1,15 @@
require 'benchmark/ips'
require 'rspec/core/flat_map'

words = %w[ foo bar bazz big small medium large tiny less more good bad mediocre ]

def flat_map_using_yield(array)
array.flat_map { |item| yield item }
end

def flat_map_using_block(array, &block)
array.flat_map(&block)
end

Benchmark.ips do |x|
x.report("flat_map") do
words.flat_map(&:codepoints)
@@ -16,13 +23,33 @@
words.inject([]) { |a, w| a.concat w.codepoints }
end

x.report("FlatMap.flat_map") do
RSpec::Core::FlatMap.flat_map(words, &:codepoints)
x.report("flat_map_using_yield") do
flat_map_using_yield(words, &:codepoints)
end

x.report("flat_map_using_block") do
flat_map_using_block(words, &:codepoints)
end
end

__END__
flat_map 136.445k (± 5.8%) i/s - 682.630k
inject (+) 99.557k (±10.0%) i/s - 496.368k
inject (concat) 120.902k (±14.6%) i/s - 598.400k
FlatMap.flat_map 121.461k (± 8.5%) i/s - 608.826k

Surprisingly, `flat_map(&block)` appears to be faster than
`flat_map { yield }` in spite of the fact that our array here
is smaller than the break-even point of 20-25 measured in the
`capture_block_vs_yield.rb` benchmark. In fact, the forwarded-block
version remains faster in my benchmarks here no matter how small
I shrink the `words` array. I'm not sure why!

Calculating -------------------------------------
flat_map 10.594k i/100ms
inject (+) 8.357k i/100ms
inject (concat) 10.404k i/100ms
flat_map_using_yield 10.081k i/100ms
flat_map_using_block 11.683k i/100ms
-------------------------------------------------
flat_map 136.442k (±10.4%) i/s - 678.016k
inject (+) 98.024k (± 9.7%) i/s - 493.063k
inject (concat) 119.822k (±10.5%) i/s - 593.028k
flat_map_using_yield 112.284k (± 9.7%) i/s - 564.536k
flat_map_using_block 134.533k (± 6.3%) i/s - 677.614k
8 changes: 4 additions & 4 deletions lib/rspec/core/flat_map.rb
@@ -3,12 +3,12 @@ module Core
# @private
module FlatMap
if [].respond_to?(:flat_map)
def flat_map(array)
array.flat_map { |item| yield item }
def flat_map(array, &block)
array.flat_map(&block)
end
else # for 1.8.7
def flat_map(array)
array.map { |item| yield item }.flatten(1)
def flat_map(array, &block)
array.map(&block).flatten(1)
end
end
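
A minimal sketch of why the two branches are meant to be interchangeable: mapping and then flattening exactly one level gives the same result as `Array#flat_map`. The method names `flat_map_native` and `flat_map_fallback` are hypothetical stand-ins for the two branches and are not part of RSpec.

```ruby
# Hypothetical stand-in for the branch used when Array#flat_map exists.
def flat_map_native(array, &block)
  array.flat_map(&block)
end

# Hypothetical stand-in for the 1.8.7 fallback: map, then flatten one level.
def flat_map_fallback(array, &block)
  array.map(&block).flatten(1)
end

# The block returns an array that itself contains a nested array, so the
# flattening depth is observable in the result.
nested_block = lambda { |i| [i, [i * 10]] }

native   = flat_map_native([1, 2], &nested_block)
fallback = flat_map_fallback([1, 2], &nested_block)
deeper   = [1, 2].map(&nested_block).flatten(2)

p native    # inner arrays are preserved
p fallback  # same as native
p deeper    # flattening a second level also unwraps the inner arrays
```

Flattening more than one level would only match `flat_map` when the block never returns nested arrays, which is why the depth argument matters in the fallback.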
