Improve yielding block performance #1535

Watson1978 · 2017-03-08T05:21:38Z

Yielding to a block becomes around 9% faster.
This patch ensures that the code which invokes a yielded block is expanded inline at its call sites.

Environment
- macOS 10.12.3
- clang 8.0.0 in Xcode 8.2

Before

                    user     system      total        real
Integer#times   0.930000   0.000000   0.930000 (  0.932125)
Array#each      0.950000   0.000000   0.950000 (  0.957962)
Array#map       1.220000   0.030000   1.250000 (  1.249174)

After

                    user     system      total        real
Integer#times   0.850000   0.000000   0.850000 (  0.853202)
Array#each      0.860000   0.010000   0.870000 (  0.865507)
Array#map       1.120000   0.020000   1.140000 (  1.149939)
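The "around 9%" figure can be checked against the real (wall-clock) column of the two tables. A small Ruby sketch, with the numbers copied by hand from the Before and After tables, computes the reduction in real time per benchmark:

```ruby
# Real (wall-clock) seconds copied from the Before/After tables.
BEFORE = {
  "Integer#times" => 0.932125,
  "Array#each"    => 0.957962,
  "Array#map"     => 1.249174
}.freeze

AFTER = {
  "Integer#times" => 0.853202,
  "Array#each"    => 0.865507,
  "Array#map"     => 1.149939
}.freeze

# Percentage reduction in real time: (before - after) / before * 100.
def reduction_percent(before, after)
  (before - after) / before * 100.0
end

BEFORE.each do |name, before|
  pct = reduction_percent(before, AFTER.fetch(name))
  puts format("%-14s %5.2f%%", name, pct)
end
```

With these numbers the reductions come out at about 8.47% (Integer#times), 9.65% (Array#each) and 7.94% (Array#map), an average of roughly 8.7%.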

Test code

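require 'benchmark'

Benchmark.bmbm do |x|
  ary = (1..10000).to_a

  # 20,000,000 empty block calls from Integer#times
  x.report "Integer#times" do
    20000000.times do
    end
  end

  # 2,000 passes over a 10,000-element array (about 20,000,000 block calls)
  x.report "Array#each" do
    2000.times do
      ary.each { |i| }
    end
  end

  # Same iteration count as Array#each, but Array#map also builds a result array
  x.report "Array#map" do
    2000.times do
      ary.map { |i| }
    end
  end
end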
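The benchmarks above time blocks invoked by C-implemented iterators. A possible companion benchmark, sketched here under the assumption that a Ruby-level `yield` is also of interest, times a hypothetical `each_yield` method next to Integer#times. The thread does not say whether the patch touches the code path used by a Ruby-level `yield`, so any difference this shows should not be attributed to the patch without checking the diff.

```ruby
require 'benchmark'

# Hypothetical Ruby-level iterator: yields 0...n to the caller's block,
# so the block is invoked via `yield` in Ruby code rather than from a
# C-implemented iterator such as Integer#times. Returns n.
def each_yield(n)
  i = 0
  while i < n
    yield i
    i += 1
  end
  n
end

N = 1_000_000

Benchmark.bmbm do |x|
  x.report("Integer#times") { N.times { } }
  x.report("Ruby yield")    { each_yield(N) { } }
end
```

Benchmark.bmbm runs a rehearsal pass first, which reduces noise from warm-up and GC before the timed run.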

https://bugs.ruby-lang.org/issues/13342

nobu · 2017-03-08T08:06:08Z

Use ALWAYS_INLINE macro.


Watson1978 · 2017-03-08T08:24:41Z

@nobu Thank you for your review. Updated the code with your suggestion.

This patch also improves performance at other places where blocks are invoked. bm_app_lc_fizzbuzz.rb becomes faster too: 77.322 before vs. 72.187 after in the runs below, a reduction of about 6.6%.

Before

```
$ ruby benchmark/run.rb --ruby=./miniruby --only-ruby bm_app_lc_fizzbuzz
MatzRuby:
Ruby:
ruby 2.5.0dev (2017-03-08 yield 57806) [x86_64-darwin16]
last_commit=Improve yielding block performance
app_lc_fizzbuzz:
ruby 77.322
-- benchmark summary ---------------------------
app_lc_fizzbuzz 77.322
```

After

```
$ ruby benchmark/run.rb --ruby=./miniruby --only-ruby bm_app_lc_fizzbuzz
MatzRuby:
Ruby:
ruby 2.5.0dev (2017-03-08 yield 57806) [x86_64-darwin16]
last_commit=Improve yielding block performance
app_lc_fizzbuzz:
ruby 72.187
-- benchmark summary ---------------------------
app_lc_fizzbuzz 72.187
```

Integer#times becomes about 7% faster on macOS with clang.

Before

      user     system      total        real
  2.310000   0.000000   2.310000 (  2.315654)

After

      user     system      total        real
  2.150000   0.000000   2.150000 (  2.153181)

Test code

require 'benchmark'

Benchmark.bmbm do |x|
  x.report do
    50000000.times do
    end
  end
end

nurse

For me, 9% is not a large enough improvement to justify such an ugly optimization.

Watson1978 · 2017-05-26T07:00:00Z

I think this pull request is related to https://bugs.ruby-lang.org/issues/12599, which affects performance when Ruby is compiled with clang.

If the inline-threshold compiler flag were adjusted, I guess this patch might be unnecessary.
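Whether a given Ruby build used clang, and whether an inline threshold was passed to it, can be checked from Ruby itself through RbConfig. A minimal sketch follows; the helper name `inline_build_info` is hypothetical, the "clang" test is only a heuristic on the CC string (on macOS, `cc` is usually clang as well), and it assumes the threshold would appear in CFLAGS or optflags as clang's `-mllvm -inline-threshold=N`.

```ruby
require 'rbconfig'

# Hypothetical helper: reports the C compiler this Ruby was built with,
# whether it looks like clang (heuristic on the CC string only), and any
# -inline-threshold=N value found in CFLAGS/optflags (nil if absent).
def inline_build_info(config = RbConfig::CONFIG)
  cc = config.fetch("CC", "")
  flags = [config["CFLAGS"], config["optflags"]].compact.join(" ")
  threshold = flags[/-inline-threshold=(\d+)/, 1]
  {
    cc: cc,
    clang: cc.include?("clang"),
    inline_threshold: threshold && Integer(threshold)
  }
end

p inline_build_info
```

Passing a plain Hash instead of RbConfig::CONFIG makes the helper easy to try against hypothetical build settings.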

nurse · 2017-05-26T17:02:17Z

If some functions that are expected to be inlined, like vm_getivar (which takes a constant argument, so its body can be optimized significantly once it is inlined), are not inlined by clang with the default inline-threshold, but are inlined when inline-threshold is set to a specific value, then adjusting the threshold sounds reasonable.

k0kubun · 2019-08-17T04:37:11Z

It seems to have a conflict now. Could you rebase this onto master?

k0kubun · 2019-08-19T03:42:57Z

Let me close this as it has not been updated for a while. Please reopen this after resolving conflicts. Thanks.

Watson1978 force-pushed the yield branch from 0d4170f to b504bd7 on March 8, 2017 08:23

Watson1978 added 2 commits March 22, 2017 12:53

Watson1978 force-pushed the yield branch from aa0b29e to 0122668 on April 7, 2017 06:17

nurse reviewed May 19, 2017

Watson1978 mentioned this pull request May 22, 2017: Fix one of performance regressions in method calling #1556 (Open)

matzbot force-pushed the trunk branch from 2677ddd to ce7ad3a on January 18, 2018 15:27

k0kubun changed the base branch from trunk to master on August 15, 2019 17:38

k0kubun closed this on August 19, 2019
