Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve yielding block performance #1535

Closed
wants to merge 3 commits into from
Closed

Conversation

Watson1978
Copy link
Contributor

@Watson1978 Watson1978 commented Mar 8, 2017

The yielding block will be faster around 9%.
This patch ensures that expand to inline codes in where invoke yielding block.

  • Environment

    • macOS 10.12.3
    • clang 8.0.0 in Xcode 8.2
  • Before

                    user     system      total        real
Integer#times   0.930000   0.000000   0.930000 (  0.932125)
Array#each      0.950000   0.000000   0.950000 (  0.957962)
Array#map       1.220000   0.030000   1.250000 (  1.249174)
  • After
                    user     system      total        real
Integer#times   0.850000   0.000000   0.850000 (  0.853202)
Array#each      0.860000   0.010000   0.870000 (  0.865507)
Array#map       1.120000   0.020000   1.140000 (  1.149939)
  • Test code
require 'benchmark'

Benchmark.bmbm do |x|

  ary = (1..10000).to_a

  x.report "Integer#times" do
    20000000.times do
    end
  end

  x.report "Array#each" do
    2000.times do
      ary.each { |x| }
    end
  end

  x.report "Array#map" do
    2000.times do
      ary.map { |x| }
    end
  end

end

https://bugs.ruby-lang.org/issues/13342

@nobu
Copy link
Member

nobu commented Mar 8, 2017

Use ALWAYS_INLINE macro.

The yielding block will be faster around 9%.
This patch ensures that expand to inline codes in where invoke yielding block.

* Environment
 - macOS 10.12.3
 - clang 8.0.0 in Xcode 8.2

* Before
                    user     system      total        real
Integer#times   0.930000   0.000000   0.930000 (  0.932125)
Array#each      0.950000   0.000000   0.950000 (  0.957962)
Array#map       1.220000   0.030000   1.250000 (  1.249174)

* After
                    user     system      total        real
Integer#times   0.850000   0.000000   0.850000 (  0.853202)
Array#each      0.860000   0.010000   0.870000 (  0.865507)
Array#map       1.120000   0.020000   1.140000 (  1.149939)

* Test code
require 'benchmark'

Benchmark.bmbm do |x|

  ary = (1..10000).to_a

  x.report "Integer#times" do
    20000000.times do
    end
  end

  x.report "Array#each" do
    2000.times do
      ary.each { |x| }
    end
  end

  x.report "Array#map" do
    2000.times do
      ary.map { |x| }
    end
  end

end
@Watson1978
Copy link
Contributor Author

@nobu Thank you for your review. Updated the code with your suggestion.

This patch improves performance in where invoke blocks with other case.
bm_app_lc_fizzbuzz.rb will be faster around 5%.

* Before
```
$ ruby benchmark/run.rb --ruby=./miniruby --only-ruby  bm_app_lc_fizzbuzz
MatzRuby:
Ruby:
ruby 2.5.0dev (2017-03-08 yield 57806) [x86_64-darwin16]
last_commit=Improve yielding block performance

app_lc_fizzbuzz:
ruby 77.322

-- benchmark summary ---------------------------
app_lc_fizzbuzz	77.322
```

* After
```
$ ruby benchmark/run.rb --ruby=./miniruby --only-ruby  bm_app_lc_fizzbuzz
MatzRuby:
Ruby:
ruby 2.5.0dev (2017-03-08 yield 57806) [x86_64-darwin16]
last_commit=Improve yielding block performance

app_lc_fizzbuzz:
ruby 72.187

-- benchmark summary ---------------------------
app_lc_fizzbuzz	72.187
```
Integer#times will be faster around 7% on macOS + clang environment.

* Before
       user     system      total        real
   2.310000   0.000000   2.310000 (  2.315654)

* After
       user     system      total        real
   2.150000   0.000000   2.150000 (  2.153181)

* Test code
require 'benchmark'

Benchmark.bmbm do |x|
  x.report do
    50000000.times do
    end
  end

end
Copy link
Member

@nurse nurse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me, 9% is not enough large improvement for such a ugly optimization.

@Watson1978
Copy link
Contributor Author

I think this pull request is related to https://bugs.ruby-lang.org/issues/12599
This affects the performance if compiled with clang

If inline-threshold compile flag would be adjusted, I guess this might be unnecessary.

@nurse
Copy link
Member

nurse commented May 26, 2017

If some functions which is expected to be inlined like vm_getivar, which has some constant argument and when it is inlined the content is significantly optimized, are not optimized by clang with default inline-threshold, and they are inlined if inline-threshold is set as specific size, it sounds reasonable.

@k0kubun k0kubun changed the base branch from trunk to master August 15, 2019 17:38
@k0kubun
Copy link
Member

k0kubun commented Aug 17, 2019

It seems to have a conflict now. Could you rebase this from master?

@k0kubun
Copy link
Member

k0kubun commented Aug 19, 2019

Let me close this as it has not been updated for a while. Please reopen this after resolving conflicts. Thanks.

@k0kubun k0kubun closed this Aug 19, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants