
Ractor support #199

Merged
merged 4 commits into from
Aug 20, 2022


mrkn
Contributor

@mrkn mrkn commented Oct 18, 2021

This pull request adds Ractor support to numo-narray.

The following changes are made:

  • Freeze the UPCAST constants so that they are shareable with non-main Ractors
  • Make frozen narrays shareable with non-main Ractors, except for instances of Numo::RObject
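The first change matters because a non-main Ractor may only read a constant whose value is shareable. Below is a minimal core-Ruby sketch of that rule. `MUTABLE_UPCAST` and `FROZEN_UPCAST` are hypothetical stand-ins for an UPCAST-style type-promotion table, not numo-narray's real constants (those live in the C extension), and `ractor_result` is a hypothetical helper.

```ruby
Warning[:experimental] = false # silence the "Ractor is experimental" warning

# Hypothetical stand-ins for an UPCAST-style table (NOT numo-narray's constants).
MUTABLE_UPCAST = { Integer => Float, Float => Float }
FROZEN_UPCAST  = Ractor.make_shareable({ Integer => Float, Float => Float })

# Wait for a Ractor's result: Ractor#take up to Ruby 3.4, Ractor#value afterwards.
def ractor_result(r)
  r.respond_to?(:value) ? r.value : r.take
end

# Reading an unfrozen (non-shareable) constant from a non-main Ractor raises.
mutable_reader = Ractor.new do
  MUTABLE_UPCAST[Integer]
rescue Ractor::IsolationError => e
  e.class
end
mutable_result = ractor_result(mutable_reader)

# Reading the deep-frozen (shareable) constant works.
frozen_reader = Ractor.new { FROZEN_UPCAST[Integer] }
frozen_result = ractor_result(frozen_reader)

p mutable_result # => Ractor::IsolationError
p frozen_result  # => Float
```

`Ractor.make_shareable` freezes the hash and everything reachable from it, which is what lets the second Ractor read the constant.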

I keep Numo::RObject non-shareable because its instances can contain compound objects such as Array and Hash.
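The reasoning behind the RObject exception is that `Ractor.shareable?` checks objects deeply: freezing a container is not enough if it still references mutable objects. A core-Ruby illustration, where plain arrays stand in for narrays (an analogy for illustration only, not numo-narray code):

```ruby
Warning[:experimental] = false

# Frozen array of Floats: Floats are immutable, so the whole thing is shareable,
# much like a frozen numeric narray.
numeric = [1.0, 2.0, 3.0].freeze

# Frozen outer array holding mutable Array/Hash elements, much like an
# RObject narray whose elements are arbitrary Ruby objects.
compound = [[1, 2], { a: 1 }].freeze

p Ractor.shareable?(numeric)  # => true
p Ractor.shareable?(compound) # => false (inner Array and Hash are not frozen)

# Ractor.make_shareable deep-freezes, so it would also freeze every element
# a user had put into the container.
deep = Ractor.make_shareable([[1, 2], { a: 1 }])
p Ractor.shareable?(deep) # => true
p deep[0].frozen?         # => true
```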

@masa16 Could you please take a look?

@kojix2 kojix2 mentioned this pull request Feb 1, 2022
@orlando-labs

orlando-labs commented Feb 2, 2022

Hi. I tried the mrkn:ractor_support branch with my current project and ran into performance issues. I tried to isolate the issue with this simple benchmark:

require 'benchmark'
require 'numo/narray'

Warning[:experimental] = false

puts 'Testing Numo'

data = Ractor.make_shareable Array.new(1_000_000) { Numo::SFloat.new(10).rand(100) }

Benchmark.bm do |bm|
  bm.report('no ractor') do
    4.times { data.each &:mean }
  end
  
  bm.report('1 ractor') do
    Ractor.new(data) do |arr|
      4.times { arr.each &:mean }
      nil
    end.take
  end

  bm.report('2 ractors') do
    2.times.map do
      Ractor.new(data) do |arr|
        2.times { arr.each &:mean }
        nil
      end
    end.each &:take
  end
  
  bm.report('4 ractors') do
    4.times.map do
      Ractor.new(data) do |arr|
        arr.each &:mean
        nil
      end
    end.each &:take
  end
end

puts 'Testing core Array'

data = Ractor.make_shareable Array.new(2_000_000) { Array.new(10) { Random.rand } }

Benchmark.bm do |bm| 
  bm.report('no ractor') do
    4.times { data.each { |v| v.sum / v.size.to_f } }
  end
  
  bm.report('1 ractor') do
    Ractor.new(data) do |arr|
      4.times { arr.each { |v| v.sum / v.size.to_f } }
      nil
    end.take
  end

  bm.report('2 ractors') do
    2.times.map do
      Ractor.new(data) do |arr|
        2.times { arr.each { |v| v.sum / v.size.to_f } }
        nil
      end
    end.each &:take
  end
  
  bm.report('4 ractors') do
    4.times.map do
      Ractor.new(data) do |arr|
        arr.each { |v| v.sum / v.size.to_f }
        nil
      end
    end.each &:take
  end
end

Running on Ruby 3.1 and CentOS 8 on an otherwise idle 14-core Xeon E5-2680 v4, it produces the following output:

Testing Numo
       user     system      total        real
no ractor  5.259529   0.021958   5.281487 (  5.292886)
1 ractor  6.039938   0.114778   6.154716 (  6.116115)
2 ractors 17.098116   2.135474  19.233590 ( 10.368513)
4 ractors 27.108385   7.667887  34.776272 ( 10.787219)
Testing core Array
       user     system      total        real
no ractor  1.408945   0.000000   1.408945 (  1.411900)
1 ractor  1.742667   0.028465   1.771132 (  1.774470)
2 ractors  1.458583   0.000000   1.458583 (  0.735995)
4 ractors  1.495018   0.000000   1.495018 (  0.385232)

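For some reason, performance degrades significantly when multiple Ractors compute on Numo arrays: with 2 and 4 Ractors the real time (about 10.4 s and 10.8 s) is worse than with 1 Ractor (about 6.1 s), and system time grows from about 0.1 s to 7.7 s, while the core Array version scales as expected.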
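To make the comparison concrete, the real (wall-clock) times reported above can be turned into speedups relative to the "1 ractor" row. This sketch only re-uses the numbers from the output above; `speedups` is a hypothetical helper name.

```ruby
# Real (wall-clock) seconds copied from the benchmark output above,
# keyed by the number of Ractors.
NUMO_REAL  = { 1 => 6.116115, 2 => 10.368513, 4 => 10.787219 }
ARRAY_REAL = { 1 => 1.774470, 2 => 0.735995, 4 => 0.385232 }

# Speedup of N Ractors over 1 Ractor; ideal scaling would give N.
def speedups(real_times)
  base = real_times.fetch(1)
  real_times.to_h { |n, t| [n, (base / t).round(2)] }
end

p speedups(NUMO_REAL)
p speedups(ARRAY_REAL)
```

This gives roughly 0.59x and 0.57x for Numo with 2 and 4 Ractors (a slowdown), against roughly 2.4x and 4.6x for the core Array version.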

@mrkn
Contributor Author

mrkn commented Feb 3, 2022

@orlando-labs First, note that numo-narray is not always faster than Array. Numo-narray is designed for operating on large numeric arrays, so a benchmark with length-10 arrays is heavily biased against it: each call carries a fixed overhead that dominates when there are only 10 elements to process.

The following benchmark shows that running time grows differently with array length for numo-narray and for a plain Array. On my machine, numo-narray is slower than Array for short arrays (array_len of 100 or less) and faster from about array_len = 1000 on.

require 'benchmark'
require 'numo/narray'

array_count = 10000
[10, 100, 1000, 10000].each do |array_len|
  data_numo = Array.new(array_count) { Numo::SFloat.new(array_len).rand(100) }
  data_ary = Array.new(array_count) { Array.new(array_len) { Random.rand } }

  puts
  puts "# array_len = #{array_len}"
  puts

  Benchmark.bm do |bm|
    bm.report('numo') do
      4.times { data_numo.each &:mean }
    end

    bm.report('array') do
      4.times { data_ary.each { |v| v.sum / v.size.to_f } }
    end
  end
end
# array_len = 10

       user     system      total        real
numo  0.041033   0.000000   0.041033 (  0.041040)
array  0.004684   0.000000   0.004684 (  0.004685)

# array_len = 100

       user     system      total        real
numo  0.048274   0.000000   0.048274 (  0.048298)
array  0.014097   0.000000   0.014097 (  0.014101)

# array_len = 1000

       user     system      total        real
numo  0.086663   0.000000   0.086663 (  0.086707)
array  0.108927   0.000000   0.108927 (  0.108994)

# array_len = 10000

       user     system      total        real
numo  0.399808   0.000000   0.399808 (  0.400040)
array  1.062646   0.000000   1.062646 (  1.063160)
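One way to read these numbers is with a simple cost model: per-call time t(n) = a + b*n, a fixed overhead `a` plus `b` per element. This linear model is an assumption, not something the benchmark establishes. Fitting it through the array_len = 10 and 10000 rows (each row is 4 passes over 10000 arrays, so 40,000 calls) and solving for where the two lines cross gives a rough estimate of the crossover length. `linear_fit` is a hypothetical helper.

```ruby
CALLS = 4 * 10_000 # 4 passes over array_count = 10000 arrays

# Real seconds from the output above, keyed by array_len.
NUMO_BY_LEN  = { 10 => 0.041040, 10_000 => 0.400040 }
ARRAY_BY_LEN = { 10 => 0.004685, 10_000 => 1.063160 }

# Fit per-call time t(n) = a + b*n through the two measured points.
def linear_fit(times, calls)
  (n1, t1), (n2, t2) = times.sort.map { |n, t| [n, t / calls] }
  b = (t2 - t1) / (n2 - n1)
  a = t1 - b * n1
  [a, b]
end

numo_a, numo_b = linear_fit(NUMO_BY_LEN, CALLS)
ary_a, ary_b   = linear_fit(ARRAY_BY_LEN, CALLS)

# Length at which both models predict the same per-call time.
crossover = (numo_a - ary_a) / (ary_b - numo_b)

puts format('numo:  %.3f us fixed + %.3f ns/element', numo_a * 1e6, numo_b * 1e9)
puts format('array: %.3f us fixed + %.3f ns/element', ary_a * 1e6, ary_b * 1e9)
puts format('estimated crossover: array_len ~ %d', crossover.round)
```

Under this rough model, numo-narray pays about 11 times the fixed per-call cost of Array but only about a third of the per-element cost, and the estimated crossover (around 500 elements) falls between the measured 100 and 1000 rows.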

With the following benchmark, which is similar to yours but uses 10,000-element arrays, numo-narray is faster than a plain Array.

require 'benchmark'
require 'numo/narray'

Warning[:experimental] = false

array_len = 10000
array_count = 10000

puts 'Testing Numo'

data = Ractor.make_shareable Array.new(array_count) { Numo::SFloat.new(array_len).rand(100) }

Benchmark.bm do |bm|
  bm.report('no ractor') do
    4.times { data.each &:mean }
  end

  bm.report('1 ractor') do
    Ractor.new(data) do |arr|
      4.times { arr.each &:mean }
      nil
    end.take
  end

  bm.report('2 ractors') do
    2.times.map do
      Ractor.new(data) do |arr|
        2.times { arr.each &:mean }
        nil
      end
    end.each &:take
  end

  bm.report('4 ractors') do
    4.times.map do
      Ractor.new(data) do |arr|
        arr.each &:mean
        nil
      end
    end.each &:take
  end
end

puts 'Testing core Array'

data = Ractor.make_shareable Array.new(2*array_count) { Array.new(array_len) { Random.rand } }

Benchmark.bm do |bm|
  bm.report('no ractor') do
    4.times { data.each { |v| v.sum / v.size.to_f } }
  end

  bm.report('1 ractor') do
    Ractor.new(data) do |arr|
      4.times { arr.each { |v| v.sum / v.size.to_f } }
      nil
    end.take
  end

  bm.report('2 ractors') do
    2.times.map do
      Ractor.new(data) do |arr|
        2.times { arr.each { |v| v.sum / v.size.to_f } }
        nil
      end
    end.each &:take
  end

  bm.report('4 ractors') do
    4.times.map do
      Ractor.new(data) do |arr|
        arr.each { |v| v.sum / v.size.to_f }
        nil
      end
    end.each &:take
  end
end
ruby 3.1.0p0 (2021-12-25 revision fb4df44d16) [x86_64-linux]
Testing Numo
       user     system      total        real
no ractor  0.326137   0.000217   0.326354 (  0.326562)
1 ractor  0.355712   0.000000   0.355712 (  0.355770)
2 ractors  0.383952   0.000106   0.384058 (  0.201436)
4 ractors  0.396320   0.000000   0.396320 (  0.106362)
Testing core Array
       user     system      total        real
no ractor  2.058136   0.000000   2.058136 (  2.059276)
1 ractor  2.062877   0.000000   2.062877 (  2.063872)
2 ractors  2.098526   0.000000   2.098526 (  1.052108)
4 ractors  2.203544   0.000006   2.203550 (  0.560517)

@orlando-labs

Hi, @mrkn. Thanks for the response, I appreciate it.
However, it is still unclear why my example leads to growing processing times: 4 Ractors, each with a quarter of the load, took noticeably longer than 1 Ractor with the full load (about 10.8 s vs 6.1 s of real time in my run above). With your 10k-element arrays, I do see the expected speedup.

@kojix2
Contributor

kojix2 commented Feb 5, 2022

Here is an article that ko1, a developer of Ractor, posted on the blog of his company, Cookpad, about a year ago:
https://techlife.cookpad.com/entry/2020/12/26/131858 [Japanese]

Here, he says that using Ractor can be slower than not using it.

In the article's previous example, a speedup of almost 4x was achieved. However, that is a best-case example ("champion data") that happens to work well.

He writes that slow constant lookup is one of the reasons Ractor can be slow:

  • The inline cache used for constant lookups was not thread-safe, so the cache was disabled in every Ractor except the main one.
  • The constant table is shared among Ractors, so access to it is protected by a lock, and lookups become very slow when Ractors contend for that lock.

He also wrote that he would fix this problem, so constant lookup may no longer be slow.
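If constant lookup inside non-main Ractors turns out to be a bottleneck, one general mitigation for Ruby-level code (an assumption here, and it cannot change any lookups numo-narray performs internally) is to read the constant once into a local variable before the hot loop, so the loop body does no constant lookups at all. A minimal sketch with a hypothetical `SCALE` constant and a hypothetical `ractor_result` helper:

```ruby
Warning[:experimental] = false

SCALE = Ractor.make_shareable({ factor: 2.0 }) # hypothetical shareable constant

# Wait for a Ractor's result: Ractor#take up to Ruby 3.4, Ractor#value afterwards.
def ractor_result(r)
  r.respond_to?(:value) ? r.value : r.take
end

workers = 4.times.map do |i|
  # Pass i as an argument: a Ractor block may not capture outer locals.
  Ractor.new(i) do |offset|
    factor = SCALE[:factor] # one constant lookup per Ractor...
    sum = 0.0
    100_000.times { |k| sum += (k + offset) * factor } # ...and none in the loop
    sum
  end
end

results = workers.map { |r| ractor_result(r) }
p results
```

Each Ractor computes 2 * (0 + 1 + ... + 99999 + 100000 * offset), so the results differ only by the offset term.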

"As multi-core CPUs become commonplace, the need to express parallel computation is increasing." This phrase has been a standard preamble for more than 10 years, since back when I was doing research at university. Indeed, I don't think anyone would disagree that parallel computing is essential for writing high-performance software.

To perform parallel computation, a program must be written to support it, which requires parallel programming. Many programming languages already provide mechanisms for parallel computing.

numo-narray is probably one of the areas of Ruby where Ractor will be used the most in the future. I think it is very important for the future of Ruby that Ractor can be used with numo-narray.

ping: If you don't mind, @ko1, could you take a look at this for us?

@orlando-labs

Hi, @mrkn, @kojix2. I have been using the Ractor-compatible branch for 2 months and have seen no issues, except for the performance ones, which are not related to numo-narray.

@seoanezonjic

Hi all
I'm very interested in this feature. Has it been merged into the main branch, or is further review needed before this code can be used in production?
Thank you very much
Pedro Seoane

@mrkn
Contributor Author

mrkn commented May 24, 2022

I'll contact the owner.

@masa16 masa16 merged commit 9468bf2 into ruby-numo:master Aug 20, 2022
@darnellbrawner

darnellbrawner commented Dec 13, 2022

I'm seeing similar slowdowns when using more than 2 Ractors with Numo.
https://github.com/PlummersSoftwareLLC/Primes
solution_2 includes three variants: single-threaded with Numo, multi-threaded with Numo using Ractors, and multi-threaded without Numo.
