Ractor newobj cache #3842

Merged
merged 9 commits into from Dec 6, 2020

@ko1 ko1 commented Dec 3, 2020

Currently, object allocation requires the VM global lock to synchronize objspace,
which introduces a huge overhead.
This patch lets each ractor cache some slots (in a page) and use the cached
slots for object allocation. If no cached slots are left, the ractor acquires the
global lock and gets new cached slots, or starts GC (marking or lazy sweeping).
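
The idea described above can be modelled in a few lines of Ruby. This is only a toy sketch, not the actual C implementation in gc.c: `GlobalHeap`, `WorkerCache`, `PAGE_SLOTS` and the use of threads in place of ractors are all assumptions made for illustration, and the GC fallback is left out. It shows that the global lock is taken once per page of slots rather than once per allocation.

```ruby
# Toy model of a per-worker slot cache. All names here are hypothetical.
class GlobalHeap
  PAGE_SLOTS = 64 # assumed page size, chosen only for this example

  attr_reader :lock_acquisitions

  def initialize
    @lock = Mutex.new
    @lock_acquisitions = 0
    @next_slot = 0
  end

  # Hand out one page worth of slot ids while holding the global lock.
  def take_page
    @lock.synchronize do
      @lock_acquisitions += 1
      first = @next_slot
      @next_slot += PAGE_SLOTS
      (first...(first + PAGE_SLOTS)).to_a
    end
  end
end

class WorkerCache
  def initialize(heap)
    @heap = heap
    @slots = []
  end

  # Fast path: pop a cached slot without locking.
  # Slow path: the cache is empty, so refill it from the global heap.
  def allocate
    @slots = @heap.take_page if @slots.empty?
    @slots.shift
  end
end

heap = GlobalHeap.new
threads = 4.times.map do
  Thread.new do
    cache = WorkerCache.new(heap)
    Array.new(1_000) { cache.allocate }
  end
end
slots = threads.flat_map(&:value)

puts "allocated: #{slots.size}, unique: #{slots.uniq.size}, " \
     "lock acquisitions: #{heap.lock_acquisitions}"
```

With these numbers, 4,000 allocations need only 64 lock acquisitions (16 pages per worker), instead of one per allocation.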

@ko1 ko1 force-pushed the ractor_newobj_cache branch 2 times, most recently from 627eb49 to 81b0422 Compare December 6, 2020 06:44
@ko1 ko1 marked this pull request as ready for review December 6, 2020 06:45
ko1 commented Dec 6, 2020

On the Linux box,

Warning[:experimental] = false

def task
  i = 0
  while i < 1_000_000
    a = [[[[[[[[[[[[[[[[[[[[[[[[[[[[]]]]]]]]]]]]]]]]]]]]]]]]]]]]
    i += 1
  end
  # 100_000.times{ [[[[[[[[[[[[[[[[[[[[[[[[[[[[]]]]]]]]]]]]]]]]]]]]]]]]]]]] }
end

MODE = (ARGV.shift || :serial).to_sym
TN = 8

case MODE
when :serial
  TN.times{ task }
when :r_serial
  TN.times{
    Ractor.new{
      task
    }.take
  }
when :r_parallel
  TN.times.map{
    Ractor.new{
      task
    }
  }.each{|r| r.take}
else
  raise
end

This program shows:

                               user     system      total        real
serial/master_mini         0.000085   0.000028   0.321040 (  0.321028)
serial/miniruby            0.000136   0.000045   0.289354 (  0.289312)
serial/master_ruby         0.000144   0.000048   0.333495 (  0.333450)
serial/ruby                0.000134   0.000044   0.333663 (  0.333622)
r_serial/master_mini       0.000097   0.000032   0.564617 (  0.564514)
r_serial/miniruby          0.000142   0.000047   0.309408 (  0.309311)
r_serial/master_ruby       0.000134   0.000044   0.595155 (  0.595050)
r_serial/ruby              0.000119   0.000040   0.348166 (  0.348066)
r_parallel/master_mini     0.000079   0.000027   7.840724 (  2.451340)
r_parallel/miniruby        0.000061   0.000020   1.708715 (  0.777945)
r_parallel/master_ruby     0.000092   0.000031   8.217047 (  2.541274)
r_parallel/ruby            0.000103   0.000035   1.921474 (  0.868992)

master_mini: master's miniruby
master_ruby: master's installed ruby
miniruby: modified miniruby
ruby: modified ruby

This patch improves object allocation performance, especially in multi-ractor mode.
However, parallel allocation is still slow.
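
To put numbers on both statements, the ratios can be computed directly from the "real" column of the table above. The snippet below is only a small helper written for this comment thread (the hash and variable names are made up); the timings themselves are copied from the table.

```ruby
# Wall-clock seconds from the "real" column of the r_parallel rows.
r_parallel = {
  "master_mini" => 2.451340,
  "miniruby"    => 0.777945,
  "master_ruby" => 2.541274,
  "ruby"        => 0.868992,
}
# Wall-clock seconds of the patched ruby in serial mode.
serial_ruby = 0.333622

speedup_mini = r_parallel["master_mini"] / r_parallel["miniruby"]
speedup_ruby = r_parallel["master_ruby"] / r_parallel["ruby"]
# The same 8 tasks, run in parallel ractors vs. run one after another.
parallel_vs_serial = r_parallel["ruby"] / serial_ruby

printf("miniruby speedup over master: %.2fx\n", speedup_mini)
printf("ruby speedup over master:     %.2fx\n", speedup_ruby)
printf("r_parallel vs serial (ruby):  %.2fx slower\n", parallel_vs_serial)
```

The patched binaries come out roughly 3x faster than master in r_parallel mode, yet running the eight tasks in parallel ractors still takes about 2.6x as long as running them serially.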

ko1 commented Dec 6, 2020

With another task:

def task
  i = 0
  s = '0,' * 1_000
  while i < 1_000
    a = s.split(/,/)

    i += 1
  end
end

                               user     system      total        real
serial/master_mini         0.000094   0.000023   0.733983 (  0.733979)
serial/miniruby            0.000144   0.000036   0.738881 (  0.738846)
serial/master_ruby         0.000138   0.000035   0.798197 (  0.798159)
serial/ruby                0.000139   0.000035   0.803850 (  0.803837)
r_serial/master_mini       0.000142   0.000035   1.131381 (  1.131291)
r_serial/miniruby          0.000144   0.000036   1.051713 (  1.051624)
r_serial/master_ruby       0.000131   0.000033   1.336663 (  1.336582)
r_serial/ruby              0.000146   0.000037   1.211739 (  1.211644)
r_parallel/master_mini     0.000093   0.000023  12.059595 (  3.394471)
r_parallel/miniruby        0.000103   0.000026   9.873731 (  2.765747)
r_parallel/master_ruby     0.000095   0.000024  13.050518 (  3.613259)
r_parallel/ruby            0.000083   0.000021  10.168195 (  2.848708)

It is still slow, perhaps because of VM lock contention on Encoding.

ko1 commented Dec 6, 2020

Another task (Array#product to make many arrays):

def task
  i = 0
  a = (1..100).to_a
  while i < 1_000
    as = a.product(a)
    i += 1
  end
end

                               user     system      total        real
serial/master_mini         0.000185   0.000031   2.461938 (  2.461924)
serial/miniruby            0.000154   0.000025   2.452719 (  2.452708)
serial/master_ruby         0.000151   0.000025   3.474300 (  3.474296)
serial/ruby                0.000145   0.000025   3.396823 (  3.396814)
r_serial/master_mini       0.000151   0.000025   3.183569 (  3.183501)
r_serial/miniruby          0.000113   0.000019   3.173086 (  3.173036)
r_serial/master_ruby       0.000144   0.000024   4.357124 (  4.357084)
r_serial/ruby              0.000114   0.000019   3.029095 (  3.029025)
r_parallel/master_mini     0.000089   0.000015  30.343168 (  9.423811)
r_parallel/miniruby        0.000097   0.000017   5.655085 (  3.710885)
r_parallel/master_ruby     0.000103   0.000017  32.811944 ( 10.600854)
r_parallel/ruby            0.000093   0.000015   5.451007 (  3.810941)

This patch is effective (compare with master).

ruby_multi_ractor was a flag that indicated whether the interpreter is still
in single-ractor mode, i.e. has not created any additional ractors.
Instead of a boolean flag, a ruby_single_main_ractor pointer is introduced,
which holds the main ractor's pointer while in single-ractor mode. Once additional
ractors are created, ruby_single_main_ractor becomes NULL.

Accessing theap needs complicated synchronization, which reduces
performance in multi-ractor mode. So simply stop using theap
in multi-ractor mode. In the future, theap should be replaced with
a cleverer memory strategy.

Without this patch, Ruby doesn't show ractor information when
there is only one ractor. However, the log is hard to read when
some ractors are created and terminated. This patch keeps
showing ractor information in multi-ractor mode.

This is a variant of RB_VM_LOCK_ENTER_LEV, but it accepts the current ractor's
pointer.

Before this patch, there was no information about the start of locking.

Currently, object allocation requires the VM global lock to synchronize objspace,
which introduces a huge overhead.
This patch lets each ractor cache some slots (in a page) and use the cached
slots for object allocation. If no cached slots are left, the ractor acquires the
global lock and gets new cached slots, or starts GC (marking or lazy sweeping).

NEWOBJ with the current ec.
Passing the current ec can improve the performance of newobj. This patch
tries it for Array and String literals ([] and '').

On Windows, MJIT doesn't work without this patch because of
the declaration of ruby_single_main_ractor. This patch fixes the
issue and moves its definition from ractor.c to vm.c, next to
ruby_current_vm_ptr.

@ko1 ko1 merged commit bef3eb5 into ruby:master Dec 6, 2020
@ko1 ko1 deleted the ractor_newobj_cache branch December 7, 2020 02:14
ko1 added a commit to ko1/ruby that referenced this pull request Dec 10, 2020
The per-ractor newobj cache (GH-ruby#3842) only cached 1 page; this patch
caches several pages to keep at least 512 free slots if available.
Note that even if the number of cached free slots is increased, all cached
slots are collected when GC is invoked.
@ko1 ko1 mentioned this pull request Dec 10, 2020
ko1 added a commit that referenced this pull request Dec 10, 2020
The per-ractor newobj cache (GH-#3842) only cached 1 page; this patch
caches several pages to keep at least 512 free slots if available.
Note that even if the number of cached free slots is increased, all cached
slots are collected when GC is invoked.
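
The "at least 512 free slots if available" rule from the commit message can be sketched as follows. This is a hypothetical Ruby model, not the code in gc.c: `refill_cache` and the representation of a page as a plain count of its free slots are assumptions for illustration. Only the 512 threshold comes from the commit message.

```ruby
MIN_CACHED_SLOTS = 512 # threshold stated in the commit message

# free_pages: Array of Integers, each the number of free slots in one page
# (a made-up representation). Pages are moved into the cache until it holds
# at least min_slots free slots or no pages are left.
def refill_cache(free_pages, min_slots = MIN_CACHED_SLOTS)
  cached = []
  cached << free_pages.shift while cached.sum < min_slots && !free_pages.empty?
  cached
end

pages = [100, 300, 50, 200, 400]
cached = refill_cache(pages)
puts "cached pages: #{cached.inspect} (#{cached.sum} slots), left: #{pages.inspect}"

# When fewer than 512 free slots exist, everything available is cached.
small = refill_cache([10, 20])
puts "cached pages: #{small.inspect} (#{small.sum} slots)"
```

In the first call, four pages (650 slots) are cached and one page is left on the heap; in the second, both pages are taken even though they fall short of the threshold.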