Ractor newobj cache #3842

ko1 · 2020-12-03T17:25:16Z

Object allocation currently requires the VM global lock to synchronize objspace,
which introduces huge overhead.
This patch caches some slots (in a page) per ractor and uses the cached
slots for object allocation. If there are no cached slots, it acquires the global lock
and gets new cached slots, or starts GC (marking or lazy sweeping).
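
The allocation path described above can be modelled in a few lines of Ruby. This is only a conceptual sketch, not the C code in this patch: `ObjSpace`, `RactorCache`, and `PAGE_SLOTS` are hypothetical names, a `Mutex` stands in for the VM global lock, and the GC fallback is reduced to returning `nil`.

```ruby
# Conceptual model only: the real change lives in CRuby's C allocator.
# ObjSpace, RactorCache, and PAGE_SLOTS are hypothetical names for this sketch.
PAGE_SLOTS = 4 # tiny page size so the refill path is easy to see

class ObjSpace
  attr_reader :lock_acquisitions

  def initialize(pages)
    @lock = Mutex.new # stands in for the VM global lock
    @free_pages = Array.new(pages) do |page|
      Array.new(PAGE_SLOTS) { |slot| [page, slot] }
    end
    @lock_acquisitions = 0
  end

  # Slow path: take the global lock and hand out one page of free slots.
  # Returns nil when no page is left (the real VM would start GC here).
  def take_page
    @lock.synchronize do
      @lock_acquisitions += 1
      @free_pages.shift
    end
  end
end

class RactorCache
  def initialize(objspace)
    @objspace = objspace
    @slots = []
  end

  # Fast path: pop a cached slot without touching the global lock.
  # The lock is only taken when the cache has run dry.
  def newobj
    @slots = @objspace.take_page || [] if @slots.empty?
    @slots.pop
  end
end

objspace = ObjSpace.new(2)
cache = RactorCache.new(objspace)

allocated = Array.new(8) { cache.newobj }
locks_for_allocations = objspace.lock_acquisitions
exhausted = cache.newobj # no pages left, so this models "start GC"

puts "#{allocated.size} slots used #{locks_for_allocations} lock acquisitions"
```

In this model the lock is taken once per page rather than once per object, which is the saving the patch is after.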

ko1 · 2020-12-06T06:56:18Z

On the Linux box,

Warning[:experimental] = false

def task
  i = 0
  while i < 1_000_000
    a = [[[[[[[[[[[[[[[[[[[[[[[[[[[[]]]]]]]]]]]]]]]]]]]]]]]]]]]]
    i += 1
  end
  # 100_000.times{ [[[[[[[[[[[[[[[[[[[[[[[[[[[[]]]]]]]]]]]]]]]]]]]]]]]]]]]] }
end

MODE = (ARGV.shift || :serial).to_sym
TN = 8

case MODE
when :serial
  TN.times{ task }
when :r_serial
  TN.times{
    Ractor.new{
      task
    }.take
  }
when :r_parallel
  TN.times.map{
    Ractor.new{
      task
    }
  }.each{|r| r.take}
else
  raise
end

This program shows:

                               user     system      total        real
serial/master_mini         0.000085   0.000028   0.321040 (  0.321028)
serial/miniruby            0.000136   0.000045   0.289354 (  0.289312)
serial/master_ruby         0.000144   0.000048   0.333495 (  0.333450)
serial/ruby                0.000134   0.000044   0.333663 (  0.333622)
r_serial/master_mini       0.000097   0.000032   0.564617 (  0.564514)
r_serial/miniruby          0.000142   0.000047   0.309408 (  0.309311)
r_serial/master_ruby       0.000134   0.000044   0.595155 (  0.595050)
r_serial/ruby              0.000119   0.000040   0.348166 (  0.348066)
r_parallel/master_mini     0.000079   0.000027   7.840724 (  2.451340)
r_parallel/miniruby        0.000061   0.000020   1.708715 (  0.777945)
r_parallel/master_ruby     0.000092   0.000031   8.217047 (  2.541274)
r_parallel/ruby            0.000103   0.000035   1.921474 (  0.868992)

master_mini: master's miniruby
master_ruby: master's installed ruby
miniruby: modified miniruby
ruby: modified ruby

This patch improves object allocation performance, especially in multi-ractor mode.
However, parallel allocation is still slow.
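
To put numbers on that, the ratios can be computed directly from the real (wall-clock) column of the table above. The sketch below only copies figures from the table; the variable names are mine.

```ruby
# Real (wall-clock) seconds copied from the table above.
real = {
  "serial/ruby"            => 0.333622,
  "r_parallel/master_mini" => 2.451340,
  "r_parallel/miniruby"    => 0.777945,
  "r_parallel/master_ruby" => 2.541274,
  "r_parallel/ruby"        => 0.868992,
}

# Speedup of the patched build over master in parallel mode.
mini_speedup = real["r_parallel/master_mini"] / real["r_parallel/miniruby"]
ruby_speedup = real["r_parallel/master_ruby"] / real["r_parallel/ruby"]

# Remaining cost of parallel mode relative to serial mode (patched ruby).
parallel_vs_serial = real["r_parallel/ruby"] / real["serial/ruby"]

puts format("miniruby speedup over master: %.2fx", mini_speedup)
puts format("ruby speedup over master:     %.2fx", ruby_speedup)
puts format("r_parallel vs serial (ruby):  %.2fx", parallel_vs_serial)
```

The patched builds come out roughly 3 times faster than master in r_parallel mode, while r_parallel still takes about 2.6 times as long as serial for the same total work.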

ko1 · 2020-12-06T06:59:34Z

With another task:

def task
  i = 0
  s = '0,' * 1_000
  while i < 1_000
    a = s.split(/,/)

    i += 1
  end
end

                               user     system      total        real
serial/master_mini         0.000094   0.000023   0.733983 (  0.733979)
serial/miniruby            0.000144   0.000036   0.738881 (  0.738846)
serial/master_ruby         0.000138   0.000035   0.798197 (  0.798159)
serial/ruby                0.000139   0.000035   0.803850 (  0.803837)
r_serial/master_mini       0.000142   0.000035   1.131381 (  1.131291)
r_serial/miniruby          0.000144   0.000036   1.051713 (  1.051624)
r_serial/master_ruby       0.000131   0.000033   1.336663 (  1.336582)
r_serial/ruby              0.000146   0.000037   1.211739 (  1.211644)
r_parallel/master_mini     0.000093   0.000023  12.059595 (  3.394471)
r_parallel/miniruby        0.000103   0.000026   9.873731 (  2.765747)
r_parallel/master_ruby     0.000095   0.000024  13.050518 (  3.613259)
r_parallel/ruby            0.000083   0.000021  10.168195 (  2.848708)

It is still slower, maybe because of VM lock contention on the Encoding.

ko1 · 2020-12-06T15:29:49Z

Another task (Array#product to make many arrays):

def task
  i = 0
  a = (1..100).to_a
  while i < 1_000
    as = a.product(a)
    i += 1
  end
end

                               user     system      total        real
serial/master_mini         0.000185   0.000031   2.461938 (  2.461924)
serial/miniruby            0.000154   0.000025   2.452719 (  2.452708)
serial/master_ruby         0.000151   0.000025   3.474300 (  3.474296)
serial/ruby                0.000145   0.000025   3.396823 (  3.396814)
r_serial/master_mini       0.000151   0.000025   3.183569 (  3.183501)
r_serial/miniruby          0.000113   0.000019   3.173086 (  3.173036)
r_serial/master_ruby       0.000144   0.000024   4.357124 (  4.357084)
r_serial/ruby              0.000114   0.000019   3.029095 (  3.029025)
r_parallel/master_mini     0.000089   0.000015  30.343168 (  9.423811)
r_parallel/miniruby        0.000097   0.000017   5.655085 (  3.710885)
r_parallel/master_ruby     0.000103   0.000017  32.811944 ( 10.600854)
r_parallel/ruby            0.000093   0.000015   5.451007 (  3.810941)

This patch is effective (compare with master).

ruby_multi_ractor was a boolean flag that indicated whether the interpreter had created any additional ractors (that is, whether it was still in single-ractor mode). Instead of the boolean flag, a ruby_single_main_ractor pointer is introduced, which holds the main ractor's pointer while in single-ractor mode. Once additional ractors are created, ruby_single_main_ractor becomes NULL.
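
The state change can be pictured with a small Ruby model. This is a sketch of the semantics described above only: the real variable is a C pointer inside the VM, and `VMModel`, `create_ractor`, and `multi_ractor?` are hypothetical names.

```ruby
# Conceptual model only; the real variable is a C pointer inside the VM.
# VMModel and its methods are hypothetical names for this sketch.
class VMModel
  attr_reader :single_main_ractor

  def initialize
    @main_ractor = :main
    # Non-nil means single-ractor mode, and the value is the main ractor.
    @single_main_ractor = @main_ractor
  end

  # Creating any additional ractor clears the pointer (NULL in C).
  def create_ractor(name)
    @single_main_ractor = nil
    name
  end

  # The old boolean flag can be derived from the pointer.
  def multi_ractor?
    @single_main_ractor.nil?
  end
end

vm = VMModel.new
before = [vm.single_main_ractor, vm.multi_ractor?]
vm.create_ractor(:worker)
after = [vm.single_main_ractor, vm.multi_ractor?]
```

One pointer carries both pieces of information: whether the VM is in single-ractor mode and, if so, which ractor is the main one.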


Without this patch, Ruby doesn't show ractor information when there is only 1 ractor. However, that makes the log hard to read when ractors are created and terminated. This patch keeps showing ractor information in multi-ractor mode.


The per-ractor newobj cache (#3842) cached only 1 page; this patch caches several pages to keep at least 512 free slots if available. If you increase the number of cached free slots, all cached slots will be collected when the GC is invoked.
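
The "several pages until at least 512 free slots" rule can be sketched as a small refill loop. Only the 512 threshold comes from the description above; `refill_cache`, `MIN_CACHED_SLOTS`, and the representation of a page as a bare free-slot count are assumptions made for illustration.

```ruby
# Conceptual model only. The 512 threshold comes from the description above;
# refill_cache and the page representation are hypothetical.
MIN_CACHED_SLOTS = 512

# pages: free-slot counts, one entry per page that still has free slots.
# Returns [cached_pages, remaining_pages].
def refill_cache(pages)
  cached = []
  remaining = pages.dup
  # Keep taking pages until the threshold is met or no page is available.
  cached << remaining.shift while cached.sum < MIN_CACHED_SLOTS && !remaining.empty?
  [cached, remaining]
end

cached, remaining = refill_cache([200, 150, 300, 400])
short_cached, short_remaining = refill_cache([100, 50])

puts "cached #{cached.sum} free slots across #{cached.size} pages"
```

The second call shows the "if available" case: when the pages on hand hold fewer than 512 free slots, the cache simply takes everything there is.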


ko1 force-pushed the ractor_newobj_cache branch 2 times, most recently from 627eb49 to 81b0422 Compare December 6, 2020 06:44

ko1 marked this pull request as ready for review December 6, 2020 06:45

ko1 force-pushed the ractor_newobj_cache branch from 81b0422 to 9520266 Compare December 6, 2020 06:50

ko1 force-pushed the ractor_newobj_cache branch from 9520266 to 7393d4c Compare December 6, 2020 15:11

ko1 added 9 commits December 7, 2020 02:42

cancel theap on multi-ractors

064996f

Accessing theap needs complicated synchronization, which reduces performance in multi-ractor mode. So simply stop using theap in multi-ractor mode. In the future, theap should be replaced with a cleverer memory strategy.

RB_VM_LOCK_ENTER_CR_LEV

88de920

This is a variant of RB_VM_LOCK_ENTER_LEV, but it accepts the current ractor's pointer.

log for the beggining of vm_lock_enter

f6d7c47

Before this patch, the log had no entry for the start of locking.

RB_EC_NEWOBJ_OF

af873b2

NEWOBJ with current ec.

tuning trial: newobj with current ec

ba9fe3d

Passing the current ec can improve the performance of newobj. This patch tries it for Array and String literals ([] and '').

fix decl of ruby_single_main_ractor

d9b7e85

On Windows, MJIT doesn't work without this patch because of the declaration of ruby_single_main_ractor. This patch fixes the issue and moves the definition from ractor.c to vm.c so that it sits near ruby_current_vm_ptr.

ko1 force-pushed the ractor_newobj_cache branch from 7393d4c to d9b7e85 Compare December 6, 2020 17:42

ko1 merged commit bef3eb5 into ruby:master Dec 6, 2020

ko1 deleted the ractor_newobj_cache branch December 7, 2020 02:14

ko1 mentioned this pull request Dec 10, 2020

Ractor newobj opt #3875

Merged
