Optimize defining methods on deeply included modules #2752

jhawthorn · 2019-12-13T21:54:13Z

In Rails we discovered that a significant amount of time was spent defining named routes, methods on a module included by a lot of other classes. Checking profiling this time was mostly spent in rb_class_foreach_subclass and rb_class_clear_method_cache. @tenderlove helped me investigate what was happening inside Ruby and we found that it was revisiting classes and clearing their caches multiple times since there could be multiple paths to them down the subclass tree. This could also be measured by a large increase in RubyVM.stat(:class_serial).

This added several seconds to the bootup time of large applications like GitHub's and Shopify's.

Though we've improved this in Rails by avoiding some duplicate modules, defining these methods is still a little slow and Ruby should avoid doing duplicate work clearing caches.

This PR improves the speed of defining methods on modules which have already been included in other classes.

`rb_clear_method_cache_by_class`

rb_clear_method_cache_by_class calls rb_class_clear_method_cache recursively on subclasses, where it will bump the class serial and clear some other data (callable_m_tbl, and some mjit data).

Previously this could end up taking a long time to clear all the classes if the module was included a few levels deep and especially if there were multiple paths to it in the dependency tree (ie. a class includes two modules which both include the same other module) as we end up revisiting class/iclass/module objects multiple times.

This commit avoids revisiting the same object, by short circuiting when revisit the same object. We can check this efficiently by comparing the class serial of each object we visit with the next class serial at the start. We know that any objects with a higher class serial have already been visited.

(This would probably have even more of an improvement if mjit was enabled, since it will avoid clearing that cache when possible, but I didn't test that)

`check_override_opt_method`

This was another place we were calling rb_class_foreach_subclass recursively.

Previously every time a method was defined on a module, we would recursively walk all subclasses to see if the module was included in a class which the VM optimizes for (such as Integer#+).

For most method definitions we can tell immediately that this won't be the case based on the method's name. To do this we just keep a hash with method IDs of optimized methods and if our new method isn't in that list we don't need to check subclasses at all.

Benchmarks

Synthetic

This is probably an unrealistic scenario but best measures that we are avoiding duplicate work revisiting the same trees.

mod = initial = Module.new
10.times do
  mod = Module.new{ include initial; include mod }
end

classes = 100.times.map { Class.new { include mod } }

before = RubyVM.stat(:class_serial)
1000.times do
  initial.define_method(:foo) {  }
end
after = RubyVM.stat(:class_serial)
puts "class_serial changed by #{after - before}"

Before:

$ time ./ruby --disable-gems test_method_cache.rb
class_serial changed by 310160000
./ruby --disable-gems test_method_cache2.rb  3.52s user 0.00s system 99% cpu 3.522 total

After:

$ time ./ruby --disable-gems test_method_cache.rb
class_serial changed by 1266000
./ruby --disable-gems test_method_cache2.rb  0.03s user 0.00s system 99% cpu 0.028 total

Rails without duplicate module fix

This is test is a good approximation of this part of a very large real Rails 6.0 application's boot.

Benchmark script

require "action_controller/railtie"
require "rails"

class MyApp < Rails::Application
  config.eager_load = true
  config.secret_key_base = SecureRandom.hex
  initialize!
end

class ApplicationController < ActionController::Base
end

N = (ENV["N"] || 500).to_i
M = (ENV["M"] || 2000).to_i
N.times do |n|
  Object.const_set("Foo#{n}Controller", Class.new(ApplicationController))
end
puts "For #{N} controller:"

MyApp.routes.draw do
  before = RubyVM.stat(:class_serial)
  get "/foo", to: "home#index"
  after = RubyVM.stat(:class_serial)
  puts "  Class serial changed by #{after - before} per route"

  ms = Benchmark.ms {
    M.times do |n|
      get "/foo#{n}", to: "home#index"
    end
  }
  puts "  Defining #{M} routes took #{ms}ms"
end

Before:

$ N=1000 M=5000 ruby -S bundle exec ruby test_url_definitons.rb
For 1000 controller:
  Class serial changed by 27043 per route
  Defining 5000 routes took 5195.625874213874ms

After:

$ N=1000 M=5000 ruby -S bundle exec ruby test_url_definitons.rb
For 1000 controller:
  Class serial changed by 19039 per route
  Defining 5000 routes took 2421.337984967977ms

Rails with duplicate module fix

This is test is a good approximation of this part of a very large real Rails master application's boot. There's less impact because it's been patched to avoid the worst of the cache clearing (most of the runtime is now outside of that) but the improvement is still significant.

Before:

$ N=1000 M=5000 ruby -S bundle exec ruby test_url_definitons.rb
For 1000 controller:
  Class serial changed by 4028 per route
  Defining 5000 routes took 1043.2057077996433ms

After:

$ N=1000 M=5000 ruby -S bundle exec ruby test_url_definitons.rb
For 1000 controller:
  Class serial changed by 2024 per route
  Defining 5000 routes took 786.4317209459841ms

rb_clear_method_cache_by_class calls rb_class_clear_method_cache recursively on subclasses, where it will bump the class serial and clear some other data (callable_m_tbl, and some mjit data). Previously this could end up taking a long time to clear all the classes if the module was included a few levels deep and especially if there were multiple paths to it in the dependency tree (ie. a class includes two modules which both include the same other module) as we end up revisiting class/iclass/module objects multiple times. This commit avoids revisiting the same object, by short circuiting when revisit the same object. We can check this efficiently by comparing the class serial of each object we visit with the next class serial at the start. We know that any objects with a higher class serial have already been visited.

Avoids genereating a "throwaway" sentinel class serial. There wasn't any read harm in doing so (we're at no risk of exhaustion and there'd be no measurable performance impact), but if feels cleaner that all class serials actually end up assigned and used (especially now that we won't overwrite them in a single method definition).

We know that this is a heap-allocated object (a CLASS, MODULE, or ICLASS) so we don't need to check if it is an immediate value. This should be very slightly faster.

It's a Ruby bug if this ever happens check it as an assertion instead of paying the cost of the check every time.

Previously every time a method was defined on a module, we would recursively walk all subclasses to see if the module was included in a class which the VM optimizes for (such as Integer#+). For most method definitions we can tell immediately that this won't be the case based on the method's name. To do this we just keep a hash with method IDs of optimized methods and if our new method isn't in that list we don't need to check subclasses at all.

jhawthorn · 2019-12-17T17:14:41Z

This also seems to (very slightly) improve performance of Ruby startup (even with nothing loaded).

We can see the methods in question from perf record --freq=32000 ./ruby --disable-gems -e ''

perf stat shows a significant reduction in cycles/instructions (but a relatively small change in runtime)

Before

$ make -j16 ruby && perf stat -r 500 ./ruby --disable-gems -e ''
        BASERUBY = /home/jhawthorn/.rubies/ruby-2.6.3/bin/ruby --disable=gems
        CC = gcc
        LD = ld
        LDSHARED = gcc -shared
        CFLAGS = -O3 -ggdb3 -Wall -Wextra -Werror=deprecated-declarations -Werror=duplicated-cond -Werror=implicit-function-declaration -Werror=implicit-int -Werror=misleading-indentation -Werror=pointer-arith -Werror=write-strings -Wimplicit-fallthrough=0 -Wmissing-noreturn
-Wno-cast-function-type -Wno-constant-logical-operand -Wno-long-long -Wno-missing-field-initializers -Wno-overlength-strings -Wno-packed-bitfield-compat -Wno-parentheses-equality -Wno-self-assign -Wno-tautological-compare -Wno-unused-parameter -Wno-unused-value -Wsuggest-attr
ibute=format -Wsuggest-attribute=noreturn -Wunused-variable -std=gnu99
        XCFLAGS = -D_FORTIFY_SOURCE=2 -fstack-protector-strong -fno-strict-overflow -DRUBY_DEVEL=1 -fvisibility=hidden -fexcess-precision=standard -DRUBY_EXPORT -fPIE -DCANONICALIZATION_FOR_MATHN -I. -I.ext/include/x86_64-linux -I./include -I. -I./enc/unicode/12.1.0
        CPPFLAGS =
        DLDFLAGS = -Wl,--compress-debug-sections=zlib -fstack-protector-strong -pie
        SOLIBS = -lz -lpthread -lrt -lrt -lgmp -ldl -lcrypt -lm
        LANG = en_US.UTF-8
        LC_ALL =
        LC_CTYPE = en_US.UTF-8
        MFLAGS = -j16 --jobserver-auth=4,5
gcc (GCC) 9.2.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

make: 'ruby' is up to date.

 Performance counter stats for './ruby --disable-gems -e ' (500 runs):

              7.24 msec task-clock:u              #    0.970 CPUs utilized            ( +-  0.59% )
                 0      context-switches:u        #    0.000 K/sec
                 0      cpu-migrations:u          #    0.000 K/sec
             1,589      page-faults:u             #    0.219 M/sec                    ( +-  0.01% )
         8,872,760      cycles:u                  #    1.226 GHz                      ( +-  1.10% )  (59.19%)
           247,261      stalled-cycles-frontend:u #    2.79% frontend cycles idle     ( +-  0.57% )  (66.48%)
         1,052,212      stalled-cycles-backend:u  #   11.86% backend cycles idle      ( +-  0.19% )  (95.11%)
        15,544,068      instructions:u            #    1.75  insn per cycle
                                                  #    0.07  stalled cycles per insn  ( +-  0.00% )
         3,417,975      branches:u                #  472.144 M/sec                    ( +-  0.01% )
            76,959      branch-misses:u           #    2.25% of all branches          ( +-  0.46% )  (79.22%)

         0.0074641 +- 0.0000437 seconds time elapsed  ( +-  0.59% )

After

$ make -j16 ruby && perf stat -r 500 ./ruby --disable-gems -e ''
        BASERUBY = /home/jhawthorn/.rubies/ruby-2.6.3/bin/ruby --disable=gems
        CC = gcc
        LD = ld
        LDSHARED = gcc -shared
        CFLAGS = -O3 -ggdb3 -Wall -Wextra -Werror=deprecated-declarations -Werror=duplicated-cond -Werror=implicit-function-declaration -Werror=implicit-int -Werror=misleading-indentation -Werror=pointer-arith -Werror=write-strings -Wimplicit-fallthrough=0 -Wmissing-noreturn
-Wno-cast-function-type -Wno-constant-logical-operand -Wno-long-long -Wno-missing-field-initializers -Wno-overlength-strings -Wno-packed-bitfield-compat -Wno-parentheses-equality -Wno-self-assign -Wno-tautological-compare -Wno-unused-parameter -Wno-unused-value -Wsuggest-attr
ibute=format -Wsuggest-attribute=noreturn -Wunused-variable -std=gnu99
        XCFLAGS = -D_FORTIFY_SOURCE=2 -fstack-protector-strong -fno-strict-overflow -DRUBY_DEVEL=1 -fvisibility=hidden -fexcess-precision=standard -DRUBY_EXPORT -fPIE -DCANONICALIZATION_FOR_MATHN -I. -I.ext/include/x86_64-linux -I./include -I. -I./enc/unicode/12.1.0
        CPPFLAGS =
        DLDFLAGS = -Wl,--compress-debug-sections=zlib -fstack-protector-strong -pie
        SOLIBS = -lz -lpthread -lrt -lrt -lgmp -ldl -lcrypt -lm
        LANG = en_US.UTF-8
        LC_ALL =
        LC_CTYPE = en_US.UTF-8
        MFLAGS = -j16 --jobserver-auth=4,5
gcc (GCC) 9.2.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

make: 'ruby' is up to date.

 Performance counter stats for './ruby --disable-gems -e ' (500 runs):

              6.86 msec task-clock:u              #    0.968 CPUs utilized            ( +-  0.64% )
                 0      context-switches:u        #    0.000 K/sec
                 0      cpu-migrations:u          #    0.000 K/sec
             1,588      page-faults:u             #    0.232 M/sec                    ( +-  0.01% )
         7,877,563      cycles:u                  #    1.148 GHz                      ( +-  1.21% )  (57.39%)
           237,588      stalled-cycles-frontend:u #    3.02% frontend cycles idle     ( +-  0.61% )  (68.23%)
         1,016,882      stalled-cycles-backend:u  #   12.91% backend cycles idle      ( +-  0.15% )  (97.05%)
        14,052,135      instructions:u            #    1.78  insn per cycle
                                                  #    0.07  stalled cycles per insn  ( +-  0.01% )
         2,964,832      branches:u                #  432.121 M/sec                    ( +-  0.01% )
            75,965      branch-misses:u           #    2.56% of all branches          ( +-  0.52% )  (77.33%)

         0.0070858 +- 0.0000450 seconds time elapsed  ( +-  0.63% )

jhawthorn added 5 commits December 12, 2019 16:31

Check type with BUILTIN_TYPE in method cache clear

fad82e8

We know that this is a heap-allocated object (a CLASS, MODULE, or ICLASS) so we don't need to check if it is an immediate value. This should be very slightly faster.

Replace debug test with assertion

14ff694

It's a Ruby bug if this ever happens check it as an assertion instead of paying the cost of the check every time.

jhawthorn mentioned this pull request Dec 14, 2019

Cache routes.url_helpers built modules rails/rails#37927

Merged

tenderlove merged commit 2544772 into ruby:master Dec 17, 2019

tenderlove deleted the fast_class_cache_clear branch December 17, 2019 17:19

jhawthorn mentioned this pull request Jan 5, 2020

Fold symbol tables for tracking basic operation method redefinition changes into a single ID table #2812

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize defining methods on deeply included modules #2752

Optimize defining methods on deeply included modules #2752

jhawthorn commented Dec 13, 2019 •

edited

jhawthorn commented Dec 17, 2019

Optimize defining methods on deeply included modules #2752

Optimize defining methods on deeply included modules #2752

Conversation

jhawthorn commented Dec 13, 2019 • edited

rb_clear_method_cache_by_class

check_override_opt_method

Benchmarks

Synthetic

Rails without duplicate module fix

Rails with duplicate module fix

jhawthorn commented Dec 17, 2019

jhawthorn commented Dec 13, 2019 •

edited

`rb_clear_method_cache_by_class`

`check_override_opt_method`