Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize defining methods on deeply included modules #2752

Merged
merged 5 commits into from Dec 17, 2019

Conversation

jhawthorn
Copy link
Member

@jhawthorn jhawthorn commented Dec 13, 2019

In Rails we discovered that a significant amount of time was spent defining named routes, methods on a module included by a lot of other classes. Checking profiling this time was mostly spent in rb_class_foreach_subclass and rb_class_clear_method_cache. @tenderlove helped me investigate what was happening inside Ruby and we found that it was revisiting classes and clearing their caches multiple times since there could be multiple paths to them down the subclass tree. This could also be measured by a large increase in RubyVM.stat(:class_serial).

This added several seconds to the bootup time of large applications like GitHub's and Shopify's.

Though we've improved this in Rails by avoiding some duplicate modules, defining these methods is still a little slow and Ruby should avoid doing duplicate work clearing caches.

This PR improves the speed of defining methods on modules which have already been included in other classes.

rb_clear_method_cache_by_class

rb_clear_method_cache_by_class calls rb_class_clear_method_cache recursively on subclasses, where it will bump the class serial and clear some other data (callable_m_tbl, and some mjit data).

Previously this could end up taking a long time to clear all the classes if the module was included a few levels deep and especially if there were multiple paths to it in the dependency tree (ie. a class includes two modules which both include the same other module) as we end up revisiting class/iclass/module objects multiple times.

This commit avoids revisiting the same object, by short circuiting when revisit the same object. We can check this efficiently by comparing the class serial of each object we visit with the next class serial at the start. We know that any objects with a higher class serial have already been visited.

(This would probably have even more of an improvement if mjit was enabled, since it will avoid clearing that cache when possible, but I didn't test that)

check_override_opt_method

This was another place we were calling rb_class_foreach_subclass recursively.

Previously every time a method was defined on a module, we would recursively walk all subclasses to see if the module was included in a class which the VM optimizes for (such as Integer#+).

For most method definitions we can tell immediately that this won't be the case based on the method's name. To do this we just keep a hash with method IDs of optimized methods and if our new method isn't in that list we don't need to check subclasses at all.

Benchmarks

Synthetic

This is probably an unrealistic scenario but best measures that we are avoiding duplicate work revisiting the same trees.

mod = initial = Module.new
10.times do
  mod = Module.new{ include initial; include mod }
end

classes = 100.times.map { Class.new { include mod } }

before = RubyVM.stat(:class_serial)
1000.times do
  initial.define_method(:foo) {  }
end
after = RubyVM.stat(:class_serial)
puts "class_serial changed by #{after - before}"

Before:

$ time ./ruby --disable-gems test_method_cache.rb
class_serial changed by 310160000
./ruby --disable-gems test_method_cache2.rb  3.52s user 0.00s system 99% cpu 3.522 total

After:

$ time ./ruby --disable-gems test_method_cache.rb
class_serial changed by 1266000
./ruby --disable-gems test_method_cache2.rb  0.03s user 0.00s system 99% cpu 0.028 total

Rails without duplicate module fix

This is test is a good approximation of this part of a very large real Rails 6.0 application's boot.

Benchmark script
require "action_controller/railtie"
require "rails"

class MyApp < Rails::Application
  config.eager_load = true
  config.secret_key_base = SecureRandom.hex
  initialize!
end

class ApplicationController < ActionController::Base
end

N = (ENV["N"] || 500).to_i
M = (ENV["M"] || 2000).to_i
N.times do |n|
  Object.const_set("Foo#{n}Controller", Class.new(ApplicationController))
end
puts "For #{N} controller:"

MyApp.routes.draw do
  before = RubyVM.stat(:class_serial)
  get "/foo", to: "home#index"
  after = RubyVM.stat(:class_serial)
  puts "  Class serial changed by #{after - before} per route"

  ms = Benchmark.ms {
    M.times do |n|
      get "/foo#{n}", to: "home#index"
    end
  }
  puts "  Defining #{M} routes took #{ms}ms"
end

Before:

$ N=1000 M=5000 ruby -S bundle exec ruby test_url_definitons.rb
For 1000 controller:
  Class serial changed by 27043 per route
  Defining 5000 routes took 5195.625874213874ms

After:

$ N=1000 M=5000 ruby -S bundle exec ruby test_url_definitons.rb
For 1000 controller:
  Class serial changed by 19039 per route
  Defining 5000 routes took 2421.337984967977ms

Rails with duplicate module fix

This is test is a good approximation of this part of a very large real Rails master application's boot. There's less impact because it's been patched to avoid the worst of the cache clearing (most of the runtime is now outside of that) but the improvement is still significant.

Before:

$ N=1000 M=5000 ruby -S bundle exec ruby test_url_definitons.rb
For 1000 controller:
  Class serial changed by 4028 per route
  Defining 5000 routes took 1043.2057077996433ms

After:

$ N=1000 M=5000 ruby -S bundle exec ruby test_url_definitons.rb
For 1000 controller:
  Class serial changed by 2024 per route
  Defining 5000 routes took 786.4317209459841ms

rb_clear_method_cache_by_class calls rb_class_clear_method_cache
recursively on subclasses, where it will bump the class serial and clear
some other data (callable_m_tbl, and some mjit data).

Previously this could end up taking a long time to clear all the classes
if the module was included a few levels deep and especially if there
were multiple paths to it in the dependency tree (ie. a class includes
two modules which both include the same other module) as we end up
revisiting class/iclass/module objects multiple times.

This commit avoids revisiting the same object, by short circuiting when
revisit the same object. We can check this efficiently by comparing the
class serial of each object we visit with the next class serial at the
start. We know that any objects with a higher class serial have already
been visited.
Avoids genereating a "throwaway" sentinel class serial. There wasn't any
read harm in doing so (we're at no risk of exhaustion and there'd be no
measurable performance impact), but if feels cleaner that all class
serials actually end up assigned and used (especially now that we won't
overwrite them in a single method definition).
We know that this is a heap-allocated object (a CLASS, MODULE, or
ICLASS) so we don't need to check if it is an immediate value. This
should be very slightly faster.
It's a Ruby bug if this ever happens check it as an assertion instead of
paying the cost of the check every time.
Previously every time a method was defined on a module, we would
recursively walk all subclasses to see if the module was included in a
class which the VM optimizes for (such as Integer#+).

For most method definitions we can tell immediately that this won't be
the case based on the method's name. To do this we just keep a hash with
method IDs of optimized methods and if our new method isn't in that list
we don't need to check subclasses at all.
@jhawthorn
Copy link
Member Author

This also seems to (very slightly) improve performance of Ruby startup (even with nothing loaded).

We can see the methods in question from perf record --freq=32000 ./ruby --disable-gems -e ''

Screen Shot 2019-12-17 at 09 10 45

perf stat shows a significant reduction in cycles/instructions (but a relatively small change in runtime)

Before

$ make -j16 ruby && perf stat -r 500 ./ruby --disable-gems -e ''
        BASERUBY = /home/jhawthorn/.rubies/ruby-2.6.3/bin/ruby --disable=gems
        CC = gcc
        LD = ld
        LDSHARED = gcc -shared
        CFLAGS = -O3 -ggdb3 -Wall -Wextra -Werror=deprecated-declarations -Werror=duplicated-cond -Werror=implicit-function-declaration -Werror=implicit-int -Werror=misleading-indentation -Werror=pointer-arith -Werror=write-strings -Wimplicit-fallthrough=0 -Wmissing-noreturn
-Wno-cast-function-type -Wno-constant-logical-operand -Wno-long-long -Wno-missing-field-initializers -Wno-overlength-strings -Wno-packed-bitfield-compat -Wno-parentheses-equality -Wno-self-assign -Wno-tautological-compare -Wno-unused-parameter -Wno-unused-value -Wsuggest-attr
ibute=format -Wsuggest-attribute=noreturn -Wunused-variable -std=gnu99
        XCFLAGS = -D_FORTIFY_SOURCE=2 -fstack-protector-strong -fno-strict-overflow -DRUBY_DEVEL=1 -fvisibility=hidden -fexcess-precision=standard -DRUBY_EXPORT -fPIE -DCANONICALIZATION_FOR_MATHN -I. -I.ext/include/x86_64-linux -I./include -I. -I./enc/unicode/12.1.0
        CPPFLAGS =
        DLDFLAGS = -Wl,--compress-debug-sections=zlib -fstack-protector-strong -pie
        SOLIBS = -lz -lpthread -lrt -lrt -lgmp -ldl -lcrypt -lm
        LANG = en_US.UTF-8
        LC_ALL =
        LC_CTYPE = en_US.UTF-8
        MFLAGS = -j16 --jobserver-auth=4,5
gcc (GCC) 9.2.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

make: 'ruby' is up to date.

 Performance counter stats for './ruby --disable-gems -e ' (500 runs):

              7.24 msec task-clock:u              #    0.970 CPUs utilized            ( +-  0.59% )
                 0      context-switches:u        #    0.000 K/sec
                 0      cpu-migrations:u          #    0.000 K/sec
             1,589      page-faults:u             #    0.219 M/sec                    ( +-  0.01% )
         8,872,760      cycles:u                  #    1.226 GHz                      ( +-  1.10% )  (59.19%)
           247,261      stalled-cycles-frontend:u #    2.79% frontend cycles idle     ( +-  0.57% )  (66.48%)
         1,052,212      stalled-cycles-backend:u  #   11.86% backend cycles idle      ( +-  0.19% )  (95.11%)
        15,544,068      instructions:u            #    1.75  insn per cycle
                                                  #    0.07  stalled cycles per insn  ( +-  0.00% )
         3,417,975      branches:u                #  472.144 M/sec                    ( +-  0.01% )
            76,959      branch-misses:u           #    2.25% of all branches          ( +-  0.46% )  (79.22%)

         0.0074641 +- 0.0000437 seconds time elapsed  ( +-  0.59% )

After

$ make -j16 ruby && perf stat -r 500 ./ruby --disable-gems -e ''
        BASERUBY = /home/jhawthorn/.rubies/ruby-2.6.3/bin/ruby --disable=gems
        CC = gcc
        LD = ld
        LDSHARED = gcc -shared
        CFLAGS = -O3 -ggdb3 -Wall -Wextra -Werror=deprecated-declarations -Werror=duplicated-cond -Werror=implicit-function-declaration -Werror=implicit-int -Werror=misleading-indentation -Werror=pointer-arith -Werror=write-strings -Wimplicit-fallthrough=0 -Wmissing-noreturn
-Wno-cast-function-type -Wno-constant-logical-operand -Wno-long-long -Wno-missing-field-initializers -Wno-overlength-strings -Wno-packed-bitfield-compat -Wno-parentheses-equality -Wno-self-assign -Wno-tautological-compare -Wno-unused-parameter -Wno-unused-value -Wsuggest-attr
ibute=format -Wsuggest-attribute=noreturn -Wunused-variable -std=gnu99
        XCFLAGS = -D_FORTIFY_SOURCE=2 -fstack-protector-strong -fno-strict-overflow -DRUBY_DEVEL=1 -fvisibility=hidden -fexcess-precision=standard -DRUBY_EXPORT -fPIE -DCANONICALIZATION_FOR_MATHN -I. -I.ext/include/x86_64-linux -I./include -I. -I./enc/unicode/12.1.0
        CPPFLAGS =
        DLDFLAGS = -Wl,--compress-debug-sections=zlib -fstack-protector-strong -pie
        SOLIBS = -lz -lpthread -lrt -lrt -lgmp -ldl -lcrypt -lm
        LANG = en_US.UTF-8
        LC_ALL =
        LC_CTYPE = en_US.UTF-8
        MFLAGS = -j16 --jobserver-auth=4,5
gcc (GCC) 9.2.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

make: 'ruby' is up to date.

 Performance counter stats for './ruby --disable-gems -e ' (500 runs):

              6.86 msec task-clock:u              #    0.968 CPUs utilized            ( +-  0.64% )
                 0      context-switches:u        #    0.000 K/sec
                 0      cpu-migrations:u          #    0.000 K/sec
             1,588      page-faults:u             #    0.232 M/sec                    ( +-  0.01% )
         7,877,563      cycles:u                  #    1.148 GHz                      ( +-  1.21% )  (57.39%)
           237,588      stalled-cycles-frontend:u #    3.02% frontend cycles idle     ( +-  0.61% )  (68.23%)
         1,016,882      stalled-cycles-backend:u  #   12.91% backend cycles idle      ( +-  0.15% )  (97.05%)
        14,052,135      instructions:u            #    1.78  insn per cycle
                                                  #    0.07  stalled cycles per insn  ( +-  0.01% )
         2,964,832      branches:u                #  432.121 M/sec                    ( +-  0.01% )
            75,965      branch-misses:u           #    2.56% of all branches          ( +-  0.52% )  (77.33%)

         0.0070858 +- 0.0000450 seconds time elapsed  ( +-  0.63% )

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
2 participants