New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance issue with ruby meta programming #4879

Closed
GaneshSPatil opened this Issue Dec 1, 2017 · 27 comments

Comments

Projects
None yet
6 participants
@GaneshSPatil

GaneshSPatil commented Dec 1, 2017

Environment

  • JRuby version (jruby -v): jruby 1.7.26 (1.9.3p551) 2016-08-26 69763b8 on Java HotSpot(TM) 64-Bit Server VM 1.8.0_101-b13 +jit [darwin-x86_64]
  • Operating system and platform (e.g. uname -a): Darwin **.local 17.2.0 Darwin Kernel Version 17.2.0: Fri Sep 29 18:27:05 PDT 2017; root:xnu-4570.20.62~3/RELEASE_X86_64 x86_64

Description

Extensively Using ruby extend functionality blocks threads which in turns causes server performance issues.
extend method calls jruby addSubclass function which is a bottleneck.

Background

Many libraries that provide JSON serialization make use of extend. Under heavy load, this causes performance degradation.

Code Snippet to Reproduce Issue:

module Bar
  def hello
    p "hello from bar!"
  end
end

class Foo
  def hello
    p "hello!"
  end
end

threads = 1000.times.collect do
  Thread.new do
    10000.times do |i|
      foo = Foo.new
      foo.extend(Bar)
    end
  end
end

threads.each {|t| t.join }

Run the above code snippet and capture thread dump. You will see a lot of threads waiting to enter synchronized this block.

here is an example thread dump.

@ketan

This comment has been minimized.

Show comment
Hide comment
@ketan

ketan Dec 1, 2017

Member

For perspective, this is locking up a significant number of jetty web threads because of object.extend being invoked by a ruby gem.

Member

ketan commented Dec 1, 2017

For perspective, this is locking up a significant number of jetty web threads because of object.extend being invoked by a ruby gem.

@kares

This comment has been minimized.

Show comment
Hide comment
@kares

kares Dec 1, 2017

Member

... is this yet another report against (the no longer supported) JRuby 1.7.x or did you guys repro on 9K ?

Member

kares commented Dec 1, 2017

... is this yet another report against (the no longer supported) JRuby 1.7.x or did you guys repro on 9K ?

@kares

This comment has been minimized.

Show comment
Hide comment
@kares

kares Dec 1, 2017

Member

got it, that addSubclass piece should not have changed since 1.7 so it should be an issue on 9K as well
... should be fine to synchronize over the RubyClass instance only but there's must be a reason the global lock is (also) there

Member

kares commented Dec 1, 2017

got it, that addSubclass piece should not have changed since 1.7 so it should be an issue on 9K as well
... should be fine to synchronize over the RubyClass instance only but there's must be a reason the global lock is (also) there

@ketan

This comment has been minimized.

Show comment
Hide comment
@ketan

ketan Dec 1, 2017

Member

We've run this on jruby 9000 as well, same result.

Member

ketan commented Dec 1, 2017

We've run this on jruby 9000 as well, same result.

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Dec 1, 2017

Member

We can probably improve the synchronization here, but extend has a pretty big cost even on standard Ruby. Doing it for every json serialization will severely limit throughput.

There may be ways to improve this and do a more narrow synchronization, but will be tricky to reduce the cost of extend.

Member

headius commented Dec 1, 2017

We can probably improve the synchronization here, but extend has a pretty big cost even on standard Ruby. Doing it for every json serialization will severely limit throughput.

There may be ways to improve this and do a more narrow synchronization, but will be tricky to reduce the cost of extend.

@ketan

This comment has been minimized.

Show comment
Hide comment
@ketan

ketan Dec 1, 2017

Member

We can probably improve the synchronization here, but extend has a pretty big cost even on standard Ruby. Doing it for every json serialization will severely limit throughput.

There may be ways to improve this and do a more narrow synchronization, but will be tricky to reduce the cost of extend.

Could you elaborate this, or point me to something I can read to understand this better?

Member

ketan commented Dec 1, 2017

We can probably improve the synchronization here, but extend has a pretty big cost even on standard Ruby. Doing it for every json serialization will severely limit throughput.

There may be ways to improve this and do a more narrow synchronization, but will be tricky to reduce the cost of extend.

Could you elaborate this, or point me to something I can read to understand this better?

@kares

This comment has been minimized.

Show comment
Hide comment
@kares

kares Dec 1, 2017

Member

could you guys link us to JSON doing extend ... thought the DCI (anti) pattern to be dead at this point.
... so I am curious as to why a 'generic' library such as JSON would do that, on every parse?

Member

kares commented Dec 1, 2017

could you guys link us to JSON doing extend ... thought the DCI (anti) pattern to be dead at this point.
... so I am curious as to why a 'generic' library such as JSON would do that, on every parse?

@enebo

This comment has been minimized.

Show comment
Hide comment
@enebo

enebo Dec 1, 2017

Member

@ketan you are changing the definition of what foo calls every time you call extend and each time it is a new foo. It does happen to be changing each new instance to the same type every time but this is very hard to see that internally (and we don't and MRI does not either). For that reason we heavily discourage this pattern. DCI also used this pattern and here is an article which delves into it in more detail:

https://tonyarcieri.com/dci-in-ruby-is-completely-broken

Member

enebo commented Dec 1, 2017

@ketan you are changing the definition of what foo calls every time you call extend and each time it is a new foo. It does happen to be changing each new instance to the same type every time but this is very hard to see that internally (and we don't and MRI does not either). For that reason we heavily discourage this pattern. DCI also used this pattern and here is an article which delves into it in more detail:

https://tonyarcieri.com/dci-in-ruby-is-completely-broken

@ketan

This comment has been minimized.

Show comment
Hide comment
@ketan

ketan Dec 1, 2017

Member

Thank you for the explanation. Given that this is a known issue (with all ruby implementations?), do you suggest closing this issue out?

Member

ketan commented Dec 1, 2017

Thank you for the explanation. Given that this is a known issue (with all ruby implementations?), do you suggest closing this issue out?

@enebo

This comment has been minimized.

Show comment
Hide comment
@enebo

enebo Dec 1, 2017

Member

@ketan I am not sure. @headius wonders if we can improve the lock to maybe make extend not stack up as much. If that can be improved we will improve in this situation a tiny bit.

We are interested whether a major library is doing this DCI pattern or not. It would be nice to help them speed up if we can. Was this in a library/gem?

Member

enebo commented Dec 1, 2017

@ketan I am not sure. @headius wonders if we can improve the lock to maybe make extend not stack up as much. If that can be improved we will improve in this situation a tiny bit.

We are interested whether a major library is doing this DCI pattern or not. It would be nice to help them speed up if we can. Was this in a library/gem?

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Dec 2, 2017

Member

Could you tell us what gem is doing this? Ideally, we'd be able to help them find an alternative.

The first time you extend an object we have to construct a new hidden class between the object and its "real" class. That means dozens of objects to support the additional class, plus difficulties caching methods because each object looks like a new type.

The cost of extend should also be an issue on MRI. You'll get better performance on both impls if you avoid heavy use of extend.

Are we significantly slower than MRI here?

Member

headius commented Dec 2, 2017

Could you tell us what gem is doing this? Ideally, we'd be able to help them find an alternative.

The first time you extend an object we have to construct a new hidden class between the object and its "real" class. That means dozens of objects to support the additional class, plus difficulties caching methods because each object looks like a new type.

The cost of extend should also be an issue on MRI. You'll get better performance on both impls if you avoid heavy use of extend.

Are we significantly slower than MRI here?

@ketan

This comment has been minimized.

Show comment
Hide comment
@ketan

ketan Dec 3, 2017

Member

The gem we're using is representable. Here's (https://github.com/trailblazer/representable/blob/3b66945c7de584a0c70dc1316101903f7ad1ebb7/lib/representable.rb#L116-L120) the line which is responsible for the object.extend. We've also tested with a couple of other gems to an identical end-result — threads being BLOCKED because of RubyClass#addSubclass. (I suppose there's something to be said about paying a performance cost for developer productivity :)

Our app does not run on MRI, I'm in no position to compare performance with a real app. However a quick benchmark with the example below seems to indicate that JRuby is atleast 2 times faster than MRI is.

require 'rubygems'
require 'benchmark/ips'

class ExampleClass
  def foo; 42; end
end

module ExampleMixin
  def foo; 43; end
end

puts "Running with #{RUBY_DESCRIPTION}"

Benchmark.ips(20) do |bm|
  bm.report("without dci") { ExampleClass.new.foo }
  bm.report("with dci") do
    obj = ExampleClass.new
    obj.extend(ExampleMixin)
    obj.foo
  end
end

And the results:

Running with jruby 9.1.14.0 (2.3.3) 2017-11-08 2176f24 Java HotSpot(TM) 64-Bit Server VM 25.121-b13 on 1.8.0_121-b13 +jit [darwin-x86_64]
Warming up --------------------------------------
         without dci   257.027k i/100ms
            with dci    36.995k i/100ms
Calculating -------------------------------------
         without dci     15.826M (± 8.6%) i/s -    311.003M in  19.985248s
            with dci      1.152M (±35.9%) i/s -      6.622M in  20.439169s
Running with ruby 2.3.4p301 (2017-03-30 revision 58214) [x86_64-darwin15]
Warming up --------------------------------------
         without dci   226.292k i/100ms
            with dci    36.996k i/100ms
Calculating -------------------------------------
         without dci      6.081M (± 7.9%) i/s -    120.614M in  20.072074s
            with dci    371.584k (±21.3%) i/s -      6.955M in  20.062720s
Member

ketan commented Dec 3, 2017

The gem we're using is representable. Here's (https://github.com/trailblazer/representable/blob/3b66945c7de584a0c70dc1316101903f7ad1ebb7/lib/representable.rb#L116-L120) the line which is responsible for the object.extend. We've also tested with a couple of other gems to an identical end-result — threads being BLOCKED because of RubyClass#addSubclass. (I suppose there's something to be said about paying a performance cost for developer productivity :)

Our app does not run on MRI, I'm in no position to compare performance with a real app. However a quick benchmark with the example below seems to indicate that JRuby is atleast 2 times faster than MRI is.

require 'rubygems'
require 'benchmark/ips'

class ExampleClass
  def foo; 42; end
end

module ExampleMixin
  def foo; 43; end
end

puts "Running with #{RUBY_DESCRIPTION}"

Benchmark.ips(20) do |bm|
  bm.report("without dci") { ExampleClass.new.foo }
  bm.report("with dci") do
    obj = ExampleClass.new
    obj.extend(ExampleMixin)
    obj.foo
  end
end

And the results:

Running with jruby 9.1.14.0 (2.3.3) 2017-11-08 2176f24 Java HotSpot(TM) 64-Bit Server VM 25.121-b13 on 1.8.0_121-b13 +jit [darwin-x86_64]
Warming up --------------------------------------
         without dci   257.027k i/100ms
            with dci    36.995k i/100ms
Calculating -------------------------------------
         without dci     15.826M (± 8.6%) i/s -    311.003M in  19.985248s
            with dci      1.152M (±35.9%) i/s -      6.622M in  20.439169s
Running with ruby 2.3.4p301 (2017-03-30 revision 58214) [x86_64-darwin15]
Warming up --------------------------------------
         without dci   226.292k i/100ms
            with dci    36.996k i/100ms
Calculating -------------------------------------
         without dci      6.081M (± 7.9%) i/s -    120.614M in  20.072074s
            with dci    371.584k (±21.3%) i/s -      6.955M in  20.062720s
@enebo

This comment has been minimized.

Show comment
Hide comment
@enebo

enebo Dec 3, 2017

Member

@ketan It seems unlikely we will convince these authors to not use the DCI pattern as one or more have done talks on it and I believe think it is worth the cost (e.g. they know the costs); but from your particular perspective a snippet like:

SongRepresenter.new(song).to_json

should not need to be implemented with DCI at all. I think perhaps the main benefit would be the simple ability to access internal state without access to properties. OTOH, what JSON do you plan on dumping where that state is not already public? Obviously, I am not asking you to write another implementation but the basic premise of decoupling this logic into separate types might not be too hard if you decide that the cost is too high.

Member

enebo commented Dec 3, 2017

@ketan It seems unlikely we will convince these authors to not use the DCI pattern as one or more have done talks on it and I believe think it is worth the cost (e.g. they know the costs); but from your particular perspective a snippet like:

SongRepresenter.new(song).to_json

should not need to be implemented with DCI at all. I think perhaps the main benefit would be the simple ability to access internal state without access to properties. OTOH, what JSON do you plan on dumping where that state is not already public? Obviously, I am not asking you to write another implementation but the basic premise of decoupling this logic into separate types might not be too hard if you decide that the cost is too high.

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Dec 5, 2017

Member

There are ways to optimize this but it's something we're reluctant to do when this pattern is so slow in MRI. Even if we optimize it, most users will not see this pattern run fast unless ruby-core also decides to optimize it.

Ideas for optimization:

  • Given a sequence of extends, we can reuse the same set of include wrappers for all cases. This blunts the base cost of extend (which, for the first call, creates both a new singleton class and a new included module wrapper).
  • We can cache those wrappers either in the main class itself or at the extend call site. Basically, if an extend call site is calling Object#extend and the class's current type and the extended module are the same, we use the cached pseudo-type.
  • The tricky bit is if the object's singleton class needs one-off modifications, like defining single methods. Caching these types would require logic at def sites to know that the old "shape" of the singleton and the new "shape" after adding a method are cached and reusable.

I'm inclined to close this bug as Won't Fix since we're performing at least as well as MRI on a very poor-performing pattern. If MRI sees fit to optimize this pattern for core Ruby, we'd be happy to work with them.

Member

headius commented Dec 5, 2017

There are ways to optimize this but it's something we're reluctant to do when this pattern is so slow in MRI. Even if we optimize it, most users will not see this pattern run fast unless ruby-core also decides to optimize it.

Ideas for optimization:

  • Given a sequence of extends, we can reuse the same set of include wrappers for all cases. This blunts the base cost of extend (which, for the first call, creates both a new singleton class and a new included module wrapper).
  • We can cache those wrappers either in the main class itself or at the extend call site. Basically, if an extend call site is calling Object#extend and the class's current type and the extended module are the same, we use the cached pseudo-type.
  • The tricky bit is if the object's singleton class needs one-off modifications, like defining single methods. Caching these types would require logic at def sites to know that the old "shape" of the singleton and the new "shape" after adding a method are cached and reusable.

I'm inclined to close this bug as Won't Fix since we're performing at least as well as MRI on a very poor-performing pattern. If MRI sees fit to optimize this pattern for core Ruby, we'd be happy to work with them.

@ketan

This comment has been minimized.

Show comment
Hide comment
@ketan

ketan Dec 5, 2017

Member

@headius and @enebo — thank you for the consideration.

I agree that the DCI pattern seems like an overkill for this case. We're considering options to move to other JSON libraries in ruby — or, failing that — use something in Java :-/

Member

ketan commented Dec 5, 2017

@headius and @enebo — thank you for the consideration.

I agree that the DCI pattern seems like an overkill for this case. We're considering options to move to other JSON libraries in ruby — or, failing that — use something in Java :-/

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Dec 5, 2017

Member

@ketan I would present your findings to the authors of that library, and if they really want to use this pattern perhaps encourage them to take it up with MRI. Many folks using the DCI pattern don't realize what a huge impact it has on performance, nor that there are reasonable alternatives.

Closing this as Won't Fix for now.

Member

headius commented Dec 5, 2017

@ketan I would present your findings to the authors of that library, and if they really want to use this pattern perhaps encourage them to take it up with MRI. Many folks using the DCI pattern don't realize what a huge impact it has on performance, nor that there are reasonable alternatives.

Closing this as Won't Fix for now.

@headius headius closed this Dec 5, 2017

@headius headius added this to the Won't Fix milestone Dec 5, 2017

@kares

This comment has been minimized.

Show comment
Hide comment
@kares

kares Dec 5, 2017

Member

@headius @enebo wondering - is the double sync really needed, locking the RubyClass isn't enough?

Member

kares commented Dec 5, 2017

@headius @enebo wondering - is the double sync really needed, locking the RubyClass isn't enough?

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Dec 6, 2017

Member

@kares This may be something we can improve. I traced those lines to 8549533, which was over 8 years ago and may no longer hold. I'll investigate a bit more.

Member

headius commented Dec 6, 2017

@kares This may be something we can improve. I traced those lines to 8549533, which was over 8 years ago and may no longer hold. I'll investigate a bit more.

@headius headius reopened this Dec 6, 2017

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Dec 6, 2017

Member

I'm reopening this because it does appear we can eliminate some (maybe a lot) of the locking here.

@kares I believe you are right. The methods that only modify the local subclasses need only synchronize on that collection. Methods that update the subclasses field must synchronize on the RubyClass. And then there's methods that actually recurse into subclasses...

I will do a PR (for 9.1.16) that removes most of this synchronization, makes the rest narrower, and ideally improves the overhead of subclass modifications and method invalidation.

Member

headius commented Dec 6, 2017

I'm reopening this because it does appear we can eliminate some (maybe a lot) of the locking here.

@kares I believe you are right. The methods that only modify the local subclasses need only synchronize on that collection. Methods that update the subclasses field must synchronize on the RubyClass. And then there's methods that actually recurse into subclasses...

I will do a PR (for 9.1.16) that removes most of this synchronization, makes the rest narrower, and ideally improves the overhead of subclass modifications and method invalidation.

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Dec 6, 2017

Member

Note: this is going into 9.1.16 because it's high risk to be fiddling about with these locks right before a release. We'll let it bake over the holiday and do a release after the new year.

@GaneshSPatil Looks like you may get a partial fix after all! It won't improve the straight-line speed of extend, but it should reduce the contention you reported. Please stand by to test your app against the PR.

Member

headius commented Dec 6, 2017

Note: this is going into 9.1.16 because it's high risk to be fiddling about with these locks right before a release. We'll let it bake over the holiday and do a release after the new year.

@GaneshSPatil Looks like you may get a partial fix after all! It won't improve the straight-line speed of extend, but it should reduce the contention you reported. Please stand by to test your app against the PR.

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Dec 7, 2017

Member

@GaneshSPatil You could start testing that PR any time. Better sooner than later, since I know we'll uncover some concurrency issues in the process of reworking this locking.

Member

headius commented Dec 7, 2017

@GaneshSPatil You could start testing that PR any time. Better sooner than later, since I know we'll uncover some concurrency issues in the process of reworking this locking.

@ketan

This comment has been minimized.

Show comment
Hide comment
@ketan

ketan Dec 7, 2017

Member

@headius — thanks for the PR. We'll give this a spin today and get back to you.

Member

ketan commented Dec 7, 2017

@headius — thanks for the PR. We'll give this a spin today and get back to you.

@TheKidCoder

This comment has been minimized.

Show comment
Hide comment
@TheKidCoder

TheKidCoder Jan 12, 2018

@ketan Any update on your findings from the PR?

TheKidCoder commented Jan 12, 2018

@ketan Any update on your findings from the PR?

@GaneshSPatil

This comment has been minimized.

Show comment
Hide comment
@GaneshSPatil

GaneshSPatil Jan 23, 2018

@TheKidCoder -- We've tested the PR on the performance test setup, on which we found the performance issue.

Our findings:

  • We didn't see any blocked threads around addSubClass method with the PR changes. The original issue regarding the server performance issues with ruby extend functionality addressed by the PR.

  • Upgrading the jruby version from 1.7.26 to 9.1.15 we've faced performance issues. One of our API endpoints which used to take 12s to 15s to respond now takes around 45s to a minute. The timings are captured on an idle server. On a heavily loaded server, we expect the APIs to take more time to respond.

GaneshSPatil commented Jan 23, 2018

@TheKidCoder -- We've tested the PR on the performance test setup, on which we found the performance issue.

Our findings:

  • We didn't see any blocked threads around addSubClass method with the PR changes. The original issue regarding the server performance issues with ruby extend functionality addressed by the PR.

  • Upgrading the jruby version from 1.7.26 to 9.1.15 we've faced performance issues. One of our API endpoints which used to take 12s to 15s to respond now takes around 45s to a minute. The timings are captured on an idle server. On a heavily loaded server, we expect the APIs to take more time to respond.

@kares

This comment has been minimized.

Show comment
Hide comment
@kares

kares Jan 23, 2018

Member

sounds like the original issue is resolved (this can be closed), @GaneshSPatil for the other 'performance' issue I would fill a new bug when you know the details. if you need help figuring it out I am happy to consult.

Member

kares commented Jan 23, 2018

sounds like the original issue is resolved (this can be closed), @GaneshSPatil for the other 'performance' issue I would fill a new bug when you know the details. if you need help figuring it out I am happy to consult.

@enebo

This comment has been minimized.

Show comment
Hide comment
@enebo

enebo Jan 23, 2018

Member

@GaneshSPatil we really want to know why there is such a large difference in time. That really sounds like something which should JIT just isn't for some reason (large methods being one common reason -- 9k is a completely different set of internals and some methods could be small enough in 1.7 to JIT but end up being larger in 9k -- this is just an example). Please open a new issue with any information.

Member

enebo commented Jan 23, 2018

@GaneshSPatil we really want to know why there is such a large difference in time. That really sounds like something which should JIT just isn't for some reason (large methods being one common reason -- 9k is a completely different set of internals and some methods could be small enough in 1.7 to JIT but end up being larger in 9k -- this is just an example). Please open a new issue with any information.

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Jan 24, 2018

Member

Fixed in 9.1.16 and confirmed by reporter.

Member

headius commented Jan 24, 2018

Fixed in 9.1.16 and confirmed by reporter.

@headius headius closed this Jan 24, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment