Dramatically speed up MongoMapper::Plugins::Keys#load_from_database #450

Open
wants to merge 3 commits

6 participants

@jonpollock

This pull request significantly improves the performance of mongo-mapper when dealing with a large number of documents that may also have a large number of keys. When profiling our app, we noticed that MongoMapper::Plugins::Keys#load_from_database was taking a VERY long time, in fact much longer than it took to get the response from mongo. For example, a 5 KB response from mongo for a single flat document with about 40 keys would arrive in under 1ms, but load_from_database took over 17ms to complete. This method only works with data that is already in memory, so it was surprising that it took so long.

The first thing we noticed was the Keys#key? method. It was generating an array of key names and calling include? on it instead of just calling has_key? on the hash. It gets called quite a lot, and this change cut about 8ms off the single load_from_database call for the example object above, basically cutting the time in half. On our large test page, key? was being called 34,667 times for 623 documents.
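For reference, the change is tiny; this is the same before/after that appears in the keys.rb diff further down:

    # Before: builds an Array of every key name and scans it on each call.
    def key?(key)
      keys.keys.include?(key.to_s)
    end

    # After: a single Hash lookup, with no intermediate Array allocation.
    def key?(key)
      keys.has_key?(key.to_s)
    end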

In the load_from_database method, we split the incoming attributes into separate lists by key type, which avoids most of the respond_to? calls and lets us do exactly what each key type needs. This also eliminated almost all calls to key?.
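Condensed from the keys.rb diff below, the shape of the new method is roughly this (helper names match the patch; this is a sketch, not the exact final code):

    def load_from_database(attrs)
      return if attrs.blank?
      attr_keys  = attrs.keys
      model_keys = attr_keys & key_names   # defined keys: write them directly
      other_keys = attr_keys - key_names   # associations and unknown attributes

      model_keys.each { |key| write_key_from_database(key, attrs[key]) }

      other_keys.each do |key|
        if associations.has_key?(key.to_sym)
          write_association_key_from_database(key, attrs[key])  # deferred, see next paragraph
        elsif respond_to?(:"#{key}=")
          send(:"#{key}=", attrs[key])
        else
          self[key] = attrs[key]           # keep unknown attributes on the model anyway
        end
      end
    end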

Then we deferred building the associations until they are needed. Keys#load_from_database now simply stores the raw value for each association being loaded into an instance variable, "@_association_#{key}", for later evaluation.
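In code, the deferral amounts to the following pair, taken from the keys.rb and associations.rb hunks below (condensed):

    # Keys#load_from_database stashes the raw value...
    def write_association_key_from_database(key, value)
      instance_variable_set :"@_association_#{key}", value
    end

    # ...and Associations#get_proxy evaluates it the first time the proxy is requested.
    def get_proxy(association)
      value_from_db = instance_variable_get(:"@_association_#{association.name}")
      if value_from_db
        instance_variable_set :"@_association_#{association.name}", nil
        unless proxy = instance_variable_get(association.ivar)
          proxy = build_proxy(association)
        end
        association.embeddable? ? proxy.load_from_database(value_from_db) : proxy.replace(value_from_db)
        proxy
      else
        unless proxy = instance_variable_get(association.ivar)
          proxy = build_proxy(association)
        end
        proxy
      end
    end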

These changes alone reduced the average response time of one of our pages, which loads 623 documents, from 8.1 seconds to 1.4 seconds. Keys#load_from_database went from 4.617 seconds to 0.23 seconds, and the changes also helped other parts of the app. Naturally, your mileage may vary.

I have re-run the mongo-mapper tests and they all still pass.

@wpeterson

Heads up: the Travis build is broken due to changes on Travis's side.

This pull request fixes it if you want to cherry-pick the commit to get your PR build passing:

#451

@jnunemaker commented on the diff
lib/mongo_mapper/plugins/associations/many_embedded_polymorphic_proxy.rb
@@ -12,10 +21,19 @@ def replace(values)
protected
def find_target
- (@_values || []).map do |hash|
- child = polymorphic_class(hash).load(hash)
- assign_references(child)
- child
+ if !@_from_db

The if and else of this seem to be doing the same thing. Am I missing something?

In one case it does a load and sets _from_db to false, and in the other it does a new.

I understand the boolean switch, but how does it affect things?

load populates the proxy the same way load_from_database would. Using new would end up setting the _new property to true, which would be wrong.

We truly are loading it from the database, just not evaluating what we received from the database right away.
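For anyone following along, the practical difference (Widget is a hypothetical embedded document class; the _new behaviour described is MongoMapper's):

    Widget.new('name' => 'x')    # sets @_new = true  -> treated as an unsaved document
    Widget.load('name' => 'x')   # sets @_new = false -> treated as already persisted
    # find_target therefore uses load when @_from_db is set, so deferred embedded
    # documents still look like they came from the database.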

@jnunemaker commented on the diff
lib/mongo_mapper/plugins/associations.rb
@@ -72,10 +73,20 @@ def build_proxy(association)
end
def get_proxy(association)
- unless proxy = self.instance_variable_get(association.ivar)
- proxy = build_proxy(association)
+ value_from_db = instance_variable_get(:"@_association_#{association.name}")
+ if value_from_db
+ instance_variable_set :"@_association_#{association.name}", nil
+ unless proxy = self.instance_variable_get(association.ivar)
+ proxy = build_proxy(association)
+ end
+ association.embeddable? ? proxy.load_from_database(value_from_db) : proxy.replace(value_from_db)

What is the benefit of this? Seems like load_from_database and replace are identical.

In this particular case, it turns out that association should always be embeddable? and we can reduce that to just proxy.load_from_database.

In any case, the main difference is that load_from_database sets the @_from_db attribute to true and replace sets it to false.

In general, when the proxy's find_target is called, @_from_db controls whether it does a load or a new. load generally dispatches to the one defined in Keys, which may be replaced by the identity map. It is the method that load_from_database would have called if we hadn't skipped loading the associations.

If we used new, the _new property would get set to true, which we don't want.

The whole point is to defer evaluating the proxy until it is needed. This saved us quite a bit of time when loading lists of documents, especially when they contained embedded documents we didn't need for the particular operation we were doing.

@jnunemaker commented on the diff
lib/mongo_mapper/plugins/keys.rb
@@ -13,6 +13,9 @@ module Keys
module ClassMethods
def inherited(descendant)
+ descendant.instance_variable_set(:"@key_names", key_names.dup)

We definitely should be doing this stuff.

@jnunemaker commented on the diff
lib/mongo_mapper/plugins/keys.rb
((7 lines not shown))
end
end
def key?(key)
- keys.keys.include?(key.to_s)
+ keys.has_key?(key.to_s)

Good catch. I'll fix this in master right now.

@jnunemaker

I really appreciate the effort towards improving performance. Performance hasn't really been a focus in the past several releases and it needs to be again. I've already picked a few of these changes into master.

Some of the changes here are hard to read through and create some duplication problems (date conversion outside of Key#get/set).

Any chance you could paste an example document here? Or maybe put a mongodump somewhere that I could look through. Also, what was the overall method for testing? Did you just start adding timing code and see what took the longest?

I'm not surprised that load_from_database takes the most time, as that is where everything happens (typecasting, etc.). You definitely have me interested in speeding things up, but I want to do it in a way that doesn't add duplication or hurt readability. Does that make sense?

Again, thanks for bringing this up and any more help you can provide.

@jonpollock

The special Date handling may no longer be necessary. We weren't doing a final key.get(read_value) for the value we put into the @_read_#{key} instance variable. I noticed that just before making the pull request and fixed it, but didn't go back and remove the date handling. I'll experiment with that and update the pull request. This would certainly make it cleaner.

For the most part we used New Relic's profiling capability to find these hotspots. In other cases, I just put timing into the code.

I'll see what I can do about getting you a sample document. The one where I noticed the Keys#key? problem was our site settings document, which has 131 keys defined on it.

@edebill-moxiesoft

jonpollock: Don't forget that we went back and validated the performance gains at the web page level using JMeter after these changes went in. It made a huge difference to overall site performance. Some pages got 10x faster; not every page, but the database-heavy ones in particular.

@cheald
MongoMapper member

I'm immensely interested in this changeset, as #read_key is the lion's share of my profiling efforts right now, but it breaks the test suite six ways from Sunday. Do you all see the same issues?

@jnunemaker

The biggest improvement is just caching the read attribute in an instance variable. Right now MM stupidly re-typecasts on every read instead of just reading the cached value. I definitely want to fix that. I suppose I could bring in just the read attribute caching, but overall I'm not a huge fan of this changeset, as I find it really hard to follow.
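The caching in question looks roughly like this in the PR's keys.rb diff (condensed; @_read_#{key_name} is the new cache ivar):

    def read_key(key_name)
      read_value = instance_variable_get(:"@_read_#{key_name}")
      return read_value if read_value            # already typecast: return the cached value

      if key = keys[key_name.to_s]
        value = key.get(instance_variable_get(:"@#{key_name}"))   # typecast once
        set_parent_document(key, value)
        instance_variable_set(:"@#{key_name}", value)
        instance_variable_set(:"@_read_#{key_name}", value)       # ...and cache it
      end
    end

One caveat with the guard as written: a legitimately nil or false value never hits the cache, so those keys still get re-typecast on every read.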

@cheald
MongoMapper member

I'm actually working on just the read attribute caching in a separate branch, as I also had a hard time following this changeset, but the read attribute caching has very obvious benefits.

@cheald
MongoMapper member

https://github.com/cheald/mongomapper/commits/faster_read_keys/ has my changes so far. I'm working on some further experiments, though, as a not-insignificant amount of time is spent in String#intern and Keys#keys, both of which I'm trying to whittle down. My gut is that this can all be significantly faster.
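(The String#intern cost comes from building symbols like :"@#{key_name}" on every read. A sketch of one obvious mitigation, not part of this PR, assuming key names are stable for a document's lifetime:)

    def ivar_for(key_name)
      @_ivar_cache ||= {}
      @_ivar_cache[key_name] ||= :"@#{key_name}"   # intern once per key instead of once per read
    end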

I'll submit a pull request once I've concluded this round of experiments.

@cheald
MongoMapper member

This PR has been mostly superseded by the recent overhauls to Keys. However, I still like the idea of deferring embedded document loads until they're actually accessed. The following does that nicely.

Thoughts? This is an easy enough change, and quick tests seem to indicate that it substantially improves load performance in cases where you have embedded documents that you don't actually touch (for obvious reasons).

diff --git a/lib/mongo_mapper/plugins/associations.rb b/lib/mongo_mapper/plugins/associations.rb
index d69a8c7..af7ae5a 100644
--- a/lib/mongo_mapper/plugins/associations.rb
+++ b/lib/mongo_mapper/plugins/associations.rb
@@ -73,13 +73,20 @@ module MongoMapper
       def build_proxy(association)
         proxy = association.proxy_class.new(self, association)
         self.instance_variable_set(association.ivar, proxy)
-
         proxy
       end

       def get_proxy(association)
         proxy = self.instance_variable_get(association.ivar) if instance_variable_defined?(association.ivar)
         proxy ||= build_proxy(association)
+        realize_association(association.name, proxy)
+      end
+
+      def realize_association(name, proxy)
+        if @__deferred_associations && @__deferred_associations.key?(name)
+          proxy.replace @__deferred_associations.delete(name)
+        end
+        proxy
       end

       def save_to_collection(options={})
diff --git a/lib/mongo_mapper/plugins/keys.rb b/lib/mongo_mapper/plugins/keys.rb
index 3d3de71..de43ed4 100644
--- a/lib/mongo_mapper/plugins/keys.rb
+++ b/lib/mongo_mapper/plugins/keys.rb
@@ -289,9 +289,13 @@ module MongoMapper

           # Init the keys ivar. Due to the volume of times this method is called, we don't want it in a method.
           @_mm_keys ||= self.class.keys
+          @__deferred_associations ||= {}

           attrs.each do |key, value|
-            if !@_mm_keys.key?(key) && respond_to?(:"#{key}=")
+            s_key = key.to_sym
+            if associations.key?(s_key)
+              @__deferred_associations[s_key] = value
+            elsif !@_mm_keys.key?(key) && respond_to?(:"#{key}=")
               self.send(:"#{key}=", value)
             else
               internal_write_key key, value, false
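As a quick illustration of what that patch buys (Order and LineItem are hypothetical models; the point is when the embedded documents actually get instantiated, based on the behaviour described above):

    class LineItem
      include MongoMapper::EmbeddedDocument
      key :sku, String
    end

    class Order
      include MongoMapper::Document
      many :line_items
    end

    order = Order.first          # line item hashes are stashed in @__deferred_associations
    order.id                     # no LineItem instances built yet
    order.line_items.first.sku   # first access realizes the proxy via replace(...)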
@jonpollock

The keys.rb file has certainly changed since I last looked at it. Yes, this seems like an appropriate way to solve it. We have found that one of the biggest performance gains was in deferring the loading of these embedded documents.

@smtlaissezfaire

Does that last patch work when #attributes() is called?

Commits on Sep 19, 2012
  1. @jonpollock
Commits on Sep 20, 2012
  1. @jonpollock committed: Update .travis.yml (Fix travis build.)
Commits on Sep 22, 2012
  1. @jonpollock
.travis.yml (1 line changed)
@@ -10,3 +10,4 @@ rvm:
notifications:
email: false
bundler_args: --without development
+services: mongodb
lib/mongo_mapper/plugins/associations.rb (17 lines changed)
@@ -65,6 +65,7 @@ def embedded_associations
end
def build_proxy(association)
+ instance_variable_set :"@_association_#{association.name}", nil
proxy = association.proxy_class.new(self, association)
self.instance_variable_set(association.ivar, proxy)
@@ -72,10 +73,20 @@ def build_proxy(association)
end
def get_proxy(association)
- unless proxy = self.instance_variable_get(association.ivar)
- proxy = build_proxy(association)
+ value_from_db = instance_variable_get(:"@_association_#{association.name}")
+ if value_from_db
+ instance_variable_set :"@_association_#{association.name}", nil
+ unless proxy = self.instance_variable_get(association.ivar)
+ proxy = build_proxy(association)
+ end
+ association.embeddable? ? proxy.load_from_database(value_from_db) : proxy.replace(value_from_db)

+ proxy
+ else
+ unless proxy = self.instance_variable_get(association.ivar)
+ proxy = build_proxy(association)
+ end
+ proxy
end
- proxy
end
def save_to_collection(options={})
lib/mongo_mapper/plugins/associations/many_embedded_polymorphic_proxy.rb (26 lines changed)
@@ -3,7 +3,16 @@ module MongoMapper
module Plugins
module Associations
class ManyEmbeddedPolymorphicProxy < EmbeddedCollection
+ def load_from_database(values)
+ @_from_db = true
+ @_values = values.map do |v|
+ v.respond_to?(:attributes) ? v.attributes.merge(association.type_key_name => v.class.name) : v
+ end
+ reset
+ end
+
def replace(values)
+ @_from_db = false
@_values = values.map do |v|
v.respond_to?(:attributes) ? v.attributes.merge(association.type_key_name => v.class.name) : v
end
@@ -12,10 +21,19 @@ def replace(values)
protected
def find_target
- (@_values || []).map do |hash|
- child = polymorphic_class(hash).load(hash)
- assign_references(child)
- child
+ if !@_from_db

+ (@_values || []).map do |hash|
+ child = polymorphic_class(hash).new(hash)
+ assign_references(child)
+ child
+ end
+ else
+ @_from_db = false
+ (@_values || []).map do |hash|
+ child = polymorphic_class(hash).load(hash)
+ assign_references(child)
+ child
+ end
end
end
lib/mongo_mapper/plugins/associations/many_embedded_proxy.rb (24 lines changed)
@@ -3,7 +3,16 @@ module MongoMapper
module Plugins
module Associations
class ManyEmbeddedProxy < EmbeddedCollection
+ def load_from_database(values)
+ @_from_db = true
+ @_values = (values || []).map do |v|
+ v.respond_to?(:attributes) ? v.attributes : v
+ end
+ reset
+ end
+
def replace(values)
+ @_from_db = false
@_values = (values || []).map do |v|
v.respond_to?(:attributes) ? v.attributes : v
end
@@ -12,9 +21,18 @@ def replace(values)
private
def find_target
- (@_values || []).map do |attrs|
- klass.load(attrs).tap do |child|
- assign_references(child)
+ if !@_from_db
+ (@_values || []).map do |attrs|
+ klass.new(attrs).tap do |child|
+ assign_references(child)
+ end
+ end
+ else
+ @_from_db = false
+ (@_values || []).map do |attrs|
+ klass.load(attrs).tap do |child|
+ assign_references(child)
+ end
end
end
end
lib/mongo_mapper/plugins/associations/one_embedded_polymorphic_proxy.rb (26 lines changed)
@@ -3,17 +3,33 @@ module MongoMapper
module Plugins
module Associations
class OneEmbeddedPolymorphicProxy < OneEmbeddedProxy
+ def load_from_database(value)
+ @_from_db = true
+ @value = value.respond_to?(:attributes) ? value.attributes.merge(association.type_key_name => value.class.name) : value
+ reset
+ end
+
def replace(value)
+ @_from_db = false
@value = value.respond_to?(:attributes) ? value.attributes.merge(association.type_key_name => value.class.name) : value
reset
end
-
+
protected
def find_target
- if @value
- child = polymorphic_class(@value).load(@value)
- assign_references(child)
- child
+ if !@_from_db
+ if @value
+ child = polymorphic_class(@value).new(@value)
+ assign_references(child)
+ child
+ end
+ else
+ @_from_db = false
+ if @value
+ child = polymorphic_class(@value).load(@value)
+ assign_references(child)
+ child
+ end
end
end
lib/mongo_mapper/plugins/associations/one_embedded_proxy.rb (13 lines changed)
@@ -12,6 +12,19 @@ def build(attributes={})
def replace(doc)
if doc.respond_to?(:attributes)
+ @target = klass.new(doc.attributes)
+ elsif doc.nil?
+ @target = nil
+ else
+ @target = klass.new(doc)
+ end
+ assign_references(@target)
+ loaded
+ @target
+ end
+
+ def load_from_database(doc)
+ if doc.respond_to?(:attributes)
@target = klass.load(doc.attributes)
else
@target = klass.load(doc)
lib/mongo_mapper/plugins/clone.rb (1 line changed)
@@ -8,6 +8,7 @@ def initialize_copy(other)
@_new = true
@_destroyed = false
@_id = nil
+ remove_instance_variable(:"@_read__id") if instance_variable_defined? (:"@_read__id")
associations.each do |name, association|
instance_variable_set(association.ivar, nil)
end
lib/mongo_mapper/plugins/keys.rb (69 lines changed)
@@ -13,6 +13,8 @@ module Keys
module ClassMethods
def inherited(descendant)
+ descendant.instance_variable_set(:"@key_names", key_names.dup)

+ descendant.instance_variable_set(:"@object_id_keys", object_id_keys.dup)
descendant.instance_variable_set(:@keys, keys.dup)
super
end
@@ -21,6 +23,14 @@ def keys
@keys ||= {}
end
+ def key_names
+ @key_names ||= []
+ end
+
+ def object_id_keys
+ @object_id_keys ||= []
+ end
+
def key(*args)
Key.new(*args).tap do |key|
keys[key.name] = key
@@ -28,11 +38,13 @@ def key(*args)
create_key_in_descendants(*args)
create_indexes_for(key)
create_validations_for(key)
+ key_names << key.name
+ object_id_keys << key.name.to_sym if key.type == ObjectId
end
end
def key?(key)
- keys.keys.include?(key.to_s)
+ keys.has_key?(key.to_s)

end
def using_object_id?
@@ -198,7 +210,7 @@ def attributes
end
embedded_associations.each do |association|
- if documents = instance_variable_get(association.ivar)
+ if documents = get_proxy(association)
if association.is_a?(Associations::OneAssociation)
attrs[association.name] = documents.to_mongo
else
@@ -231,7 +243,7 @@ def update_attribute(name, value)
end
def id
- _id
+ self[:_id]
end
def id=(value)
@@ -256,7 +268,18 @@ def keys
end
def key_names
- keys.keys
+ self.class.key_names
+ end
+
+ def has_attribute?(name)
+ attr_name = name.chomp('=');
+ return true if keys.has_key? attr_name
+ associations.each_value do |assoc|
+ if assoc.embeddable?
+ return true if assoc.name == attr_name
+ end
+ end
+ false
end
def non_embedded_keys
@@ -270,11 +293,24 @@ def embedded_keys
private
def load_from_database(attrs)
return if attrs.blank?
- attrs.each do |key, value|
- if respond_to?(:"#{key}=") && !self.class.key?(key)
- self.send(:"#{key}=", value)
+ attr_keys = attrs.keys
+ model_keys = attr_keys & key_names
+ other_keys = attr_keys - key_names
+
+ model_keys.each do |key|
+ write_key_from_database(key, attrs[key])
+ end
+
+ other_keys.each do |key, value|
+ if associations.has_key?(key.to_sym)
+ write_association_key_from_database(key, attrs[key])
else
- self[key] = value
+ if respond_to?(:"#{key}=")
+ self.send(:"#{key}=", attrs[key]) if respond_to?(:"#{key}=")
+ else
+ # so, add the attribute to the model anyway.
+ self[key] = attrs[key]
+ end
end
end
end
@@ -290,10 +326,13 @@ def set_parent_document(key, value)
end
def read_key(key_name)
+ read_value = instance_variable_get(:"@_read_#{key_name}")
+ return read_value if read_value
if key = keys[key_name.to_s]
value = key.get(instance_variable_get(:"@#{key_name}"))
set_parent_document(key, value)
instance_variable_set(:"@#{key_name}", value)
+ instance_variable_set(:"@_read_#{key_name}", value)
end
end
@@ -305,7 +344,19 @@ def write_key(name, value)
key = keys[name.to_s]
set_parent_document(key, value)
instance_variable_set :"@#{name}_before_type_cast", value
- instance_variable_set :"@#{name}", key.set(value)
+
+ read_value = key.set(value)
+ instance_variable_set :"@#{name}", read_value
+ instance_variable_set(:"@_read_#{name}", key.get(read_value))
+ end
+
+ def write_key_from_database(key, value)
+ instance_variable_set :"@#{key}", value
+ instance_variable_set :"@#{key}_before_type_cast", value
+ end
+
+ def write_association_key_from_database(key, value)
+ instance_variable_set :"@_association_#{key}", value
end
end
end