
Identity map #76

Merged
99 commits merged into from Feb 18, 2011

Contributor

miloops commented Oct 8, 2010

This is the implementation of Identity Map for ActiveRecord, Marcin Raczkowski's project for Ruby Summer Of Code (http://rubysoc.org/projects):

Project #12: ActiveRecord Identity Map

Our goal is to provide a pluggable identity map implementation for ActiveRecord. An identity map is a design pattern used to improve performance by providing an in-memory cache that prevents duplicate retrieval of the same object data from the database, in our case in the context of the same request or thread.

If the requested data has already been loaded from the database, the identity map returns the same instance of the already instantiated object, but if it has not been loaded yet, it loads it and stores the new object in the map. The main gains of this project will be performance improvement and memory consumption reduction.
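The lookup described above can be sketched in plain Ruby. This is only an illustration of the pattern, not the code from this branch: `IdentityMapSketch`, the `User` struct and the `load_user` lambda are hypothetical names, and the lambda stands in for a database query.

```ruby
# Minimal identity map sketch: at most one in-memory object per (class, id).
class IdentityMapSketch
  def initialize
    # Keyed by class name, then by primary key converted to a string.
    @repository = Hash.new { |hash, klass| hash[klass] = {} }
  end

  # Returns the cached instance, or runs the block (a stand-in for the
  # database query) and caches whatever it returns.
  def fetch(klass, id)
    records = @repository[klass.name]
    key = id.to_s
    return records[key] if records.key?(key)
    records[key] = yield
  end

  # Forgets every cached object, e.g. at the end of a request.
  def clear
    @repository.clear
  end
end

# Hypothetical record type and a counter standing in for database round trips.
User = Struct.new(:id, :name)
queries = 0
load_user = lambda do |id|
  queries += 1
  User.new(id, "user-#{id}")
end

map = IdentityMapSketch.new
first  = map.fetch(User, 1)   { load_user.call(1) }
second = map.fetch(User, "1") { load_user.call(1) }

puts first.equal?(second) # true: the very same object is returned
puts queries              # 1: the loader ran only once
```

Because both lookups return the same object, a change made through one reference is visible through the other, which is where the behavioral differences discussed later in this thread come from.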

Contributor

josevalim commented Oct 8, 2010

Just one note, for those interested in trying it out, you need to add to your Gemfile:

gem "rails", :git => "git://github.com/miloops/rails.git", :branch => "identity_map"
gem "weakling", :git => "git://github.com/swistak/weakling.git"
gem "rack", :git => "git://github.com/rack/rack.git"
gem "arel", :git => "git://github.com/rails/arel.git"

Soleone commented Oct 8, 2010

Great stuff! This is very useful in large projects where each request has to load e.g. a "user" model or another context every time. We rolled our own custom Identity Map implementation in a very large app and definitely observed increased performance, so I'm glad an official solution is in the works. Thanks!

Contributor

bensie commented Oct 8, 2010

Awesome work!

Contributor

loe commented Oct 8, 2010

This is amazing! I think the best feature is being able to validate on both sides of an association without having to manually stitch them together in the controller.

@author.books.build

book validates_presence_of :author and author validates_presence_of :book

doing b = author.books.build; b.author = @author was frustrating at best!

Contributor

iain commented Oct 8, 2010

Nice! I would love to write controller specs with mocking #save without having to mock .find:

site = Factory :site
site.should_receive(:update_attributes).with('foo' => 'bar').and_return(true)
put :update, :id => site.to_param, :site => { :foo => "bar" }
response.should redirect_to(site_url(site))

This actually works now!

Anyway, I've tried it on two real, live Rails 3.0 apps, running each on rails master and on miloops' identity_map branch.

The first app showed no difference in performance. It has 332 examples; all 3 versions took around 20 seconds to run and used around 118MB of RAM. In the identity_map branch there was one failing spec.

A second project (544 specs) did show some differences between Rails 3.0 and the master branch, but no difference in performance in the identity_map branch. There were a lot of failing specs, though.

3.0 stable: 35 seconds, 272MB, 0 failing specs
master: 29 seconds, 260MB, 2 failing specs
identity_map: 29 seconds, 260MB, 13 failing specs

These weren't real benchmarks or anything, I just ran rake spec and observed memory usage myself.
The failing specs were all different errors, but all related to updating and finding records.

Oh, I couldn't get cucumber to run on master or identity_map, which is a shame, because that would've been more representative of real usage.

Contributor

josevalim commented Oct 8, 2010

Awesome feedback Iain! If you have some extra time, do you think you can give us more information about these extra errors you got?

Contributor

iain commented Oct 8, 2010

It's past midnight here, so I'll be brief:

From the first app, a test that failed only in the identity_map branch:

site = Factory :site
other_site = Factory :site
recruiter = Factory :recruiter, :first_name => "before-update", :site_id => site.id
attrs = Factory.attributes_for(:recruiter, :first_name => "after-update", :site_id => other_site.id)
put :update, :id => recruiter.to_param, :recruiter => attrs
recruiter.reload.first_name.should == "after-update" # succeeds
recruiter.site.should == other_site # fails, still pointing to site, not other_site.

I can't really see what's wrong here and why the first_name field does update, but the site_id doesn't. Especially since it works in 3.0 and rails-master. It might be authentication/authorization that doesn't go quite well, because certain signed in users are not allowed to change the site_id. (I'm using devise, cancan and inherited_resources in this controller).

The other project has been around for a lot longer (was started with rails 3.0.0.beta, if I remember correctly) and has a lot more gem dependencies.

There were some errors I can understand that come from the identity map. I have these classes:

class User < ActiveRecord::Base
end
module Authentication
  class User < ::User
  end
end

And it sometimes picks the wrong one. This pattern sounded really cool when I first heard about it, but it has caused me nothing but headaches. That's beside the point, though.

I got this one a couple of times:

put :update, :project_id => project.id, :id => comment.id, :comment => { :body => "" }, :format => :js
JSON.parse(response.body).should have_key('errors') # fails

And some that look like this:

user = Factory :user
comment1 = Factory :comment, :user => user
comment2 = Factory :comment, :user => user
subject.comments << comment1 << comment2
subject.save!
subject.reload.comments.should == [ comment1, comment2 ]

Where I get just one comment instead of both. But when I removed the call to reload it worked again.
It works without the reload in 3.0 too, and I'm not entirely sure why I put it there in the first place. I guess putting reload in is one of the first things I try to do when debugging something.

I guess the majority of failing specs fail because they happened to be accidentally passing before. I found a couple instances where I was testing the wrong object. So I think this update will be a huge improvement and help you find bugs faster than before.

Edit: well, that didn't turn out to be very brief at all! :)

Contributor

miloops commented Oct 14, 2010

Hey Iain, you should try it now. In the latest commits I added a middleware to flush the identity map on each request, flush the IM in tests, and many other things that you can check out in today's commits.
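A per-request flush of this kind can be sketched as a Rack-style middleware. This is not the middleware from the commits mentioned above: `RequestCache` and `FlushCache` are hypothetical names, and the cache is a plain per-thread hash standing in for the identity map.

```ruby
# Hypothetical per-thread store standing in for the real identity map.
module RequestCache
  def self.store
    Thread.current[:request_cache_sketch] ||= {}
  end

  def self.clear
    store.clear
  end
end

# Rack-style middleware: an object that responds to call(env) and wraps
# another app. The cache is flushed after every request, even on errors.
class FlushCache
  def initialize(app)
    @app = app
  end

  def call(env)
    @app.call(env)
  ensure
    RequestCache.clear
  end
end

# A toy app that caches something while handling the request.
app = lambda do |env|
  RequestCache.store[:user] = "loaded during #{env['PATH_INFO']}"
  [200, { "Content-Type" => "text/plain" }, ["ok"]]
end

status, _headers, _body = FlushCache.new(app).call("PATH_INFO" => "/sites/1")
puts status
puts RequestCache.store.empty?
```

Clearing in `ensure` means a request that raises still leaves an empty cache for the next request handled by the same thread.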

Feel free to add me on IM (miloops at gmail) to discuss any problem you are having.

Contributor

iain commented Oct 14, 2010

It seems to work fine when running the server, but I still have some issues running my specs. Like this one:

  subject { Factory.build :profile, :first_name => "Jan" }
  it "accepts utf8" do
    subject.first_name = "☃"
    subject.save!
    subject.reload.first_name.should == "☃"
  end

When I run rake spec:models, or rspec spec/models/profile_spec.rb, it passes.
When I run rake spec it fails. Weirdly enough, it returns the default value from the factory, even though I never mention that in my specs:

  6) Profile accepts utf8
     Failure/Error: subject.reload.first_name.should == "☃"
     expected: "\342\230\203",
          got: "Kees" (using ==)
     # ./spec/models/profile_spec.rb:34

I don't have any time anymore tonight, but I'll be happy to discuss it with you soon.

Contributor

josevalim commented Oct 14, 2010

Iain, you are using rspec, so a callback that we added to ActiveSupport::TestCase is not being executed. Please try adding the code below (it should run before each test in the whole suite):

before(:each) do
  ActiveRecord::IdentityMap.clear
end

It will likely solve the issue. :)
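One way to make that hook run for the whole suite, assuming RSpec 2, is a global `before(:each)` in the spec helper. The file location and the `defined?` guards are assumptions of this sketch; the guards only keep the snippet loadable outside a Rails and RSpec process. `ActiveRecord::IdentityMap.clear` is the call quoted above.

```ruby
# spec/spec_helper.rb (assumed location)
if defined?(RSpec) && defined?(ActiveRecord::IdentityMap)
  RSpec.configure do |config|
    # Start every example with an empty identity map so that records
    # cached by one example cannot leak into the next.
    config.before(:each) do
      ActiveRecord::IdentityMap.clear
    end
  end
end
```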

Contributor

iain commented Oct 15, 2010

I found one bug in rails master (not specific to identity map):

>> Project.select(:id).map(&:id)
  Project Load (2.6ms)  SELECT 'id' FROM `projects`
=> [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, etc...
>> Project.select('id').map(&:id)
  Project Load (0.7ms)  SELECT id FROM `projects`
=> [7, 16, 76, 92, 98, 101, 102, 116, etc....

On a more related note, the only other issue I could find was that ActiveRecord::IdentityMap.clear doesn't clear the aggregation cache. I'm not sure whether it should, but it is something that broke my controller spec:

it "shows errors for invalid comment when html" do
  comment.clear_aggregation_cache
  put :update, :project_id => project.id, :id => comment.id, :comment => { :body => "" }
  assigns(:comment).should_not be_valid # fails without the clear_aggregation_cache 2 lines up
  response.should render_template(:edit)
end

I couldn't find anything else.

swistak and others added some commits Aug 28, 2010

IdentityMap - misc fixes
- Added IdentityMap to be included into AR::Base
- Fixed bug with Mysql namespace missing when running tests only for sqlite
- Added sqlite as default connection
Remove objects from identity map if save failed, otherwise finding again the same record will have invalid attributes.
Remove objects from identity map if save! failed, otherwise finding again the same record will have invalid attributes.
Don't use identity map if loading readonly records, this will prevent changing readonly status on already loaded records.
Use strings primary keys in identity map keys to avoid problems with casting and also allow strings primary keys.
Use association_class method which returns the reflection class, this method is redefined in polymorphic belongs to associations.
Revert "IdentityMap - Adjustments to test cases"
This reverts commit 4db9dca55e3acc2c59f252eb83ecb83db5f4b81b.

Conflicts:

	activerecord/test/cases/identity_map_test.rb
Clear IdentityMap before continue this test, we can do this here because store_full_sti_class is not supposed to change during "runtime".
Revert "Use ActiveSupport::WeakHash for MRI, JRuby prefers Weakling."
This reverts commit 3cddebc2402eb71f2806e8b2119dc3efdceb4662.

Conflicts:

	activerecord/lib/active_record/identity_map.rb
	activesupport/lib/active_support/weak_hash.rb
Merge remote branch 'rails/master' into identity_map
Conflicts:
	activerecord/lib/active_record/associations/association_proxy.rb
	activerecord/lib/active_record/autosave_association.rb
	activerecord/lib/active_record/base.rb
	activerecord/lib/active_record/persistence.rb

Jose,

How does this mesh with Rack::FiberPool and EventMachine-based DB adapters that run every request in its own fiber?

Thanks for your work on this,

-Alex
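The thread does not answer this question. One Ruby behavior is relevant to it, though: values stored with `Thread.current[]` are fiber-local. The sketch below only demonstrates that behavior; it assumes the map is kept in `Thread.current[]`, which nothing on this page confirms, and the key name is hypothetical.

```ruby
# Assumption for this sketch: the per-request cache lives in Thread.current[].
Thread.current[:identity_map_sketch] = { "User/1" => :cached_user }

# Thread#[] is fiber-local, so a new fiber starts without the outer value.
seen_in_fiber = Fiber.new do
  Thread.current[:identity_map_sketch]
end.resume

# A new thread does not see it either.
seen_in_thread = Thread.new do
  Thread.current[:identity_map_sketch]
end.value

puts seen_in_fiber.inspect
puts seen_in_thread.inspect
puts Thread.current[:identity_map_sketch].inspect
```

Under that assumption, each fiber would get its own map, so a fiber-per-request server would not share cached records between requests, but any clearing would have to run inside the same fiber that handled the request.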

miloops added some commits Feb 15, 2011

Merge remote branch 'rails/master' into identity_map
Conflicts:
	activerecord/examples/performance.rb
	activerecord/lib/active_record/association_preload.rb
	activerecord/lib/active_record/associations.rb
	activerecord/lib/active_record/associations/association_proxy.rb
	activerecord/lib/active_record/autosave_association.rb
	activerecord/lib/active_record/base.rb
	activerecord/lib/active_record/nested_attributes.rb
	activerecord/test/cases/relations_test.rb
Merge remote branch 'rails/master' into identity_map
Conflicts:
	activerecord/lib/active_record/associations/association.rb
	activerecord/lib/active_record/fixtures.rb

hamin commented Apr 14, 2011

AWESOME!

very cool! i am waiting for release in stable

jweiss commented Apr 26, 2011

Why is this tied to ActiveRecord and not an ActiveModel functionality?
I wanted to add support for SimplyStored (CouchDB wrapper) but it seems wrong to require ActiveRecord...

Contributor

josevalim commented Apr 26, 2011

I believe the part of IdentityMap that is agnostic is actually quite small. Most of the concerns are actually in cleaning up the identity map and identifying all the situations that require it. If you think there is a significant part of the identity map that could be moved to ActiveModel, please do provide a patch!

jweiss commented Apr 26, 2011

I'm talking about https://github.com/rails/rails/blob/master/activerecord/lib/active_record/identity_map.rb

This looks totally generic to me and could be copied for my SimplyStored IdentityMap. I'll see about extracting it.


This issue was closed.
