Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Already on GitHub? Sign in to your account

Identity map #76

Merged
99 commits merged into from Feb 18, 2011

Conversation

Projects
None yet
Contributor

miloops commented Oct 8, 2010

This is the implementation of Identity Map for ActiveRecord, Marcin Raczkowski's project for Ruby Summer Of Code (http://rubysoc.org/projects):

Project #12: ActiveRecord Identity Map

Our goal is provide plugable identity map implementation for ActiveRecord. An identity map is a design pattern used to improve performance by providing a in-memory cache to prevent duplicate retrieval of the same object data from the database, in our case in context of the same request or thread.

If the requested data has already been loaded from the database, the identity map returns the same instance of the already instantiated object, but if it has not been loaded yet, it loads it and stores the new object in the map. The main gains of this project will be performance improvement and memory consumption reduction.

Contributor

josevalim commented Oct 8, 2010

Just one note, for those interested in trying it out, you need to add to your Gemfile:

gem "rails", :git => "git://github.com/miloops/rails.git", :branch => "identity_map"
gem "weakling", :git => "git://github.com/swistak/weakling.git"
gem "rack", :git => "git://github.com/rack/rack.git"
gem "arel", :git => "git://github.com/rails/arel.git"

Soleone commented Oct 8, 2010

Great stuff! This is very useful in large projects where each request has to load e.g. a "user" model or another context every time. We rolled our own customer Identity Map implementation in a very large app and definitely observed increased performance so I'm glad an official solution is in the works. Thanks!

Contributor

bensie commented Oct 8, 2010

Awesome work!

Contributor

loe commented Oct 8, 2010

This is amazing! I think the best feature is being able to validate on both sides of an association without having to manually stitch them together in the controller.

@author.books.build

book validates_presence_of :author and author validates_presence_of :book

doing b = author.books.build; b.author = @author was frustrating at best!

Contributor

iain commented Oct 8, 2010

Nice! I would love to write controller specs with mocking #save without having to mock .find:

site = Factory :site
site.should_receive(:update_attributes).with('foo' => 'bar').and_return(true)
put :update, :id => site.to_param, :site => { :foo => "bar" }
response.should redirect_to(site_url(site))

This actually works now!

Anyway, I've tried it on two real live Rails 3.0 apps and put them on rails master and miloops identimap_branch.

The first app showed no difference in performance. It has 332 examples, all 3 versions took around 20 seconds to run and around 118MB of RAM used. In the identity_map branch there was one failing spec.

A second project (544 specs) did show some differences between Rails 3.0 and the master branch, but no difference in performance in the identity_map branch. But there were a lot of failing specs though.

3.0 stable: 35 seconds, 272MB, 0 failing specs
master: 29 seconds, 260MB, 2 failing specs
identity_map: 29 seconds 260MB, 13 failing specs

These weren't real benchmarks or anything, I just ran rake spec and observed memory usage myself.
The failing specs were all different errors, but all related to updating and finding records.

Oh, I couldn't get cucumber to run on master or identity_map, which is a shame, because that would've been more representative of real usage.

Contributor

josevalim commented Oct 8, 2010

Awesome feedback Iain! If you have some extra time, do you think you can give us more information about these extra errors you got?

Contributor

iain commented Oct 8, 2010

It's past midnight here, so I'll be brief:

From the first app, a test that failed only in the identity_map branch:

site = Factory :site
other_site = Factory :site
recruiter = Factory :recruiter, :first_name => "before-update", :site_id => site.id
attrs = Factory.attributes_for(:recruiter, :first_name => "after-update", :site_id => other_site.id)
put :update, :id => recruiter.to_param, :recruiter => attrs
recruiter.reload.first_name.should == "after-update" # succeeds
recruiter.site.should == other_site # fails, still pointing to site, not other_site.

I can't really see what's wrong here and why the first_name field does update, but the site_id doesn't. Especially since it works in 3.0 and rails-master. It might be authentication/authorization that doesn't go quite well, because certain signed in users are not allowed to change the site_id. (I'm using devise, cancan and inherited_resources in this controller).

The other project has been around for a lot longer (was started with rails 3.0.0.beta, if I remember correctly) and has a lot more gem dependencies.

There were some errors I can understand that come from the identity map. I have these classes:

class User < ActiveRecord::Base
end
module Authentication
  class User < ::User
  end
end

And it sometimes picks the wrong one. This pattern sounded really cool when I first heard about it, but caused me nothing but headaches, but that's besides the point.

I got this one a couple of times:

put :update, :project_id => project.id, :id => comment.id, :comment => { :body => "" }, :format => :js
JSON.parse(response.body).should have_key('errors') # fails

And some that look like this:

user = Factory :user
comment1 = Factory :comment, :user => user
comment2 = Factory :comment, :user => user
subject.comments << comment1 << comment2
subject.save!
subject.reload.comments.should == [ comment1, comment2 ]

Where I get just one comment instead of both. But when I removed the call to reload it worked again.
It works without the reload in 3.0 too, and I'm not entirely sure why I put it there in the first place. I guess putting reload in is one of the first things I try to do when debugging something.

I guess the majority of failing specs fail because they happened to be accidentally passing before. I found a couple instances where I was testing the wrong object. So I think this update will be a huge improvement and help you find bugs faster than before.

Edit: well, that didn't turn out to be very brief at all! :)

Contributor

miloops commented Oct 14, 2010

Hey Iain, you should try it now, in the latests commits i added a middleware to flush identity map on each requests, flush IM on tests and many other things that you can check out in today's commits.

Feel free to add me on IM miloops at gmail in case to discuss any problem you are having.

Contributor

iain commented Oct 14, 2010

It seems to work fine when running the the server, but I still have some issues running my specs. Like this one:

  subject { Factory.build :profile, :first_name => "Jan" }
  it "accepts utf8" do
    subject.first_name = "☃"
    subject.save!
    subject.reload.first_name.should == "☃"
  end

When I run rake spec:models, or rspec spec/models/profile_spec.rb, it works passes.
When I run rake spec it fails. Weirdly enough, it returns the default value from the factory, even though I never mention that in my specs:

  6) Profile accepts utf8
     Failure/Error: subject.reload.first_name.should == "☃"
     expected: "\342\230\203",
          got: "Kees" (using ==)
     # ./spec/models/profile_spec.rb:34

I don't have any time anymore tonight, but I'll be happy to discuss it with you soon.

Contributor

josevalim commented Oct 14, 2010

Iain, you are using rspec, so a callback that we added to ActiveSupport::TestCase is not being executed. Please try adding the code below (it should run before each test in the whole suite):

before(:each) do
  ActiveRecord::IdentityMap.clear
end

It will likely solve the issue. :)

Contributor

iain commented Oct 15, 2010

I found one bug in rails master (not specific to identity map):

>> Project.select(:id).map(&:id)
  Project Load (2.6ms)  SELECT 'id' FROM `projects`
=> [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, etc...
>> Project.select('id').map(&:id)
  Project Load (0.7ms)  SELECT id FROM `projects`
=> [7, 16, 76, 92, 98, 101, 102, 116, etc....

On a more related note, the only other issue I could find was that ActiveRecord::IdentityMap.clear doesn't clear the aggregation cache. I'm not sure whether it should, but it is something that broke my controller spec:

it "shows errors for invalid comment when html" do
  comment.clear_aggregation_cache
  put :update, :project_id => project.id, :id => comment.id, :comment => { :body => "" }
  assigns(:comment).should_not be_valid # fails without the clear_aggregation_cache 2 lines up
  response.should render_template(:edit)
end

I couldn't find anything else.

swistak and others added some commits Aug 28, 2010

@swistak @miloops swistak + miloops IdentityMap - Adding Weakling and IM Base as concern 3df4460
@swistak @miloops swistak + miloops IdentityMap - Tests for IM ce3ea55
@swistak @miloops swistak + miloops IdentityMap - misc fixes
- Added IdentityMap to be included into AR::Base
- Fixed bug with Mysql namespace missing when running tests only for sqlite
- Added sqlite as default connection
ce66bfd
@swistak @miloops swistak + miloops IdentityMap - adding and removing of records on create/update c357f84
@swistak @miloops swistak + miloops Looping prevention for autosave relations on validation and creation 3ab625f
@swistak @miloops swistak + miloops IdentityMap - Adjustments to test cases ccb335d
@swistak @miloops swistak + miloops IdentityMap - Fixes problem with dirty attributes eebf33a
@miloops miloops Remove object from identity map if transaction failed. e88fd02
@miloops miloops Use identity mapper only if enabled. 09f12a1
@miloops miloops Remove associated records from identity map if any raised an unexpect…
…ed exception.
7df6175
@miloops miloops Prevent pushing duplicated records when using identity map. dd6e680
@miloops miloops Disable identity map when loading associated records from habtm. fa11c60
@miloops miloops Use yield instead block argument. f0eaf11
@miloops miloops Uncomment test and make it work. f1913ad
@miloops miloops Add test to show that when IdentityMap is disabled finders returns di…
…fferent objects.
a9edd6c
@miloops miloops Remove objects from identity map if save failed, otherwise finding ag…
…ain the same record will have invalid attributes.
e83f5a0
@miloops miloops Remove objects from identity map if save! failed, otherwise finding a…
…gain the same record will have invalid attributes.
f3adddb
@miloops miloops Add test for update_attributes and identity map. 6b0b95f
@miloops miloops Associated objects are assigned from identity map if enabled and cont…
…ains associated object.
234bbe5
@miloops miloops Remove associated object from identity map when reloading. 6f68447
@miloops miloops Add tests for inverse relations when using has many and identity map. 448420f
@miloops miloops Don't use identity map if loading readonly records, this will prevent…
… changing readonly status on already loaded records.
c0ad5e4
@miloops miloops implicit_readonly is not set until records are loaded, just check rea…
…donly_value and then set readonly status.
a3210d9
@miloops miloops Use strings primary keys in identity map keys to avoid problems with …
…casting and also allow strings primary keys.
0f16949
@miloops miloops Change test models. cd6d6fc
@miloops miloops Testing objects equality is what we are looking for here, no query ca…
…ching.
21483cb
@miloops miloops Test with target object, failing on 1.9.2 when comparing object again…
…st association proxy object.
ab42382
@miloops miloops Set Identity Map disabled by default. Enable it for testing. 6d58b27
@swistak @miloops swistak + miloops Separated initialization 24485b9
@swistak @miloops swistak + miloops Test reorganization 7404505
@swistak @miloops swistak + miloops Reeject attributes even if association is loaded 76f33dc
@miloops miloops Fix test name and typo. 003e4fb
@miloops miloops Added config syntax to enable/disable identity map: config.active_rec…
…ord.identity_map = true
da721da
@miloops miloops Add docs to Identity Map. a62f722
@miloops miloops IdentityMap is enabled by default. 4f3b8e1
@miloops miloops Use hash[:Post][1] style identity maps for each table. 301dd3d
@miloops miloops Use ActiveSupport::WeakHash for MRI, JRuby prefers Weakling. ada0149
@miloops miloops Weakling is only required for JRuby. 8669933
@miloops miloops Use conditional to avoid warnings. 0873d1e
@miloops miloops Check if constant is defined in AR, if not this can cause errors when…
… using polymorphic associations.
96cc08f
@miloops miloops Add initial tests for WeakHash. 4da31d2
@miloops miloops Use IdentityMap middleware to flush map on each request. f3722a3
@miloops miloops Use association_class method which returns the reflection class, this…
… method is redefined in polymorphic belongs to associations.
c036caf
@miloops miloops Flush IdentityMap when running tests. 87aa913
@miloops miloops Use just one repository and keep it in the current thread. d13df4c
@miloops miloops Change API name, we don't need any param. 3a34ae0
@miloops miloops Use block syntax in IdentityMap middleware. 08c37b7
@miloops miloops Bring back "Reject attributes even if association is loaded" after re…
…basing to master.
333cdcd
@miloops miloops Don't wrap into identity map if it is disabled. 48edab9
@miloops miloops Don't load IdentityMap middleware if not enabled. Simplify middleware. 69b627e
@miloops miloops Revert "IdentityMap - Adjustments to test cases"
This reverts commit 4db9dca55e3acc2c59f252eb83ecb83db5f4b81b.

Conflicts:

	activerecord/test/cases/identity_map_test.rb
93daf1b
@miloops miloops Ups, forgot to remove one conflict tag from previous commit. 9fe0dbf
@miloops miloops Fix number of queries performed in tests. 4015819
@miloops miloops Call super setup in this test. edb69b9
@miloops miloops Clear IdentityMap before continue this test, we can do this here beca…
…use store_full_sti_class is not supposed to change during "runtime".
4a0a160
@miloops miloops Added method to IM to remove objects by class and id. Then used it to…
… remove objects when updating counters.
024bc70
@miloops miloops Query objects if readonly_value is false, skip them only if nil. b9e869a
@miloops miloops Test setup method should clean up IM. 095c445
@miloops miloops Remove associated objects from IM when clearing them from association…
… cache.
5ee3663
@miloops miloops Don't change tests, fix code: if locking is enabled skip IM. 7892543
@miloops miloops Update number of queries executed instead of avoiding IM. 6cd1224
@miloops miloops Refactor associations cache removal from IM. (ht: Aaron Patterson) 7d563f9
@miloops miloops We don't need to dup key, since only value is weak. a84add0
@miloops miloops Set IdentityMap disabled by default. 938f168
@miloops miloops Read from config, because AR may not be loaded yet. 421643b
@miloops miloops Revert "Use ActiveSupport::WeakHash for MRI, JRuby prefers Weakling."
This reverts commit 3cddebc2402eb71f2806e8b2119dc3efdceb4662.

Conflicts:

	activerecord/lib/active_record/identity_map.rb
	activesupport/lib/active_support/weak_hash.rb
2fd48c8
@miloops miloops Usa Hash instead of WeakHash. dd166fa
@miloops miloops No need to check returned object now that weakhash is gone. 22696b5
@miloops miloops Add test using identity map and select. 1c88d59
@miloops miloops IM is disabled by default. b626f3e
@miloops miloops Enable IM in performance script unless IM=disabled is set when runnin…
…g it.
54f924c
@miloops miloops We have to check object class to avoid issues when using STI. 1d530e2
@miloops miloops Fix typo. d472241
@miloops miloops Enable IdentityMap when generating new apps. 375aaa9
@miloops miloops Simplify remove_from_config. 5098302
@miloops miloops identity_map name is used for configuration, use IdentityMap to acces…
…s it.
72aa6f4
@miloops miloops "there is no need to store this option just for initialization" José …
…Valim dixit.
a12bb71
@miloops miloops IM enable should be kept in current thread. d9c0340
@miloops miloops No need to specify clear is a method from IM when we are inside IM. b2b5d02
@miloops miloops Clean IdentityMap before running each benchmark. 2ba06b4
@miloops miloops Merge remote branch 'rails/master' into identity_map
Conflicts:
	activerecord/lib/active_record/associations/association_proxy.rb
	activerecord/lib/active_record/autosave_association.rb
	activerecord/lib/active_record/base.rb
	activerecord/lib/active_record/persistence.rb
02fc6fb
@miloops miloops Should save without validation if autosave is enabled. 348c0ec

Jose,

How does this mesh with Rack::FiberPool and EventMachine-based DB adapters that run every request in its own fiber?

Thanks for your work on this,

-Alex

miloops added some commits Feb 15, 2011

@miloops miloops Merge remote branch 'rails/master' into identity_map
Conflicts:
	activerecord/examples/performance.rb
	activerecord/lib/active_record/association_preload.rb
	activerecord/lib/active_record/associations.rb
	activerecord/lib/active_record/associations/association_proxy.rb
	activerecord/lib/active_record/autosave_association.rb
	activerecord/lib/active_record/base.rb
	activerecord/lib/active_record/nested_attributes.rb
	activerecord/test/cases/relations_test.rb
8ee0b44
@miloops miloops Remove identity map from benchmark script. ca75091
@miloops miloops Run tests without IdentityMap when IM=false is given. f1778eb
@miloops miloops Don't shadow outer local variable. c13b7c4
@miloops miloops Fix expected queries in relation tests. 90a850a
@miloops miloops Merge remote branch 'rails/master' into identity_map
Conflicts:
	activerecord/lib/active_record/associations/association.rb
	activerecord/lib/active_record/fixtures.rb
0b702ba
@miloops miloops No need to test against target anymore. 15a03ca
@miloops miloops Use to_a instead :load in test, since :load changed. b8c2feb
@miloops miloops Reindent and remove wrong line left in merge by mistake. 3560869
@miloops miloops No need to test agaisnt target. d21a454
@miloops miloops Should use "=" instead "replace" after this commit: 1644663 eb23b22
@miloops miloops Initialize @target instead asking if it is defined. 49f3525
@miloops miloops No need to have reinit_with inside an InstanceMethods module. 8052623
@miloops miloops WeakHash is not used, remove it. d9eb007
@miloops miloops Remove nbproject form gitignore. This shouldn't be here in the first …
…place.
3e5efb3
@miloops miloops Don't use skip, just don't run anything, we don't have skip in Ruby 1.8 3927827
@miloops miloops Merge remote branch 'rails/master' into identity_map 00418ac

hamin commented Apr 14, 2011

AWESOME!

very cool! i am waiting for release in stable

jweiss commented Apr 26, 2011

Why is this tied to ActiveRecord and not an ActiveModel functionality?
I wanted to add support for SimplyStored (CouchDB wrapper) but it seems wrong to require ActiveRecord...

Contributor

josevalim commented Apr 26, 2011

I believe the part of IdentityMap that is agnostic is actually quite small. Most of concerns are actually in cleaning up the identity map and identifying all the situations that require so. If you think there is a significant part of the identity map that could be moved to ActiveModel, please do provide a patch!

jweiss commented Apr 26, 2011

I'm taking about https://github.com/rails/rails/blob/master/activerecord/lib/active_record/identity_map.rb

This looks totaly generic to me and could be copied for my SimplyStored IdentityMap. I'll see that I extract it.

@j-manu j-manu pushed a commit to j-manu/rails that referenced this pull request Jan 18, 2012

@vijaydev vijaydev Merge pull request #76 from makoto/master
UrlEncodedPairParser is deprecated, but still used as an example
81ce77d

This issue was closed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment