Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Identity map master #6524

Closed
wants to merge 28 commits into from
Closed

Identity map master #6524

wants to merge 28 commits into from

Conversation

sskirby
Copy link
Contributor

@sskirby sskirby commented May 28, 2012

This is for #5261.

This pull request fixes the issue outlined in 302c912.

It takes the approach that bi-directional associations need to be kept in sync by the developer, not the framework. Also, if the database is modified via SQL or a method that is a fancy wrapper for SQL (like increment_counter), it is the developer's responsibility to re-synchronize the models. I believe that this is an acceptable approach and it is one that is commonly implemented by other ORMs such as Hibernate.

@sskirby
Copy link
Contributor Author

sskirby commented May 28, 2012

Also for #5442

@arturopie
Copy link
Contributor

Awesome!

IdentityMap.get(klass, owner[reflection.foreign_key])
else
nil
end

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for else block.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'd rather keep this in since it is more explicit. We actually do mean to return nil in this case.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, we always mean to return nil when nothing else is returned :). It's fine in any case.

@carlosantoniodasilva
Copy link
Member

Thanks for your work on that, I'll try to bring people to discuss about your idea / approach.

/cc @josevalim @jeremy

@josevalim
Copy link
Contributor

@sskirby I am still not sure this is the proper solution. The behavior caused by this bug can be really surprising and hard to track. Telling people that they have to manually track their associations does not seem like a comfortable solution to me. I believe DataMapper get it right.

@sskirby
Copy link
Contributor Author

sskirby commented May 29, 2012

@josevalim Are you referring to Datamapper 1 or 2? Because my co-worker has written some test code and Datamapper 1 forces the application developer to manage both sides of bi-directional associations. For example:

DataMapper.repository do
     father = Parent.create :name => "Sam"

     jim = Child.new :name => "Jim"
     lisa = Child.new :name => "Lisa"

     father.children << jim
     father.children << lisa

     father.save

     jim.parent = nil
     jim.save

     assert_equal 2, father.children.size

     father.reload
     assert_equal 1, father.children.size
   end

Also, which bug are you referring to? The one outlined in 302c912 is fixed.

I would also mention that if you don't understand how the IdentityMap works, yes you will run into hard to find bugs. But the same applies when you have no IdentityMap. The first time I used Rails, I was expecting an update to a parent object to be seen by all of it's children, but they weren't because the children had references to a different instance of the parent.

I think you get strange behaviour either way you go, so why not let the developer choose what kind of strange behaviour they want?

At one time it seems like the vision for the IdentityMap was for it to be simply dropped into an existing application and have everything work the same, but be faster. I do not think that this vision will ever come to pass. I think that you either write your application with the IM or without it.

Trying to make the semantics of the application the same with or without the IdentityMap makes the IM implementation much more complicated and much more likely to introduce 'strangeness'. Not to mention harder to maintain. Keeping the IM simple with a few simple rules for the application developer to follow seems like a reasonable approach as evidenced by the fact that other major ORMs have gone this route.

@arturopie
Copy link
Contributor

I think @sskirby has a good point. I'm sure many developers (including myself) are willing to follow a few simple rules to improve the performance of our rails applications. Many people are complaining because rails is getting slower on each new version. IM has the potential to change that.

@josevalim
Copy link
Contributor

Many people are complaining because rails is getting slower on each new version.

Proof? In my experience, AR got slower when 3.0 came out, 3.1 and 3.2 got faster (not only AR, but overall).

I am ok with supporting Identity Map for those developers who want to, but it will get in for the right reasons. Not because some people are saying Rails is getting slower.

@sskirby
Copy link
Contributor Author

sskirby commented May 29, 2012

@josevalim What are the right reasons? It seems as if you disagree with the approach that I am taking with the IdentityMap in general. What can I do to convince you that this approach is a very valid one shared by all the major ORMs that I have looked at?

I've got several applications that are using the IdentityMap. Some are seeing almost an order of magnitude improvement in performance on some pages (I am not joking.) Would some benchmarks be enough reason?

@josevalim
Copy link
Contributor

@sskirby I don't disagree with your approach, I am actually ok (you convinced me). I am just waiting for more feedback from other core members to merge this back.

I've got several applications that are using the IdentityMap. Some are seeing almost an order of magnitude improvement in performance on some pages (I am not joking.)

Yes. If you built everything relying on Identity Map, disabling it will surely be expensive.

EDIT: Thanks for the pull request and the discussion in general.

@arturopie
Copy link
Contributor

Proof? In my experience, AR got slower when 3.0 came out, 3.1 and 3.2 got faster (not only AR, but overall).

We went from 2.* to 3.1 in one of our application, and the application experimented a big drop in perfromance. I did some benchmark for 3.2 and didn't see a great improvement on production mode, the improvement was only on dev mode.

@josevalim
Copy link
Contributor

We went from 2.* to 3.1 in one of our application, and the application
experimented a big drop in perfromance. I did some benchmark for 3.2 and
didn't see a great improvement on production mode, the improvement was only
on dev mode.

3.2 dev is totally faster, production is faster than 3.1, but is not a
great improvement as you put. Point is: it is not getting slower after the
first 3.0 release as you made it sound.

Sent from my iPhone

@arturopie
Copy link
Contributor

I apologize, I meant major versions. I just don't want Rails to become slower than it is now.

@josevalim
Copy link
Contributor

@arturopie so benchmark, profile and provide data on hotspots. The main point of an Identity Map is to provide object consistency, most of the performance benefits of it is already available via the query cache.

@arturopie
Copy link
Contributor

@arturopie so benchmark, profile and provide data on hotspots. The main point of an Identity Map is to provide object consistency, most of the performance benefits of it is already available via the query cache.

I don't have access to that data right now, but I can give you some conclusions from the benchmark.

After upgrading to rails 3.1, we realized that our app was spending too much time on the Ruby Garbage Collector. We had already optimized our GC settings, so this was bad news. After further investigation, we found that rails was creating a lot more strings objects, that were triggering the GC more often.

Our slowest requests were the ones that loaded the same records multiple times. They were using the query cache but that wasn't good enough because at that point, AR already created the SQL query, which apparently, requires to create a lot of strings. So, if we could load the object before creating the SQL query, we could get some improvements. There is when we try IM and the performance improved a lot for those slow requests. Indeed, the number of strings created were much lower with IM enabled.

Another way to easily verify this was by looking at AR logs. With IM off, there were a lot of outputs from the query cache in the console, but after printing several lines, I could see there were small pauses. This pauses were occuring because the GC was working at that moment. With IM on, there were no more query cache logs, and a lot of outputs from IM; however, the pauses disappeared or were only a few.

To conclude, if we can use a cache earlier in the execution (IM), we will get a better performance benefit than using it much later (query cache).

@josevalim
Copy link
Contributor

Another way to easily verify this was by looking at AR logs. With IM off,
there were a lot of outputs from the query cache in the console, but after
printing several lines, I could see there were small pauses. This pauses
were occurring because the GC was working at that moment. With IM on, there
were no more query cache logs, and a lot of outputs from IM; however, the
pauses disappeared or were only a few.

This is weird. The IM can only avoid queries if you are querying by id
(correct me if I am wrong). If IM is giving you a huge performance benefit,
it is usually because you have a N+1 issue somewhere. IM allows you to not
care about these in some cases, but if you have IM disabled, you need to
look after it.

@sskirby
Copy link
Contributor Author

sskirby commented Jun 4, 2012

The nice thing about the IM is that it steps in even if you are querying by something other than id. It won't prevent the query to the DB, but it will prevent the instantiation of a new in-memory AR instance. And this is where we see significant savings.

@sskirby
Copy link
Contributor Author

sskirby commented Jun 20, 2012

Anyone have any opinions on this? The longer we wait, the harder the merge back into master will be :(

@tenderlove any thoughts?

@josevalim
Copy link
Contributor

I am 👍 to bring it back. @tenderlove, @jonleighton ?

Sent from my iPhone

@sskirby
Copy link
Contributor Author

sskirby commented Jun 25, 2012

  1. Rebased on master and 2) Made the IM behaviour a little more sane by only flushing the IM at obvious places.

Previously, entities in the IM would get evicted when a transaction failed, saving failed, counters were updated, and when associations were reloaded. Doing this makes the IM harder to understand because you can't rely on an entity you know you've loaded being in the IM. This leads to subtle bugs because the expectation is to only have one loaded copy of an entity in memory at a time.

I think this makes reasoning about applications with the IM enabled easier. Please feel free to comment!

@guilleiguaran
Copy link
Member

I'm 👍 on this

@sskirby
Copy link
Contributor Author

sskirby commented Jul 10, 2012

@josevalim have we come any closer to a decision on this? I'm more than happy to rebase onto master to make the merge easy.

@sskirby
Copy link
Contributor Author

sskirby commented Jul 25, 2012

@josevalim @tenderlove @jonleighton Jose removed the Identity Map in #5261, does he not have the authority to put it back? It would be great to get some closure on this issue before the release of Rails 4. Cheers!

@josevalim
Copy link
Contributor

@sskirby I have raised this discussion a couple times with Rails Core and we were not able to reach any consensus. It seems though the majority's opinion is to not add it back. The costs of maintaining are too high compared to its usage and benefit.

PS: don't shoot the messenger.

@sskirby
Copy link
Contributor Author

sskirby commented Jul 25, 2012

@josevalim thank you for getting back to me, I do appreciate it even if the news isn't good! Is there a mailing list where I can join the conversation?

sskirby and others added 21 commits August 9, 2012 19:02
This comes at the expense of having to do a SQL query even when the
entity being looked up is in the IM. This shouldn't be a problem since
we can rely on the query cache.
This means that a SQL query will still be issued for entities that are
contained in the IM, but the query cache should minimize any performance
impact.
The bulk update methods by-pass the domain models and go straight to SQL
and therefore it is up to the application developer to keep these
changes in sync.
@rafaelfranca
Copy link
Member

Just to let you know, we decided to keep Identity Map removed from the Rails. As @jonleighton said, if anyone want to implement it as a plugin we will be happy to accept pull request refactoring the Rails internals to make it possible.

Thanks.

@sskirby
Copy link
Contributor Author

sskirby commented Aug 29, 2012

@rafaelfranca thanks for your attention to this issue. Is there a thread of discussion somewhere that I can chime in on? I have been working on making the IdentityMap work with Rails with regards to the valid objections that have been raised in this thread and I believe there is a way to make everyone happy.

@sskirby
Copy link
Contributor Author

sskirby commented Aug 29, 2012

@rafaelfranca one more thing, most of the changes in this pull request are to tests to make them work both with and without the IM (mostly using #reload instead of .find). If I broke those out into another pull request, would you accept those changes?

I will work on extracting the IM into a gem.

@steveklabnik
Copy link
Member

most of the changes in this pull request are to tests to make them work both with and without the IM If I broke those out into another pull request, would you accept those changes?

I'd imagine that'd fall under 'to make it possible' :)

@rafaelfranca
Copy link
Member

@sskirby unfortunately we discussed in a private chat.

Sure, please send a new pull request with these changes.

@the8472
Copy link

the8472 commented Dec 7, 2012

How about marking all objects loaded with the IdentityMap active as read-only? Then it would still be useful for localized performance optimizations on actions that only display data without modifying it until its consistency problems are solved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet