New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Identity map master #6524
Identity map master #6524
Conversation
|
Also for #5442 |
|
Awesome! |
| IdentityMap.get(klass, owner[reflection.foreign_key]) | ||
| else | ||
| nil | ||
| end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for else block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'd rather keep this in since it is more explicit. We actually do mean to return nil in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, we always mean to return nil when nothing else is returned :). It's fine in any case.
|
Thanks for your work on that, I'll try to bring people to discuss about your idea / approach. /cc @josevalim @jeremy |
|
@sskirby I am still not sure this is the proper solution. The behavior caused by this bug can be really surprising and hard to track. Telling people that they have to manually track their associations does not seem like a comfortable solution to me. I believe DataMapper get it right. |
|
@josevalim Are you referring to Datamapper 1 or 2? Because my co-worker has written some test code and Datamapper 1 forces the application developer to manage both sides of bi-directional associations. For example: DataMapper.repository do
father = Parent.create :name => "Sam"
jim = Child.new :name => "Jim"
lisa = Child.new :name => "Lisa"
father.children << jim
father.children << lisa
father.save
jim.parent = nil
jim.save
assert_equal 2, father.children.size
father.reload
assert_equal 1, father.children.size
endAlso, which bug are you referring to? The one outlined in 302c912 is fixed. I would also mention that if you don't understand how the IdentityMap works, yes you will run into hard to find bugs. But the same applies when you have no IdentityMap. The first time I used Rails, I was expecting an update to a parent object to be seen by all of it's children, but they weren't because the children had references to a different instance of the parent. I think you get strange behaviour either way you go, so why not let the developer choose what kind of strange behaviour they want? At one time it seems like the vision for the IdentityMap was for it to be simply dropped into an existing application and have everything work the same, but be faster. I do not think that this vision will ever come to pass. I think that you either write your application with the IM or without it. Trying to make the semantics of the application the same with or without the IdentityMap makes the IM implementation much more complicated and much more likely to introduce 'strangeness'. Not to mention harder to maintain. Keeping the IM simple with a few simple rules for the application developer to follow seems like a reasonable approach as evidenced by the fact that other major ORMs have gone this route. |
|
I think @sskirby has a good point. I'm sure many developers (including myself) are willing to follow a few simple rules to improve the performance of our rails applications. Many people are complaining because rails is getting slower on each new version. IM has the potential to change that. |
Proof? In my experience, AR got slower when 3.0 came out, 3.1 and 3.2 got faster (not only AR, but overall). I am ok with supporting Identity Map for those developers who want to, but it will get in for the right reasons. Not because some people are saying Rails is getting slower. |
|
@josevalim What are the right reasons? It seems as if you disagree with the approach that I am taking with the IdentityMap in general. What can I do to convince you that this approach is a very valid one shared by all the major ORMs that I have looked at? I've got several applications that are using the IdentityMap. Some are seeing almost an order of magnitude improvement in performance on some pages (I am not joking.) Would some benchmarks be enough reason? |
|
@sskirby I don't disagree with your approach, I am actually ok (you convinced me). I am just waiting for more feedback from other core members to merge this back.
Yes. If you built everything relying on Identity Map, disabling it will surely be expensive. EDIT: Thanks for the pull request and the discussion in general. |
We went from 2.* to 3.1 in one of our application, and the application experimented a big drop in perfromance. I did some benchmark for 3.2 and didn't see a great improvement on production mode, the improvement was only on dev mode. |
3.2 dev is totally faster, production is faster than 3.1, but is not a Sent from my iPhone |
|
I apologize, I meant major versions. I just don't want Rails to become slower than it is now. |
|
@arturopie so benchmark, profile and provide data on hotspots. The main point of an Identity Map is to provide object consistency, most of the performance benefits of it is already available via the query cache. |
I don't have access to that data right now, but I can give you some conclusions from the benchmark. After upgrading to rails 3.1, we realized that our app was spending too much time on the Ruby Garbage Collector. We had already optimized our GC settings, so this was bad news. After further investigation, we found that rails was creating a lot more strings objects, that were triggering the GC more often. Our slowest requests were the ones that loaded the same records multiple times. They were using the query cache but that wasn't good enough because at that point, AR already created the SQL query, which apparently, requires to create a lot of strings. So, if we could load the object before creating the SQL query, we could get some improvements. There is when we try IM and the performance improved a lot for those slow requests. Indeed, the number of strings created were much lower with IM enabled. Another way to easily verify this was by looking at AR logs. With IM off, there were a lot of outputs from the query cache in the console, but after printing several lines, I could see there were small pauses. This pauses were occuring because the GC was working at that moment. With IM on, there were no more query cache logs, and a lot of outputs from IM; however, the pauses disappeared or were only a few. To conclude, if we can use a cache earlier in the execution (IM), we will get a better performance benefit than using it much later (query cache). |
This is weird. The IM can only avoid queries if you are querying by id |
|
The nice thing about the IM is that it steps in even if you are querying by something other than id. It won't prevent the query to the DB, but it will prevent the instantiation of a new in-memory AR instance. And this is where we see significant savings. |
|
Anyone have any opinions on this? The longer we wait, the harder the merge back into master will be :( @tenderlove any thoughts? |
|
I am Sent from my iPhone |
Previously, entities in the IM would get evicted when a transaction failed, saving failed, counters were updated, and when associations were reloaded. Doing this makes the IM harder to understand because you can't rely on an entity you know you've loaded being in the IM. This leads to subtle bugs because the expectation is to only have one loaded copy of an entity in memory at a time. I think this makes reasoning about applications with the IM enabled easier. Please feel free to comment! |
|
I'm |
|
@josevalim have we come any closer to a decision on this? I'm more than happy to rebase onto master to make the merge easy. |
|
@josevalim @tenderlove @jonleighton Jose removed the Identity Map in #5261, does he not have the authority to put it back? It would be great to get some closure on this issue before the release of Rails 4. Cheers! |
|
@sskirby I have raised this discussion a couple times with Rails Core and we were not able to reach any consensus. It seems though the majority's opinion is to not add it back. The costs of maintaining are too high compared to its usage and benefit. PS: don't shoot the messenger. |
|
@josevalim thank you for getting back to me, I do appreciate it even if the news isn't good! Is there a mailing list where I can join the conversation? |
This comes at the expense of having to do a SQL query even when the entity being looked up is in the IM. This shouldn't be a problem since we can rely on the query cache.
This means that a SQL query will still be issued for entities that are contained in the IM, but the query cache should minimize any performance impact.
The bulk update methods by-pass the domain models and go straight to SQL and therefore it is up to the application developer to keep these changes in sync.
|
Just to let you know, we decided to keep Identity Map removed from the Rails. As @jonleighton said, if anyone want to implement it as a plugin we will be happy to accept pull request refactoring the Rails internals to make it possible. Thanks. |
|
@rafaelfranca thanks for your attention to this issue. Is there a thread of discussion somewhere that I can chime in on? I have been working on making the IdentityMap work with Rails with regards to the valid objections that have been raised in this thread and I believe there is a way to make everyone happy. |
|
@rafaelfranca one more thing, most of the changes in this pull request are to tests to make them work both with and without the IM (mostly using #reload instead of .find). If I broke those out into another pull request, would you accept those changes? I will work on extracting the IM into a gem. |
I'd imagine that'd fall under 'to make it possible' :) |
|
@sskirby unfortunately we discussed in a private chat. Sure, please send a new pull request with these changes. |
|
How about marking all objects loaded with the IdentityMap active as read-only? Then it would still be useful for localized performance optimizations on actions that only display data without modifying it until its consistency problems are solved. |
This is for #5261.
This pull request fixes the issue outlined in 302c912.
It takes the approach that bi-directional associations need to be kept in sync by the developer, not the framework. Also, if the database is modified via SQL or a method that is a fancy wrapper for SQL (like increment_counter), it is the developer's responsibility to re-synchronize the models. I believe that this is an acceptable approach and it is one that is commonly implemented by other ORMs such as Hibernate.