Major Performance Improvements #24
Conversation
Conflicts: Source/MMRecord/MMRecordResponse.m
Adding fetch batch size to fetch request
…100000 down to 25 seconds.
Thanks Conrad, I was looking to do the same on Monday - glad you got to one of the labs for it too! I'll be taking a closer look over the weekend, but could `relationshipKeyPathsForPrefetching` be used to improve performance on the relationship record lookups? (Just specifying the key.)
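For reference, prefetching key paths are set directly on the fetch request. A minimal sketch of the suggestion, with an assumed entity and assumed key paths (not MMRecord's actual model):

```objc
// Illustrative sketch only: "Post", "user", and "user.avatarImage" are
// placeholder names, not part of MMRecord's model.
NSFetchRequest *fetchRequest = [[NSFetchRequest alloc] initWithEntityName:@"Post"];
fetchRequest.relationshipKeyPathsForPrefetching = @[@"user", @"user.avatarImage"];

// Core Data now fetches the related objects in the same round trip,
// instead of firing a fault (a separate store query) per relationship lookup.
NSArray *posts = [context executeFetchRequest:fetchRequest error:NULL];
```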
```objc
if (relationshipName != nil) {
    if ([self.relationshipDescriptionsDictionary objectForKey:[relationshipDescription name]]) {
        [(NSMutableArray *)self.relationshipProtosDictionary setValue:relationshipProto forKey:relationshipName];
        [(NSMutableArray *)self.relationshipDescriptionsDictionary setValue:relationshipProto forKey:relationshipName];
```
Shouldn't these casts be `NSMutableDictionary`?
Removed the casts since they're not required.
One potentially significant performance improvement that I see in looking through the code is that you can make use of …. In a hot loop where you are doing a bunch of lookups and the only difference is the key value (such as when you do a bunch of primary key lookups), the performance gains can be significant.
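The specific API referenced in this comment was lost in extraction, but the description (a hot loop where only the key value changes) matches Core Data's predicate-template pattern, sketched here under an assumed `primaryKey` attribute and `User` entity:

```objc
// Hedged sketch of a predicate template. The format string is parsed once;
// only the value is substituted on each loop iteration.
NSPredicate *template = [NSPredicate predicateWithFormat:@"primaryKey == $VALUE"];
NSFetchRequest *request = [[NSFetchRequest alloc] initWithEntityName:@"User"];

for (id keyValue in primaryKeys) {
    request.predicate =
        [template predicateWithSubstitutionVariables:@{@"VALUE": keyValue}];
    // ... execute the request and handle the result ...
}
```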
If you want to get really serious about performance when syncing with Core Data in an update-or-create situation, then you may want to consider eliminating the looped fetch requests entirely. In my exploration of performance in RestKit, I found that the biggest performance bottleneck was in executing the fetch requests.

The single biggest thing I have done for RestKit's Core Data performance was introducing an in-memory cache for objects keyed by identification attributes (equivalent to your primary key, but I support compound keys). The basic idea is that you load up all existing objects for a given entity using a dictionary-result fetch request and build a map from all identifying attributes to the target managed object IDs that have that value. It's very efficient for Core Data because you never actually fetch the objects -- you only get the values as a dictionary. As new objects are created/updated by the sync process, you refresh the cache. At the end of the day this means that you can answer the fundamental question of "should I update an existing object with this data or should I create a new object" without executing any fetch requests. This is huge for performance.

I'd recommend that you take a look at my RKBenchmark project to see the effect of these optimizations and then look at the …
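A rough sketch of the dictionary-result cache described above. Names like `entity`, `primaryKeyAttributeName`, and `context` are stand-ins for surrounding state, not RestKit or MMRecord API:

```objc
// Map each primary key value to its NSManagedObjectID without faulting in
// any managed objects: dictionary results never materialize the rows.
NSExpressionDescription *objectIDProperty = [[NSExpressionDescription alloc] init];
objectIDProperty.name = @"objectID";
objectIDProperty.expression = [NSExpression expressionForEvaluatedObject];
objectIDProperty.expressionResultType = NSObjectIDAttributeType;

NSFetchRequest *request = [[NSFetchRequest alloc] initWithEntityName:[entity name]];
request.resultType = NSDictionaryResultType;
request.propertiesToFetch = @[primaryKeyAttributeName, objectIDProperty];

NSMutableDictionary *cache = [NSMutableDictionary dictionary];
for (NSDictionary *row in [context executeFetchRequest:request error:NULL]) {
    cache[row[primaryKeyAttributeName]] = row[@"objectID"];
}
// "Update or create?" is now a dictionary lookup with no per-record fetch.
```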
BTW if any of the code from RestKit proves valuable for you in MMRecord, I'd be more than happy to spin it out as a new independent library and add you as a contributor to the repository. I fully support your efforts with MMRecord and would love to collaborate where it makes sense to do so.
This branch doesn't work for me - it returns managed objects but they're never actually saved to the persistent store. The current master works fine.
I need to sit down and take another look at this: continuing to evaluate it from a performance standpoint, looking into that issue, and addressing Blake's comments. I just haven't had time since getting back from dub dub. I'll try and get to it soon. Thanks guys.
@blakewatters I really appreciate the thoughtful and very helpful feedback here. I can tell you've put a lot of thought into this sort of problem and I appreciate your willingness to lend your advice. I think I'm going to keep updating MMRecord's fetch handling in the way you're describing as a separate action item, for a separate pull request.

I should probably have been a bit clearer about what this pull request was actually accomplishing. The bottleneck this was trying to address was actually a bug on MMRecord's part. It was iterating through its temporary data set (something similar to your in-memory cache) at quadratic complexity, which was slowing down large data set imports. This pull request fixes that issue, so I'd like to go ahead and get that in.

That being said, I definitely would like to eke as much performance out of the Core Data side of things as possible. I think the solution implemented now goes part of the way towards the solution you've got. Fortunately, we're at least only making one fetch request per type of entity. If there are 1000 User objects, there is only one fetch that happens for all of those. That actually ends up being a pretty significant performance boost, as I'm sure you are aware. And setting the batch size does help a lot for SQLite stores, but I agree that simply fetching the primary keys would be more efficient for kicking this process off. I'll be back with a new PR for that later when I have some more time to go heads down on it. Thanks!
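The one-fetch-per-entity lookup described above can be sketched roughly as follows (variable names are assumptions, not MMRecord internals):

```objc
// Fetch every existing record matching any of the incoming primary keys
// in a single request, rather than one fetch per record.
NSFetchRequest *fetchRequest = [[NSFetchRequest alloc] initWithEntityName:[entity name]];
fetchRequest.predicate = [NSPredicate predicateWithFormat:@"%K IN %@",
                          primaryKeyAttributeName, primaryKeys];
fetchRequest.fetchBatchSize = 20; // only takes effect with a SQLite store

NSError *error = nil;
NSArray *existingRecords = [context executeFetchRequest:fetchRequest error:&error];
// 1000 incoming User objects -> one fetch, not 1000.
```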
Looks like all is well here now judging from @urbanappetite's posts on the other thread. Planning on merging this in tomorrow. @kcharwood @larsacus Let me know if you want to take a look at this.
```diff
@@ -454,6 +450,7 @@ - (NSArray*)fetchRecordsWithPrimaryKeys:(NSArray *)primaryKeys forEntity:(NSEnti
         return nil;

     NSFetchRequest *fetchRequest = [[NSFetchRequest alloc] initWithEntityName:[entity name]];
+    [fetchRequest setFetchBatchSize:20];
```
Where did this magic batch size of 20 come from? Is this the optimal fetch size?
I grabbed it out of thin air. It only applies if there is a SQLite persistent store and is ignored otherwise. I figured 20 is roughly the page size of most popular paginated APIs, and that typical response sizes would be in the 100 range if not paginated; for those it should fit nicely. And when I tested on a response of 100,000 records it still seemed to be a performance improvement over not setting a batch size at all, so I think it should be fine for now.
```objc
- (void)addProtoRecordToDictionary:(MMRecordProtoRecord *)protoRecord {
    if (protoRecord.primaryKeyValue) {
        [self.prototypeDictionary setObject:protoRecord forKey:protoRecord.primaryKeyValue];
```
Could also use `self.prototypeDictionary[protoRecord.primaryKeyValue] = protoRecord;`. Really not important.
God, I still forget to use those. Sure, why not.
So I had an itch to do some performance optimization at WWDC after watching the Core Data Performance session, and here's the result. On my sample test App.net dataset I'm seeing performance improvements at the following sample sizes:
1000 records: 300ms
10000 records: 3s
100000 records: 30s
Those are fairly complex records with multiple relationships and multiple attributes populated per relationship entity. It's a significant performance improvement from just a few basic algorithmic changes.
Part of this pulls in Luke's (cc @urbanappetite) changes from the other pull request. That pull request is now closed. Part of it is changes I made at WWDC.
The changes mainly address three issues:
My apologies about the prior pull request, which was fairly messy. I also added a new performance tuning project to the Examples directory and the MMRecord workspace. That project has now been committed to master and is no longer part of this pull request.
As an aside, I have a lot of thoughts on some more advancements we can make here to support huge batch imports that I thought of during some of the WWDC talks and labs. More to come on that later.
cc @blakewatters
Thanks,
Conrad