Major Performance Improvements #24

Merged
merged 15 commits into from
Jul 2, 2013

Conversation

cnstoll
Contributor

@cnstoll cnstoll commented Jun 15, 2013

So I had an itch to do some performance optimization at WWDC after watching the Core Data Performance session, and here's the result. On my sample App.net test dataset I'm seeing significant improvements at the following sample sizes:

1000 records: 300ms
10000 records: 3s
100000 records: 30s

Those are fairly complex records, each with multiple relationships and multiple populated attributes per relationship entity. It's a significant performance improvement from just a few basic algorithmic changes.

Part of this pulls in Luke's (cc @urbanappetite) changes from the other pull request, which is now closed; the rest are changes I made at WWDC.

The changes mainly address three issues:

  1. Duplication of proto records in the primary proto records array (the list of objects that get populated).
  2. Duplication of relationship protos, resulting in excessive computation.
  3. Not setting fetchBatchSize, losing the opportunity for some free performance gains when using SQLite stores.

My apologies for the prior pull request, which was fairly messy. I also added a new Performance tuning project to the Examples directory and the MMRecord workspace. That project has now been committed to master and is no longer part of this pull request.

As an aside, some of the WWDC talks and labs gave me a lot of ideas for further improvements here to support huge batch imports. More to come on that later.

cc @blakewatters

Thanks,

Conrad

@ghost ghost assigned cnstoll Jun 15, 2013
@cnstoll cnstoll mentioned this pull request Jun 15, 2013
@lprhodes

Thanks Conrad,

I was looking to do the same on Monday - glad you got to one of the labs for it too!

I'll be taking a closer look over the weekend, but... could relationshipKeyPathsForPrefetching be used to improve performance on the relationship record lookups? (Just by specifying the key.)
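For reference, a minimal sketch of what that might look like. The "User" entity, the "posts" key path, and the `context`/`primaryKeys` variables are placeholders, not MMRecord's actual names:

```objc
// Sketch only: "User", "posts", `context`, and `primaryKeys` are placeholders.
NSFetchRequest *fetchRequest = [[NSFetchRequest alloc] initWithEntityName:@"User"];
fetchRequest.predicate = [NSPredicate predicateWithFormat:@"primaryKey IN %@", primaryKeys];

// Ask Core Data to prefetch the relationship in the same round trip,
// instead of firing a separate fault for each record's relationship later.
fetchRequest.relationshipKeyPathsForPrefetching = @[@"posts"];

NSError *error = nil;
NSArray *records = [context executeFetchRequest:fetchRequest error:&error];
```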

if (relationshipName != nil) {
    if ([self.relationshipDescriptionsDictionary objectForKey:[relationshipDescription name]]) {
        [(NSMutableArray *)self.relationshipProtosDictionary setValue:relationshipProto forKey:relationshipName];
        [(NSMutableArray *)self.relationshipDescriptionsDictionary setValue:relationshipProto forKey:relationshipName];


Shouldn't these casts be NSMutableDictionary?

Contributor Author

Removed the casts since they're not required.

@blakewatters

One potentially significant performance improvement that I see in looking through the code is that you can make use of NSPredicate instances built using substitution variables. Basically, an NSPredicate built from a string is parsed into a graph of NSExpression objects, and this parsing is not the cheapest thing in the world. When you use substitution variables you reuse the existing parse results and sub in the dynamic values from a dictionary.

In a hot loop where you are doing many lookups that differ only in the key value (such as a series of primary key lookups), the performance gains can be significant.
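As an illustration of that pattern, a minimal sketch assuming a single `primaryKey` attribute (the attribute name and the surrounding `fetchRequest`/`primaryKeys` variables are placeholders):

```objc
// Parse the predicate format string once, outside the hot loop.
NSPredicate *lookupTemplate = [NSPredicate predicateWithFormat:@"primaryKey == $PRIMARY_KEY"];

for (id primaryKeyValue in primaryKeys) {
    // Reuse the already-parsed expression graph; only the value is swapped in.
    NSPredicate *lookupPredicate =
        [lookupTemplate predicateWithSubstitutionVariables:@{@"PRIMARY_KEY" : primaryKeyValue}];

    fetchRequest.predicate = lookupPredicate;
    // ... execute the fetch (or filter an in-memory array) with lookupPredicate ...
}
```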

@blakewatters

If you want to get really serious about performance when syncing with Core Data in an update-or-create situation, then you may want to consider eliminating the looped fetch requests entirely. In my exploration of performance in RestKit, I found that the biggest performance bottleneck was executing the fetch requests. The single biggest thing I have done for RestKit's Core Data performance was introducing an in-memory cache for objects by identification attributes (equivalent to your primary key, but I support compound keys).

The basic idea is that you load up all existing objects for a given entity using a dictionary result fetch request and build a map from all identifying attributes to the target managed object IDs that have that value. It's very efficient for Core Data because you never actually fetch the objects -- you only get the values as a dictionary. As new objects are created or updated by the sync process you refresh the cache. At the end of the day this means that you can answer the fundamental question of "should I update an existing object with this data or should I create a new object" without executing any fetch requests. This is huge for performance.
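A rough sketch of that kind of cache warm-up, assuming a single `primaryKey` identifying attribute (the names and the `entity`/`context`/`incomingPrimaryKeyValue` variables here are illustrative, not RestKit's actual API):

```objc
// Describe the object ID as a fetchable "property" so it comes back in each result dictionary.
NSExpressionDescription *objectIDDescription = [[NSExpressionDescription alloc] init];
objectIDDescription.name = @"objectID";
objectIDDescription.expression = [NSExpression expressionForEvaluatedObject];
objectIDDescription.expressionResultType = NSObjectIDAttributeType;

// Dictionary-result fetch: no managed objects are materialized.
NSFetchRequest *fetchRequest = [[NSFetchRequest alloc] initWithEntityName:[entity name]];
fetchRequest.resultType = NSDictionaryResultType;
fetchRequest.propertiesToFetch = @[entity.propertiesByName[@"primaryKey"], objectIDDescription];

NSMutableDictionary *objectIDsByPrimaryKey = [NSMutableDictionary dictionary];
for (NSDictionary *result in [context executeFetchRequest:fetchRequest error:NULL]) {
    objectIDsByPrimaryKey[result[@"primaryKey"]] = result[@"objectID"];
}

// "Update or create" is now a dictionary lookup instead of a per-record fetch request.
NSManagedObjectID *existingID = objectIDsByPrimaryKey[incomingPrimaryKeyValue];
NSManagedObject *record = existingID
    ? [context objectWithID:existingID]
    : [NSEntityDescription insertNewObjectForEntityForName:[entity name]
                                     inManagedObjectContext:context];
```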

I'd recommend that you take a look at my RKBenchmark project to see the effect of these optimizations and then look at the RKInMemoryManagedObjectCache code to see what I am doing. It accelerates Core Data sync operations by orders of magnitude -- you'll be excited about how much faster you can make MMRecord if you leverage similar techniques (or, hell, you can even just straight up snap in my cache code to take it for a spin; there are no dependencies outside of Core Data for this stuff).

@blakewatters

BTW if any of the code from RestKit proves valuable for you in MMRecord I'd be more than happy to spin it out as a new independent library and add you as a contributor to the repository. I fully support your efforts with MMRecord and would love to collaborate where it makes sense to do so.

@lprhodes

This branch doesn't work for me - it returns managed objects, but they're never actually saved to the persistent store. The current master works fine.

@cnstoll
Contributor Author

cnstoll commented Jun 23, 2013

I need to sit down and take another look at this: continuing to evaluate it from a performance standpoint, looking into that issue, and going through Blake's comments. I just haven't had time since getting back from dub dub. I'll try and get to it soon.

Thanks guys.

@cnstoll
Contributor Author

cnstoll commented Jun 30, 2013

@blakewatters I really appreciate the thoughtful and very helpful feedback here. I can tell you've put a lot of thought into this sort of problem and I appreciate your willingness to lend your advice.

I think I'm going to keep updating MMRecord's fetch handling in the way you're describing as a separate action item for a later pull request. I should probably have been a bit clearer about what this pull request actually accomplishes. The bottleneck it was trying to address was actually a bug on MMRecord's part: it was iterating through its temporary data set (something similar to your in-memory cache) at exponential complexity, which was slowing down large data set imports. This pull request fixes that issue, so I'd like to go ahead and get it in.

That being said, I definitely would like to eke as much performance out of the Core Data side of things as possible. I think the solution implemented now goes part of the way towards the one you've got. Fortunately, we're at least only making one fetch request per type of entity: if there are 1000 User objects, only one fetch happens for all of them. That actually ends up being a pretty significant performance boost, as I'm sure you're aware. And setting the batch size does help a lot for SQLite stores, but I agree that simply fetching the primary keys would be more efficient for kicking this process off.

I'll be back with a new PR for that later when I have some more time to go heads down on it. Thanks!

@cnstoll
Contributor Author

cnstoll commented Jul 2, 2013

Looks like all is well here now judging from @urbanappetite's posts on the other thread. Planning on merging this in tomorrow. @kcharwood @larsacus Let me know if you want to take a look at this.

@@ -454,6 +450,7 @@ - (NSArray*)fetchRecordsWithPrimaryKeys:(NSArray *)primaryKeys forEntity:(NSEnti
        return nil;

    NSFetchRequest *fetchRequest = [[NSFetchRequest alloc] initWithEntityName:[entity name]];
    [fetchRequest setFetchBatchSize:20];
Contributor

Where did this magic batch size of 20 come from? Is this the optimal fetch size?

Contributor Author

I grabbed it out of thin air. It only applies if there is a SQLite persistent store and is ignored otherwise. I figured 20 is roughly the size of most popular paginated APIs' responses, and that typical response sizes would be in the 100 range if not paginated. For those it should fit nicely. And when I tested on a response of 100,000 records it still seemed to be a performance improvement over no batch size being set, so I think it should be fine for now.


- (void)addProtoRecordToDictionary:(MMRecordProtoRecord *)protoRecord {
    if (protoRecord.primaryKeyValue) {
        [self.prototypeDictionary setObject:protoRecord forKey:protoRecord.primaryKeyValue];
Contributor

could also use self.prototypeDictionary[protoRecord.primaryKeyValue] = protoRecord;

Really not important.

Contributor Author

God, I still forget to use those. Sure, why not.

cnstoll added a commit that referenced this pull request Jul 2, 2013
@cnstoll cnstoll merged commit 25b7749 into master Jul 2, 2013
@cnstoll cnstoll deleted the cnstoll_performance branch July 2, 2013 19:13