Import Performance Issues #450

Open
seepel opened this Issue Apr 8, 2013 · 9 comments


seepel commented Apr 8, 2013

When importing data with relationships, performance is poor because each relationship executes a separate fetch request. I'm going to be working on this for a project of mine, so I would like to raise the issue before I begin work so that I can contribute my changes back to the project in the best way possible. If there are any comments on this, please let me know.

On a related note, there is a similar issue in MR_importFromArray:inContext:.

Contributor

ryanjm commented Aug 29, 2013

Any updates on this?

Contributor

tonyarnold commented Aug 30, 2013

Hi @seepel, go right ahead. The basic rules of contributing to any GitHub project apply:

  1. Work on and submit pull requests against the develop branch, not master;
  2. Please use the same code formatting style that is used throughout the rest of the project — it might not be what you prefer to use, but it’s consistent throughout existing code.

Beyond that, ping @magicalpanda/team-magicalrecord via this thread if you have anything you’d like to discuss in detail.

seepel commented Aug 30, 2013

My performance improvements ended up being pretty specific to my application, and this issue was pretty silent so I ended up letting it sit idle. Some of what I did could probably slot into develop in some fashion without too much trouble, while at the same time being enough to be useful. I'll outline my setup first and maybe we can go from there.

After doing some tests I found that I got a significant performance boost by using the thread confinement model rather than nested contexts (i.e. merging a context was faster than propagating a save up a context level). So I actually checked out MagicalRecord 1.8 and went from there.

Step zero was fetching all objects that might be needed in MR_importFromArray rather than one by one.
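To illustrate "step zero", here is a minimal, language-agnostic sketch in Python using the standard `sqlite3` module rather than Core Data. The table name `person`, the key column `remote_id`, and the helper `fetch_existing_by_keys` are all hypothetical names invented for this example; this is not MagicalRecord code. In Core Data terms the idea would roughly correspond to one fetch request with an `IN` predicate over all incoming primary keys, instead of one fetch per object.

```python
import sqlite3


def fetch_existing_by_keys(conn, table, key_column, keys, chunk_size=500):
    """Return {key: row} for every stored row whose key is in `keys`.

    Issues one query per chunk of keys instead of one query per key.
    `table` and `key_column` are trusted identifiers in this sketch.
    """
    found = {}
    unique_keys = list(dict.fromkeys(keys))  # drop repeats, keep order
    for start in range(0, len(unique_keys), chunk_size):
        chunk = unique_keys[start:start + chunk_size]
        placeholders = ",".join("?" * len(chunk))
        sql = f"SELECT * FROM {table} WHERE {key_column} IN ({placeholders})"
        for row in conn.execute(sql, chunk):
            found[row[key_column]] = row
    return found


# Hypothetical data: two stored rows, two incoming records.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute("CREATE TABLE person (remote_id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO person VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])

incoming = [{"remote_id": 1, "name": "Ann"}, {"remote_id": 3, "name": "Cy"}]
existing = fetch_existing_by_keys(
    conn, "person", "remote_id", [record["remote_id"] for record in incoming]
)

# Split the import into updates and inserts without any further lookups.
to_update = [r for r in incoming if r["remote_id"] in existing]
to_insert = [r for r in incoming if r["remote_id"] not in existing]
```

The chunk size is only there to stay under the database's limit on bound parameters; the important part is that the number of queries no longer grows with the number of imported objects.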

Basically the first step after that was to reduce the number of fetches required on updates by utilizing the relationships that already exist. So if a nested object for an import already exists as a relationship, rather than fetching the object, just use the object that is already there. Of course, for to-many relationships this necessitates specifying which relationships should be prefetched, as Core Data will fetch these lazily. So even if they are already in the cache it will likely require a round trip to the DB. For many-to-many relationships this is especially important, as you need to specify fetching the inverse relationship as well.
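The relationship-reuse idea above can be sketched as follows. This is a simplified Python model with plain dictionaries standing in for managed objects; `resolve_related`, `fetch_one`, and the `remote_id` key are hypothetical names for illustration only. In Core Data the already-related objects would come from a prefetched relationship (for example via `NSFetchRequest`'s `relationshipKeyPathsForPrefetching`).

```python
def resolve_related(existing_children, incoming, fetch_one, key="remote_id"):
    """Match incoming nested records against objects already in the relationship.

    Only records that are not already related fall back to `fetch_one`,
    which stands in for a per-object fetch and may return None.
    """
    by_key = {child[key]: child for child in existing_children}
    resolved = []
    for data in incoming:
        child = by_key.get(data[key])
        if child is None:
            child = fetch_one(data[key])  # fallback lookup, the slow path
        if child is None:
            child = {}  # nothing found anywhere: create a new object
        child.update(data)
        resolved.append(child)
    return resolved


# Hypothetical usage: record which keys needed a fallback fetch.
fetch_calls = []


def fetch_one(remote_id):
    fetch_calls.append(remote_id)
    return None  # pretend the store has no such object


children = [{"remote_id": 10, "title": "a"}, {"remote_id": 11, "title": "b"}]
incoming = [{"remote_id": 10, "title": "A"}, {"remote_id": 12, "title": "c"}]
resolved = resolve_related(children, incoming, fetch_one)
```

In this run only the record with key 12 triggers a fallback lookup; the record with key 10 is updated in place on the object that was already related.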

Of course that doesn't help on inserts, because the object one is relating may already exist without the import method knowing about it, so you have to fall back to fetching every single nested object again. To solve this I switched from a fetch-or-create model to a fetch and insert, then merge and delete later. So when the time comes to import a nested object that may or may not exist, I just create a new one. Then, before the save, I execute a fetch request on every inserted object's "primary key" (i.e. relatedByAttribute) and de-dupe as needed. That way only a few large fetches are necessary. This part is probably a little too specific to my application. I basically lock out any contexts from saving until this de-duping process is finished. The gotcha is that one can't lock out the default context, because it will deadlock when the merge happens. So creating new objects in the default context is dangerous. In a nested context world I suppose I could imagine having a choke point on the background context that talks directly to the persistent store to alleviate this burden. This also pretty much requires all your entities to have a relatedByAttribute specification for the de-duping.
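A rough sketch of the insert-first, de-dupe-before-save idea described above, again in plain Python with dictionaries in place of managed objects. The names `dedupe_inserted`, `fetch_existing`, and `remote_id` are assumptions made for this example, and the locking of other contexts mentioned above is deliberately left out.

```python
def dedupe_inserted(inserted, fetch_existing, key="remote_id"):
    """Merge freshly inserted objects into objects that share the same key.

    `fetch_existing(keys)` stands in for one batched fetch and returns
    {key: object}. Returns (new_objects, duplicates); the duplicates are
    the objects a real implementation would delete before saving.
    """
    keys = {obj[key] for obj in inserted}
    keep = dict(fetch_existing(keys))  # one large lookup, not one per object
    new_objects, duplicates = [], []
    for obj in inserted:
        target = keep.get(obj[key])
        if target is None:
            keep[obj[key]] = obj  # first object seen with this key survives
            new_objects.append(obj)
        else:
            target.update(obj)  # merge the imported values into the survivor
            duplicates.append(obj)
    return new_objects, duplicates


# Hypothetical store with one existing object.
store = {1: {"remote_id": 1, "name": "old"}}


def fetch_existing(keys):
    return {k: store[k] for k in keys if k in store}


inserted = [
    {"remote_id": 1, "name": "new"},
    {"remote_id": 2, "name": "Bo"},
    {"remote_id": 2, "name": "Bo2"},
]
new_objects, duplicates = dedupe_inserted(inserted, fetch_existing)
```

A real version would also need to repoint any relationships that reference a duplicate to its survivor before deleting the duplicate, which this sketch does not model.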

In the near future I'll look at my modifications to see what I can clean up and push back upstream. Hopefully my ramblings here were clear enough :). If there are any pieces you are particularly interested in (or particularly not interested in) just let me know. Failing that I'll try to get my commits organized from least to most risky so any unwanted changes can be yanked out without too much trouble.

Contributor

tonyarnold commented Aug 30, 2013

Wow, thanks! That’s a detailed, well analysed approach!

Thread confinement is definitely faster than parent/child MOCs — Apple confirmed this at WWDC in June (watch the Core Data performance session for more details). We’ll be moving back to using thread confinement in a future release of MagicalRecord.

As for your fetch/merge/delete process, I agree that it sounds a bit specific in its current state. I’ll leave this issue open for now as a reference while we’re discussing future plans. I really, really appreciate your time in writing up your findings: thank you 😄 👍

seepel commented Aug 30, 2013

Oh I forgot to mention, I also explored the possibility of flattening the objects to be imported, batch importing by entities, and then looping through the original graph again to fix up relationships after the fact. It turned out that the extra graph traversals and extra allocations didn't really allow for much improvement. But thinking back on it I wonder if there might be a smarter way to do it than my first attempt.

There were also a few quick and easy wins, one example being not modifying attributes and relationships that didn't need to be changed. On updates this can remove a good number of SQLite statements, because simply touching a property forces Core Data to update the version number of the object. Otherwise you end up seeing a bunch of "UPDATE ZENTITY SET Z_OPT = ?, Z_ENT = ?;" statements.
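The "don't touch what hasn't changed" win can be sketched like this. `TrackedRecord` is a hypothetical stand-in for a managed object whose setter always marks it dirty, and `set_if_changed` is an illustrative helper, not an existing MagicalRecord or Core Data API.

```python
_MISSING = object()  # sentinel for "attribute has no value yet"


class TrackedRecord:
    """Toy object that becomes dirty whenever any attribute is assigned."""

    def __init__(self, **values):
        self.values = dict(values)
        self.dirty = False

    def set(self, name, value):
        self.values[name] = value
        self.dirty = True  # any assignment counts as a change


def set_if_changed(record, name, value):
    """Assign only when the value differs; return True if a write happened."""
    if record.values.get(name, _MISSING) == value:
        return False
    record.set(name, value)
    return True


record = TrackedRecord(name="Ann", age=30)

# Re-importing identical data leaves the record clean.
unchanged_import = {"name": "Ann", "age": 30}
writes = [n for n, v in unchanged_import.items() if set_if_changed(record, n, v)]
dirty_after_same_data = record.dirty

# A real change still goes through.
wrote_age = set_if_changed(record, "age", 31)
```

With a guard like this, an import of unchanged data would leave the object clean, so no update statement would need to be issued for it at save time.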

I can confirm, as seepel indicated in #588, that you receive at least a 3x speed boost for large imports when you don't call MR_swapMethodsFromClass.

tonyarnold was assigned Dec 12, 2013

Contributor

tonyarnold commented Dec 29, 2013

Shuffling this off to MagicalRecord 3.x. I want to wrap up development on MagicalRecord 2.x, so I'm trying to focus on bug fixes rather than new features right now.

Also, if anyone has any ideas about moving away from using MR_swapMethodsFromClass I'm open to suggestions.

seepel commented Jan 14, 2014

@tonyarnold My suggestion would be to simply do it once at setup, with an optional compiler flag to turn it off. If you are importing in a multithreaded environment (which I assume most people are...?) it would be impossible to know which method you're calling anyway, so switching back and forth seems like it won't help much. Unless I'm missing something.

Hey, I came here from this article, which compares the performance of the @seepel and @magicalpanda implementations.
Just leaving it here, since it is related to this feature.
