This repository has been archived by the owner on Feb 7, 2019. It is now read-only.

Performance improvements #44

Merged
merged 3 commits into from
Jan 28, 2015

raphaelm
Contributor

As you might have noticed, cleanerversion creates a huge number of SQL queries when calling clone() on an item with many-to-many relationships. I'm working on improving this, and the three improvements in this pull request reduced the query load in my test environment by a factor of three or four. Here's what I did:

  • Added an in_bulk parameter to clone(), which is set to True when clone() is called from clone_relations. In that case earlier_version.save() is not called; instead, clone_relations creates all earlier versions with a single bulk_create, which issues one SQL query instead of n. The later_version objects are not modified by clone() apart from their version_start_date (when clone() is called on an unmodified object from the database, which is the case here), so they are now changed in a single .update() call.
  • Removed a duplicate later_version.save() from clone() as I was unable to find any technical reason for it to be there.
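
To illustrate why the bulk calls matter, here is a minimal sketch that counts write statements for the row-by-row and the bulk approach. It uses the standard-library sqlite3 module rather than Django or CleanerVersion; the table `rel`, the row count `N`, and all helper names are assumptions made up for this example.

```python
import sqlite3

N = 50  # hypothetical number of relation rows


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE rel (id INTEGER PRIMARY KEY, version_start_date TEXT)")
    return conn


def count_writes(conn, work):
    """Run work(conn) and count the INSERT/UPDATE statements it issued."""
    seen = []
    conn.set_trace_callback(seen.append)
    work(conn)
    conn.set_trace_callback(None)
    # Implicit BEGIN statements are traced too, so keep only the writes.
    return sum(1 for s in seen if s.lstrip().upper().startswith(("INSERT", "UPDATE")))


def insert_one_by_one(conn):
    # Comparable to calling save() on every new row: n INSERT statements.
    for i in range(N):
        conn.execute("INSERT INTO rel (id, version_start_date) VALUES (?, ?)", (i, "2015-01-01"))


def insert_in_bulk(conn):
    # Comparable to bulk_create: one multi-row INSERT statement.
    placeholders = ", ".join(["(?, ?)"] * N)
    params = [v for i in range(N) for v in (i, "2015-01-01")]
    conn.execute("INSERT INTO rel (id, version_start_date) VALUES " + placeholders, params)


def update_one_by_one(conn):
    for i in range(N):
        conn.execute("UPDATE rel SET version_start_date = ? WHERE id = ?", ("2015-01-28", i))


def update_in_bulk(conn):
    # Comparable to filter(id__in=...).update(...): one UPDATE statement.
    ids = list(range(N))
    marks = ", ".join("?" * len(ids))
    conn.execute(
        "UPDATE rel SET version_start_date = ? WHERE id IN (" + marks + ")",
        ["2015-01-28"] + ids,
    )


slow = make_db()
slow_writes = count_writes(slow, insert_one_by_one) + count_writes(slow, update_one_by_one)

fast = make_db()
fast_writes = count_writes(fast, insert_in_bulk) + count_writes(fast, update_in_bulk)

print(slow_writes, fast_writes)
```

The row-by-row variant issues two statements per row, while the bulk variant issues two statements in total, regardless of N. The actual saving inside clone() depends on how many relations an object has.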

@maennel
Contributor

maennel commented Jan 14, 2015

Hi Raphael, sorry for not being very responsive on your pull requests.
First of all, thanks a lot for contributing to CleanerVersion; I really appreciate it. I think the points you have addressed are some of the biggest weaknesses of CleanerVersion.
However, I am a bit short on time these days and will have a look at your code as soon as possible!
Thanks,
-Manuel

@raphaelm
Contributor Author

Hey Manuel, don't worry :) As you may have guessed, I'm currently integrating cleanerversion into one of my projects, and while this probably won't be the last pull request you see from me, I'm currently using my own patched version, so it does not really matter to me when you merge the patches (although I'll be happy to read your comments on them).

# Perform the bulk changes rel.clone() did not perform because of the in_bulk parameter
# This saves a huge bunch of SQL queries
source.through.objects.filter(id__in=[l.id for l in later]).update(**{'version_start_date': forced_version_date})
source.through.objects.bulk_create([r for r in m2m_rels if hasattr(r, '_not_created') and r._not_created])
Contributor

Just wondering: could it be that the '_not_created' property persists through some caching mechanism, with the effect that the relation clone gets created a second time here?

Contributor Author

Uhm, that might be a complicated question, I'll take some time to investigate it tomorrow :)

Contributor

No worries, I haven't seen any side-effects in any tests until now.

Contributor Author

Please correct me if I'm wrong here, but even if save() is called twice, it should issue an UPDATE query instead of a second INSERT.

Contributor Author

I correct myself; I was of course wrong: this is a bulk_create and not a save, so it could really create duplicate entries. I'm not sure the current unit tests would expose this. We would have to construct a test case that provokes it, and I'm not quite sure how to do that.
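
The difference between the two write paths can be sketched with the standard-library sqlite3 module. This is only an illustration under assumptions: `save_like` imitates the "UPDATE first, INSERT if nothing matched" behaviour that Django documents for save() on an object with a primary key set, `blind_insert` stands in for an insert-only path such as bulk_create, and the table and function names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rel (id TEXT PRIMARY KEY, version_start_date TEXT)")


def save_like(conn, row_id, start):
    """Try an UPDATE first; INSERT only if no existing row matched."""
    cur = conn.execute(
        "UPDATE rel SET version_start_date = ? WHERE id = ?", (start, row_id)
    )
    if cur.rowcount == 0:
        conn.execute(
            "INSERT INTO rel (id, version_start_date) VALUES (?, ?)", (row_id, start)
        )


def blind_insert(conn, row_id, start):
    """Always INSERT, without checking whether the row already exists."""
    conn.execute(
        "INSERT INTO rel (id, version_start_date) VALUES (?, ?)", (row_id, start)
    )


# Calling the save-like path twice leaves a single row.
save_like(conn, "a", "2015-01-14")
save_like(conn, "a", "2015-01-15")
rows_for_a = conn.execute("SELECT COUNT(*) FROM rel WHERE id = 'a'").fetchone()[0]

# Calling the insert-only path twice collides with the primary key here;
# on a table without such a constraint it would store a duplicate instead.
blind_insert(conn, "b", "2015-01-14")
try:
    blind_insert(conn, "b", "2015-01-14")
    second_insert_failed = False
except sqlite3.IntegrityError:
    second_insert_failed = True
```

So a repeated insert-only call is never silently harmless: it either fails on a unique key or produces a second row.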

Contributor

Hm, I'll write such a test, no problem...
However, I don't see any duplicate entries in your code. There are three insert or update operations:
1.) save() is called on those relations in the m2m_rels list that do not have the _not_created property => these are typically the non-current entries still pointing to the current model object.
2.) update() is called on the objects in the later list. later only contains clones of current objects. Because of the properties of a CleanerVersion current object (version_end_date is None, version_start_date indicates the beginning of the version's validity, version_birth_date indicates the birth date of the object, and id (the unique ID) == identity (the object ID)), we only want to update the version_start_date on the new 'current' objects.
3.) bulk_create() is called on line 1086 in order to create all the versions that were current prior to calling clone(). These versions have their version_end_date set (thanks to code in def clone(...)), and id has been assigned a new value, such that id != identity (lines 1028 & 1029).

In a nutshell, I don't think save() (or any other inserting command) is called twice on the same object, which would provoke duplicate entries.
However, better safe than sorry - I'll write the tests in order to be 100% sure. ;)
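
The three operations above can be modelled as a partition of plain Python objects. This is a hedged sketch, not CleanerVersion code: the `Rel` class, `partition`, and `mark_created` are hypothetical, and whether the merged code ever clears the `_not_created` flag is not stated in this thread.

```python
class Rel:
    """Hypothetical stand-in for a versioned through-model row."""

    def __init__(self, id, identity, version_end_date=None, not_created=False):
        self.id = id
        self.identity = identity
        self.version_end_date = version_end_date  # None marks a current version
        if not_created:
            self._not_created = True


def partition(m2m_rels, later):
    """Split the relations into the three write operations described above."""
    to_save = [r for r in m2m_rels if not getattr(r, "_not_created", False)]
    to_create = [r for r in m2m_rels if getattr(r, "_not_created", False)]
    to_update = list(later)
    return to_save, to_update, to_create


def mark_created(rels):
    """Assumed guard: drop the flag so a second pass cannot insert again."""
    for r in rels:
        if hasattr(r, "_not_created"):
            del r._not_created


# A historic relation, plus the two halves produced by cloning a current one.
historic = Rel(id="h1", identity="x", version_end_date="2015-01-01")
earlier = Rel(id="new1", identity="y", version_end_date="2015-01-14", not_created=True)
later_version = Rel(id="y", identity="y")

m2m_rels = [historic, earlier]
later = [later_version]

to_save, to_update, to_create = partition(m2m_rels, later)
ids = [r.id for r in to_save + to_update + to_create]

mark_created(to_create)
_, _, to_create_again = partition(m2m_rels, later)
```

In this model every object lands in exactly one of the three groups, and once the flag is dropped a second pass has nothing left to create.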

Contributor

@raphaelm Please see PR #49 for the tests and let me know what you think about it. Thanks :)

@maennel
Contributor

maennel commented Jan 27, 2015

Hey Raphael, I think you did a great job on this - as you wrote, a massive performance gain for systems with large cardinality values.
Thanks a lot!

@maennel
Contributor

maennel commented Jan 27, 2015

@raphaelm As you can see, I've added a test case for the code you wrote in PR #48. Let me know what you think about it... ;)
Thanks,
-Manuel

@raphaelm
Contributor Author

Looks good (both the additional improvement and the test case)!

@maennel
Contributor

maennel commented Jan 28, 2015

Ok, so I'll proceed to merge the code, and then we'll see about further steps like caching the '_not_created' property.

maennel added a commit that referenced this pull request Jan 28, 2015
Performance improvements
@maennel maennel merged commit 28ec832 into swisscom:master Jan 28, 2015
@raphaelm raphaelm deleted the perf branch January 30, 2015 16:06
2 participants