-
Notifications
You must be signed in to change notification settings - Fork 53
Conversation
have introduced a bug, so I reverted this.
Hi Raphael, sorry for being not so responsive on your Pull Requests. |
Hey Manuel, don't worry :) As you may have guessed, I'm currently integrating cleanerversion into one of my projects and I while this probably won't be the last pull requests you see from me, I'm currently using my own, patched version, so it does not really matter to me when you merge the patches (although I'll be happy to read your comments on it). |
# Perform the bulk changes rel.clone() did not perform because of the in_bulk parameter | ||
# This saves a huge bunch of SQL queries | ||
source.through.objects.filter(id__in=[l.id for l in later]).update(**{'version_start_date': forced_version_date}) | ||
source.through.objects.bulk_create([r for r in m2m_rels if hasattr(r, '_not_created') and r._not_created]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just wondering: could it be, that the '_not_created' property persists through some caching mechanism and had as an effect that the relation-clone got created a second time here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uhm, that might be a complicated question, I'll take some time to investigate it tomorrow :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No worries, I haven't seen any side-effects in any tests until now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please correct my if I'm wrong here, but even if save()
is being called twice, it should issue an UPDATE
query instead of a second INSERT
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I correct myself, I of course was wrong, as it is a bulk_create
and no save
, so this could really create duplicate entries. I'm not quite sure that this would be exposed by the current unit tests, we would have to try constructing a test case to provoke it and I'm not quite sure how to do this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, I'll write such a test, no problem...
However, I don't see any duplicate entries in your code - there are 3 insert or update operations:
1.) save()
is called on those relations not having the _not_created
property and being in the m2m_rels
-list => these are typically the non-current entries still pointing the current model-object
2.) update()
is called on objects inside the later
-list. later
only contains clones of current objects. But due to the properties of a CleanerVersion-current object (version_end_date
is None
, version_start_date
indicates the beginning of the version's validity, version_birth_date
indicates the birth date of the object and id
(the unique ID)== identity
(the object ID)), we only want to update the version_start_date
on new 'current' objects.
3.) bulk_create()
is called in order to create all the versions that were current prior to calling clone()
, on line 1086. These versions have their version_end_date
set (thanks to code in def clone(...)
) and id
has been assigned a new value, such that id != identity
(lines 1028 & 1029).
In a nutshell, I don't think save()
(or any other inserting command) is called twice on a same object, provoking duplicate entries.
However, better safe than sorry - I'll write the tests in order to be 100% sure. ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey Raphael, I think you did a great job on this - as you wrote, a massive performance gain for systems with large cardinality values. |
Looks good (both the additional improvement and the test case)! |
Ok, so I'll proceed to merge the code, and then we'll see for further steps like caching the '_not_created' property. |
As you might have noticed, cleanerversion creates a huge number of SQL queries when calling
clone()
on an item with many-to-many relationships. I'm working on improving this, and the three improvements in this pull request reduced the query load in my test environment by a factor of three or four. Here's what I did:in_bulk
parameter toclone()
which is set to true when called fromclone_relations
. The effect is thatearlier_version.save()
will not be called, butclone_relations
will callbulk_create
for them, as this issues only one SQL query instead of n. Also, thelater_version
objects, which are not modified byclone
(when called on an unmodified object from the database, which is the case here), except for theirversion_start_date
, are now changed in a single.update()
call.later_version.save()
fromclone()
as I was unable to find any technical reason for it to be there.