Django currently disallows bulk_create() on multi-table inherited models because of the technical challenges involved, but doing so is not necessarily impossible in theory.
As Pulp developers, if we are able to make assumptions about the way it will be used, we can avoid most of the problems that prevent a generic implementation. For instance, we can assume only one level of inheritance.
I developed a proof of concept strategy that is unfortunately made more difficult by Django's proxy model behavior and the fact that there's no class which represents just the subclass table. So, this code emulates how multi-table inherited models set up their internal relationships, but doesn't actually use model inheritance.
Strategy:
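Deferred foreign key constraints (Django creates them as DEFERRABLE INITIALLY DEFERRED on PostgreSQL) are not checked until the transaction commits
So save the child table rows first, in bulk, with a random UUID as the content_ptr
Then go back and save the parent model rows, in bulk, reusing those UUIDs as their primary keys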
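The ordering in the strategy above only works if the database defers the foreign key check to commit time. Here is a minimal sketch of that behavior using Python's built-in sqlite3 module rather than Pulp's actual database setup. The two-table layout and the function name `insert_child_before_parent` are made up for illustration.

```python
import sqlite3
import uuid


def insert_child_before_parent(create_parent: bool) -> str:
    """Insert a child row first, then (optionally) its parent, in one transaction."""
    # isolation_level=None puts sqlite3 in autocommit mode, so the explicit
    # BEGIN/COMMIT statements below are the only transaction control.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        "CREATE TABLE parent (pulp_id TEXT PRIMARY KEY, pulp_type TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE child ("
        " content_ptr TEXT PRIMARY KEY"
        " REFERENCES parent (pulp_id) DEFERRABLE INITIALLY DEFERRED,"
        " relative_path TEXT NOT NULL)"
    )
    key = str(uuid.uuid4())
    conn.execute("BEGIN")
    # At this point the child row references a parent that does not exist yet.
    conn.execute("INSERT INTO child VALUES (?, ?)", (key, "a/b.txt"))
    if create_parent:
        conn.execute("INSERT INTO parent VALUES (?, ?)", (key, "file.file"))
    try:
        # The deferred foreign key is only checked here.
        conn.execute("COMMIT")
        return "committed"
    except sqlite3.IntegrityError:
        conn.execute("ROLLBACK")
        return "rejected at commit"
    finally:
        conn.close()
```

Calling `insert_child_before_parent(True)` commits even though the child row was written first, while `insert_child_before_parent(False)` is only rejected when the transaction tries to commit. With a non-deferred constraint the first INSERT would fail immediately, so the approach depends on how the constraint was created.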

    import uuid
    from collections import defaultdict
    from itertools import chain

    from django.db import models, transaction


    class PulpBase(models.Model):
        pulp_id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)
        pulp_created = models.DateTimeField(auto_now_add=True)
        pulp_type = models.TextField(null=False, default=None)

        class Meta:
            abstract = True


    class NewContent(PulpBase):
        pulp_type = models.TextField(null=False, default=None)


    class NewPulpManager(models.Manager):
        # Ignore the hacky workarounds for stuff like pulp_type, I was trying to keep things simple

        def bulk_get_or_create(self, objs, batch_size=None):
            with transaction.atomic():
                pulp_type = self.model.get_pulp_type()
                q = models.Q(pk__in=[])
                # Map each natural key to the positions of unsaved objects that carry it.
                unsaved_idxs_by_nat_key = defaultdict(list)
                for idx, obj in enumerate(objs):
                    content_already_saved = not obj._state.adding
                    if not content_already_saved:
                        unsaved_idxs_by_nat_key[obj.natural_key()].append(idx)
                        q |= models.Q(**obj.natural_key_dict())

                # Swap in rows that already exist in the database.
                existing_objs = self.model.objects.filter(q)
                for existing_obj in existing_objs.iterator(chunk_size=batch_size or 2000):
                    for idx in unsaved_idxs_by_nat_key[existing_obj.natural_key()]:
                        objs[idx] = existing_obj

                # Only rows that are still unsaved get inserted; the rows swapped in
                # above already exist and would violate the primary key otherwise.
                new_objs = []
                new_base_content = []
                for obj in objs:
                    content_already_saved = not obj._state.adding
                    if not content_already_saved:
                        new_objs.append(obj)
                        new_base_content.append(NewContent(pulp_type=pulp_type, pulp_id=obj.pk))

                # Child table first, then the parent table (see the strategy above).
                self.bulk_create(new_objs, batch_size=batch_size)
                NewContent.objects.bulk_create(new_base_content, batch_size=batch_size)
                return objs


    class ContentBase(models.Model):
        # Emulates the parent link of multi-table inheritance without using it.
        content_ptr = models.OneToOneField(
            NewContent, primary_key=True, default=uuid.uuid4, on_delete=models.CASCADE
        )

        objects = NewPulpManager()

        @classmethod
        def get_pulp_type(cls):
            return cls.TYPE

        @classmethod
        def natural_key_fields(cls):
            """Returns a tuple of the natural key fields which usually equates to unique_together fields"""
            return tuple(chain.from_iterable(cls._meta.unique_together))

        def natural_key(self):
            """
            Get the model's natural key based on natural_key_fields.

            Returns:
                tuple: The natural key.
            """
            return tuple(getattr(self, f) for f in self.natural_key_fields())

        def natural_key_dict(self):
            """Get the model's natural key as a dictionary of keys and values."""
            to_return = {}
            for key in self.natural_key_fields():
                to_return[key] = getattr(self, key)
            return to_return

        class Meta:
            abstract = True


    class NewFileContent(ContentBase):
        TYPE = "file.file"

        relative_path = models.TextField(null=False)
        digest = models.CharField(max_length=64, null=False)

        class Meta:
            unique_together = ("relative_path", "digest")

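The index bookkeeping in bulk_get_or_create can be followed without a database. The sketch below is a plain-Python illustration, not Pulp code: the `Item` dataclass, its `saved` flag (standing in for `not obj._state.adding`), and the `swap_in_existing` helper are all hypothetical names, and a list stands in for the query result.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Item:
    """Hypothetical stand-in for a content model instance."""

    relative_path: str
    digest: str
    saved: bool = False  # plays the role of `not obj._state.adding`

    def natural_key(self):
        return (self.relative_path, self.digest)


def swap_in_existing(objs, existing):
    """Replace unsaved items whose natural key matches an existing item.

    Mutates `objs` in place and returns the items that still need creating.
    """
    # Record every position at which each unsaved natural key occurs.
    unsaved_idxs_by_nat_key = defaultdict(list)
    for idx, obj in enumerate(objs):
        if not obj.saved:
            unsaved_idxs_by_nat_key[obj.natural_key()].append(idx)

    # Point each matching position at the already-stored item instead.
    for existing_obj in existing:
        for idx in unsaved_idxs_by_nat_key.get(existing_obj.natural_key(), []):
            objs[idx] = existing_obj

    return [obj for obj in objs if not obj.saved]


existing = [Item("a.txt", "111", saved=True)]
objs = [Item("a.txt", "111"), Item("b.txt", "222"), Item("a.txt", "111")]
to_create = swap_in_existing(objs, existing)
```

After the call, both `a.txt` positions in `objs` refer to the stored item and only `b.txt` is left to create. Two unsaved items that share a natural key with no stored counterpart would both reach the create step, so a caller would presumably need to deduplicate the batch beforehand.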
The speedup is about 2.5x in wall-clock time (1.77 s down to 694 ms) vs what we currently do.
Old
In [5]: %time PulpFileContent.objects.bulk_get_or_create(old_content)
CPU times: user 555 ms, sys: 88.6 ms, total: 643 ms
Wall time: 1.77 s
New
In [7]: %time NewFileContent.objects.bulk_get_or_create(new_content)
CPU times: user 648 ms, sys: 1.39 ms, total: 649 ms
Wall time: 694 ms
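As a quick check on the numbers above, the following sketch recomputes the ratio from the two %time outputs. Treating wall time minus CPU time as "time spent waiting outside Python" is an approximation, and attributing that wait to database round trips is an assumption; the measurements do not break it down.

```python
# Timings copied from the %time output above, in seconds.
old = {"cpu": 0.643, "wall": 1.77}
new = {"cpu": 0.649, "wall": 0.694}

wall_speedup = old["wall"] / new["wall"]

# Wall time minus total CPU time approximates time spent waiting
# outside the Python process (assumed here to be mostly the database).
old_wait = old["wall"] - old["cpu"]
new_wait = new["wall"] - new["cpu"]

print(f"wall-clock speedup: {wall_speedup:.2f}x")
print(f"non-CPU time: {old_wait:.3f} s -> {new_wait:.3f} s")
```

CPU time is nearly identical in both runs, so almost all of the gain comes from the non-CPU portion shrinking.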
This issue has been marked 'stale' due to lack of recent activity. If there is no further activity, the issue will be closed in another 30 days. Thank you for your contribution!
Author: @dralley (dalley)
Redmine Issue: 7824, https://pulp.plan.io/issues/7824
https://code.djangoproject.com/ticket/28821