Skip to content
This repository has been archived by the owner on Dec 7, 2022. It is now read-only.

Problem: Artifact Saver fails to associate existing Artifacts #3764

Merged
merged 1 commit into from Nov 20, 2018

Conversation

dkliban
Copy link
Member

@dkliban dkliban commented Nov 19, 2018

Solution: Update Declarative Artifacts with Artifacts returned by bulk_get_or_create()

[noissue]

@pep8speaks
Copy link

Hello @dkliban! Thanks for submitting the PR.

@codecov
Copy link

codecov bot commented Nov 19, 2018

Codecov Report

Merging #3764 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #3764   +/-   ##
=======================================
  Coverage   54.77%   54.77%           
=======================================
  Files          64       64           
  Lines        2806     2806           
=======================================
  Hits         1537     1537           
  Misses       1269     1269

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 68a5322...3f5cc8f. Read the comment docs.

@bmbouter
Copy link
Member

This makes sense to me. Without re-assigning the result from bulk_get_or_create I can see how existing Artifacts won't be associated.

@dkliban Based on the pattern, don't we also have a possibility of duplicate content units not being associated back on the declarative_content object when 'replaced' from the db here:

for content_artifact in ContentArtifact.objects.bulk_get_or_create(

@dkliban
Copy link
Member Author

dkliban commented Nov 19, 2018

@bmbouter Yes, I believe you are right about the duplicate Content.

@bmbouter
Copy link
Member

@dkliban do you want to merge this and fix that separately or file it?

@dkliban
Copy link
Member Author

dkliban commented Nov 19, 2018

@bmbouter I want to merge and submit another PR

@dkliban
Copy link
Member Author

dkliban commented Nov 19, 2018

@bmbouter I take that back. I'll update tihs PR.

@mdellweg
Copy link
Member

mdellweg commented Nov 19, 2018

I experienced, that assigning the pk in bulk_get_or_create acutally works.
Maybe if we update all properties, we don't break any expectations.

Update: Only postgresql sets the pk on bulk_create by now, so the problem, this PR tries to solve is actually a more general one.

@bmbouter
Copy link
Member

@mdellweg I also pitched something similar to that as well; specifically to assign the pk and all attributes. I was suggesting all attributes because all attributes are present in the db but it's likely not-all attributes will be present on the in-memory model which is likely sparse from metdata).

The re-assign-the-attributes approach has downsides though, such as the in-memory model being in an "unsaved" state, even though it's data is actually all saved. That downside seemed risky enough for me to think this zip() approach is better, even with its memory and cpu downsides.

Other thoughts, ideas, criticisms are welcome.

if artifacts_to_save:
Artifact.objects.bulk_get_or_create(artifacts_to_save)
if da_to_save:
for da, artifact in zip(da_to_save, Artifact.objects.bulk_get_or_create(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sure, that the order is preserved here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this has been a key question we've been asking. It is not documented in their docs as something we can rely on. So that makes me a little nervous also. @dralley and I were talkinga bout this and he reviewed the bulk_create implementation to ensure it is order-preserving. We also checked that our implementation of bulk_get_or_create is also order-preserving.

We don't have a good way of guaranteeing this from django going forward. Maybe we should have a unit test so we can assert bulk_create in latest always returns items in-order. That would be one way to raise our confidence.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, let's hope for the best.
But i still think, we rely heavily on bulk_create to return the pk's here.
And if i read https://docs.djangoproject.com/en/2.1/ref/models/querysets/#bulk-create correctly, this is not a given on anything but postgres.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mdellweg this is also concerning to me. We need to have a convo on the mailing list about if this is acceptable. This issue was introduced when we switched from UUID to ints since the uuid's were client side generated.

@mdellweg
Copy link
Member

My experiments to this problem:
ATIX-AG@6646e36

I think, now i have the problem, that when i have the same Artifact twice in one batch, it complains The file referenced by the Artifact is already present in Artifact storage. Files must be stored outside this location prior to Artifact creation.

Solution: Update Declarative Artifacts with Artifacts returned by bulk_get_or_create()

The Content unit saver did not account for duplicate units at all. This patch also addresses
that problem.

[noissue]
Copy link
Member

@bmbouter bmbouter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dkliban This looks right to me. I believe that it will resolve the issues in both areas. Thank you for putting this together! 👍

@bmbouter bmbouter merged commit c584989 into pulp:master Nov 20, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
4 participants