Problem: Artifact Saver fails to associate existing Artifacts #3764
Conversation
|
Hello @dkliban! Thanks for submitting the PR.
|
Codecov Report
@@ Coverage Diff @@
## master #3764 +/- ##
=======================================
Coverage 54.77% 54.77%
=======================================
Files 64 64
Lines 2806 2806
=======================================
Hits 1537 1537
Misses 1269 1269Continue to review full report at Codecov.
|
|
This makes sense to me. Without re-assigning the result from bulk_get_or_create I can see how existing Artifacts won't be associated. @dkliban Based on the pattern, don't we also have a possibility of duplicate content units not being associated back on the declarative_content object when 'replaced' from the db here:
|
|
@bmbouter Yes, I believe you are right about the duplicate Content. |
|
@dkliban do you want to merge this and fix that separately or file it? |
|
@bmbouter I want to merge and submit another PR |
|
@bmbouter I take that back. I'll update tihs PR. |
|
I experienced, that assigning the pk in bulk_get_or_create acutally works. Update: Only postgresql sets the pk on bulk_create by now, so the problem, this PR tries to solve is actually a more general one. |
|
@mdellweg I also pitched something similar to that as well; specifically to assign the pk and all attributes. I was suggesting all attributes because all attributes are present in the db but it's likely not-all attributes will be present on the in-memory model which is likely sparse from metdata). The re-assign-the-attributes approach has downsides though, such as the in-memory model being in an "unsaved" state, even though it's data is actually all saved. That downside seemed risky enough for me to think this zip() approach is better, even with its memory and cpu downsides. Other thoughts, ideas, criticisms are welcome. |
| if artifacts_to_save: | ||
| Artifact.objects.bulk_get_or_create(artifacts_to_save) | ||
| if da_to_save: | ||
| for da, artifact in zip(da_to_save, Artifact.objects.bulk_get_or_create( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we sure, that the order is preserved here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah this has been a key question we've been asking. It is not documented in their docs as something we can rely on. So that makes me a little nervous also. @dralley and I were talkinga bout this and he reviewed the bulk_create implementation to ensure it is order-preserving. We also checked that our implementation of bulk_get_or_create is also order-preserving.
We don't have a good way of guaranteeing this from django going forward. Maybe we should have a unit test so we can assert bulk_create in latest always returns items in-order. That would be one way to raise our confidence.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, let's hope for the best.
But i still think, we rely heavily on bulk_create to return the pk's here.
And if i read https://docs.djangoproject.com/en/2.1/ref/models/querysets/#bulk-create correctly, this is not a given on anything but postgres.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mdellweg this is also concerning to me. We need to have a convo on the mailing list about if this is acceptable. This issue was introduced when we switched from UUID to ints since the uuid's were client side generated.
|
My experiments to this problem: I think, now i have the problem, that when i have the same Artifact twice in one batch, it complains |
Solution: Update Declarative Artifacts with Artifacts returned by bulk_get_or_create() The Content unit saver did not account for duplicate units at all. This patch also addresses that problem. [noissue]
5ccbee4
to
3f5cc8f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dkliban This looks right to me. I believe that it will resolve the issues in both areas. Thank you for putting this together! 👍
Solution: Update Declarative Artifacts with Artifacts returned by bulk_get_or_create()
[noissue]