-
Notifications
You must be signed in to change notification settings - Fork 105
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Taught export to insure de-duplicated Artifact.json. #4161
Conversation
What's the impact of this change on fs-exports? |
79140b8
to
134610f
Compare
That is a good question. fs-export is a different code-path aimed at a different end-output, but might benefit from what we've learned here - if not as part of this PR/issue. I'll review and open a new issue if it looks like we can follow a similar pattern there. |
+1 to new issue / separate PR |
Taking a quick look at _export_to_filesystem, it looks like it is relying on hydrated Artifacts where it could get away with just knowing the Artifact.file. This is "mildly" suboptimal, but not a dealbreaker - going to the ddatabase at all takes most of your time, Artifacts aren't so large that "all fields" is horribly worse than just "a few fields". I don't see a path that suffers the same duplication-effort - it doesn't do multiple-repos-at-once, the source of our Pain in this PR. My first thought is that I wouldn't spend time trying to squeeze more performance out of that codepath, unless/until we actually see a problem. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, but please leave a TODO about maybe using .iterator()
here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd even be fine with merging this as is and promise to start on further refactoring right away with both ideas:
- Using a db iterator
- using qs.none()
Another thing (to think about later) that came to my mind: In the case of incremental, we now still include artifacts that are new to a repository version, but guaranteed to be in another (exported) repositories previous version. Could we safely skip them too?
Say the logic switching from:
to
I am absolutely open to ways to make this better in the face of "many repos at once", as a future improvement. Open an issue, if you would, so we have at least a placeholder pointing back to this discussion. |
Clarify please? Do you mean just change to |
hold off on merging while I investigate some test weirdness, please |
iterate over |
Along the way taught export to operate on a QuerySet of Artifacts instead of (prematurely) hydrating all affected Artifacts into a list. fixes pulp#4159.
134610f
to
fbb118c
Compare
Gotcha - good call, results of doing so improve the time even more:
|
Backport to 3.18: 💚 backport PR created✅ Backport PR branch: Backported as #4197 🤖 @patchback |
Backport to 3.21: 💚 backport PR created✅ Backport PR branch: Backported as #4198 🤖 @patchback |
Backport to 3.22: 💚 backport PR created✅ Backport PR branch: Backported as #4199 🤖 @patchback |
Backport to 3.23: 💚 backport PR created✅ Backport PR branch: Backported as #4200 🤖 @patchback |
Backport to 3.28: 💚 backport PR created✅ Backport PR branch: Backported as #4201 🤖 @patchback |
Backport to 3.29: 💚 backport PR created✅ Backport PR branch: Backported as #4202 🤖 @patchback |
fixes #4159.