-
Notifications
You must be signed in to change notification settings - Fork 107
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Checking for dupe content in finalize #599
Conversation
fixes #6362 https://pulp.plan.io/issues/6362 Required PR: pulp/pulpcore#599
7d12f35
to
a1508f2
Compare
fixes #6362 https://pulp.plan.io/issues/6362 Required PR: pulp/pulpcore#599
@@ -82,3 +120,16 @@ def validate_version_paths(version): | |||
validate_file_paths(paths) | |||
except ValueError as e: | |||
raise ValueError(_("Cannot create repository version. {err}.").format(err=e)) | |||
|
|||
|
|||
def validate_repo_version(version): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the idea this would be configured to be called by plugins? or is this called for all plugins today?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I completely understand your question. This is just a helper method plugins can call during finalize if they want to. Here's an example from pulp_file: pulp/pulp_file#368.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's sort of like a wrapper for any other validate_* methods in repo_version_utils.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup that answers it. You're saying plugin opt into this and this is a helpful wrapper. Sounds good to me.
@@ -553,33 +553,6 @@ def add_content(self, content): | |||
if self.complete: | |||
raise ResourceImmutableError(self) | |||
|
|||
error_messages = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To confirm my understanding of this code being removed. It wouldn't automatically remove the duplicate, it would just fail. Is that right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly. This is the crux of the problem. When duplicate content gets added to a repo version, it fails before giving the plugin writer the chance to deal with the problem during finalize.
Does this need any docs changes? I suspect there needs to be something. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really liked this PR!
Thank you @daviddavis
@@ -0,0 +1 @@ | |||
Added two new repo validation methods (validate_repo_version and validate_duplicate_content). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great idea!
@@ -67,6 +67,44 @@ def remove_duplicates(repository_version): | |||
repository_version.remove_content(duplicates_qs) | |||
|
|||
|
|||
def validate_duplicate_content(version): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
much better! it makes easy for plugin writers to customize
@daviddavis , thanks! Looks good to me. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I like this approach a lot (in fact pulp_cookbook implements a method similar to validate_duplicate_content()
), I don't fully agree to your statement:
it would violate the single responsibility principle as remove_duplicates removes existing content from the repo version if it collides with content being added.
My expectation for a function called remove_duplicates()
is that it does exactly that. If remove_duplicates()
can't remove all duplicates, I expect it to fail. Thus, the validation in the following code would never raise:
remove_duplicates(new_version)
validate_duplicate_content(version)
But, in fact, it will raise a ValueError
if there are
a) duplicates in the added content
b) duplicates in the base/previous repo version's content (should never happen if proper validation is done)
This may be confusing to plugin writers.
Moreover, I think that the current docstring of remove_duplicates()
does not emphasize that it only "removes existing content from the repo version if it collides with content being added" (as you point out correctly) and that there may still be duplicates after calling the function.
So, could we:
- make the docstring very clear about this behavior?
- and, maybe even: rename
remove_duplicates
to e.g.supersede_existing_duplicates
? (as said before, I am not good at naming things...)
@gmbnomis My comment about As to renaming |
fixes #6362 https://pulp.plan.io/issues/6362 Required PR: pulp/pulp_file#368
@gmbnomis I talked on IRC with @bmbouter and @fabricio-aguiar. We'd like to merge this and open a separate issue to consider renaming remove_duplicates since we agreed that its name is not accurate. Wdyt? |
Attached issue: https://pulp.plan.io/issues/6362 |
@daviddavis yes, please go ahead. You have addressed the most important bit (updating the documentation). Of course, the name is still misleading, but if you are hesitant to change it because it creates issues with plugins/work for plugins, we can keep it. |
@gmbnomis cool, sounds good. I opened https://pulp.plan.io/issues/6416. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me as long as we are tracking the name fix in https://pulp.plan.io/issues/6416
Thanks @daviddavis, @gmbnomis and @fabricio-aguiar !
fixes #6362 https://pulp.plan.io/issues/6362 Required PR: pulp/pulpcore#599
fixes #6362 https://pulp.plan.io/issues/6362 Required PR: pulp/pulpcore#599 (cherry picked from commit dc6acb9)
fixes #6362
https://pulp.plan.io/issues/6362
Plugin example: pulp/pulp_file#368
I decided not to update remove_duplicates to handle checking for duplicate content because (a) it would violate the single responsibility principle as remove_duplicates removes existing content from the repo version if it collides with content being added.
Also (b) remove_duplicates is called at the beginning of finalize before processing happens while validation ought to happen at the end.