Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Checking for dupe content in finalize #599

Merged
merged 1 commit into from
Mar 30, 2020
Merged

Conversation

daviddavis
Copy link
Contributor

@daviddavis daviddavis commented Mar 19, 2020

fixes #6362
https://pulp.plan.io/issues/6362

Plugin example: pulp/pulp_file#368

I decided not to update remove_duplicates to handle checking for duplicate content because (a) it would violate the single responsibility principle as remove_duplicates removes existing content from the repo version if it collides with content being added.

Also (b) remove_duplicates is called at the beginning of finalize before processing happens while validation ought to happen at the end.

daviddavis pushed a commit to daviddavis/pulp_file that referenced this pull request Mar 19, 2020
@daviddavis daviddavis force-pushed the issue6362 branch 3 times, most recently from 7d12f35 to a1508f2 Compare March 19, 2020 16:36
daviddavis pushed a commit to daviddavis/pulp_file that referenced this pull request Mar 19, 2020
@@ -82,3 +120,16 @@ def validate_version_paths(version):
validate_file_paths(paths)
except ValueError as e:
raise ValueError(_("Cannot create repository version. {err}.").format(err=e))


def validate_repo_version(version):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the idea this would be configured to be called by plugins? or is this called for all plugins today?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I completely understand your question. This is just a helper method plugins can call during finalize if they want to. Here's an example from pulp_file: pulp/pulp_file#368.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's sort of like a wrapper for any other validate_* methods in repo_version_utils.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup that answers it. You're saying plugin opt into this and this is a helpful wrapper. Sounds good to me.

@@ -553,33 +553,6 @@ def add_content(self, content):
if self.complete:
raise ResourceImmutableError(self)

error_messages = []
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To confirm my understanding of this code being removed. It wouldn't automatically remove the duplicate, it would just fail. Is that right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly. This is the crux of the problem. When duplicate content gets added to a repo version, it fails before giving the plugin writer the chance to deal with the problem during finalize.

@bmbouter
Copy link
Member

Does this need any docs changes? I suspect there needs to be something.

Copy link
Member

@fao89 fao89 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really liked this PR!
Thank you @daviddavis

@@ -0,0 +1 @@
Added two new repo validation methods (validate_repo_version and validate_duplicate_content).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great idea!

@@ -67,6 +67,44 @@ def remove_duplicates(repository_version):
repository_version.remove_content(duplicates_qs)


def validate_duplicate_content(version):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

much better! it makes easy for plugin writers to customize

@goosemania
Copy link
Member

@daviddavis , thanks! Looks good to me.

Copy link
Contributor

@gmbnomis gmbnomis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I like this approach a lot (in fact pulp_cookbook implements a method similar to validate_duplicate_content()), I don't fully agree to your statement:

it would violate the single responsibility principle as remove_duplicates removes existing content from the repo version if it collides with content being added.

My expectation for a function called remove_duplicates() is that it does exactly that. If remove_duplicates() can't remove all duplicates, I expect it to fail. Thus, the validation in the following code would never raise:

remove_duplicates(new_version)
validate_duplicate_content(version)

But, in fact, it will raise a ValueError if there are
a) duplicates in the added content
b) duplicates in the base/previous repo version's content (should never happen if proper validation is done)

This may be confusing to plugin writers.

Moreover, I think that the current docstring of remove_duplicates() does not emphasize that it only "removes existing content from the repo version if it collides with content being added" (as you point out correctly) and that there may still be duplicates after calling the function.

So, could we:

  • make the docstring very clear about this behavior?
  • and, maybe even: rename remove_duplicates to e.g. supersede_existing_duplicates ? (as said before, I am not good at naming things...)

@daviddavis
Copy link
Contributor Author

@gmbnomis My comment about remove_duplicates only removing existing content is about the existing nature of the function and not what it ought to be doing based on its name. I agree that it's not clear and could be confusing to plugin writers. I went ahead and updated the docstring for remove_duplicates. I fully agree with that move.

As to renaming remove_duplicates, I'm a bit torn because it would be a lot of work but I do agree that its name is not entirely accurate. I could go either way. Any thoughts @goosemania @bmbouter @fabricio-aguiar ?

@daviddavis
Copy link
Contributor Author

@gmbnomis I talked on IRC with @bmbouter and @fabricio-aguiar. We'd like to merge this and open a separate issue to consider renaming remove_duplicates since we agreed that its name is not accurate. Wdyt?

@pulpbot
Copy link
Member

pulpbot commented Mar 30, 2020

Attached issue: https://pulp.plan.io/issues/6362

@gmbnomis
Copy link
Contributor

@gmbnomis I talked on IRC with @bmbouter and @fabricio-aguiar. We'd like to merge this and open a separate issue to consider renaming remove_duplicates since we agreed that its name is not accurate. Wdyt?

@daviddavis yes, please go ahead. You have addressed the most important bit (updating the documentation). Of course, the name is still misleading, but if you are hesitant to change it because it creates issues with plugins/work for plugins, we can keep it.

@daviddavis
Copy link
Contributor Author

@gmbnomis cool, sounds good. I opened https://pulp.plan.io/issues/6416.

Copy link
Member

@bmbouter bmbouter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me as long as we are tracking the name fix in https://pulp.plan.io/issues/6416

Thanks @daviddavis, @gmbnomis and @fabricio-aguiar !

@daviddavis daviddavis merged commit 6d1fcbd into pulp:master Mar 30, 2020
daviddavis pushed a commit to pulp/pulp_file that referenced this pull request Mar 30, 2020
pulpbot pushed a commit to pulp/pulp_file that referenced this pull request Mar 31, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants