Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pulp_deb duplicate package handling is erroneous #921

Closed
quba42 opened this issue Oct 25, 2023 · 1 comment · Fixed by #922
Closed

pulp_deb duplicate package handling is erroneous #921

quba42 opened this issue Oct 25, 2023 · 1 comment · Fixed by #922
Labels
.bugfix CHANGES/<issue_number>.bugfix Katello For bugs and issues known to affect Katello

Comments

@quba42
Copy link
Collaborator

quba42 commented Oct 25, 2023

Version
all versions affected

Describe the bug
We use remove_duplicates from pulpcore to handle duplicates: https://github.com/pulp/pulp_deb/blob/main/pulp_deb/app/models/repository.py#L96

As it turns out, this does not do exactly what we want for APT repos in two ways:

  • A duplicate package in an APT repo, is only forbidden if they have differing sha256 checksums. Identical duplicates (the same packag in two different pool/ folder locations, are unwise/bad practice, but not actually forbidden. remove_duplicates does not discriminate between these two cases, and ends up removing packages from perfectly legitimate APT repos.
  • remove_duplicates only compares incoming packages with previously present packages, it does not search the incoming packages for duplicates. Instead it is meant to be combined with validate_duplicate_content which throws an error if the incoming packages contain duplicates. However, we don't and can't use this because this would completely prevent syncing of upstream repos containing "acceptable duplicates".

To Reproduce
One version of the problem is to perform a pulp to pulp sync of only the default distribution, followed by a sync of all distributions.
The default distribution contains an "acceptable duplicate" of every package, and all of these are suddenly kicked out by remove_duplicates. With the next sync, the incoming default distribution packages will kick out all the packages from the structured distributions and each sync will oscillate back and forth between no packages for default and no packages for everything else.

Expected behavior
We need our own APT specific version of remove_duplicates, that will kick out illigitimate duplicates, but not acceptable duplicates.
We also need our own version of the functionality of validate_duplicate_content. Unfortunately this may require significantly more queries than the version in pulpcore, because of the need to distinguish between the two cases.

@quba42 quba42 added Triage-Needed .bugfix CHANGES/<issue_number>.bugfix and removed Triage-Needed labels Oct 25, 2023
@quba42
Copy link
Collaborator Author

quba42 commented Oct 25, 2023

downstream: OR-4275

@quba42 quba42 added the Katello For bugs and issues known to affect Katello label Oct 25, 2023
quba42 added a commit to ATIX-AG/pulp_deb that referenced this issue Oct 30, 2023
closes pulp#921

As a result of the behaviour fixes, we can also drop the now superfluous
duplicate distribution checking. We are using validate_duplicate_content
to catch incoming duplicates, and we are removing old duplicates, so it
is not possible to run into this error. As a result it is not worth the
performance cost to check for it.
@quba42 quba42 linked a pull request Oct 30, 2023 that will close this issue
quba42 added a commit to ATIX-AG/pulp_deb that referenced this issue Oct 31, 2023
closes pulp#921

As a result of the behaviour fixes, we can also drop the now superfluous
duplicate distribution checking. We are using validate_duplicate_content
to catch incoming duplicates, and we are removing old duplicates, so it
is not possible to run into this error. As a result it is not worth the
performance cost to check for it.
quba42 added a commit to ATIX-AG/pulp_deb that referenced this issue Oct 31, 2023
closes pulp#921

As a result of the behaviour fixes, we can also drop the now superfluous
duplicate distribution checking. We are using validate_duplicate_content
to catch incoming duplicates, and we are removing old duplicates, so it
is not possible to run into this error. As a result it is not worth the
performance cost to check for it.
quba42 added a commit to ATIX-AG/pulp_deb that referenced this issue Nov 13, 2023
closes pulp#921

As a result of the behaviour fixes, we can also drop the now superfluous
duplicate distribution checking. We are using validate_duplicate_content
to catch incoming duplicates, and we are removing old duplicates, so it
is not possible to run into this error. As a result it is not worth the
performance cost to check for it.
quba42 added a commit to ATIX-AG/pulp_deb that referenced this issue Nov 15, 2023
closes pulp#921

As a result of the behaviour fixes, we can also drop the now superfluous
duplicate distribution checking. We are using validate_duplicate_content
to catch incoming duplicates, and we are removing old duplicates, so it
is not possible to run into this error. As a result it is not worth the
performance cost to check for it.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
.bugfix CHANGES/<issue_number>.bugfix Katello For bugs and issues known to affect Katello
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant