New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
As a user, multiple source/target repositories can be used for recursive copy #1420
Conversation
* .extend() is not a method that exists on sets * whether a repository has been loaded is debug information, shouldn't be logged normally [noissue]
7b460e7
to
94aab7f
Compare
47cbda1
to
fcc2145
Compare
| @@ -461,7 +461,7 @@ def _associate_unit(dest_repo, unit, config): | |||
| elif isinstance(unit, models.RPM): | |||
| # copy will happen in one batch | |||
| return unit | |||
| elif isinstance(unit, (models.YumMetadataFile, models.ModulemdDefaults)): | |||
| elif isinstance(unit, models.YumMetadataFile): | |||
This comment has been hidden.
This comment has been hidden.
Sorry, something went wrong.
This comment has been hidden.
This comment has been hidden.
Sorry, something went wrong.
This comment has been hidden.
This comment has been hidden.
Sorry, something went wrong.
self.get_unique_modules() returns a generator. By iterating over it, it gets exhausted making this function useless. [noissue]
After units pass through the solver they come out the other end via some mongodb queries using only(). We need to make sure the storage path is set because it's needed in the association stage. [noissue]
095e973
to
d4f95b3
Compare
|
|
||
| # ------ get RPM children of errata and modules ------ | ||
| wanted_rpms = get_rpms_to_copy_by_key(rpm_search_dicts, import_conduit, source_repo) | ||
| rpm_search_dicts = None | ||
| rpms_to_copy = filter_available_rpms(wanted_rpms, import_conduit, source_repo) | ||
| associated_units |= copy_rpms(rpms_to_copy, source_repo, dest_repo, import_conduit, config, | ||
| solver) | ||
| rpms_to_copy = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I saw this in a bunch of places and a couple of comments have mentioned that it's for "garbage collection". I assume it's necessary because I doubt it would be there otherwise, but I figure del <variable> is probably better for that purpose.
|
TODO: As per @goosemania, docs need to be updated https://docs.pulpproject.org/plugins/pulp_rpm/user-guide/features.html#advanced-copy-between-repositories |
| failed_units |= units - associated_units | ||
| if associated_units: | ||
| repo_controller.update_last_unit_added(dst_repo.repo_id) | ||
| repo_controller.rebuild_content_unit_counts(dst_repo) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This technically only needs to happen for the additional repos, not the primary destination repo, but it doesn't hurt anything to do it twice. If it's an expensive operation then we can add that check back, but I don't know how expensive it is.
| @@ -118,6 +118,7 @@ | |||
| CONFIG_SSL_AUTH_CA_CERT = 'ssl_auth_ca_cert' | |||
|
|
|||
| # Copy operation config | |||
| CONFIG_ADDITIONAL_REPOS = 'additional_repos' | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need to add this option to technical reference docs https://docs.pulpproject.org/plugins/pulp_rpm/tech-reference/yum-plugins.html#yum-importer
| additional_repos = config.get(constants.CONFIG_ADDITIONAL_REPOS, {}) | ||
| if additional_repos and not recursive: | ||
| # TODO: is there a better error to raise? | ||
| raise ValueError("Cannot use additional_repos without one of the recursive flags set") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think this fine if we provide some kind of source/docs were users can learn about additional_repos
c67b03a
to
c62f190
Compare
efbe664
to
4dd4245
Compare
Depsolving has to happen *after* all the direct child units are discovered. It also can't happen inside copy_rpms() because that conflates depsolving and copying, which is a problem. We need these to be separate operations. re: #5067 https://pulp.plan.io/issues/5067
We previously decided that a module's module-default unit always had to be copied regardless of which stream is copied, but somehow I missed that. [noissue]
Don't update the content counts if it's the "normal" destination repository because core will handle it for us and it'll be done twice. [noissue]
These types have to be cloned instead of copied which means that the unit passed in and the unit being associated are not the same. So the code thinks that the original unit was never associated. Longstanding issue revealed by refactoring. [noissue]
4dd4245
to
0dca1dd
Compare
docs/user-guide/features.rst
Outdated
| @@ -86,9 +86,20 @@ In a simple case the result of such API call copies unit and units directly rela | |||
| There are more complicated relations and behavior may vary depending on the content | |||
| and the type of a recursive flag. More information about such cases below. | |||
|
|
|||
| Additionally, newer versions versions of Fedora and RHEL/CentOS have introduced modular repositories, | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/versions//
6778645
to
c3c117f
Compare
c3c117f
to
3c02159
Compare
[noissue]
50771f9
to
6a2d2d7
Compare
| # and failure on an assert statement here | ||
| # https://github.com/openSUSE/libsolv/blob/master/src/solver.c#L1979-L1981 | ||
| ) | ||
| # Only loads the original source and destination repositories |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please update the comment with the latest behavior.
[noissue]
Avoid hundreds of warnings spammed about qt5-qtbase-0:5.11.1-i686.i686 has inferior architecture and so forth [noissue]
2ac42f9
to
d03b8de
Compare
multi-repo copy
This PR has a companion here: pulp/pulp#3947
This PR is best reviewed as a series of individual commits.
Testing this PR (manually)
Create 4 repositories:
Personally, I used these two Fedora repositories:
The former repo depends on stuff in the latter repo.
Make a direct API request -- there is no
pulp-adminsupport for this feature at the momentcurl -k -u admin:admin --cert ~/.pulp/user-cert.pem -d '{"source_repo_id":"modular-release","criteria":{"type_ids":["modulemd"],"filters":{"unit":{"$and":[{"name":"mariadb","stream":"10.4"}]}}},"override_config":{"recursive_conservative":true,"additional_repos":{"release": "test2"}}}' -H "Content-Type: application/json" -X POST https://localhost/pulp/api/v2/repositories/test1/actions/associate/In this request, I copied the module
mariadbstream10.4from repomodular-releaseto repotest1while using an override config that specifies"recursive_conservative": trueand"additional_repos":{"release": "test2"}."additional_repos":{"release": "test2"}means that the reporeleasewill be loaded into the depsolver also, and any dependencies found inreleasewill be copied to repotest2instead of the normal destination repository.Results (as per
pulp-admin rpm repo list):Design decisions
I put the
additional_reposkey in the override config becauserecursiveandrecursive_conservativeare also in the override config, because the RPM plugin is the only plugin that will use this and therefore this is the least invasive way to make sure no other plugins are affected. Information in theadditional_reposkey has to be used by both Pulp 2 core and by the plugin, and so it has to be passed up through every single layer of the API -- and the config already is.This PR does a significant amount of refactoring and cleans up the copy code quite a bit. Several existing, undiscovered bugs were discovered in the process of the refactoring which can be found in the individual commits.
Among the significant changes:
associate.py::associate()is no longer called recursivelyassociate.py::copy_rpms(), called repeatedly as more and more RPM children of errata, groups, etc. are discovered. Instead, the children are found first, combined into one single set of units along with the original ones from which the children were found, then combined again with the dependencies discovered via depsolving (which only happens once), which are then associated with the target repo(s) all at once and in one place.Future work
There are no smash tests (or unit tests) that currently cover the
additional_reposkey. Since it's difficult to test manually this will be the only realistic way to really verify the behavior is correct.