Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

As a user, multiple source/target repositories can be used for recursive copy #1420

Merged
merged 11 commits into from Sep 10, 2019

Conversation

dralley
Copy link
Contributor

@dralley dralley commented Aug 19, 2019

multi-repo copy

This PR has a companion here: pulp/pulp#3947

This PR is best reviewed as a series of individual commits.

Testing this PR (manually)

Create 4 repositories:

  • 1 repository must contain modules or RPMs that depend on other RPMs in the second repo
  • 1 repository must contain all of the RPM dependencies needed by the first repo
  • 2 empty repositories

Personally, I used these two Fedora repositories:

The former repo depends on stuff in the latter repo.

Make a direct API request -- there is no pulp-admin support for this feature at the moment

curl -k -u admin:admin --cert ~/.pulp/user-cert.pem -d '{"source_repo_id":"modular-release","criteria":{"type_ids":["modulemd"],"filters":{"unit":{"$and":[{"name":"mariadb","stream":"10.4"}]}}},"override_config":{"recursive_conservative":true,"additional_repos":{"release": "test2"}}}' -H "Content-Type: application/json" -X POST https://localhost/pulp/api/v2/repositories/test1/actions/associate/

In this request, I copied the module mariadb stream 10.4 from repo modular-release to repo test1 while using an override config that specifies "recursive_conservative": true and "additional_repos":{"release": "test2"}. "additional_repos":{"release": "test2"} means that the repo release will be loaded into the depsolver also, and any dependencies found in release will be copied to repo test2 instead of the normal destination repository.

Results (as per pulp-admin rpm repo list):

Id:                  test1
Display Name:        None
Description:         None
Content Unit Counts:
  Modulemd:          1
  Modulemd Defaults: 1
  Rpm:               19
 
Id:                  test2
Display Name:        None
Description:         None
Content Unit Counts:
  Rpm: 238

Design decisions

I put the additional_repos key in the override config because recursive and recursive_conservative are also in the override config, because the RPM plugin is the only plugin that will use this and therefore this is the least invasive way to make sure no other plugins are affected. Information in the additional_repos key has to be used by both Pulp 2 core and by the plugin, and so it has to be passed up through every single layer of the API -- and the config already is.

This PR does a significant amount of refactoring and cleans up the copy code quite a bit. Several existing, undiscovered bugs were discovered in the process of the refactoring which can be found in the individual commits.

Among the significant changes:

  • associate.py::associate() is no longer called recursively
  • depsolving is no longer a process that occurs as the RPMs are copied inside associate.py::copy_rpms(), called repeatedly as more and more RPM children of errata, groups, etc. are discovered. Instead, the children are found first, combined into one single set of units along with the original ones from which the children were found, then combined again with the dependencies discovered via depsolving (which only happens once), which are then associated with the target repo(s) all at once and in one place.
    • the changes above enabled the simple copy branch to be combined with recursive copy -- only the depsolving step is skipped. this reduces a lot of duplicate code.
  • discovery of RPM children of groups now happens iteratively instead of recursively

Future work

There are no smash tests (or unit tests) that currently cover the additional_repos key. Since it's difficult to test manually this will be the only realistic way to really verify the behavior is correct.

* .extend() is not a method that exists on sets
* whether a repository has been loaded is debug information, shouldn't
be logged normally

[noissue]
@dralley dralley force-pushed the copy-modulemd branch 3 times, most recently from 7b460e7 to 94aab7f Compare August 22, 2019 20:07
@pep8speaks
Copy link

pep8speaks commented Aug 22, 2019

Hello @dralley! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-09-10 18:28:02 UTC

@dralley dralley force-pushed the copy-modulemd branch 2 times, most recently from 47cbda1 to fcc2145 Compare August 23, 2019 02:11
@@ -461,7 +461,7 @@ def _associate_unit(dest_repo, unit, config):
elif isinstance(unit, models.RPM):
# copy will happen in one batch
return unit
elif isinstance(unit, (models.YumMetadataFile, models.ModulemdDefaults)):
elif isinstance(unit, models.YumMetadataFile):

This comment has been hidden.

This comment has been hidden.

This comment has been hidden.

self.get_unique_modules() returns a generator. By iterating over it, it
gets exhausted making this function useless.

[noissue]
After units pass through the solver they come out the other end via some
mongodb queries using only(). We need to make sure the storage path is
set because it's needed in the association stage.

[noissue]
@dralley dralley force-pushed the copy-modulemd branch 3 times, most recently from 095e973 to d4f95b3 Compare August 27, 2019 21:16

# ------ get RPM children of errata and modules ------
wanted_rpms = get_rpms_to_copy_by_key(rpm_search_dicts, import_conduit, source_repo)
rpm_search_dicts = None
rpms_to_copy = filter_available_rpms(wanted_rpms, import_conduit, source_repo)
associated_units |= copy_rpms(rpms_to_copy, source_repo, dest_repo, import_conduit, config,
solver)
rpms_to_copy = None
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw this in a bunch of places and a couple of comments have mentioned that it's for "garbage collection". I assume it's necessary because I doubt it would be there otherwise, but I figure del <variable> is probably better for that purpose.

@dralley
Copy link
Contributor Author

dralley commented Aug 28, 2019

failed_units |= units - associated_units
if associated_units:
repo_controller.update_last_unit_added(dst_repo.repo_id)
repo_controller.rebuild_content_unit_counts(dst_repo)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This technically only needs to happen for the additional repos, not the primary destination repo, but it doesn't hurt anything to do it twice. If it's an expensive operation then we can add that check back, but I don't know how expensive it is.

@@ -118,6 +118,7 @@
CONFIG_SSL_AUTH_CA_CERT = 'ssl_auth_ca_cert'

# Copy operation config
CONFIG_ADDITIONAL_REPOS = 'additional_repos'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

additional_repos = config.get(constants.CONFIG_ADDITIONAL_REPOS, {})
if additional_repos and not recursive:
# TODO: is there a better error to raise?
raise ValueError("Cannot use additional_repos without one of the recursive flags set")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think this fine if we provide some kind of source/docs were users can learn about additional_repos

@dralley dralley force-pushed the copy-modulemd branch 4 times, most recently from c67b03a to c62f190 Compare August 30, 2019 17:17
@dralley dralley force-pushed the copy-modulemd branch 3 times, most recently from efbe664 to 4dd4245 Compare September 3, 2019 15:55
Depsolving has to happen *after* all the direct child units are discovered. It
also can't happen inside copy_rpms() because that conflates depsolving
and copying, which is a problem. We need these to be separate
operations.

re: #5067
https://pulp.plan.io/issues/5067
We previously decided that a module's module-default unit always had to be
copied regardless of which stream is copied, but somehow I missed that.

[noissue]
Don't update the content counts if it's the "normal" destination
repository because core will handle it for us and it'll be done twice.

[noissue]
These types have to be cloned instead of copied which means that the
unit passed in and the unit being associated are not the same. So the
code thinks that the original unit was never associated. Longstanding
issue revealed by refactoring.

[noissue]
@@ -86,9 +86,20 @@ In a simple case the result of such API call copies unit and units directly rela
There are more complicated relations and behavior may vary depending on the content
and the type of a recursive flag. More information about such cases below.

Additionally, newer versions versions of Fedora and RHEL/CentOS have introduced modular repositories,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/versions//

# and failure on an assert statement here
# https://github.com/openSUSE/libsolv/blob/master/src/solver.c#L1979-L1981
)
# Only loads the original source and destination repositories
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update the comment with the latest behavior.

Avoid hundreds of warnings spammed about
qt5-qtbase-0:5.11.1-i686.i686 has inferior architecture
and so forth

[noissue]
@dralley dralley merged commit b9f593d into pulp:2-master Sep 10, 2019
@dralley dralley deleted the copy-modulemd branch September 10, 2019 19:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants