Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for multiple input repositories to the solver #1407

Merged
merged 1 commit into from Jul 21, 2019

Conversation

dralley
Copy link
Contributor

@dralley dralley commented Jul 16, 2019

This PR does not add the ability to provide Pulp with multiple source repositories, it only provides the ability to provide the solver with multiple source repositories.

  • Add a new "additional_repos" kwarg to the Solver class
  • Make provisions for loading all of the new additional repos
  • Make all target repositories be loaded into one repo, since libsolv
    only supports a single "installed" repo
  • Make the "find_dependent_rpms" method return a dict of {'repo_id':
    set(<unit_ids>)} so that the upper-level copy code knows where the units
    came from, and hence, can determine where they need to go.

@pep8speaks
Copy link

pep8speaks commented Jul 16, 2019

Hello @dralley! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-07-21 01:32:15 UTC

@@ -57,7 +57,8 @@ def associate(source_repo, dest_repo, import_conduit, config, units=None, solver
solver = pulp_solv.Solver(
source_repo,
target_repo=dest_repo,
conservative=config.get(constants.CONFIG_RECURSIVE_CONSERVATIVE)
conservative=config.get(constants.CONFIG_RECURSIVE_CONSERVATIVE),
ignore_missing=False
Copy link
Contributor Author

@dralley dralley Jul 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may need to be reverted later, not sure yet. Our dummy-solvable-injecting code is what is causing the crashes, and disabling it (what this line does) fixes them and so far seems to give perfectly correct results, so while the necessity of that code should be investigated it makes sense to me to disable it for now.

The crashes currently happen on 2-master, not just my PR branches. Tests pass with this change, although they passed before too -- none of them triggered the crashes. This may have even been happening before any of the new changes were made.

In practice, I've gotten the correct results back from the solver even though it's reporting problems such as

'nothing provides rtld(GNU_HASH) needed by silver-0:1.0.5-x86_64.x86_64',
 'nothing provides libpthread.so.0()(64bit) needed by silver-0:1.0.5-x86_64.x86_64',
 'nothing provides libpthread.so.0(GLIBC_2.2.5)(64bit) needed by silver-0:1.0.5-x86_64.x86_64',
 'nothing provides libcrypto.so.1.1()(64bit) needed by silver-0:1.0.5-x86_64.x86_64',
 'nothing provides libdl.so.2()(64bit) needed by silver-0:1.0.5-x86_64.x86_64',
 'nothing provides libdl.so.2(GLIBC_2.2.5)(64bit) needed by silver-0:1.0.5-x86_64.x86_64',

The dummy-solvable-injecting code tries to address these, but it seems like we got the correct answer without addressing them anyways.

I can move it into a different commit, though, if you want.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dralley , +1 to mark it somehow. Maybe an inline comment is enough? So someone else and not only you :) would be able to find the spot.


ret = collections.defaultdict(set)
for repo, unit_ids in repo_unit_map.items():
for paginated_unit_ids in misc_utils.paginate(unit_ids):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need to paginate, when all unit_ids are already in memory? Is it because mongo queries below can be too large and exceed BSON limits?

Copy link
Contributor Author

@dralley dralley Jul 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pagination was there when I inherited it. I expect that it's the latter reason, yeah. I don't know for certain that it's necessary but I'd rather leave it.

self.target_repo = target_repo
self.primary_source_repo = source_repo
self.primary_target_repo = target_repo
self._additional_repo_mapping = {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to set it to the value of additional_repos parameter and maybe make a default value of additional_repos an empty dict?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...yeah, whoops.


def find_dependent_rpms(self, units):
"""Finds the set of dependent units and returns them in a dictionary where
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/Finds/Find/
s/returns/return/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -253,8 +254,9 @@ def copy_rpms(units, source_repo, dest_repo, import_conduit, config, solver=None
if solver:
# This returns units that have a flattened 'provides' metadata field
# for memory purposes (RHBZ #1185868)
deps = solver.find_dependent_rpms(unit_set)
unit_set |= deps
repo_unit_set = solver.find_dependent_rpms(unit_set)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think repo_unit_set is not used for identifying which unit to copy to which target repo and currently it copies everything into one. Is it intentional? Do you think it's out of the scope of this PR?
If yes, then we should make it clear on the issue that this is one of the items which remains to be implemented.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it intentional? Do you think it's out of the scope of this PR?

Yes x2. I'll leave a comment.

Copy link
Member

@goosemania goosemania left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, @dralley ! 🎉

* Add a new "additional_repos" kwarg to the Solver class
* Make provisions for loading all of the new additional repos
* Make all target repositories be loaded into one repo, since libsolv
only supports a single "installed" repo
* Make the "find_dependent_rpms" method return a dict of {'repo_id':
set(<unit_ids>)} so that the upper-level copy code knows where the units
came from, and hence, can determine where they need to go.

re: #5067
https://pulp.plan.io/issues/5067
@dralley dralley merged commit 48ba9c4 into pulp:2-master Jul 21, 2019
@dralley dralley deleted the enable-multi-source branch July 21, 2019 01:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants