Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix URL in catalog entry created for existing package. #995

Merged
merged 1 commit into from Oct 21, 2016
Merged

Fix URL in catalog entry created for existing package. #995

merged 1 commit into from Oct 21, 2016

Conversation

jortel
Copy link
Contributor

@jortel jortel commented Oct 19, 2016

https://pulp.plan.io/issues/2354

Adding catalog entries in 2.8 really just got shoehorned in as best we could but the code organization and procedural flows made it very difficult. This patch is not pretty but resolves the issue.

@mention-bot
Copy link

@jortel, thanks for your PR! By analyzing the history of the files in this pull request, we identified @mhrivnak, @ipanova and @barnabycourt to be potential reviewers.

@mhrivnak
Copy link
Contributor

For expedience, here's a rough, thrown-together, completely-untested and likely incomplete patch that at least illustrates how I think this could work.

diff --git a/plugins/pulp_rpm/plugins/importers/yum/existing.py b/plugins/pulp_rpm/plugins/importers/yum/existing.py
index 79c2355..f8ec3f8 100644
--- a/plugins/pulp_rpm/plugins/importers/yum/existing.py
+++ b/plugins/pulp_rpm/plugins/importers/yum/existing.py
@@ -100,8 +100,9 @@ def check_all_and_associate(wanted, conduit, config, download_deferred, catalog)
     also associates the unit to the given repo. Note that the check for the actual file
     is performed only for the supported unit types.

-    :param wanted:            iterable of units as namedtuples
-    :type  wanted:            iterable
+    :param wanted:            dict where keys are units as namedtuples, and values are
+                              WantedUnitInfo instances
+    :type  wanted:            dict
     :param conduit:           repo sync conduit
     :type  conduit:           pulp.plugins.conduits.repo_sync.RepoSync
     :param config:            configuration instance passed to the importer
@@ -115,7 +116,7 @@ def check_all_and_associate(wanted, conduit, config, download_deferred, catalog)
                 named tuples received as input were not found on the server.
     :rtype:     set
     """
-    sorted_units = _sort_by_type(wanted)
+    sorted_units = _sort_by_type(wanted.iterkeys())
     for unit_type, values in sorted_units.iteritems():
         model = plugin_api.get_unit_model_by_id(unit_type)
         # FIXME "fields" does not get used, but it should
@@ -134,7 +135,7 @@ def check_all_and_associate(wanted, conduit, config, download_deferred, catalog)
                 # package file does not exist and downloading is not deferred.
                 if not download_deferred and not file_exists:
                     continue
-            catalog.add(unit)
+            catalog.add(unit, wanted[unit.unit_key_as_named_tuple].relative_path)
             if rpm_parse.signature_enabled(config):
                 try:
                     rpm_parse.filter_signature(unit, config)
diff --git a/plugins/pulp_rpm/plugins/importers/yum/sync.py b/plugins/pulp_rpm/plugins/importers/yum/sync.py
index db2f8b6..d57e637 100644
--- a/plugins/pulp_rpm/plugins/importers/yum/sync.py
+++ b/plugins/pulp_rpm/plugins/importers/yum/sync.py
@@ -8,6 +8,7 @@ import shutil
 import tempfile
 import traceback

+from collections import namedtuple
 from gettext import gettext as _
 from cStringIO import StringIO
 from urlparse import urljoin
@@ -40,6 +41,9 @@ from pulp_rpm.plugins.importers.yum.utils import RepoURLModifier
 _logger = logging.getLogger(__name__)


+WantedUnitInfo = namedtuple('WantedUnitInfo', ('size', 'relative_path'))
+
+
 class RepoSync(object):

     def __init__(self, repo, conduit, config):
@@ -559,11 +563,11 @@ class RepoSync(object):
             # check for the units that are not in the repo, but exist on the server
             # and associate them to the repo
             to_download = existing.check_all_and_associate(
-                wanted.iterkeys(), self.conduit, self.config, self.download_deferred, catalog)
+                wanted, self.conduit, self.config, self.download_deferred, catalog)
             count = len(to_download)
             size = 0
             for unit in to_download:
-                size += wanted[unit]
+                size += wanted[unit].size
             return to_download, count, size
         finally:
             primary_file_handle.close()
@@ -603,11 +607,11 @@ class RepoSync(object):
                     # check for the units that are not in the repo, but exist on the server
                     # and associate them to the repo
                     to_download = existing.check_all_and_associate(
-                        wanted.iterkeys(), self.conduit, self.config, self.download_deferred,
+                        wanted, self.conduit, self.config, self.download_deferred,
                         catalog)
                     count += len(to_download)
                     for unit in to_download:
-                        size += wanted[unit]
+                        size += wanted[unit].size
                 finally:
                     presto_file_handle.close()

@@ -963,24 +967,24 @@ class RepoSync(object):
         for model in package_info_generator:
             versions = wanted.setdefault(model.key_string_without_version, {})
             serialized_version = model.complete_version_serialized
-            size = model.size
+            info = WantedUnitInfo(model.size, model.download_path)

             # if we are limited on the number of old versions we can have,
             if number_old_versions_to_keep is not None:
                 number_to_keep = number_old_versions_to_keep + 1
                 if len(versions) < number_to_keep:
-                    versions[serialized_version] = (model.unit_key_as_named_tuple, size)
+                    versions[serialized_version] = (model.unit_key_as_named_tuple, info)
                 else:
                     smallest_version = sorted(versions.keys(), reverse=True)[:number_to_keep][-1]
                     if serialized_version > smallest_version:
                         del versions[smallest_version]
-                        versions[serialized_version] = (model.unit_key_as_named_tuple, size)
+                        versions[serialized_version] = (model.unit_key_as_named_tuple, info)
             else:
-                versions[serialized_version] = (model.unit_key_as_named_tuple, size)
+                versions[serialized_version] = (model.unit_key_as_named_tuple, info)
         ret = {}
         for units in wanted.itervalues():
-            for unit, size in units.itervalues():
-                ret[unit] = size
+            for unit, info in units.itervalues():
+                ret[unit] = info

         return ret

@@ -517,8 +517,20 @@ def _decide_rpms_to_download(self, metadata_files, catalog):
wanted = self._identify_wanted_versions(package_info_generator)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than iterate through primary.xml a third time, which can be costly on large repos, I think we can do this in the identify_wanted_versions method.

Strictly adhering to its purpose, that method should reasonably return a list or generator of units. But since we need to know the size of each unit, and it's convenient to get the size while iterating there, it returns a dict where the value is the size. I suggest adding the relative path there, alongside the size, so that all the info you need is in the wanted dict. Then in the call to check_all_and_associate, you could pass that entire dict as the first argument instead of just its keys.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a much better solution.

@mhrivnak
Copy link
Contributor

Looks great. I'm actually a bit surprised at how much it resembles my patch, since I just threw that together for illustrative purposes. I thought for sure I must have forgotten or screwed up something. :)

"""
Add the specified content unit to the catalog.

:param unit: A unit being added.
:type unit: pulp_rpm.plugins.db.models.RpmBase
:param path: The path component of the download URL.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is close, but not quite accurate. It's the relative path within the base path of the repo URL, right? Not the entire path component of the URL?

@jortel jortel merged commit 05bd3ae into pulp:2.8-dev Oct 21, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants