Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

953665 - added the ability for copy operations to not also copy child un... #181

Merged
merged 3 commits into from Apr 19, 2013
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
8 changes: 8 additions & 0 deletions docs/tech-reference/yum-plugins.rst
Expand Up @@ -423,6 +423,14 @@ configuration values are optional.
Number of times to retry before declaring an error during repository synchronization;
defaults to ``2``.

``copy_children``
Supported only as an override config option to a repository copy command, when
this option is False, the copy command will not attempt to locate and copy child
packages of errata, groups, or categories. For example, if it is already known
that all of a group's RPMs are available in the destination repository, it can
save substantial time to set this to False and thus not have the importer verify
the presence of each. default is True.

Yum Distributor
===============

Expand Down
12 changes: 12 additions & 0 deletions docs/user-guide/release-notes.rst
Expand Up @@ -7,6 +7,16 @@ Pulp 2.1.1

Pulp 2.1.1 is a bugfix release that also comes with a few performance improvements.

Changes
-------

A new config option may be passed when making a copy request between two
repositories. If ``copy_children=False`` is passed as part of the override
config when copying errata, groups, or categories, then the child packages of
those units will not be copied. This saves time when copying an entire
repository, because for example, you can tell the group copy operation not to
worry about finding and retrieving the RPMs referenced by each group.

Noteworthy Bugs Fixed
---------------------

Expand All @@ -21,6 +31,8 @@ PULP_MANIFEST correctly
`949008 <https://bugzilla.redhat.com/show_bug.cgi?id=949008>`_ - The ISO importer set the SSL_VERIFY_HOST value
to 1, when it should be 2

`953665 <https://bugzilla.redhat.com/show_bug.cgi?id=953665>`_ - Copying large repo uses tons of RAM and takes too long

Upgrade Instructions
--------------------

Expand Down
30 changes: 22 additions & 8 deletions pulp_rpm/plugins/importers/yum_importer/importer.py
Expand Up @@ -23,12 +23,12 @@
from yum_importer.comps import ImporterComps
from yum_importer.errata import ImporterErrata, link_errata_rpm_units
from yum_importer.importer_rpm import ImporterRPM, get_existing_units, form_lookup_key
from pulp.server.db.model.criteria import UnitAssociationCriteria, Criteria
from pulp.server.db.model.criteria import UnitAssociationCriteria
from pulp.plugins.importer import Importer
from pulp.plugins.model import Unit, SyncReport
from pulp.plugins.model import SyncReport
from pulp_rpm.common.ids import TYPE_ID_IMPORTER_YUM, TYPE_ID_PKG_GROUP, TYPE_ID_PKG_CATEGORY, TYPE_ID_DISTRO,\
TYPE_ID_DRPM, TYPE_ID_ERRATA, TYPE_ID_RPM, TYPE_ID_SRPM
from pulp_rpm.common import constants
from pulp_rpm.common import constants, ids
from pulp_rpm.yum_plugin import util, depsolver, metadata
from pulp_rpm.yum_plugin.metadata import get_package_xml

Expand Down Expand Up @@ -275,7 +275,7 @@ def import_units(self, source_repo, dest_repo, import_conduit, config, units=Non
@type import_conduit: L{pulp.plugins.conduits.unit_import.ImportUnitConduit}

@param config: plugin configuration
@type config: L{pulp.plugins.plugins.config.PluginCallConfiguration}
@type config: L{pulp.plugins.config.PluginCallConfiguration}

@param units: optional list of pre-filtered units to import
@type units: list of L{pulp.plugins.data.Unit}
Expand All @@ -285,7 +285,12 @@ def import_units(self, source_repo, dest_repo, import_conduit, config, units=Non
units = import_conduit.get_source_units()
blacklist_units = self._query_blacklist_units(import_conduit, config)
_LOG.info("Importing %s units from %s to %s" % (len(units), source_repo.id, dest_repo.id))
existing_rpm_units_dict = get_existing_units(import_conduit, criteria=UnitAssociationCriteria(type_ids=[TYPE_ID_RPM, TYPE_ID_SRPM]))
# don't get this stuff if we aren't going to use it
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

previously we would load ALL of the units in the source repo into RAM even if we never actually used this data. Now we check to see if we intend to use it or not.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This certainly looks like an improvement but still concerned that if we determine that we need the existing units, that we're still loading all of the existing units into memory. Perhaps no way around it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm completely with you on that concern. Limiting which fields are loaded (line 292 below) helps a lot with memory use at least. I have a better overall solution in the works in the new importer, but it would take a lot of work to re-implement it here. And considering that this code will all be replaced anyway, my stance is that we shouldn't invest in this any more than we must.

if config.get('resolve_dependencies') or config.get(constants.CONFIG_COPY_CHILDREN, True):
criteria = UnitAssociationCriteria(type_ids=[TYPE_ID_RPM, TYPE_ID_SRPM], unit_fields=ids.UNIT_KEY_RPM)
existing_rpm_units_dict = get_existing_units(import_conduit, criteria=criteria)
else:
existing_rpm_units_dict = {}
for u in units:
if u.unit_key in blacklist_units:
continue
Expand Down Expand Up @@ -358,11 +363,14 @@ def _import_errata_unit_rpms(self, source_repo, erratum_unit, import_conduit, co
@type import_conduit: L{pulp.plugins.conduits.unit_import.ImportUnitConduit}

@param config: plugin configuration
@type config: L{pulp.plugins.plugins.config.PluginCallConfiguration}
@type config: L{pulp.plugins.config.PluginCallConfiguration}

@param existing_rpm_units: optional list of pre-filtered units to import
@type existing_rpm_units: list of L{pulp.plugins.data.Unit}
"""
if not config.get(constants.CONFIG_COPY_CHILDREN, True):
return

pkglist = erratum_unit.metadata['pkglist']
existing_rpm_units = existing_rpm_units or {}
for pkg in pkglist:
Expand Down Expand Up @@ -405,6 +413,9 @@ def _import_pkg_category_unit(self, source_repo, dest_repo, pkg_category_unit, i
pkg_category_unit.unit_key['repo_id'] = dest_repo.id
import_conduit.save_unit(pkg_category_unit)

if not config.get(constants.CONFIG_COPY_CHILDREN, True):
return

pkg_group_unit_ids = pkg_category_unit.metadata['packagegroupids']
for pgid in pkg_group_unit_ids:
criteria = UnitAssociationCriteria(type_ids=[TYPE_ID_PKG_GROUP], unit_filters={'id' : pgid})
Expand Down Expand Up @@ -439,12 +450,15 @@ def _import_pkg_group_unit(self, source_repo, dest_repo, pkg_group_unit, import_
pkg_group_unit.unit_key['repo_id'] = dest_repo.id
import_conduit.save_unit(pkg_group_unit)

if not config.get(constants.CONFIG_COPY_CHILDREN, True):
return

mandatory_pkg_names = pkg_group_unit.metadata["mandatory_package_names"] or []
optional_pkg_names = pkg_group_unit.metadata["optional_package_names"] or []
conditional_pkg_names = [cond_pkg[0] for cond_pkg in pkg_group_unit.metadata["conditional_package_names"]]
default_pkg_names = pkg_group_unit.metadata['default_package_names'] or []
for pname in mandatory_pkg_names + optional_pkg_names + conditional_pkg_names + default_pkg_names:
criteria = UnitAssociationCriteria(type_ids=[TYPE_ID_RPM], unit_filters={'name' : pname})
criteria = UnitAssociationCriteria(type_ids=[TYPE_ID_RPM], unit_filters={'name' : pname}, unit_fields=ids.UNIT_KEY_RPM)
found_pkgs = import_conduit.get_source_units(criteria=criteria)
if not len(found_pkgs):
continue
Expand Down Expand Up @@ -679,7 +693,7 @@ def resolve_dependencies(self, repo, units, dependency_conduit, config):
solved, unsolved = dsolve.processResults(results)
dep_pkgs_map = {}
_LOG.debug(" results from depsolver %s" % results)
criteria = UnitAssociationCriteria(type_ids=[TYPE_ID_RPM])
criteria = UnitAssociationCriteria(type_ids=[TYPE_ID_RPM], unit_fields=ids.UNIT_KEY_RPM)
existing_units = get_existing_units(dependency_conduit, criteria)
for dep, pkgs in solved.items():
dep_pkgs_map[dep] = []
Expand Down
1 change: 1 addition & 0 deletions pulp_rpm/src/pulp_rpm/common/constants.py
Expand Up @@ -26,6 +26,7 @@
PUBLISHED_DISTRIBUTION_FILES_KEY = 'published_distributions'

# Importer configuration key names
CONFIG_COPY_CHILDREN = 'copy_children'
CONFIG_FEED_URL = 'feed_url'
CONFIG_MAX_SPEED = 'max_speed'
CONFIG_NUM_THREADS = 'num_threads'
Expand Down
21 changes: 21 additions & 0 deletions pulp_rpm/src/pulp_rpm/extension/admin/copy.py
Expand Up @@ -13,7 +13,11 @@
import sys

from pulp.bindings.exceptions import BadRequestException
from pulp.client import parsers
from pulp.client.commands.unit import UnitCopyCommand
from pulp.client.extensions.extensions import PulpCliOption

from pulp_rpm.common import ids, constants
from pulp_rpm.common.ids import TYPE_ID_RPM, TYPE_ID_SRPM, TYPE_ID_DRPM, TYPE_ID_ERRATA, TYPE_ID_DISTRO, TYPE_ID_PKG_GROUP, TYPE_ID_PKG_CATEGORY

# -- constants ----------------------------------------------------------------
Expand All @@ -26,6 +30,13 @@
DESC_PKG_GROUP = _('copy package groups from one repository to another')
DESC_PKG_CATEGORY = _('copy package categories from one repository to another')

COPY_CHILD_RPMS_OPTION = PulpCliOption(
'--copy-children', _('copy child RPMs (default: True)'),
required=False, default='true', parse_func=parsers.parse_boolean)

COPY_CHILD_GROUPS_OPTION = PulpCliOption(
'--copy-children', _('copy child groups (default: True)'),
required=False, default='true', parse_func=parsers.parse_boolean)

# -- commands -----------------------------------------------------------------

Expand All @@ -34,6 +45,9 @@ class RpmCopyCommand(UnitCopyCommand):
def __init__(self, context):
self.context = context
def rpm_copy(**kwargs):
# let's not load ALL of the metadata, as that could take a tremendous
# amount of RAM
kwargs['fields'] = ids.UNIT_KEY_RPM
return _copy(self.context, TYPE_ID_RPM, **kwargs)
super(RpmCopyCommand, self).__init__(rpm_copy, name='rpm', description=DESC_RPM)

Expand Down Expand Up @@ -63,6 +77,7 @@ def __init__(self, context):
def errata_copy(**kwargs):
return _copy(self.context, TYPE_ID_ERRATA, **kwargs)
super(ErrataCopyCommand, self).__init__(errata_copy, name='errata', description=DESC_ERRATA)
self.add_option(COPY_CHILD_RPMS_OPTION)


class DistributionCopyCommand(UnitCopyCommand):
Expand All @@ -81,6 +96,7 @@ def __init__(self, context):
def package_group_copy(**kwargs):
return _copy(self.context, TYPE_ID_PKG_GROUP, **kwargs)
super(PackageGroupCopyCommand, self).__init__(package_group_copy, name='group', description=DESC_PKG_GROUP)
self.add_option(COPY_CHILD_RPMS_OPTION)


class PackageCategoryCopyCommand(UnitCopyCommand):
Expand All @@ -90,6 +106,7 @@ def __init__(self, context):
def package_category_copy(**kwargs):
return _copy(self.context, TYPE_ID_PKG_CATEGORY, **kwargs)
super(PackageCategoryCopyCommand, self).__init__(package_category_copy, name='category', description=DESC_PKG_CATEGORY)
self.add_option(COPY_CHILD_GROUPS_OPTION)


def _copy(context, type_id, **kwargs):
Expand All @@ -108,6 +125,10 @@ def _copy(context, type_id, **kwargs):
from_repo = kwargs['from-repo-id']
to_repo = kwargs['to-repo-id']
kwargs['type_ids'] = [type_id]
# pass a custom override_config if this option is present
if COPY_CHILD_RPMS_OPTION.keyword in kwargs:
kwargs['override_config'] = {constants.CONFIG_COPY_CHILDREN: kwargs[COPY_CHILD_RPMS_OPTION.keyword]}
del kwargs[COPY_CHILD_RPMS_OPTION.keyword]

# If rejected an exception will bubble up and be handled by middleware.
# The only caveat is if the source repo ID is invalid, it will come back
Expand Down
48 changes: 40 additions & 8 deletions pulp_rpm/test/unit/server/test_import_units.py
Expand Up @@ -13,30 +13,27 @@
# http://www.gnu.org/licenses/old-licenses/gpl-2.0.txt.

import glob
import itertools
import mock
import os
import random
import shutil
import sys
import tempfile
import time
import unittest

from grinder.BaseFetch import BaseFetch
from pulp.plugins.config import PluginCallConfiguration
from pulp.plugins.model import Repository, Unit
import pycurl

sys.path.insert(0, os.path.abspath(os.path.dirname(__file__)) + "/../../../src/")
sys.path.insert(0, os.path.abspath(os.path.dirname(__file__)) + "/../../../plugins/importers/")

from yum_importer import importer_rpm
from yum_importer.importer import YumImporter
from pulp_rpm.common import constants
from pulp_rpm.common.ids import (TYPE_ID_RPM, UNIT_KEY_RPM, TYPE_ID_IMPORTER_YUM, TYPE_ID_ERRATA,
TYPE_ID_DISTRO, TYPE_ID_PKG_CATEGORY, TYPE_ID_PKG_GROUP)
from pulp_rpm.yum_plugin import util
from rpm_support_base import PULP_UNITTEST_REPO_URL, PulpRPMTests
import constants
import constants_test
import importer_mocks


Expand Down Expand Up @@ -551,6 +548,41 @@ def test_package_group_unit_import(self):
for u in verify_old_version_skipped:
self.assertFalse(u in associated_units)

def test_import_errata_unit_rpms_no_children(self):
importer = YumImporter()
config = PluginCallConfiguration({}, {}, {constants.CONFIG_COPY_CHILDREN: False})
mock_unit = mock.MagicMock()

importer._import_errata_unit_rpms(mock.MagicMock(), mock_unit, mock.MagicMock(),
config)
# if this attribute doesn't exist, then the method quit before doing any
# work, which is what we want
self.assertTrue('metadata' not in dir(mock_unit))

@mock.patch.object(YumImporter, '_safe_copy_unit', new=lambda self, x: x)
def test_import_pkg_category_unit_no_children(self):
importer = YumImporter()
config = PluginCallConfiguration({}, {}, {constants.CONFIG_COPY_CHILDREN: False})
mock_unit = mock.MagicMock()

importer._import_pkg_category_unit(mock.MagicMock(), mock.MagicMock(),
mock_unit, mock.MagicMock(), config)
# if this attribute doesn't exist, then the method quit before doing any
# work, which is what we want
self.assertTrue('metadata' not in dir(mock_unit))

@mock.patch.object(YumImporter, '_safe_copy_unit', new=lambda self, x: x)
def test_import_pkg_group_unit_no_children(self):
importer = YumImporter()
config = PluginCallConfiguration({}, {}, {constants.CONFIG_COPY_CHILDREN: False})
mock_unit = mock.MagicMock()

importer._import_pkg_group_unit(mock.MagicMock(), mock.MagicMock(),
mock_unit, mock.MagicMock(), config)
# if this attribute doesn't exist, then the method quit before doing any
# work, which is what we want
self.assertTrue('metadata' not in dir(mock_unit))


class TestImportDependencies(PulpRPMTests):

Expand Down Expand Up @@ -609,11 +641,11 @@ def existing_units(self):
# units.append(unit)

unit = Unit(TYPE_ID_RPM, self.UNIT_KEY_A, {}, '')
unit.metadata = constants.PULP_SERVER_RPM_METADATA
unit.metadata = constants_test.PULP_SERVER_RPM_METADATA
units.append(unit)

unit = Unit(TYPE_ID_RPM, self.UNIT_KEY_B, {}, '')
unit.metadata = constants.PULP_RPM_SERVER_RPM_METADATA
unit.metadata = constants_test.PULP_RPM_SERVER_RPM_METADATA
units.append(unit)
return units

Expand Down
6 changes: 3 additions & 3 deletions pulp_rpm/test/unit/server/test_resolve_dependencies.py
Expand Up @@ -23,7 +23,7 @@

sys.path.insert(0, os.path.abspath(os.path.dirname(__file__)) + "/../../../plugins/importers/")
import importer_mocks
import constants
import constants_test
from yum_importer.importer import YumImporter
from pulp_rpm.yum_plugin import util
from pulp.plugins.model import Repository, Unit
Expand Down Expand Up @@ -60,12 +60,12 @@ def test_resolve_deps(self):
'checksum': 'ee5afa0aaf8bd2130b7f4a9b35f4178336c72e95358dd33bda8acaa5f28ea6e9', 'type_id' : 'rpm'}

unit_key_a_obj = Unit(RPM_TYPE_ID, unit_key_a, {}, '')
unit_key_a_obj.metadata = constants.PULP_SERVER_RPM_METADATA
unit_key_a_obj.metadata = constants_test.PULP_SERVER_RPM_METADATA
unit_key_b = {'id' : '', 'name' :'pulp-rpm-server', 'version' :'0.0.309', 'release' :'1.fc17', 'epoch':'0','arch' : 'noarch', 'checksumtype' :'sha256',
'checksum': '1e6c3a3bae26423fe49d26930b986e5f5ee25523c13f875dfcd4bf80f770bf56', 'type_id' : 'rpm', }

unit_key_b_obj = Unit(RPM_TYPE_ID, unit_key_b, {}, '')
unit_key_b_obj.metadata = constants.PULP_RPM_SERVER_RPM_METADATA
unit_key_b_obj.metadata = constants_test.PULP_RPM_SERVER_RPM_METADATA
existing_units = []
for unit in [unit_key_a_obj, unit_key_b_obj]:
existing_units.append(unit)
Expand Down