Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sync from a mirror list feed if provided #1690

Merged
merged 1 commit into from Jun 22, 2020
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
2 changes: 1 addition & 1 deletion .travis/post_before_script.sh
Expand Up @@ -18,4 +18,4 @@ cmd_prefix bash -c "django-admin shell -c \"${CREATE_SIGNING_SERVICE}\""
echo "machine pulp
login admin
password password
" > ~/.netrc
" > ~/.netrc
1 change: 1 addition & 0 deletions CHANGES/6225.feature
@@ -0,0 +1 @@
Added support for syncing from a mirror list feed
2 changes: 1 addition & 1 deletion coverage.md
Expand Up @@ -12,7 +12,7 @@ Manual Coverage
| As a user, I can sync all yum content types ( NO drpm) in additive mode (default) | PART | checking counts of packages and advisories |
| As a user, I can sync and skip specific type (srpm) | NO | |
| As a user, I can sync opensuse repository | NO | |
| As a user, I can sync from a mirror list | NO | not merged yet |
| As a user, I can sync from a mirror list | YES | |
| **Duplicates** | | |
| As a user, I have only one advisory with the same id in a repo version | YES | |
| As a user, I have only one module with the same NSVCA in a repo version | NO | |
Expand Down
2 changes: 1 addition & 1 deletion docs/_static/api.json

Large diffs are not rendered by default.

5 changes: 5 additions & 0 deletions docs/workflows/create_sync_publish.rst
Expand Up @@ -57,6 +57,11 @@ Remote GET response:
"url": "https://fixtures.pulpproject.org/rpm-unsigned/"
}

While creating a new remote, you may set the field ``url`` to point to a mirror list feed. Pulp
fetches the list of available mirrors and tries to download content from the first valid mirror.
This means that whenever an error occurs during the synchronization, the whole sync process ends
with an error too.

.. _versioned-repo-created:

Configuration for SLES 12+ repository with authentication
Expand Down
3 changes: 1 addition & 2 deletions pulp_rpm/app/kickstart/treeinfo.py
Expand Up @@ -9,13 +9,12 @@
from productmd.treeinfo import TreeInfo


def get_treeinfo_data(remote):
def get_treeinfo_data(remote, remote_url):
"""
Get Treeinfo data from remote.

"""
treeinfo_serialized = {}
remote_url = remote.url if remote.url[-1] == "/" else f"{remote.url}/"
namespaces = [".treeinfo", "treeinfo"]
for namespace in namespaces:
downloader = remote.get_downloader(url=urljoin(remote_url, namespace))
Expand Down
48 changes: 42 additions & 6 deletions pulp_rpm/app/tasks/synchronizing.py
Expand Up @@ -2,6 +2,7 @@
import gzip
import logging
import os
import re

from collections import defaultdict
from functools import partial
Expand Down Expand Up @@ -100,6 +101,36 @@ def repodata_exists(remote, url):
return True


def fetch_mirror(remote):
"""Fetch the first valid mirror from a list of all available mirrors from a mirror list feed.

URLs which are commented out or have any punctuations in front of them are being ignored.
"""
downloader = remote.get_downloader(url=remote.url.rstrip("/"))
result = downloader.fetch()

url_pattern = re.compile(r"(^|^[\w\s=]+\s)((http(s)?)://.*)")
with open(result.path) as mirror_list_file:
for mirror in mirror_list_file:
match = re.match(url_pattern, mirror)
if match and repodata_exists(remote, match.group(2)):
Copy link
Member

@ipanova ipanova Jun 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't think we should check whether the repodata for the url exists, i think we should just parse the file and add the urls to the list of feeds

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bmbouter i don't have a strong opinion on this approach, thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking we would do no parsing and try to use the URL blind. If the resultant works it works, if it fails try the next one.

Copy link
Member

@ipanova ipanova Jun 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there have been some discussion around what should be the behaviour and this is what we agreed on: 1) identify whether this is a mirrorlist 2) find url that matches the regexp and check whether it has repomd.xml, once such url has been found return it and use it in the the sync, no need to parse other urls 3) if further issues will appear during the sync with that url just fail the sync

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for mirror in mirror_list_file:
    match = re.match(url_pattern, mirror)
    if match and repodata_exists(remote, match.group(2)):
        mirror_list_feed.append(match.group(2))
        break

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can also get rid of the list and return the url instead

return match.group(2)

return None


def fetch_remote_url(remote):
"""Fetch a single remote from which can be content synced."""
remote_url = remote.url.rstrip("/") + "/"
downloader = remote.get_downloader(url=urljoin(remote_url, "repodata/repomd.xml"))
try:
downloader.fetch()
except (ClientResponseError, FileNotFoundError):
return fetch_mirror(remote)
else:
return remote_url


def synchronize(remote_pk, repository_pk, mirror, skip_types, optimize):
"""
Sync content from the remote repository.
Expand Down Expand Up @@ -127,7 +158,13 @@ def synchronize(remote_pk, repository_pk, mirror, skip_types, optimize):

deferred_download = (remote.policy != Remote.IMMEDIATE) # Interpret download policy

treeinfo = get_treeinfo_data(remote)
remote_url = fetch_remote_url(remote)
if not remote_url:
raise ValueError(_("A no valid remote URL was provided."))
else:
remote_url = remote_url.rstrip("/") + "/"

treeinfo = get_treeinfo_data(remote, remote_url)
if treeinfo:
treeinfo["repositories"] = {}
for repodata in set(treeinfo["download"]["repodatas"]):
Expand All @@ -142,7 +179,7 @@ def synchronize(remote_pk, repository_pk, mirror, skip_types, optimize):
sub_repo.save()
treeinfo["repositories"].update({repodata: str(sub_repo.pk)})
path = f"{repodata}/"
new_url = urljoin(remote.url, path)
new_url = urljoin(remote_url, path)
if repodata_exists(remote, new_url):
stage = RpmFirstStage(
remote,
Expand All @@ -161,7 +198,8 @@ def synchronize(remote_pk, repository_pk, mirror, skip_types, optimize):
deferred_download,
optimize=optimize,
skip_types=skip_types,
treeinfo=treeinfo)
treeinfo=treeinfo,
new_url=remote_url)
dv = RpmDeclarativeVersion(first_stage=first_stage,
repository=repository,
mirror=mirror)
Expand Down Expand Up @@ -317,8 +355,7 @@ def newpkgcb(pkgId, name, arch):

async def run(self):
"""Build `DeclarativeContent` from the repodata."""
remote_url = self.new_url or self.remote.url
self.data.remote_url = remote_url if remote_url[-1] == "/" else f"{remote_url}/"
self.data.remote_url = self.new_url or self.remote.url

progress_data = dict(message='Downloading Metadata Files', code='downloading.metadata')
with ProgressReport(**progress_data) as metadata_pb:
Expand All @@ -327,7 +364,6 @@ async def run(self):
downloader = self.remote.get_downloader(
url=urljoin(self.data.remote_url, 'repodata/repomd.xml')
)
# TODO: decide how to distinguish between a mirror list and a normal repo
result = await downloader.run()
metadata_pb.increment()

Expand Down
37 changes: 37 additions & 0 deletions pulp_rpm/tests/functional/api/test_sync.py
Expand Up @@ -37,6 +37,8 @@
RPM_INVALID_FIXTURE_URL,
RPM_KICKSTART_FIXTURE_SUMMARY,
RPM_KICKSTART_FIXTURE_URL,
RPM_MIRROR_LIST_GOOD_FIXTURE_URL,
RPM_MIRROR_LIST_BAD_FIXTURE_URL,
RPM_MODULAR_FIXTURE_SUMMARY,
RPM_MODULAR_FIXTURE_URL,
RPM_PACKAGE_CONTENT_NAME,
Expand Down Expand Up @@ -115,6 +117,41 @@ def test_sync(self):
self.assertEqual(latest_version_href, repo.latest_version_href)
self.assertDictEqual(get_content_summary(repo.to_dict()), RPM_FIXTURE_SUMMARY)

def test_sync_from_valid_mirror_list_feed(self):
"""Sync RPM content from a mirror list feed which contains a valid remote URL."""
remote = self.remote_api.create(gen_rpm_remote(RPM_MIRROR_LIST_GOOD_FIXTURE_URL))
repo, remote = self.do_test(remote=remote)

self.addCleanup(self.repo_api.delete, repo.pulp_href)
self.addCleanup(self.remote_api.delete, remote.pulp_href)

self.assertDictEqual(get_content_summary(repo.to_dict()), RPM_FIXTURE_SUMMARY)
self.assertDictEqual(get_added_content_summary(repo.to_dict()), RPM_FIXTURE_SUMMARY)

def test_sync_from_invalid_mirror_list_feed(self):
"""Sync RPM content from a mirror list feed which contains an invalid remote URL."""
repo = self.repo_api.create(gen_repo())
self.assertEqual(repo.latest_version_href, f"{repo.pulp_href}versions/0/")

remote = self.remote_api.create(gen_rpm_remote(RPM_MIRROR_LIST_BAD_FIXTURE_URL))
remote = self.remote_api.read(remote.pulp_href)
repo, remote = self.do_test(remote=remote)

self.addCleanup(self.repo_api.delete, repo.pulp_href)
self.addCleanup(self.remote_api.delete, remote.pulp_href)

repository_sync_data = RpmRepositorySyncURL(remote=remote.pulp_href)
sync_response = self.repo_api.sync(repo.pulp_href, repository_sync_data)
task_result = monitor_task(sync_response.task)

if isinstance(task_result, dict):
self.assertEqual(
task_result["error"]["description"],
"A no valid remote URL was provided."
)
else:
self.fail("A task was completed without a failure.")

def test_sync_modular(self):
"""Sync RPM modular content.

Expand Down
3 changes: 3 additions & 0 deletions pulp_rpm/tests/functional/constants.py
Expand Up @@ -87,6 +87,9 @@

RPM_INVALID_FIXTURE_URL = urljoin(PULP_FIXTURES_BASE_URL, 'rpm-missing-filelists/')

RPM_MIRROR_LIST_GOOD_FIXTURE_URL = urljoin(PULP_FIXTURES_BASE_URL, 'rpm-mirrorlist-good')
RPM_MIRROR_LIST_BAD_FIXTURE_URL = urljoin(PULP_FIXTURES_BASE_URL, 'rpm-mirrorlist-bad')

RPM_PACKAGE_COUNT = 35
"""The number of packages available at
:data:`RPM_SIGNED_FIXTURE_URL` and :data:`RPM_UNSIGNED_FIXTURE_URL`
Expand Down