Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sync with metadata mirroring #502

Merged
merged 1 commit into from
Jun 2, 2021
Merged

Conversation

dralley
Copy link
Collaborator

@dralley dralley commented May 1, 2021

[noissue]

@dralley dralley force-pushed the metadata_mirroring branch 2 times, most recently from b3149be to 7a85a26 Compare May 1, 2021 00:09
@pulpbot
Copy link
Member

pulpbot commented May 1, 2021

WARNING!!! This PR is not attached to an issue. In most cases this is not advisable. Please see our PR docs for more information about how to attach this PR to an issue.

@dralley
Copy link
Collaborator Author

dralley commented May 3, 2021

I guess the question is, do we want to merge this as-is (with some tests), and then fix it later, or wait until we can add support to DeclarativeVersion, which might take a few cycles especially if the changes are invasive.

I kind of think we need to do it now, especially since the RPM plugin has no choice.

@@ -42,7 +47,24 @@ def synchronize(remote_pk, repository_pk, mirror):

first_stage = FileFirstStage(remote)
dv = DeclarativeVersion(first_stage, repository, mirror=mirror)
return dv.create()
rv = dv.create()
if mirror:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wasn't the mirror variable meant to be a different thing? i.e. distiguishing additive sync from the other one?

Copy link
Collaborator Author

@dralley dralley May 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mirror signals that we want an exact copy of the content in the remote repository - and therefore implies that any content that isn't in the remote repository should be removed.

This fits into that same conceptual model. The user gets the same sync behavior they wanted, and also an automatic publication with the exact metadata that was downloaded - an exact copy of the remote repository. They're free to ignore it though, and it wouldn't prevent any future publishes if that is how their workflow is set up.

This is not especially useful for the file plugin, since publishing is so fast now, and since the metadata is trivial to reproduce exactly. For the RPM and perhaps Debian plugins though, it is impossible for Pulp to reproduce the metadata exactly, due to timestamps and signatures.

@dralley dralley changed the title [PoC] Sync with metadata mirroring Sync with metadata mirroring May 3, 2021
@dralley dralley marked this pull request as ready for review May 3, 2021 19:20
# accomodate this use case
global metadata_files
with FilePublication.create(rv, pass_through=True) as publication:
(mdfile_path, relative_path) = metadata_files.pop()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks strange to me: metadata_files is a list, and we only pop a single entry from it. Why don't we just put that tuple in the variable? Or if we want to make it more portable we should loop over that list.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was just trying to keep it consistent with the RPM plugin version I'm working on - there's no reason we couldn't do that. You're right about looping over the list though.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since then I've made changes that mess up the "consistency", anyway.

@ipanova
Copy link
Member

ipanova commented May 5, 2021

I guess the question is, do we want to merge this as-is (with some tests), and then fix it later, or wait until we can add support to DeclarativeVersion, which might take a few cycles especially if the changes are invasive.

I kind of think we need to do it now, especially since the RPM plugin has no choice.

The main driver of this feature is RPM plugin which also has tight deadlines with regards to the delivery dates.
Based on this ticket there is not seem to be an easy solution to adapt these changes an incorporate into DV https://pulp.plan.io/issues/8687.

My suggestions would be to take the approach we have as of today and work towards a better one later.

@dralley
Copy link
Collaborator Author

dralley commented May 27, 2021

The RPM version of this PR has now merged


def test_01_sync(self):
"""Assert that syncing the repository w/ mirror=True creates a publication."""
self.assertEqual(self.publications_api.list().count, 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assumption prevents to run this test in a box you are doing other experiments.
We should instead look if the number increased by one.

Copy link
Collaborator Author

@dralley dralley Jun 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we not making that assumption in a great many places? Any test that cleans up orphan content afterwards for example.

I don't think Pytest does any parallelization (edit: by default)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

publications are not cleaned by orphan cleanup. So that doesn't apply here.
This would really require a pclean.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that the list of created resources isn't separated by type (like e.g. the repository version summary), so trying to verify this from the created resources is kind of hacky (but doable if it's strongly desired).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me know if this looks better.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, i should have explained better: I was thinking about assigning the number of publications to a variable without asserting before and assert on length==N+1 after the autopublish.
But the way it is now looks even better.
BTW, I have seen codepaths to rely on the order of created_resources. As in version = c_r[0]; publication = c_r[1].

@dralley dralley force-pushed the metadata_mirroring branch 3 times, most recently from ad13158 to d592450 Compare June 1, 2021 20:59
When synced with "mirror=True", we take the manifest we downloaded and
immediately create a publication with it, rather than regenerating the
whole manifest.

closes: #8851
https://pulp.plan.io/issues/8851
@dralley
Copy link
Collaborator Author

dralley commented Jun 2, 2021

I've created an issue for this feature

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants