Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Syncing custom/unknown repository metadata #1439

Merged
merged 1 commit into from Sep 16, 2019
Merged

Conversation

fao89
Copy link
Member

@fao89 fao89 commented Sep 11, 2019

closes #5431
https://pulp.plan.io/issues/5431
Required PR: #1438

@goosemania
Copy link
Member

goosemania commented Sep 12, 2019

One minor change is missing here.
In order to have only one metadata of a certain type in a repo version, criteria for RemoveDuplicates stage should be configured accordingly.
See example for packages here and then it should be passed to the declarative version.

pulp_rpm/app/serializers.py Outdated Show resolved Hide resolved
url=urljoin(remote_url, record.location_href),
relative_path=record.location_href,
remote=self.remote,
deferred_download=self.deferred_download
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Historically we had all the repo metadata always downloaded.
I'm not sure if it's a good or bad idea to download it on_demand only.

One example (in favor of always downloading repository metadata):

  • sync a repo with on_demand policy
  • copy custom metadata to another repo
  • in the meantime metadata changes upstream, previous metadata is no longer available to be downloaded.

To be fair, it can happen to any content, however metadata is a bit different, it's always relevant to a specific set of content in a repo, so when you copy it, you do want to preserve the "old" metadata because it will be relevant to the content you have in your copy.

I think my suggestion would be to always set deferred_download to False.

Does it make sense?
Any thoughts? @fabricio-aguiar @daviddavis

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it makes sense to me, but I have no idea how often metadata changes

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@daviddavis , @dkliban , @ipanova , what do you think about ALWAYS downloading custom repository metadata?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to always download custom repo metadata

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's download it all the time

@@ -0,0 +1 @@
Syncing custom/unknown repository metadata.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also make it a feature and not just misc.
Users don't care much about models and they might not fully understand what publish is, however they know very well the term "sync" and it's a feature that they can perform now sync (and publish is kinda implied :) for them)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since publish is implied for them, I think it is better to make it a feature on the publish PR, what do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, works for me

@fao89 fao89 requested a review from a team September 12, 2019 13:41
fao89 added a commit to fao89/pulp_rpm that referenced this pull request Sep 12, 2019
fao89 added a commit to fao89/pulp_rpm that referenced this pull request Sep 12, 2019

package_dupe_criteria = [
{'model': Package, 'field_names': ['name', 'epoch', 'version', 'release', 'arch']},
{'model': RepoMetadataFile, 'field_names': ['data_type']},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

package_dupe_criteria = {'model': Package,
'field_names': ['name', 'epoch', 'version', 'release', 'arch']}

package_dupe_criteria = [
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fabricio-aguiar , since it no longer contains criteria for packages only, I suggest to rename it to something more general, like dupe_criteria.

Copy link
Member

@goosemania goosemania left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@goosemania goosemania merged commit 42d0388 into pulp:master Sep 16, 2019
fao89 added a commit to fao89/pulp_rpm that referenced this pull request Sep 16, 2019
fao89 added a commit to fao89/pulp_rpm that referenced this pull request Sep 16, 2019
fao89 added a commit to fao89/pulp_rpm that referenced this pull request Sep 16, 2019
fao89 added a commit to fao89/pulp_rpm that referenced this pull request Sep 16, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants