Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes #3941 #3942

Closed
wants to merge 1 commit into from
Closed

Fixes #3941 #3942

wants to merge 1 commit into from

Conversation

jpasqualetto
Copy link
Contributor

Ensure all artifacts which relative_path includes repodata/ are included on fs_exports.

Ensure all artifacts which relative_path includes repodata/ are
included on fs_exports.
Copy link
Contributor

@dralley dralley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

repodata/ is an implementation detail of the RPM plugin, it doesn't belong in the core export code. I don't think we can merge this as-is.

Instead I would look into why this query doesn't pick it up already? I would expect it to.

@dralley
Copy link
Contributor

dralley commented Jun 20, 2023

I see your explanation but it doesn't make complete sense to me.

Describe the bug
Using fs_exporter to export incrementally a repository that has a productid certificate will not copy the productid certificate.

This happens because we calculate the differences between repository version being export and start_version. As the productid is always the same, then this artifact is not listed as new on the list of differences and is the not included on the export.

We should ensure all artifacts that belong to metadata are exported.

To Reproduce

Have a repository that contains a productid certificate
Generate a complete export of such repository
Export an incremental version of this repository using fs_exporter
Check that the productid certificate of the repository is missing from the export

Expected behavior
All metadata files to be exported, regardless if the export is complete or incremental

The point of incremental export is that only the things which have changed are exported. So if it was in a previous export, I wouldn't expect it to be in a subsequent incremental one. That's the whole point of the incrementalism. The incremental exports are meant to be copied on top of some existing export rather than used on their own. If you do that, then the productid ought to be present in the repo that you end up with.

Are you sure the user in this case isn't trying to use an incremental export as though it was a full export?

@jpasqualetto
Copy link
Contributor Author

I see your explanation but it doesn't make complete sense to me.

Describe the bug
Using fs_exporter to export incrementally a repository that has a productid certificate will not copy the productid certificate.
This happens because we calculate the differences between repository version being export and start_version. As the productid is always the same, then this artifact is not listed as new on the list of differences and is the not included on the export.
We should ensure all artifacts that belong to metadata are exported.
To Reproduce

Have a repository that contains a productid certificate
Generate a complete export of such repository
Export an incremental version of this repository using fs_exporter
Check that the productid certificate of the repository is missing from the export

Expected behavior
All metadata files to be exported, regardless if the export is complete or incremental

The point of incremental export is that only the things which have changed are exported. So if it was in a previous export, I wouldn't expect it to be in a subsequent incremental one. That's the whole point of the incrementalism. The incremental exports are meant to be copied on top of some existing export rather than used on their own. If you do that, then the productid ought to be present in the repo that you end up with.

Are you sure the user in this case isn't trying to use an incremental export as though it was a full export?

The way katello is using it is such that the user importing the repository will sync from the incremental directory (without copying it over anything), where it is expected to find the metadata of the repository and any new artifacts.

I completely agree with you that if we copy it over a previous complete export it works just fine.

Maybe @parthaa should join the conversation.

@dralley
Copy link
Contributor

dralley commented Jun 20, 2023

I would consider this either a Katello or a documentation bug then. This is the way this feature was described when it was requested / added:

#3413

Idea is

  • User has already exported Rhel - 7 (using the fs exporter) - full export
  • User consumes content from the export
  • Several days later the user now wants the new content for RHEL 7. Instead of exporting 50GB worth of content again the user's export should only consist of rpms that got added after the start version and latest metadata
  • User should be able to copy the contents of this export over the regular rhel export directory and use that to sync repodata.

From Pulp's perspective I don't think there is a bug here.

@dralley
Copy link
Contributor

dralley commented Jun 20, 2023

I'm going to close the issues, we can keep the conversation going though.

@dralley dralley closed this Jun 20, 2023
@jpasqualetto
Copy link
Contributor Author

I would consider this either a Katello or a documentation bug then. This is the way this feature was described when it was requested / added:

#3413

Idea is

  • User has already exported Rhel - 7 (using the fs exporter) - full export
  • User consumes content from the export
  • Several days later the user now wants the new content for RHEL 7. Instead of exporting 50GB worth of content again the user's export should only consist of rpms that got added after the start version and latest metadata
  • User should be able to copy the contents of this export over the regular rhel export directory and use that to sync repodata.

From Pulp's perspective I don't think there is a bug here.

Also from #3431

Make the export take a publication as it does today and a start repository version

Ideally whatever is exported

Will be delta between publication_version - start_version in terms of rpms (mostly added)
The metadata of publication_version will be copied verbatim (no comparison necessary)
So this repository export is not 'complete' since it has only the added/changed rpms but the metadata of the suggested publication.

The metadata should be copied verbatim but this is not happening.

@dralley
Copy link
Contributor

dralley commented Jun 20, 2023

I interpret that as just meaning that there aren't any diffs between files, so copying the file alone is sufficient to make it sync-able. Rather than applying a patch to the file(s), or something.

But yeah, it's a bit ambiguous.

The Katello-side PR describes it as RPMs + metadata that changed.

The files are exported in a syncable format but only contain rpms + metadata that changed since the last time you exported.

Katello/katello#10417 (comment)

The issue is that that is not necessarily directly sync-able due to this one exception.

@parthaa @iballou if it would be possible to copy the import directory overtop of the old one first that would be best, but if needed I think we could make an exception for PublishedMetadata when doing incremental exports.

@parthaa
Copy link
Contributor

parthaa commented Jun 21, 2023

The Katello-side PR describes it as RPMs + metadata that changed.

@dralley The katello comment is probably misleading.
Expectation here is when you incrementally export yum repos you are getting an export with

  • Difference in RPM / content artifacts between 2 publications
  • All Metadata of the latest publication
    The idea here is that if the user has already synced all the previous version on their import sat, they can just point to this fs exporter dump and sync. (Since they have synced everything previously as immediate they can just use this partial repo.)

More over the standard (non-FS) pulp exporter does the same thing for incremental afaik. Latest full metadata + the content artifacts that changed. @ggainey correct me if I am wrong.

If productid belongs to the metadata then we should include it. If it belongs as an artifact then incremental will not include that.

@ggainey
Copy link
Contributor

ggainey commented Jun 21, 2023

quick comment - pulpexport/import (the standard one) does not export Publication/Distribution info, because it's not expected that the upstream "owns" the downstream's expectations there. It doesn't know what Content is "metadata" - it just knows "these Artifacts have changed since the last export, and will therefore be exported".

@ggainey
Copy link
Contributor

ggainey commented Jun 21, 2023

I completely agree with you that if we copy it over a previous complete export it works just fine.

That was in fact my understanding of the design. Clearly, Something Changed between discussion and implementation...

@ggainey
Copy link
Contributor

ggainey commented Jun 21, 2023

Maybe somewhere around here https://github.com/pulp/pulpcore/blob/main/pulpcore/app/tasks/export.py#L146-L155 we could look for for all PublishedMetadata associated with the passed-in publication, and force-add them to relative_path_to_artifacts - I think that will address the need, without exposing pulpcore to "knowing things about plugins Core Was Not Meant to Know"

@dralley dralley reopened this Jun 22, 2023
@dralley dralley closed this Jun 22, 2023
@dralley
Copy link
Contributor

dralley commented Jun 22, 2023

Let's move discussion to the issue #3941

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants