Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enables FilesystemExporter to incrementally export rpms #3511

Merged
merged 1 commit into from Feb 2, 2023

Conversation

parthaa
Copy link
Contributor

@parthaa parthaa commented Jan 13, 2023

It accepts a start repository version and calculates what content artifacts need to get copied over.

@parthaa parthaa force-pushed the incremental-fs-export branch 9 times, most recently from 503d304 to f037333 Compare January 14, 2023 06:48
@ggainey ggainey changed the title Enables one to incrementally export rpms Enables FilesystemExporter to incrementally export rpms Jan 16, 2023
@parthaa
Copy link
Contributor Author

parthaa commented Jan 16, 2023

Also check out Katello/katello#10417 (comment) on how this PR works with katello

@ggainey
Copy link
Contributor

ggainey commented Jan 16, 2023

This PR currently supports start_repository_version only if you have specified repository_version. publication= is a major workflow for users of this functionality, but if you specify start_repository_version= and publication=, you get a 500. Rude. :)

I think we should always accept publication=. Actually, more strongly - I contend that we should support a start_publication= param as well. You can only specify one of publication/repository_version, and only one of start_publication/start_repository_version, and the combinatorics should all work.

(Also - we prob need some error-checking at the view-level, you should not be able to specify a delta that "goes back in time" - once we have start and end repo-version, we should insure that start < end...)

The other thing I'm noting is, if I start with a repo, remove some RPMs to create version-N, add them back in to create version-M, and export start_repository_version=N/repository_version=M, I get an empty export. I expect somewhere in the "find the content for a version" is making An Assumption that may not be valid.

Testing/investigation continues.

@parthaa
Copy link
Contributor Author

parthaa commented Jan 16, 2023

This PR currently supports start_repository_version only if you have specified repository_version. publication= is a major workflow for users of this functionality, but if you specify start_repository_version= and publication=, you get a 500. Rude. :)

Interesting - Katello only uses publication. Wonder why start_repository_version= and publication= is failing for you. I used that combination for all my tests :).

I think we should always accept publication=. Actually, more strongly - I contend that we should support a start_publication= param as well. You can only specify one of publication/repository_version, and only one of start_publication/start_repository_version, and the combinatorics should all work.

I wonder if we should limit the scope of this PR to only publication and start_publication and leave the version compare to a different PR? Katello really only needs this combination to work. That would potentially mean small set of changes.

@sjha4 @ianballou - Do you guys know if we create publications every time with publish a Content View. @ggainey is proposing using start-publication for incremental

The other thing I'm noting is, if I start with a repo, remove some RPMs to create version-N, add them back in to create version-M, and export start_repository_version=N/repository_version=M, I get an empty export. I expect somewhere in the "find the content for a version" is making An Assumption that may not be valid

I will look into this case

@sjha4
Copy link

sjha4 commented Jan 16, 2023

We may not create "new" publications if our contents_changed dynflow middleware detects nothing has changed in which case we reuse the previous version's publication. Every repo version however will have a publication URL which could be used here I think with the catch that the publication may not be unique between versions.

@ipanova ipanova marked this pull request as draft January 17, 2023 14:26
@parthaa
Copy link
Contributor Author

parthaa commented Jan 20, 2023

This PR currently supports start_repository_version only if you have specified repository_version. publication= is a major workflow for users of this functionality, but if you specify start_repository_version= and publication=, you get a 500. Rude. :)

  1. I created a repo with feed and synced (so I should have version 1)
  2. I uploaded a rpm to the repo
  3. Then created a publication for the latest version
  4. I created an exporter with the following data
{
  "pulp_href": "/pulp/api/v3/exporters/core/filesystem/8f7a1139-5153-4e54-9e9d-994152379d01/",
  "pulp_created": "2023-01-20T18:27:54.693496Z",
  "name": "open_Export-Library-SYNCABLE_2.0-67",
  "path": "/var/lib/pulp/exports/open/Export-Library-SYNCABLE/2.0/2023-01-20T18-27-51-00-00/custom/goals/auto",
  "method": "hardlink"
}
  1. I then did the export on it
$ curl --cert /home/vagrant/foreman-certs/client_cert.pem --key /home/vagrant/foreman-certs/client_key.pem \
   -X POST -H 'Content-Type: application/json' \
   -d '{"publication":"/pulp/api/v3/publications/rpm/rpm/62b3af46-d059-4d2b-a898-1dd7955ec531/", "start_repository_version":"/pulp/api/v3/repositories/rpm/rpm/6c98c027-53ff-440e-a5ce-ba68cc0833cc/versions/4/"}' \
    https://centos8-katello-devel-stable.example.com/pulp/api/v3/exporters/core/filesystem/8f7a1139-5153-4e54-9e9d-994152379d01/exports/

Works well - journalctl output

Jan 20 20:06:28 centos8-katello-devel-stable.example.com pulpcore-api[15405]: pulp [0ed95cc8205048eca7cf18c134bf3b94]:  - - [20/Jan/2023:20:06:28 +0000] "POST /pulp/api/v3/exporters/core/filesystem/8f7a1139-5153-4e54-9e9d-994152379d01/exports/ HTTP/1.1" 202 67 "-" "curl/7.61.1"
Jan 20 20:06:28 centos8-katello-devel-stable.example.com pulpcore-worker-4[23566]: pulp [0ed95cc8205048eca7cf18c134bf3b94]: pulpcore.tasking.pulpcore_worker:INFO: Starting task 7b7d754c-771b-4c51-a443-dd5402be20da
Jan 20 20:06:28 centos8-katello-devel-stable.example.com pulpcore-worker-4[23566]: pulp [0ed95cc8205048eca7cf18c134bf3b94]: pulpcore.app.tasks.export:INFO: Exporting: file_system_exporter=open_Export-Library-SYNCABLE_2.0-67, publication=62b3af46-d059-4d2b-a898-1dd7955ec531, start_repo_version=e5e32d67-0e8c-455a-9c65-eb9cb5242ac7, path=/var/lib/pulp/exports/open/Export-Library-SYNCABLE/2.0/2023-01-20T18-27-51-00-00/custom/goals/auto
Jan 20 20:06:28 centos8-katello-devel-stable.example.com pulpcore-worker-4[23566]: pulp [0ed95cc8205048eca7cf18c134bf3b94]: pulpcore.tasking.pulpcore_worker:INFO: Task completed 7b7d754c-771b-4c51-a443-dd5402be20da
Jan 20 20:07:10 centos8-katello-devel-stable.example.com pulpcore-api[15405]: pulp [6d519aaaad01486bbfdda21d9685c6b5]:  - - [20/Jan/2023:20:07:10 +0000] "GET /pulp/api/v3/tasks/7b7d754c-771b-4c51-a443-dd5402be20da/ HTTP/1.1" 200 744 "-" "Pulp-CLI/0.14.0"

@ianballou
Copy link

I wonder if we should limit the scope of this PR to only publication and start_publication and leave the version compare to a different PR? Katello really only needs this combination to work. That would potentially mean small set of changes.

I'm still digging into this myself, but it sounds like Katello only really needs to give the start repo version and the end publication to get what we want -- if that's the case then I'm with Partha in that perhaps the start publication could be added later. I'd be curious to hear though if Pulp folks think we might be missing something and do need a start publication.

@mdellweg
Copy link
Member

Can we phrase this feature around "content" instead of "rpms"?

@parthaa
Copy link
Contributor Author

parthaa commented Jan 23, 2023

Can we phrase this feature around "content" instead of "rpms"?

I think only Yum/File repos are supported for FS Exporter at this point. That being said where do you see this PR being specific to rpms?

@mdellweg
Copy link
Member

mdellweg commented Jan 24, 2023

Can we phrase this feature around "content" instead of "rpms"?

I think only Yum/File repos are supported for FS Exporter at this point. That being said where do you see this PR being specific to rpms?

I meant the docs and changelog. Sorry for not being specific enough.

Thank you for making the feature agnostic.

Copy link
Contributor

@ggainey ggainey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor wordsmithing, and one bugfix :)

pulpcore/app/viewsets/exporter.py Outdated Show resolved Hide resolved
CHANGES/3413.feature Outdated Show resolved Hide resolved
pulpcore/app/serializers/exporter.py Outdated Show resolved Hide resolved
Copy link
Member

@mdellweg mdellweg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some formatting and what i meant by agnostic in the changelog...

CHANGES/3413.feature Outdated Show resolved Hide resolved
@dralley
Copy link
Contributor

dralley commented Jan 27, 2023

@parthaa Can you rebase and squash commits?

@@ -138,8 +144,11 @@ def _export_publication_to_file_system(
"content_artifact", "content_artifact__artifact"
).iterator():
# Artifact isn't guaranteed to be present
if pa.content_artifact.artifact:
if pa.content_artifact.artifact and (
start_repo_version is None or pa.content_artifact in difference_content_artifacts
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

difference_content_artifacts appears to be a queryset, do we know that this isn't going to be re-evaluated every time in a loop?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thats a good point I am curious about that :). How would I be able to check if its getting re evaluated ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could explicitly cast to a set

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated!

@ggainey ggainey marked this pull request as ready for review January 27, 2023 21:12
@parthaa parthaa force-pushed the incremental-fs-export branch 2 times, most recently from d5373bc to 81873ca Compare January 27, 2023 22:07
@parthaa parthaa requested review from dralley and mdellweg and removed request for mdellweg and dralley January 31, 2023 18:06
Copy link
Contributor

@ggainey ggainey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like all current comments are responded to. Tested with an RPM repo in my test env, worked like a champ. Looks good to me!

@chris1984
Copy link

chris1984 commented Jan 31, 2023

Tested it here locally with Katello and was able to export incrementally and import without any issues

Export:

[root@centos8-stream-katello-nightly ~]# hammer content-export incremental version --format=syncable --organization-id 1 --id 3
[.................................................................................................................................................................] [100%]
Generated /var/lib/pulp/exports/Default_Organization/view/2.0/2023-01-31T20-48-04-00-00/metadata.json

Import:

[root@centos8-stream-katello-nightly ~]# ll /var/lib/pulp/imports/2023-01-31T20-48-04-00-00/^C
[root@centos8-stream-katello-nightly ~]# hammer content-import version --organization-id 4 --path /var/lib/pulp/imports/2023-01-31T20-48-04-00-00/
[.................................................................................................................................................................] [100%]

@ggainey
Copy link
Contributor

ggainey commented Feb 1, 2023

@parthaa OK, please rebase this on top of current-main to address the lint-issues, and resubmit.

It accepts a start repository version and calculates what content
artifacts need to get copied over.

fixes pulp#3413

lint-fixes

Addressed  PR Comments
@mdellweg mdellweg merged commit 70b1eb0 into pulp:main Feb 2, 2023
@patchback
Copy link

patchback bot commented Feb 2, 2023

Backport to 3.21: 💚 backport PR created

✅ Backport PR branch: patchback/backports/3.21/70b1eb04a4349bedfa6781c8598b53a99f9a63a1/pr-3511

Backported as #3546

🤖 @patchback
I'm built with octomachinery and
my source is open — https://github.com/sanitizers/patchback-github-app.

@patchback
Copy link

patchback bot commented Feb 2, 2023

Backport to 3.22: 💚 backport PR created

✅ Backport PR branch: patchback/backports/3.22/70b1eb04a4349bedfa6781c8598b53a99f9a63a1/pr-3511

Backported as #3547

🤖 @patchback
I'm built with octomachinery and
my source is open — https://github.com/sanitizers/patchback-github-app.

@dralley
Copy link
Contributor

dralley commented Feb 2, 2023

@ggainey How far should this go back, is 3.18 possible?

@parthaa
Copy link
Contributor Author

parthaa commented Feb 2, 2023

@ggainey How far should this go back, is 3.18 possible?

@dralley - I'll request it if needed and if customers are desperate for it.

@parthaa parthaa deleted the incremental-fs-export branch February 2, 2023 16:54
@patchback
Copy link

patchback bot commented Mar 2, 2023

Backport to 3.18: 💔 cherry-picking failed — conflicts found

❌ Failed to cleanly apply 70b1eb0 on top of patchback/backports/3.18/70b1eb04a4349bedfa6781c8598b53a99f9a63a1/pr-3511

Backporting merged PR #3511 into main

  1. Ensure you have a local repo clone of your fork. Unless you cloned it
    from the upstream, this would be your origin remote.
  2. Make sure you have an upstream repo added as a remote too. In these
    instructions you'll refer to it by the name upstream. If you don't
    have it, here's how you can add it:
    $ git remote add upstream https://github.com/pulp/pulpcore.git
  3. Ensure you have the latest copy of upstream and prepare a branch
    that will hold the backported code:
    $ git fetch upstream
    $ git checkout -b patchback/backports/3.18/70b1eb04a4349bedfa6781c8598b53a99f9a63a1/pr-3511 upstream/3.18
  4. Now, cherry-pick PR Enables FilesystemExporter to incrementally export rpms #3511 contents into that branch:
    $ git cherry-pick -x 70b1eb04a4349bedfa6781c8598b53a99f9a63a1
    If it'll yell at you with something like fatal: Commit 70b1eb04a4349bedfa6781c8598b53a99f9a63a1 is a merge but no -m option was given., add -m 1 as follows instead:
    $ git cherry-pick -m1 -x 70b1eb04a4349bedfa6781c8598b53a99f9a63a1
  5. At this point, you'll probably encounter some merge conflicts. You must
    resolve them in to preserve the patch from PR Enables FilesystemExporter to incrementally export rpms #3511 as close to the
    original as possible.
  6. Push this branch to your fork on GitHub:
    $ git push origin patchback/backports/3.18/70b1eb04a4349bedfa6781c8598b53a99f9a63a1/pr-3511
  7. Create a PR, ensure that the CI is green. If it's not — update it so that
    the tests and any other checks pass. This is it!
    Now relax and wait for the maintainers to process your pull request
    when they have some cycles to do reviews. Don't worry — they'll tell you if
    any improvements are necessary when the time comes!

🤖 @patchback
I'm built with octomachinery and
my source is open — https://github.com/sanitizers/patchback-github-app.

@dralley
Copy link
Contributor

dralley commented Mar 2, 2023

@parthaa It looks like there is a request for this to go into 3.18 (6.12.z), could you investigate which other patches are necessary to backport it?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants