Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make distribution trees unique per repo version. #1774

Closed
wants to merge 2 commits into from

Conversation

goosemania
Copy link
Member

@pep8speaks
Copy link

pep8speaks commented Jul 16, 2020

Hello @goosemania! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-07-20 16:58:48 UTC

Comment on lines 8 to 11
- if a distribution tree belongs to one repo, just set repository_id
- if a distribution tree belongs to multiple repos:
- set repository_id for one repo
- create a copy of distribution tree, assign it to the next repo, copy all RPMs to the next repo as well
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do I understand correctly that currently it's possible to have one distribution tree belonging to multiple repos?
Do you agree with what needs to be done in the migration?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it cannot belong to multiple repos (not sure),
and I agree with the migration

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's allowed to copy a distribution tree, it will belong to a different repo but it will still refer the original repository in the variant for the main repo.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you planning on handling the case where a repo has multiple distribution trees (https://pulp.plan.io/issues/7115)? I'm wondering about how this will be migrated.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@daviddavis , this is a hard one. I'm open for suggestions. The question is which one to choose.
I also suspect that it's likely only in theory 2 distribution trees can be there, I doubt there is a real user with such issue there. (I bet tomorrow one will appear after such statement :D).
Maybe it's the case we should ask user to choose and they need to remove one? I mean, I'm not sure even how publish will handle such case and it will be one treeinfo anyway. I guess worth trying to see user experience here.

I can adjust resolve_distribution_trees, so they remove all old ones when the new one is added. But if no new distribution tree is added, I can't choose which one to remove.
Thoughts?

There are 3 option for the migration that I can see:

  1. 2 distribution trees can point to the same main repo and the main repo will have RPMs from both :/ Downside: might be a mess and/or dependency conflicts with combining two sets of RPMs.
  2. Keep only a distribution tree which points to the main repo a distribution tree is a part of, and not to any other main repo (aka remove distribution trees which were copied and not synced into a repo). Downside: it doesn't cover the case when both distribution trees were copied from other repos.
  3. Fail migration if there are 2 distribution trees in one repo, and ask user to remove one of the distribution trees from a specific repo. Downside: might be bad upgrade experience.
    Any other options?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I lean towards 3 as I think it's unlikely there's actual Pulp 3 instance out there that has a repo with two distribution trees.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me. Any other opinions since it affects users? @dkliban , @fao89 , @dralley

@dkliban, if migraiton fails, can user just easily re-run the installer after a repo is fixed?

If a version user is upgrading to has more than 1 migration (pulp_rpm 3.5 case), and some finished fine but then this one fails, does anyone now if all migrations are rolled back in such case?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@daviddavis , I got this error when trying to have two distribution trees (using sync) in one repo centos8 and centos7.1:
ValueError: Cannot create repository version. Path is duplicated: images/pxeboot/vmlinuz.

With copy, I got a similar one, just for different image.
ValueError: Cannot create repository version. Path is duplicated: images/install.img.

I'm now even more in favour of option 3 since it's very likely that path overlap prevented two distributions to be in one repo.

name='repository',
field=models.ForeignKey(on_delete=django.db.models.deletion.PROTECT, related_name='+', to='rpm.RpmRepository'),
),
migrations.RunPython(make_distribution_tree_unique_per_repo),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Order wise, should it be here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so

@@ -106,8 +102,63 @@ class Meta:
"build_timestamp",
)

def get_copy(self, repository):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any other ideas or better ways to create a copy are welcome

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 from me as well.

if previous_version:
previous_disttree_pk = previous_version.content.get(pulp_type=disttree_pulp_type))
new_version.remove_content(Content.objects.filter(pk=previous_disttree_pk))
DistributionTree.objects.filter(pk=previous_disttree_pk).delete()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it a correct way of removing a content unit completely?

Copy link
Member

@fao89 fao89 Jul 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get why we need to delete it, only removing the content is not enough? Putting it better, if we find out the new DistributionTree is bad, is it possible to go back to the previous one?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. I keep wanting to get rid off kickstarts :)
The only time we need to delete them completely is when a repo is getting removed. Thanks

@pulpbot
Copy link
Member

pulpbot commented Jul 16, 2020

Attached issue: https://pulp.plan.io/issues/7150

Attached issue: https://pulp.plan.io/issues/7115

@goosemania
Copy link
Member Author

Distribution trees can't be shared and there will be a separate content unit per each repo. Auxiliary models like Addons, variants, etc, are also copied because they reference a distribution tree they belong to. However subrepos are not copied and can belong to different distribution trees. Is it ok? (I hope yes).

Repo1 -- DT1 --- Addon1 
                        \
                          subrepo
                        /
Repo2 -- DT2 --- Addon2

@daviddavis
Copy link
Contributor

However subrepos are not copied and can belong to different distribution trees. Is it ok?

Suppose you have a case where you copy a kickstart from repo A to repo B. Then repo A gets resynced and content changes (eg a package gets added somewhere) but the kickstart stays the same. If the content change is in the main repo, then repo B doesn't get the content change. If the content change is in a subrepo, then repo B does get the content change as the subrepo is shared. That seems weird to me?

@goosemania
Copy link
Member Author

Just for the record that the question above was discussed.

[19:00:10] <ttereshc> daviddavis, have you looked at the code? (I'm going to do it right now) I'm wondering if subrepos are ever updated or if the new ones are created if somehting changes
[19:01:16] <ttereshc> https://github.com/pulp/pulp_rpm/blob/master/pulp_rpm/app/tasks/synchronizing.py#L174
[19:02:48] <ttereshc> it seems like on the sync, if treeinfo hash differs a new subrepo is created
[19:03:24] <ttereshc> what happens to the old one though?
[19:15:15] <daviddavis> ha good question
[19:16:41] <ttereshc> now I'm not worried about what you asked, it seems fine, but, what happens to old repos, it seems like another pain.

So far: Sync/publish on the fresh install works.

closes #7150
closes #7115

https://pulp.plan.io/issues/7150
https://pulp.plan.io/issues/7115
@goosemania
Copy link
Member Author

Closing in favour of #1782

@goosemania goosemania closed this Jul 23, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants