Use better names for distribution tree subrepos #2185

Closed
wants to merge 1 commit into from


dralley
Contributor

@dralley dralley commented Nov 30, 2021

@dralley dralley marked this pull request as draft November 30, 2021 19:25
@pulpbot
Member

pulpbot commented Nov 30, 2021

Attached issue: https://pulp.plan.io/issues/9566

@@ -478,7 +478,7 @@ def is_subrepo(directory):
     if repodata == DIST_TREE_MAIN_REPO_PATH:
         treeinfo["repositories"].update({repodata: None})
         continue
-    name = f"{repodata}-{treeinfo['hash']}"
+    name = f"{repodata}-{treeinfo['hash']}-{repository_pk}"
Contributor Author

@dralley dralley Dec 1, 2021


@goosemania One question: is there a reason we need to create new repos every time the treeinfo hash changes? Could we just create new versions of the existing repos rather than entirely new repos? The current strategy seems like it would create a lot of junk data, but there might be a good reason for it that I'm not aware of.

Member

@goosemania goosemania Dec 7, 2021


No good reason that I'm aware of.
I think we did not want to deduplicate subrepos because managing them looked complicated.
But I see no reason not to create a new repo version instead.
The only question is how you would generate the repo name. If anything changed, the treeinfo hash would likely change too.
Just name = f"{repodata}-{repository_pk}", without the hash?
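
To make the two naming options concrete, here is a minimal sketch. The function names and the sample values are hypothetical and only illustrate the f-strings quoted in this thread; they are not taken from the plugin code.

```python
def name_with_hash(repodata: str, treeinfo_hash: str, repository_pk: str) -> str:
    """Scheme from the diff under review: the name changes whenever the hash changes."""
    return f"{repodata}-{treeinfo_hash}-{repository_pk}"


def name_without_hash(repodata: str, repository_pk: str) -> str:
    """Scheme floated in the comment above: the name is stable across treeinfo changes."""
    return f"{repodata}-{repository_pk}"


# Hypothetical values: the same subrepo synced twice with different treeinfo hashes.
first_sync = name_with_hash("addons", "aaa111", "42")
second_sync = name_with_hash("addons", "bbb222", "42")

# With the hash in the name, each sync yields a distinct repository name.
print(first_sync != second_sync)

# Without the hash, both syncs map to the same repository name,
# so changes would have to land as new repository versions instead.
print(name_without_hash("addons", "42"))
```

With the hash included, every treeinfo change produces a new repository; without it, the name stays fixed and changes would have to be tracked as versions of one repository.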

Member

@goosemania goosemania Dec 7, 2021


Thinking more about it, I'm not sure how to use subrepo versions. It seems that there should be some dist tree identifier in the subrepo name. I think we included it so that subrepos are not reused when the dist tree changes, to handle potential rollbacks better.
Imagine that you synced a dist tree with subrepos into repo version 1 of the main repo.
Now something changes in a subrepo and the dist tree is slightly different, so a new dist tree content unit has to be created and added to repo version 2 of the main repo.
If we handle subrepo changes by just creating a new repo version and relating the same subrepo to the newly created dist tree content, then whenever we roll the main repo back to repo version 1 we have no information about, or control over, which repo version each subrepo is supposed to be at. We'll use the latest, and it will be wrong.


 ----------------                      ----------------
| repo version 1 |                    | repo version 2 |
 ----------------                      ----------------
  | dist tree 1 |                       | dist tree 2 |
   -------------                         -------------
                   \                 /
                       ----------                          
                      | sub repo |                       
                       ----------
                         | v1 |
                          ----                             
                         | v2 | 
                          ----                             
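
The rollback problem in the diagram can be reproduced with a small stand-alone model. This is only a sketch: `SubRepo`, `DistTree`, and the package names are hypothetical stand-ins, not the plugin's models, and "latest version" lookup is assumed to be the only way to resolve a shared subrepo.

```python
from dataclasses import dataclass, field


@dataclass
class SubRepo:
    name: str
    versions: list = field(default_factory=list)  # each entry: set of package names

    def latest(self) -> set:
        return self.versions[-1]


@dataclass
class DistTree:
    digest: str
    subrepo: SubRepo  # references the repo only, not a specific repo version


# Sync 1: main repo version 1 holds dist tree 1, subrepo is at v1.
sub = SubRepo("addons")
sub.versions.append({"foo-1.0"})
main_versions = {1: DistTree("hash1", sub)}

# Sync 2: the subrepo changed, so it gets v2 and a new dist tree is created.
sub.versions.append({"foo-1.0", "bar-2.0"})
main_versions[2] = DistTree("hash2", sub)

# Roll the main repo back to version 1 and resolve the subrepo content.
rolled_back = main_versions[1]
content = rolled_back.subrepo.latest()

# The rolled-back dist tree sees v2 content, although it was synced with v1.
print(content == sub.versions[0])
```

Because both dist trees point at the same subrepo and nothing records which subrepo version each one belongs to, the rolled-back tree resolves to content it was never synced with.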

repos = list()
for repo in RpmRepository.objects.filter(user_hidden=True).only("name").iterator():
    if repo.name.count("-") == 1:
        # str.join() takes a single iterable, so pass the parts as a tuple
        repo.name = "-".join((repo.name, str(repo.pk)))
Contributor Author


Also FWIW I'm pretty sure you were right the first time, we have to

> go through every dist tree and check which repos its addons and variants refer to.

so I need to go back and do that part.

Member


Yeah, unfortunately. Reading the code now, I see that the repository_pk of the main repo is used, not the subrepo's, which means we need to go through every dist tree.
It's not too late to change the approach and support multiple naming schemes :D
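
A rough sketch of what "go through every dist tree" could look like, using plain dataclasses rather than the real models. `SubRepo`, `DistTree`, `main_repo_pk`, and `rename_subrepos` are all hypothetical names, and the sketch assumes each dist tree can list the subrepos its addons and variants refer to.

```python
from dataclasses import dataclass


@dataclass
class SubRepo:
    pk: str
    name: str


@dataclass
class DistTree:
    main_repo_pk: str
    subrepos: list  # subrepos referenced by this tree's addons and variants


def rename_subrepos(dist_trees) -> dict:
    """Append the main repo's pk (not the subrepo's own pk) to each subrepo name."""
    renamed = {}
    for tree in dist_trees:
        suffix = f"-{tree.main_repo_pk}"
        for repo in tree.subrepos:
            if repo.name.endswith(suffix):
                continue  # already migrated, keep the operation idempotent
            repo.name = f"{repo.name}{suffix}"
            renamed[repo.pk] = repo.name
    return renamed


# Hypothetical data: one dist tree whose addon points at one subrepo.
addons = SubRepo(pk="sub-1", name="addons-aaa111")
trees = [DistTree(main_repo_pk="main-7", subrepos=[addons])]

print(rename_subrepos(trees))
print(rename_subrepos(trees))  # second run renames nothing
```

The point of iterating over dist trees instead of over hidden repositories is that only the dist tree knows which main repository a subrepo belongs to.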

@dralley
Contributor Author

dralley commented Jan 20, 2022

Replacing with @goosemania's pull request

@dralley dralley closed this Jan 20, 2022
@dralley dralley deleted the distree-uniqueness branch January 20, 2022 20:08
@dralley dralley restored the distree-uniqueness branch January 20, 2022 20:08
@goosemania
Member

#2367

@dralley dralley deleted the distree-uniqueness branch August 10, 2023 03:13