New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Don't create a new repository version if no content was added or removed. #380
Conversation
a599730
to
eba2317
Compare
pulpcore/tests/functional/api/using_plugin/test_crd_publications.py
Outdated
Show resolved
Hide resolved
pulpcore/tests/functional/api/using_plugin/test_repo_versions.py
Outdated
Show resolved
Hide resolved
pulpcore/tests/functional/api/using_plugin/test_distributions.py
Outdated
Show resolved
Hide resolved
aa62347
to
8e7f5ec
Compare
8e7f5ec
to
5a4b80d
Compare
5a4b80d
to
7ae6a5a
Compare
| @@ -628,7 +628,8 @@ def __exit__(self, exc_type, exc_value, traceback): | |||
| """ | |||
| Save the RepositoryVersion if no errors are raised, delete it if not | |||
| """ | |||
| if exc_value: | |||
| no_change = not self.added() and not self.removed() | |||
| if exc_value or no_change: | |||
| self.delete() | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens to the task's created_resource entry if there is no change? It is set when creating the new version, but I don't see it being unset anywhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the delete cascades to the CreatedResource which inherits from GenericRelationModel.
https://github.com/pulp/pulpcore/blob/master/pulpcore/app/models/task.py#L411
https://github.com/pulp/pulpcore/blob/master/pulpcore/app/models/generic.py#L22
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh, thank you. Relational databases are magic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that because of how GFKs work the cascade doesn't actually happen in the database, it's enforced by Django. So it might be worthwhile to double check all of this.
pulpcore/app/models/repository.py
Outdated
| @@ -628,7 +628,8 @@ def __exit__(self, exc_type, exc_value, traceback): | |||
| """ | |||
| Save the RepositoryVersion if no errors are raised, delete it if not | |||
| """ | |||
| if exc_value: | |||
| no_change = not self.added() and not self.removed() | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way this is implemented right now applies to all finalizations of a repo version. I thought when wanted to implement it for sync only?
I think there is a use case for creating repo versions with empty changes for /modify: If you want to set an old repo version to be the latest, just do a modify giving the old repo_version as the base version and no artifacts to add/remove
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this old version is different from the latest version, then self.added() and self.removed() will contain values that reflect those differences. If you are trying to create the exact same version as the latest, then no new version will be created. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just realized that my mental model of added/removed was wrong when discussing validation/modification with @bmbouter. Hooks for add_content/remove_content actually are different from inspecting added/removed content during finalization.
But you are right of course, .added/.removed is the difference to the previous version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This convo made me thinking. How does the user of the API finds out to which repo version the "no change" result refers to?
For example, suppose I want to sync and publish
Prior to this change: After the sync, I can use the created repository version resource I got back in the task result for publication. As a user, I am sure that I publish the result of the sync.
With this change: I can do the same if a get a created resource back. But when I don't, this may happen in different scenarios:
a) the latest version is the same as the published one and there is no change -> no need to publish
b) the latest version is not the same as the published one and there is no change between the latest version and the previous one -> publication required
To find out whether I need to publish, I could try to query the repository and get the latest version. But I can't be sure that this is the latest version the sync has seen (another task may have sneaked in). Thus, I may publish the result of a different operation.
To rule out the "sneak in" case, I might be able to infer from the timestamps (latest version vs. task finish) that this has happened.
I discussed this with @dkliban and we came out with two options:
- In the no change case, put the latest version used for comparison into the created resource field. Although the version already existed and wasn't created)
- Accepting that there is no repo version information is better than providing information in the wrong field
This choice isn't nice (to put it mildly). I would love to hear about other solution proposals.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd opt for option (2) because having the data be incorrect is a liability because the user can't distinguish between the one that just ran versus the one that ran two days ago.
It will be a bit difficult for integrators to receive a CreatedResource 80% of the time when calling sync(), and then one day they don't get one. We should document this up-front in the docs (maybe a story could be filed to do so and I could fix soon). It's a case integrators need to think about and be aware of and think about how they want to handle. I think that would be better than us handling it in this way for them, although it will take more effort from them.
Also in terms of a human user, I am hoping for a log-streaming addition to the task API in the future which would allow the sync code to log data directly to the human user. pulp_ansible has a plugin-based version of this. If that feature is available in the future I think human users would know easily.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to propose option 3, which we have discussed quite a bit. We'd eventually provide an option to sync that would allow users to always create a repo version even if nothing changed. I'd suggest we NOT pursue this until users request it or end up in the situation that @gmbnomis describes but it does solve @gmbnomis's problem should we ever encounter it. It's also backwards compatible which means we don't have to do anything now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, that reporting something as created that isn't is wrong. (So this will be option 2)
As a long term solution, the tasks could be much more verbose (For failed tasks we already put the whole stacktrace there.). The CreatedResource model could turn into an AffectedResource that has some additional attributes.
But for the short term integrators, the proper solution imo is to query for the last repoversion, trigger the sync/modification (probably even based on that one) and in case none got created, use the result of before. Then there is no option for any other action to sneak in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@daviddavis and @mdellweg yes, both proposals would solve the problem.
@mdellweg If you can't set the base_version (which is the case for sync), querying the latest repo version first does not help. Between the query and the sync, another task may sneak in. For example, another sync. Then the second sync would most probably produce no change and, consequently, one would not publish the repo version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gmbnomis So what you are saying is "If you have two users, one syncs the reopsitory daily at 8 and the other one at 9, the second one will never realize, that he needs to republish." It does not even help to look at that in a more Einsteinian way than Newtonian.
And we surely do not want to resync the same thing from another base_version.
In the end, it again sounds like sync should be 'find or create a repository version that equals upstream'. Alternatively, the sync task must be chained with the publish task that runs even if the new version has not been created by that sync task.
| @@ -75,8 +83,11 @@ def test_01_create(self): | |||
| self.client.post(FILE_REMOTE_PATH, gen_file_remote()) | |||
| ) | |||
| # create 3 repository versions | |||
| for _ in range(3): | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we still test whether multiples syncs using the same remote it will not created new repository versions? Has this changed was well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have tests like that in pulp-file. https://github.com/pulp/pulp_file/pull/311/files
Required PR: pulp/pulpcore#380 closes #3308
Required PR: pulp/pulpcore#380 closes #3308
|
The code in this PR looks good to me. Can you update the docs though to state that new repo versions aren't created if content doesn't change? |
Required PR: pulp/pulpcore#380 closes #3308
7ae6a5a
to
6dac2fc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This LGTM and fulfills my expectations. I would get at least one more approval before merging though.
845c099
to
2428c29
Compare
| @@ -625,7 +625,8 @@ def __exit__(self, exc_type, exc_value, traceback): | |||
| """ | |||
| Finalize and save the RepositoryVersion if no errors are raised, delete it if not | |||
| """ | |||
| if exc_value: | |||
| no_change = not self.added() and not self.removed() | |||
| if exc_value or no_change: | |||
| self.delete() | |||
| else: | |||
| self.repository.finalize_new_version(self) | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's at the wrong place now. a finalizer may for example render a changed version into one without a change.
2428c29
to
44393c5
Compare
44393c5
to
a1c3249
Compare
Required PR: pulp/pulpcore#380 closes #3308
Required PR: pulp/pulpcore#380 closes #3308
Adapt to pulp/pulpcore#369 (Repository /modify mixin) and pulp/pulpcore#380 (Don't create a new repository version if no content was added or removed). No functional changes.
Adapt to pulp/pulpcore#369 (Repository /modify mixin) and pulp/pulpcore#380 (Don't create a new repository version if no content was added or removed). No functional changes.
closes: #3308
https://pulp.plan.io/issues/3308