Don't create a new repository version if no content was added or removed. #380

Merged
dkliban merged 2 commits into pulp:master from no-change-no-repo-version on Nov 13, 2019

dkliban
Member

@dkliban dkliban commented Nov 12, 2019

@dkliban dkliban requested a review from a team November 12, 2019 13:48
CHANGES/3308.feature (outdated, resolved)
@dkliban dkliban force-pushed the no-change-no-repo-version branch 2 times, most recently from aa62347 to 8e7f5ec Compare November 12, 2019 15:05
dkliban added a commit to dkliban/pulp_file that referenced this pull request Nov 12, 2019
@@ -628,7 +628,8 @@ def __exit__(self, exc_type, exc_value, traceback):
         """
         Save the RepositoryVersion if no errors are raised, delete it if not
         """
-        if exc_value:
+        no_change = not self.added() and not self.removed()
+        if exc_value or no_change:
             self.delete()
Contributor

What happens to the task's created_resource entry if there is no change? It is set when creating the new version, but I don't see it being unset anywhere?

Member Author

Contributor

Ahh, thank you. Relational databases are magic.

Contributor

Note that because of how GFKs (generic foreign keys) work, the cascade doesn't actually happen in the database; it's enforced by Django. So it might be worthwhile to double-check all of this.

@@ -628,7 +628,8 @@ def __exit__(self, exc_type, exc_value, traceback):
         """
         Save the RepositoryVersion if no errors are raised, delete it if not
         """
-        if exc_value:
+        no_change = not self.added() and not self.removed()
Contributor

The way this is implemented right now, it applies to all finalizations of a repo version. I thought we wanted to implement it for sync only?

I think there is a use case for creating repo versions with empty changes for /modify: if you want to make an old repo version the latest, just do a modify giving the old repo_version as the base version and no artifacts to add/remove.

Member Author

If this old version is different from the latest version, then self.added() and self.removed() will contain values that reflect those differences. If you are trying to create the exact same version as the latest, then no new version will be created. What do you think?
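
To make the point above concrete, here is a minimal, self-contained sketch. `ToyRepository` and `ToyNewVersion` are hypothetical stand-ins invented for illustration, not pulpcore classes; only the `no_change` check in `__exit__` mirrors the diff. It assumes, as stated in the reply, that `added()` and `removed()` are computed against the latest existing version.

```python
class ToyRepository:
    """Hypothetical stand-in for a repository: an ordered list of saved versions."""

    def __init__(self):
        self.versions = [frozenset()]  # version 0 is empty

    def latest(self):
        return self.versions[-1]


class ToyNewVersion:
    """Hypothetical stand-in for a RepositoryVersion used as a context manager."""

    def __init__(self, repository, base=None):
        self.repository = repository
        # Start from the given base version, or from the latest one.
        self.content = set(repository.latest() if base is None else base)

    def added(self):
        # Content present here but absent from the latest saved version.
        return self.content - self.repository.latest()

    def removed(self):
        # Content present in the latest saved version but absent here.
        return self.repository.latest() - self.content

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        no_change = not self.added() and not self.removed()
        if not (exc_value or no_change):
            self.repository.versions.append(frozenset(self.content))
        return False  # never swallow exceptions


repo = ToyRepository()
with ToyNewVersion(repo) as version:
    version.content.add("a")          # saved as version 1
with ToyNewVersion(repo) as version:
    version.content.add("b")          # saved as version 2
with ToyNewVersion(repo, base=repo.versions[1]):
    pass                              # differs from version 2, so it is saved
with ToyNewVersion(repo):
    pass                              # identical to the latest, so it is discarded
```

In this toy model, re-applying an old version as the base still produces a new version because it differs from the latest one; only an exact copy of the latest version is dropped.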

Contributor

I just realized that my mental model of added/removed was wrong when discussing validation/modification with @bmbouter. Hooks for add_content/remove_content actually are different from inspecting added/removed content during finalization.

But you are right of course, .added/.removed is the difference from the previous version.

Contributor

This conversation got me thinking. How does the user of the API find out which repo version the "no change" result refers to?

For example, suppose I want to sync and publish.

Prior to this change: After the sync, I can use the created repository version resource I got back in the task result for publication. As a user, I am sure that I publish the result of the sync.

With this change: I can do the same if I get a created resource back. But when I don't, this may happen in different scenarios:
a) the latest version is the same as the published one and there is no change -> no need to publish
b) the latest version is not the same as the published one and there is no change between the latest version and the previous one -> publication required

To find out whether I need to publish, I could try to query the repository and get the latest version. But I can't be sure that this is the latest version the sync has seen (another task may have sneaked in). Thus, I may publish the result of a different operation.

To rule out the "sneak in" case, I might be able to infer from the timestamps (latest version vs. task finish) that this has happened.

I discussed this with @dkliban and we came up with two options:

  1. In the no-change case, put the latest version used for comparison into the created resource field (although the version already existed and wasn't created).
  2. Accept that having no repo version information is better than providing information in the wrong field.

This choice isn't nice (to put it mildly). I would love to hear about other solution proposals.

Member

I'd opt for option (2) because having the data be incorrect is a liability because the user can't distinguish between the one that just ran versus the one that ran two days ago.

It will be a bit difficult for integrators to receive a CreatedResource 80% of the time when calling sync(), and then one day not get one. We should document this up-front in the docs (maybe a story could be filed to do so and I could fix it soon). It's a case integrators need to be aware of and decide how they want to handle. I think that would be better than us handling it in this way for them, although it will take more effort from them.

Also in terms of a human user, I am hoping for a log-streaming addition to the task API in the future which would allow the sync code to log data directly to the human user. pulp_ansible has a plugin-based version of this. If that feature is available in the future I think human users would know easily.

Contributor

I'd like to propose option 3, which we have discussed quite a bit. We'd eventually provide an option to sync that would allow users to always create a repo version even if nothing changed. I'd suggest we NOT pursue this until users request it or end up in the situation that @gmbnomis describes but it does solve @gmbnomis's problem should we ever encounter it. It's also backwards compatible which means we don't have to do anything now.

Member

I agree that reporting something as created that wasn't is wrong. (So this will be option 2.)
As a long-term solution, the tasks could be much more verbose (for failed tasks we already put the whole stack trace there). The CreatedResource model could turn into an AffectedResource that has some additional attributes.
But for the short term, the proper solution for integrators, imo, is to query for the latest repo version, trigger the sync/modification (probably even based on that one) and, in case none got created, use the result from before. Then there is no opportunity for any other action to sneak in.
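
The short-term approach described in this comment could be sketched roughly as follows. Everything here is an assumption made for illustration: `FakeClient`, its `latest_version` and `sync` methods, the `created_resources` key, and the hrefs are invented and do not represent the real Pulp client API.

```python
class FakeClient:
    """Hypothetical in-memory client; not a real Pulp binding."""

    def __init__(self, upstream_changed):
        self.upstream_changed = upstream_changed
        self.latest = "/repositories/1/versions/1/"  # made-up href

    def latest_version(self, repo_href):
        return self.latest

    def sync(self, repo_href):
        # Returns a task-like dict; the key name is an assumption.
        if self.upstream_changed:
            self.latest = "/repositories/1/versions/2/"
            return {"created_resources": [self.latest]}
        return {"created_resources": []}


def version_after_sync(client, repo_href):
    """Return the version to publish: the created one, else the one seen before."""
    before = client.latest_version(repo_href)
    task = client.sync(repo_href)
    created = task.get("created_resources", [])
    return created[0] if created else before


with_change = version_after_sync(FakeClient(upstream_changed=True), "/repositories/1/")
without_change = version_after_sync(FakeClient(upstream_changed=False), "/repositories/1/")
```

Note that the fallback is only trustworthy if nothing else modifies the repository between the query and the sync.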

Contributor

@daviddavis and @mdellweg yes, both proposals would solve the problem.

@mdellweg If you can't set the base_version (which is the case for sync), querying the latest repo version first does not help. Between the query and the sync, another task may sneak in. For example, another sync. Then the second sync would most probably produce no change and, consequently, one would not publish the repo version.
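
The race described here can be replayed step by step in a toy model. `ToyRepo` and `sync` are hypothetical helpers written for this sketch, and versions are plain sets compared against the latest one; none of this is pulpcore code.

```python
class ToyRepo:
    """Hypothetical repository: a list of versions plus the published index."""

    def __init__(self):
        self.versions = [frozenset()]
        self.published = 0  # index of the version that was last published


def sync(repo, upstream):
    """Toy sync: return the index of the created version, or None if unchanged."""
    target = frozenset(upstream)
    if target == repo.versions[-1]:
        return None
    repo.versions.append(target)
    return len(repo.versions) - 1


repo = ToyRepo()
upstream = {"a"}

# Integrator B queries the latest version first (version 0, already published).
seen_by_b = len(repo.versions) - 1

# Integrator A's sync sneaks in and creates version 1; A has not published yet.
created_by_a = sync(repo, upstream)

# B's sync now finds nothing to do and gets no created resource.
created_by_b = sync(repo, upstream)

# B falls back to the version it saw earlier and concludes nothing needs publishing.
b_target = created_by_b if created_by_b is not None else seen_by_b
b_thinks_publish_needed = b_target != repo.published

# In reality the latest version is unpublished.
latest_is_unpublished = (len(repo.versions) - 1) != repo.published
```

In this replay B ends up looking at version 0 while the repository's latest version is 1, which is the missed-publish case the comment warns about.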

Member

@gmbnomis So what you are saying is: "If you have two users, one who syncs the repository daily at 8 and another at 9, the second one will never realize that he needs to republish." It does not even help to look at that in a more Einsteinian way than Newtonian.
And we surely do not want to resync the same thing from another base_version.
In the end, it again sounds like sync should be 'find or create a repository version that equals upstream'. Alternatively, the sync task must be chained with the publish task that runs even if the new version has not been created by that sync task.
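
One possible reading of "find or create a repository version that equals upstream" is sketched below. The function name, the `(index, created)` return shape, and the choice to compare only against the latest version are all assumptions of this sketch; the thread does not specify them.

```python
def sync_find_or_create(versions, upstream):
    """Return (index, created) for the version whose content equals upstream.

    `versions` is a list of frozensets, oldest first. Only the latest
    version is considered a match (an assumption of this sketch).
    """
    target = frozenset(upstream)
    if versions[-1] == target:
        return len(versions) - 1, False  # found: nothing new was created
    versions.append(target)
    return len(versions) - 1, True       # created a new version


versions = [frozenset()]
first = sync_find_or_create(versions, {"a"})
second = sync_find_or_create(versions, {"a"})
```

With this shape the caller always learns which version reflects upstream, and the boolean keeps "created" distinct from "already existed".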

dkliban added a commit to dkliban/pulp_rpm that referenced this pull request Nov 13, 2019
@@ -75,8 +83,11 @@ def test_01_create(self):
self.client.post(FILE_REMOTE_PATH, gen_file_remote())
)
# create 3 repository versions
for _ in range(3):
@nixocio nixocio Nov 13, 2019

Could we still test that multiple syncs using the same remote do not create new repository versions? Has this changed as well?

Member Author

We have tests like that in pulp-file. https://github.com/pulp/pulp_file/pull/311/files

ipanova added a commit to ipanova/pulp_container that referenced this pull request Nov 13, 2019
@daviddavis
Contributor

The code in this PR looks good to me. Can you update the docs though to state that new repo versions aren't created if content doesn't change?

Contributor

@daviddavis daviddavis left a comment

This LGTM and fulfills my expectations. I would get at least one more approval before merging though.

@dkliban dkliban force-pushed the no-change-no-repo-version branch 2 times, most recently from 845c099 to 2428c29 Compare November 13, 2019 18:53
@@ -625,7 +625,8 @@ def __exit__(self, exc_type, exc_value, traceback):
         """
         Finalize and save the RepositoryVersion if no errors are raised, delete it if not
         """
-        if exc_value:
+        no_change = not self.added() and not self.removed()
+        if exc_value or no_change:
             self.delete()
         else:
             self.repository.finalize_new_version(self)
Contributor

That's in the wrong place now. A finalizer may, for example, turn a changed version into one without a change.
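
The ordering concern can be illustrated with a small sketch. The finalizer `strip_temporary` and both `exit_*` functions are hypothetical and model versions as plain sets; they are not pulpcore code, and the "check first" variant only approximates the order visible in the hunk above.

```python
def strip_temporary(content):
    """Hypothetical finalizer: drops units whose name starts with 'tmp-'."""
    return {unit for unit in content if not unit.startswith("tmp-")}


def exit_check_first(previous, content, finalizers):
    """Compare with the previous version first, then finalize."""
    if content == previous:
        return None  # version discarded
    for finalize in finalizers:
        content = finalize(content)
    return content  # version saved


def exit_finalize_first(previous, content, finalizers):
    """Finalize first, then compare with the previous version."""
    for finalize in finalizers:
        content = finalize(content)
    if content == previous:
        return None  # version discarded
    return content  # version saved


previous = {"a"}
pending = {"a", "tmp-b"}
saved_check_first = exit_check_first(previous, pending, [strip_temporary])
saved_finalize_first = exit_finalize_first(previous, pending, [strip_temporary])
```

With the check placed before finalization, the sketch saves a version whose content is identical to the previous one; finalizing first discards it.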

@dkliban dkliban merged commit 1380b09 into pulp:master Nov 13, 2019
@dkliban dkliban deleted the no-change-no-repo-version branch November 13, 2019 19:50
ipanova added a commit to ipanova/pulp_container that referenced this pull request Nov 14, 2019
gmbnomis added a commit to pulp/pulp_cookbook that referenced this pull request Nov 18, 2019
Adapt to pulp/pulpcore#369 (Repository /modify
mixin) and pulp/pulpcore#380 (Don't create a new
repository version if no content was added or removed).  No functional
changes.

9 participants