Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Collection sync update_highest_version calculation can potentially fail on re-sync #1547

Closed
gerrod3 opened this issue Aug 7, 2023 · 0 comments · Fixed by #1548
Closed

Collection sync update_highest_version calculation can potentially fail on re-sync #1547

gerrod3 opened this issue Aug 7, 2023 · 0 comments · Fixed by #1548

Comments

@gerrod3
Copy link
Contributor

gerrod3 commented Aug 7, 2023

Version
Since this commit: 8ad72ba

Describe the bug
When syncing collections the collection version with the highest version is specially marked with the is_highest=True field. There is an uniqueness constraint on this field to only allow one collection version from a collection to be marked as the highest at a time. Due to the design of the sync pipeline there is a special scenario where the current update logic tries to save two collection versions as the highest resulting in the constraint being triggered and the sync failing. Here's how:

  1. Sync a collection and its versions. Let's say collection A and versions 1, 2 & 3.
  2. CV A-3 will be marked is_highest=True
  3. Collection A has a new version released upstream, 4.
  4. Re-sync collection A
  5. Sync task creates potential CVs A-1, A-2, A-3, & A-4 and sends them down the sync pipeline. A-4 gets sent before the others.
  6. Pipeline discovers that A-1, A-2, & A-3 already exist and replaces them with their current records, these are now in-memory objects.
  7. A-4 reaches the save stage and gets saved. update_highest_version properly sets A-4 as the new highest and updates the record of A-3 to no longer be highest.
  8. A-3 reaches the save stage, but now its in-memory representation is desynced with its DB representation. The custom save stage then tries to save the instance again (even if nothing has changed) and thus triggers the constraint since the model will try to save with is_highest=True.

To Reproduce
Sync a repository, wait for some new collection-versions to be uploaded, and then sync again. I think the error might occur only sporadically due to the async nature of the sync pipeline.

Note this issue is a regression and has been fixed before in #481.
I am going to apply the same fix, but a refactoring of the CollectionSaver stage might be needed to better account for assumptions made on when to save/update collections.

gerrod3 added a commit to gerrod3/pulp_ansible that referenced this issue Aug 7, 2023
gerrod3 added a commit to gerrod3/pulp_ansible that referenced this issue Aug 7, 2023
gerrod3 added a commit to gerrod3/pulp_ansible that referenced this issue Aug 8, 2023
mdellweg pushed a commit that referenced this issue Aug 9, 2023
patchback bot pushed a commit that referenced this issue Aug 9, 2023
fixes: #1547
(cherry picked from commit 6f7f087)
gerrod3 added a commit that referenced this issue Aug 9, 2023
fixes: #1547
(cherry picked from commit 6f7f087)

Co-authored-by: Gerrod <gerrodubben@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

1 participant