New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: add retry on the caller of v2DeleteManifest instead within v2DeleteManifest #18662
Conversation
Codecov Report
@@ Coverage Diff @@
## main #18662 +/- ##
===========================================
+ Coverage 44.63% 70.54% +25.91%
===========================================
Files 235 749 +514
Lines 13082 93936 +80854
Branches 2669 0 -2669
===========================================
+ Hits 5839 66267 +60428
- Misses 6949 24078 +17129
- Partials 294 3591 +3297
Flags with carried forward coverage won't be shown. Click here to find out more.
|
…eteManifest Signed-off-by: Shengwen Yu <yshengwen@vmware.com>
9af39ff
to
de5e091
Compare
@zyyw I'm going to build this on top of v2.7.1 (this is the version we are currently running) and will take it for a spin |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
@zyyw @wy65701436 I running local build with this change in our sandbox, AFAICT this works as expected, I'm doing additional tests in the next ~12 hours so will provide more feedback, but looking at the implementation I think this is good to go! one more question: once this is merged, I'm assuming this fix can be cherry-picked into 2.7 & 2.8 releases? we're still not ready to upgrade to 2.8 since we have work around switching charts from chartmuseum to OCI and would be great to have this fix in 2.7 |
@dkulchinsky , thank you for the feedback. Really appreciate it! We think this is essentially an enhancement, not a bug fix. I'm afraid we will not back port this change to 2.7 & 2.8 releases. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
Hey @wy65701436, that's unfortunate 😕 I'm curious why you consider this an enhancement only and not a bug fix? I think I've shown in my analysis in the original issue that the current behavior is improperly handling 404 when it is returned inside the retry loop which I'm sure you'll agree is not correct behavior and is fixed by this change. The fix is small and adresses a real issue, I think it would be beneficial (not just for my sake) to include it as a fix and backport it to the current stable versions (2.7 & 2.8). |
@dkulchinsky Since the root cause of this problem could be either a network issue or a storage problem, in the harbor side, we simply added a retry mechanism to temporarily conceal the problem. It is important to note that this is not a bug in Harbor, but rather a failure tolerance in the garbage collector (GC). |
Thanks @wy65701436, I agree that the retry mechanism was added a while back to allow GC to be more resilient with temporal issues in the upstream backend storage or network and it has improved GC considerably. My main point here is there was a gap in the retry implementation where a specific edge case was not correctly handled and is now addressed by this PR, I believe it would be highly beneficial to backport this small and low risk change to 2.7 & 2.8, and happy to work on the PRs if that helps. |
…eteManifest (goharbor#18662) Signed-off-by: Shengwen Yu <yshengwen@vmware.com> Co-authored-by: Wang Yan <wangyan@vmware.com> Signed-off-by: Wilfred Almeida <60785452+WilfredAlmeida@users.noreply.github.com>
…eteManifest (goharbor#18662) Signed-off-by: Shengwen Yu <yshengwen@vmware.com> Co-authored-by: Wang Yan <wangyan@vmware.com>
Thank you for contributing to Harbor!
Comprehensive Summary of your change
Issue being fixed
Fixes #(issue)
Please indicate you've done the following: