New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
In GuaranteedUpdate, retry on any error if we are working with cached data #77619
Conversation
return s.Interface.GuaranteedUpdate(ctx, key, ptrToType, ignoreNotFound, preconditions, tryUpdate, s.cachedObj) | ||
} | ||
|
||
func TestDeleteWithCachedObject(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test fails without the fix.
Added a unit test that mocks the flake we saw in #76346. |
/retest |
/release-note-none |
could this have caused bugs previously? At first glance, this seems like something that should be backported, which also indicates it needs a release note |
Release note added. I checked all calls to GaranteedUpdate. We are lucky. The only problematic behavior caused by this bug is that finalizers get ignored. |
/retest |
Thanks for checking. I found the ways the we're performing reads with GuaranteedUpdate to be generally difficult to reason about and error prone, but maybe that's a topic for another day. I've walked through the code more carefully and it does fix the issue. I'm in favor of getting it in and backported. /lgtm |
/approve |
/priority critical-urgent |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: caesarxuchao, liggitt The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/retest |
…#77619-upstream-release-1.12 Automated cherry pick of #77619: In GuaranteedUpdate, retry on any error if we are working
…#77619-upstream-release-1.13 Automated cherry pick of #77619: In GuaranteedUpdate, retry on any error if we are working
…#77619-upstream-release-1.14 Automated cherry pick of #77619: In GuaranteedUpdate, retry on any error if we are working
/kind bug
/sig api-machinery
/assign @liggitt @jpbetz
Previously, GuaranteedUpdate only retry with refreshed data on "Conflict" error. This is wrong in general, and causes a specific problem we found in #76346, where an object was deleted by the apiserver even if a previous operation had added finalizer to the object. It's because the
tryUpdate
function returned errDeleteNow based on the stale object in the watch cache. The wrappingGuaranteedUpdate
didn't retry with fresh data as errDeleteNow is not a "Conflict" error.I'm not sure how to add a reliable test to reproduce flakes in #76346. I plan to add a unit test to store_test.go, with a fake watchcache that always returns stale object.I checked all calls to
GaranteedUpdate
. We are lucky. The only problematic behavior caused by this bug is that finalizers get ignored.