In GuaranteedUpdate, retry on any error if we are working with cached data #77619

Merged
k8s-ci-robot merged 1 commit into kubernetes:master from caesarxuchao:always-retry on May 13, 2019

Conversation

@caesarxuchao
Member

commented May 8, 2019

/kind bug
/sig api-machinery
/assign @liggitt @jpbetz

Previously, GuaranteedUpdate only retried with refreshed data on a "Conflict" error. This is wrong in general, and it causes the specific problem we found in #76346, where the apiserver deleted an object even though a previous operation had added a finalizer to it. The cause is that the tryUpdate function returned errDeleteNow based on the stale object in the watch cache, and the wrapping GuaranteedUpdate didn't retry with fresh data because errDeleteNow is not a "Conflict" error.
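To make the failure mode concrete, here is a minimal, self-contained sketch of the retry rule described above. It is not the apiserver code: the `object` type, the `fetch` callback, `markForDeletion`, and the simplified `guaranteedUpdate` signature are all assumptions made for illustration, and the commit step with its conflict handling is left out.

```go
package main

import (
	"errors"
	"fmt"
)

// object is a hypothetical stand-in for a stored API object.
type object struct {
	ResourceVersion int
	Finalizers      []string
	Deleting        bool
}

// errDeleteNow borrows the name used in the description; here it is only a
// local sentinel meaning "no finalizers, delete immediately".
var errDeleteNow = errors.New("delete now")

// guaranteedUpdate is a simplified model of the retry loop, not the real
// implementation. cached may be stale; fetch reads the authoritative state.
func guaranteedUpdate(cached *object, fetch func() object, tryUpdate func(object) (object, error)) (object, error) {
	var cur object
	mayBeStale := cached != nil
	if mayBeStale {
		cur = *cached
	} else {
		cur = fetch()
	}
	for {
		out, err := tryUpdate(cur)
		if err == nil {
			return out, nil
		}
		if !mayBeStale {
			// The input was fresh, so the error is genuine.
			return object{}, err
		}
		// Any error computed from cached data is suspect, not only a
		// conflict: refresh the input and retry once.
		cur = fetch()
		mayBeStale = false
	}
}

// markForDeletion is a hypothetical tryUpdate: it asks for immediate deletion
// only when no finalizers remain, otherwise it marks the object as deleting.
func markForDeletion(in object) (object, error) {
	if len(in.Finalizers) == 0 {
		return object{}, errDeleteNow
	}
	in.Deleting = true
	return in, nil
}

func main() {
	stale := object{ResourceVersion: 1} // cached copy, finalizer not yet visible
	fresh := object{ResourceVersion: 2, Finalizers: []string{"example.com/hold"}}
	fetch := func() object { return fresh }

	out, err := guaranteedUpdate(&stale, fetch, markForDeletion)
	fmt.Println(out.Deleting, out.ResourceVersion, err)
}
```

In this sketch the first attempt sees the stale copy and returns errDeleteNow, the loop refreshes, and the second attempt sees the finalizer and only marks the object as deleting. If the loop retried on conflicts alone, the first errDeleteNow would be returned to the caller.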

I'm not sure how to add a reliable test that reproduces the flakes in #76346. I plan to add a unit test to store_test.go, with a fake watch cache that always returns a stale object.

I checked all calls to GuaranteedUpdate. We are lucky. The only problematic behavior caused by this bug is that finalizers get ignored.

Fixed a bug in the apiserver storage that could cause just-added finalizers to be ignored on an immediately following delete request, leading to premature deletion.

@caesarxuchao caesarxuchao force-pushed the caesarxuchao:always-retry branch from 0762e04 to ed9f856 May 9, 2019

@k8s-ci-robot k8s-ci-robot added size/M and removed size/S labels May 9, 2019

@caesarxuchao caesarxuchao force-pushed the caesarxuchao:always-retry branch from ed9f856 to 8d43d99 May 9, 2019

return s.Interface.GuaranteedUpdate(ctx, key, ptrToType, ignoreNotFound, preconditions, tryUpdate, s.cachedObj)
}

func TestDeleteWithCachedObject(t *testing.T) {

@caesarxuchao

caesarxuchao May 9, 2019

Author Member

This test fails without the fix.

@caesarxuchao

Member Author

commented May 9, 2019

Added a unit test that mocks the flake we saw in #76346.
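The idea behind such a test can be sketched with a wrapper that embeds the store interface and overrides one method so that every update starts from a stale cached value. Everything below is hypothetical: the `updater` interface, `memStore`, `staleCacheStore`, and the string-valued objects are simplified stand-ins, not the types used in store_test.go.

```go
package main

import (
	"errors"
	"fmt"
)

// updater is a hypothetical, cut-down storage interface.
type updater interface {
	GuaranteedUpdate(key string, tryUpdate func(cur string) (string, error), cached *string) (string, error)
}

// memStore is a toy store that retries on any error when its input came
// from the cache, which is the behaviour assumed from the PR title.
type memStore struct{ data map[string]string }

func (m *memStore) GuaranteedUpdate(key string, tryUpdate func(string) (string, error), cached *string) (string, error) {
	cur, fromCache := m.data[key], false
	if cached != nil {
		cur, fromCache = *cached, true
	}
	for {
		next, err := tryUpdate(cur)
		if err != nil && fromCache {
			cur, fromCache = m.data[key], false // refresh and retry
			continue
		}
		if err != nil {
			return "", err
		}
		m.data[key] = next
		return next, nil
	}
}

// staleCacheStore embeds an updater and overrides GuaranteedUpdate so that
// every call is handed the same stale cached value.
type staleCacheStore struct {
	updater
	stale string
}

func (s *staleCacheStore) GuaranteedUpdate(key string, tryUpdate func(string) (string, error), _ *string) (string, error) {
	return s.updater.GuaranteedUpdate(key, tryUpdate, &s.stale)
}

// deleteWithStaleCache runs a delete-style update through the wrapper and
// reports every input that tryUpdate observed.
func deleteWithStaleCache() (seen []string, result string, err error) {
	backing := &memStore{data: map[string]string{"pod": "finalizer"}}
	store := &staleCacheStore{updater: backing, stale: ""}
	result, err = store.GuaranteedUpdate("pod", func(cur string) (string, error) {
		seen = append(seen, cur)
		if cur == "" {
			return "", errors.New("no finalizers: delete now")
		}
		return cur + ",deleting", nil
	}, nil)
	return seen, result, err
}

func main() {
	seen, result, err := deleteWithStaleCache()
	fmt.Printf("inputs=%q result=%q err=%v\n", seen, result, err)
}
```

Recording the inputs that tryUpdate observes lets a test assert two things: the stale value was tried first, and the store then retried with the stored value instead of returning the error computed from stale data.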

@caesarxuchao

Member Author

commented May 9, 2019

/retest

@caesarxuchao

Member Author

commented May 9, 2019

/release-note-none

@liggitt

Member

commented May 9, 2019

release-note-none

Could this have caused bugs previously? At first glance, this seems like something that should be backported, which also indicates it needs a release note.

@caesarxuchao caesarxuchao force-pushed the caesarxuchao:always-retry branch from 8d43d99 to 5e53522 May 9, 2019

@caesarxuchao

Member Author

commented May 9, 2019

Release note added.

I checked all calls to GuaranteedUpdate. We are lucky. The only problematic behavior caused by this bug is that finalizers get ignored.

@caesarxuchao

Member Author

commented May 9, 2019

/retest

@jpbetz

Contributor

commented May 13, 2019

I checked all calls to GuaranteedUpdate. We are lucky. The only problematic behavior caused by this bug is that finalizers get ignored.

Thanks for checking. I find the ways that we're performing reads with GuaranteedUpdate generally difficult to reason about and error-prone, but maybe that's a topic for another day.

I've walked through the code more carefully and it does fix the issue. I'm in favor of getting it in and backported.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm label May 13, 2019

@liggitt

Member

commented May 13, 2019

/approve

@liggitt

Member

commented May 13, 2019

/priority critical-urgent

@k8s-ci-robot

Contributor

commented May 13, 2019

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: caesarxuchao, liggitt

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@caesarxuchao

Member Author

commented May 13, 2019

/retest

@k8s-ci-robot k8s-ci-robot merged commit ed4c508 into kubernetes:master May 13, 2019

18 of 20 checks passed

pull-kubernetes-integration: Job triggered.
pull-kubernetes-kubemark-e2e-gce-big: Job triggered.
cla/linuxfoundation: caesarxuchao authorized
pull-kubernetes-bazel-build: Job succeeded.
pull-kubernetes-bazel-test: Job succeeded.
pull-kubernetes-conformance-image-test: Skipped.
pull-kubernetes-cross: Skipped.
pull-kubernetes-dependencies: Job succeeded.
pull-kubernetes-e2e-gce: Job succeeded.
pull-kubernetes-e2e-gce-100-performance: Job succeeded.
pull-kubernetes-e2e-gce-csi-serial: Skipped.
pull-kubernetes-e2e-gce-device-plugin-gpu: Job succeeded.
pull-kubernetes-e2e-gce-storage-slow: Skipped.
pull-kubernetes-godeps: Skipped.
pull-kubernetes-local-e2e: Skipped.
pull-kubernetes-node-e2e: Job succeeded.
pull-kubernetes-typecheck: Job succeeded.
pull-kubernetes-verify: Job succeeded.
pull-publishing-bot-validate: Skipped.
tide: In merge pool.
k8s-ci-robot added a commit that referenced this pull request May 18, 2019
Merge pull request #77880 from caesarxuchao/automated-cherry-pick-of-…
…#77619-upstream-release-1.12

Automated cherry pick of #77619: In GuaranteedUpdate, retry on any error if we are working
k8s-ci-robot added a commit that referenced this pull request May 21, 2019
Merge pull request #77879 from caesarxuchao/automated-cherry-pick-of-…
…#77619-upstream-release-1.13

Automated cherry pick of #77619: In GuaranteedUpdate, retry on any error if we are working
k8s-ci-robot added a commit that referenced this pull request May 21, 2019
Merge pull request #77875 from caesarxuchao/automated-cherry-pick-of-…
…#77619-upstream-release-1.14

Automated cherry pick of #77619: In GuaranteedUpdate, retry on any error if we are working