Double check PVC if not found in syncVolume #67062

Merged
merged 1 commit into from
Aug 17, 2018

Conversation

Member

@cofyc cofyc commented Aug 7, 2018

What this PR does / why we need it:

Double check PVC if not found in syncVolume.

If a PV is bound by an external PV binder (e.g. kube-scheduler), it is possible under heavy load that the corresponding PVC has not yet been synced to the controller's local cache.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #66287

Special notes for your reviewer:

Release note:

The PVC may not be synced to the controller's local cache in time if the PV is bound by an external PV binder (e.g. kube-scheduler). The controller now double-checks when the PVC is not found, to avoid wrongly reclaiming the PV.

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Aug 7, 2018
// that corresponding PVC is not synced in controller yet. So we
// double-check PVC in apiserver to make sure we will not reclaim a
// PV due to API delay.
obj, err = ctrl.kubeClient.CoreV1().PersistentVolumeClaims(volume.Spec.ClaimRef.Namespace).Get(volume.Spec.ClaimRef.Name, metav1.GetOptions{})
Contributor

@wenlxie wenlxie Aug 7, 2018

Would it be better to get the storage class and check whether volumeBindingMode=WaitForFirstConsumer before getting the PVC from the API server?

Member Author

@cofyc cofyc Aug 7, 2018

Normally it should be fine to check the PVC in the apiserver only when the storage class volumeBindingMode is WaitForFirstConsumer, but the cause of the problem is an out-of-sync cache, so checking the class (also read from the cache) here adds some complexity.
@msau42 @jsafrane
What do you think?

Contributor

@wenlxie wenlxie Aug 7, 2018

One scenario to consider: if lots of PVCs are deleted, many requests will be sent to the API server here.
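
The gating idea discussed in this thread could look roughly like the sketch below. It is only an illustration, not the code in this PR: `claimExists`, `liveLookup`, and the map-based cache are hypothetical stand-ins, and the binding mode is passed in directly rather than read from a StorageClass. As noted above, in the real controller the class would itself come from a cache, which is part of the objection.

```go
package main

// BindingMode mirrors the StorageClass volumeBindingMode values.
type BindingMode string

const (
	Immediate            BindingMode = "Immediate"
	WaitForFirstConsumer BindingMode = "WaitForFirstConsumer"
)

// liveLookup is a hypothetical stand-in for a GET against the API server.
type liveLookup func(claimKey string) bool

// claimExists consults the (possibly stale) local cache first and falls
// back to the live lookup only for late-binding classes. For any other
// mode, a cache miss is trusted and no API request is made.
func claimExists(cache map[string]bool, mode BindingMode, claimKey string, live liveLookup) bool {
	if cache[claimKey] {
		return true
	}
	if mode != WaitForFirstConsumer {
		return false
	}
	return live(claimKey)
}
```

With this gate, a deleted PVC in an Immediate class costs no API request, while every cache miss in a WaitForFirstConsumer class still costs one, so a burst of deletions in late-binding classes would still reach the API server.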

Member Author

cofyc commented Aug 7, 2018

/assign @msau42
@kubernetes/sig-storage-pr-reviews

@k8s-ci-robot k8s-ci-robot added the sig/storage Categorizes an issue or PR as relevant to SIG Storage. label Aug 7, 2018
Member Author

cofyc commented Aug 8, 2018

/test pull-kubernetes-e2e-gce-100-performance

// that corresponding PVC is not synced in controller yet. So
// we double-check PVC in apiserver to make sure we will not
// reclaim a PV due to API delay.
obj, err = ctrl.kubeClient.CoreV1().PersistentVolumeClaims(volume.Spec.ClaimRef.Namespace).Get(volume.Spec.ClaimRef.Name, metav1.GetOptions{})
Member

In what order does scheduler write bound PV and PVC? If it writes PV first, this code might be executed before the scheduler even writes the PVC.

Member

Reading #66287, it probably does not matter. Still I'd like to understand the issue before approving this PR.

Member

The scheduler only writes PV.ClaimRef. It does not modify the PVC (in the preprovisioned case).

Member

I like the idea of only checking this if some special scheduler annotation is set. Otherwise, this will trigger an extra API call for regular volumes too.

Member

Although eventually, when almost everything moves to late binding, it will make no difference.

Member

I am not sure about annBoundByScheduler - if I have an external PV binder, then I still experience the issue. IMO, it should just check annBoundByController (the controller has seen the PVC so it must be in its cache).

Member

Right now the scheduler is already setting annBoundByController. I need to think about if it is ok to remove that annotation from scheduler.

Member

Regardless, I'm not sure how useful it is to add a scheduler annotation because eventually almost everything should be going through the scheduler, and like @jsafrane pointed out, any other external component that is binding PVs could hit this too.

Member

I think we cannot remove annBoundByController from the scheduler. The annotation determines whether or not we automatically try to rollback the binding in case of some error.

Member Author

I see. "Controller" in annBoundByController means all possible PV binders, not only the PV controller itself. IIUC, if we want to optimize for external PV binders, we need to add an additional annotation for all external PV binders, but that requires upgrading the external PV binders.

Member

msau42 commented Aug 11, 2018

One thing that bothered me: why is this different from the case where a user manually creates a PV with ClaimRef set to a non-existent PVC? The answer is that when the user sets ClaimRef, pvc.UID is not set, but in the case of scheduler prebinding, pvc.UID is set. That is why the PV controller thinks that the PVC previously existed and got deleted.

So another possible solution may be to not set pvc.UID in the case of scheduler prebinding. However, then we run into the issue where the user might actually delete the PVC and then this half bound PV is stuck forever and unavailable to other PVCs.
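
The distinction described in this comment can be sketched as a small decision function. This is a simplified illustration under stated assumptions, not the PV controller's actual code: `ClaimRef`, `Decision`, and `decide` are hypothetical names, and the UID here stands for the UID recorded in the PV's ClaimRef (what the comment calls pvc.UID).

```go
package main

// ClaimRef is a simplified stand-in for the PV's reference to its claim.
type ClaimRef struct {
	Namespace string
	Name      string
	UID       string // empty when a user pre-binds the PV by hand
}

// Decision is the hypothetical outcome of syncing one volume.
type Decision string

const (
	Unused       Decision = "unused"         // no ClaimRef at all
	Bound        Decision = "bound"          // the referenced claim was found
	WaitForClaim Decision = "wait-for-claim" // pre-bound by user, claim may appear later
	Release      Decision = "release"        // claim is assumed to have been deleted
)

// decide shows why a missing claim is harmless when the UID is empty but
// leads to reclaiming when the UID is set.
func decide(ref *ClaimRef, claimFound bool) Decision {
	switch {
	case ref == nil:
		return Unused
	case claimFound:
		return Bound
	case ref.UID == "":
		return WaitForClaim
	default:
		return Release
	}
}
```

Under these assumptions, a scheduler-prebound PV (UID set) whose claim has not reached the controller's cache yet falls into the `Release` branch, which is the failure this PR guards against.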

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Aug 11, 2018
Member Author

cofyc commented Aug 11, 2018

/test pull-kubernetes-e2e-gce

// Note that only non-released and non-failed volumes will be
// updated to Released state when PVC does not exist.
if volume.Status.Phase != v1.VolumeReleased && volume.Status.Phase != v1.VolumeFailed {
obj, err = ctrl.kubeClient.CoreV1().PersistentVolumeClaims(volume.Spec.ClaimRef.Namespace).Get(volume.Spec.ClaimRef.Name, metav1.GetOptions{})
Member

So another theory as to what could be happening here. PV controller's pvc cache is different from the informer cache. Whenever a PVC informer event comes in, the pvc gets added to a queue, and the queue is processed by a single claimWorker, which will then add it to the PVC cache. So if there are many PVC events all at once, then it's possible that the "create pvc event" is stuck in the queue, even though the informer actually saw the update.

So one more thing we could add here is to check the informer cache directly before checking the API server. That could avoid the extra API call in this scenario, although it will not avoid the API call on a normal PVC delete.

Member Author

Thanks! I hadn't thought of that; I'm not very familiar with the PV controller.

@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Aug 15, 2018
Member Author

cofyc commented Aug 15, 2018

@msau42 @jsafrane
PR has been updated (and squashed too), see https://github.com/kubernetes/kubernetes/pull/67062/files.

@cofyc cofyc changed the title Double check PVC in apiserver to make sure we will not reclaim a PV due to API delay. Double check PVC if not found in syncVolume Aug 15, 2018
@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-none Denotes a PR that doesn't merit a release note. labels Aug 15, 2018
Member

@msau42 msau42 left a comment

lgtm

@@ -123,6 +123,8 @@ const annBindCompleted = "pv.kubernetes.io/bind-completed"
// the binding (PV->PVC or PVC->PV) was installed by the controller. The
// absence of this annotation means the binding was done by the user (i.e.
// pre-bound). Value of this annotation does not matter.
// Exteranl PV binders must bind PV the same way as PV controller, otherwise PV
Member

nit: External

Member Author

Fixed.

If PV is bound by external PV binder (e.g. kube-scheduler), it's
possible on heavy load that corresponding PVC is not synced to
controller local cache yet.
Member Author

cofyc commented Aug 16, 2018

/test pull-kubernetes-integration

@jsafrane
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 17, 2018
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cofyc, jsafrane

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 17, 2018
@k8s-github-robot

Automatic merge from submit-queue (batch tested with PRs 67062, 67169, 67539, 67504, 66876). If you want to cherry-pick this change to another branch, please follow the instructions here.

Member Author

cofyc commented Aug 18, 2018

I've created cherry pick PRs for 1.11/1.10 and 1.9:

release-1.11: #67557
release-1.10: #67558
release-1.9: #67559

k8s-github-robot pushed a commit that referenced this pull request Aug 22, 2018
…upstream-release-1.10

Automatic merge from submit-queue.

Automated cherry pick of #67062: Double check PVC if not found in syncVolume.

Cherry pick of #67062 on release-1.10.

#67062: Double check PVC if not found in syncVolume.
k8s-github-robot pushed a commit that referenced this pull request Aug 28, 2018
…upstream-release-1.11

Automatic merge from submit-queue.

Automated cherry pick of #67062: Double check PVC if not found in syncVolume.

Cherry pick of #67062 on release-1.11.

#67062: Double check PVC if not found in syncVolume.
k8s-ci-robot added a commit that referenced this pull request Sep 20, 2018
…upstream-release-1.9

Automated cherry pick of #67062: Double check PVC if not found in syncVolume.
@cofyc cofyc deleted the fix66287 branch May 4, 2019 07:18
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/storage Categorizes an issue or PR as relevant to SIG Storage. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

Scheduler and pv_controller info not sync caused volume reclaimed
6 participants