Double check PVC if not found in syncVolume #67062
// that corresponding PVC is not synced in controller yet. So we
// double-check PVC in apiserver to make sure we will not reclaim a
// PV due to API delay.
obj, err = ctrl.kubeClient.CoreV1().PersistentVolumeClaims(volume.Spec.ClaimRef.Namespace).Get(volume.Spec.ClaimRef.Name, metav1.GetOptions{})
Would it be better to get the StorageClass and check whether volumeBindingMode=WaitForFirstConsumer before getting the PVC from the API server?
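A minimal sketch of the check suggested in this comment, using simplified stand-in types rather than the real `storage.k8s.io/v1` objects. `needsAPIDoubleCheck` and the `classes` map are hypothetical names used only for illustration, and the sketch assumes that an unset binding mode behaves like `Immediate`.

```go
package main

// VolumeBindingMode mirrors the string values of a StorageClass's
// volumeBindingMode field.
type VolumeBindingMode string

const (
	BindingImmediate            VolumeBindingMode = "Immediate"
	BindingWaitForFirstConsumer VolumeBindingMode = "WaitForFirstConsumer"
)

// StorageClass is a simplified stand-in for the real API object.
type StorageClass struct {
	Name              string
	VolumeBindingMode *VolumeBindingMode
}

// needsAPIDoubleCheck (hypothetical helper) reports whether a PV of the
// given class could have been bound by the scheduler, in which case a
// missing PVC in the controller cache may only be a sync delay.
// Assumption: an unknown class or an unset mode is treated as Immediate.
func needsAPIDoubleCheck(classes map[string]StorageClass, className string) bool {
	sc, ok := classes[className]
	if !ok || sc.VolumeBindingMode == nil {
		return false
	}
	return *sc.VolumeBindingMode == BindingWaitForFirstConsumer
}
```

With a gate like this, only volumes of late-binding classes would pay for the extra API call; volumes of every other class would keep the existing behavior.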
One scenario to consider: if lots of PVCs are deleted, many requests will be sent to the API server here.
/assign @msau42
/test pull-kubernetes-e2e-gce-100-performance
// that corresponding PVC is not synced in controller yet. So
// we double-check PVC in apiserver to make sure we will not
// reclaim a PV due to API delay.
obj, err = ctrl.kubeClient.CoreV1().PersistentVolumeClaims(volume.Spec.ClaimRef.Namespace).Get(volume.Spec.ClaimRef.Name, metav1.GetOptions{})
In what order does the scheduler write the bound PV and PVC? If it writes the PV first, this code might be executed before the scheduler even writes the PVC.
Reading #66287, it probably does not matter. Still I'd like to understand the issue before approving this PR.
The scheduler only writes PV.ClaimRef. It does not modify the PVC (in the preprovisioned case).
I like the idea of only checking this if some special scheduler annotation is set. Otherwise, this will trigger an extra API call for regular volumes too.
Although eventually, when almost everything moves to late binding, it will make no difference.
I am not sure about annBoundByScheduler: if I have an external PV binder, then I still experience the issue. IMO, it should just check annBoundByController (the controller has seen the PVC so it must be in its cache).
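A rough sketch of the annotation gate proposed here, assuming a simplified `Volume` stand-in instead of the real `v1.PersistentVolume`. `hasBoundByController` and `shouldDoubleCheck` are hypothetical helper names. Only the presence of the annotation is tested, since its value does not matter.

```go
package main

// annBoundByController is the annotation that marks a binding as installed
// by a binder (PV controller, scheduler, ...) rather than by the user.
const annBoundByController = "pv.kubernetes.io/bound-by-controller"

// Volume is a simplified stand-in for a PersistentVolume's metadata.
type Volume struct {
	Name        string
	Annotations map[string]string
}

// hasBoundByController (hypothetical helper) checks only for the presence
// of the annotation; its value is ignored.
func hasBoundByController(v Volume) bool {
	_, ok := v.Annotations[annBoundByController]
	return ok
}

// shouldDoubleCheck (hypothetical helper) gates the extra lookup: it is
// only worth doing when the claim was not found in the controller cache
// and the binding was not made by the user.
func shouldDoubleCheck(v Volume, foundInCache bool) bool {
	return !foundInCache && hasBoundByController(v)
}
```

Under this gate, a PV that a user pre-bound by hand (no annotation) would never trigger the extra lookup.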
Right now the scheduler is already setting annBoundByController. I need to think about whether it is OK to remove that annotation from the scheduler.
Regardless, I'm not sure how useful it is to add a scheduler annotation because eventually almost everything should be going through the scheduler, and like @jsafrane pointed out, any other external component that is binding PVs could hit this too.
I think we cannot remove annBoundByController from the scheduler. The annotation determines whether or not we automatically try to rollback the binding in case of some error.
I see. "Controller" in annBoundByController means all possible PV binders, not only the PV controller itself. IIUC, if we want to optimize for external PV binders, we need to add an additional annotation for all external PV binders, but that requires upgrading the external PV binders.
One thing that bothered me: why is this different from the case where a user manually creates a PV with ClaimRef set to a non-existent PVC? The answer is that when the user sets ClaimRef, pvc.UID is not set. But in the case of the scheduler prebinding, pvc.UID is set. That is why the PV controller thinks the PVC previously existed and got deleted. So another possible solution may be to not set pvc.UID in the case of scheduler prebinding. However, then we run into the issue where the user might actually delete the PVC, and this half-bound PV is then stuck forever and unavailable to other PVCs.
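The distinction described in this comment can be sketched as a small decision function. This is a simplified illustration, not the controller's actual code: `ClaimRef`, `Action`, and `decide` are hypothetical names, and treating a UID mismatch the same as a missing claim is an assumption of the sketch.

```go
package main

// ClaimRef is a simplified stand-in for PV.Spec.ClaimRef.
type ClaimRef struct {
	Namespace string
	Name      string
	UID       string
}

// Action names the outcome the controller would pick for the volume.
type Action string

const (
	ActionWaitForClaim Action = "wait-for-claim" // user pre-bound, claim may appear later
	ActionKeepBound    Action = "keep-bound"     // claim exists and matches
	ActionRelease      Action = "release"        // claim looks deleted
)

// decide (hypothetical helper) shows why an empty ClaimRef.UID and a set
// ClaimRef.UID lead to different outcomes when the claim cannot be found.
func decide(ref ClaimRef, claimFound bool, claimUID string) Action {
	if ref.UID == "" {
		// The user set ClaimRef by hand: the claim may simply not exist yet.
		return ActionWaitForClaim
	}
	if !claimFound || claimUID != ref.UID {
		// A UID was recorded, so the claim is assumed to have existed once.
		// Assumption: a different UID is treated like a deleted claim.
		return ActionRelease
	}
	return ActionKeepBound
}
```

The problematic case in this PR is the second branch: the scheduler records the UID, the controller cache has not caught up, and `claimFound` is false even though the PVC exists.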
/test pull-kubernetes-e2e-gce
// Note that only non-released and non-failed volumes will be
// updated to Released state when PVC does not exist.
if volume.Status.Phase != v1.VolumeReleased && volume.Status.Phase != v1.VolumeFailed {
obj, err = ctrl.kubeClient.CoreV1().PersistentVolumeClaims(volume.Spec.ClaimRef.Namespace).Get(volume.Spec.ClaimRef.Name, metav1.GetOptions{})
Here is another theory as to what could be happening. The PV controller's PVC cache is different from the informer cache. Whenever a PVC informer event comes in, the PVC is added to a queue, and the queue is processed by a single claimWorker, which then adds it to the PVC cache. So if many PVC events arrive at once, it's possible that the "create pvc" event is stuck in the queue even though the informer actually saw the update.
So one more thing we could add here is to check the informer cache directly before checking the API server. That could avoid the extra API call in this scenario, although it will not avoid the API call on a normal PVC delete.
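The lookup order suggested in this comment (controller cache, then informer cache, then API server) can be sketched as below. This is an illustration with stand-in types, not client-go code: `ClaimGetter`, `mapGetter`, `ErrNotFound`, and `findClaim` are all hypothetical, and `mapGetter` is an in-memory source that counts calls so the short-circuiting is visible.

```go
package main

import "errors"

// ErrNotFound is a stand-in for an API "not found" error.
var ErrNotFound = errors.New("claim not found")

// Claim is a simplified stand-in for a PersistentVolumeClaim.
type Claim struct {
	Namespace string
	Name      string
}

// ClaimGetter abstracts one source of claims (cache, lister, or API client).
type ClaimGetter interface {
	Get(namespace, name string) (*Claim, error)
}

// mapGetter is an in-memory source used only for illustration.
type mapGetter struct {
	claims map[string]*Claim
	calls  int
}

func (m *mapGetter) Get(namespace, name string) (*Claim, error) {
	m.calls++
	if c, ok := m.claims[namespace+"/"+name]; ok {
		return c, nil
	}
	return nil, ErrNotFound
}

// findClaim tries each source in order and stops at the first hit, so the
// most expensive source (listed last) is only consulted when the cheaper
// ones miss. Errors other than ErrNotFound abort the lookup.
func findClaim(namespace, name string, sources ...ClaimGetter) (*Claim, error) {
	for _, src := range sources {
		claim, err := src.Get(namespace, name)
		if err == nil {
			return claim, nil
		}
		if !errors.Is(err, ErrNotFound) {
			return nil, err
		}
	}
	return nil, ErrNotFound
}
```

Calling `findClaim(ns, name, controllerCache, informerCache, apiServer)` reaches the API server only when both caches miss, which is the normal PVC delete case mentioned above.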
Thanks! I hadn't thought about that; I'm not familiar with the PV controller.
@msau42 @jsafrane
lgtm
@@ -123,6 +123,8 @@ const annBindCompleted = "pv.kubernetes.io/bind-completed"
// the binding (PV->PVC or PVC->PV) was installed by the controller. The
// absence of this annotation means the binding was done by the user (i.e.
// pre-bound). Value of this annotation does not matter.
// Exteranl PV binders must bind PV the same way as PV controller, otherwise PV
nit: External
Fixed.
If a PV is bound by an external PV binder (e.g. kube-scheduler), it is possible under heavy load that the corresponding PVC has not yet been synced to the controller's local cache.
/test pull-kubernetes-integration
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: cofyc, jsafrane
Automatic merge from submit-queue (batch tested with PRs 67062, 67169, 67539, 67504, 66876).
…upstream-release-1.9 Automated cherry pick of #67062: Double check PVC if not found in syncVolume.
What this PR does / why we need it:
Double check PVC if not found in syncVolume.
If a PV is bound by an external PV binder (e.g. kube-scheduler), it is possible under heavy load that the corresponding PVC has not yet been synced to the controller's local cache.
Which issue(s) this PR fixes:
Fixes #66287
Special notes for your reviewer:
Release note: