Wait until backend storage PVC is ready before starting VMIs #11373
Thank you @jean-edouard! Some comments below! Thanks
pkg/virt-controller/watch/vmi.go
var backendStoragePending bool
backendStoragePending, err = backendstorage.IsPVCPending(vmi, c.clientset)
Maybe do it inline?
- var backendStoragePending bool
- backendStoragePending, err = backendstorage.IsPVCPending(vmi, c.clientset)
+ backendStoragePending, err := backendstorage.IsPVCPending(vmi, c.clientset)
What happens if the PVC goes to the Lost state?
> Maybe do it inline?
That fails with: 'err' redeclared in this block
> What happens if PVC goes to Lost state?
Good point, changed to an error in that case.
func IsPVCPending(vmi *corev1.VirtualMachineInstance, client kubecli.KubevirtClient) (bool, error) {
	if !IsBackendStorageNeededForVMI(&vmi.Spec) {
		return false, nil
	}

	pvc, err := client.CoreV1().PersistentVolumeClaims(vmi.Namespace).Get(context.Background(), PVCForVMI(vmi), metav1.GetOptions{})
	if err != nil {
		return false, err
	}

	if pvc.Status.Phase != v1.ClaimBound {
		return true, nil
	}

	return false, nil
}
This is used from the vmi controller which has a pvcInformer. How about using it?
Well, if the PVC just got created, the informer might not have been updated yet, triggering an error.
In general, it seems like a better idea to get the latest status instead of risking failing the whole sync() on an outdated phase.
Force-pushed from d35ba2e to eb4e1e0.
Moving to draft since storage tests are failing.
Are you holding off VMI start when pvc.phase == Pending? In that case, this breaks WaitForFirstConsumer-based storage.
Force-pushed from eb4e1e0 to 98065fe.
Should be fixed now, we had to error out to ensure the VMI gets re-enqueued.
Yes, but this is limited to the backend storage PVC, which is not a volume of the VMI.
I believe this is not a concern for RWX PVCs, is that correct?
More often than not, RWX storage backends won't be topology-constrained (accessible from all nodes in the cluster),
Hmm, interesting. How does the backend storage PVC bind today with WFFC storage? Who is mounting it? /cc @mhenriks
Force-pushed from 98065fe to 326d9ae.
It's an RWX FS PVC created like this: https://github.com/kubevirt/kubevirt/blob/main/pkg/storage/backend-storage/backend-storage.go#L99-L114
By the way, everything works fine today, but on first VM start we sometimes get a failure logged because the PVC wasn't ready (the VM then successfully starts).
RWX PVCs may be WFFC.
Good to know, thanks! So I guess we need to check for that and start the "doppleganger" pod if needed.
You shouldn't have to do the "doppleganger" for TPM/EFI PVCs. I would simply suggest checking whether the PVC is WFFC. If it is WFFC, you can create the pod right away; otherwise you can wait for it to be bound.
EDIT: maybe you will still get the error you are trying to avoid in the WFFC case, but it should be rare and, as you said, it is non-fatal. The doppleganger is probably the only way to avoid the error entirely.
Ah, so
Yes, that's what I was thinking.
You can test RWX WFFC in kubevirtci with
Force-pushed from 7537e3e to 1c90989.
/cc @acardace
Force-pushed from 7662eff to 1feb44c.
Signed-off-by: Jed Lejosne <jed@redhat.com>
Force-pushed from 1feb44c to 12f0e6c.
/retest
Thanks a lot @jean-edouard! Everything looks good from my side.
Just a very small note :)
/lgtm
Entry("using explicit Immediate mode", pointer.P(storagev1.VolumeBindingImmediate)),
Entry("using nil mode", nil))

It("should create a corresponding Pod on VMI creation with a storage class in WaitForFirstCustomer mode", func() {
nit: s/WaitForFirstCustomer/WaitForFirstConsumer/
// IsPVCReady returns true if either:
// - No PVC is needed for the VMI since it doesn't use backend storage
// - The backend storage PVC is bound
// - The backend storage PVC is pending uses a WaitForFirstCustomer storage class
nit: s/WaitForFirstCustomer/WaitForFirstConsumer/
Force-pushed from 12f0e6c to bba9c9e.
Thank you! Great work!
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: mhenriks
/cherrypick release-1.2
@acardace: #11373 failed to apply on top of branch "release-1.2":
@jean-edouard can you do a manual backport for this?
What this PR does
Before this PR:
VMIs can encounter a (non-fatal) error while starting if the PVC is not ready.
After this PR:
VMIs will wait until the PVC is ready and not run into any error.