Support DV garbage collection #8134
Conversation
Good start, couple questions
hack/cluster-deploy.sh
Outdated
@@ -21,6 +21,9 @@ set -ex pipefail

 DOCKER_TAG=${DOCKER_TAG:-devel}
 KUBEVIRT_DEPLOY_CDI=${KUBEVIRT_DEPLOY_CDI:-true}
+##### DO NOT APPROVE #####
+# FIXME: default should be false, as we want to cover the existing no-GC flows
+CDI_DV_GC=${CDI_DV_GC:-0}
You planning a GC specific lane?
Already added an optional lane pull-kubevirt-e2e-k8s-1.24-sig-storage-dv-gc
pkg/controller/virtinformers.go
Outdated
@@ -409,6 +409,9 @@ func GetVMIInformerIndexers() cache.Indexers {
 if vol.PersistentVolumeClaim != nil {
 	pvcs = append(pvcs, fmt.Sprintf("%s/%s", vmi.Namespace, vol.PersistentVolumeClaim.ClaimName))
 }
+if vol.DataVolume != nil {
+	pvcs = append(pvcs, fmt.Sprintf("%s/%s", vmi.Namespace, vol.DataVolume.Name))
+}
Can you explain the rationale for this? To me, it seems to muddy things a bit. Why not lookup in both indexes? We do that in a couple places, like here: https://github.com/kubevirt/kubevirt/blob/main/pkg/virt-controller/watch/vm.go#L1720-L1733
Seems like a good idea. I'll try and update.
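To make the indexer change under discussion concrete, here is a minimal, self-contained sketch of the idea (types are simplified stand-ins for the real VMI API, not the actual kubevirt structs): volumes backed by either a PVC or a DataVolume feed the same "namespace/name" PVC index, since a DataVolume's name matches the PVC it populates.

```go
package main

import "fmt"

// Volume is a simplified stand-in for the VMI volume source.
type Volume struct {
	PersistentVolumeClaimName string // empty when not PVC-backed
	DataVolumeName            string // empty when not DV-backed
}

// pvcIndexKeys collects "namespace/claim-name" keys for both PVC-backed
// and DataVolume-backed volumes, mirroring the indexer change above.
func pvcIndexKeys(namespace string, volumes []Volume) []string {
	var keys []string
	for _, vol := range volumes {
		if vol.PersistentVolumeClaimName != "" {
			keys = append(keys, fmt.Sprintf("%s/%s", namespace, vol.PersistentVolumeClaimName))
		}
		if vol.DataVolumeName != "" {
			keys = append(keys, fmt.Sprintf("%s/%s", namespace, vol.DataVolumeName))
		}
	}
	return keys
}

func main() {
	keys := pvcIndexKeys("default", []Volume{
		{PersistentVolumeClaimName: "my-pvc"},
		{DataVolumeName: "my-dv"},
	})
	fmt.Println(keys) // [default/my-pvc default/my-dv]
}
```

With a single combined index, a controller watching PVCs can resolve affected VMIs in one lookup, instead of querying a PVC index and a DV index separately as linked above.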
@@ -693,7 +693,8 @@ func (ctrl *VMExportController) isKubevirtContentType(pvc *corev1.PersistentVolu
 if pvc.Spec.VolumeMode != nil && *pvc.Spec.VolumeMode == corev1.PersistentVolumeBlock {
 	return true
 }
-isKubevirt := pvc.Annotations[annContentType] == string(cdiv1.DataVolumeKubeVirt)
+contentType, ok := pvc.Annotations[annContentType]
+isKubevirt := ok && (contentType == string(cdiv1.DataVolumeKubeVirt) || contentType == "")
In what cases was this change necessary?
KubeVirt was mistakenly (and inconsistently with CDI) treating a PVC with the default (empty string) content type as non-KubeVirt (which means Archive, I guess).
Isn't it also kubevirt if !ok, i.e. if we don't have the annotation at all?
@ShellyKa13 if the PVC doesn't have the ContentType annotation at all (unlike empty which defaults to KubeVirt), we cannot say it's KubeVirt content type as CDI did not set the annotation.
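The three-way semantics settled on in this thread can be captured in a tiny, self-contained sketch (the annotation key and literal values here are illustrative stand-ins for the real CDI constants): annotation present and equal to "kubevirt" or empty means KubeVirt content type; annotation absent means we cannot assume it.

```go
package main

import "fmt"

// Illustrative stand-in for the CDI content-type annotation key.
const annContentType = "cdi.kubevirt.io/storage.contentType"

// isKubevirtContentType reports whether a PVC counts as KubeVirt content
// type: the annotation must be present, and be either "kubevirt" or ""
// (empty defaults to KubeVirt; a missing annotation does not).
func isKubevirtContentType(annotations map[string]string) bool {
	contentType, ok := annotations[annContentType]
	return ok && (contentType == "kubevirt" || contentType == "")
}

func main() {
	fmt.Println(isKubevirtContentType(map[string]string{annContentType: "kubevirt"})) // true
	fmt.Println(isKubevirtContentType(map[string]string{annContentType: ""}))         // true: empty defaults to KubeVirt
	fmt.Println(isKubevirtContentType(map[string]string{annContentType: "archive"}))  // false
	fmt.Println(isKubevirtContentType(nil))                                           // false: CDI never set the annotation
}
```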
tests/datavolume_gc.go
Outdated
. "kubevirt.io/kubevirt/tests/framework/matcher"
)

func DeleteDataVolume(dv *cdiv1.DataVolume) {
Pretty sure this is not the right place for these helper functions. Perhaps in libstorage?
Sure, I can move it to libstorage. @ShellyKa13 ack?
Yeah, I'm not even sure we need a new file; maybe all these functions can go into tests/libstorage/datavolume.go.
If you think it should be in a different file, then tests/libstorage/datavolume_gc.go.
pkg/virt-controller/watch/vm.go
Outdated
// Don't create DV if PVC already exists
pvc, err := c.getPersistentVolumeClaimFromCache(vm.Namespace, template.Name)
if err == nil && pvc != nil {
	populatedFor := pvc.Annotations["cdi.kubevirt.io/storage.populatedFor"]
Is this annotation added when DV is GC'd? Otherwise, it shouldn't always be there
No, it's already used in the restore for creating a DV when its populated PVC already exists.
@ShellyKa13 ?
Yes, in the restore, in case of a DataVolumeTemplate, we update populatedFor to be the name of the new restore DV.
Not sure what is going on here is correct. I don't think "populatedFor" should have any effect here. It is used for more things than vm restore, btw.
As-is, it appears to me that any "populatedFor" DataVolumes will never be garbage collected. Or more precisely, they will always get recreated after they are garbage collected.
I think VM controller should simply only create the DV if the PVC does not exist.
That means that the restore controller has to create the DVs during restore, though.
> As-is, it appears to me that any "populatedFor" DataVolumes will never be garbage collected. Or more precisely, they will always get recreated after they are garbage collected.
Not sure I understand why they will always be recreated; actually, this code should make sure exactly that they are not recreated (if the annotation exists, continue).
But either way, I understand what you are saying about the DataVolume being created in the restore, and I agree; I did wonder why we create the PVC but not the DataVolume in the restore process.
The question is: if we decided that the default behavior is as if GC is on, then in the restore case there is no need to create the DataVolumes at all, since the PVCs are already created from the VolumeSnapshots.
I think the "right" thing to do is to have restore create the DV even if it is garbage collected right away. BUT this may be a tricky thing to do if GC is enabled. How do we track that a DataVolume was created successfully if it may be garbage collected right away? May have to persist in restore status. Basically we have to make sure that the restore code does not consistently keep creating DVs.
@mhenriks things get a bit complex with your "right" direction. I don't see why not keep restore consistent with the new behavior, so if PVC exists DV is not created. Afaik no KubeVirt code currently requires/refers the DV in the restore flow, and all CI lanes look green with this behavior for both GC enabled/disabled.
I really don't like the idea of mixing this compact PR with an improvement of the restore DV creation. We sure won't duplicate the VM controller logic, so quite heavy refactoring would be needed there.
However, I understand that for some reason (preparation for DV source = volume snapshot?) you don't want to get rid of the restore DV creation when the PVC exists.
Let's consider this simple solution: when restore creates the PVC, it is annotated with dvRequired, so when the VM controller encounters a PVC with that annotation, it creates the DV and removes the PVC annotation.
Wdyt?
I just thought of an easy way to track whether restore controller created the DataVolume. Initially, the restore PVC has no owner. When DV is created, it will take ownership of the PVC. When the DV is garbage collected, the VM will own the PVC. So, restore process will only create the DV when it does not exist and the PVC has no owner. What do you think?
As you can probably guess, my preference is to not change existing behavior unless there is a very good reason.
Looks like clean logic, so why not keep DV creation in the VM controller with this minor condition? Of course, you can later move DV creation to restore, but not in this PR.
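The ownership heuristic agreed on above can be sketched as a small decision function. This is a hedged, self-contained model (the types are simplified stand-ins for the real Kubernetes objects, and `shouldCreateDV` is a hypothetical helper name): the controller creates a DataVolume only when no DV exists and any pre-existing PVC, e.g. one created by the restore controller, has no owner yet. Once a DV adopts the PVC, or a garbage-collected DV hands ownership to the VM, creation is skipped.

```go
package main

import "fmt"

// PVC is a simplified stand-in; real PVCs carry metav1.OwnerReference.
type PVC struct {
	OwnerReferences []string
}

// shouldCreateDV models the proposed rule: create the DV only if it does
// not exist and the PVC is either absent or not yet owned by anything.
func shouldCreateDV(dvExists bool, pvc *PVC) bool {
	if dvExists {
		return false
	}
	if pvc == nil {
		return true // normal flow: create the DV so CDI populates a new PVC
	}
	// PVC exists but is unowned: restore created it, DV still needed so
	// it can adopt the PVC (and later hand ownership to the VM on GC).
	return len(pvc.OwnerReferences) == 0
}

func main() {
	fmt.Println(shouldCreateDV(false, nil))                                   // true
	fmt.Println(shouldCreateDV(false, &PVC{}))                                // true: unowned restore PVC
	fmt.Println(shouldCreateDV(false, &PVC{OwnerReferences: []string{"vm"}})) // false: already adopted
	fmt.Println(shouldCreateDV(true, nil))                                    // false: DV exists
}
```

The appeal of this design is that ownership doubles as the "DV was already created" record, so no extra status field is needed to stop the controller from re-creating garbage-collected DVs.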
(force-pushed from 9acc0d9 to e4fe7bf)
if err == nil {
	return storageClassName, nil
}
pvcKey := cacheKeyFunc(namespace, volume.VolumeSource.DataVolume.Name)
this is already done at the end of getStorageClassNameForDV: https://github.com/arnongilboa/kubevirt/blob/9acc0d9206a3850a1daa86e4de41964032def920/pkg/virt-controller/watch/snapshot/snapshot.go#L852-L858
getStorageClassNameForDV returns an error if the DV is not in the cache (e.g. garbage collected), but I agree handling it there is better.
tests/datavolume_gc.go
Outdated
err = virtCli.CdiClient().CdiV1beta1().DataVolumes(dv.Namespace).Delete(context.Background(), dv.Name, metav1.DeleteOptions{})
if !IsDataVolumeGC(virtCli) {
	Expect(err).To(BeNil())
Some places added dv = nil after calling this; not sure if it should be added here too.
Adding that. Let's see what @mhenriks thinks about DeleteDataVolume(dv **v1beta1.DataVolume).
No, don't change it to **; just do dv = nil where it was deleted, right after calling this function.
why not nil them all in the func? I don't see any risk
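For context on why the ** signature came up at all: Go passes pointers by value, so assigning nil to a *T parameter inside the helper cannot clear the caller's variable; only a **T parameter (or the caller assigning nil itself, as suggested above) can. A minimal illustration, with hypothetical helper names:

```go
package main

import "fmt"

// DataVolume is a trivial stand-in for cdiv1.DataVolume.
type DataVolume struct{ Name string }

// deleteByValuePtr rebinds only its local copy of the pointer;
// the caller's variable is unaffected.
func deleteByValuePtr(dv *DataVolume) {
	dv = nil
}

// deleteByDoublePtr dereferences the pointer-to-pointer, so it
// rebinds the caller's variable itself.
func deleteByDoublePtr(dv **DataVolume) {
	*dv = nil
}

func main() {
	dv := &DataVolume{Name: "test-dv"}
	deleteByValuePtr(dv)
	fmt.Println(dv != nil) // true: caller still holds the pointer
	deleteByDoublePtr(&dv)
	fmt.Println(dv == nil) // true: caller's variable is now nil
}
```

So "nil them all in the func" only works with the ** signature; with the existing *DataVolume signature, each call site has to do dv = nil itself, which is the option the thread settled on.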
tests/utils.go
Outdated
@@ -106,6 +106,8 @@ const (
 StartingVMInstance = "Starting a VirtualMachineInstance"
 WaitingVMInstanceStart = "Waiting until the VirtualMachineInstance will start"
 CouldNotFindComputeContainer = "could not find compute container for pod"
+DeletingDataVolume = "Deleting the DataVolume"
+VerifyingGC = "Verifying DataVolume garbage collection"
These 2 don't seem to belong here; they should be in the same file as the helper functions they are used in.
pkg/virt-controller/watch/vm.go
Outdated
@@ -479,6 +479,15 @@ func (c *VMController) handleDataVolumes(vm *virtv1.VirtualMachine, dataVolumes
 	}
 }
 if !exists {
+	// Don't create DV if PVC already exists
+	pvc, err := c.getPersistentVolumeClaimFromCache(vm.Namespace, template.Name)
+	if err == nil && pvc != nil {
requeue if err != nil?
Sure. Nice catch!
(force-pushed from 285eb36 to 27797be)
/retest-required
/test pull-kubevirt-e2e-k8s-1.24-sig-storage-dv-gc
/test pull-kubevirt-e2e-k8s-1.24-sig-storage-dv-gc
/test ?
@akalenyu: The following commands are available to trigger required jobs:
The following commands are available to trigger optional jobs:
Use
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/test pull-kubevirt-e2e-k8s-1.24-sig-storage-dv-gc
/test pull-kubevirt-fossa
/test pull-kubevirt-e2e-k8s-1.24-sig-storage-dv-gc
/test pull-kubevirt-e2e-k8s-1.24-sig-storage-dv-gc
/test pull-kubevirt-e2e-kind-1.23-vgpu-nonroot
/lgtm
/test pull-kubevirt-e2e-k8s-1.24-sig-storage-dv-gc
/retest-required
/test pull-kubevirt-e2e-k8s-1.24-sig-storage-dv-gc
/retest
@arnongilboa: The
The following commands are available to trigger optional jobs:
Use
In response to this:
/test pull-kubevirt-e2e-k8s-1.22-sig-storage
/retest-required
/test pull-kubevirt-e2e-kind-1.22-sriov
/retest-required
/test pull-kubevirt-e2e-k8s-1.24-sig-storage-dv-gc
/hold
@mhenriks @ShellyKa13 @awels @akalenyu
CDI added support for [garbage collection of completed DVs](kubevirt/containerized-data-importer#2233). Here we add the KubeVirt support, and adapt tests accordingly, so a garbage-collected DV won't be referenced. See design [here](https://github.com/kubevirt/community/blob/main/design-proposals/garbage-collect-completed-dvs.md). Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
@arnongilboa: The following tests failed, say
/lgtm
/unhold
Signed-off-by: Arnon Gilboa agilboa@redhat.com
What this PR does / why we need it:
CDI added support for garbage collection of completed DVs.
Here we add the KubeVirt support, and adapt tests accordingly, so a garbage-collected DV won't be referenced.
See design here.
Which issue(s) this PR fixes (optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged):
Fixes #
Special notes for your reviewer:
Release note: