Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cleanup DataImportCron jobs on deletion #2088

Merged
merged 3 commits into from Jan 12, 2022

Conversation

arnongilboa
Copy link
Collaborator

Signed-off-by: Arnon Gilboa agilboa@redhat.com

What this PR does / why we need it:
Mostly for the case the job pod is stuck Pending (e.g. due to failed MountVolume.SetUp) and not terminated even when deleting its DataImportCron.
Warning FailedMount 24s (x8 over 87s) kubelet MountVolume.SetUp failed for volume "cdi-cert-vol" : configmap "no-certs" not found

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Special notes for your reviewer:

Release note:

NONE

@kubevirt-bot kubevirt-bot added release-note-none Denotes a PR that doesn't merit a release note. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. labels Jan 10, 2022
@mhenriks
Copy link
Member

Functional test?

@@ -471,6 +470,34 @@ func (r *DataImportCronReconciler) cleanup(ctx context.Context, dataImportCron *
return nil
}

func (r *DataImportCronReconciler) deleteJobs(ctx context.Context, dataImportCron *cdiv1.DataImportCron) error {
cronJobName := GetCronJobName(dataImportCron)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this not work?

cronJobName := GetCronJobName(dataImportCron)
deletePropagationBackground := metav1.DeletePropagationBackground
cronJob := &v1beta1.CronJob{ObjectMeta: metav1.ObjectMeta{Namespace: r.cdiNamespace, Name: cronJobName}}
if err := r.client.Delete(ctx, cronJob, &client.DeleteOptions{PropagationPolicy: &deletePropagationBackground}); IgnoreNotFound(err) != nil {
  return err
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, this will delete the current cronjob job, but not the initial job which is not a cronjob one. However, initial job should be deleted by name, so no need to list the jobs. Fixed. Thanks @awels .

Copy link
Member

@awels awels left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good, what about a functional test to make sure everything is gone?

@arnongilboa arnongilboa force-pushed the cleanup_dic_jobs_n_pods branch 2 times, most recently from 898a069 to 055e08d Compare January 11, 2022 14:21
cron, err = f.CdiClient.CdiV1beta1().DataImportCrons(f.Namespace.Name).Create(context.TODO(), cron, metav1.CreateOptions{})
Expect(err).ToNot(HaveOccurred())

By("Verify jobs were created")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it also create pods associated with those jobs? AFAICT there are 3 levels of objects. Top level DataImportCrons, second level CronJob and Initial Job, and third level the actual pods created by those jobs. So I would expect the test to verify all 3 get created, and that all 3 get deleted when the top level is deleted.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
@brybacki
Copy link
Contributor

it is interesting - on the ceph lane the DV actually succeeded, but test failed (maybe it did not finish in time)

1.641918946014968e+09	DEBUG	controller.datavolume-controller	UUID not already used, marking checkpoint completed by current pod ID.	{"checkpoint": "cirros.raw", "podId": "c40fb2b7-3899-4092-a34d-bf4ecf46dc30"}
1.641918946015955e+09	DEBUG	controller.import-controller	Updated PVC	{"PVC": "cdi-e2e-tests-dv-func-test-s68fl/dv-imageio-warm-import-test", "pvc.anno.Phase": "Succeeded", "pvc.anno.Restarts": "0"}
1.6419189460159824e+09	DEBUG	controller.import-controller	Import completed successfully	{"PVC": "cdi-e2e-tests-dv-func-test-s68fl/dv-imageio-warm-import-test"}

@brybacki
Copy link
Contributor

/retest

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
Copy link
Member

@awels awels left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logic is fine, just some alternative suggestions on some tests to better convey their intentions.


By("Verify initial job created")
initialJobName := controller.GetInitialJobName(cron)
Eventually(func() bool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is more readable

Eventually(func() *corev1.Job {
  job, _ := f.K8sClient.BatchV1().Jobs(f.CdiInstallNs).Get(context.TODO(), initialJobName, metav1.GetOptions{})
  return job
}, dataImportCronTimeout, pollingInterval).ShouldNot(BeNil())

The intention is to verify the job exists, the eventually func doesn't have to return a bool, it can return anything.

}, dataImportCronTimeout, pollingInterval).Should(BeTrue())

By("Verify initial job pod created")
Eventually(func() bool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternative suggestion:

Eventually(func() *corev1.Pod {
  pod, _ := utils.FindPodByPrefixOnce(f.K8sClient, f.CdiInstallNs, initialJobName, "")
  return pod
}, dataImportCronTimeout, pollingInterval).ShouldNot(BeNil())

By("Verify cronjob created and has active job")
cronJobName := controller.GetCronJobName(cron)
var jobName string
Eventually(func() bool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternative suggestion:

Eventually(func() string {
  jobName = ""
  cronjob, _ := f.K8sClient.BatchV1beta1().CronJobs(f.CdiInstallNs).Get(context.TODO(), cronJobName, metav1.GetOptions{})
  if cronjob != nil && len(cronjob.Status.Active) > 0 {
    jobName = cronjob.Status.Active[0].Name
  }
  return jobName
}, dataImportCronTimeout, pollingInterval).ShouldNot(BeEmpty())

Note we are ignoring errors here, maybe write them to the output to help debug if there are issues.

}, dataImportCronTimeout, pollingInterval).Should(BeTrue())

By("Verify cronjob first job created")
Eventually(func() bool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternative:

Eventually(func() *corev1.Job {
  job, _ := f.K8sClient.BatchV1().Jobs(f.CdiInstallNs).Get(context.TODO(), jobName, metav1.GetOptions{})
  return job
}, dataImportCronTimeout, pollingInterval).ShouldNot(BeNil())

}, dataImportCronTimeout, pollingInterval).Should(BeTrue())

By("Verify cronjob first job pod created")
Eventually(func() bool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternative:

Eventually(func() *corev1.Pod {
  pod, _ := utils.FindPodByPrefixOnce(f.K8sClient, f.CdiInstallNs, jobName, "")
  return pod
}, dataImportCronTimeout, pollingInterval).ShouldNot(BeNil())

}, dataImportCronTimeout, pollingInterval).Should(BeTrue())

By("Verify initial job pod deleted")
Eventually(func() bool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternative:

Eventually(func() bool {
  _, err := utils.FindPodByPrefixOnce(f.K8sClient, f.CdiInstallNs, intialJobName, "")
  return errors.IsNotFound(err)
}, dataImportCronTimeout, pollingInterval).Should(BeTrue())

}, dataImportCronTimeout, pollingInterval).Should(BeTrue())

By("Verify cronjob first job pod deleted")
Eventually(func() bool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternative:

Eventually(func() bool {
  _, err := utils.FindPodByPrefixOnce(f.K8sClient, f.CdiInstallNs, jobName, "")
  return errors.IsNotFound(err)
}, dataImportCronTimeout, pollingInterval).Should(BeTrue())

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
Copy link
Member

@awels awels left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
lets see if our fix to the findPodByPrefix func breaks anything, but looks fine otherwise.

@kubevirt-bot kubevirt-bot added the lgtm Indicates that a PR is ready to be merged. label Jan 12, 2022
@awels
Copy link
Member

awels commented Jan 12, 2022

/approve

@kubevirt-bot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: awels

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubevirt-bot kubevirt-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 12, 2022
@kubevirt-bot kubevirt-bot merged commit 97288e3 into kubevirt:main Jan 12, 2022
@arnongilboa
Copy link
Collaborator Author

/cherrypick release-v1.43

@kubevirt-bot
Copy link
Contributor

@arnongilboa: new pull request created: #2100

In response to this:

/cherrypick release-v1.43

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. lgtm Indicates that a PR is ready to be merged. release-note-none Denotes a PR that doesn't merit a release note. size/L
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants