
Garbage collect finalized Migration objects #7144

Merged
merged 3 commits into from Feb 1, 2022

Conversation

davidvossel
Member

related to https://bugzilla.redhat.com/show_bug.cgi?id=2021992

With the addition of the workload updates api, VMIs will automatically get migrated after a kubevirt update in order to run on the latest virt-launcher pod.

This automation is eventually consistent and will continually attempt to migrate VMIs until all "migratable" VMIs are running on new virt-launcher pods. In the event that a VMI repeatedly fails to live migrate, the number of Migration objects will grow indefinitely as the system continually tries to move the VMI.

To avoid letting the finalized migrations for a VMI grow indefinitely, this PR garbage collects all but the most recent 5 migration objects. This means we keep a buffer of migration objects that we can use to debug issues, without an indefinitely large list of migration objects weighing down the system.

Garbage collect finalized migration objects, leaving only the most recent 5 objects

@kubevirt-bot kubevirt-bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. size/L labels Jan 26, 2022
Member

@rmohr rmohr left a comment

I think we need expectations here to avoid failing subsequent deletes, especially because we are potentially deleting migrations that did not trigger the controller loop in the first place.

pkg/virt-controller/watch/migration.go (resolved)
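The expectations pattern rmohr refers to can be sketched roughly like this. This is a simplified, hypothetical version of the controller-expectations bookkeeping used in Kubernetes-style controllers, not the actual KubeVirt implementation; the type and method names here are illustrative:

```go
package main

import (
	"fmt"
	"sync"
)

// deleteExpectations tracks, per controller key, how many delete events the
// controller has issued but not yet observed through its informer cache.
type deleteExpectations struct {
	mu      sync.Mutex
	pending map[string]int
}

func newDeleteExpectations() *deleteExpectations {
	return &deleteExpectations{pending: map[string]int{}}
}

// ExpectDeletions is called before issuing delete requests.
func (e *deleteExpectations) ExpectDeletions(key string, count int) {
	e.mu.Lock()
	defer e.mu.Unlock()
	e.pending[key] += count
}

// DeletionObserved is called from the informer's delete event handler.
func (e *deleteExpectations) DeletionObserved(key string) {
	e.mu.Lock()
	defer e.mu.Unlock()
	if e.pending[key] > 0 {
		e.pending[key]--
	}
}

// SatisfiedExpectations reports whether the controller may act on this key
// again; while false, the reconcile loop should skip issuing new deletes so
// it does not re-delete objects still present in a stale cache.
func (e *deleteExpectations) SatisfiedExpectations(key string) bool {
	e.mu.Lock()
	defer e.mu.Unlock()
	return e.pending[key] == 0
}

func main() {
	exp := newDeleteExpectations()
	key := "default/my-vmi"

	exp.ExpectDeletions(key, 2) // about to garbage collect 2 finalized migrations
	fmt.Println(exp.SatisfiedExpectations(key)) // deletes still in flight

	exp.DeletionObserved(key)
	exp.DeletionObserved(key)
	fmt.Println(exp.SatisfiedExpectations(key)) // safe to reconcile again
}
```

Without this bookkeeping, a reconcile that runs before the informer cache catches up could try to delete the same migrations again and fail.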
@kubevirt-bot kubevirt-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 28, 2022
Signed-off-by: David Vossel <davidvossel@gmail.com>
Signed-off-by: David Vossel <davidvossel@gmail.com>
Signed-off-by: David Vossel <davidvossel@gmail.com>
@kubevirt-bot kubevirt-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 28, 2022
@davidvossel
Member Author

/retest

1 similar comment
@davidvossel
Member Author

/retest

By("Starting the VirtualMachineInstance")
vmi = runVMIAndExpectLaunch(vmi, 240)

for i := 0; i < 10; i++ {
Member

Is the migration buffer size configurable? Perhaps 2x the buffer size should be used here instead of a hard-coded 10?

Member Author

The buffer size isn't configurable right now; I just set it to a default of 5. The only reason I wanted a buffer at all is so we have some objects to use for debugging.


migrations, err := virtClient.VirtualMachineInstanceMigration(vmi.Namespace).List(&metav1.ListOptions{})
Expect(err).To(BeNil())
Expect(migrations.Items).To(HaveLen(5))
Member

Will the buffer length be 5 on all clusters? Should this also reflect the cluster config's setting?

Member Author

I hard-coded the buffer length for now in the migration controller. I'd only want to expose it in a config if we have a reason for someone to modify it.

}

// keep only the most recent 5 finalized migration objects
garbageCollectionCount := len(finalizedMigrations) - defaultFinalizedMigrationGarbageCollectionBuffer
Member

Is this something a cluster would realistically need to configure? Is 5 universally reasonable? If it's going to stay hard-coded, is calling it the "default" misleading?

Member Author

The buffer value likely won't ever need to be configurable. This value is per VMI, so we have a trail of the last 5 finalized migration objects per VMI that can be used for debugging purposes.

@stu-gott
Member

Looks good overall. Added a few minor questions.

@stu-gott
Member

/lgtm

Deferring approval to @rmohr

@kubevirt-bot kubevirt-bot added the lgtm Indicates that a PR is ready to be merged. label Jan 31, 2022
@stu-gott
Member

/retest

@rmohr
Member

rmohr commented Feb 1, 2022

/approve

@kubevirt-bot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: rmohr

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubevirt-bot kubevirt-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 1, 2022
@davidvossel
Member Author

/retest

@stu-gott
Member

stu-gott commented Feb 1, 2022

/cherry-pick release-0.49

@kubevirt-bot
Contributor

@stu-gott: once the present PR merges, I will cherry-pick it on top of release-0.49 in a new PR and assign it to you.

In response to this:

/cherry-pick release-0.49

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@kubevirt-bot
Contributor

@davidvossel: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name: pull-kubevirt-check-tests-for-flakes
Commit: fada881
Required: false
Rerun command: /test pull-kubevirt-check-tests-for-flakes


@kubevirt-bot
Contributor

@stu-gott: new pull request created: #7166

In response to this:

/cherry-pick release-0.49

