
Garbage collect finalized Migration objects #7144

Merged
merged 3 commits into from Feb 1, 2022

Conversation

davidvossel
Member

related to https://bugzilla.redhat.com/show_bug.cgi?id=2021992

With the addition of the workload updates api, VMIs will automatically get migrated after a kubevirt update in order to run on the latest virt-launcher pod.

This automation is eventually consistent and will continually attempt to migrate VMIs until all "migratable" VMIs are running on new virt-launcher pods. In the event that a VMI repeatedly fails to live migrate, the number of Migration objects will grow indefinitely as the system continually tries to move the VMI.

To avoid letting the finalized migrations for a VMI grow indefinitely, this PR garbage collects all but the most recent 5 migration objects. This means we keep a buffer of migration objects that we can use to debug issues, without an indefinitely large list of migration objects weighing down the system.

Garbage collect finalized migration objects, leaving only the most recent 5 objects

@kubevirt-bot kubevirt-bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. size/L labels Jan 26, 2022
Member

@rmohr rmohr left a comment

I think we need expectations here to avoid failing subsequent deletes, especially because we are potentially deleting migrations that did not trigger the controller loop in the first place.

pkg/virt-controller/watch/migration.go (resolved)
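The expectations pattern rmohr refers to can be sketched roughly like this. This is a simplified, hypothetical version of the controller-expectations bookkeeping used in Kubernetes-style controllers, not the actual KubeVirt implementation; the type and method names here are illustrative:

```go
package main

import (
	"fmt"
	"sync"
)

// deleteExpectations tracks, per controller key, how many delete events the
// controller has issued but not yet observed through its informer cache.
type deleteExpectations struct {
	mu      sync.Mutex
	pending map[string]int
}

func newDeleteExpectations() *deleteExpectations {
	return &deleteExpectations{pending: map[string]int{}}
}

// ExpectDeletions is called before issuing delete requests.
func (e *deleteExpectations) ExpectDeletions(key string, count int) {
	e.mu.Lock()
	defer e.mu.Unlock()
	e.pending[key] += count
}

// DeletionObserved is called from the informer's delete event handler.
func (e *deleteExpectations) DeletionObserved(key string) {
	e.mu.Lock()
	defer e.mu.Unlock()
	if e.pending[key] > 0 {
		e.pending[key]--
	}
}

// SatisfiedExpectations reports whether the controller may act on this key
// again; while false, the reconcile loop should skip issuing new deletes so
// it does not re-delete objects still present in a stale cache.
func (e *deleteExpectations) SatisfiedExpectations(key string) bool {
	e.mu.Lock()
	defer e.mu.Unlock()
	return e.pending[key] == 0
}

func main() {
	exp := newDeleteExpectations()
	key := "default/my-vmi"

	exp.ExpectDeletions(key, 2) // about to garbage collect 2 finalized migrations
	fmt.Println(exp.SatisfiedExpectations(key)) // deletes still in flight

	exp.DeletionObserved(key)
	exp.DeletionObserved(key)
	fmt.Println(exp.SatisfiedExpectations(key)) // safe to reconcile again
}
```

Without this bookkeeping, a reconcile that runs before the informer cache catches up could try to delete the same migrations again and fail.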
@kubevirt-bot kubevirt-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 28, 2022
Signed-off-by: David Vossel <davidvossel@gmail.com>
Signed-off-by: David Vossel <davidvossel@gmail.com>
Signed-off-by: David Vossel <davidvossel@gmail.com>
@kubevirt-bot kubevirt-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 28, 2022
@davidvossel
Member Author

/retest

1 similar comment
@davidvossel
Member Author

/retest

By("Starting the VirtualMachineInstance")
vmi = runVMIAndExpectLaunch(vmi, 240)

for i := 0; i < 10; i++ {
Member

Is the migration buffer size configurable? Perhaps 2x the buffer size should be used here instead of a hard-coded 10?

Member Author

The buffer size isn't configurable right now; I just set it to a default of 5. The only reason I wanted a buffer at all is so we have some objects to use for debugging.


migrations, err := virtClient.VirtualMachineInstanceMigration(vmi.Namespace).List(&metav1.ListOptions{})
Expect(err).To(BeNil())
Expect(migrations.Items).To(HaveLen(5))
Member

Will the buffer length be 5 on all clusters? Should this also reflect the cluster config's setting?

Member Author

I hard-coded the buffer length for now in the migration controller. I'd only want to expose it in a config if we have a reason for someone to modify it.

}

// keep only the most recent 5 finalized migration objects
garbageCollectionCount := len(finalizedMigrations) - defaultFinalizedMigrationGarbageCollectionBuffer
Member

Is this something a cluster would realistically need to configure? Is 5 universally reasonable? If it's going to stay hard-coded, is calling it the "default" misleading?

Member Author

The buffer value likely won't ever need to be configurable. This value is per VMI, so we have a trail of the last 5 finalized migration objects per VMI that can be used for debugging purposes.

@stu-gott
Member

Looks good overall. Added a few minor questions.

@stu-gott
Member

/lgtm

Deferring approval to @rmohr

@kubevirt-bot kubevirt-bot added the lgtm Indicates that a PR is ready to be merged. label Jan 31, 2022
@stu-gott
Member

/retest

@rmohr
Member

rmohr commented Feb 1, 2022

/approve

@kubevirt-bot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: rmohr

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubevirt-bot kubevirt-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 1, 2022
@davidvossel
Member Author

/retest

@stu-gott
Member

stu-gott commented Feb 1, 2022

/cherry-pick release-0.49

@kubevirt-bot
Contributor

@stu-gott: once the present PR merges, I will cherry-pick it on top of release-0.49 in a new PR and assign it to you.

In response to this:

/cherry-pick release-0.49

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@kubevirt-bot
Contributor

@davidvossel: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name: pull-kubevirt-check-tests-for-flakes
Commit: fada881
Required: false
Rerun command: /test pull-kubevirt-check-tests-for-flakes


@kubevirt-bot
Contributor

@stu-gott: new pull request created: #7166

In response to this:

/cherry-pick release-0.49

