Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

workload-update: cancel a workload update migration #11641

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

alicefr
Copy link
Member

@alicefr alicefr commented Apr 3, 2024

What this PR does

Before this PR:
The workload updater doesn't currently support the ability to abort a migration due to an automated workload update. Today, the workload update is mostly used for CPU and memory hotplug.

After this PR:
This change introduces a mechanism to aborts a workload update.

The migration abortion can be detected if the VMI doesn't have any VMI*Change conditions but there is a migration due to a automated workload update. If this condition is met, then it means that the change condition was present (has triggered the migration) but now it has been removed. This assumption simplifies the workload updater that needs only to take care of deleting the migration.

We have also introduced a new annotation kubevirt.io/kubevirt.io/testWorkloadUpdateMigrationAbortion that should only be used during the tests.

Special notes for your reviewer

This feature will be needed for volume Migration. Volume migration still relays on the same workload update as CPU and memory hotplug. However, in this case, the migration might take very long and we want to allow user to be able to cancel the change (for more details please check the design proposal). A draft PR for volume migration can be found in #11533

This PR is a pre-requirement to introduce such as cancellation mechanism. Since the volume migration PR is complex and this is a generic mechanism, I'd like to discuss it and present it separately, even if this still not used in the code.

Release note

Add kubevirt.io/testWorkloadUpdateMigrationAbortion annotation and a mechanism to abort workload updates

@kubevirt-bot kubevirt-bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. size/L labels Apr 3, 2024
@kubevirt-bot kubevirt-bot added kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API sig/buildsystem Denotes an issue or PR that relates to changes in the build system. labels Apr 3, 2024
@alicefr alicefr marked this pull request as draft April 3, 2024 12:14
@kubevirt-bot kubevirt-bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 3, 2024
@alicefr
Copy link
Member Author

alicefr commented Apr 3, 2024

@vladikr @xpivarc @acardace when you have a bit of time , would you mind to take a look to this change and provide feedback? Many thanks!

@alicefr alicefr force-pushed the delete-mig-updater branch 2 times, most recently from dd69c95 to 4c764b1 Compare April 3, 2024 14:01
@alicefr alicefr marked this pull request as ready for review April 3, 2024 14:02
@kubevirt-bot kubevirt-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 3, 2024
@alicefr
Copy link
Member Author

alicefr commented Apr 4, 2024

The volume migration PR is now based on the abortion mechanism implemented in this PR. The volume migration PR also includes a functional test that verifies that the cancellation properly work and it can be used as an example of this feature to work.

@acardace acardace added sig/compute and removed sig/buildsystem Denotes an issue or PR that relates to changes in the build system. labels Apr 8, 2024
if migration.IsFinal() {
continue
}
if migration.Namespace != ns {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't/shouldn't this ionformer have a "namespace" index?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could also add your own "vmi" index that indexes on namespace/name

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mhenriks I'm not sure I understand the feedback, could you please expand?

@alicefr alicefr marked this pull request as draft April 9, 2024 15:07
@kubevirt-bot kubevirt-bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 9, 2024
@kubevirt-bot kubevirt-bot added the sig/buildsystem Denotes an issue or PR that relates to changes in the build system. label Apr 9, 2024
@alicefr alicefr marked this pull request as ready for review April 10, 2024 07:05
@kubevirt-bot kubevirt-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 10, 2024
This annotation enables the deletion of a migration due to a workload
update of the VMI.
The main use of this annotation is to test the abortion of an update.
The users should also remove manually this annotation once the update
has been aborted.

Signed-off-by: Alice Frosi <afrosi@redhat.com>
The function ListWorkloadUpdateMigrations is an helper function that
returns the not finalized migrations due to a workload update.

Signed-off-by: Alice Frosi <afrosi@redhat.com>
@alicefr alicefr force-pushed the delete-mig-updater branch 2 times, most recently from 9d7fc9c to c18c24f Compare April 19, 2024 08:13
The testWorkloadUpdateMigrationAbortionAnnotation cancels the migrations
due to a workload update. If a VMI has this annotation, new changes
shouldn't be considered until this annotation is removed.

Signed-off-by: Alice Frosi <afrosi@redhat.com>
@alicefr
Copy link
Member Author

alicefr commented Apr 23, 2024

/test pull-kubevirt-e2e-k8s-1.29-sig-compute-migrations

@alicefr
Copy link
Member Author

alicefr commented Apr 24, 2024

/test pull-kubevirt-e2e-k8s-1.29-sig-operator

@alicefr alicefr marked this pull request as draft April 24, 2024 12:04
@kubevirt-bot kubevirt-bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 24, 2024
For certain workload updates, we might desire to cancel a migration
triggred by the update.

The workload updater checks if the VMI has a  VirtualMachineInstance*Change,
if the conditions aren't present but there is a migration associated
with an automated update, then the migration will be aborted.

Signed-off-by: Alice Frosi <afrosi@redhat.com>
Signed-off-by: Alice Frosi <afrosi@redhat.com>
@alicefr
Copy link
Member Author

alicefr commented Apr 24, 2024

/test pull-kubevirt-e2e-k8s-1.29-sig-operato

@kubevirt-bot
Copy link
Contributor

@alicefr: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

  • /test pull-kubevirt-apidocs
  • /test pull-kubevirt-build
  • /test pull-kubevirt-build-arm64
  • /test pull-kubevirt-check-unassigned-tests
  • /test pull-kubevirt-client-python
  • /test pull-kubevirt-code-lint
  • /test pull-kubevirt-e2e-k8s-1.27-sig-compute
  • /test pull-kubevirt-e2e-k8s-1.27-sig-network
  • /test pull-kubevirt-e2e-k8s-1.27-sig-operator
  • /test pull-kubevirt-e2e-k8s-1.27-sig-storage
  • /test pull-kubevirt-e2e-k8s-1.28-sig-compute
  • /test pull-kubevirt-e2e-k8s-1.28-sig-network
  • /test pull-kubevirt-e2e-k8s-1.28-sig-operator
  • /test pull-kubevirt-e2e-k8s-1.28-sig-storage
  • /test pull-kubevirt-e2e-k8s-1.29-ipv6-sig-network
  • /test pull-kubevirt-e2e-k8s-1.29-sig-compute
  • /test pull-kubevirt-e2e-k8s-1.29-sig-compute-migrations
  • /test pull-kubevirt-e2e-k8s-1.29-sig-monitoring
  • /test pull-kubevirt-e2e-k8s-1.29-sig-network
  • /test pull-kubevirt-e2e-k8s-1.29-sig-operator
  • /test pull-kubevirt-e2e-k8s-1.29-sig-performance
  • /test pull-kubevirt-e2e-k8s-1.29-sig-storage
  • /test pull-kubevirt-e2e-kind-1.27-vgpu
  • /test pull-kubevirt-e2e-kind-sriov
  • /test pull-kubevirt-e2e-windows2016
  • /test pull-kubevirt-fossa
  • /test pull-kubevirt-generate
  • /test pull-kubevirt-manifests
  • /test pull-kubevirt-prom-rules-verify
  • /test pull-kubevirt-unit-test
  • /test pull-kubevirt-verify-go-mod

The following commands are available to trigger optional jobs:

  • /test build-kubevirt-builder
  • /test pull-kubevirt-build-s390x
  • /test pull-kubevirt-check-tests-for-flakes
  • /test pull-kubevirt-conformance-arm64
  • /test pull-kubevirt-e2e-arm64
  • /test pull-kubevirt-e2e-k8s-1.29-sig-compute-realtime
  • /test pull-kubevirt-e2e-k8s-1.29-sig-compute-root
  • /test pull-kubevirt-e2e-k8s-1.29-sig-network-multus-v4
  • /test pull-kubevirt-e2e-k8s-1.29-sig-storage-root
  • /test pull-kubevirt-e2e-k8s-1.29-single-node
  • /test pull-kubevirt-e2e-k8s-1.29-swap-enabled
  • /test pull-kubevirt-e2e-k8s-1.30-sig-compute
  • /test pull-kubevirt-e2e-k8s-1.30-sig-network
  • /test pull-kubevirt-e2e-k8s-1.30-sig-operator
  • /test pull-kubevirt-e2e-k8s-1.30-sig-storage
  • /test pull-kubevirt-gosec
  • /test pull-kubevirt-goveralls
  • /test pull-kubevirt-metrics-lint
  • /test pull-kubevirt-unit-test-arm64
  • /test pull-kubevirt-verify-rpms

Use /test all to run the following jobs that were automatically triggered:

  • pull-kubevirt-apidocs
  • pull-kubevirt-build
  • pull-kubevirt-build-arm64
  • pull-kubevirt-check-tests-for-flakes
  • pull-kubevirt-check-unassigned-tests
  • pull-kubevirt-client-python
  • pull-kubevirt-code-lint
  • pull-kubevirt-conformance-arm64
  • pull-kubevirt-e2e-arm64
  • pull-kubevirt-e2e-k8s-1.29-sig-compute
  • pull-kubevirt-e2e-k8s-1.29-sig-compute-migrations
  • pull-kubevirt-e2e-k8s-1.29-sig-network
  • pull-kubevirt-e2e-k8s-1.29-sig-operator
  • pull-kubevirt-e2e-k8s-1.29-sig-performance
  • pull-kubevirt-e2e-k8s-1.29-sig-storage
  • pull-kubevirt-fossa
  • pull-kubevirt-generate
  • pull-kubevirt-goveralls
  • pull-kubevirt-manifests
  • pull-kubevirt-prom-rules-verify
  • pull-kubevirt-unit-test
  • pull-kubevirt-unit-test-arm64
  • pull-kubevirt-verify-go-mod

In response to this:

/test pull-kubevirt-e2e-k8s-1.29-sig-operato

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@alicefr
Copy link
Member Author

alicefr commented Apr 24, 2024

Need to fix some test failure in the operator suite

@alicefr
Copy link
Member Author

alicefr commented Apr 24, 2024

/test pull-kubevirt-e2e-k8s-1.29-sig-operator

@alicefr alicefr marked this pull request as ready for review April 24, 2024 13:07
@kubevirt-bot kubevirt-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 24, 2024
@alicefr
Copy link
Member Author

alicefr commented Apr 24, 2024

The tests for the operator now pass locally.

@alicefr
Copy link
Member Author

alicefr commented Apr 25, 2024

/test pull-kubevirt-e2e-k8s-1.29-sig-operator

@alicefr
Copy link
Member Author

alicefr commented Apr 25, 2024

@acardace @vladikr can you please have another look to this PR? The CI failures should be solved now.

@acardace
Copy link
Member

/approve

@kubevirt-bot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: acardace

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubevirt-bot kubevirt-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 26, 2024
Add functional test to verify the abortion of the migration during the
CPU update. The migration cancellation is triggered by the annotation
kubevirt.io/testWorkloadUpdateMigrationAbortion on the VMI.

Signed-off-by: Alice Frosi <afrosi@redhat.com>
@alicefr
Copy link
Member Author

alicefr commented Apr 29, 2024

Fixed last nit in the tests and a typo

@kubevirt-bot
Copy link
Contributor

kubevirt-bot commented Apr 29, 2024

@alicefr: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-kubevirt-manifests 4dfde6f link true /test pull-kubevirt-manifests
pull-kubevirt-apidocs 4dfde6f link true /test pull-kubevirt-apidocs
pull-kubevirt-verify-go-mod 4dfde6f link true /test pull-kubevirt-verify-go-mod
pull-kubevirt-check-tests-for-flakes 4dfde6f link false /test pull-kubevirt-check-tests-for-flakes

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

if isHotplugInProgress(vmi) {
return true
}

return false
}

func (c *WorkloadUpdateController) shouldAbortMigration(vmi *virtv1.VirtualMachineInstance) bool {
numMig := len(migrationutils.ListWorkloadUpdateMigrations(c.migrationInformer.GetStore(), vmi.Name, vmi.Namespace))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it worth ensuring that we are only aborting the migration this controller started?
_, ok = migration.Annotations[virtv1.WorkloadUpdateMigrationAnnotation]

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is already the case. The function ListWorkloadUpdateMigrations checks if the migration object has the annotation WorkloadUpdateMigrationAnnotation (see here), otherwise the migration isn't included in the list.

Copy link
Member Author

@alicefr alicefr Apr 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find it cleaner encapsulated in the function, and the function it is also re-used below

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/buildsystem Denotes an issue or PR that relates to changes in the build system. sig/compute size/L
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants