
SR-IOV Migration: Move attach SRIOV devices to virt-handler #6581

Merged
merged 4 commits into from Jan 24, 2022

Conversation

ormergi
Contributor

@ormergi ormergi commented Oct 12, 2021

What this PR does / why we need it:

Currently, when an SR-IOV VM is migrated, we detach its SR-IOV network devices just before the migration
starts and attach equivalent devices to the VM on the target once the migration finishes successfully.
Attaching the SR-IOV devices is a one-shot operation, performed only once at post-migration.

Because it is a one-shot operation, the VM may end up in an incomplete state (missing SR-IOV devices)
when an SR-IOV device is disconnected manually or a migration is aborted.
The current implementation also doesn't follow the Kubernetes desired-state design that is followed all over the project.

With this PR, the virt-handler VMController is now aware of the SR-IOV network devices' state and reconciles
them, which means they will be attached to the VM whenever needed.
Since attaching an SR-IOV host-device is an intrusive operation, it is rate-limited and stops after a while.

Additionally, when a migration is aborted (due to a failure or a client request), the SR-IOV devices that were detached will be re-attached to the source VM, and if SR-IOV devices are disconnected from the guest they will be attached
again.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):

Special notes for your reviewer:

Release note:

SRIOV network interfaces are now hot-plugged when disconnected manually or due to aborted migrations.

@kubevirt-bot
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@kubevirt-bot kubevirt-bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. size/S labels Oct 12, 2021
@ormergi
Contributor Author

ormergi commented Oct 12, 2021

/uncc @enp0s3 @vatsalparekh

@kubevirt-bot kubevirt-bot added release-note-none Denotes a PR that doesn't merit a release note. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Oct 12, 2021
@ormergi ormergi changed the title sriov migration: Move reattach SRIOV devices logic to virt-launcher SR-IOV Migration: Attach SRIOV devices as part of virt-handler reconcile loop Nov 1, 2021
@kubevirt-bot kubevirt-bot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XL and removed size/M labels Nov 1, 2021
@kubevirt-bot kubevirt-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 16, 2021
@ormergi
Contributor Author

ormergi commented Nov 16, 2021

/test pull-kubevirt-e2e-kind-1.19-sriov pull-kubevirt-unit-test

@ormergi ormergi changed the title SR-IOV Migration: Attach SRIOV devices as part of virt-handler reconcile loop SR-IOV Migration: Move attach SRIOV devices to virt-handler Nov 16, 2021
Member

@EdDev EdDev left a comment


I reviewed the first commit; aside from the inline comments, it would be nice if you could extract the addition of the new command into a separate commit. The next commit would then just use it.
(This should help keep the review focused.)

pkg/virt-handler/vm.go (outdated review thread, resolved)
pkg/virt-handler/vm.go (outdated review thread, resolved)
@ormergi
Contributor Author

ormergi commented Nov 22, 2021

/test pull-kubevirt-unit-test pull-kubevirt-e2e-kind-1.19-sriov

@ormergi
Contributor Author

ormergi commented Nov 22, 2021

Rebased

/test pull-kubevirt-unit-test pull-kubevirt-e2e-kind-1.19-sriov

@@ -2560,6 +2565,13 @@ func (d *VirtualMachineController) hotplugSriovInterfaces(vmi *v1.VirtualMachine
return nil
}

rateLimitedExecutor := d.sriovHotplugExecutorPool.LoadOrStore(vmi.UID)
Member

Why don't you use vendor/k8s.io/client-go/util/workqueue/rate_limiting_queue.go, the same as the controller?
workqueue.NewNamedRateLimitingQueue(workqueue.DefaultControllerRateLimiter(), "sriov")

The workqueue.DefaultControllerRateLimiter is an ItemExponentialFailureRateLimiter.

Contributor Author

@ormergi ormergi Jan 9, 2022


I reviewed its implementation and tried to change the code to use it, but I reached a point where it's basically a controller that requires another goroutine.
In more detail, using the workqueue in this case involves dequeuing an element, performing the hotplug, and, depending on the result, adding the element back to the queue through the rate-limiter.
Dequeuing an element is done with Get(), which is a blocking func (until it is able to dequeue an element).
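
For context, here is a minimal, self-contained sketch of what the workqueue-based alternative discussed above could look like (illustrative only, not code from this PR; runSriovHotplugWorker and the hotplug callback are hypothetical names). It shows why a dedicated, blocking worker goroutine would be needed:

package sriovhotplug // hypothetical package, for illustration only

import (
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/util/workqueue"
)

// runSriovHotplugWorker drains a rate-limited workqueue. queue.Get() blocks
// until an item is available, which is why this cannot run inline in the
// reconcile loop and needs its own goroutine.
func runSriovHotplugWorker(queue workqueue.RateLimitingInterface, hotplug func(uid types.UID) error) {
	for {
		item, shutdown := queue.Get()
		if shutdown {
			return
		}
		uid := item.(types.UID)
		if err := hotplug(uid); err != nil {
			// Re-queue with exponential backoff on failure.
			queue.AddRateLimited(uid)
		} else {
			// Reset the per-item failure count on success.
			queue.Forget(uid)
		}
		queue.Done(item)
	}
}

// Producer side, e.g. from the reconcile loop:
//
//	queue := workqueue.NewNamedRateLimitingQueue(workqueue.DefaultControllerRateLimiter(), "sriov")
//	go runSriovHotplugWorker(queue, doHotplug)
//	queue.Add(vmi.UID)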

return newLimitedBackoffWithClock(l.baseBackoff.backoff, l.baseBackoff.limit, l.baseBackoff.clock)
}

func NewExponentialLimitedBackoffCreator() LimitedBackoffCreator {
Member

I don't understand why this creator is needed. It creates an instance identical to LimitedBackoffCreator.baseBackoff. What am I missing?

Contributor Author

As part of LimitedBackoff creation, maxStepTime (time.Now() + limit duration) is set relative to the time the instance is created.
If we didn't do this, by the time the RateLimitedExecutor tries to Exec(..) it could fail because the limit time has already passed, in case the first call happens after the limit time.

Member

In other words, there is a need to clone baseBackoff each time, but at the same time to stamp it with the time of its instantiation.
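
A minimal, self-contained sketch of that idea (type and field names here are illustrative, not necessarily the PR's actual ones):

package backoff // hypothetical package, for illustration only

import "time"

// limitedBackoff caps retries at an absolute end time that is stamped when
// the instance is created.
type limitedBackoff struct {
	step    time.Duration
	limit   time.Duration
	endTime time.Time // time.Now() + limit, set at creation time
}

// Ready reports whether another retry is still allowed at the given time.
func (b limitedBackoff) Ready(now time.Time) bool {
	return now.Before(b.endTime)
}

// limitedBackoffCreator clones the base configuration and stamps each new
// instance with its own creation time; this is why a creator is needed
// instead of reusing a single, pre-built backoff.
type limitedBackoffCreator struct {
	base limitedBackoff
}

func (c limitedBackoffCreator) New() limitedBackoff {
	b := c.base
	b.endTime = time.Now().Add(c.base.limit)
	return b
}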

testsClock = clock.NewFakeClock(time.Time{})
backoff = ratelimitcmd.NewExponentialLimitedBackoffWithClock(ratelimitcmd.DefaultMaxStep, testsClock)

testsClock.Step(time.Nanosecond)
Member

Why should this be called before we start?

Contributor Author

The test clock (fake clock) Now() returns the exact same timestamp as backoff.stepEnd, since both get the same fake clock.
That causes backoff.Ready to return false, which is expected, so we need to bump the test clock a bit before starting.
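
A standalone illustration of that timing detail (generic fake-clock usage only, not the PR's test code; the readiness check here is a stand-in for backoff.Ready):

package backoff_test

import (
	"fmt"
	"time"

	"k8s.io/apimachinery/pkg/util/clock"
)

func ExampleFakeClockStep() {
	fakeClock := clock.NewFakeClock(time.Time{})

	// stepEnd equals the clock's creation time, mirroring backoff.stepEnd above.
	stepEnd := fakeClock.Now()

	// Now() == stepEnd, so a readiness check of the form Now().After(stepEnd) fails.
	fmt.Println(fakeClock.Now().After(stepEnd))

	// Bumping the fake clock by one nanosecond moves Now() past stepEnd.
	fakeClock.Step(time.Nanosecond)
	fmt.Println(fakeClock.Now().After(stepEnd))

	// Output:
	// false
	// true
}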

pkg/virt-handler/ratelimitcmd/executor.go (outdated review thread, resolved)
*
*/

package ratelimitcmd_test
Member

Please consider moving it under the already existing pkg/util/ratelimiter.

Member

@EdDev EdDev Jan 10, 2022


That one is a flowcontrol rate limiter, no idea what it is.. but it is for sure unrelated to an executor/cmd rate limiter.

Member

I"m not saying the flowcontrol should be used. Just having the new ratelimiter under the same package.

Contributor Author

I think ratelimitercmd and pkg/util/ratelimiter, even though they resemble each other semantically, are two totally different things.
But I do agree ratelimitercmd should be moved to pkg/util/;
what do you think about moving it there as-is, alongside the one that already exists?

Member

Yes, that's what I meant.

EDIT:
Sorry, I misunderstood you, that's not what I meant :)
I meant moving the content of ratelimitercmd into pkg/util/ratelimiter.
I find it weird to have both ratelimiter and ratelimitercmd under pkg/util. You can create a hierarchy under pkg/util/ratelimiter.

Member

I do not think anything should be under something called util, and there is nothing in common about ratelimiter as a super-package. Multiple objects may have a rate-limiter behavior; in this case it is a cmd/exec.

How about creating an executor package, which has a rate-limiter behavior?
IMO it should have been the same with the other one that exists now: it is a "flowcontrol" thing that has a rate-limiter option, not the other way around.

Member

I think the ratelimiter can live on its own and is not only for the executor. The executor is just one usage of the ratelimiter. But I won't block the PR on it.
@ormergi what do you think?

Contributor Author

@ormergi ormergi Jan 16, 2022


Thinking about this again, I do agree we should not put it under pkg/util (we've had bad experience with util packages..);
the current ratelimiter package we have (the one that wraps flowcontrol) should have a different name, and that is what's confusing.
If there were more usages for the rate-limiter part of the executor (basically backoff.go),
it would have been natural to put it under its own package, pkg/ratelimiter, for reuse.
But since there are no other consumers at the moment, we can keep it all under one executor package.

If you prefer to have the rate-limiter part (backoff.go) in its own package for visibility and to encourage others to use it,
we can split it into two packages, pkg/ratelimiter and pkg/executor, and deal with pkg/util/ratelimiter later:

pkg/
|_ executor/
|  |_ pool.go
|  |_ executor.go
|    ...
|_ ratelimiter/
   |_ backoff.go

@AlonaKaplan @EdDev WDYT?

Member

I think it is a property of an executor at the moment. I would only promote it to its own package if another user of it appears.

I think the ratelimiter can live on its own and is not only for the executor

Yes, but it is still a property of the executor, and having its own package is just an option if we see it used by another functionality. I do not have one in mind at the moment.
And I also do not like util, or having a name under which this one can live nicely with the other one you mentioned.

@@ -2560,6 +2565,13 @@ func (d *VirtualMachineController) hotplugSriovInterfaces(vmi *v1.VirtualMachine
return nil
Member

Shouldn't you delete the VM from d.sriovHotplugExecutorPool in this case? All the SR-IOV NICs are plugged, so the rate-limiting backoff should be zeroed.

Contributor Author

Good catch!
Thanks for the heads up, done.

@@ -2517,6 +2517,10 @@ func (d *VirtualMachineController) vmUpdateHelperDefault(origVMI *v1.VirtualMach
			return fmt.Errorf("failed to adjust resources: %v", err)
		}
	} else if vmi.IsRunning() {
		if err := d.hotplugSriovInterfaces(vmi); err != nil {
			log.Log.Object(vmi).Error(err.Error())
Member

Shouldn't you re-enqueue the vmi in such a case? How do you make sure the vmi reconcile will be invoked again?

Also, in our hangouts meeting @EdDev mentioned that d.hotplugSriovInterfaces is async. In that case, even if no error is returned, the operation may still fail. How do we make sure the vmi reconcile is called again?

Contributor Author

In general, currently, when not all SR-IOV interfaces are plugged we don't change the VM phase to Failing; we make a best effort to do the hotplug without disrupting the VM update flow, similar to how it was before this PR.
That being said, it may change in the future.

The virt-launcher domain-notifier periodically sends a (Modified) event that triggers virt-handler to perform a VMI sync, which eventually calls hotplugSriovInterfaces, which does the hotplug.

In an SR-IOV VM's logs there are periodic "Synced VMI" log messages every minute or so.
And on virt-handler I added some debug messages to indicate a VM update and SR-IOV hotplug; both are seen periodically.
I will add references to the code soon.

Member

So you're saying the re-enqueue we do in the controller is redundant?
The reconcile for the vmi is periodically invoked anyway?

EDIT: or are you saying that in the SR-IOV hotplug case the re-enqueue is not needed, since the VM is running and for a running VM we have a periodic reconcile?

Contributor Author

EDIT: or are you saying that in the SR-IOV hotplug case the re-enqueue is not needed, since the VM is running and for a running VM we have a periodic reconcile?

Yes.
Regarding the periodic VMI sync, here are the references to what I described earlier:
The domains informer on virt-handler triggers a VMI sync every 5 minutes [1] [2] [3] [4].
Other than that, the virt-launcher domain-notifier client triggers a VMI sync (by sending a domain Modified event) every two minutes when the QEMU guest-agent is present [5] [6] [7] [8] [9] [10].
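
For illustration, this is the generic client-go mechanism behind such periodic syncs: a shared informer with a resync period re-delivers every cached object to its handlers on each resync, so a running object gets reconciled even when nothing changed. This is a generic sketch, not virt-handler's actual wiring; the 5-minute value matches the interval mentioned above.

package main

import (
	"time"

	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/cache"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	clientset := kubernetes.NewForConfigOrDie(cfg)

	// Resync every 5 minutes: UpdateFunc fires for every cached object on each resync.
	factory := informers.NewSharedInformerFactory(clientset, 5*time.Minute)
	podInformer := factory.Core().V1().Pods().Informer()
	podInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		UpdateFunc: func(oldObj, newObj interface{}) {
			// Periodic reconcile entry point; a controller would enqueue the key here.
		},
	})

	stop := make(chan struct{})
	factory.Start(stop)
	factory.WaitForCacheSync(stop)
	<-stop // block forever, like a controller process
}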

@kubevirt-bot kubevirt-bot removed the lgtm Indicates that a PR is ready to be merged. label Jan 11, 2022
@AlonaKaplan
Member

/approve

@kubevirt-bot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: AlonaKaplan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubevirt-bot kubevirt-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 16, 2022
@ormergi ormergi force-pushed the hotplug-sriov-on-reconciler branch 2 times, most recently from bb58d66 to 98a9de1 Compare January 20, 2022 14:55
@ormergi
Contributor Author

ormergi commented Jan 20, 2022

Following the offline discussion about how this PR affects SR-IOV hotplug on old virt-launcher pods and backward compatibility in general,
I did some testing on my local env that emulates different stages of KubeVirt during an upgrade, in order to verify that SR-IOV hotplug is performed as expected.
link to my test branch

There are two interesting scenarios:

  1. When a VM is migrated to a node that runs an old version of virt-handler (without this PR's changes) and a new virt-launcher migration target pod (with this PR's changes).
    For example: KubeVirt was upgraded, but a node is still running the old virt-handler and an SR-IOV VM is migrated to that node.
    virt-handler sends the FinalizeVirtualMachineMigration command as part of the post-migration flow on the target.
    But the new virt-launcher no longer attaches host-devices as part of FinalizeVirtualMachineMigration.
    Thus SR-IOV interfaces are not plugged into the guest at the end of the migration.
    Once virt-handler is upgraded, it will send HotplugHostDevices as part of the reconcile loop, which will trigger
    virt-launcher to attach the host-devices.

This is a transient state, as we expect all virt-handler pods to eventually be upgraded as part of the KubeVirt upgrade.

  2. When a VM is migrated to a node that runs the new virt-handler and the virt-launcher migration target pod is old (without this PR's changes).
    For example: KubeVirt is upgrading and the virt-handler pods have just finished their upgrade.
    virt-handler sends the FinalizeVirtualMachineMigration command as part of the post-migration flow on the target.
    The old virt-launcher attaches host-devices as part of FinalizeVirtualMachineMigration, but fails because the QEMU process lacks resources:
{"component":"virt-launcher","level":"error","msg":"cannot limit locked memory of process 117 to 1586495488: Operation not permitted","pos":"virProcessSetMaxMemLock:962","subcomponent":"libvirt","thread":"51","timestamp":"2022-01-18T15:46:56.085000Z"}
{"component":"virt-launcher","level":"error","msg":""msg":"failed to hot-plug host-devices","name":"testvmi-bcxxb","namespace":"kubevirt-test-default1","pos":"live-migration-target.go:42",
"reason":"failed to attach host-device \u003chostdev type=\"pci\" managed=\"no\"\u003e\u003csource\u003e\u003caddress type=\"pci\" domain=\"0x0000\" bus=\"0x04\" slot=\"0x07\" function=\"0x0\"\u003e\u003c/address\u003e\u003c/source\u003e\u003calias name=\"ua-sriov-sriov\"\u003e\u003c/alias\u003e\u003c/hostdev\u003e, 
err: virError(Code=38, Domain=0, Message='cannot limit locked memory of process 117 to 65536: Operation not permitted')\n","timestamp":"2022-01-18T15:46:56.091744Z","uid":"da2b6fb3-faef-4546-adcc-f44c76e1e2ba"}

Next, when the migration is completed and virt-handler has switched to the regular "vmUpdate" flow, it sends the HotplugHostDevices command as part of the reconciliation loop, but the old virt-launcher does not support it and returns the following error:

"failed to hot-plug SR-IOV interfaces: unknown error encountered sending command HotplugHostDevices: rpc error: code = Unimplemented desc = unknown method HotplugHostDevices for service"

To solve this and support old virt-launcher pods, virt-handler must keep adjusting the QEMU process memlock limits (which is a prerequisite for attaching an SR-IOV host-device) as part of post-migration on the target node.

I have pushed new changes to fix it.

Currently, when an SR-IOV VM is migrated, we detach its SR-IOV devices
just before the migration starts and attach them back to the target VM
when the migration finishes successfully.
This is performed only once, at post-migration.

The current implementation doesn't leverage the VMController, may leave
the VM in an incomplete state (missing SR-IOV devices) and also doesn't
follow the Kubernetes desired-state design that is followed all over
the project.

With this change, instead of attaching SR-IOV devices only at
post-migration, the virt-handler VMController handles it as part of its
reconcile loop whenever needed.

In order to support host-device attachment at post-migration on
older virt-launcher pods, virt-handler keeps adjusting the QEMU process
memlock limits as part of the post-migration flow on the target node.

Signed-off-by: Or Mergi <ormergi@redhat.com>
Attaching an SR-IOV host-device is an intrusive operation that may
disturb the VM workloads and overall availability.
It should be called with a backoff in order to give the underlying
components time to finish.

Performing host-device hot-plug with a limited backoff ensures it is
done with reasonable time gaps and stops after a while.

Signed-off-by: Or Mergi <ormergi@redhat.com>
Now that host-device hot-plug is done as part of the virt-handler
VM controller reconcile loop, it should run in the background
instead of blocking the loop and causing the VM update flow to hang.

With this change, the host-device hot-plug logic runs in
its own goroutine in a way that does not block the virt-handler
VM controller reconcile loop.

Also, in order to prevent disruption of the VM workloads and
redundant resource consumption, there will be no more than one
concurrent host-device hot-plug goroutine per virt-launcher.

Signed-off-by: Or Mergi <ormergi@redhat.com>
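
A minimal, self-contained sketch of that concurrency guarantee (illustrative only, not the PR's actual code; names are hypothetical): run hot-plug in the background while keeping at most one in-flight attempt per VMI, so the reconcile loop never blocks.

package hotplug // hypothetical package, for illustration only

import (
	"sync"

	"k8s.io/apimachinery/pkg/types"
)

// asyncHotplugPool tracks which VMIs already have a hot-plug attempt running.
type asyncHotplugPool struct {
	mu       sync.Mutex
	inFlight map[types.UID]bool
}

func newAsyncHotplugPool() *asyncHotplugPool {
	return &asyncHotplugPool{inFlight: map[types.UID]bool{}}
}

// RunOnce starts hotplug in a goroutine unless one is already running for
// this VMI, so the reconcile loop never blocks and workers never pile up.
func (p *asyncHotplugPool) RunOnce(uid types.UID, hotplug func() error) {
	p.mu.Lock()
	if p.inFlight[uid] {
		p.mu.Unlock()
		return
	}
	p.inFlight[uid] = true
	p.mu.Unlock()

	go func() {
		defer func() {
			p.mu.Lock()
			delete(p.inFlight, uid)
			p.mu.Unlock()
		}()
		_ = hotplug() // errors are logged by the caller in the real flow
	}()
}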
@ormergi
Contributor Author

ormergi commented Jan 20, 2022

I accidentally pushed a wrong change along with the one that was needed [1]; I have pushed a new change to remove just that [2].

@ormergi
Contributor Author

ormergi commented Jan 23, 2022

/hold

Placing a hold until we have more eyes on it.

@kubevirt-bot kubevirt-bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 23, 2022
Member

@EdDev EdDev left a comment


Thank you, the result looks really good!

@kubevirt-bot kubevirt-bot added the lgtm Indicates that a PR is ready to be merged. label Jan 23, 2022
@EdDev
Member

EdDev commented Jan 23, 2022

@AlonaKaplan, this change was added (after your approval) to support the (edge) scenario where a new virt-handler handles the migration target of an old virt-launcher.

It was explained in detail here.

I think we are good to go.

@EdDev
Member

EdDev commented Jan 23, 2022

/unhold

We seem to be good, let's get this in.

@kubevirt-bot kubevirt-bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 23, 2022
@kubevirt-commenter-bot

/retest
This bot automatically retries jobs that failed/flaked on approved PRs.
Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@kubevirt-bot
Contributor

kubevirt-bot commented Jan 23, 2022

@ormergi: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name: pull-kubevirt-e2e-k8s-1.20-sig-network
Commit: fcef8f3
Details: link
Required: true
Rerun command: /test pull-kubevirt-e2e-k8s-1.20-sig-network

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@kubevirt-commenter-bot

/retest
This bot automatically retries jobs that failed/flaked on approved PRs.
Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@kubevirt-bot kubevirt-bot merged commit bb05152 into kubevirt:main Jan 24, 2022
@phoracek
Member

phoracek commented Oct 3, 2022

/cherry-pick release-0.49

@kubevirt-bot
Contributor

@phoracek: new pull request created: #8560

In response to this:

/cherry-pick release-0.49

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. lgtm Indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XL