SR-IOV Migration: Move attach SRIOV devices to virt-handler #6581
Conversation
Skipping CI for Draft Pull Request.
/uncc @enp0s3 @vatsalparekh
Force-pushed edfeafd to 7b6c580
Force-pushed 63fab4a to 800737a
/test pull-kubevirt-e2e-kind-1.19-sriov pull-kubevirt-unit-test
I reviewed the first commit. Besides the inline comments, it would be nice if you could extract the addition of the new command to a separate commit; the next commit would then just use it.
(This should help focus the review.)
Force-pushed 800737a to 5625e3e
/test pull-kubevirt-unit-test pull-kubevirt-e2e-kind-1.19-sriov
Force-pushed 5625e3e to f01b437
Rebased. /test pull-kubevirt-unit-test pull-kubevirt-e2e-kind-1.19-sriov
@@ -2560,6 +2565,13 @@ func (d *VirtualMachineController) hotplugSriovInterfaces(vmi *v1.VirtualMachine
		return nil
	}

	rateLimitedExecutor := d.sriovHotplugExecutorPool.LoadOrStore(vmi.UID)
Why don't you use vendor/k8s.io/client-go/util/workqueue/rate_limiting_queue.go, same as the controller?
workqueue.NewNamedRateLimitingQueue(workqueue.DefaultControllerRateLimiter(), "sriov")
The workqueue.DefaultControllerRateLimiter is an ItemExponentialFailureRateLimiter.
I reviewed its implementation and tried to change the code to use it, but I came to a point where it's basically a controller that requires another goroutine.
In more detail, using the workqueue in this case means dequeuing an element, performing the hotplug, and, depending on the result, adding the element back to the queue with the rate-limiter.
Dequeuing an element is done with Get(), which is a blocking func (it blocks until it is able to dequeue an element).
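For illustration, here is a minimal sketch of that worker pattern using client-go's workqueue; hotplugSriov and the queue wiring are placeholders, not this PR's code:

package main

import (
	"fmt"

	"k8s.io/client-go/util/workqueue"
)

// hotplugSriov is a placeholder for the real hotplug operation.
func hotplugSriov(key interface{}) error {
	fmt.Println("hotplug attempt for", key)
	return nil
}

func main() {
	queue := workqueue.NewNamedRateLimitingQueue(
		workqueue.DefaultControllerRateLimiter(), "sriov")

	// The worker needs its own goroutine because Get() blocks until an
	// element can be dequeued.
	go func() {
		for {
			key, shutdown := queue.Get()
			if shutdown {
				return
			}
			if err := hotplugSriov(key); err != nil {
				// Put the element back, subject to the exponential rate-limiter.
				queue.AddRateLimited(key)
			} else {
				// Reset the per-item backoff on success.
				queue.Forget(key)
			}
			queue.Done(key)
		}
	}()

	queue.Add("vmi-uid")
	// A real caller would eventually call queue.ShutDown().
}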
	return newLimitedBackoffWithClock(l.baseBackoff.backoff, l.baseBackoff.limit, l.baseBackoff.clock)
}

func NewExponentialLimitedBackoffCreator() LimitedBackoffCreator {
I don't understand why this creator is needed. It creates an instance identical to LimitedBackoffCreator.baseBackoff. What am I missing?
As part of LimitedBackoff creation, maxStepTime (time.Now() + limit duration) is set relative to the time it is created.
If we don't do this, by the time the RateLimitedExecutor tries to Exec(..) it may fail because the limit time has already passed, if the first time it is called is after the limit time.
In other words, baseBackoff needs to be cloned each time, but at the same time stamped with the time of its instantiation.
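An illustrative sketch of this cloning-and-stamping behavior (assumed names and simplified fields, not the exact PR code):

package backoffsketch

import "time"

// LimitedBackoff stops reporting Ready once its deadline has passed.
type LimitedBackoff struct {
	end time.Time // after this deadline, Ready reports false
}

func (b LimitedBackoff) Ready() bool {
	return time.Now().Before(b.end)
}

// LimitedBackoffCreator holds the configured limit duration.
type LimitedBackoffCreator struct {
	limit time.Duration // how long a created backoff stays usable
}

// New stamps the deadline at instantiation time, not at creator-construction
// time, so the limit window starts when the executor first uses the backoff.
func (c LimitedBackoffCreator) New() LimitedBackoff {
	return LimitedBackoff{end: time.Now().Add(c.limit)}
}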
testsClock = clock.NewFakeClock(time.Time{})
backoff = ratelimitcmd.NewExponentialLimitedBackoffWithClock(ratelimitcmd.DefaultMaxStep, testsClock)

testsClock.Step(time.Nanosecond)
Why should this be called before we start?
The test clock's (fake clock) Now() will return the exact same timestamp as backoff.stepEnd, since both get the same fake clock. This causes backoff.Ready to return false, which is expected, so we need to bump the test clock a bit before starting.
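A minimal sketch of the timing issue, assuming k8s.io/apimachinery's fake clock (stepEnd stands in for backoff.stepEnd):

package main

import (
	"fmt"
	"time"

	"k8s.io/apimachinery/pkg/util/clock"
)

func main() {
	// Backoff and test share the same fake clock, so Now() initially
	// equals the recorded step-end timestamp: the backoff is not yet ready.
	fakeClock := clock.NewFakeClock(time.Time{})
	stepEnd := fakeClock.Now() // stands in for backoff.stepEnd

	fmt.Println(fakeClock.Now().After(stepEnd)) // false: still at step end

	fakeClock.Step(time.Nanosecond)             // bump the clock before starting
	fmt.Println(fakeClock.Now().After(stepEnd)) // true: backoff is now ready
}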
*
*/

package ratelimitcmd_test
Please consider moving it under the already existing pkg/util/ratelimiter.
That one is a flowcontrol rate limiter; no idea what it is, but it is surely unrelated to an executor/cmd rate limiter.
I"m not saying the flowcontrol
should be used. Just having the new ratelimiter under the same package.
I think ratelimitercmd and pkg/util/ratelimiter, even though they resemble each other semantically, are two totally different things.
But I do agree ratelimitercmd should be moved to pkg/util/. What do you think about moving it there as-is, alongside the one that exists?
Yes, that's what I meant.
EDIT:
Sorry, I misunderstood you, that's not what I meant :)
I meant moving the content of ratelimitercmd to pkg/util/ratelimiter.
I find it weird to have ratelimiter and ratelimitercmd under pkg/util. You can create a hierarchy under pkg/util/ratelimiter.
I do not think anything should be under something called util, and there is nothing in common about ratelimiter as a super-package. Multiple objects may have rate-limiter behavior; in this case it is a cmd/exec.
How about creating an executor package, which has rate-limiter behavior?
IMO it should have been the same with the other one that exists now: it is a "flowcontrol" thing that has a ratelimiter option, and not the other way around.
I think the ratelimiter can live on its own; it is not only for the executor. The executor is just one usage of the ratelimiter. But I won't block the PR on it.
@ormergi what do you think?
Thinking of this again, I do agree we should not put it under pkg/util (we had a bad experience with util packages..). The current ratelimiter package we have (the one that wraps flowcontrol) should have a different name; that is what is confusing.
If there were more usages for the rate-limiter part of the executor (basically backoff.go), it would have been natural to put it under its own package, pkg/ratelimiter, for reuse. But since there are no other consumers at the moment, we can keep it all under one package, executor.
If you prefer to have the rate-limiter part (backoff.go) in its own package for visibility, and to encourage others to use it, we can split it into two packages, pkg/ratelimiter and pkg/executor, and deal with pkg/util/ratelimiter later:
pkg/
|_ executor/
|  |_ pool.go
|  |_ executor.go
|  ...
|_ ratelimiter/
   |_ backoff.go
@AlonaKaplan @EdDev WDYT?
I think it is a property of an executor at the moment. I would only promote it to its own package if another user of it appears.
> I think the ratelimiter can live on its own; it is not only for the executor
Yes, but it is still a property of the executor, and having its own package is just an option if we see it used by another functionality. I do not have one in mind at the moment.
And I also do not like util, nor do I have a name that would let this one live nicely alongside the other one you mentioned.
@@ -2560,6 +2565,13 @@ func (d *VirtualMachineController) hotplugSriovInterfaces(vmi *v1.VirtualMachine
		return nil
Shouldn't you delete the VM from d.sriovHotplugExecutorPool in this case? All the SR-IOV NICs are plugged, so the rate-limiting backoff should be zeroed.
Good catch!
Thanks for the heads up, done.
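For reference, a hedged sketch of what such cleanup could look like; allSriovInterfacesPlugged and the pool's Delete method are assumed names, since only LoadOrStore appears in the diff above:

// Hypothetical sketch: once every SR-IOV interface is attached, drop the
// VMI's executor from the pool so its rate-limiting backoff starts fresh
// on the next hotplug cycle.
if allSriovInterfacesPlugged(vmi) {
	d.sriovHotplugExecutorPool.Delete(vmi.UID)
	return nil
}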
@@ -2517,6 +2517,10 @@ func (d *VirtualMachineController) vmUpdateHelperDefault(origVMI *v1.VirtualMach
		return fmt.Errorf("failed to adjust resources: %v", err)
	}
} else if vmi.IsRunning() {
	if err := d.hotplugSriovInterfaces(vmi); err != nil {
		log.Log.Object(vmi).Error(err.Error())
Shouldn't you re-enqueue the vmi in such a case? How do you make sure that the vmi reconcile will be invoked again?
Also, in our hangouts meeting @EdDev mentioned that d.hotplugSriovInterfaces is async. In that case, even if an error is not returned, the operation may fail. How do we make sure the vmi reconcile is called again?
In general, currently when not all SR-IOV interfaces are plugged we don't change the VM phase to Failing; we make a best effort to do the hotplug without disrupting the VM update flow, similar to how it was before this PR's changes. That being said, it may change in the future.
The virt-launcher domain-notifier sends a (Modified) event periodically, which triggers virt-handler to perform a VMI sync that eventually calls hotplugSriovInterfaces, which does the hotplug.
In the logs of SR-IOV VMs there are periodic "Synced VMI" messages every minute or so, and on virt-handler I added some debug messages to indicate a VM update and SR-IOV hotplug; both are seen periodically.
I will add references to the code soon.
So you're saying the re-enqueue we do in the controller is redundant? The reconcile for the vmi is periodically invoked anyway?
EDIT: Or are you saying that in the SR-IOV hotplug case the re-enqueue is not needed since the VM is running, and for a running VM we have periodic reconcile?
> EDIT: Or are you saying that in the SR-IOV hotplug case the re-enqueue is not needed since the VM is running, and for a running VM we have periodic reconcile?
Yes.
Regarding the periodic VMI sync, here are the references to what I described earlier:
The domains informer on virt-handler triggers a VMI sync every 5 minutes [1] [2] [3] [4].
Other than that, the virt-launcher domain-notifier client triggers a VMI sync (by sending a domain Modified event) every two minutes when the QEMU guest-agent is present [5] [6] [7] [8] [9] [10].
Force-pushed fcef8f3 to 025856f
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: AlonaKaplan. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
Force-pushed bb58d66 to 98a9de1
Following an offline discussion about how this PR's changes affect SR-IOV hotplug on old virt-launcher pods and backward compatibility in general, there are two interesting scenarios:
This is a transient state, as we expect all ...
Next, when the migration is completed and ...
To solve this and support old virt-launcher pods I have pushed new changes to fix it.
Currently, when an SR-IOV VM is migrated, we detach its SR-IOV devices just before the migration starts and attach them back to the target VM when the migration finishes successfully. This is performed only once, at post-migration. The current implementation doesn't leverage the VMController, may leave the VM in an incomplete state (missing SR-IOV devices), and also doesn't follow the Kubernetes desired-state design that is followed all over the project. With this change, instead of attaching SR-IOV devices only at post-migration, the virt-handler VMController will handle it as part of its reconcile loop when needed. In order to support host-device attachment at post-migration on older virt-launcher pods, virt-handler keeps adjusting the QEMU process memlock limits as part of the post-migration flow on the target node. Signed-off-by: Or Mergi <ormergi@redhat.com>
Attaching an SR-IOV host-device is an intrusive operation that may disturb the VM workloads and overall availability. It should be called with a backoff in order to give the underlying components time to finish. Performing host-device hot-plug with a limited backoff should make sure it's done with reasonable time gaps and stops after a while. Signed-off-by: Or Mergi <ormergi@redhat.com>
Now that hot-plugging host-devices is done as part of the virt-handler VM controller reconcile loop, it should run in the background instead of blocking the loop and causing the VM update flow to hang. With these changes, the hot-plug host-devices logic will run in its own goroutine in a way that won't block the virt-handler VM controller reconcile loop. Also, in order to prevent disruption to the VM workloads and redundant resource consumption, there will be no more than one concurrent hot-plug host-devices goroutine for each virt-launcher. Signed-off-by: Or Mergi <ormergi@redhat.com>
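A simplified sketch of the design this commit message describes (assumed names, not the PR's exact code): the executor runs a command in the background, keeps at most one command goroutine in flight, and spaces attempts with an exponential backoff:

package executor

import (
	"sync"
	"sync/atomic"
	"time"
)

// RateLimitedExecutor runs commands in the background, at most one at a
// time, with an exponential backoff between attempts.
type RateLimitedExecutor struct {
	mu      sync.Mutex
	ready   time.Time     // earliest time the next attempt may start
	step    time.Duration // current backoff step, doubled on each attempt
	running int32         // 1 while a command goroutine is in flight
}

func NewRateLimitedExecutor(initialStep time.Duration) *RateLimitedExecutor {
	return &RateLimitedExecutor{step: initialStep}
}

func (e *RateLimitedExecutor) Exec(command func()) {
	e.mu.Lock()
	if time.Now().Before(e.ready) {
		e.mu.Unlock()
		return // still inside the backoff window, skip this attempt
	}
	e.ready = time.Now().Add(e.step)
	e.step *= 2 // exponential backoff between attempts
	e.mu.Unlock()

	// Allow at most one concurrent command goroutine.
	if !atomic.CompareAndSwapInt32(&e.running, 0, 1) {
		return
	}
	go func() {
		defer atomic.StoreInt32(&e.running, 0)
		command()
	}()
}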
Force-pushed 98a9de1 to 5958e0c
/hold Placing hold until we have more eyes on it
Thank you, the result looks really good!
@AlonaKaplan, this change was added (after your approval) to support the (edge) scenario where a new virt-handler handles the migration target of an old virt-launcher. It was explained in detail here. I think we are good to go.
/unhold We seem to be good, let's have this in.
/retest
@ormergi: The following test failed, say
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
/retest
/cherry-pick release-0.49
@phoracek: new pull request created: #8560 In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
What this PR does / why we need it:
Currently, when an SR-IOV VM is migrated, we detach its SR-IOV network devices just before the migration
starts and attach similar devices to the VM on the target when the migration finishes successfully.
Attaching the SR-IOV devices is a one-shot operation and is performed only once, at post-migration.
Because it is a one-shot operation, the VM might end up in an incomplete state (missing SR-IOV devices)
when an SR-IOV device is disconnected manually or due to an aborted migration.
Also, the current implementation doesn't follow the Kubernetes desired-state design that is followed all over the project.
With this PR's changes, the virt-handler VMController is now aware of the SR-IOV network devices' state and reconciles
them, which means they will be attached to the VM when needed.
Since attaching an SR-IOV host-device is an intrusive operation, it is rate-limited and stops after a while.
Also, with this change, when a migration is aborted (due to failure or a client request), the SR-IOV devices that were detached will be attached again to the source VM, and if SR-IOV devices are disconnected from the guest, they will be attached again.
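As a hedged illustration of the reconcile idea described above (assumed names, not this PR's actual code), the controller could compare the desired SR-IOV devices against the ones attached and plug whatever is missing:

package main

import "fmt"

// reconcileSriovDevices attaches any desired SR-IOV device that is not yet
// attached, converging the domain toward the desired state. The attach
// callback stands in for the real host-device attach operation.
func reconcileSriovDevices(desired, attached []string, attach func(string) error) error {
	current := map[string]bool{}
	for _, dev := range attached {
		current[dev] = true
	}
	for _, dev := range desired {
		if !current[dev] {
			if err := attach(dev); err != nil {
				return err
			}
		}
	}
	return nil
}

func main() {
	err := reconcileSriovDevices(
		[]string{"sriov-net1", "sriov-net2"},
		[]string{"sriov-net1"},
		func(dev string) error { fmt.Println("attaching", dev); return nil },
	)
	fmt.Println("reconcile error:", err)
}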
Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):

Special notes for your reviewer:
Release note: