Adding and fixing support for host-model migration in heterogeneous cluster #7672
Conversation
/cc @vladikr @iholder-redhat @davidvossel
Force-pushed from 15dbfcd to 5b5d25d
/retest-required
(2 similar comments)
Thanks for taking on this issue, @Barakmor1!
I have a suggestion inline.
@@ -1533,19 +1533,28 @@ func (c *MigrationController) alertIfHostModelIsUnschedulable(vmi *virtv1.Virtua
 	}
 }

-func prepareNodeSelectorForHostCpuModel(node *k8sv1.Node, pod *k8sv1.Pod) error {
+func prepareNodeSelectorForHostCpuModel(node *k8sv1.Node, pod *k8sv1.Pod, vmi *virtv1.VirtualMachineInstance, sourcePod *k8sv1.Pod) error {
I think we can make this function look something like this:
func prepareNodeSelectorForHostCpuModel(node *k8sv1.Node, pod *k8sv1.Pod, sourcePod *k8sv1.Pod) error {
	var hostCpuModel, cpuModelLabelKey, hostModelLabelValue string
	haveCPUNodeSelector := false
	for key, value := range sourcePod.Spec.NodeSelector {
		if strings.Contains(key, virtv1.CPUFeatureLabel) || strings.Contains(key, virtv1.CPUModelLabel) {
			pod.Spec.NodeSelector[key] = value
			haveCPUNodeSelector = true // assignment, not `:=`, so the outer flag is actually set
		}
	}
	if !haveCPUNodeSelector {
		for key, value := range node.Labels {
			if strings.HasPrefix(key, virtv1.HostModelCPULabel) {
				hostCpuModel = strings.TrimPrefix(key, virtv1.HostModelCPULabel)
				hostModelLabelValue = value
			}
			if strings.HasPrefix(key, virtv1.HostModelRequiredFeaturesLabel) {
				requiredFeature := strings.TrimPrefix(key, virtv1.HostModelRequiredFeaturesLabel)
				pod.Spec.NodeSelector[virtv1.CPUFeatureLabel+requiredFeature] = value
			}
		}
		if hostCpuModel == "" {
			return fmt.Errorf("node does not contain label \"%s\" with information about host cpu model", virtv1.HostModelCPULabel)
		}
		cpuModelLabelKey = virtv1.CPUModelLabel + hostCpuModel
		pod.Spec.NodeSelector[cpuModelLabelKey] = hostModelLabelValue
		log.Log.Object(pod).Infof("cpu model label selector (\"%s\") defined for migration target pod", cpuModelLabelKey)
	}
	return nil
}
 	var hostCpuModel, hostCpuModelLabelKey, hostModelLabelValue string

 	for key, value := range node.Labels {
 		if strings.HasPrefix(key, virtv1.HostModelCPULabel) {
 			hostCpuModel = strings.TrimPrefix(key, virtv1.HostModelCPULabel)
 			hostModelLabelValue = value
 		}

-		if strings.HasPrefix(key, virtv1.HostModelRequiredFeaturesLabel) {
+		if vmi.Status.MigratedAtLeastOnce == false && strings.HasPrefix(key, virtv1.HostModelRequiredFeaturesLabel) {
Perhaps we don't need the vmi.Status.MigratedAtLeastOnce field at all?
I think we can simply rely on the presence of the previous node selector and just copy it to the destination pod.
The key here is that previously we were searching by the HostModelCPULabel prefix, but we should have been searching by the CPUModelLabel prefix.
So this

hostCpuModelLabelKey = virtv1.HostModelCPULabel + hostCpuModel
pod.Spec.NodeSelector[hostCpuModelLabelKey] = hostModelLabelValue

would become

cpuModelLabelKey = virtv1.CPUModelLabel + hostCpuModel
pod.Spec.NodeSelector[cpuModelLabelKey] = hostModelLabelValue
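To make the difference concrete, here is a minimal, self-contained sketch (the label prefix values are assumptions mirroring the virtv1 constants): a node advertises every CPU model it can run under the CPUModelLabel prefix, while the HostModelCPULabel prefix names only its own host model, so a selector built from the latter can only land on a node with an identical host CPU.

package main

import "fmt"

// Label prefixes assumed to mirror virtv1.HostModelCPULabel and
// virtv1.CPUModelLabel.
const (
	hostModelCPULabel = "host-model-cpu.node.kubevirt.io/"
	cpuModelLabel     = "cpu-model.node.kubevirt.io/"
)

func main() {
	// node02's own host model is Cascadelake, but it also *supports*
	// Skylake-Client as a guest CPU model.
	node02Labels := map[string]string{
		hostModelCPULabel + "Cascadelake": "true",
		cpuModelLabel + "Skylake-Client":  "true",
		cpuModelLabel + "Cascadelake":     "true",
	}

	// A selector built from the host-model prefix only matches nodes whose
	// host CPU is identical to the source's...
	_, oldSelectorMatches := node02Labels[hostModelCPULabel+"Skylake-Client"]

	// ...while one built from the cpu-model prefix also matches nodes that
	// merely support that model.
	_, newSelectorMatches := node02Labels[cpuModelLabel+"Skylake-Client"]

	fmt.Println(oldSelectorMatches, newSelectorMatches) // false true
}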
Hey Vladik,
Thanks for reviewing.
Your idea sounds good, but what if the initial node doesn't have any required-feature labels? I think that in that case, if we don't use vmi.Status.MigratedAtLeastOnce, then we will be coupled to the target node's required features.
On second thought, we can solve this by searching for the CPUModelLabel selector instead of CPUFeatureLabel, since it is always present in virt-launcher after the first migration. I will update the PR soon.
Regarding the cpuModelLabelKey: I have a similar branch that solves this issue. I'm waiting for this one to be merged before publishing it, because I use this new test's Context to test those modifications, and I prefer to keep this PR small and cohesive to make it easier to review in the future.
Just to be clear, the next PR solves this issue: a VMI with host-model can't migrate if the target node doesn't have the same host model, even if the target node supports the host model of the source node and has all the required features.
It will be published immediately after this one is merged.
> Your idea sounds good, but what if the initial node doesn't have any required-feature labels? I think that in that case, if we don't use vmi.Status.MigratedAtLeastOnce, then we will be coupled to the target node's required features.

If the problem we're trying to solve here involves knowledge of the first node a VMI runs on, can we store all that information in the VMI status after the VMI starts, so we don't have to keep track of whether migrations have already occurred?
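For illustration only, a hypothetical shape such a status field could take (the field names here are invented for this sketch and do not exist in the virtv1 API):

// Hypothetical sketch only; neither field below exists in the virtv1 API.
// The idea: persist the launch node's host-model facts once, when the VMI
// first starts, so later migrations never depend on migration history.
type VirtualMachineInstanceStatus struct {
	// ... existing status fields ...

	// Host CPU model of the node the VMI first started on.
	LaunchNodeHostModel string `json:"launchNodeHostModel,omitempty"`
	// Required-feature labels recorded from that node at start time.
	LaunchNodeRequiredFeatures map[string]string `json:"launchNodeRequiredFeatures,omitempty"`
}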
> Hey Vladik, Thanks for reviewing. Your idea sounds good, but what if the initial node doesn't have any required-feature labels? I think that in that case, if we don't use vmi.Status.MigratedAtLeastOnce, then we will be coupled to the target node's required features. On second thought, we can solve this by searching for the CPUModelLabel selector instead of CPUFeatureLabel, since it is always present in virt-launcher after the first migration. I will update the PR soon.
The function I've posted above searches for either CPUFeatureLabel or CPUModelLabel; if one of these labels exists in the source pod's nodeSelector, then we should propagate it to the destination pod's nodeSelector.
Otherwise, we will create a new nodeSelector for the destination pod based on the translation of the host-model CPU, as this is probably the first time we are migrating this VMI.
for key, value := range sourcePod.Spec.NodeSelector {
	if strings.Contains(key, virtv1.CPUFeatureLabel) || strings.Contains(key, virtv1.CPUModelLabel) {
		pod.Spec.NodeSelector[key] = value
		haveCPUNodeSelector = true
	}
}
if !haveCPUNodeSelector {
> And about the cpuModelLabelKey: I have a similar branch that solves this issue. I'm waiting for this one to be merged before publishing it, because I use this new test's Context to test those modifications, and I prefer to keep this PR small and cohesive to make it easier to review in the future. Just to be clear, the next PR solves this issue: a VMI with host-model can't migrate if the target node doesn't have the same host model, even if the target node supports the host model of the source node and has all the required features. It will be published immediately after this one is merged.
In my view, these two changes should happen in the same PR.
Previously, we were generating the nodeSelector using the HostModelCPULabel field. Without changing the cpuModelLabelKey, I don't see how we can test this PR.
Does that make sense?
@davidvossel we probably could, but perhaps we can get away without updating the vmi status.
On the first migration, the translated CPU model will be stored in the destination pod nodeSelector.
Any following migration could simply reuse this nodeSelector.
My concern with storing this information in the VMI status is that it won't be backward compatible.
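A minimal simulation of that propagation chain, assuming virtv1-style label prefixes (propagate is an invented helper name for this sketch): once the first migration writes the translated model to the destination pod's nodeSelector, every later migration just carries it forward.

package main

import (
	"fmt"
	"strings"
)

// Prefixes assumed to mirror virtv1.CPUModelLabel and virtv1.CPUFeatureLabel.
const (
	cpuModelLabel   = "cpu-model.node.kubevirt.io/"
	cpuFeatureLabel = "cpu-feature.node.kubevirt.io/"
)

// propagate copies any CPU model/feature selectors from the source pod's
// nodeSelector to the target's, reporting whether anything was found.
func propagate(src, dst map[string]string) bool {
	found := false
	for k, v := range src {
		if strings.Contains(k, cpuModelLabel) || strings.Contains(k, cpuFeatureLabel) {
			dst[k] = v
			found = true
		}
	}
	return found
}

func main() {
	// First migration: the translated host model was written to the
	// destination pod's nodeSelector (as described above).
	firstTarget := map[string]string{cpuModelLabel + "Skylake-Client": "true"}

	// Second migration: no VMI status needed; the previous selector is
	// simply carried forward to the next target pod.
	secondTarget := map[string]string{}
	fmt.Println(propagate(firstTarget, secondTarget), secondTarget)
}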
> The function I've posted above searches for either CPUFeatureLabel or CPUModelLabel; if one of these labels exists in the source pod's nodeSelector, then we should propagate it to the destination pod's nodeSelector. Otherwise, we will create a new nodeSelector for the destination pod based on the translation of the host-model CPU, as this is probably the first time we are migrating this VMI.

Right, I missed the OR operator (||), sorry about that.
@vladikr I updated the PR so it no longer adds the new field to vmi.Status, and I updated this PR's description.

> In my view, these two changes should happen in the same PR. Previously, we were generating the nodeSelector using the HostModelCPULabel field. Without changing the cpuModelLabelKey, I don't see how we can test this PR.

I also added the modifications from my second branch to this PR, including a test that disables the node labeller on the target node and simulates a target node with a fake host model that supports the source node's host model (if it is possible to simulate this in the cluster; otherwise the test is skipped, just like the first test).
Regarding the changes to the hostCpuModelLabelKey: this modification is necessary, but we also need to include a node's host-model cpuModel among the node's supported cpuModels (using CPUModelLabel + <host-model> : true). I included this modification in the fourth commit of this PR. The new functional test in the third commit doesn't pass without both modifications.
Honestly, I think keeping a PR as cohesive as possible is best practice. From my point of view, we should deal with one bug per PR and add a test for that bug to make sure it doesn't happen again; this way it is easier to follow and understand in future reviews.
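A rough sketch of the fourth-commit idea described above, assuming virtv1-style label prefixes (ensureHostModelIsSupported is an invented name): the node's own host model is re-advertised under the CPUModelLabel prefix, so a selector of the form cpu-model.node.kubevirt.io/<host-model>: "true" can match it.

package main

import (
	"fmt"
	"strings"
)

// Prefixes assumed to mirror virtv1.HostModelCPULabel and virtv1.CPUModelLabel.
const (
	hostModelCPULabel = "host-model-cpu.node.kubevirt.io/"
	cpuModelLabel     = "cpu-model.node.kubevirt.io/"
)

// ensureHostModelIsSupported (hypothetical helper) adds the node's own host
// model to its set of supported CPU models.
func ensureHostModelIsSupported(labels map[string]string) {
	for key, value := range labels {
		if strings.HasPrefix(key, hostModelCPULabel) {
			model := strings.TrimPrefix(key, hostModelCPULabel)
			labels[cpuModelLabel+model] = value
		}
	}
}

func main() {
	nodeLabels := map[string]string{hostModelCPULabel + "Cascadelake": "true"}
	ensureHostModelIsSupported(nodeLabels)
	fmt.Println(nodeLabels) // now also carries cpu-model.node.kubevirt.io/Cascadelake
}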
Force-pushed from 5960cbd to 797dd92
Force-pushed from 180f5b8 to 4ebc7f2
/retest-required
Force-pushed from dc4b4a9 to 65c5cf6
/retest-required
Looks great. I think it's almost there, from my side.
@jean-edouard can you please take a look as well?
-func prepareNodeSelectorForHostCpuModel(node *k8sv1.Node, pod *k8sv1.Pod) error {
-	var hostCpuModel, hostCpuModelLabelKey, hostModelLabelValue string
+func prepareNodeSelectorForHostCpuModel(node *k8sv1.Node, pod *k8sv1.Pod, sourcePod *k8sv1.Pod) error {
+	var hostCpuModel, nodeSelectorKeyForHostModel, hostModelLabelValue string
I think something like the following is needed here.

if cpu := vmi.Spec.Domain.CPU; cpu == nil || cpu.Model != virtv1.CPUModeHostModel {
	return
}
Ah, perfect, thanks, I missed it.
@Barakmor1: new pull request created: #7853
/cherrypick release-0.52
@Barakmor1: #7672 failed to apply on top of branch "release-0.52":
… features, Allow vmi with host-model to migrate when target node does't have the same host-model cpuModel. Manual back port kubevirt#7672,kubevirt#7770 from main Signed-off-by: bmordeha <bmodeha@redhat.com>
What this PR does / why we need it:
Added a new field to know whether the VMI has already migrated and, if it has, to reuse the old node selectors (of the initial node). For instance:
can run on node01 without any problem.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Special notes for your reviewer:
I noticed that there are more bugs related to the host-model cpuModel, for instance: a VMI with host-model can't migrate if the target node doesn't have the same host model, even if the target node supports the host model of the source node and has all the required features.
I will publish more PRs soon.

Release note: