libvirt: Add a workaround to skip compareCPU() on destination
Nova's use of libvirt's compareCPU() API served its purpose
over the years, but its design limitations break live migration in
subtle ways.  For example, the compareCPU() API compares against the
host physical CPUID.  Some of the features from this CPUID are not
exposed by KVM, and then there are some features that KVM emulates that
are not in the host CPUID.  The latter can cause bogus live migration
failures.
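
To make the mismatch concrete, here is a toy model built from plain Python sets. The feature names and set contents are illustrative assumptions, not output from a real host, and `missing_features` is a hypothetical helper rather than a libvirt or Nova function.

```python
# Toy model: why checking against the host's physical CPUID can reject a
# guest that KVM could actually run. All feature sets are made-up examples.

host_cpuid = {"sse2", "avx", "monitor"}   # what the physical CPU reports
not_exposed_by_kvm = {"monitor"}          # in CPUID, but hidden from guests
emulated_by_kvm = {"x2apic"}              # usable by guests, absent from CPUID

# What a guest can really use on this host under KVM.
hypervisor_usable = (host_cpuid - not_exposed_by_kvm) | emulated_by_kvm


def missing_features(guest_features, available):
    """Return the guest features that `available` does not provide."""
    return sorted(set(guest_features) - set(available))


guest = {"sse2", "avx", "x2apic"}

# Host-CPUID comparison: reports a bogus incompatibility.
print(missing_features(guest, host_cpuid))         # prints ['x2apic']
# Hypervisor-aware comparison: nothing is missing.
print(missing_features(guest, hypervisor_usable))  # prints []
```

The first call corresponds to the failure mode described above: the guest needs a feature that KVM emulates, but a check against the physical CPUID alone cannot see it.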

With QEMU >= 2.9 and libvirt >= 4.4.0, libvirt will do the right thing in
terms of CPU compatibility checks on the destination host during live
migration.  Nova satisfies these minimum version requirements by a good
margin.  So, provide a workaround to skip the CPU comparison check on
the destination host before migrating a guest, and let libvirt handle it
correctly.  This workaround will be removed once Nova replaces the older
libvirt APIs with their newer and improved counterparts[1][2].

                - - -

Note that Nova's libvirt driver calls compareCPU() in another method,
_check_cpu_compatibility(); I did not remove that usage yet.  It needs
a more careful combing of the code, followed by one of two changes:

  - where possible, remove the usage of compareCPU() altogether, and
    rely on libvirt doing the right thing under the hood; or

  - where Nova _must_ do the CPU comparison checks, switch to the better
    libvirt CPU APIs -- baselineHypervisorCPU() and
    compareHypervisorCPU() -- that are described here[1].  This is work
    in progress[2].
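
As a rough idea of what the second option could look like, the sketch below wraps a call to compareHypervisorCPU(). It assumes the libvirt-python binding has the form `conn.compareHypervisorCPU(emulator, arch, machine, virttype, xmlCPU, flags)` and that passing None for the first four arguments selects libvirt's defaults; both points should be checked against the installed bindings. `guest_cpu_fits_hypervisor` is a hypothetical helper, not part of Nova or of this change.

```python
# Sketch only: assumes libvirt-python exposes
# virConnect.compareHypervisorCPU(emulator, arch, machine, virttype,
# xmlCPU, flags); verify against the installed bindings before use.

# Values of libvirt's virCPUCompareResult enum, copied here so the sketch
# runs without libvirt installed.
VIR_CPU_COMPARE_ERROR = -1
VIR_CPU_COMPARE_INCOMPATIBLE = 0
VIR_CPU_COMPARE_IDENTICAL = 1
VIR_CPU_COMPARE_SUPERSET = 2


def guest_cpu_fits_hypervisor(conn, guest_cpu_xml):
    """Hypothetical helper: True if the hypervisor can provide the CPU.

    `conn` is expected to behave like a libvirt.virConnect object.
    None for emulator, arch, machine and virttype is assumed to mean
    "use the host defaults".
    """
    result = conn.compareHypervisorCPU(
        None, None, None, None, guest_cpu_xml, 0)
    return result in (VIR_CPU_COMPARE_IDENTICAL, VIR_CPU_COMPARE_SUPERSET)
```

Unlike compareCPU(), this comparison is made against what the hypervisor can offer on the host rather than against the raw physical CPUID, which is the property the list above relies on.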

[1] https://opendev.org/openstack/nova-specs/commit/70811da221035044e27
[2] https://review.opendev.org/q/topic:bp%252Fcpu-selection-with-hypervisor-consideration

Change-Id: I444991584118a969e9ea04d352821b07ec0ba88d
Closes-Bug: #1913716
Signed-off-by: Kashyap Chamarthy <kchamart@redhat.com>
Signed-off-by: Balazs Gibizer <bgibizer@redhat.com>
kashyapc committed May 24, 2022
1 parent 449aa39 commit 267a406
Showing 4 changed files with 61 additions and 9 deletions.
8 changes: 8 additions & 0 deletions nova/conf/workarounds.py
@@ -401,6 +401,14 @@
 Related options:
 * :oslo.config:option:`quota.driver`
 """),
+    cfg.BoolOpt('skip_cpu_compare_on_dest',
+                default=False,
+                help="""
+With the libvirt driver, during live migration, skip comparing guest CPU
+with the destination host. When using QEMU >= 2.9 and libvirt >=
+4.4.0, libvirt will do the correct thing with respect to checking CPU
+compatibility on the destination host during live migration.
+"""),
 ]

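
For operators, enabling the workaround would amount to setting the new option in the `[workarounds]` group of nova.conf on the compute host acting as the migration destination. The sketch below embeds such a fragment and reads it back with Python's standard `configparser`, purely to show the group and option names. Nova itself loads configuration through oslo.config, and the fragment is an assumed example rather than a file from this change.

```python
import configparser

# Hypothetical nova.conf fragment for the destination compute host.
NOVA_CONF_FRAGMENT = """
[workarounds]
skip_cpu_compare_on_dest = True
"""

parser = configparser.ConfigParser()
parser.read_string(NOVA_CONF_FRAGMENT)

# fallback=False mirrors the option's default of False.
skip_compare = parser.getboolean(
    "workarounds", "skip_cpu_compare_on_dest", fallback=False)
print(skip_compare)
```

Because the option defaults to False, hosts that do not set it keep the existing compareCPU()-based pre-check.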
19 changes: 19 additions & 0 deletions nova/tests/unit/virt/libvirt/test_driver.py
@@ -10915,6 +10915,25 @@ def test_check_can_live_migrate_guest_cpu_none_model(
                           'serial_listen_addr': None},
                          result.obj_to_primitive()['nova_object.data'])

+    @mock.patch(
+        'nova.network.neutron.API.has_port_binding_extension',
+        new=mock.Mock(return_value=False))
+    @mock.patch.object(libvirt_driver.LibvirtDriver,
+                       '_create_shared_storage_test_file',
+                       return_value='fake')
+    @mock.patch.object(libvirt_driver.LibvirtDriver, '_compare_cpu')
+    def test_check_can_live_migrate_guest_cpu_none_model_skip_compare(
+            self, mock_cpu, mock_test_file):
+        self.flags(group='workarounds', skip_cpu_compare_on_dest=True)
+        instance_ref = objects.Instance(**self.test_instance)
+        instance_ref.vcpu_model = test_vcpu_model.fake_vcpumodel
+        instance_ref.vcpu_model.model = None
+        drvr = libvirt_driver.LibvirtDriver(fake.FakeVirtAPI(), False)
+        compute_info = {'cpu_info': 'asdf', 'disk_available_least': 1}
+        drvr.check_can_live_migrate_destination(
+            self.context, instance_ref, compute_info, compute_info)
+        mock_cpu.assert_not_called()
+
     @mock.patch(
         'nova.network.neutron.API.has_port_binding_extension',
         new=mock.Mock(return_value=False))

19 changes: 10 additions & 9 deletions nova/virt/libvirt/driver.py
@@ -9330,15 +9330,16 @@ def check_can_live_migrate_destination(self, context, instance,
         disk_available_mb = (
             (disk_available_gb * units.Ki) - CONF.reserved_host_disk_mb)

-        # Compare CPU
-        try:
-            if not instance.vcpu_model or not instance.vcpu_model.model:
-                source_cpu_info = src_compute_info['cpu_info']
-                self._compare_cpu(None, source_cpu_info, instance)
-            else:
-                self._compare_cpu(instance.vcpu_model, None, instance)
-        except exception.InvalidCPUInfo as e:
-            raise exception.MigrationPreCheckError(reason=e)
+        if not CONF.workarounds.skip_cpu_compare_on_dest:
+            # Compare CPU
+            try:
+                if not instance.vcpu_model or not instance.vcpu_model.model:
+                    source_cpu_info = src_compute_info['cpu_info']
+                    self._compare_cpu(None, source_cpu_info, instance)
+                else:
+                    self._compare_cpu(instance.vcpu_model, None, instance)
+            except exception.InvalidCPUInfo as e:
+                raise exception.MigrationPreCheckError(reason=e)

         # Create file on storage, to be checked on source host
         filename = self._create_shared_storage_test_file(instance)

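
The control flow of the driver.py hunk can be modelled outside Nova as follows. This is a simplified sketch, not the driver's real code: the exception classes are local stand-ins for the ones in nova.exception, `precheck_cpu` is a hypothetical function, and the vcpu_model condition is reduced to a single truthiness test.

```python
from unittest import mock


class InvalidCPUInfo(Exception):
    """Stand-in for nova.exception.InvalidCPUInfo."""


class MigrationPreCheckError(Exception):
    """Stand-in for nova.exception.MigrationPreCheckError."""


def precheck_cpu(skip_cpu_compare_on_dest, vcpu_model, source_cpu_info,
                 compare_cpu):
    """Simplified model of the guarded CPU check (not Nova's real code).

    `compare_cpu` plays the role of LibvirtDriver._compare_cpu and is
    expected to raise InvalidCPUInfo on an incompatible CPU.
    """
    if skip_cpu_compare_on_dest:
        return  # Workaround enabled: leave the check to libvirt.
    try:
        if not vcpu_model:
            compare_cpu(None, source_cpu_info)
        else:
            compare_cpu(vcpu_model, None)
    except InvalidCPUInfo as e:
        raise MigrationPreCheckError(str(e)) from e


compare = mock.Mock(side_effect=InvalidCPUInfo("bogus mismatch"))

# With the workaround on, the comparison is never invoked.
precheck_cpu(True, None, "cpu-info", compare)
compare.assert_not_called()

# With the default (off), the failure surfaces as a pre-check error.
try:
    precheck_cpu(False, None, "cpu-info", compare)
except MigrationPreCheckError as exc:
    print("pre-check failed:", exc)
```

The `assert_not_called()` check mirrors the new unit test: with the flag set, a comparison that would have failed is simply never run, and the migration proceeds to libvirt's own compatibility handling.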
24 changes: 24 additions & 0 deletions releasenotes/notes/skip-compare-cpu-on-dest-6ae419ddd61fd0f8.yaml
@@ -0,0 +1,24 @@
+---
+issues:
+  - |
+    Nova's use of libvirt's compareCPU() API served its purpose over the
+    years, but its design limitations break live migration in subtle
+    ways. For example, the compareCPU() API compares against the host
+    physical CPUID. Some of the features from this CPUID are not
+    exposed by KVM, and then there are some features that KVM emulates
+    that are not in the host CPUID. The latter can cause bogus live
+    migration failures.
+
+    With QEMU >= 2.9 and libvirt >= 4.4.0, libvirt will do the right
+    thing in terms of CPU compatibility checks on the destination host
+    during live migration. Nova satisfies these minimum version
+    requirements by a good margin. So, this workaround provides a way to
+    skip the CPU comparison check on the destination host before
+    migrating a guest, and let libvirt handle it correctly.
+
+    This workaround will be deprecated and removed once Nova replaces
+    the older libvirt APIs with their newer counterparts. The work is
+    being tracked via this `blueprint
+    cpu-selection-with-hypervisor-consideration`_.
+
+    .. _blueprint cpu-selection-with-hypervisor-consideration: https://blueprints.launchpad.net/nova/+spec/cpu-selection-with-hypervisor-consideration
