
ironic: complete the flavor data migration started in pike

Blueprint custom-resource-classes-in-flavors in Pike added
support for reporting custom resource class inventory for
ironic instances and the ability to schedule to ironic nodes
based on that custom resource class being in flavor extra specs.
The goal was to eventually stop scheduling ironic instances based
on the standard VCPU, MEMORY_MB and DISK_GB resource classes and
just use the single custom resource class defined on the node
itself.

As part of that work, a data migration was added for existing
ironic instances to report their custom resource class usage
in the embedded instance.flavor.extra_specs.

Also in Pike the nova-scheduler started creating resource
allocations during scheduling, which before Pike was handled
by the ResourceTracker in the nova-compute service. Once all
computes were upgraded to at least Pike, the ResourceTracker
would no longer report those allocations. This was problematic
for existing ironic instances created before Pike since their
allocations were no longer being reported to placement which
meant the scheduler could incorrectly try to schedule new instances
to an already at-capacity node. This was fixed with change
Ibbf65a8d817d359786abcdffa6358089ed1107f6 to always report
allocations from nova-compute for ironic instances.

Another problem existed because even if ironic instances had
their embedded flavor migrated to report custom resource usage,
the driver would still report standard resource class inventory
which meant the scheduler could incorrectly try to schedule
non-baremetal flavors to ironic nodes. Since the code that reported
standard resource class inventory was removed in Stein with change
If2b8c1a76d7dbabbac7bb359c9e572cfed510800, a stable-branch only
workaround was added in Id3c74c019da29070811ffc368351e2238b3f6da5
so deployments that had completed the data migration for their
ironic instances could configure nova-compute to set
[workarounds]/report_ironic_standard_resource_class_inventory=False
so the driver would no longer report standard resource class inventory.
Note that the code to always report ironic instance allocations from
the compute service was also removed in Stein with change
If272365e58a583e2831a15a5c2abad2d77921729.

However (now we get to the current bug), if the driver no longer
reports standard resource class inventory but the embedded flavors
of existing instances still carry non-zero values for those standard
resource classes, then because of the fix for
Ibbf65a8d817d359786abcdffa6358089ed1107f6 the ResourceTracker will
try to report allocations to placement for the standard resource
classes. That fails if the ironic node resource provider has no
standard resource class inventory, which is the case when
report_ironic_standard_resource_class_inventory=False.
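
The failure mode above can be illustrated with a small sketch. This is
hypothetical code, not the actual placement service: the names
check_allocation and CUSTOM_GOLD are invented for illustration. The
point is that an allocation naming a resource class the provider has no
inventory for is rejected.

```python
# Hypothetical sketch of the failure mode, not actual placement code:
# placement rejects an allocation that names a resource class for which
# the target provider reports no inventory.

def check_allocation(provider_inventory, allocation):
    """Return the resource classes in the allocation that the provider
    cannot satisfy because it has no inventory for them."""
    return [rc for rc in allocation if rc not in provider_inventory]

# An ironic node provider reporting only its custom class, e.g. after
# disabling standard inventory via the workaround option:
inventory = {"CUSTOM_GOLD": {"total": 1}}

# An un-migrated embedded flavor still yields standard resource class
# usage, so the ResourceTracker's allocation includes VCPU/MEMORY_MB/DISK_GB:
allocation = {"CUSTOM_GOLD": 1, "VCPU": 4, "MEMORY_MB": 8192, "DISK_GB": 100}
print(check_allocation(inventory, allocation))
# → ['VCPU', 'MEMORY_MB', 'DISK_GB']
```

With the standard classes zeroed out of the embedded flavor, the
allocation contains only CUSTOM_GOLD and nothing is left unsatisfied.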

Given that reporting of standard resource class inventory and usage
has been removed in Stein, this is not really an issue for Stein, but
the embedded flavor data migration code still exists in Stein, and
this change deals with zeroing out the embedded standard resource
class usage so it does not get reported to the scheduler (which, as
noted above, can fail depending on configuration). It should also be
noted that zeroing out the standard resource classes should have
always been part of the data migration introduced in Pike; there is
even a comment about it in the code itself:

"This code can be removed in Queens, and will need to be updated to
also alter extra_specs to zero-out the old-style standard resource
classes of VCPU, MEMORY_MB, and DISK_GB."

This change implements the latter part of that statement. Note that
even though the standard resource classes will be zeroed out in the
embedded instance flavor, that only overrides what is reported to
placement (it actually causes those resource classes to *not* be
reported for allocations, which is the fix); the vcpus/ram/disk
attributes on the flavor itself are left untouched so the user can
still see them as display information about their instance in the API.
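
A simplified sketch of that override behavior (not the actual nova
code; the function name resources_from_flavor and the class name
CUSTOM_BAREMETAL_GOLD are assumptions for illustration): "resources:*"
extra specs override the per-class amounts derived from the flavor,
and zero-valued overrides drop the class from the allocation entirely,
while the flavor's vcpus/ram/disk attributes stay intact for display.

```python
# Simplified sketch (not actual nova code) of how "resources:*" extra
# specs override what gets reported to placement as an allocation.

def resources_from_flavor(vcpus, memory_mb, root_gb, extra_specs):
    """Build a placement allocation dict from flavor data."""
    resources = {
        "VCPU": vcpus,
        "MEMORY_MB": memory_mb,
        "DISK_GB": root_gb,
    }
    # "resources:<RC>" extra specs override the defaults above.
    for key, value in extra_specs.items():
        if key.startswith("resources:"):
            resources[key[len("resources:"):]] = int(value)
    # Zero-valued amounts are not reported as allocations at all.
    return {rc: amount for rc, amount in resources.items() if amount > 0}

# A migrated ironic instance flavor: standard classes zeroed out, the
# (hypothetical) custom resource class CUSTOM_BAREMETAL_GOLD set to 1.
specs = {
    "resources:CUSTOM_BAREMETAL_GOLD": "1",
    "resources:VCPU": "0",
    "resources:MEMORY_MB": "0",
    "resources:DISK_GB": "0",
}
print(resources_from_flavor(4, 8192, 100, specs))
# → {'CUSTOM_BAREMETAL_GOLD': 1}
```

The flavor's vcpus=4, ram=8192 and disk=100 are untouched by the
migration; only the derived allocation changes.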

Change-Id: Ic64f5362de9a930e68e1649eaca4619ecde122de
Closes-Bug: #1816034
(cherry picked from commit ca9335c)
mriedem authored and markgoddard committed Feb 15, 2019
1 parent d8f1bef commit da1b23b8ea66bb2be428d002258daf9e7e535cc1
@@ -2773,9 +2773,15 @@ def fake_inst_by_uuid(ctx, uuid, expected_attrs=None):
 
         self.driver._refresh_cache()
         self.assertEqual(2, mock_save.call_count)
-        expected_specs = {"resources:CUSTOM_FIRST": "1"}
+        expected_specs = {"resources:CUSTOM_FIRST": "1",
+                          "resources:VCPU": "0",
+                          "resources:MEMORY_MB": "0",
+                          "resources:DISK_GB": "0"}
         self.assertEqual(expected_specs, inst1.flavor.extra_specs)
-        expected_specs = {"resources:CUSTOM_SECOND": "1"}
+        expected_specs = {"resources:CUSTOM_SECOND": "1",
+                          "resources:VCPU": "0",
+                          "resources:MEMORY_MB": "0",
+                          "resources:DISK_GB": "0"}
         self.assertEqual(expected_specs, inst2.flavor.extra_specs)
 
     @mock.patch.object(ironic_driver.IronicDriver, '_get_node_list')
@@ -2790,7 +2796,10 @@ def test_pike_flavor_migration_instance_migrated(self, mock_save,
         node2_uuid = uuidutils.generate_uuid()
         hostname = "ironic-compute"
         fake_flavor1 = objects.Flavor()
-        fake_flavor1.extra_specs = {"resources:CUSTOM_FIRST": "1"}
+        fake_flavor1.extra_specs = {"resources:CUSTOM_FIRST": "1",
+                                    "resources:VCPU": "0",
+                                    "resources:MEMORY_MB": "0",
+                                    "resources:DISK_GB": "0"}
         fake_flavor2 = objects.Flavor()
         fake_flavor2.extra_specs = {}
         inst1 = fake_instance.fake_instance_obj(self.ctx,
@@ -2829,11 +2838,67 @@ def fake_inst_by_uuid(ctx, uuid, expected_attrs=None):
         # custom resource_class, only the other one should be updated and
         # saved.
         self.assertEqual(1, mock_save.call_count)
-        expected_specs = {"resources:CUSTOM_FIRST": "1"}
+        expected_specs = {"resources:CUSTOM_FIRST": "1",
+                          "resources:VCPU": "0",
+                          "resources:MEMORY_MB": "0",
+                          "resources:DISK_GB": "0"}
         self.assertEqual(expected_specs, inst1.flavor.extra_specs)
-        expected_specs = {"resources:CUSTOM_SECOND": "1"}
+        expected_specs = {"resources:CUSTOM_SECOND": "1",
+                          "resources:VCPU": "0",
+                          "resources:MEMORY_MB": "0",
+                          "resources:DISK_GB": "0"}
         self.assertEqual(expected_specs, inst2.flavor.extra_specs)
 
+    @mock.patch.object(ironic_driver.IronicDriver, '_get_node_list')
+    @mock.patch.object(objects.ServiceList, 'get_all_computes_by_hv_type')
+    @mock.patch.object(objects.InstanceList, 'get_uuids_by_host')
+    @mock.patch.object(objects.Instance, 'get_by_uuid')
+    @mock.patch.object(objects.Instance, 'save')
+    def test_pike_flavor_migration_partial_migrated(self, mock_save,
+                                                    mock_get_by_uuid,
+                                                    mock_get_uuids_by_host,
+                                                    mock_svc_by_hv,
+                                                    mock_get_node_list):
+        """Tests the case that the instance embedded flavor was partially
+        migrated to set the "resources:CUSTOM_FIRST" extra spec but the
+        standard resource classes were not overridden.
+        """
+        node_uuid = uuidutils.generate_uuid()
+        hostname = "ironic-compute"
+        fake_flavor = objects.Flavor()
+        fake_flavor.extra_specs = {"resources:CUSTOM_FIRST": "1"}
+        inst = fake_instance.fake_instance_obj(self.ctx,
+                                               node=node_uuid,
+                                               host=hostname,
+                                               flavor=fake_flavor)
+        node = _get_cached_node(
+            uuid=node_uuid,
+            instance_uuid=inst.uuid,
+            instance_type_id=1,
+            resource_class="first",
+            network_interface="flat")
+        inst_dict = {inst.uuid: inst}
+        mock_get_uuids_by_host.return_value = [inst.uuid]
+        self.driver.node_cache = {}
+        mock_get_node_list.return_value = [node]
+        mock_svc_by_hv.return_value = []
+
+        def fake_inst_by_uuid(ctx, uuid, expected_attrs=None):
+            return inst_dict.get(uuid)
+
+        mock_get_by_uuid.side_effect = fake_inst_by_uuid
+
+        self.driver._refresh_cache()
+        # The instance already had the custom resource class extra spec set
+        # but not the standard resource class overrides so it should have been
+        # updated.
+        self.assertEqual(1, mock_save.call_count)
+        expected_specs = {"resources:CUSTOM_FIRST": "1",
+                          "resources:VCPU": "0",
+                          "resources:MEMORY_MB": "0",
+                          "resources:DISK_GB": "0"}
+        self.assertEqual(expected_specs, inst.flavor.extra_specs)
+
     @mock.patch.object(ironic_driver.LOG, 'warning')
     @mock.patch.object(ironic_driver.IronicDriver, '_get_node_list')
     @mock.patch.object(objects.ServiceList, 'get_all_computes_by_hv_type')
@@ -2887,7 +2952,10 @@ def fake_inst_by_uuid(ctx, uuid, expected_attrs=None):
         self.assertEqual(1, mock_save.call_count)
         expected_specs = {}
         self.assertEqual(expected_specs, inst1.flavor.extra_specs)
-        expected_specs = {"resources:CUSTOM_SECOND": "1"}
+        expected_specs = {"resources:CUSTOM_SECOND": "1",
+                          "resources:VCPU": "0",
+                          "resources:MEMORY_MB": "0",
+                          "resources:DISK_GB": "0"}
         self.assertEqual(expected_specs, inst2.flavor.extra_specs)
         # Verify that the LOG.warning was called correctly
         self.assertEqual(1, mock_warning.call_count)
@@ -25,6 +25,7 @@
 import tempfile
 import time
 
+import os_resource_classes as orc
 from oslo_log import log as logging
 from oslo_serialization import jsonutils
 from oslo_service import loopingcall
@@ -535,12 +536,27 @@ def _pike_flavor_migration_for_node(ctx, node_rc, instance_uuid):
         instance = objects.Instance.get_by_uuid(ctx, instance_uuid,
                                                 expected_attrs=["flavor"])
         specs = instance.flavor.extra_specs
-        resource_key = "resources:%s" % normalized_rc
-        if resource_key in specs:
+        # We need to look for both the custom resource class and a standard
+        # resource class, so we'll use VCPU.
+        custom_rc_resource_key = "resources:%s" % normalized_rc
+        vcpu_resource_key = "resources:%s" % orc.VCPU
+        # Are both the custom and standard resource class handled?
+        if custom_rc_resource_key in specs and vcpu_resource_key in specs:
             # The compute must have been restarted, and the instance.flavor
             # has already been migrated
             return False
-        specs[resource_key] = "1"
+        # Has the custom resource class been handled?
+        if custom_rc_resource_key not in specs:
+            specs[custom_rc_resource_key] = "1"
+        # Have the standard resource classes been handled?
+        if vcpu_resource_key not in specs:
+            # We override the standard resource classes so that we can still
+            # display the embedded instance.flavor attributes to the user like
+            # vcpus, ram and disk, but override the actual allocated value to
+            # 0 which is what the ResourceTracker will report to placement.
+            specs[vcpu_resource_key] = "0"
+            specs["resources:%s" % orc.MEMORY_MB] = "0"
+            specs["resources:%s" % orc.DISK_GB] = "0"
         instance.save()
         return True
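
The extra_specs mutation in the hunk above can be sketched as a
standalone function. This is an illustration only: the helper name
migrate_specs is hypothetical, and the instance loading and saving
done by the real method are omitted.

```python
# Standalone sketch of the extra_specs mutation performed by the
# migration above (hypothetical helper; instance load/save omitted).
ZEROED_STANDARD_KEYS = ("resources:VCPU", "resources:MEMORY_MB",
                        "resources:DISK_GB")

def migrate_specs(specs, normalized_rc):
    """Mutate a flavor extra_specs dict in place; return True if changed."""
    custom_key = "resources:%s" % normalized_rc
    vcpu_key = "resources:VCPU"
    if custom_key in specs and vcpu_key in specs:
        # Both the custom and standard classes are handled: fully migrated.
        return False
    if custom_key not in specs:
        specs[custom_key] = "1"
    if vcpu_key not in specs:
        # Zero out standard classes so they are not reported as allocations.
        for key in ZEROED_STANDARD_KEYS:
            specs[key] = "0"
    return True

# A partially migrated instance (the bug's trigger case) gets its
# standard resource classes zeroed out on the next pass:
specs = {"resources:CUSTOM_FIRST": "1"}
migrate_specs(specs, "CUSTOM_FIRST")
# specs now also carries resources:VCPU/MEMORY_MB/DISK_GB = "0"
```

A second call with the same specs returns False, which is why restarted
computes do not re-save already-migrated instances.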

@@ -553,9 +569,7 @@ def _pike_flavor_migration(self, node_uuids):
         extra_specs, the periodic call to update_available_resources() will add
         an allocation against the custom resource class, and prevent placement
         from thinking that that node is available. This code can be removed in
-        Queens, and will need to be updated to also alter extra_specs to
-        zero-out the old-style standard resource classes of VCPU, MEMORY_MB,
-        and DISK_GB.
+        Queens.
         """
         ctx = nova_context.get_admin_context()

@@ -574,7 +588,8 @@ def _pike_flavor_migration(self, node_uuids):
                                                     node.instance_uuid)
                 self._migrated_instance_uuids.add(node.instance_uuid)
                 LOG.debug("The flavor extra_specs for Ironic instance %(inst)s "
-                          "have been updated for custom resource class '%(rc)s'.",
+                          "have been updated for custom resource class '%(rc)s' "
+                          "and standard resource classes have been zeroed out.",
                           {"inst": node.instance_uuid, "rc": node_rc})
         return

@@ -0,0 +1,14 @@
---
fixes:
- |
Ironic instances have had their embedded flavor data migrated since the
16.0.0 Pike in order to report custom resource class allocations to the
Placement service. However, that data migration did not zero out the
standard resource classes (``VCPU``, ``MEMORY_MB``, ``DISK_GB``) on the
embedded flavor which meant trying to report allocations for those classes
to Placement would fail if the ironic compute node resource provider itself
no longer reported inventory for those resource classes. A fix has been
made for `bug 1816034 <https://bugs.launchpad.net/nova/+bug/1816034>`_
which will zero out standard resource class reporting for existing ironic
instances even if those instances have already been migrated once to
account for the custom resource class allocations.
