Run shelve/shelve_offload_instance in a semaphore
When an instance is shelved, by default it is immediately
offloaded because CONF.shelved_offload_time defaults to 0.

When a shelved instance is offloaded, it is destroyed and its
host/node values are nulled out.

Unshelving an instance is basically the same flow as building
an instance for the first time. The instance.host/node values
are set in the resource tracker when claiming resources.

Tempest has some tests which use a shared server resource and
perform actions on that shared server. These tests are triggering
a race when unshelve is called while the compute is offloading
the shelved instance. The race hits a window where unshelve is
running before shelve_offload_instance nulls out the instance
host/node values. The resource claim during unshelve sets the
host/node values (which were actually already set) and then
shelve_offload_instance nulls them out. The unshelve operation
sets the instance.vm_state to ACTIVE, however. So Tempest sees
an instance that's ACTIVE and thinks it can run the next action
test on it, for example 'suspend'. That action fails in the
compute API because shelve_offload_instance has already nulled
out instance.host.

To close the race window, we add a lock to shelve_instance and
shelve_offload_instance to match the lock that's in
unshelve_instance. This way when unshelve is called it will
wait until the shelve_offload_instance operation is complete
and the instance.host value is nulled out.
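
As a rough illustration of why a shared lock keyed on the instance
UUID closes the window, here is a minimal, self-contained sketch. It
is not nova code: `synchronized` below is a hypothetical stand-in for
`utils.synchronized` built on `threading.Lock`, and `FakeInstance`,
the host names, and `run_demo` are invented for the example.

```python
import threading
from collections import defaultdict

# Hypothetical stand-in for nova's utils.synchronized: one lock per name.
_locks = defaultdict(threading.Lock)
_locks_guard = threading.Lock()


def synchronized(name):
    """Serialize all calls made through wrappers that share `name`."""
    def decorator(func):
        def wrapper(*args, **kwargs):
            with _locks_guard:          # protect the lock table itself
                lock = _locks[name]
            with lock:                  # hold the per-name lock for the call
                return func(*args, **kwargs)
        return wrapper
    return decorator


class FakeInstance:
    """Invented minimal instance record; not a nova object."""
    def __init__(self, uuid):
        self.uuid = uuid
        self.host = "compute1"
        self.vm_state = "shelved"


def shelve_offload_instance(instance, started, release):
    @synchronized(instance.uuid)
    def do_shelve_offload_instance():
        started.set()                # the offload now holds the lock
        release.wait(timeout=5)      # simulate a slow destroy
        instance.host = None         # null out the host last
        instance.vm_state = "shelved_offloaded"
    do_shelve_offload_instance()


def unshelve_instance(instance, host):
    @synchronized(instance.uuid)
    def do_unshelve_instance():
        instance.host = host         # the resource claim sets the host
        instance.vm_state = "active"
    do_unshelve_instance()


def run_demo():
    instance = FakeInstance("fake-uuid")
    started, release = threading.Event(), threading.Event()
    offload = threading.Thread(
        target=shelve_offload_instance, args=(instance, started, release))
    offload.start()
    started.wait(timeout=5)          # offload is in progress
    unshelve = threading.Thread(
        target=unshelve_instance, args=(instance, "compute2"))
    unshelve.start()                 # must wait for the offload to finish
    release.set()
    offload.join()
    unshelve.join()
    return instance.host, instance.vm_state
```

Because both operations take the same per-UUID lock, the unshelve
body cannot start until the offload has nulled out the host, so the
instance ends up ACTIVE with a host set, not ACTIVE with no host.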

Closes-Bug: #1611008

Change-Id: Id36b3b9516d72d28519c18c38d98b646b47d288d
(cherry picked from commit e285eb1)
Matt Riedemann committed Aug 19, 2016
1 parent 8f20e70 commit c7b2664
Showing 1 changed file with 17 additions and 2 deletions.
19 changes: 17 additions & 2 deletions nova/compute/manager.py
@@ -4254,6 +4254,14 @@ def shelve_instance(self, context, instance, image_id,
         :param image_id: an image id to snapshot to.
         :param clean_shutdown: give the GuestOS a chance to stop
         """
+
+        @utils.synchronized(instance.uuid)
+        def do_shelve_instance():
+            self._shelve_instance(context, instance, image_id, clean_shutdown)
+        do_shelve_instance()
+
+    def _shelve_instance(self, context, instance, image_id,
+                         clean_shutdown):
         compute_utils.notify_usage_exists(self.notifier, context, instance,
                                           current_period=True)
         self._notify_about_instance_usage(context, instance, 'shelve.start')
@@ -4288,8 +4296,8 @@ def update_task_state(task_state, expected_state=task_states.SHELVING):
         self._notify_about_instance_usage(context, instance, 'shelve.end')

         if CONF.shelved_offload_time == 0:
-            self.shelve_offload_instance(context, instance,
-                                         clean_shutdown=False)
+            self._shelve_offload_instance(context, instance,
+                                          clean_shutdown=False)

     @wrap_exception()
     @reverts_task_state
@@ -4306,6 +4314,13 @@ def shelve_offload_instance(self, context, instance, clean_shutdown):
         :param instance: nova.objects.instance.Instance
         :param clean_shutdown: give the GuestOS a chance to stop
         """
+
+        @utils.synchronized(instance.uuid)
+        def do_shelve_offload_instance():
+            self._shelve_offload_instance(context, instance, clean_shutdown)
+        do_shelve_offload_instance()
+
+    def _shelve_offload_instance(self, context, instance, clean_shutdown):
         self._notify_about_instance_usage(context, instance,
                                           'shelve_offload.start')
