Conversation

github-actions[bot]

This PR contains a snapshot of 2023.1 from upstream stable/2023.1.

jsanemet and others added 30 commits November 14, 2023 01:38
This adds 'debug' level messages at the branches of the function that
lead to a 'False' result. These branches are:

  - Physnet found affinity on a NUMA cell outside the chosen ones
  - Tunneled networks found affinity on a NUMA cell outside the
    chosen ones

Partial-Bug: #1751784

Change-Id: I4d45f383b3c4794f8a114047455efb764f60f2a2
(cherry picked from commit 1915a31)
(cherry picked from commit 0d970bf)
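
A minimal sketch of the logging pattern this change describes (function
name and message text are illustrative, not the actual
nova/virt/hardware.py code):

    from oslo_log import log as logging

    LOG = logging.getLogger(__name__)

    def _cells_support_network_affinity(physnet_cells, tunnel_cells,
                                        chosen_cells):
        # Illustrative only: log at debug level on every branch that
        # returns False, so operators can see why the check failed.
        bad = physnet_cells - chosen_cells
        if bad:
            LOG.debug('Physnet has affinity to NUMA cells %(bad)s, '
                      'outside the chosen cells %(chosen)s',
                      {'bad': bad, 'chosen': chosen_cells})
            return False
        bad = tunnel_cells - chosen_cells
        if bad:
            LOG.debug('Tunneled networks have affinity to NUMA cells '
                      '%(bad)s, outside the chosen cells %(chosen)s',
                      {'bad': bad, 'chosen': chosen_cells})
            return False
        return True
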
This reverts commit 41096f8 as it has a
bug in it expecting a wrong exception type (FileNotFoundError vs
nova.exception.FileNotFound). Also there is a proper fix on master
2c44215 that can be backported instead.

Change-Id: Id2c253a6e223bd5ba22512d9e5a40a9d12680da2
(cherry picked from commit 03ef4d6)
(cherry picked from commit 25d0db7)
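
The distinction at the heart of the revert, sketched (_load is a
hypothetical stand-in helper; nova.exception.FileNotFound is the real
Nova exception class):

    import nova.exception

    def _load(path):
        # Hypothetical stand-in for the real I/O: Nova code raises its
        # own exception type here, not the Python builtin.
        raise nova.exception.FileNotFound(file_path=path)

    def read_backing_file(path):
        try:
            return _load(path)
        except FileNotFoundError:  # BUG: builtin type, never matches
            return None

    def read_backing_file_fixed(path):
        try:
            return _load(path)
        except nova.exception.FileNotFound:  # correct exception type
            return None
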
In Icb913ed9be8d508de35e755a9c650ba25e45aca2 we forgot to add a privsep
decorator for the set_offline() method.

Change-Id: I769d35907ab9466fe65b942295fd7567a757087a
Closes-Bug: #2022955
(cherry picked from commit 3fab437)
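
The shape of the fix, roughly (a sketch; the real set_offline() goes
through Nova's filesystem helpers, but the missing piece was the
privsep entrypoint decorator):

    import nova.privsep

    @nova.privsep.sys_admin_pctxt.entrypoint
    def set_offline(core_number):
        # Writing to /sys/devices/system/cpu/cpuN/online requires root,
        # so the call must run inside the privsep daemon.
        path = '/sys/devices/system/cpu/cpu%d/online' % core_number
        with open(path, 'w') as f:
            f.write('0')
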
Change-Id: Ifb7d001cfdb95b1b0aa29f45c0ef71c0673e1760
Closes-Bug: #2023018
(cherry picked from commit 2c44215)
(cherry picked from commit 891d177)
The URLs had the path components in the wrong order, "/latest/nova"
instead of the correct "/nova/latest", leading to "404 Not Found"
errors.

Closes-Bug: 2036530

Change-Id: I083381ad2649c06be9443f5ed6a55bddafab4df8
(cherry picked from commit 32dc852)
(cherry picked from commit fb59ca6)
Ran into this randomly today: if a test sets
CONF.scheduler.enabled_filters to a non-default value, the affinity
support global variables will be set to False, which can affect
subsequent test runs that expect the default configuration (affinity
filter support enabled).

Example error:

  WARNING [nova.scheduler.utils] Failed to
    compute_task_build_instances: ServerGroup policy is not supported:
      ServerGroupAffinityFilter not configured

This resets the global variables during base test setup, similar to how
other globals are reset.

Change-Id: Icbc75b1001c0a609280241f99a780313b244aa9d
(cherry picked from commit d533727)
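
Roughly what the reset looks like (a sketch; the flag names follow
nova/scheduler/utils.py, where they are lazily initialized from
CONF.scheduler.enabled_filters):

    import testtools

    from nova.scheduler import utils as scheduler_utils

    class TestCase(testtools.TestCase):
        def setUp(self):
            super().setUp()
            # Clear the cached affinity-support flags so state set by a
            # test with non-default enabled_filters cannot leak into
            # subsequent tests.
            scheduler_utils._SUPPORTS_AFFINITY = None
            scheduler_utils._SUPPORTS_ANTI_AFFINITY = None
            scheduler_utils._SUPPORTS_SOFT_AFFINITY = None
            scheduler_utils._SUPPORTS_SOFT_ANTI_AFFINITY = None
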
We are still having some issues in the gate where greenlets from
previous tests continue to run while the next test starts, causing
false negative failures in unit or functional test jobs.

This adds a new fixture that will ensure
GreenThreadPoolExecutor.shutdown() is called with wait=True, to wait
for greenlets in the pool to finish running before moving on.

In local testing, doing this does not appear to adversely affect test
run times, which was my primary concern.

As a baseline, I ran a subset of functional tests in a loop
until failure without the patch and after 11 hours, I got a failure
reproducing the bug. With the patch, running the same subset of
functional tests in a loop has been running for 24 hours and has not
failed yet.

Based on this, I think it may be worth trying this out to see if it
will help stability of our unit and functional test jobs. And if it
ends up impacting test run times or causes other issues, we can
revert it.

Partial-Bug: #1946339

Change-Id: Ia916310522b007061660172fa4d63d0fde9a55ac
(cherry picked from commit c095cfe)
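
A sketch of such a fixture (assuming futurist's
GreenThreadPoolExecutor, which Nova uses; the wrapper simply forces
wait=True on every shutdown call):

    import fixtures
    import futurist

    class GreenThreadPoolShutdownWait(fixtures.Fixture):
        # Ensure GreenThreadPoolExecutor.shutdown() waits for greenlets
        # in the pool to finish before the test moves on.
        def _setUp(self):
            real_shutdown = futurist.GreenThreadPoolExecutor.shutdown

            def shutdown(executor, wait=True):
                # Force wait=True regardless of what the caller passed.
                return real_shutdown(executor, wait=True)

            self.useFixture(fixtures.MonkeyPatch(
                'futurist.GreenThreadPoolExecutor.shutdown', shutdown))
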
In I008841988547573878c4e06e82f0fa55084e51b5 we started enabling a
bunch of libvirt enlightenments for Windows unconditionally. Turns
out, the `evmcs` enlightenment only works on Intel hosts, and we broke
the ability to run Windows guests on AMD machines. Until we become
smarter about conditionally enabling evmcs (with something like traits
for host CPU features), just stop enabling it altogether.

Change-Id: I2ff4fdecd9dc69de283f0e52e07df1aeaf0a9048
Closes-bug: 2009280
(cherry picked from commit 86a35e9)
(cherry picked from commit 0b7a59a)
The 'reenlightenment' hyperv enlightenment causes instance live
migration to fail (KVM currently doesn't fully support
reenlightenment notifications, see
www.qemu.org/docs/master/system/i386/hyperv.html),
so don't enable it for now.

Change-Id: I6821819450bc96e4304125ea3b76a0e462e6e33f
Closes-Bug: #2046549
Related-Bug: #2009280
(cherry picked from commit e618e78)
(cherry picked from commit 436e525)
With the change from ml2/ovs DHCP agents towards the OVN
implementation in neutron, there is no longer a port with device_owner
network:dhcp. Instead, DHCP is provided by a network:distributed port.

Closes-Bug: 2055245
Change-Id: Ibb569b9db1475b8bbd8f8722d49228182cd47f85
(cherry picked from commit 135af52)
(cherry picked from commit 45a9261)
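
A simplified sketch of the affected lookup (the helper name is
hypothetical; the real code lives in nova/network/neutron.py):

    def _get_dhcp_server_ips(neutron, network_id):
        # With ml2/ovs a DHCP agent owns a network:dhcp port; with OVN
        # the equivalent addresses live on a network:distributed port.
        for owner in ('network:dhcp', 'network:distributed'):
            ports = neutron.list_ports(
                network_id=network_id, device_owner=owner)['ports']
            if ports:
                return [ip['ip_address']
                        for port in ports
                        for ip in port['fixed_ips']]
        return []
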
Empty the cpu init file and directly call the submodule API.

NOTE(artom) Backporting this makes the whole stack on top of this pass
functional and unit tests without any stable-only modifications.
Otherwise we'd have to refactor nova/virt/libvirt/cpu/__init__.py to
use the new per-driver API objects.

Relates to blueprint libvirt-cpu-state-mgmt

Change-Id: I1299ca4b49743f58bec6f541785dd9fbee0ae9e2
(cherry picked from commit 37fa501)
When we pin emulator threads with the `isolate` policy, those pins are
stored in the `cpuset_reserved` field in each NUMACell. In subsequent
patches we'll need those pins for the whole instance, so this patch
adds a helper property that does this for us, similar to how the
`cpu_pinning` property helper currently works.

Related-bug: 2056612
Change-Id: I8597f13e8089106434018b94e9bbc2091f95fee9
(cherry picked from commit 8dbfecd)
(cherry picked from commit 62a35d2)
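
A sketch of the helper property (modeled on the existing cpu_pinning
property; the real class is an oslo.versionedobjects object, shown
here as a plain class):

    import itertools

    class InstanceNUMATopology:
        def __init__(self, cells):
            self.cells = cells

        @property
        def cpuset_reserved(self):
            # Union of the per-cell emulator-thread pins ('isolate'
            # policy) for the whole instance.
            return set(itertools.chain.from_iterable(
                cell.cpuset_reserved for cell in self.cells
                if cell.cpuset_reserved))
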
Related-bug: 2056612
Change-Id: Icd586cdd015143b2e113fd14904f40410809d247
(cherry picked from commit 521af26)
(cherry picked from commit 3dde972)
Previously, with the `isolate` emulator threads policy and libvirt cpu
power management enabled, we did not power on the cores to which the
emulator threads were pinned. Start doing that, and don't forget to
power them down when the instance is stopped.

Closes-bug: 2056612
Change-Id: I6e5383d8a0bf3f0ed8c870754cddae4e9163b4fd
(cherry picked from commit 0986d2b)
(cherry picked from commit 4bcf5ad)
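
A sketch of the extended power-up path (names illustrative; assumes
the cpu_api object introduced later in this series):

    def power_up_instance_cores(cpu_api, numa_topology):
        # Bring online both the vCPU pins and the emulator-thread pins
        # (cpuset_reserved); power-down at instance stop mirrors this.
        cores = numa_topology.cpu_pinning | numa_topology.cpuset_reserved
        for i in cores:
            cpu_api.core(i).set_online()
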
We want to test power management in our functional tests in multinode
scenarios (ex: live migration).

This was previously impossible because all the methods in
nova.virt.libvirt.cpu.api were at the module level, meaning both
source and destination libvirt drivers would call the same method to
online and offline cores. This made it impossible to maintain distinct
core power state between source and destination.

This patch inserts a nova.virt.libvirt.cpu.api.API class, and gives
the libvirt driver a cpu_api attribute with an instance of that
class. Along with the tiny API.core() helper, this allows new
functional tests in the subsequent patches to stub out the core
"model" code with distinct objects on the source and destination
libvirt drivers, and enables a whole bunch of testing (and fixes!)
around live migration.

Related-bug: 2056613
Change-Id: I052535249b9a3e144bb68b8c588b5995eb345b97
(cherry picked from commit 29dc044)
(cherry picked from commit 2a0e638)
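
A condensed sketch of the refactor (the real code is in
nova/virt/libvirt/cpu/api.py; shapes simplified here):

    class Core:
        # Tiny core "model"; functional tests substitute their own.
        def __init__(self, ident):
            self.ident = ident
            self.online = True

        def set_online(self):
            self.online = True

        def set_offline(self):
            self.online = False

    class API:
        def core(self, i):
            # Stubbing this one helper gives source and destination
            # drivers distinct core power state in multinode tests.
            return Core(i)

    # The libvirt driver then carries its own instance:
    #     self.cpu_api = API()
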
Building on the previous patch's refactor, we can now do functional
testing of live migration with CPU power management. We quickly notice
that it's mostly broken, leaving the CPUs powered up on the source,
and not powering them up on the dest.

Related-bug: 2056613
Change-Id: Ib4de77d68ceeffbc751bca3567ada72228b750af
(cherry picked from commit 1f5e342)
(cherry picked from commit 95bbb04)
Previously, live migrations completely ignored CPU power management.
This patch makes sure that we correctly:

* Power up the cores on the destination during pre_live_migration, as
  we need them powered up before the instance starts on the
  destination.
* If the live migration is successful, power down the vacated cores on
  the source.
* In case of a rollback, power down the cores previously powered up on
  pre_live_migration.

Closes-bug: 2056613
Change-Id: I787bd7807950370cd865f29b95989d489d4826d0
(cherry picked from commit c1ccc1a)
(cherry picked from commit c5a73e6)
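
Roughly where those three calls land, as a simplified sketch (method
names follow the virt driver interface; bodies condensed):

    class LibvirtDriverSketch:
        def __init__(self, cpu_api):
            self.cpu_api = cpu_api

        def pre_live_migration(self, instance):
            # Destination: power up cores before the guest arrives.
            self.cpu_api.power_up(instance)

        def post_live_migration_at_source(self, instance):
            # Source: migration succeeded, power down vacated cores.
            self.cpu_api.power_down(instance)

        def rollback_live_migration_at_destination(self, instance):
            # Rollback: undo the power-up from pre_live_migration.
            self.cpu_api.power_down(instance)
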
Reproduction steps:
1. Execute: nova-manage image_property show <vm_uuid> \
                            hw_vif_multiqueue_enabled
2. Observe:
  An error has occurred:
  Traceback (most recent call last):
    File "/var/lib/kolla/venv/lib/python3.9/
          site-packages/nova/cmd/manage.py", line 3394, in main
      ret = fn(*fn_args, **fn_kwargs)
  TypeError: show() got an unexpected keyword argument 'property'

Change-Id: I1349b880934ad9f44a943cf7de324d7338619d2e
Closes-Bug: #2016346
(cherry picked from commit 1c02c0d)
(cherry picked from commit fc4b592)
(cherry picked from commit 1bbd44e)
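
An illustrative reconstruction of the mismatch (not nova-manage's
actual dispatch code): the parsed CLI arguments are passed to the
handler as keyword arguments, so the argparse destination must match
the handler's parameter name.

    import argparse

    def show(instance_uuid, image_property):
        print(instance_uuid, image_property)

    parser = argparse.ArgumentParser()
    parser.add_argument('instance_uuid')
    # Buggy: add_argument('property') makes the dest 'property', so
    # show(**vars(ns)) raises the TypeError seen above. Fixed:
    parser.add_argument('image_property', metavar='<image_property>')

    ns = parser.parse_args(['abc-123', 'hw_vif_multiqueue_enabled'])
    show(**vars(ns))
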
list_instances and list_instance_uuids, as written in the Ironic driver,
do not currently respect conductor_group partitioning. Given that a
nova-compute service is intended to limit its scope of work to the
conductor group it is configured to work with, this is a bug.

Additionally, this should be a significant performance boost, for a
couple of reasons. Firstly, instead of calling the Ironic API for all
nodes and filtering afterwards, we now properly request only the
subset of nodes (when using a conductor group) -- this is the
optimized path in the Ironic DB and API code. Secondly, we now use the
driver's node cache to respond to these requests. Since list_instances
and list_instance_uuids are used by periodic tasks, operating on data
that may be slightly stale should have minimal impact compared to the
performance benefits.

Closes-bug: #2043036
Change-Id: If31158e3269e5e06848c29294fdaa147beedb5a5
(cherry picked from commit fa3cf7d)
(cherry picked from commit 555d7d0)
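
The reworked listing, as a sketch (node_cache stands for the driver's
existing cache of Ironic nodes, which is refreshed periodically and
already scoped to this service's [ironic]conductor_group):

    def list_instance_uuids(node_cache):
        # Answer from the cache rather than listing every node through
        # the Ironic API; slightly stale data is fine for periodics.
        return [node.instance_uuid for node in node_cache.values()
                if node.instance_uuid]
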
@github-actions github-actions bot requested a review from a team as a code owner June 25, 2024 09:12
@github-actions github-actions bot added the automated (Automated action performed by GitHub Actions) and synchronisation labels Jun 25, 2024
@markgoddard markgoddard reopened this Jun 25, 2024
@markgoddard markgoddard merged commit 5516304 into stackhpc/2023.1 Jun 25, 2024
@markgoddard markgoddard deleted the upstream/2023.1-2024-06-25 branch June 25, 2024 09:33