Conversation

github-actions[bot]

This PR contains a snapshot of 2023.1 from upstream stable/2023.1.

jsanemet and others added 30 commits November 14, 2023 01:38
This adds 'debug' level messages at the branches of the function that
lead to a 'False' result. These branches are:

  - Physnet found affinity on a NUMA cell outside the chosen ones
  - Tunneled networks found affinity on a NUMA cell outside the
    chosen ones

Partial-Bug: #1751784

Change-Id: I4d45f383b3c4794f8a114047455efb764f60f2a2
(cherry picked from commit 1915a31)
(cherry picked from commit 0d970bf)
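
A minimal sketch of the logging pattern this change describes (function
name and message text are illustrative, not the actual
nova/virt/hardware.py code):

    from oslo_log import log as logging

    LOG = logging.getLogger(__name__)

    def _cells_support_network_affinity(physnet_cells, tunnel_cells,
                                        chosen_cells):
        # Illustrative only: log at debug level on every branch that
        # returns False, so operators can see why the check failed.
        bad = physnet_cells - chosen_cells
        if bad:
            LOG.debug('Physnet has affinity to NUMA cells %(bad)s, '
                      'outside the chosen cells %(chosen)s',
                      {'bad': bad, 'chosen': chosen_cells})
            return False
        bad = tunnel_cells - chosen_cells
        if bad:
            LOG.debug('Tunneled networks have affinity to NUMA cells '
                      '%(bad)s, outside the chosen cells %(chosen)s',
                      {'bad': bad, 'chosen': chosen_cells})
            return False
        return True
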
This reverts commit 41096f8 as it has a
bug in it expecting a wrong exception type (FileNotFoundError vs
nova.exception.FileNotFound). Also there is a proper fix on master
2c44215 that can be backported instead.

Change-Id: Id2c253a6e223bd5ba22512d9e5a40a9d12680da2
(cherry picked from commit 03ef4d6)
(cherry picked from commit 25d0db7)
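
The distinction at the heart of the revert, sketched (_load is a
hypothetical stand-in helper; nova.exception.FileNotFound is the real
Nova exception class):

    import nova.exception

    def _load(path):
        # Hypothetical stand-in for the real I/O: Nova code raises its
        # own exception type here, not the Python builtin.
        raise nova.exception.FileNotFound(file_path=path)

    def read_backing_file(path):
        try:
            return _load(path)
        except FileNotFoundError:  # BUG: builtin type, never matches
            return None

    def read_backing_file_fixed(path):
        try:
            return _load(path)
        except nova.exception.FileNotFound:  # correct exception type
            return None
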
In Icb913ed9be8d508de35e755a9c650ba25e45aca2 we forgot to add a privsep
decorator for the set_offline() method.

Change-Id: I769d35907ab9466fe65b942295fd7567a757087a
Closes-Bug: #2022955
(cherry picked from commit 3fab437)
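
The shape of the fix, roughly (a sketch; the real set_offline() goes
through Nova's filesystem helpers, but the missing piece was the
privsep entrypoint decorator):

    import nova.privsep

    @nova.privsep.sys_admin_pctxt.entrypoint
    def set_offline(core_number):
        # Writing to /sys/devices/system/cpu/cpuN/online requires root,
        # so the call must run inside the privsep daemon.
        path = '/sys/devices/system/cpu/cpu%d/online' % core_number
        with open(path, 'w') as f:
            f.write('0')
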
Change-Id: Ifb7d001cfdb95b1b0aa29f45c0ef71c0673e1760
Closes-Bug: #2023018
(cherry picked from commit 2c44215)
(cherry picked from commit 891d177)
The URLs had the path components in the wrong order, "/latest/nova"
instead of the correct "/nova/latest", leading to "404 Not Found"
errors.

Closes-Bug: 2036530

Change-Id: I083381ad2649c06be9443f5ed6a55bddafab4df8
(cherry picked from commit 32dc852)
(cherry picked from commit fb59ca6)
Ran into this randomly today: if a test sets
CONF.scheduler.enabled_filters to a non-default value, the affinity
support global variables will be set to False, which can affect
subsequent test runs that expect the default configuration (affinity
filter support enabled).

Example error:

  WARNING [nova.scheduler.utils] Failed to
    compute_task_build_instances: ServerGroup policy is not supported:
      ServerGroupAffinityFilter not configured

This resets the global variables during base test setup, similar to how
other globals are reset.

Change-Id: Icbc75b1001c0a609280241f99a780313b244aa9d
(cherry picked from commit d533727)
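
Roughly what the reset looks like (a sketch; the flag names follow
nova/scheduler/utils.py, where they are lazily initialized from
CONF.scheduler.enabled_filters):

    import testtools

    from nova.scheduler import utils as scheduler_utils

    class TestCase(testtools.TestCase):
        def setUp(self):
            super().setUp()
            # Clear the cached affinity-support flags so state set by a
            # test with non-default enabled_filters cannot leak into
            # subsequent tests.
            scheduler_utils._SUPPORTS_AFFINITY = None
            scheduler_utils._SUPPORTS_ANTI_AFFINITY = None
            scheduler_utils._SUPPORTS_SOFT_AFFINITY = None
            scheduler_utils._SUPPORTS_SOFT_ANTI_AFFINITY = None
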
We are still having some issues in the gate where greenlets from
previous tests continue to run while the next test starts, causing
false negative failures in unit or functional test jobs.

This adds a new fixture that will ensure
GreenThreadPoolExecutor.shutdown() is called with wait=True, to wait
for greenlets in the pool to finish running before moving on.

In local testing, doing this does not appear to adversely affect test
run times, which was my primary concern.

As a baseline, I ran a subset of functional tests in a loop
until failure without the patch and after 11 hours, I got a failure
reproducing the bug. With the patch, running the same subset of
functional tests in a loop has been running for 24 hours and has not
failed yet.

Based on this, I think it may be worth trying this out to see if it
will help stability of our unit and functional test jobs. And if it
ends up impacting test run times or causes other issues, we can
revert it.

Partial-Bug: #1946339

Change-Id: Ia916310522b007061660172fa4d63d0fde9a55ac
(cherry picked from commit c095cfe)
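
A sketch of such a fixture (assuming futurist's
GreenThreadPoolExecutor, which Nova uses; the wrapper simply forces
wait=True on every shutdown call):

    import fixtures
    import futurist

    class GreenThreadPoolShutdownWait(fixtures.Fixture):
        # Ensure GreenThreadPoolExecutor.shutdown() waits for greenlets
        # in the pool to finish before the test moves on.
        def _setUp(self):
            real_shutdown = futurist.GreenThreadPoolExecutor.shutdown

            def shutdown(executor, wait=True):
                # Force wait=True regardless of what the caller passed.
                return real_shutdown(executor, wait=True)

            self.useFixture(fixtures.MonkeyPatch(
                'futurist.GreenThreadPoolExecutor.shutdown', shutdown))
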
In I008841988547573878c4e06e82f0fa55084e51b5 we started enabling a
bunch of libvirt enlightenments for Windows unconditionally. Turns
out, the `evmcs` enlightenment only works on Intel hosts, and we broke
the ability to run Windows guests on AMD machines. Until we become
smarter about conditionally enabling evmcs (with something like traits
for host CPU features), just stop enabling it altogether.

Change-Id: I2ff4fdecd9dc69de283f0e52e07df1aeaf0a9048
Closes-bug: 2009280
(cherry picked from commit 86a35e9)
(cherry picked from commit 0b7a59a)
The 'reenlightenment' hyperv enlightenment causes instance live
migration to fail (KVM currently doesn't fully support
reenlightenment notifications, see
www.qemu.org/docs/master/system/i386/hyperv.html),
so don't enable it for now.

Change-Id: I6821819450bc96e4304125ea3b76a0e462e6e33f
Closes-Bug: #2046549
Related-Bug: #2009280
(cherry picked from commit e618e78)
(cherry picked from commit 436e525)
With the change from ml2/ovs DHCP agents towards the OVN
implementation in neutron, there is no longer a port with device_owner
network:dhcp. Instead, DHCP is provided by a network:distributed port.

Closes-Bug: 2055245
Change-Id: Ibb569b9db1475b8bbd8f8722d49228182cd47f85
(cherry picked from commit 135af52)
(cherry picked from commit 45a9261)
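
A simplified sketch of the affected lookup (the helper name is
hypothetical; the real code lives in nova/network/neutron.py):

    def _get_dhcp_server_ips(neutron, network_id):
        # With ml2/ovs a DHCP agent owns a network:dhcp port; with OVN
        # the equivalent addresses live on a network:distributed port.
        for owner in ('network:dhcp', 'network:distributed'):
            ports = neutron.list_ports(
                network_id=network_id, device_owner=owner)['ports']
            if ports:
                return [ip['ip_address']
                        for port in ports
                        for ip in port['fixed_ips']]
        return []
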
Empty the cpu init file and directly call the submodule API.

NOTE(artom) Backporting this makes the whole stack on top of this pass
functional and unit tests without any stable-only modifications.
Otherwise we'd have to refactor nova/virt/libvirt/cpu/__init__.py to
use the new per-driver API objects.

Relates to blueprint libvirt-cpu-state-mgmt

Change-Id: I1299ca4b49743f58bec6f541785dd9fbee0ae9e2
(cherry picked from commit 37fa501)
When we pin emulator threads with the `isolate` policy, those pins are
stored in the `cpuset_reserved` field in each NUMACell. In subsequent
patches we'll need those pins for the whole instance, so this patch
adds a helper property that does this for us, similar to how the
`cpu_pinning` property helper currently works.

Related-bug: 2056612
Change-Id: I8597f13e8089106434018b94e9bbc2091f95fee9
(cherry picked from commit 8dbfecd)
(cherry picked from commit 62a35d2)
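
A sketch of the helper property (modeled on the existing cpu_pinning
property; the real class is an oslo.versionedobjects object, shown
here as a plain class):

    import itertools

    class InstanceNUMATopology:
        def __init__(self, cells):
            self.cells = cells

        @property
        def cpuset_reserved(self):
            # Union of the per-cell emulator-thread pins ('isolate'
            # policy) for the whole instance.
            return set(itertools.chain.from_iterable(
                cell.cpuset_reserved for cell in self.cells
                if cell.cpuset_reserved))
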
Related-bug: 2056612
Change-Id: Icd586cdd015143b2e113fd14904f40410809d247
(cherry picked from commit 521af26)
(cherry picked from commit 3dde972)
Previously, with the `isolate` emulator threads policy and libvirt cpu
power management enabled, we did not power on the cores to which the
emulator threads were pinned. Start doing that, and don't forget to
power them down when the instance is stopped.

Closes-bug: 2056612
Change-Id: I6e5383d8a0bf3f0ed8c870754cddae4e9163b4fd
(cherry picked from commit 0986d2b)
(cherry picked from commit 4bcf5ad)
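
A sketch of the extended power-up path (names illustrative; assumes
the cpu_api object introduced later in this series):

    def power_up_instance_cores(cpu_api, numa_topology):
        # Bring online both the vCPU pins and the emulator-thread pins
        # (cpuset_reserved); power-down at instance stop mirrors this.
        cores = numa_topology.cpu_pinning | numa_topology.cpuset_reserved
        for i in cores:
            cpu_api.core(i).set_online()
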
We want to test power management in our functional tests in multinode
scenarios (ex: live migration).

This was previously impossible because all the methods in
nova.virt.libvirt.cpu.api were at the module level, meaning both
source and destination libvirt drivers would call the same method to
online and offline cores. This made it impossible to maintain distinct
core power state between source and destination.

This patch inserts a nova.virt.libvirt.cpu.api.API class, and gives
the libvirt driver a cpu_api attribute with an instance of that
class. Along with the tiny API.core() helper, this allows new
functional tests in the subsequent patches to stub out the core
"model" code with distinct objects on the source and destination
libvirt drivers, and enables a whole bunch of testing (and fixes!)
around live migration.

Related-bug: 2056613
Change-Id: I052535249b9a3e144bb68b8c588b5995eb345b97
(cherry picked from commit 29dc044)
(cherry picked from commit 2a0e638)
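
A condensed sketch of the refactor (the real code is in
nova/virt/libvirt/cpu/api.py; shapes simplified here):

    class Core:
        # Tiny core "model"; functional tests substitute their own.
        def __init__(self, ident):
            self.ident = ident
            self.online = True

        def set_online(self):
            self.online = True

        def set_offline(self):
            self.online = False

    class API:
        def core(self, i):
            # Stubbing this one helper gives source and destination
            # drivers distinct core power state in multinode tests.
            return Core(i)

    # The libvirt driver then carries its own instance:
    #     self.cpu_api = API()
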
Building on the previous patch's refactor, we can now do functional
testing of live migration with CPU power management. We quickly notice
that it's mostly broken, leaving the CPUs powered up on the source,
and not powering them up on the dest.

Related-bug: 2056613
Change-Id: Ib4de77d68ceeffbc751bca3567ada72228b750af
(cherry picked from commit 1f5e342)
(cherry picked from commit 95bbb04)
Previously, live migrations completely ignored CPU power management.
This patch makes sure that we correctly:

* Power up the cores on the destination during pre_live_migration, as
  we need them powered up before the instance starts on the
  destination.
* If the live migration is successful, power down the vacated cores on
  the source.
* In case of a rollback, power down the cores previously powered up on
  pre_live_migration.

Closes-bug: 2056613
Change-Id: I787bd7807950370cd865f29b95989d489d4826d0
(cherry picked from commit c1ccc1a)
(cherry picked from commit c5a73e6)
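
Roughly where those three calls land, as a simplified sketch (method
names follow the virt driver interface; bodies condensed):

    class LibvirtDriverSketch:
        def __init__(self, cpu_api):
            self.cpu_api = cpu_api

        def pre_live_migration(self, instance):
            # Destination: power up cores before the guest arrives.
            self.cpu_api.power_up(instance)

        def post_live_migration_at_source(self, instance):
            # Source: migration succeeded, power down vacated cores.
            self.cpu_api.power_down(instance)

        def rollback_live_migration_at_destination(self, instance):
            # Rollback: undo the power-up from pre_live_migration.
            self.cpu_api.power_down(instance)
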
Reproduction steps:
1. Execute: nova-manage image_property show <vm_uuid> \
                            hw_vif_multiqueue_enabled
2. Observe:
  An error has occurred:
  Traceback (most recent call last):
    File "/var/lib/kolla/venv/lib/python3.9/
          site-packages/nova/cmd/manage.py", line 3394, in main
      ret = fn(*fn_args, **fn_kwargs)
  TypeError: show() got an unexpected keyword argument 'property'

Change-Id: I1349b880934ad9f44a943cf7de324d7338619d2e
Closes-Bug: #2016346
(cherry picked from commit 1c02c0d)
(cherry picked from commit fc4b592)
(cherry picked from commit 1bbd44e)
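
An illustrative reconstruction of the mismatch (not nova-manage's
actual dispatch code): the parsed CLI arguments are passed to the
handler as keyword arguments, so the argparse destination must match
the handler's parameter name.

    import argparse

    def show(instance_uuid, image_property):
        print(instance_uuid, image_property)

    parser = argparse.ArgumentParser()
    parser.add_argument('instance_uuid')
    # Buggy: add_argument('property') makes the dest 'property', so
    # show(**vars(ns)) raises the TypeError seen above. Fixed:
    parser.add_argument('image_property', metavar='<image_property>')

    ns = parser.parse_args(['abc-123', 'hw_vif_multiqueue_enabled'])
    show(**vars(ns))
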
list_instances and list_instance_uuids, as written in the Ironic driver,
do not currently respect conductor_group partitioning. Given that a
nova-compute service is intended to limit its scope of work to the
conductor group it is configured to work with, this is a bug.

Additionally, this should be a significant performance boost, for a
couple of reasons. Firstly, instead of calling the Ironic API for all
nodes and filtering afterwards, we now properly request only the
subset of nodes (when using a conductor group) -- this is the
optimized path in the Ironic DB and API code. Secondly, we now use the
driver's node cache to respond to these requests. Since list_instances
and list_instance_uuids are used by periodic tasks, operating on data
that may be slightly stale should have minimal impact compared to the
performance benefits.

Closes-bug: #2043036
Change-Id: If31158e3269e5e06848c29294fdaa147beedb5a5
(cherry picked from commit fa3cf7d)
(cherry picked from commit 555d7d0)
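
The reworked listing, as a sketch (node_cache stands for the driver's
existing cache of Ironic nodes, which is refreshed periodically and
already scoped to this service's [ironic]conductor_group):

    def list_instance_uuids(node_cache):
        # Answer from the cache rather than listing every node through
        # the Ironic API; slightly stale data is fine for periodics.
        return [node.instance_uuid for node in node_cache.values()
                if node.instance_uuid]
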
@github-actions github-actions bot requested a review from a team as a code owner June 25, 2024 09:12
@github-actions github-actions bot added the automated (Automated action performed by GitHub Actions) and synchronisation labels Jun 25, 2024
@markgoddard markgoddard reopened this Jun 25, 2024
@markgoddard markgoddard merged commit 5516304 into stackhpc/2023.1 Jun 25, 2024
@markgoddard markgoddard deleted the upstream/2023.1-2024-06-25 branch June 25, 2024 09:33