From 774979e3f1c57890872c37d766efe11b82ff1452 Mon Sep 17 00:00:00 2001 From: Stig Telfer Date: Sun, 15 Nov 2020 21:23:57 +0000 Subject: [PATCH 1/6] First draft documentation for GPU config in Kayobe Based on Isaac's playbook in kayobe-ops and other work by Will and the SIB team. --- source/gpus_in_openstack.rst | 217 +++++++++++++++++++++++++++++++++++ source/index.rst | 1 + 2 files changed, 218 insertions(+) create mode 100644 source/gpus_in_openstack.rst diff --git a/source/gpus_in_openstack.rst b/source/gpus_in_openstack.rst new file mode 100644 index 0000000..63e49e1 --- /dev/null +++ b/source/gpus_in_openstack.rst @@ -0,0 +1,217 @@ +.. include:: vars.rst + +============================= +Support for GPUs in OpenStack +============================= + +This guide is has been developed for Nvidia GPUs and CentOS 8. + +See `Kayobe Ops `_ for +a playbook implementation of host setup for GPU. + +BIOS Configuration Requirements +------------------------------- + +On an Intel system: + +* Enable `VT-x` in the BIOS for virtualisation support. +* Enable `VT-d` in the BIOS for IOMMU support. + +Hypervisor Configuration Requirements +------------------------------------- + +Find the GPU device IDs +~~~~~~~~~~~~~~~~~~~~~~~ + +From the host OS, use ``lspci -nn`` to find the PCI vendor ID and +device ID for the GPU device and supporting components. These are +4-digit hex numbers. + +For example: + +.. code-block:: text + + 01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM204M [GeForce GTX 980M] [10de:13d7] (rev a1) (prog-if 00 [VGA controller]) + 01:00.1 Audio device [0403]: NVIDIA Corporation GM204 High Definition Audio Controller [10de:0fbb] (rev a1) + +In this case the vendor ID is ``10de``, display ID is ``13d7`` and audio ID is ``0fbb``. + +Alternatively, for an Nvidia Quadro RTX 6000: + +..
code-block:: yaml + + # NVIDIA Quadro RTX 6000/8000 PCI device IDs + vendor_id: "10de" + display_id: "1e30" + audio_id: "10f7" + usba_id: "1ad6" + usba_class: "0c0330" + usbc_id: "1ad7" + usbc_class: "0c8000" + +These parameters will be used for device-specific configuration. + +Kernel Ramdisk Reconfiguration +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The ramdisk loaded during kernel boot can be extended to include the +vfio PCI drivers and ensure they are loaded early in system boot. + +.. code-block:: yaml + + - name: Template dracut config + blockinfile: + path: /etc/dracut.conf.d/gpu-vfio.conf + block: | + add_drivers+="vfio vfio_iommu_type1 vfio_pci vfio_virqfd" + owner: root + group: root + mode: 0660 + create: true + become: true + notify: + - Regenerate initramfs + - reboot + +The handler for regenerating the Dracut initramfs is: + +.. code-block:: yaml + + - name: Regenerate initramfs + shell: |- + #!/bin/bash + set -eux + dracut -v -f /boot/initramfs-$(uname -r).img $(uname -r) + become: true + +Kernel Boot Parameters +~~~~~~~~~~~~~~~~~~~~~~ + +Set the following kernel parameters by adding to +``GRUB_CMDLINE_LINUX_DEFAULT`` or ``GRUB_CMDLINE_LINUX`` in +``/etc/default/grub``. We can use the +`stackhpc.grubcmdline `_ +role from Ansible Galaxy: + +.. code-block:: yaml + + - name: Add vfio-pci.ids kernel args + include_role: + name: stackhpc.grubcmdline + vars: + kernel_cmdline: + - intel_iommu=on + - iommu=pt + - "vfio-pci.ids={{ vendor_id }}:{{ display_id }},{{ vendor_id }}:{{ audio_id }}" + kernel_cmdline_remove: + - iommu + - intel_iommu + - vfio-pci.ids: + +Kernel Device Management +~~~~~~~~~~~~~~~~~~~~~~~~ + +In the hypervisor, we must prevent kernel device initialisation of +the GPU and prevent drivers from loading for binding the GPU in the +host OS. We do this using ``udev`` rules: + +..
code-block:: yaml + + - name: Template udev rules to blacklist GPU usb controllers + blockinfile: + # We want this to execute as soon as possible + path: /etc/udev/rules.d/99-gpu.rules + block: | + #Remove NVIDIA USB xHCI Host Controller Devices, if present + ACTION=="add", SUBSYSTEM=="pci", ATTR{vendor}=="0x{{ vendor_id }}", ATTR{class}=="0x{{ usba_class }}", ATTR{remove}="1" + #Remove NVIDIA USB Type-C UCSI devices, if present + ACTION=="add", SUBSYSTEM=="pci", ATTR{vendor}=="0x{{ vendor_id }}", ATTR{class}=="0x{{ usbc_class }}", ATTR{remove}="1" + owner: root + group: root + mode: 0644 + create: true + become: true + +Kernel Drivers +~~~~~~~~~~~~~~ + +Prevent the ``nouveau`` kernel driver from loading by +blacklisting the module: + +.. code-block:: yaml + + - name: Blacklist nouveau + blockinfile: + path: /etc/modprobe.d/blacklist-nouveau.conf + block: | + blacklist nouveau + options nouveau modeset=0 + mode: 0664 + owner: root + group: root + create: true + become: true + notify: + - reboot + - Regenerate initramfs + +Ensure that the ``vfio`` drivers are loaded into the kernel on boot: + +.. code-block:: yaml + + - name: Add vfio to modules-load.d + blockinfile: + path: /etc/modules-load.d/vfio.conf + block: | + vfio + vfio_iommu_type1 + vfio_pci + vfio_virqfd + owner: root + group: root + mode: 0664 + create: true + become: true + notify: reboot + +Once this code has taken effect (after a reboot), the VFIO kernel drivers should be loaded on boot: + +.. code-block:: text + + # lsmod | grep vfio + vfio_pci 49152 0 + vfio_virqfd 16384 1 vfio_pci + vfio_iommu_type1 28672 0 + vfio 32768 2 vfio_iommu_type1,vfio_pci + irqbypass 16384 5 vfio_pci,kvm + +OpenStack Nova configuration +---------------------------- + +Testing GPU in a Guest VM +------------------------- + +The Nvidia drivers must be installed first. For example, on an Ubuntu guest: + +.. 
code-block:: text + + sudo apt install nvidia-headless-440 nvidia-utils-440 nvidia-compute-utils-440 + +The ``nvidia-smi`` command will generate detailed output if the driver has loaded +successfully. + +Further Reference +----------------- + +For PCI Passthrough and GPUs in OpenStack: + +* Consumer-grade GPUs: https://gist.github.com/claudiok/890ab6dfe76fa45b30081e58038a9215 +* https://www.jimmdenton.com/gpu-offloading-openstack/ +* https://docs.openstack.org/nova/latest/admin/pci-passthrough.html +* https://docs.openstack.org/nova/latest/admin/virtual-gpu.html (vGPU only) +* Telsa models in OpenStack: https://egallen.com/openstack-nvidia-tesla-gpu-passthrough/ +* https://wiki.archlinux.org/index.php/PCI_passthrough_via_OVMF +* https://www.kernel.org/doc/Documentation/Intel-IOMMU.txt +* https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.1/html/installation_guide/appe-configuring_a_hypervisor_host_for_pci_passthrough +* https://www.gresearch.co.uk/article/utilising-the-openstack-placement-service-to-schedule-gpu-and-nvme-workloads-alongside-general-purpose-instances/ + diff --git a/source/index.rst b/source/index.rst index d47c91f..7e8db2c 100644 --- a/source/index.rst +++ b/source/index.rst @@ -25,6 +25,7 @@ Contents managing_users_and_projects operations_and_monitoring customising_deployment + gpus_in_openstack Indices and search ================== From e7890fc056d5f2f1154a0b97e6e099176ae4bf35 Mon Sep 17 00:00:00 2001 From: Stig Telfer Date: Mon, 16 Nov 2020 15:56:38 +0000 Subject: [PATCH 2/6] Add example config for Nova compute service --- source/gpus_in_openstack.rst | 36 ++++++++++++++++++++++++++++++++++++ 1 file changed, 36 insertions(+) diff --git a/source/gpus_in_openstack.rst b/source/gpus_in_openstack.rst index 63e49e1..04dff41 100644 --- a/source/gpus_in_openstack.rst +++ b/source/gpus_in_openstack.rst @@ -188,6 +188,42 @@ Once this code has taken effect (after a reboot), the VFIO kernel drivers should OpenStack Nova configuration 
---------------------------- +Scheduler Filters +~~~~~~~~~~~~~~~~~ + +Hypervisor Resource Tracking +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Configuration can be applied in flexible ways using Kolla-Ansible's +methods for `inventory-driven customisation of configuration +`_. +The following configuration could be added to +``etc/kayobe/kolla/config/nova/nova-compute.conf`` to enable PCI +passthrough of GPU devices for hosts in a group named ``compute_gpu``. +Again, the 4-digit PCI Vendor ID and Device ID extracted from ``lspci +-nn`` can be used here to specify the GPU device(s). + +.. code-block:: yaml + + [pci] + {% raw %} + {% if inventory_hostname in groups['compute_gpu'] %} + # We could support multiple models of GPU. + # This can be done more selectively using different inventory groups. + # GPU models defined here: + # NVidia Tesla V100 16GB + # NVidia Tesla V100 32GB + # NVidia Tesla P100 16GB + passthrough_whitelist = [{ "vendor_id":"10de", "product_id":"1db4" }, + { "vendor_id":"10de", "product_id":"1db5" }, + { "vendor_id":"10de", "product_id":"15f8" }] + alias = { "vendor_id":"10de", "product_id":"1db4", "device_type":"type-PCI", "name":"gpu-v100-16" } + alias = { "vendor_id":"10de", "product_id":"1db5", "device_type":"type-PCI", "name":"gpu-v100-32" } + alias = { "vendor_id":"10de", "product_id":"15f8", "device_type":"type-PCI", "name":"gpu-p100" } + {% endif %} + {% endraw %} + + Testing GPU in a Guest VM ------------------------- From a8bc7571d57aa62a79dc83c09e31600525c36471 Mon Sep 17 00:00:00 2001 From: Stig Telfer Date: Mon, 16 Nov 2020 15:58:04 +0000 Subject: [PATCH 3/6] Feedback from Pierre --- source/gpus_in_openstack.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/source/gpus_in_openstack.rst b/source/gpus_in_openstack.rst index 04dff41..f0647d9 100644 --- a/source/gpus_in_openstack.rst +++ b/source/gpus_in_openstack.rst @@ -4,7 +4,7 @@ Support for GPUs in OpenStack ============================= -This guide is has been 
developed for Nvidia GPUs and CentOS 8. +This guide has been developed for Nvidia GPUs and CentOS 8. See `Kayobe Ops `_ for a playbook implementation of host setup for GPU. @@ -245,7 +245,7 @@ For PCI Passthrough and GPUs in OpenStack: * https://www.jimmdenton.com/gpu-offloading-openstack/ * https://docs.openstack.org/nova/latest/admin/pci-passthrough.html * https://docs.openstack.org/nova/latest/admin/virtual-gpu.html (vGPU only) -* Telsa models in OpenStack: https://egallen.com/openstack-nvidia-tesla-gpu-passthrough/ +* Tesla models in OpenStack: https://egallen.com/openstack-nvidia-tesla-gpu-passthrough/ * https://wiki.archlinux.org/index.php/PCI_passthrough_via_OVMF * https://www.kernel.org/doc/Documentation/Intel-IOMMU.txt * https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.1/html/installation_guide/appe-configuring_a_hypervisor_host_for_pci_passthrough From 91d3e25dd31e7c8eb72780c1af6b550e07953e41 Mon Sep 17 00:00:00 2001 From: Bartosz Bezak Date: Thu, 26 Nov 2020 11:22:28 +0100 Subject: [PATCH 4/6] Additional GPU passthrough configuration bits: - nova-scheduler, nova-api configuration - flavor and instance creation - iommu testing in nova_libvirt --- source/gpus_in_openstack.rst | 74 ++++++++++++++++++++++++++++++++---- 1 file changed, 67 insertions(+), 7 deletions(-) diff --git a/source/gpus_in_openstack.rst b/source/gpus_in_openstack.rst index f0647d9..1c0b787 100644 --- a/source/gpus_in_openstack.rst +++ b/source/gpus_in_openstack.rst @@ -106,7 +106,7 @@ role from Ansible Galaxy: kernel_cmdline_remove: - iommu - intel_iommu - - vfio-pci.ids: + - vfio-pci.ids Kernel Device Management ~~~~~~~~~~~~~~~~~~~~~~~~ @@ -185,14 +185,38 @@ Once this code has taken effect (after a reboot), the VFIO kernel drivers should vfio 32768 2 vfio_iommu_type1,vfio_pci irqbypass 16384 5 vfio_pci,kvm + # lspci -nnk -s 3d:00.0 + 3d:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM107GL [Tesla M10] [10de:13bd] (rev a2) + Subsystem: NVIDIA Corporation
Tesla M10 [10de:1160] + Kernel driver in use: vfio-pci + Kernel modules: nouveau + +IOMMU support must also be enabled in the kernel. This can be verified on the compute host: + +.. code-block:: text + + # docker exec -it nova_libvirt virt-host-validate | grep IOMMU + QEMU: Checking for device assignment IOMMU support : PASS + QEMU: Checking if IOMMU is enabled by kernel : PASS + OpenStack Nova configuration ---------------------------- -Scheduler Filters -~~~~~~~~~~~~~~~~~ +Configure nova-scheduler +~~~~~~~~~~~~~~~~~~~~~~~~ + +The nova-scheduler service must be configured to enable the ``PciPassthroughFilter`` +To enable it add it to the list of filters to Kolla-Ansible configuration file: +``etc/kayobe/kolla/config/nova.conf``, for instance: + +.. code-block:: ini + + [filter_scheduler] + available_filters = nova.scheduler.filters.all_filters + enabled_filters = AvailabilityZoneFilter, ComputeFilter, ComputeCapabilitiesFilter, ImagePropertiesFilter, ServerGroupAntiAffinityFilter, ServerGroupAffinityFilter, PciPassthroughFilter -Hypervisor Resource Tracking -~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Configure nova-compute +~~~~~~~~~~~~~~~~~~~~~~ Configuration can be applied in flexible ways using Kolla-Ansible's methods for `inventory-driven customisation of configuration @@ -203,7 +227,7 @@ passthrough of GPU devices for hosts in a group named ``compute_gpu``. Again, the 4-digit PCI Vendor ID and Device ID extracted from ``lspci -nn`` can be used here to specify the GPU device(s). -.. code-block:: yaml +.. code-block:: jinja [pci] {% raw %} @@ -223,6 +247,43 @@ Again, the 4-digit PCI Vendor ID and Device ID extracted from ``lspci {% endif %} {% endraw %} +Configure nova-api +~~~~~~~~~~~~~~~~~~ + +pci.alias also needs to be configured on the controller. +This configuration should match the configuration found on the compute nodes. +Add it to Kolla-Ansible configuration file: +``etc/kayobe/kolla/config/nova-api/nova.conf``, for instance: + +..
code-block:: yaml + + [pci] + alias = { "vendor_id":"10de", "product_id":"1db4", "device_type":"type-PCI", "name":"gpu-v100-16" } + alias = { "vendor_id":"10de", "product_id":"1db5", "device_type":"type-PCI", "name":"gpu-v100-32" } + alias = { "vendor_id":"10de", "product_id":"15f8", "device_type":"type-PCI", "name":"gpu-p100" } + +Reconfigure nova service +~~~~~~~~~~~~~~~~~~~~~~~~ + +.. code-block:: text + + kayobe overcloud service reconfigure -kt nova --kolla-skip-tags common --skip-precheck + +Configure a flavor +~~~~~~~~~~~~~~~~~~ +For example, to request two of the GPUs with alias gpu-p100 + +.. code-block:: text + + openstack flavor set m1.medium --property "pci_passthrough:alias"="gpu-p100:2" + + +Create instance with GPU passthrough +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +.. code-block:: text + + openstack server create --flavor m1.medium --image ubuntu2004 --wait test-pci Testing GPU in a Guest VM ------------------------- @@ -250,4 +311,3 @@ For PCI Passthrough and GPUs in OpenStack: * https://www.kernel.org/doc/Documentation/Intel-IOMMU.txt * https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.1/html/installation_guide/appe-configuring_a_hypervisor_host_for_pci_passthrough * https://www.gresearch.co.uk/article/utilising-the-openstack-placement-service-to-schedule-gpu-and-nvme-workloads-alongside-general-purpose-instances/ - From 2e86f7ad5174b80be99acafbdf6c03b37ae19f3f Mon Sep 17 00:00:00 2001 From: Bartosz Bezak Date: Thu, 26 Nov 2020 16:19:31 +0100 Subject: [PATCH 5/6] added flavor configuration for GPU in project_config changed subsubsection characters to more conventional ones --- source/gpus_in_openstack.rst | 50 ++++++++++++++++++++++++++++-------- 1 file changed, 39 insertions(+), 11 deletions(-) diff --git a/source/gpus_in_openstack.rst b/source/gpus_in_openstack.rst index 1c0b787..8bf6d7d 100644 --- a/source/gpus_in_openstack.rst +++ b/source/gpus_in_openstack.rst @@ -21,7 +21,7 @@ Hypervisor Configuration Requirements 
------------------------------------- Find the GPU device IDs -~~~~~~~~~~~~~~~~~~~~~~~ +^^^^^^^^^^^^^^^^^^^^^^^ From the host OS, use ``lspci -nn`` to find the PCI vendor ID and device ID for the GPU device and supporting components. These are @@ -52,7 +52,7 @@ Alternatively, for an Nvidia Quadro RTX 6000: These parameters will be used for device-specific configuration. Kernel Ramdisk Reconfiguration -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The ramdisk loaded during kernel boot can be extended to include the vfio PCI drivers and ensure they are loaded early in system boot. @@ -85,7 +85,7 @@ The handler for regenerating the Dracut initramfs is: become: true Kernel Boot Parameters -~~~~~~~~~~~~~~~~~~~~~~ +^^^^^^^^^^^^^^^^^^^^^^ Set the following kernel parameters by adding to ``GRUB_CMDLINE_LINUX_DEFAULT`` or ``GRUB_CMDLINE_LINUX`` in @@ -109,7 +109,7 @@ role from Ansible Galaxy: - vfio-pci.ids Kernel Device Management -~~~~~~~~~~~~~~~~~~~~~~~~ +^^^^^^^^^^^^^^^^^^^^^^^^ In the hypervisor, we must prevent kernel device initialisation of the GPU and prevent drivers from loading for binding the GPU in the @@ -133,7 +133,7 @@ host OS. 
We do this using ``udev`` rules: become: true Kernel Drivers -~~~~~~~~~~~~~~ +^^^^^^^^^^^^^^ Prevent the ``nouveau`` kernel driver from loading by blacklisting the module: @@ -203,7 +203,7 @@ OpenStack Nova configuration ---------------------------- Configure nova-scheduler -~~~~~~~~~~~~~~~~~~~~~~~~ +^^^^^^^^^^^^^^^^^^^^^^^^ The nova-scheduler service must be configured to enable the ``PciPassthroughFilter`` To enable it add it to the list of filters to Kolla-Ansible configuration file: @@ -216,7 +216,7 @@ To enable it add it to the list of filters to Kolla-Ansible configuration file: enabled_filters = AvailabilityZoneFilter, ComputeFilter, ComputeCapabilitiesFilter, ImagePropertiesFilter, ServerGroupAntiAffinityFilter, ServerGroupAffinityFilter, PciPassthroughFilter Configure nova-compute -~~~~~~~~~~~~~~~~~~~~~~ +^^^^^^^^^^^^^^^^^^^^^^ Configuration can be applied in flexible ways using Kolla-Ansible's methods for `inventory-driven customisation of configuration @@ -248,7 +248,7 @@ Again, the 4-digit PCI Vendor ID and Device ID extracted from ``lspci {% endraw %} Configure nova-api -~~~~~~~~~~~~~~~~~~ +^^^^^^^^^^^^^^^^^^ pci.alias also needs to be configured on the controller. This configuration should match the configuration found on the compute nodes. @@ -263,14 +263,15 @@ Add it to Kolla-Ansible configuration file: alias = { "vendor_id":"10de", "product_id":"15f8", "device_type":"type-PCI", "name":"gpu-p100" } Reconfigure nova service -~~~~~~~~~~~~~~~~~~~~~~~~ +^^^^^^^^^^^^^^^^^^^^^^^^ .. code-block:: text kayobe overcloud service reconfigure -kt nova --kolla-skip-tags common --skip-precheck Configure a flavor -~~~~~~~~~~~~~~~~~~ +^^^^^^^^^^^^^^^^^^ + For example, to request two of the GPUs with alias gpu-p100 .. 
code-block:: text @@ -278,8 +279,35 @@ For example, to request two of the GPUs with alias gpu-p100 openstack flavor set m1.medium --property "pci_passthrough:alias"="gpu-p100:2" +This can also be defined in the |project_config| repository: +|project_config_source_url| + +Add ``extra_specs`` to the flavor in etc/|project_config|/|project_config|.yml: + +.. code-block:: console + :substitutions: + + admin# cd |base_path|/src/|project_config| + admin# vim etc/|project_config|/|project_config|.yml + + name: "m1.medium" + ram: 4096 + disk: 40 + vcpus: 2 + extra_specs: + "pci_passthrough:alias": "gpu-p100:2" + +Afterwards, run the configuration playbooks: + +.. code-block:: console + :substitutions: + + admin# source |base_path|/src/|kayobe_config|/etc/kolla/public-openrc.sh + admin# source |base_path|/venvs/|project_config|/bin/activate + admin# tools/|project_config| --vault-password-file |vault_password_file_path| + Create instance with GPU passthrough -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. code-block:: text From 0ff66f8162efa812f1d4585dc5347d999f2ed3fd Mon Sep 17 00:00:00 2001 From: Bartosz Bezak Date: Mon, 30 Nov 2020 10:53:02 +0100 Subject: [PATCH 6/6] nova-api configure path fix, --skip-prechecks --- source/gpus_in_openstack.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/source/gpus_in_openstack.rst b/source/gpus_in_openstack.rst index 8bf6d7d..4343998 100644 --- a/source/gpus_in_openstack.rst +++ b/source/gpus_in_openstack.rst @@ -253,7 +253,7 @@ Configure nova-api pci.alias also needs to be configured on the controller. This configuration should match the configuration found on the compute nodes. Add it to Kolla-Ansible configuration file: -``etc/kayobe/kolla/config/nova-api/nova.conf``, for instance: +``etc/kayobe/kolla/config/nova/nova-api.conf``, for instance: .. code-block:: yaml @@ -267,7 +267,7 @@ Reconfigure nova service ^^^^^^^^^^^^^^^^^^^^^^^^ ..
code-block:: text - kayobe overcloud service reconfigure -kt nova --kolla-skip-tags common --skip-precheck + kayobe overcloud service reconfigure --kolla-tags nova --kolla-skip-tags common --skip-prechecks Configure a flavor ^^^^^^^^^^^^^^^^^^