diff --git a/docs/SRIOV.md b/docs/SRIOV.md
index 15f69715..81f76951 100644
--- a/docs/SRIOV.md
+++ b/docs/SRIOV.md
@@ -4,7 +4,7 @@ In PMO version 3.11.x, support for SR-IOV has been introduced. SR-IOV provides i
 
 SR-IOV is supported by multiple network interface cards (NICs) provided by many networking vendors, including Intel, Cisco, Mellanox, Broadcom, QLogic, and others.
 
-The following NICs have been tested with the Platform9 PMO 3.11 release:
+The following NICs have been tested with the Platform9 PMO 3.11.3 release:
 
 * Mellanox ConnectX-4 Lx EN
 * Mellanox ConnectX-5 EN
@@ -12,6 +12,13 @@ The following NICs have been tested with the Platform9 PMO 3.11 release:
 * Intel X540-T2
 * Broadcom NetXtreme II (BCM57810 / HP 533FLR-T)
 
+The following drivers are considered supported:
+
+* ixgbe
+* bnx2x
+
+> Mellanox cards require additional configuration that is outside the scope of this guide and Platform9 Express.
+
 ## Limitations
 
 The following are a few of the limitations of SR-IOV:
@@ -92,10 +99,12 @@ Compute node-specific configurations can be implemented using what is known as h
 * Network interface name
 * Quantity of network interfaces used for SRIOV
 * Provider network mappings
-*
+* Number of VFs per interface
+
 Using **host_vars**, the following are some variables that can be modified:
 
 * physical_device_mappings (required)
+* sriov_numvfs (required)
 * neutron_ovs_bridge_mappings (optional)
 
 In this example, two hosts have different NICs installed that report different names to the operating system.
@@ -148,13 +157,15 @@ supports-register-dump: yes
 supports-priv-flags: yes
 ```
 
-The **host_vars** for each host can be implemented in a file that corresponds to the host's short name located at **/opt/pf9-express/host_vars/.yml**. In the following example, **compute01** uses a single network interface for SR-IOV, while **compute02** uses two. SR-IOV networks will leverage a new provider label named **sriov**, as shown here:
+The **host_vars** for each host can be implemented in a file that corresponds to the host's short name located at **/opt/pf9-express/host_vars/.yml**. In the following example, **compute01** uses a single network interface for SR-IOV, while **compute02** uses two. SR-IOV networks will leverage a new provider label named **sriov** and 8 VFs per interface, as shown here:
 
 ```
 ---
 # compute01.yml
 physical_device_mappings:
   - sriov:ens1
+sriov_numvfs:
+  - ens1:8
 ```
 
 ```
@@ -163,6 +174,9 @@ physical_device_mappings:
 physical_device_mappings:
   - sriov:ens1f0
   - sriov:ens1f1
+sriov_numvfs:
+  - ens1f0:8
+  - ens1f1:8
 ```
 
 > SR-IOV supports VLAN networks only. Flat and overlay networks are not supported.
@@ -172,12 +186,13 @@ physical_device_mappings:
 Group-wide configurations can be implemented using what is known as **group_vars**. Configurations that may be consistent between groups include:
 
 * Network interface name
-* Quantity of network interfaces used for SRIOV
+* Number of VFs per interface
 * Provider network mappings
 
 Using **group_vars**, the following are some variables that can be modified:
 
 * neutron_ovs_bridge_mappings
+* sriov_numvfs
 * physical_device_mappings
 
 The **group_vars** for the **hypervisors** group can be implemented in a file that corresponds to the group's name located at **/opt/pf9-express/group_vars/.yml**. In the following example, every host in the **hypervisors** group has the same NIC installed in the same slot, so the naming convention is consistent across all hosts. A second provider bridge mapping has been established that will allow non-SR-IOV capable ports, such as DHCP, to connect to a vSwitch and communicate with SR-IOV ports:
@@ -190,6 +205,9 @@ neutron_ovs_bridge_mappings: "external:br-pf9, sriov:br-sriov"
 physical_device_mappings:
   - sriov:ens1f0
   - sriov:ens1f1
+sriov_numvfs:
+  - ens1f0:8
+  - ens1f1:8
 ...
 ```
 
@@ -222,7 +240,7 @@ Lastly, SR-IOV can be enabled via the respective **host_vars** file, as shown he
 
 ```
 ---
-# compute01.yml###
+# compute01.yml
 sriov: "on"
 physical_device_mappings:
   - sriov:ens1
@@ -234,3 +252,9 @@ Once the respective configuration is in place, install PMO with Express using so
 ```
 # ./pf9-express -a pmo
 ```
+
+To refresh VFs, run **pf9-express** with the **refresh-sriov** tag:
+
+```
+# ./pf9-express -t refresh-sriov hypervisors
+```
diff --git a/group_vars/hypervisors.yml b/group_vars/hypervisors.yml
index f4f0ed0c..bccbeac4 100644
--- a/group_vars/hypervisors.yml
+++ b/group_vars/hypervisors.yml
@@ -11,8 +11,8 @@ glance: "off"
 multipath: False
 nova_instances_path: /opt/pf9/data/instances/
 neutron_ovs_allow_dhcp_vms: "False"
-neutron_ovs_bridge_name: "br-pf9, br-sriov"
-neutron_ovs_bridge_mappings: "external:br-pf9, sriov:br-sriov"
+#neutron_ovs_bridge_name: "br-pf9,br-sriov"
+neutron_ovs_bridge_mappings: "external:br-pf9"
 ceilometer_customize: False
 ceilometer_cpu_interval: 600
 
diff --git a/host_vars/compute01.yml.example b/host_vars/compute01.yml.example
index 7c7e3246..d82d6132 100644
--- a/host_vars/compute01.yml.example
+++ b/host_vars/compute01.yml.example
@@ -1,3 +1,5 @@
 ---
 physical_device_mappings:
   - sriov:ens1
+sriov_numvfs:
+  - ens1:8
diff --git a/host_vars/compute02.yml.example b/host_vars/compute02.yml.example
index a3c00dbd..ffc867ee 100644
--- a/host_vars/compute02.yml.example
+++ b/host_vars/compute02.yml.example
@@ -2,3 +2,6 @@
 physical_device_mappings:
   - sriov:ens1f0
   - sriov:ens1f1
+sriov_numvfs:
+  - ens1f0:8
+  - ens1f1:8
diff --git a/pf9-express b/pf9-express
index 6cdaaf70..fb32388d 100755
--- a/pf9-express
+++ b/pf9-express
@@ -514,7 +514,7 @@ while [ $# -gt 0 ]; do
     tags=${2}
     for tag in $(echo ${tags} | sed -e 's/,/ /g'); do
       case ${tag} in
-      live-migration|image-import)
+      live-migration|image-import|refresh-sriov)
         ;;
       *)
         assert "invalid tag : '${tag}'"
diff --git a/pf9-express.yml b/pf9-express.yml
index eec58e03..b0756358 100644
--- a/pf9-express.yml
+++ b/pf9-express.yml
@@ -175,3 +175,16 @@
   become: true
   roles:
     - post-hook
+
+# Run SR-IOV role
+- hosts:
+    - hypervisors
+  become: true
+  tasks:
+    - import_role:
+        name: neutron-sriov
+      when:
+        - sriov == "on"
+        - ansible_virtualization_role == "host"
+      tags:
+        - refresh-sriov
diff --git a/roles/neutron-prerequisites/tasks/main.yml b/roles/neutron-prerequisites/tasks/main.yml
index 457f7256..04379cbd 100644
--- a/roles/neutron-prerequisites/tasks/main.yml
+++ b/roles/neutron-prerequisites/tasks/main.yml
@@ -19,8 +19,9 @@
 
 - name: Create required OVS bridges
   openvswitch_bridge:
-    bridge: "{{ item }}"
+    bridge: "{{ item.split(':')[1] }}"
     fail_mode: secure
     state: present
-  with_items: "{{ neutron_ovs_bridge_name.split(',') }}"
+  with_items: "{{ neutron_ovs_bridge_mappings.split(',') }}"
+#  with_items: "{{ neutron_ovs_bridge_name.split(',') }}"
 
diff --git a/roles/neutron-sriov/handlers/main.yml b/roles/neutron-sriov/handlers/main.yml
new file mode 100644
index 00000000..3a9cee75
--- /dev/null
+++ b/roles/neutron-sriov/handlers/main.yml
@@ -0,0 +1,18 @@
+---
+- name: Restart sysfsutils
+  systemd:
+    name: sysfsutils.service
+    state: restarted
+  listen: restart_sysfsutils
+
+- name: Restart pf9-ostackhost
+  systemd:
+    name: pf9-ostackhost.service
+    state: restarted
+  listen: restart_ostackhost
+
+- name: Restart pf9-sriov-agent
+  systemd:
+    name: pf9-neutron-sriov-agent.service
+    state: restarted
+  listen: restart_neutronsriovagent
diff --git a/roles/neutron-sriov/tasks/main.yml b/roles/neutron-sriov/tasks/main.yml
new file mode 100644
index 00000000..7582e7aa
--- /dev/null
+++ b/roles/neutron-sriov/tasks/main.yml
@@ -0,0 +1,50 @@
+---
+# SR-IOV virtual functions get reset at boot unless commands exist in
+# rc.local (deprecated), a systemctl unit file, or sysfs.conf. We set it
+# up in sysfs.
+
+# Get current VF count in running sysfs
+- name: Get current VF count for interface
+  slurp:
+    path: "/sys/class/net/{{ item.split(':')[0] }}/device/sriov_numvfs"
+  register: slurp_vfs
+  with_items: "{{ sriov_numvfs }}"
+
+- set_fact:
+    current_vfs: "{{ current_vfs|default({}) | combine({item.item.split(':')[0]:item.content | b64decode | replace('\n', '')}) }}"
+  with_items: "{{ slurp_vfs.results }}"
+
+# Remove entries in sysfs when count changed. This WILL break connectivity
+# for instances using VFs on the interface until the instance is shutoff
+# or hard rebooted!
+- name: Remove existing entries for interface from sysfs
+  lineinfile:
+    path: /etc/sysfs.conf
+    state: absent
+    regexp: "^class\\/net\\/{{ item.split(':')[0] }}\\/device\\/sriov_numvfs = .*"
+  with_items: "{{ sriov_numvfs }}"
+  when: current_vfs[item.split(':')[0]] != item.split(':')[1]
+
+- name: Set VFs to 0 to work around I/O error when count is changed
+  lineinfile:
+    path: /etc/sysfs.conf
+    line: "class/net/{{ item.split(':')[0] }}/device/sriov_numvfs = 0"
+    create: yes
+  with_items: "{{ sriov_numvfs }}"
+
+- name: Add VFs to sysfs.conf
+  lineinfile:
+    path: /etc/sysfs.conf
+    insertafter: "^class\\/net\\/{{ item.split(':')[0] }}\\/device\\/sriov_numvfs = 0"
+    line: "class/net/{{ item.split(':')[0] }}/device/sriov_numvfs = {{ item.split(':')[1] }}"
+    create: yes
+  with_items: "{{ sriov_numvfs }}"
+  register: sysfs_vfs
+  notify:
+    - restart_sysfsutils
+    - restart_ostackhost
+    - restart_neutronsriovagent
+
+- debug:
+    msg: "ALERT - VFs on {{ ansible_hostname }} changed. Instances using SR-IOV ports must be shutdown or hard rebooted for interfaces to be reconnected."
+  when: sysfs_vfs.changed
diff --git a/roles/pre-flight-checks-openstack/tasks/prerequisites-sriov.yml b/roles/pre-flight-checks-openstack/tasks/prerequisites-sriov.yml
index 96af5bd9..a9d2d7a4 100644
--- a/roles/pre-flight-checks-openstack/tasks/prerequisites-sriov.yml
+++ b/roles/pre-flight-checks-openstack/tasks/prerequisites-sriov.yml
@@ -1,4 +1,8 @@
 ---
+# There are prerequisites for SR-IOV support, including IOMMU enabled,
+# passthrough, and driver support. Many drivers support SR-IOV, but some
+# require out-of-kernel drivers (ie. Mellanox) and some don't support sysfs.
+
 - name: Fail on incompatible CPU architecture
   fail:
     msg: "Detected {{ cpu_vendor }} CPU not supported! Must be {{ supported_cpus }}."
@@ -11,8 +15,6 @@
 
 # We need to
 # - Check to see if IOMMU is already enabled. If not, let's check grub (and break out iommu/pt)
-# - Check for SRIOV compatibility (via NIC)
-# - Update grub and reboot if necessary. Wait for reboot.
 
 - name: Check GRUB defaults and enable IOMMU if necessary
   lineinfile:
@@ -28,8 +30,23 @@
 
 - name: Fail if IOMMU is not enabled
   fail:
-    msg: |
-      IOMMU is not currently enabled in the kernel but has been configured.
-      Please reboot the host and rerun Express.
-      Refer to https://platform9.com/knowledge/KB12345
+    msg:
+      - "IOMMU is not currently enabled in the kernel but has been configured. Please reboot the host and rerun Express. Refer to https://platform9.com/knowledge/KB12345 for assistance."
   when: iommus.examined < 1
+
+# Warn if NIC driver is not supported
+- name: Determine driver bound to NICs
+  find:
+    paths: "/sys/class/net/{{ item.split(':')[1] }}/device/driver/module/drivers"
+    file_type: link
+  register: find_result
+  with_items: "{{ physical_device_mappings }}"
+
+- set_fact:
+    nic_driver: "{{ (item.path | basename).split(':')[1] }}"
+  with_items: "{{ find_result.results[0].files }}"
+
+- debug:
+    msg:
+      - "The {{ nic_driver }} NIC driver is not currently supported by Platform9. Refer to https://platform9.com/knowledge/KB12345 for assistance."
+  when: nic_driver not in supported_nic_drivers
diff --git a/roles/pre-flight-checks-openstack/vars/main.yml b/roles/pre-flight-checks-openstack/vars/main.yml
index 1d93b9c4..3e21d7cf 100644
--- a/roles/pre-flight-checks-openstack/vars/main.yml
+++ b/roles/pre-flight-checks-openstack/vars/main.yml
@@ -5,3 +5,6 @@ iommu_kernel_cmds: '{{ cpu_vendor }}_iommu=on iommu=pt'
 supported_cpus:
   - intel
   - amd
+supported_nic_drivers:
+  - ixgbe
+  - bnx2x
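
Review note: the neutron-sriov role in this change turns each `interface:count` entry of `sriov_numvfs` into two `/etc/sysfs.conf` lines, a reset to 0 followed by the requested count. The shell sketch below illustrates that mapping so the expected file contents can be checked by hand. The function name `sysfs_lines` is hypothetical and is not part of pf9-express; the sketch only prints lines and assumes no prior entries for the interface exist in the file.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of pf9-express): print the sysfs.conf lines
# that correspond to each "interface:count" entry of sriov_numvfs.
sysfs_lines() {
  local entry iface count
  for entry in "$@"; do
    iface="${entry%%:*}"   # text before the first ':' (interface name)
    count="${entry#*:}"    # text after the first ':' (VF count)
    # Reset to 0 first, mirroring the role's workaround for the I/O error
    # that occurs when the VF count is changed.
    printf 'class/net/%s/device/sriov_numvfs = 0\n' "$iface"
    printf 'class/net/%s/device/sriov_numvfs = %s\n' "$iface" "$count"
  done
}

# Entries taken from the compute02 example in this change.
sysfs_lines ens1f0:8 ens1f1:8
```

On a configured host, the count printed for an interface can be compared with the running value in `/sys/class/net/<interface>/device/sriov_numvfs`, which is the same file the role reads with `slurp`.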