diff --git a/_topic_maps/_topic_map.yml b/_topic_maps/_topic_map.yml index 6468c67fcd28..28a5e21ee36e 100644 --- a/_topic_maps/_topic_map.yml +++ b/_topic_maps/_topic_map.yml @@ -1600,6 +1600,8 @@ Topics: File: using-sriov-multicast - Name: Using DPDK and RDMA File: using-dpdk-and-rdma + - Name: High availability for pod-level bonds on SR-IOV networks + File: configure-lacp-for-sriov - Name: Using pod-level bonding for secondary networks File: using-pod-level-bonding - Name: Configuring hardware offloading diff --git a/modules/installing-pfsr-operator-cli.adoc b/modules/installing-pfsr-operator-cli.adoc new file mode 100644 index 000000000000..30cfc2c88371 --- /dev/null +++ b/modules/installing-pfsr-operator-cli.adoc @@ -0,0 +1,75 @@ +:_mod-docs-content-type: PROCEDURE +[id="installing-pfsr-cli_{context}"] += Installing the PF Status Relay Operator using the CLI + +Install the PF Status Relay Operator to enable {product-title} to use Link Aggregation Control Protocol (LACP) as an active health check on physical functions (PFs). + +.Prerequisites + +* You configured LACP on your upstream switch. + +* You configured pod-level bonding for your SR-IOV networks. + +* You installed the OpenShift CLI (`oc`). + +* You have cluster-admin privileges. + +.Procedure + +. Create the `openshift-pf-status-relay-operator` namespace by entering the following command: ++ +[source,bash] +---- +$ cat << EOF| oc create -f - +apiVersion: v1 +kind: Namespace +metadata: + name: openshift-pf-status-relay-operator + annotations: + workload.openshift.io/allowed: management +EOF +---- + +. Create an `OperatorGroup` custom resource (CR) by entering the following command: ++ +[source,bash] +---- +$ cat << EOF| oc create -f - +apiVersion: operators.coreos.com/v1 +kind: OperatorGroup +metadata: + name: pf-status-relay-operators + namespace: openshift-pf-status-relay-operator +spec: + targetNamespaces: + - openshift-pf-status-relay-operator +EOF +---- + +. 
Create a `Subscription` CR for the PF Status Relay Operator by entering the following command: ++ +[source,bash] +---- +$ cat << EOF| oc create -f - +apiVersion: operators.coreos.com/v1alpha1 +kind: Subscription +metadata: + name: pf-status-relay-operator-subscription + namespace: openshift-pf-status-relay-operator +spec: + channel: stable + name: pf-status-relay-operator + source: redhat-operators + sourceNamespace: openshift-marketplace +EOF +---- + + +.Verification + +* To verify that the Operator is installed, enter the following command and then check that the output shows `Succeeded` for the Operator: ++ +[source,bash] +---- +$ oc get csv -n openshift-pf-status-relay-operator -o custom-columns=Name:.metadata.name,Phase:.status.phase +---- diff --git a/modules/installing-pfsr-operator-console.adoc b/modules/installing-pfsr-operator-console.adoc new file mode 100644 index 000000000000..1b072a00c89b --- /dev/null +++ b/modules/installing-pfsr-operator-console.adoc @@ -0,0 +1,29 @@ +:_mod-docs-content-type: PROCEDURE +[id="installing-pfsr-console_{context}"] += Installing the PF Status Relay Operator using the web console + +Install the PF Status Relay Operator to enable {product-title} to use Link Aggregation Control Protocol (LACP) as an active health check on physical functions (PFs). + +.Prerequisites + +* You configured LACP on your upstream switch. + +* You configured pod-level bonding for your SR-IOV networks. + +* You have cluster-admin privileges. + +.Procedure + +. Install the PF Status Relay Operator: + +.. In the {product-title} web console, click *Ecosystem* -> *Software Catalog*. + +.. Select *PF Status Relay Operator* from the list of available Operators, and then click *Install*. + +.. On the *Install Operator* page, under *Installed Namespace*, select *Operator recommended Namespace*. + +.. Click *Install*. + +.Verification + +* Verify that the PF Status Relay Operator shows the *Status* as *Succeeded* on the Installed Operators dashboard.
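The CLI verification step in `installing-pfsr-operator-cli.adoc` asks the reader to check by eye that the `Phase` column shows `Succeeded`. As a minimal sketch, assuming the two-column `Name`/`Phase` table that the `custom-columns` option in that command produces, the check could be scripted as follows. The function name `csv_all_succeeded` is hypothetical and is not part of `oc` or of the Operator.

```shell
# Hypothetical helper: reads the Name/Phase table from stdin and returns 0
# only if at least one CSV is listed and every listed CSV is Succeeded.
csv_all_succeeded() {
  awk '
    NR > 1 {                      # skip the header row
      seen = 1
      if ($2 != "Succeeded") bad = 1
    }
    END {
      if (seen && !bad) exit 0
      exit 1
    }
  '
}

# Usage (requires a cluster, so it is shown as a comment only):
#   oc get csv -n openshift-pf-status-relay-operator \
#     -o custom-columns=Name:.metadata.name,Phase:.status.phase | csv_all_succeeded
```

The helper treats an empty table as a failure, because a missing CSV usually means that the subscription has not been resolved yet.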
diff --git a/modules/lacp-switch-monitoring.adoc b/modules/lacp-switch-monitoring.adoc new file mode 100644 index 000000000000..a629887b66cf --- /dev/null +++ b/modules/lacp-switch-monitoring.adoc @@ -0,0 +1,481 @@ +// Module included in the following assemblies: +// +// * networking/hardware_networks/configure-lacp-for-sriov.adoc + +:_mod-docs-content-type: PROCEDURE +[id="configuring-lacp-sriov_{context}"] += Configuring the PF Status Relay Operator for LACP state monitoring on SR-IOV networks + +Use the PF Status Relay Operator to enable Link Aggregation Control Protocol (LACP) state monitoring for workloads using pod-level bonding with SR-IOV networks. The Operator monitors the LACP state on physical functions (PFs) and changes the link state for attached virtual functions (VFs) when it detects an upstream failure. With this approach, you can detect failures on VFs attached to a PF and ensure a timely failover to a backup network path, which provides high availability for your workloads. + +The following scenario demonstrates how to configure and verify LACP state monitoring for SR-IOV networks: + +* Create host-level NIC bonds on worker nodes and configure LACP. + +* Define SR-IOV network policies to create VFs on the bonded interfaces. + +* Deploy the PF Status Relay Operator to monitor the LACP state on the PFs. + +* Verify that pods using these VFs automatically fail over to a backup network path in case of upstream switch failure. + +This scenario uses SR-IOV network cards with two ports on each node, `worker-0` and `worker-1`, with both ports connected to a shared switch to support LACP bonding. + +.Prerequisites + +* Nodes must have a NIC that supports SR-IOV. + +* The SR-IOV Network Operator is installed. + +* The PF Status Relay Operator is installed. 
+ +* The physical switch ports connected to the worker nodes are configured for LACP with a fast polling rate. + +* The `linkState` is set to `auto` or `disable` for the SR-IOV VFs that you want to monitor. The Operator ignores VFs with the `linkState` set to `enable`. The default value for SR-IOV VFs is `linkState: auto`. + +.Procedure + +. Create the project namespace by creating a `namespace.yaml` file such as the following example: ++ +.Example `namespace.yaml` file +[source,yaml] +---- +apiVersion: v1 +kind: Namespace +metadata: + labels: + kubernetes.io/metadata.name: sriov-operator-tests + pod-security.kubernetes.io/audit: privileged + pod-security.kubernetes.io/enforce: privileged + pod-security.kubernetes.io/warn: privileged + security.openshift.io/scc.podSecurityLabelSync: "false" + name: sriov-operator-tests <1> +---- +<1> The namespace where you deploy the high-availability pod. + +. Apply the namespace by running the following command: ++ +[source,bash] +---- +$ oc apply -f namespace.yaml +---- + +. Configure host-level LACP bonds: + +.. Create a YAML file that defines the `NodeNetworkConfigurationPolicy` resource for the `ens5f0` interface on the `worker-0` node: ++ +.Example `nncpBondF0Worker0.yaml` file +[source,yaml] +---- +apiVersion: nmstate.io/v1 +kind: NodeNetworkConfigurationPolicy +metadata: + name: example-bond-f0 +spec: + nodeSelector: + kubernetes.io/hostname: worker-0 <1> + desiredState: + interfaces: + - name: example-bond-f0 + description: example-bond-f0 + type: bond + state: up + mtu: 9216 + link-aggregation: + mode: 802.3ad <2> + options: + miimon: '100' + lacp_rate: 'fast' <3> + min_links: '1' + port: + - ens5f0 <4> + - name: ens5f0 + type: ethernet + state: up + mtu: 9216 +---- +<1> The node where the bonded interface is created. +<2> You must set the bond mode to `802.3ad` to enable LACP on the bond. +<3> You must set the LACP rate to `fast` on the interface and on the switch. The `fast` rate sends LACP packets every second. 
+<4> The PF that you want to include in the bond. + +.. Create a YAML file that defines the `NodeNetworkConfigurationPolicy` resource for the `ens5f1` interface on the `worker-0` node: ++ +.Example `nncpBondF1Worker0.yaml` file +[source,yaml] +---- +apiVersion: nmstate.io/v1 +kind: NodeNetworkConfigurationPolicy +metadata: + name: example-bond-f1 +spec: + nodeSelector: + kubernetes.io/hostname: worker-0 <1> + desiredState: + interfaces: + - name: example-bond-f1 + description: example-bond-f1 + type: bond + state: up + mtu: 9216 + link-aggregation: + mode: 802.3ad <2> + options: + miimon: '100' + lacp_rate: 'fast' <3> + min_links: '1' + port: + - ens5f1 <4> + - name: ens5f1 + type: ethernet + state: up + mtu: 9216 +---- +<1> The node where the bonded interface is created. +<2> You must set the bond mode to `802.3ad` to enable LACP on the bond. +<3> You must set the LACP rate to `fast` on the interface and on the switch. The `fast` rate sends LACP packets every second. +<4> The PF that you want to include in the bond. + +.. Apply the resources by running the following commands: ++ +[source,bash] +---- +$ oc apply -f nncpBondF0Worker0.yaml +$ oc apply -f nncpBondF1Worker0.yaml +---- + +. Create SR-IOV network VFs for the bonded interfaces: + +.. Create a YAML file that defines the `SriovNetworkNodePolicy` resource for the `ens5f0` interface on the `worker-0` node: ++ +.Example `sriovnetworkpolicy-port1.yaml` file +[source,yaml] +---- +apiVersion: sriovnetwork.openshift.io/v1 +kind: SriovNetworkNodePolicy +metadata: + name: sriovnetpolicy-port-0 + namespace: openshift-sriov-network-operator +spec: + deviceType: netdevice + nicSelector: + pfNames: + - ens5f0 <1> + nodeSelector: + kubernetes.io/hostname: worker-0 <2> + numVfs: 10 <3> + priority: 99 + resourceName: resourceport0 <4> +---- +<1> The PF to create the VFs from. +<2> The node where the VFs are created. +<3> The number of VFs to create on the PF. +<4> The resource name used by pods to request these VFs. + +.. 
Create a YAML file that defines the `SriovNetworkNodePolicy` resource for the `ens5f1` interface on the `worker-0` node: ++ +.Example `sriovnetworkpolicy-port2.yaml` file +[source,yaml] +---- +apiVersion: sriovnetwork.openshift.io/v1 +kind: SriovNetworkNodePolicy +metadata: + name: sriovnetpolicy-port-1 + namespace: openshift-sriov-network-operator +spec: + deviceType: netdevice + nicSelector: + pfNames: + - ens5f1 <1> + nodeSelector: + kubernetes.io/hostname: worker-0 <2> + numVfs: 10 <3> + priority: 99 + resourceName: resourceport1 <4> +---- +<1> The PF to create the VFs from. +<2> The node where the VFs are created. +<3> The number of VFs to create on the PF. +<4> The resource name used by pods to request these VFs. + +.. Apply the resources by running the following commands: ++ +[source,bash] +---- +$ oc apply -f sriovnetworkpolicy-port1.yaml +$ oc apply -f sriovnetworkpolicy-port2.yaml +---- + +. Configure the PF Status Relay Operator: + +.. Create a YAML file that defines the `PFLACPMonitor` resource. This example file configures the Operator to monitor the LACP status of `ens5f0` and `ens5f1` bonded interfaces on the `worker-0` node: ++ +.Example `pflacpmonitor.yaml` file +[source,yaml] +---- +apiVersion: pfstatusrelay.openshift.io/v1alpha1 +kind: PFLACPMonitor +metadata: + namespace: openshift-pf-status-relay-operator + labels: + app.kubernetes.io/name: pf-status-relay-operator + name: pflacpmonitor-worker-0 +spec: + interfaces: + - ens5f0 <1> + - ens5f1 + pollingInterval: 1000 <2> + nodeSelector: + kubernetes.io/hostname: worker-0 <3> +---- +<1> The list of PFs to monitor. +<2> The polling interval in milliseconds to check the LACP status on the monitored interfaces. The minimum value is `1000`. +<3> The node for the target interfaces. ++ +[IMPORTANT] +==== +Use only one `PFLACPMonitor` custom resource to monitor each network interface on a node. 
If you create multiple resources that target the same interface, the PF Status Relay Operator will not process the conflicting configurations. +==== + +.. Apply the `PFLACPMonitor` resource by running the following command: ++ +[source,bash] +---- +$ oc apply -f pflacpmonitor.yaml +---- + +.Verification + +. Check the logs of the PF Status Relay Operator to verify that it is monitoring the LACP state: ++ +[source,bash] +---- +$ oc logs -n openshift-pf-status-relay-operator +---- ++ +.Example output +[source,bash] +---- +{"time":"2025-07-24T13:35:54.653201692Z","level":"INFO","msg":"lacp is up","interface":"ens5f0"} +{"time":"2025-07-24T13:35:54.65347273Z","level":"INFO","msg":"vf link state was set","id":0,"state":"auto","interface":"ens5f0"} +... +---- + +. Apply the `SriovNetwork` resources to make the VFs available for use within the `sriov-operator-tests` namespace: + +.. Create a YAML file that defines the `SriovNetwork` resource for the VFs created on `ens5f0`: ++ +.Example `sriovnetwork-port1.yaml` file +[source,yaml] +---- +apiVersion: sriovnetwork.openshift.io/v1 +kind: SriovNetwork +metadata: + name: sriovnetwork-port0 + namespace: openshift-sriov-network-operator +spec: + capabilities: '{ "mac": true }' + networkNamespace: sriov-operator-tests + resourceName: resourceport0 +---- + +.. Create a YAML file that defines the `SriovNetwork` resource for the VFs created on `ens5f1`: ++ +.Example `sriovnetwork-port2.yaml` file +[source,yaml] +---- +apiVersion: sriovnetwork.openshift.io/v1 +kind: SriovNetwork +metadata: + name: sriovnetwork-port1 + namespace: openshift-sriov-network-operator +spec: + capabilities: '{ "mac": true }' + networkNamespace: sriov-operator-tests + resourceName: resourceport1 +---- + +.. Apply the resources by running the following commands: ++ +[source,bash] +---- +$ oc apply -f sriovnetwork-port1.yaml +$ oc apply -f sriovnetwork-port2.yaml +---- + +. Define a high-availability pod that uses the SR-IOV VFs: + +.. 
Apply the `NetworkAttachmentDefinition` resource to create an `active-backup` bond using the two SR-IOV networks: ++ +.Example `nad-bond.yaml` file +[source,yaml] +---- +apiVersion: k8s.cni.cncf.io/v1 +kind: NetworkAttachmentDefinition +metadata: + name: nad-bond-1 + namespace: sriov-operator-tests +spec: + config: |- + {"type": "bond", "cniVersion": "0.3.1", "name": "bond-net1", + "mode": "active-backup", "failOverMac": 1, "linksInContainer": true, "miimon": "100", "mtu": 1450, + "links": [{"name": "net1"},{"name": "net2"}], "capabilities": {"ips": true}, "ipam": {"type": "static"}} +---- ++ +* `linksInContainer: true` creates the bond inside the pod's network namespace. +* `mode: active-backup` configures the bond to use active-backup mode. +* `links` specifies the pod-level interfaces to include in the bond. ++ +[IMPORTANT] +==== +The PF Status Relay Operator provides LACP state monitoring for pod-level bonding with the `mode: active-backup` configuration only. +==== + +.. Apply the `NetworkAttachmentDefinition` resource by running the following command: ++ +[source,bash] +---- +$ oc apply -f nad-bond.yaml +---- + +.. 
Create a YAML file that defines the `Pod` resource that uses the VFs from the bonded interfaces in active-backup mode: ++ +.Example `client-bond.yaml` file +[source,yaml] +---- +apiVersion: v1 +kind: Pod +metadata: + name: client-bond + namespace: sriov-operator-tests + annotations: + k8s.v1.cni.cncf.io/networks: |- <1> + [{ + "name": "sriovnetwork-port0", + "interface": "net1", + "mac": "" + },{ + "name": "sriovnetwork-port1", + "interface": "net2", + "mac": "" + },{ + "name": "nad-bond-1", + "interface": "bond0", + "ips": ["192.168.10.254/24","2001:100::254/64"], + "mac": "" + }] +spec: + nodeName: worker-0 + containers: + - name: client-bond + image: quay.io/nginx/nginx-unprivileged + imagePullPolicy: IfNotPresent + command: ["/bin/sh", "-c", "sleep 3650d"] + securityContext: + privileged: true +---- +<1> The annotation requests three networks: two SR-IOV VFs, `net1` and `net2`, and one bond, `bond0`, which uses them. + +.. Apply the `Pod` resource by running the following command: ++ +[source,bash] +---- +$ oc apply -f client-bond.yaml +---- + +. Verify the failover mechanism: + +.. Log in to the `client-bond` pod by running the following command: ++ +[source,bash] +---- +$ oc rsh -n sriov-operator-tests client-bond +---- + +.. Check the initial status of the pod-level bond by running the following command: ++ +[source,bash] +---- +sh-4.4# cat /proc/net/bonding/bond0 +---- ++ +.Example output +[source,bash] +---- +[root@client-bond-tlb /]# cat /proc/net/bonding/bond0 +... 
+ +Bonding Mode: transmit load balancing +Transmit Hash Policy: layer2 (0) +Primary Slave: None +Currently Active Slave: net1 +MII Status: up +MII Polling Interval (ms): 100 +Up Delay (ms): 0 +Down Delay (ms): 0 +Peer Notification Delay (ms): 0 + +Slave Interface: net1 +MII Status: up +Speed: 25000 Mbps +Duplex: full +Link Failure Count: 0 +Permanent HW addr: AA:BB:CC:DD:EE:FF +Slave queue ID: 0 + +Slave Interface: net2 +MII Status: up +Speed: 25000 Mbps +Duplex: full +Link Failure Count: 0 +Permanent HW addr: BB:CC:DD:EE:FF:GG +---- ++ +* Both `net1` and `net2` interfaces are up. + +.. Exit the pod shell. + +.. Simulate an LACP failure on your upstream physical switch. For example, filter LACP traffic on the switch port that you want to test. Filtering ensures that the physical link remains up while LACP polling fails. The command to do this is vendor-dependent. + +.. Verify the failover inside the pod by logging back into the `client-bond` pod and checking the bond status again: ++ +[source,bash] +---- +sh-4.4# cat /proc/net/bonding/bond0 +---- ++ +.Example output +[source,bash] +---- +... + +Bonding Mode: transmit load balancing +Transmit Hash Policy: layer2 (0) +Primary Slave: None +Currently Active Slave: net2 +MII Status: up +MII Polling Interval (ms): 100 +Up Delay (ms): 0 +Down Delay (ms): 0 +Peer Notification Delay (ms): 0 + +Slave Interface: net1 +MII Status: down +Speed: Unknown +Duplex: Unknown +Link Failure Count: 1 +Permanent HW addr: AA:BB:CC:DD:EE:FF +Slave queue ID: 0 + +Slave Interface: net2 +MII Status: up +Speed: 25000 Mbps +Duplex: full +Link Failure Count: 0 +Permanent HW addr: BB:CC:DD:EE:FF:GG +Slave queue ID: 0 +---- ++ +* The `net1` interface is down, and the `net2` interface is now the active interface. ++ +The `client-bond` pod detects the link state change and switches to the backup network path.
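The failover check in `lacp-switch-monitoring.adoc` compares two `/proc/net/bonding/bond0` listings by eye. As a rough sketch, assuming the line-oriented layout shown in the example output, the two relevant facts (the active interface and the MII status of each member) can be pulled out with `awk`. The function names `active_slave` and `slave_status` are hypothetical helpers, not part of any tool referenced in the module.

```shell
# Hypothetical helper: prints the interface named on the
# "Currently Active Slave:" line of a bonding status listing read from stdin.
active_slave() {
  awk -F': ' '/^Currently Active Slave:/ { print $2; exit }'
}

# Hypothetical helper: prints "<interface> <mii-status>" for each
# "Slave Interface:" stanza. The bond-level "MII Status:" line is skipped
# because no member interface has been seen at that point.
slave_status() {
  awk -F': ' '
    /^Slave Interface:/          { name = $2 }
    /^MII Status:/ && name != "" { print name, $2; name = "" }
  '
}

# Usage inside the pod (shown as a comment because it needs a real bond):
#   active_slave < /proc/net/bonding/bond0
#   slave_status < /proc/net/bonding/bond0
```

Before the simulated failure, `active_slave` would be expected to report `net1`. After the failure, it would report `net2`, and `slave_status` would list `net1` as `down`.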
diff --git a/networking/hardware_networks/configure-lacp-for-sriov.adoc b/networking/hardware_networks/configure-lacp-for-sriov.adoc new file mode 100644 index 000000000000..d653dd8c6f74 --- /dev/null +++ b/networking/hardware_networks/configure-lacp-for-sriov.adoc @@ -0,0 +1,23 @@ +:_mod-docs-content-type: ASSEMBLY +[id="sriov-lacp-sriov"] += High availability for pod-level bonds on SR-IOV networks +include::_attributes/common-attributes.adoc[] +:context: sriov-lacp-sriov + +toc::[] + +For workloads that use pod-level bonding with SR-IOV virtual functions (VFs), an underlying physical function (PF) might still report an `up` state despite an upstream switch failure. This creates a silent failure, because attached VFs remain up and pods continue to send traffic to a dead endpoint, causing packet loss. + +The PF Status Relay Operator solves this issue by using Link Aggregation Control Protocol (LACP) as an active health check. In this configuration, each PF is placed in its own single-member LACP bond with the upstream switch. When the Operator detects an LACP failure on a PF's bond, it changes the link state of the attached VFs from `auto` to `disable`. This action triggers the pod's `active-backup` bond to fail over to its backup network path, maintaining high availability. + +:FeatureName: Configuring LACP state monitoring for SR-IOV networks +include::snippets/technology-preview.adoc[] + +include::modules/installing-pfsr-operator-cli.adoc[leveloffset=+1] + +include::modules/installing-pfsr-operator-console.adoc[leveloffset=+1] + +include::modules/lacp-switch-monitoring.adoc[leveloffset=+1] + + +:!context: sriov-lacp-sriov \ No newline at end of file
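The assembly describes the relay behavior only in prose: a healthy LACP state leaves the VFs in the `auto` link state, and an LACP failure moves them to the disabled state. The following is an illustrative sketch of that mapping, not the Operator's implementation, which this change does not show. It only prints the standard iproute2 `ip link set ... vf ... state ...` commands that would apply the state by hand. The function names are hypothetical.

```shell
# Hypothetical helper: maps an observed LACP state ("up" or anything else)
# to the VF link state that the assembly describes.
desired_vf_state() {
  case "$1" in
    up) echo auto ;;
    *)  echo disable ;;
  esac
}

# Hypothetical helper: prints (does not run) one iproute2 command per VF.
# Arguments: PF name, LACP state, number of VFs on the PF.
print_vf_commands() {
  pf=$1
  lacp=$2
  num_vfs=$3
  state=$(desired_vf_state "$lacp")
  i=0
  while [ "$i" -lt "$num_vfs" ]; do
    echo "ip link set dev $pf vf $i state $state"
    i=$((i + 1))
  done
}

# Usage: show what a failure on ens5f0 with 10 VFs would translate to.
#   print_vf_commands ens5f0 down 10
```

Printing the commands instead of running them keeps the sketch safe to try on any machine. The real Operator also skips VFs whose `linkState` is `enable`, which this sketch does not model.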