SR-IOV Deployment with dpdk-app-centos image

This directory contains the YAML files needed to start the SR-IOV Device Plugin as a daemonset and launch a DPDK based docker image which leverages the SR-IOV CNI to plug SR-IOV VFs into the pod. This file only describes how these files are used. For details on how the SR-IOV Device Plugin and SR-IOV CNI work, how to build the SR-IOV Device Plugin image (the SR-IOV Device Plugin runs in a container as a daemonset), and how to build the SR-IOV CNI, see their respective repos.

The files to build the dpdk-app-centos image are located in this same repo. See the following link for how to build the image and details regarding what the image is doing:

SR-IOV Setup

This test setup assumes:

  • Running on baremetal.
  • Kubernetes, Multus and SR-IOV CNI are installed.
  • SR-IOV VFs have already been created on the PFs being used.

This test setup uses two physical NICs (PFs) with one VF from each PF attached to the pod. Traffic is mapped from each PF to its VF using VLANs.

Download the sample yaml files

Use the following steps to download the sample YAML files:

cd $GOPATH/src
go get

The following sections all assume your working directory is $GOPATH/src/

Create the Network-Attachment-Definition for each desired network

This setup assumes there are two networks, one network for each PF. The following commands set up those networks:

kubectl create -f netAttach-sriov-dpdk-a.yaml
kubectl create -f netAttach-sriov-dpdk-b.yaml

These YAML files map a VLAN to the VF. They currently use VLAN 100 for "sriov-network-a" and VLAN 200 for "sriov-network-b". These values can be changed if needed.
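As an illustration of how a VLAN can be attached to a VF in a Network-Attachment-Definition, the following is a minimal sketch, not a copy of the YAML files in this directory. The function name print_net_attach, the resource name passed to it, the cniVersion, and the lack of an IPAM section are all assumptions made for the example.

```shell
# Print a NetworkAttachmentDefinition that tags a VF's traffic with a VLAN.
# Assumptions (not taken from this repo's YAML): the resource name passed in,
# the cniVersion value, and the absence of an "ipam" section.
print_net_attach() {
  name="$1"       # name of the Network-Attachment-Definition
  resource="$2"   # device plugin resource name (hypothetical in the usage below)
  vlan="$3"       # VLAN ID applied to the VF
  cat <<EOF
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: ${name}
  annotations:
    k8s.v1.cni.cncf.io/resourceName: ${resource}
spec:
  config: '{
    "type": "sriov",
    "cniVersion": "0.3.1",
    "name": "${name}",
    "vlan": ${vlan}
  }'
EOF
}
```

For example, print_net_attach sriov-net-a example.com/sriov_a 100 prints a definition using VLAN 100, where example.com/sriov_a is a placeholder resource name. The output could be reviewed and then piped to kubectl create -f -.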

The following command can be used to determine the set of Network-Attachment-Definitions currently created on the system:

kubectl get network-attachment-definitions
NAME          AGE
sriov-net-a   4h18m
sriov-net-b   4h18m

Create ConfigMap

The following command creates the ConfigMap. The ConfigMap provides the filters that the SR-IOV Device Plugin uses to select the set of VFs available to a given Network-Attachment-Definition. This file uses the PFs eno1 and eno2. If your system uses other interfaces, update the file accordingly.

NOTE: This file will most likely need to be updated before using.

kubectl create -f ./configMap.yaml

The following command lists the Ethernet devices in your system, along with the vendor and device IDs (the bracketed pair near the end of each line, for example [8086:154c]) that are used in the ConfigMap to select available VFs.

lspci -nn | grep Ethernet
01:00.0 Ethernet controller [0200]: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ [8086:1572] (rev 01)
01:00.1 Ethernet controller [0200]: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ [8086:1572] (rev 01)
01:02.0 Ethernet controller [0200]: Intel Corporation Ethernet Virtual Function 700 Series [8086:154c] (rev 01)
01:02.1 Ethernet controller [0200]: Intel Corporation Ethernet Virtual Function 700 Series [8086:154c] (rev 01)
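When editing the ConfigMap, it can help to reduce the lspci output to just the PCI address and the vendor and device IDs. The helper below is a small sketch for that purpose; the function name pci_ids is hypothetical and not part of this repo.

```shell
# Read `lspci -nn` lines on stdin and print "<pci-address> <vendor> <device>".
# The last bracketed xxxx:xxxx pair on each line is the vendor:device ID;
# the earlier bracket (for example [0200]) is the device class and is skipped
# because it contains no colon.
pci_ids() {
  sed -n 's/^\([0-9a-f:.]*\) .*\[\([0-9a-f]\{4\}\):\([0-9a-f]\{4\}\)\].*$/\1 \2 \3/p'
}
```

Usage: lspci -nn | grep Ethernet | pci_ids. With the sample output above, the PF lines reduce to vendor 8086, device 1572, and the VF lines to vendor 8086, device 154c.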

The following command can be used to list the ConfigMaps currently created in the system:

kubectl get configmaps  --all-namespaces
NAMESPACE     NAME                                 DATA   AGE
kube-public   cluster-info                         2      5d23h
kube-system   coredns                              1      5d23h
kube-system   extension-apiserver-authentication   6      5d23h
kube-system   kube-flannel-cfg                     2      5d23h
kube-system   kube-proxy                           2      5d23h
kube-system   kubeadm-config                       2      5d23h
kube-system   kubelet-config-1.15                  1      5d23h
kube-system   multus-cni-config                    1      5d23h
kube-system   sriovdp-config                       1      4h24m

Start the SR-IOV Device Plugin Daemonset

The following command starts the SR-IOV Device Plugin as a daemonset container:

kubectl create -f sriovdp-daemonset.yaml

To determine if the SR-IOV Device Plugin is running, use the following command and find the kube-sriov-device-plugin-amd64-xxx pod:

kubectl get pods --all-namespaces
NAMESPACE     NAME                                    READY   STATUS    RESTARTS   AGE
kube-system   coredns-5c98db65d4-78v6k                1/1     Running   16         5d23h
kube-system   coredns-5c98db65d4-r5mmj                1/1     Running   16         5d23h
kube-system   etcd-nfvsdn-22-oot                      1/1     Running   16         5d23h
kube-system   kube-apiserver-nfvsdn-22-oot            1/1     Running   16         5d23h
kube-system   kube-controller-manager-nfvsdn-22-oot   1/1     Running   16         5d23h
kube-system   kube-flannel-ds-amd64-jvnm5             1/1     Running   16         5d23h
kube-system   kube-multus-ds-amd64-lxv5v              1/1     Running   16         5d23h
kube-system   kube-proxy-6w7sn                        1/1     Running   16         5d23h
kube-system   kube-scheduler-nfvsdn-22-oot            1/1     Running   16         5d23h
kube-system   kube-sriov-device-plugin-amd64-6cj7g    1/1     Running   0          4h6m
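Rather than scanning the pod list by eye, the check can be scripted. The following is a sketch that assumes the column layout shown above (NAMESPACE, NAME, READY, STATUS, ...); the function name sriovdp_running is hypothetical.

```shell
# Read `kubectl get pods --all-namespaces` output on stdin and succeed only if
# at least one kube-sriov-device-plugin pod is listed and every such pod has
# STATUS "Running" (column 4 when --all-namespaces is used).
sriovdp_running() {
  awk '$2 ~ /^kube-sriov-device-plugin/ { seen = 1; if ($4 != "Running") bad = 1 }
       END { exit (seen && !bad) ? 0 : 1 }'
}
```

Usage: kubectl get pods --all-namespaces | sriovdp_running && echo "device plugin is running".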

Once the SR-IOV Device Plugin is started, it probes the system for VFs that meet the selectors' criteria, which takes a couple of seconds. The following command can be used to determine the number of detected VFs. (NOTE: These are the allocatable values and do not change as VFs are handed out to pods.) The VF counts appear under the two SR-IOV resource names in the output:

kubectl get node nfvsdn-22-oot -o json | jq '.status.allocatable'
  "cpu": "64",
  "ephemeral-storage": "396858657750",
  "hugepages-1Gi": "64Gi",
  "": "8",
  "": "8",
  "memory": "64773512Ki",
  "pods": "110"

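To pick out only the resources advertised by device plugins, the allocatable output can be filtered on resource names that carry a domain prefix (a "/" in the name), which built-in entries such as cpu, memory and pods do not have. This is a sketch; the function name extended_resources and the resource name in the usage note are hypothetical.

```shell
# Read the `.status.allocatable` lines on stdin and print "<resource> <count>"
# for entries whose name contains a "/" (domain-prefixed extended resources).
# Lines such as "cpu" or "hugepages-1Gi" have no "/" and are dropped.
extended_resources() {
  sed -n 's|^ *"\([^"]*/[^"]*\)": *"\([^"]*\)",\{0,1\}$|\1 \2|p'
}
```

Usage: kubectl get node nfvsdn-22-oot -o json | jq '.status.allocatable' | extended_resources. An input line of the form "example.com/sriov_a": "8", would be printed as example.com/sriov_a 8.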
Start the DPDK based container

Use the following command to start the DPDK based container using SR-IOV Interfaces:

kubectl create -f sriov-pod-1.yaml

If needed, ‘exec’ into the container to customize the DPDK application:

kubectl exec -it sriov-pod-1 -- sh

Exposing Hugepage Allocation in Container

In Kubernetes 1.20, an alpha feature was added to expose the requested hugepages to the container via the Downward API. Being alpha, this feature is disabled in Kubernetes by default. If it is enabled when Kubernetes is deployed (via FEATURE_GATES="DownwardAPIHugePages=true"), the container can read those values by requesting the fields in the Pod Spec. To manually update the Pod Spec, uncomment the 1G or 2M Hugepage section in sriov-pod-1.yaml:

$ vi sriov-pod-1.yaml
        # Exposing Hugepages via Downward API is an alpha feature in
        # Kubernetes 1.20. If K8s is greater than or equal to 1.20 and
        # the Feature Gate is enabled (FEATURE_GATES="DownwardAPIHugePages=true"),
        # then uncomment the 1G or 2M Hugepage sections below.
        # 1G Hugepages
        - path: "hugepages_request"
          resourceFieldRef:
            containerName: sriov-example
            resource: requests.hugepages-1Gi
            divisor: 1Mi
        - path: "hugepages_limit"
          resourceFieldRef:
            containerName: sriov-example
            resource: limits.hugepages-1Gi
            divisor: 1Mi
        # 2M Hugepages
        #- path: "hugepages_request"
        #  resourceFieldRef:
        #    containerName: sriov-example
        #    resource: requests.hugepages-2Mi
        #    divisor: 1Mi
        #- path: "hugepages_limit"
        #  resourceFieldRef:
        #    containerName: sriov-example
        #    resource: limits.hugepages-2Mi
        #    divisor: 1Mi
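Because the divisor is 1Mi, the values exposed in hugepages_request and hugepages_limit are expressed in MiB. The sketch below converts such a value into a number of pages; the function name hugepages_to_pages is hypothetical, and the directory where the Downward API files are mounted depends on the volumeMounts in the Pod Spec, so it is left as a placeholder in the usage note.

```shell
# Convert a Downward API hugepage value (in MiB, because the divisor is 1Mi)
# into a page count for a given page size in MiB (1024 for 1G, 2 for 2M).
hugepages_to_pages() {
  mib="$1"        # value read from hugepages_request or hugepages_limit
  page_mib="$2"   # hugepage size in MiB
  echo $(( mib / page_mib ))
}
```

Inside the container this could be used as hugepages_to_pages "$(cat "$PODINFO_DIR/hugepages_request")" 1024, where PODINFO_DIR is a placeholder for the actual mount path. A request of 4Gi of 1G hugepages is exposed as 4096 and converts to 4 pages.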

The other option is to use network-resources-injector to dynamically inject the Downward API settings into the Pod Spec. sriov-pod-nri-1.yaml is a copy of sriov-pod-1.yaml with all the Downward API settings commented out.

Sample commands to build a network-resources-injector image:

go get
cd $GOPATH/src/
make image

For a quick deployment, add --insecure and --injectHugepageDownApi as follows:

$ git diff deployments/server.yaml
diff --git a/deployments/server.yaml b/deployments/server.yaml
index 80c1340..dd0cafd 100644
--- a/deployments/server.yaml
+++ b/deployments/server.yaml
@@ -33,6 +33,8 @@ spec:
     - -tls-private-key-file=/etc/tls/tls.key
     - -tls-cert-file=/etc/tls/tls.crt
     - -logtostderr
+    - --insecure
+    - --injectHugepageDownApi
       runAsUser: 10000
       runAsGroup: 10000

Then deploy the network-resources-injector image:

kubectl apply -f deployments/auth.yaml -f deployments/server.yaml

Use the following command to start the DPDK based container using SR-IOV Interfaces, with the Downward API settings injected by network-resources-injector:

kubectl create -f sriov-pod-nri-1.yaml

To tear down the network-resources-injector image:

kubectl delete -f deployments/auth.yaml -f deployments/server.yaml

Test Generator

By default, the DPDK based container ‘dpdk-app-centos’ runs the DPDK ‘l3fwd’ sample application. This sample application does some simple routing based on a hard-coded routing table. The following subnets are assigned to interfaces:

Interface 0: Route / 24
Interface 1: Route / 24
Interface 2: Route / 24
Interface 3: Route / 24
Interface 4: Route / 24

In the test setup described in this document, VFs from the first interface, ‘eno1’, will be assigned ‘Interface 0’ in DPDK, and thus will get route / 24 assigned to it. VFs from the second interface, ‘eno2’, will be assigned ‘Interface 1’ in DPDK, and thus will get route / 24 assigned to it. At this time, this is not configurable.

As described above, VLAN IDs are used to map packets from the PF to a given VF. This value is configurable, but in the test setup described here, VLAN 100 is used to map packets from the first PF, ‘eno1’, to its associated VF, and VLAN 200 is used to map packets from the second PF, ‘eno2’, to its associated VF.

SR-IOV Teardown

The following steps are used to stop the container and SR-IOV Device Plugin:

kubectl delete pod sriov-pod-1
kubectl delete -f sriovdp-daemonset.yaml
kubectl delete -f configMap.yaml
kubectl delete -f netAttach-sriov-dpdk-b.yaml
kubectl delete -f netAttach-sriov-dpdk-a.yaml