How to install DANM CNI in Rancher custom cluster

Prerequisites

  • Rancher (v2.5.1 is used in this test)
  • Servers or VMs with Docker installed for a new custom K8S cluster (v1.18.8 is used in this test)

Prepare K8S cluster

Create a custom cluster with the following extra arguments added. If you're working with an existing cluster, you can also edit the cluster YAML and add the extra args.

    kube-controller:
      extra_args:
        cluster-signing-cert-file: /etc/kubernetes/ssl/kube-ca.pem
        cluster-signing-key-file: /etc/kubernetes/ssl/kube-ca-key.pem
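
To double-check that the flags actually reached the controller, you can inspect it on a controlplane node. This is just a quick sanity check and assumes the RKE-managed container is named kube-controller-manager:

# Run on a controlplane node
docker inspect kube-controller-manager | grep cluster-signing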

Once the cluster is ready, I had to create a default CNI configuration for DANM on all cluster nodes to work around the following warning: CNI operation for network:default failed with:CNI delegation failed due to error:Error delegating ADD to CNI plugin:calico because:OS exec call failed:no etcd endpoints specified. As a note, I chose Calico as the default CNI, but any other CNI should work.

# Run the commands below on all cluster nodes
sudo cp /etc/cni/net.d/10-calico.conflist /etc/cni/net.d/10-calico.conf
sudo mv /etc/cni/net.d/10-calico.conflist /etc/cni/net.d/11-calico.conflist
sudo vi /etc/cni/net.d/10-calico.conf
{
  "name": "k8s-pod-network",
  "cniVersion": "0.3.1",
  "type": "calico",
  "log_level": "info",
  "datastore_type": "kubernetes",
  "nodename": "[node name]",
  "mtu": 1440,
  "ipam": {
      "type": "calico-ipam"
  },
  "policy": {
      "type": "k8s"
  },
  "kubernetes": {
      "kubeconfig": "/etc/kubernetes/ssl/kubecfg-kube-node.yaml"
  }
}
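
Remember to replace [node name] with the node's actual name on each host. If jq happens to be installed on the nodes, a quick syntax check of the new file doesn't hurt:

# Optional sanity check of the JSON (assumes jq is available)
sudo jq . /etc/cni/net.d/10-calico.conf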

Prepare DANM CNI images

DANM does not provide a public image repository, so I had to build the images and push them to my private image registry.

git clone https://github.com/nokia/danm
cd danm

# Fix KUBECTL_VERSION
vi scm/build/Dockerfile.install
ARG KUBECTL_VERSION=1.18.8

# Fix the svcwatcher node selector to use the Rancher-specific controlplane label
vi integration/manifests/svcwatcher/svcwatcher_ds.yaml.tmpl
       nodeSelector:
-        "node-role.kubernetes.io/master": ""
+        "node-role.kubernetes.io/controlplane": "true"

# Build and push the images
TAG_PREFIX=my.private.registry/danm/ IMAGE_PUSH=true ./build_danm.sh
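
Before relying on the push, a quick way to confirm the images were built and tagged with the registry prefix:

# List the locally built DANM images
docker images | grep my.private.registry/danm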

Install DANM CNI

First, download the kubeconfig for the target cluster from Rancher and create an image pull secret for the private registry. Check out this page for more detailed instructions.

kubectl create secret -n kube-system generic regcred \
    --from-file=.dockerconfigjson=/Users/hyunsun/Work/private-registry.json \
    --type=kubernetes.io/dockerconfigjson
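
If you don't have a docker config JSON file at hand, the same kind of secret can also be created directly with kubectl; a sketch with placeholder credentials:

kubectl create secret docker-registry regcred -n kube-system \
    --docker-server=my.private.registry \
    --docker-username=[username] \
    --docker-password=[password]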

I tried the installer job that DANM provides instead of following the manual steps. The overall deployment procedure with the installer was nice: it worked as expected, and most of the trouble I ran into was related to Rancher-specific cluster settings. So I recommend trying out the installer job unless you want to understand all the details of the deployment procedure.

Anyway, all I needed to do to run the installer job was to edit the integration/install/danm-installer-config.yaml file.

default_cni_type: calico
default_cni_network_id: 10-calico
image_registry_prefix: my.private.registry/danm/
image_pull_secret: regcred
api_ca_cert: [root ca of the target cluster]

If you leave api_ca_cert blank, the installer job tries to fetch it automatically, but that option did not work for me and left svcwatcher crash-looping with the error Unable to connect to the server: x509: certificate signed by unknown authority. Providing the root CA as the api_ca_cert value fixed it.

As a note, the kubeconfig that Rancher provides does not include the certificate-authority-data field. I referenced the /etc/cni/net.d/calico-kubeconfig file on one of the cluster nodes to find certificate-authority-data and used the following command to get the final api_ca_cert value.

echo [certificate-authority-data] | base64 -D
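
Alternatively, the value can be pulled straight from a cluster node in one go, assuming the field appears as a single certificate-authority-data: line in that file:

# Run on a cluster node (Linux base64 uses -d instead of -D)
sudo grep certificate-authority-data /etc/cni/net.d/calico-kubeconfig | awk '{print $2}' | base64 -d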

Lastly, edit the integration/install/danm-installer.yaml file to provide the private image registry and the image pull secret name for the installer job pod, then run the following command to deploy the installer job.

kubectl apply -f integration/install
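
For reference, the edits to integration/install/danm-installer.yaml mentioned above boil down to pointing the installer job's pod spec at the private registry and the pull secret. A rough sketch only; the exact manifest layout and image name may differ between DANM versions:

# Sketch -- align with the actual Job manifest in your DANM checkout
spec:
  template:
    spec:
      imagePullSecrets:
      - name: regcred                                    # the secret created above
      containers:
      - name: danm-installer
        image: my.private.registry/danm/danm-installer   # image name is an assumption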

To uninstall the installer job and the CNI, run the commands below.

kubectl delete ds -n kube-system netwatcher svcwatcher danm-cni
kubectl delete deployment -n kube-system danm-webhook-deployment
kubectl delete serviceaccount danm -n kube-system
kubectl delete csr danm-webhook-svc.kube-system
kubectl delete -f integration/install

If there are no crashing pods in the kube-system namespace, the CNI was installed successfully. You can also list and describe the default ClusterNetwork, which is the network managed by Calico.
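
A quick way to check, with the pod name patterns taken from the DaemonSet and Deployment names above:

kubectl get pods -n kube-system | grep -E 'danm|netwatcher|svcwatcher'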

$ kubectl get cn
NAME      AGE
default   25h

$ kubectl describe cn default
Name:         default
Namespace:
Labels:       <none>
Annotations:  API Version:  danm.k8s.io/v1
Kind:         ClusterNetwork
Metadata:
  Creation Timestamp:  2020-11-27T22:47:59Z
  Generation:          1
  Managed Fields:
    API Version:  danm.k8s.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:kubectl.kubernetes.io/last-applied-configuration:
      f:spec:
        .:
        f:NetworkID:
        f:NetworkType:
    Manager:         kubectl
    Operation:       Update
    Time:            2020-11-27T22:47:59Z
  Resource Version:  22742
  Self Link:         /apis/danm.k8s.io/v1/clusternetworks/default
  UID:               5865e712-35bb-49dd-a3cd-ec80cad77262
Spec:
  Network ID:    10-calico
  Network Type:  calico
Events:          <none>

Test ClusterNetwork

DANM provides two kinds of networks, ClusterNetwork and TenantNetwork, depending on the scope of the network: cluster-wide or a single namespace, respectively. As my first test, I created a new ClusterNetwork named core.

$ vi core-net.yml
apiVersion: danm.k8s.io/v1
kind: ClusterNetwork
metadata:
  name: core
spec:
  NetworkID: core
  NetworkType: ipvlan
  Options:
    host_device: enp23s0f0
    cidr: 192.168.250.0/24
    allocation_pool:
      start: 192.168.250.100
      end: 192.168.250.200
    container_prefix: core
    rt_tables: 10
    routes:
      10.250.0.0/16: 192.168.250.1

$ kubectl apply -f core-net.yml
$ kubectl get cn
NAME      AGE
core      20h
default   25h

To add a little information about my test environment: my cluster nodes had two interfaces. One is enp23s0f1, used by Calico to build the overlay network for pod-to-pod communication (the 10.50.0.0/16 subnet was used for that), and the other is enp23s0f0, connected to the 192.168.250.0/24 network segment.

Then I created a pod attached to both the default and core networks.

$ vi deployment.yml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: danm-test-deployment
spec:
  selector:
    matchLabels:
      app: danm-test
  replicas: 1
  template:
    metadata:
      labels:
        app: danm-test
      annotations:
        danm.k8s.io/interfaces: |
          [
            {"clusterNetwork":"default"},
            {"clusterNetwork":"core", "ip":"dynamic"}
          ]
    spec:
      imagePullSecrets:
      - name: regcred
      containers:
      - name: busybox
        imagePullPolicy: IfNotPresent
        image: my.private.registry/busybox/busybox:latest
        args:
        - tail
        - -f
        - /dev/null

Attach to the pod and check that it can reach both networks as expected.

$ kubectl exec -it danm-test-deployment-548df67dd4-jz8bg -- sh
/ # ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
4: eth0@if38: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1440 qdisc noqueue
    link/ether 9e:39:65:71:23:2a brd ff:ff:ff:ff:ff:ff
    inet 10.50.99.9/32 scope global eth0
       valid_lft forever preferred_lft forever
37: core1@tunl0: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue
    link/ether 68:05:ca:b1:ad:b0 brd ff:ff:ff:ff:ff:ff
    inet 192.168.250.101/24 brd 192.168.250.255 scope global core1
       valid_lft forever preferred_lft forever

/ # ping 10.50.99.0 -c 1
PING 10.50.99.0 (10.50.99.0): 56 data bytes
64 bytes from 10.50.99.0: seq=0 ttl=64 time=0.116 ms

--- 10.50.99.0 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 0.116/0.116/0.116 ms

/ # ping 192.168.250.1 -c 1
PING 192.168.250.1 (192.168.250.1): 56 data bytes
64 bytes from 192.168.250.1: seq=0 ttl=64 time=34.438 ms

--- 192.168.250.1 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 34.438/34.438/34.438 ms
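
Since the core network was created with rt_tables: 10 and a static route, you can also peek at the policy routing inside the pod. Commands only, output omitted, and this assumes the busybox build includes the ip rule and ip route applets:

/ # ip rule
/ # ip route show table 10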

Test service discovery

The reason I wanted to try out DANM was its ability to do service discovery on non-primary interfaces, which is not possible with Multus. The Service definition is a little different from a normal Service and is described well in the schema/DanmService.yaml file. In short, I just needed to specify the name of the ClusterNetwork with the danm.k8s.io/clusterNetwork annotation and the selector with the danm.k8s.io/selector annotation. The usual selector field can still be added, but that will just add another useless endpoint on the default network. One more notable thing is that only headless Services are allowed, which makes sense to me.

$ vi service-core-net.yml
apiVersion: v1
kind: Service
metadata:
  name: core-net-service
  namespace: default
  annotations:
    danm.k8s.io/selector: '{"app":"danm-test"}'
    danm.k8s.io/clusterNetwork: core
spec:
  type: ClusterIP
  clusterIP: None

$ kubectl apply -f service-core-net.yml

$ kubectl get svc
NAME               TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)   AGE
core-net-service   ClusterIP   None          <none>        <none>    19h
kubernetes         ClusterIP   10.50.128.1   <none>        443/TCP   28h

$ kubectl get ep
NAME               ENDPOINTS         AGE
core-net-service   192.168.250.101   19h
kubernetes         10.92.1.41:6443   28h

And the service discovery worked as expected!

k8s-node1:~$ nslookup core-net-service.default.svc.cluster.local 10.50.128.10
Server:		10.50.128.10
Address:	10.50.128.10#53

Name:	core-net-service.default.svc.cluster.local
Address: 192.168.250.101