How to install DANM CNI in Rancher custom cluster
- Rancher (v2.5.1 is used in this test)
- Servers or VMs with Docker installed for a new custom K8s cluster (v1.18.8 is used in this test)
Create a custom cluster with the following extra arguments added. If you're working with an existing cluster, you can also edit the cluster YAML and add the extra args.
kube-controller:
  extra_args:
    cluster-signing-cert-file: /etc/kubernetes/ssl/kube-ca.pem
    cluster-signing-key-file: /etc/kubernetes/ssl/kube-ca-key.pem
Once the cluster is ready, I had to create a default CNI configuration for DANM on all cluster nodes to work around this warning: CNI operation for network:default failed with: CNI delegation failed due to error: Error delegating ADD to CNI plugin: calico because: OS exec call failed: no etcd endpoints specified. As a note, I chose calico as the default CNI, but any other CNI should work.
# Run below commands from all cluster nodes
sudo cp /etc/cni/net.d/10-calico.conflist /etc/cni/net.d/10-calico.conf
sudo mv /etc/cni/net.d/10-calico.conflist /etc/cni/net.d/11-calico.conflist
sudo vi /etc/cni/net.d/10-calico.conf
{
  "name": "k8s-pod-network",
  "cniVersion": "0.3.1",
  "type": "calico",
  "log_level": "info",
  "datastore_type": "kubernetes",
  "nodename": "[node name]",
  "mtu": 1440,
  "ipam": {
    "type": "calico-ipam"
  },
  "policy": {
    "type": "k8s"
  },
  "kubernetes": {
    "kubeconfig": "/etc/kubernetes/ssl/kubecfg-kube-node.yaml"
  }
}
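Since the same configuration (with a different `nodename`) has to land on every node, a minimal sketch of scripting the step above: a function that writes the per-node 10-calico.conf into a given directory, filling in this node's hostname for `nodename` (it must match the Kubernetes node name). The function name is mine, not part of DANM or calico; on a real node the directory is /etc/cni/net.d and the command needs root.

```shell
# Hypothetical helper: write 10-calico.conf into the directory given as $1,
# with this node's hostname substituted into the "nodename" field.
write_calico_conf() {
  cat > "$1/10-calico.conf" <<EOF
{
  "name": "k8s-pod-network",
  "cniVersion": "0.3.1",
  "type": "calico",
  "log_level": "info",
  "datastore_type": "kubernetes",
  "nodename": "$(hostname)",
  "mtu": 1440,
  "ipam": { "type": "calico-ipam" },
  "policy": { "type": "k8s" },
  "kubernetes": { "kubeconfig": "/etc/kubernetes/ssl/kubecfg-kube-node.yaml" }
}
EOF
}
# On a node (as root): write_calico_conf /etc/cni/net.d
```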
DANM does not provide a public image repository, so I had to build the images and push them to my private image registry.
git clone https://github.com/nokia/danm
cd danm
# Fix KUBECTL_VERSION
vi scm/build/Dockerfile.install
ARG KUBECTL_VERSION=1.18.8
# Fix svcwatcher node selector to Rancher specific cluster label
vi integration/manifests/svcwatcher/svcwatcher_ds.yaml.tmpl
nodeSelector:
- "node-role.kubernetes.io/master": ""
+ "node-role.kubernetes.io/controlplane": "true"
# Build and push the images
TAG_PREFIX=my.private.registry/danm/ IMAGE_PUSH=true ./build_danm.sh
First, download the kubeconfig for the target cluster from Rancher and create an image pull secret for the private registry. Check out this page for more detailed instructions.
kubectl create secret -n kube-system generic regcred \
--from-file=.dockerconfigjson=/Users/hyunsun/Work/private-registry.json \
--type=kubernetes.io/dockerconfigjson
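For reference, the file passed as --from-file=.dockerconfigjson is a standard Docker config JSON; the registry name and auth value below are placeholders, not values from this setup:

```json
{
  "auths": {
    "my.private.registry": {
      "auth": "<base64 of username:password>"
    }
  }
}
```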
I tried the installer job that DANM provides instead of following the manual steps. The overall deployment procedure with the installer was nice: it worked as expected, and most of the trouble I encountered was related to Rancher-specific cluster settings. So I recommend trying the installer job unless you want to understand all the details of the deployment procedure.
Anyway, all I needed to do to run the installer job was to edit the integration/install/danm-installer-config.yaml file.
default_cni_type: calico
default_cni_network_id: 10-calico
image_registry_prefix: my.private.registry/danm/
image_pull_secret: regcred
api_ca_cert: [root ca of the target cluster]
If you leave api_ca_cert blank, the installer job will try to fetch it automatically, but that option did not work for me and caused svcwatcher to keep crashing with the error Unable to connect to the server: x509: certificate signed by unknown authority. Giving the root CA as the api_ca_cert value fixed it.
As a note, the kubeconfig that Rancher provides does not include the certificate-authority-data field. I referenced the /etc/cni/net.d/calico-kubeconfig file on one of the cluster nodes to find certificate-authority-data, and used the following command to get the final api_ca_cert value.
echo [certificate-authority-data] | base64 -D
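A small helper along the same lines, in case you want to do the lookup and decode in one step: it pulls certificate-authority-data out of a kubeconfig file and decodes it. The function name is mine for illustration; use base64 -D on macOS and base64 -d on Linux.

```shell
# Hypothetical helper: extract and decode certificate-authority-data
# from the kubeconfig file given as $1.
decode_ca() {
  grep certificate-authority-data "$1" | awk '{print $2}' | base64 -d
}
# On a cluster node: decode_ca /etc/cni/net.d/calico-kubeconfig
```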
Lastly, edit the integration/install/danm-installer.yaml file to provide the private image registry and the image pull secret name for the installer job pod, then run the following command to deploy the installer job.
kubectl apply -f integration/install
To uninstall the installer job and the CNI, run the commands below.
kubectl delete ds -n kube-system netwatcher svcwatcher danm-cni
kubectl delete deployment -n kube-system danm-webhook-deployment
kubectl delete serviceaccount danm -n kube-system
kubectl delete csr danm-webhook-svc.kube-system
kubectl delete -f integration/install
If there is no crashing pod in the kube-system namespace, the CNI is installed successfully. You can also list and describe the default ClusterNetwork, which is the network managed by calico.
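A quick sketch of that health check: flag any pod whose STATUS column is not Running or Completed. The function name is mine; pipe in the output of kubectl get pods -n kube-system --no-headers.

```shell
# Hypothetical helper: read `kubectl get pods --no-headers` lines on stdin
# and report any pod that is not Running or Completed; exits non-zero if
# anything is unhealthy.
check_pods() {
  awk '$3 != "Running" && $3 != "Completed" { bad = 1; print "not ready:", $1 }
       END { exit bad }'
}
# On a live cluster:
#   kubectl get pods -n kube-system --no-headers | check_pods && echo "CNI looks healthy"
```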
$ kubectl get cn
NAME AGE
default 25h
$ kubectl describe cn default
Name:         default
Namespace:
Labels:       <none>
Annotations:
API Version:  danm.k8s.io/v1
Kind:         ClusterNetwork
Metadata:
  Creation Timestamp:  2020-11-27T22:47:59Z
  Generation:          1
  Managed Fields:
    API Version:  danm.k8s.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:kubectl.kubernetes.io/last-applied-configuration:
      f:spec:
        .:
        f:NetworkID:
        f:NetworkType:
    Manager:         kubectl
    Operation:       Update
    Time:            2020-11-27T22:47:59Z
  Resource Version:  22742
  Self Link:         /apis/danm.k8s.io/v1/clusternetworks/default
  UID:               5865e712-35bb-49dd-a3cd-ec80cad77262
Spec:
  Network ID:    10-calico
  Network Type:  calico
Events:  <none>
DANM CNI provides two kinds of networks, ClusterNetwork and TenantNetwork, depending on the scope of the network: cluster-wide or a single namespace, respectively. As my first test, I created a new ClusterNetwork named core.
$ vi core-net.yml
apiVersion: danm.k8s.io/v1
kind: ClusterNetwork
metadata:
  name: core
spec:
  NetworkID: core
  NetworkType: ipvlan
  Options:
    host_device: enp23s0f0
    cidr: 192.168.250.0/24
    allocation_pool:
      start: 192.168.250.100
      end: 192.168.250.200
    container_prefix: core
    rt_tables: 10
    routes:
      10.250.0.0/16: 192.168.250.1
$ kubectl apply -f core-net.yml
$ kubectl get cn
NAME AGE
core 20h
default 25h
To add a little information about my test environment: my cluster nodes had two interfaces. One is enp23s0f1, used by calico to build an overlay network for pod-to-pod communication (the 10.50.0.0/16 subnet was used for that), and the other is enp23s0f0, connected to the 192.168.250.0/24 network segment.
I then created a pod attached to both the default and core networks.
$ vi deployment.yml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: danm-test-deployment
spec:
  selector:
    matchLabels:
      app: danm-test
  replicas: 1
  template:
    metadata:
      labels:
        app: danm-test
      annotations:
        danm.k8s.io/interfaces: |
          [
            {"clusterNetwork":"default"},
            {"clusterNetwork":"core", "ip":"dynamic"}
          ]
    spec:
      imagePullSecrets:
        - name: regcred
      containers:
        - name: busybox
          imagePullPolicy: IfNotPresent
          image: my.private.registry/busybox/busybox:latest
          args:
            - tail
            - -f
            - /dev/null
Attach to the pod and check that it can access both networks as expected.
$ kubectl exec -it danm-test-deployment-548df67dd4-jz8bg -- sh
/ # ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
4: eth0@if38: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1440 qdisc noqueue
    link/ether 9e:39:65:71:23:2a brd ff:ff:ff:ff:ff:ff
    inet 10.50.99.9/32 scope global eth0
       valid_lft forever preferred_lft forever
37: core1@tunl0: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue
    link/ether 68:05:ca:b1:ad:b0 brd ff:ff:ff:ff:ff:ff
    inet 192.168.250.101/24 brd 192.168.250.255 scope global core1
       valid_lft forever preferred_lft forever
/ # ping 10.50.99.0 -c 1
PING 10.50.99.0 (10.50.99.0): 56 data bytes
64 bytes from 10.50.99.0: seq=0 ttl=64 time=0.116 ms
--- 10.50.99.0 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 0.116/0.116/0.116 ms
/ # ping 192.168.250.1 -c 1
PING 192.168.250.1 (192.168.250.1): 56 data bytes
64 bytes from 192.168.250.1: seq=0 ttl=64 time=34.438 ms
--- 192.168.250.1 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 34.438/34.438/34.438 ms
The reason I wanted to try DANM was its ability to do service discovery on non-primary interfaces, which is not possible with Multus. The Service definition is a little different from a normal Service and is described well in the schema/DanmService.yaml file. In short, I just needed to specify the name of the ClusterNetwork with the danm.k8s.io/clusterNetwork annotation. The selector also needs to be specified with the danm.k8s.io/selector annotation. The usual selector option can still be added, but that will just add another useless endpoint on the default network. One more notable thing is that only headless services are allowed, which makes sense to me.
$ vi service-core-net.yml
apiVersion: v1
kind: Service
metadata:
  name: core-net-service
  namespace: default
  annotations:
    danm.k8s.io/selector: '{"app":"danm-test"}'
    danm.k8s.io/clusterNetwork: core
spec:
  type: ClusterIP
  clusterIP: None
$ kubectl apply -f service-core-net.yml
$ kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
core-net-service ClusterIP None <none> <none> 19h
kubernetes ClusterIP 10.50.128.1 <none> 443/TCP 28h
$ kubectl get ep
NAME ENDPOINTS AGE
core-net-service 192.168.250.101 19h
kubernetes 10.92.1.41:6443 28h
And the service discovery worked as expected!
k8s-node1:~$ nslookup core-net-service.default.svc.cluster.local 10.50.128.10
Server: 10.50.128.10
Address: 10.50.128.10#53
Name: core-net-service.default.svc.cluster.local
Address: 192.168.250.101