kilo creates bridge interface only on one of the k8s nodes #129

Open

3rmack opened this issue Mar 3, 2021 · 24 comments

Comments

3rmack commented Mar 3, 2021

A 2-node k8s cluster was created via kubeadm. The nodes are placed in different availability zones and have only dedicated external IP addresses (no private networks attached, etc.).

# lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 18.04.5 LTS
Release:	18.04
Codename:	bionic
# kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.4", GitCommit:"e87da0bd6e03ec3fea7933c4b5263d151aafd07c", GitTreeState:"clean", BuildDate:"2021-02-18T16:09:38Z", GoVersion:"go1.15.8", Compiler:"gc", Platform:"linux/amd64"}
# docker version
Client: Docker Engine - Community
 Version:           19.03.15
 API version:       1.40
 Go version:        go1.13.15
 Git commit:        99e3ed8919
 Built:             Sat Jan 30 03:16:51 2021
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.15
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.13.15
  Git commit:       99e3ed8919
  Built:            Sat Jan 30 03:15:20 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.3
  GitCommit:        269548fa27e0089a8b8278fc4fc781d7f65a939b
 runc:
  Version:          1.0.0-rc92
  GitCommit:        ff819c7e9184c13b7c2607fe6c30ae19403a7aff
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

kubeadm init command (other params are default):

kubeadm init --pod-network-cidr "10.10.0.0/16"

Initial k8s cluster status (no CNI):

# kubectl get node -o wide
NAME          STATUS     ROLES                  AGE     VERSION   INTERNAL-IP       EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION       CONTAINER-RUNTIME
test-kilo-0   NotReady   control-plane,master   2m16s   v1.20.4   194.182.164.214   <none>        Ubuntu 18.04.5 LTS   4.15.0-136-generic   docker://19.3.15
test-kilo-1   NotReady   worker                 49s     v1.20.4   185.19.28.241     <none>        Ubuntu 18.04.5 LTS   4.15.0-136-generic   docker://19.3.15
# kubectl get po -o wide -A
NAMESPACE     NAME                                  READY   STATUS    RESTARTS   AGE     IP                NODE          NOMINATED NODE   READINESS GATES
kube-system   coredns-74ff55c5b-6gvdk               0/1     Pending   0          4m5s    <none>            <none>        <none>           <none>
kube-system   coredns-74ff55c5b-hfd5l               0/1     Pending   0          4m5s    <none>            <none>        <none>           <none>
kube-system   etcd-test-kilo-0                      1/1     Running   0          4m19s   194.182.164.214   test-kilo-0   <none>           <none>
kube-system   kube-apiserver-test-kilo-0            1/1     Running   1          4m19s   194.182.164.214   test-kilo-0   <none>           <none>
kube-system   kube-controller-manager-test-kilo-0   1/1     Running   0          4m19s   194.182.164.214   test-kilo-0   <none>           <none>
kube-system   kube-proxy-nhrzd                      1/1     Running   0          2m56s   185.19.28.241     test-kilo-1   <none>           <none>
kube-system   kube-proxy-pnsb8                      1/1     Running   0          4m5s    194.182.164.214   test-kilo-0   <none>           <none>
kube-system   kube-scheduler-test-kilo-0            1/1     Running   0          4m19s   194.182.164.214   test-kilo-0   <none>           <none>

Using kilo-kubeadm.yaml to install kilo. Here is the result:

# kubectl get po -o wide -A
NAMESPACE     NAME                                  READY   STATUS    RESTARTS   AGE     IP                NODE          NOMINATED NODE   READINESS GATES
kube-system   coredns-74ff55c5b-6gvdk               0/1     Running   0          6m25s   10.10.1.2         test-kilo-1   <none>           <none>
kube-system   coredns-74ff55c5b-hfd5l               0/1     Running   0          6m25s   10.10.1.3         test-kilo-1   <none>           <none>
kube-system   etcd-test-kilo-0                      1/1     Running   0          6m39s   194.182.164.214   test-kilo-0   <none>           <none>
kube-system   kilo-cthl8                            1/1     Running   0          72s     185.19.28.241     test-kilo-1   <none>           <none>
kube-system   kilo-rk4vf                            1/1     Running   0          72s     194.182.164.214   test-kilo-0   <none>           <none>
kube-system   kube-apiserver-test-kilo-0            1/1     Running   1          6m39s   194.182.164.214   test-kilo-0   <none>           <none>
kube-system   kube-controller-manager-test-kilo-0   1/1     Running   0          6m39s   194.182.164.214   test-kilo-0   <none>           <none>
kube-system   kube-proxy-nhrzd                      1/1     Running   0          5m16s   185.19.28.241     test-kilo-1   <none>           <none>
kube-system   kube-proxy-pnsb8                      1/1     Running   0          6m25s   194.182.164.214   test-kilo-0   <none>           <none>
kube-system   kube-scheduler-test-kilo-0            1/1     Running   0          6m39s   194.182.164.214   test-kilo-0   <none>           <none>

It looks like the CNI configuration is stuck at some step. If we check the Kilo logs, we can see that the config on node1 is incomplete.

# kubectl -n kube-system logs kilo-cthl8
{"caller":"mesh.go:96","component":"kilo","level":"warn","msg":"no private key found on disk; generating one now","ts":"2021-03-03T10:18:49.037598301Z"}
{"caller":"main.go:221","msg":"Starting Kilo network mesh '2b959f7020a8dbb6b32860965ed4dbfd0dd11215'.","ts":"2021-03-03T10:18:49.078808548Z"}
{"caller":"cni.go:60","component":"kilo","err":"failed to read IPAM config from CNI config list file: no IP ranges specified","level":"warn","msg":"failed to get CIDR from CNI file; overwriting it","ts":"2021-03-03T10:18:49.180229464Z"}
{"caller":"cni.go:68","component":"kilo","level":"info","msg":"CIDR in CNI file is empty","ts":"2021-03-03T10:18:49.180312275Z"}
{"CIDR":"10.10.1.0/24","caller":"cni.go:73","component":"kilo","level":"info","msg":"setting CIDR in CNI file","ts":"2021-03-03T10:18:49.180347087Z"}
{"caller":"mesh.go:532","component":"kilo","level":"info","msg":"WireGuard configurations are different","ts":"2021-03-03T10:18:49.559865866Z"}
{"caller":"mesh.go:301","component":"kilo","event":"add","level":"info","node":{"Endpoint":{"DNS":"","IP":"194.182.164.214","Port":51820},"Key":"OURzdHQwdFJBWU5PRzUxTVFTZE9ZaVFWUnp1NHNxS3ZKdEdvZGtGK2huaz0=","InternalIP":null,"LastSeen":1614766724,"Leader":false,"Location":"","Name":"test-kilo-0","PersistentKeepalive":0,"Subnet":{"IP":"10.10.0.0","Mask":"////AA=="},"WireGuardIP":{"IP":"10.4.0.1","Mask":"//8AAA=="}},"ts":"2021-03-03T10:18:49.587900799Z"}
{"caller":"mesh.go:532","component":"kilo","level":"info","msg":"WireGuard configurations are different","ts":"2021-03-03T10:18:49.727266934Z"}
# kubectl -n kube-system logs kilo-rk4vf
{"caller":"mesh.go:96","component":"kilo","level":"warn","msg":"no private key found on disk; generating one now","ts":"2021-03-03T10:18:43.12428951Z"}
{"caller":"main.go:221","msg":"Starting Kilo network mesh '2b959f7020a8dbb6b32860965ed4dbfd0dd11215'.","ts":"2021-03-03T10:18:43.149461372Z"}
{"caller":"cni.go:60","component":"kilo","err":"failed to read IPAM config from CNI config list file: no IP ranges specified","level":"warn","msg":"failed to get CIDR from CNI file; overwriting it","ts":"2021-03-03T10:18:43.250712032Z"}
{"caller":"cni.go:68","component":"kilo","level":"info","msg":"CIDR in CNI file is empty","ts":"2021-03-03T10:18:43.251153965Z"}
{"CIDR":"10.10.0.0/24","caller":"cni.go:73","component":"kilo","level":"info","msg":"setting CIDR in CNI file","ts":"2021-03-03T10:18:43.251455081Z"}
E0303 10:18:43.282674       1 reflector.go:126] pkg/k8s/backend.go:407: Failed to list *v1alpha1.Peer: the server could not find the requested resource (get peers.kilo.squat.ai)
{"caller":"mesh.go:532","component":"kilo","level":"info","msg":"WireGuard configurations are different","ts":"2021-03-03T10:18:44.649959279Z"}
{"caller":"mesh.go:301","component":"kilo","event":"update","level":"info","node":{"Endpoint":{"DNS":"","IP":"185.19.28.241","Port":51820},"Key":"TjJsb0Z1eG51N2dVM21yS2VHRVAyV0thN2MxdkFIU0piVWZwZ2ZZT09qOD0=","InternalIP":null,"LastSeen":1614766729,"Leader":false,"Location":"","Name":"test-kilo-1","PersistentKeepalive":0,"Subnet":{"IP":"10.10.1.0","Mask":"////AA=="},"WireGuardIP":null},"ts":"2021-03-03T10:18:49.31665582Z"}
{"caller":"mesh.go:532","component":"kilo","level":"info","msg":"WireGuard configurations are different","ts":"2021-03-03T10:18:49.437484515Z"}
{"caller":"mesh.go:301","component":"kilo","event":"update","level":"info","node":{"Endpoint":{"DNS":"","IP":"185.19.28.241","Port":51820},"Key":"TjJsb0Z1eG51N2dVM21yS2VHRVAyV0thN2MxdkFIU0piVWZwZ2ZZT09qOD0=","InternalIP":null,"LastSeen":1614766729,"Leader":false,"Location":"","Name":"test-kilo-1","PersistentKeepalive":0,"Subnet":{"IP":"10.10.1.0","Mask":"////AA=="},"WireGuardIP":{"IP":"10.4.0.2","Mask":"//8AAA=="}},"ts":"2021-03-03T10:18:49.862060523Z"}

WireGuard looks OK on both nodes:

node1:# wg
interface: kilo0
  public key: 9Dstt0tRAYNOG51MQSdOYiQVRzu4sqKvJtGodkF+hnk=
  private key: (hidden)
  listening port: 51820

peer: N2loFuxnu7gU3mrKeGEP2WKa7c1vAHSJbUfpgfYOOj8=
  endpoint: 185.19.28.241:51820
  allowed ips: 10.10.1.0/24, 10.4.0.2/32
node2:# wg
interface: kilo0
  public key: N2loFuxnu7gU3mrKeGEP2WKa7c1vAHSJbUfpgfYOOj8=
  private key: (hidden)
  listening port: 51820

peer: 9Dstt0tRAYNOG51MQSdOYiQVRzu4sqKvJtGodkF+hnk=
  endpoint: 194.182.164.214:51820
  allowed ips: 10.10.0.0/24, 10.4.0.1/32

Let's check the network interfaces, and here is the problem: the WireGuard tunnel is OK, but the bridge interface is not created on node1.

node1: # ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 06:53:98:00:0d:ba brd ff:ff:ff:ff:ff:ff
    inet 194.182.164.214/22 brd 194.182.167.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::453:98ff:fe00:dba/64 scope link
       valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
    link/ether 02:42:5c:f9:e9:5b brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
4: kilo0: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1420 qdisc noqueue state UNKNOWN group default qlen 1000
    link/none
    inet 10.4.0.1/16 brd 10.4.255.255 scope global kilo0
       valid_lft forever preferred_lft forever
node2:# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 06:fa:0a:00:01:2e brd ff:ff:ff:ff:ff:ff
    inet 185.19.28.241/22 brd 185.19.31.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::4fa:aff:fe00:12e/64 scope link
       valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
    link/ether 02:42:b9:77:a3:dd brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
4: kilo0: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1420 qdisc noqueue state UNKNOWN group default qlen 1000
    link/none
    inet 10.4.0.2/16 brd 10.4.255.255 scope global kilo0
       valid_lft forever preferred_lft forever
5: kube-bridge: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1420 qdisc noqueue state UP group default qlen 1000
    link/ether 52:1b:ad:e0:59:3b brd ff:ff:ff:ff:ff:ff
    inet 10.10.1.1/24 scope global kube-bridge
       valid_lft forever preferred_lft forever
    inet6 fe80::501b:adff:fee0:593b/64 scope link
       valid_lft forever preferred_lft forever
6: vethd21e7af3@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1420 qdisc noqueue master kube-bridge state UP group default
    link/ether 8a:2a:7e:4e:9b:4e brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet6 fe80::882a:7eff:fe4e:9b4e/64 scope link
       valid_lft forever preferred_lft forever
7: vethddbcea8c@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1420 qdisc noqueue master kube-bridge state UP group default
    link/ether 26:5c:ee:4c:40:b7 brd ff:ff:ff:ff:ff:ff link-netnsid 1
    inet6 fe80::245c:eeff:fe4c:40b7/64 scope link
       valid_lft forever preferred_lft forever
node1:# cat /etc/cni/net.d/10-kilo.conflist
{"cniVersion":"0.3.1","name":"kilo","plugins":[{"bridge":"kube-bridge","forceAddress":true,"ipam":{"ranges":[[{"subnet":"10.10.0.0/24"}]],"type":"host-local"},"isDefaultGateway":true,"mtu":1420,"name":"kubernetes","type":"bridge"},{"capabilities":{"portMappings":true},"snat":true,"type":"portmap"}]}
node2:# cat /etc/cni/net.d/10-kilo.conflist
{"cniVersion":"0.3.1","name":"kilo","plugins":[{"bridge":"kube-bridge","forceAddress":true,"ipam":{"ranges":[[{"subnet":"10.10.1.0/24"}]],"type":"host-local"},"isDefaultGateway":true,"mtu":1420,"name":"kubernetes","type":"bridge"},{"capabilities":{"portMappings":true},"snat":true,"type":"portmap"}]}
squat (Owner) commented Mar 3, 2021

Ack, thanks a lot for reporting this. You provided tons of helpful details. Could you share some additional pieces of info:

  • What tag of the squat/kilo image are you using?
  • The kube-bridge interface is created by the kubelet, which reads the CNI config written by Kilo; can you share the kubelet logs for node-1?
  • Is this fixed by restarting node-1 or the kubelet process? If so, this really indicates some kubelet bug to me
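
For reference, a sketch of how to gather that info, assuming a systemd-managed kubelet and the DaemonSet name kilo from kilo-kubeadm.yaml (adjust names and time window as needed):

# kubectl -n kube-system get ds kilo -o jsonpath='{.spec.template.spec.containers[0].image}'   # image tag the DaemonSet actually runs
# journalctl -u kubelet --since "30 minutes ago" | grep -i cni                                  # recent CNI-related kubelet messages
# systemctl restart kubelet                                                                     # check whether the kubelet picks up the CNI config after a restart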

3rmack (Author) commented Mar 3, 2021

  1. I guess it would be better if I paste the whole pod description here:
# kubectl -n kube-system describe po kilo-8smsx
Name:         kilo-8smsx
Namespace:    kube-system
Priority:     0
Node:         test-kilo-0/159.100.245.14
Start Time:   Wed, 03 Mar 2021 10:54:56 +0000
Labels:       app.kubernetes.io/name=kilo
              controller-revision-hash=597846cbb6
              pod-template-generation=1
Annotations:  <none>
Status:       Running
IP:           159.100.245.14
IPs:
  IP:           159.100.245.14
Controlled By:  DaemonSet/kilo
Init Containers:
  install-cni:
    Container ID:  docker://fd64ec764c26f171813f1a675f5320d06f3f8511bcd8a896e9369dc8a719bfe2
    Image:         squat/kilo
    Image ID:      docker-pullable://squat/kilo@sha256:05dcc0b50e597345a9b8afc2bb5c5eb633c205e5bd178bd257383f885cdf5ba2
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/sh
      -c
      set -e -x; cp /opt/cni/bin/* /host/opt/cni/bin/; TMP_CONF="$CNI_CONF_NAME".tmp; echo "$CNI_NETWORK_CONFIG" > $TMP_CONF; rm -f /host/etc/cni/net.d/*; mv $TMP_CONF /host/etc/cni/net.d/$CNI_CONF_NAME
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Wed, 03 Mar 2021 10:55:02 +0000
      Finished:     Wed, 03 Mar 2021 10:55:02 +0000
    Ready:          True
    Restart Count:  0
    Environment:
      CNI_CONF_NAME:       10-kilo.conflist
      CNI_NETWORK_CONFIG:  <set to the key 'cni-conf.json' of config map 'kilo'>  Optional: false
    Mounts:
      /host/etc/cni/net.d from cni-conf-dir (rw)
      /host/opt/cni/bin from cni-bin-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kilo-token-4qmzq (ro)
Containers:
  kilo:
    Container ID:  docker://81b344c3451e93473974bb61fac34a9d81c57b09e4aa26045e5aa458d99408aa
    Image:         squat/kilo
    Image ID:      docker-pullable://squat/kilo@sha256:05dcc0b50e597345a9b8afc2bb5c5eb633c205e5bd178bd257383f885cdf5ba2
    Port:          <none>
    Host Port:     <none>
    Args:
      --kubeconfig=/etc/kubernetes/kubeconfig
      --hostname=$(NODE_NAME)
    State:          Running
      Started:      Wed, 03 Mar 2021 10:55:04 +0000
    Ready:          True
    Restart Count:  0
    Environment:
      NODE_NAME:   (v1:spec.nodeName)
    Mounts:
      /etc/cni/net.d from cni-conf-dir (rw)
      /etc/kubernetes from kubeconfig (ro)
      /lib/modules from lib-modules (ro)
      /run/xtables.lock from xtables-lock (rw)
      /var/lib/kilo from kilo-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kilo-token-4qmzq (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  cni-bin-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /opt/cni/bin
    HostPathType:
  cni-conf-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/cni/net.d
    HostPathType:
  kilo-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/kilo
    HostPathType:
  kubeconfig:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      kube-proxy
    Optional:  false
  lib-modules:
    Type:          HostPath (bare host directory volume)
    Path:          /lib/modules
    HostPathType:
  xtables-lock:
    Type:          HostPath (bare host directory volume)
    Path:          /run/xtables.lock
    HostPathType:  FileOrCreate
  kilo-token-4qmzq:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  kilo-token-4qmzq
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     :NoSchedule op=Exists
                 :NoExecute op=Exists
                 node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                 node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                 node.kubernetes.io/network-unavailable:NoSchedule op=Exists
                 node.kubernetes.io/not-ready:NoExecute op=Exists
                 node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                 node.kubernetes.io/unreachable:NoExecute op=Exists
                 node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
  Type     Reason       Age    From               Message
  ----     ------       ----   ----               -------
  Normal   Scheduled    4m40s  default-scheduler  Successfully assigned kube-system/kilo-8smsx to test-kilo-0
  Warning  FailedMount  4m40s  kubelet            MountVolume.SetUp failed for volume "kilo-token-4qmzq" : failed to sync secret cache: timed out waiting for the condition
  Normal   Pulling      4m39s  kubelet            Pulling image "squat/kilo"
  Normal   Pulled       4m35s  kubelet            Successfully pulled image "squat/kilo" in 3.349492743s
  Normal   Created      4m35s  kubelet            Created container install-cni
  Normal   Started      4m35s  kubelet            Started container install-cni
  Normal   Pulling      4m35s  kubelet            Pulling image "squat/kilo"
  Normal   Pulled       4m33s  kubelet            Successfully pulled image "squat/kilo" in 1.646763634s
  Normal   Created      4m33s  kubelet            Created container kilo
  Normal   Started      4m33s  kubelet            Started container kilo
  2. There are tons of logs; here are the most useful I found:
Mar 03 10:54:50 test-kilo-0 kubelet[4618]: W0303 10:54:50.652407    4618 cni.go:239] Unable to update cni config: no networks found in /etc/cni/net.d
Mar 03 10:54:52 test-kilo-0 kubelet[4618]: E0303 10:54:52.043437    4618 kubelet.go:2184] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Mar 03 10:54:55 test-kilo-0 kubelet[4618]: W0303 10:54:55.652779    4618 cni.go:239] Unable to update cni config: no networks found in /etc/cni/net.d
Mar 03 10:54:56 test-kilo-0 kubelet[4618]: I0303 10:54:56.523457    4618 topology_manager.go:187] [topologymanager] Topology Admit Handler
Mar 03 10:54:56 test-kilo-0 kubelet[4618]: E0303 10:54:56.533865    4618 reflector.go:138] object-"kube-system"/"kilo": Failed to watch *v1.ConfigMap: failed to list *v1.ConfigMap: configmaps "kilo" is forbidden: User "system:node:test-kilo-0" cannot list resource "configmaps" in API group "" in the namespace "kube-system": no relationship found between node 'test-kilo-0' and this object
Mar 03 10:54:56 test-kilo-0 kubelet[4618]: E0303 10:54:56.533931    4618 reflector.go:138] object-"kube-system"/"kilo-token-4qmzq": Failed to watch *v1.Secret: failed to list *v1.Secret: secrets "kilo-token-4qmzq" is forbidden: User "system:node:test-kilo-0" cannot list resource "secrets" in API group "" in the namespace "kube-system": no relationship found between node 'test-kilo-0' and this object
Mar 03 10:54:56 test-kilo-0 kubelet[4618]: I0303 10:54:56.644244    4618 reconciler.go:224] operationExecutor.VerifyControllerAttachedVolume started for volume "kubeconfig" (UniqueName: "kubernetes.io/configmap/6f2aa1fb-6811-4dbf-849b-0cbfb740d25f-kubeconfig") pod "kilo-8smsx" (UID: "6f2aa1fb-6811-4dbf-849b-0cbfb740d25f")
Mar 03 10:54:56 test-kilo-0 kubelet[4618]: I0303 10:54:56.644318    4618 reconciler.go:224] operationExecutor.VerifyControllerAttachedVolume started for volume "lib-modules" (UniqueName: "kubernetes.io/host-path/6f2aa1fb-6811-4dbf-849b-0cbfb740d25f-lib-modules") pod "kilo-8smsx" (UID: "6f2aa1fb-6811-4dbf-849b-0cbfb740d25f")
Mar 03 10:54:56 test-kilo-0 kubelet[4618]: I0303 10:54:56.644341    4618 reconciler.go:224] operationExecutor.VerifyControllerAttachedVolume started for volume "xtables-lock" (UniqueName: "kubernetes.io/host-path/6f2aa1fb-6811-4dbf-849b-0cbfb740d25f-xtables-lock") pod "kilo-8smsx" (UID: "6f2aa1fb-6811-4dbf-849b-0cbfb740d25f")
Mar 03 10:54:56 test-kilo-0 kubelet[4618]: I0303 10:54:56.644420    4618 reconciler.go:224] operationExecutor.VerifyControllerAttachedVolume started for volume "kilo-dir" (UniqueName: "kubernetes.io/host-path/6f2aa1fb-6811-4dbf-849b-0cbfb740d25f-kilo-dir") pod "kilo-8smsx" (UID: "6f2aa1fb-6811-4dbf-849b-0cbfb740d25f")
Mar 03 10:54:56 test-kilo-0 kubelet[4618]: I0303 10:54:56.644458    4618 reconciler.go:224] operationExecutor.VerifyControllerAttachedVolume started for volume "kilo-token-4qmzq" (UniqueName: "kubernetes.io/secret/6f2aa1fb-6811-4dbf-849b-0cbfb740d25f-kilo-token-4qmzq") pod "kilo-8smsx" (UID: "6f2aa1fb-6811-4dbf-849b-0cbfb740d25f")
Mar 03 10:54:56 test-kilo-0 kubelet[4618]: I0303 10:54:56.644480    4618 reconciler.go:224] operationExecutor.VerifyControllerAttachedVolume started for volume "cni-bin-dir" (UniqueName: "kubernetes.io/host-path/6f2aa1fb-6811-4dbf-849b-0cbfb740d25f-cni-bin-dir") pod "kilo-8smsx" (UID: "6f2aa1fb-6811-4dbf-849b-0cbfb740d25f")
Mar 03 10:54:56 test-kilo-0 kubelet[4618]: I0303 10:54:56.644506    4618 reconciler.go:224] operationExecutor.VerifyControllerAttachedVolume started for volume "cni-conf-dir" (UniqueName: "kubernetes.io/host-path/6f2aa1fb-6811-4dbf-849b-0cbfb740d25f-cni-conf-dir") pod "kilo-8smsx" (UID: "6f2aa1fb-6811-4dbf-849b-0cbfb740d25f")
Mar 03 10:54:57 test-kilo-0 kubelet[4618]: E0303 10:54:57.056909    4618 kubelet.go:2184] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Mar 03 10:54:57 test-kilo-0 kubelet[4618]: E0303 10:54:57.745928    4618 secret.go:195] Couldn't get secret kube-system/kilo-token-4qmzq: failed to sync secret cache: timed out waiting for the condition
Mar 03 10:54:57 test-kilo-0 kubelet[4618]: E0303 10:54:57.746888    4618 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/secret/6f2aa1fb-6811-4dbf-849b-0cbfb740d25f-kilo-token-4qmzq podName:6f2aa1fb-6811-4dbf-849b-0cbfb740d25f nodeName:}" failed. No retries permitted until 2021-03-03 10:54:58.246746213 +0000 UTC m=+222.879815780 (durationBeforeRetry 500ms). Error: "MountVolume.SetUp failed for volume \"kilo-token-4qmzq\" (UniqueName: \"kubernetes.io/secret/6f2aa1fb-6811-4dbf-849b-0cbfb740d25f-kilo-token-4qmzq\") pod \"kilo-8smsx\" (UID: \"6f2aa1fb-6811-4dbf-849b-0cbfb740d25f\") : failed to sync secret cache: timed out waiting for the condition"
Mar 03 10:55:00 test-kilo-0 kubelet[4618]: W0303 10:55:00.653935    4618 cni.go:239] Unable to update cni config: no networks found in /etc/cni/net.d
Mar 03 10:55:02 test-kilo-0 kubelet[4618]: E0303 10:55:02.092521    4618 kubelet.go:2184] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
  3. Reboot or any service restart doesn't help. The cluster remains in the same state.

squat (Owner) commented Mar 3, 2021

@3rmack thanks for the quick reply. It's comforting that restarting doesn't resolve the issue, otherwise we might not have a convincing solution.

The kubelet seems to complain that it can't find any configuration in the CNI directory. Indeed, it seems that the Kilo manifests for kubeadm install the CNI configuration in the wrong directory. They are using /etc/kubernetes/cni/net.d [0] when they should be using /etc/cni/net.d [1].

Can you try redeploying the Kilo DaemonSet with the corrected host path?

If this fixes the issue, then please submit a PR if you can :)

[0] https://github.com/squat/kilo/blob/main/manifests/kilo-kubeadm.yaml#L163
[1] https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/network-plugins/#cni
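
For illustration, a minimal sketch of the corrected cni-conf-dir volume in the DaemonSet (this is the path the kubelet watches by default; the volume name matches the one used in the Kilo manifests):

      volumes:
      - name: cni-conf-dir
        hostPath:
          path: /etc/cni/net.d   # previously /etc/kubernetes/cni/net.d in kilo-kubeadm.yaml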

3rmack (Author) commented Mar 4, 2021

Actually, it is already deployed with the correct path /etc/cni/net.d. I found and corrected that at the very beginning of testing Kilo.

squat (Owner) commented Mar 4, 2021

Hmm, in that case we'll need a bit more inspection. Can you please share the output of cat /etc/cni/net.d/10-kilo.conflist from the broken node? I need to double-check that nothing changed in v1.20 that prevents CNI v0.3.1 from being read.

You wrote:

If we check kilo logs we can see that config on node1 is incomplete.

What do you mean? I didn't see any logs from Kilo that imply that.

squat (Owner) commented Mar 4, 2021

Took a look, and CNI v0.3.1 should still work. We need to ensure that the CNI configuration file exists on the broken node and check whether there are recent kubelet logs that continue to complain about no networks found.
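
For completeness, a quick way to verify both points on the broken node, roughly (assuming jq is installed; otherwise just cat the file):

# ls -l /etc/cni/net.d/                                                   # confirm the conflist file exists
# jq . /etc/cni/net.d/10-kilo.conflist                                    # confirm it parses as valid JSON
# journalctl -u kubelet --since "10 minutes ago" | grep -i "no networks found"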

squat added a commit that referenced this issue Mar 4, 2021
As discussed in
#129 (comment),
the Kilo manifests for kubeadm install the CNI configuration in the
wrong directory. They are using /etc/kubernetes/cni/net.d [0] when they
should be using /etc/cni/net.d [1].

[0]
https://github.com/squat/kilo/blob/main/manifests/kilo-kubeadm.yaml#L163
[1]
https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/network-plugins/#cni

Signed-off-by: Lucas Servén Marín <lserven@gmail.com>
3rmack (Author) commented Mar 12, 2021

What do you mean? I didn't see any logs from Kilo that imply that.

I mean that if we compare the logs from both Kilo pods, the logs from the pod on the "failed" node are shorter. There is an extra "update" event in the logs on the "ok" node.

We need to ensure that the CNI configuration file exists for the broken node and check if there are recent kubelet logs that continue to complain about no networks found

The config files are present on both nodes. Please check my initial post; the contents of those files are at the very end of it.
The kubelet logs are complaining about "no networks found" even though a CNI config exists in the /etc/cni/net.d dir.

squat (Owner) commented Mar 12, 2021

Ack, yes, I somehow missed them earlier. Thanks.
I spun up a two-node kubeadm cluster last night and haven't been able to replicate this issue yet.

3rmack (Author) commented Mar 12, 2021

I spun up a two-node kubeadm cluster last night and haven't been able to replicate this issue yet.

Also worth mentioning that it could be a hardware/OS/etc. issue with the cloud provider where I tested this. The Ubuntu servers were created from the cloud provider's templates.

squat (Owner) commented May 26, 2021

I haven't been able to replicate this :/ This seems like it may be an issue with the specific environment, perhaps a container runtime issue.

hhstu (Contributor) commented Mar 25, 2022

I get the same error. Can I get any help?

root@RJYF-P-337:/etc/cni/net.d#
root@RJYF-P-337:/etc/cni/net.d# cat 10-kilo.conflist  |jq
{
  "cniVersion": "0.3.1",
  "name": "kilo",
  "plugins": [
    {
      "bridge": "kube-bridge",
      "forceAddress": true,
      "ipam": {
        "ranges": [
          [
            {
              "subnet": "10.1.0.0/24"
            }
          ]
        ],
        "type": "host-local"
      },
      "isDefaultGateway": true,
      "mtu": 1420,
      "name": "kubernetes",
      "type": "bridge"
    },
    {
      "capabilities": {
        "portMappings": true
      },
      "snat": true,
      "type": "portmap"
    }
  ]
}
root@RJYF-P-337:/etc/cni/net.d#
root@RJYF-P-337:/etc/cni/net.d#
root@RJYF-P-337:/etc/cni/net.d# kubectl  logs  -f  -n  kube-system kilo-lcjvb   kilo
{"caller":"main.go:277","msg":"Starting Kilo network mesh 'a1af9790ea541c683d528d5a1d23075528d682d4'.","ts":"2022-03-25T06:58:31.331505641Z"}
{"caller":"cni.go:61","component":"kilo","err":"failed to read IPAM config from CNI config list file: no IP ranges specified","level":"warn","msg":"failed to get CIDR from CNI file; overwriting it","ts":"2022-03-25T06:58:31.432995767Z"}
{"caller":"cni.go:69","component":"kilo","level":"info","msg":"CIDR in CNI file is empty","ts":"2022-03-25T06:58:31.433046208Z"}
{"CIDR":"10.1.0.0/24","caller":"cni.go:74","component":"kilo","level":"info","msg":"setting CIDR in CNI file","ts":"2022-03-25T06:58:31.43305818Z"}
{"caller":"mesh.go:375","component":"kilo","level":"info","msg":"overriding endpoint","new endpoint":"172.20.60.28:51820","node":"rjyf-p-337","old endpoint":"","ts":"2022-03-25T06:58:31.541709926Z"}
{"caller":"mesh.go:482","component":"kilo","error":"file does not exist","level":"error","ts":"2022-03-25T06:58:31.555442689Z"}
{"caller":"mesh.go:309","component":"kilo","event":"add","level":"info","node":{"Endpoint":{},"Key":[27,123,34,254,51,164,151,222,139,112,14,118,233,72,232,252,215,192,141,112,145,225,11,124,100,1,92,187,19,84,89,108],"NoInternalIP":false,"InternalIP":{"IP":"10.2.0.1","Mask":"/////w=="},"LastSeen":1648191504,"Leader":false,"Location":"","Name":"lc","PersistentKeepalive":0,"Subnet":{"IP":"10.1.3.0","Mask":"////AA=="},"WireGuardIP":null,"DiscoveredEndpoints":null,"AllowedLocationIPs":null,"Granularity":"full"},"ts":"2022-03-25T06:58:31.555600099Z"}
{"caller":"mesh.go:482","component":"kilo","error":"file does not exist","level":"error","ts":"2022-03-25T06:58:31.556817226Z"}
{"caller":"mesh.go:309","component":"kilo","event":"add","level":"info","node":{"Endpoint":null,"Key":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],"NoInternalIP":false,"InternalIP":null,"LastSeen":0,"Leader":false,"Location":"gcp","Name":"rjyf-p-335","PersistentKeepalive":0,"Subnet":{"IP":"10.1.1.0","Mask":"////AA=="},"WireGuardIP":null,"DiscoveredEndpoints":null,"AllowedLocationIPs":null,"Granularity":""},"ts":"2022-03-25T06:58:31.556912803Z"}
{"caller":"mesh.go:482","component":"kilo","error":"file does not exist","level":"error","ts":"2022-03-25T06:58:31.557776266Z"}
{"caller":"mesh.go:309","component":"kilo","event":"add","level":"info","node":{"Endpoint":{},"Key":[199,66,125,140,234,59,65,207,73,92,126,95,247,144,33,194,75,219,98,104,213,187,67,24,129,193,0,124,228,8,160,31],"NoInternalIP":false,"InternalIP":{"IP":"172.20.60.31","Mask":"///8AA=="},"LastSeen":1648191502,"Leader":false,"Location":"gcp","Name":"rjyf-p-336","PersistentKeepalive":0,"Subnet":{"IP":"10.1.2.0","Mask":"////AA=="},"WireGuardIP":null,"DiscoveredEndpoints":null,"AllowedLocationIPs":null,"Granularity":"full"},"ts":"2022-03-25T06:58:31.557862063Z"}
{"caller":"mesh.go:482","component":"kilo","error":"file does not exist","level":"error","ts":"2022-03-25T06:58:31.55877808Z"}
{"caller":"mesh.go:482","component":"kilo","error":"file does not exist","level":"error","ts":"2022-03-25T06:59:01.543738566Z"}
{"caller":"mesh.go:482","component":"kilo","error":"file does not exist","level":"error","ts":"2022-03-25T06:59:31.545704256Z"}
{"caller":"mesh.go:482","component":"kilo","error":"file does not exist","level":"error","ts":"2022-03-25T07:00:01.547772771Z"}
{"caller":"mesh.go:482","component":"kilo","error":"file does not exist","level":"error","ts":"2022-03-25T07:00:31.550088195Z"}
{"caller":"mesh.go:482","component":"kilo","error":"file does not exist","level":"error","ts":"2022-03-25T07:01:01.551853854Z"}
{"caller":"mesh.go:482","component":"kilo","error":"file does not exist","level":"error","ts":"2022-03-25T07:01:31.554154067Z"}
{"caller":"mesh.go:482","component":"kilo","error":"file does not exist","level":"error","ts":"2022-03-25T07:02:01.556278704Z"}
{"caller":"mesh.go:309","component":"kilo","event":"update","level":"info","node":{"Endpoint":null,"Key":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],"NoInternalIP":false,"InternalIP":null,"LastSeen":0,"Leader":false,"Location":"gcp","Name":"rjyf-p-335","PersistentKeepalive":0,"Subnet":{"IP":"10.1.1.0","Mask":"////AA=="},"WireGuardIP":null,"DiscoveredEndpoints":null,"AllowedLocationIPs":null,"Granularity":""},"ts":"2022-03-25T07:02:14.831024851Z"}
{"caller":"mesh.go:482","component":"kilo","error":"file does not exist","level":"error","ts":"2022-03-25T07:02:14.832378096Z"}
{"caller":"mesh.go:482","component":"kilo","error":"file does not exist","level":"error","ts":"2022-03-25T07:02:31.558733749Z"}
{"caller":"mesh.go:482","component":"kilo","error":"file does not exist","level":"error","ts":"2022-03-25T07:03:01.560622049Z"}
{"caller":"mesh.go:482","component":"kilo","error":"file does not exist","level":"error","ts":"2022-03-25T07:03:31.563116772Z"}
{"caller":"mesh.go:482","component":"kilo","error":"file does not exist","level":"error","ts":"2022-03-25T07:04:01.565075605Z"}
{"caller":"mesh.go:482","component":"kilo","error":"file does not exist","level":"error","ts":"2022-03-25T07:04:31.568063262Z"}
{"caller":"mesh.go:482","component":"kilo","error":"file does not exist","level":"error","ts":"2022-03-25T07:05:01.57004051Z"}
{"caller":"mesh.go:482","component":"kilo","error":"file does not exist","level":"error","ts":"2022-03-25T07:05:31.571529246Z"}
{"caller":"mesh.go:482","component":"kilo","error":"file does not exist","level":"error","ts":"2022-03-25T07:06:01.573270241Z"}
^C
root@RJYF-P-337:/etc/cni/net.d#

Here is my YAML:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kilo
  namespace: kube-system
  labels:
    app.kubernetes.io/name: kilo
    app.kubernetes.io/part-of: kilo
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: kilo
      app.kubernetes.io/part-of: kilo
  template:
    metadata:
      labels:
        app.kubernetes.io/name: kilo
        app.kubernetes.io/part-of: kilo
    spec:
      serviceAccountName: kilo
      hostNetwork: true
      containers:
      - name: boringtun
        image: leonnicolas/boringtun
        args:
          - --disable-drop-privileges=true
          - --foreground
          - kilo0
        securityContext:
          privileged: true
        volumeMounts:
          - name: wireguard
            mountPath: /var/run/wireguard
            readOnly: false
      - name: kilo
        image: squat/kilo
        args:
        - --kubeconfig=/etc/kubernetes/kubeconfig
        - --hostname=$(NODE_NAME)
        - --create-interface=false
        - --interface=kilo0
        - --mesh-granularity=full
        env:
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        ports:
        - containerPort: 1107
          name: metrics
        securityContext:
          privileged: true
        volumeMounts:
        - name: cni-conf-dir
          mountPath: /etc/cni/net.d
        - name: kilo-dir
          mountPath: /var/lib/kilo
        - name: kubeconfig
          mountPath: /etc/kubernetes
          readOnly: true
        - name: lib-modules
          mountPath: /lib/modules
          readOnly: true
        - name: xtables-lock
          mountPath: /run/xtables.lock
          readOnly: false
      initContainers:
      - name: install-cni
        image: squat/kilo
        command:
        - /bin/sh
        - -c
        - set -e -x;
          cp /opt/cni/bin/* /host/opt/cni/bin/;
          TMP_CONF="$CNI_CONF_NAME".tmp;
          echo "$CNI_NETWORK_CONFIG" > $TMP_CONF;
          rm -f /host/etc/cni/net.d/*;
          mv $TMP_CONF /host/etc/cni/net.d/$CNI_CONF_NAME
        env:
        - name: CNI_CONF_NAME
          value: 10-kilo.conflist
        - name: CNI_NETWORK_CONFIG
          valueFrom:
            configMapKeyRef:
              name: kilo
              key: cni-conf.json
        volumeMounts:
        - name: cni-bin-dir
          mountPath: /host/opt/cni/bin
        - name: cni-conf-dir
          mountPath: /host/etc/cni/net.d
      tolerations:
      - effect: NoSchedule
        operator: Exists
      - effect: NoExecute
        operator: Exists
      volumes:
      - name: cni-bin-dir
        hostPath:
          path: /opt/cni/bin
      - name: cni-conf-dir
        hostPath:
          path: /etc/cni/net.d
      - name: kilo-dir
        hostPath:
          path: /var/lib/kilo
      - name: kubeconfig
        configMap:
          name: kube-proxy
          items:
          - key: kubeconfig.conf
            path: kubeconfig
      - name: lib-modules
        hostPath:
          path: /lib/modules
      - name: xtables-lock
        hostPath:
          path: /run/xtables.lock
          type: FileOrCreate
      - name: wireguard
        hostPath:
          path: /var/run/wireguard

kubernetes version: v1.20.9
containerd version: v1.5.2
os: ubuntu:18.04

squat (Owner) commented Mar 25, 2022

@hhstu this seems like the kilo0 device is not available / doesn't exist. Are there any logs from the boringtun container?

hhstu (Contributor) commented Mar 25, 2022

@hhstu this seems like the kilo0 device is not available / doesn't exist. Are there any logs from the boringtun container?

Just this, @squat:

root@RJYF-P-337:/etc/cni/net.d# kubectl  logs  -f  -n  kube-system  kilo-tgmnz    boringtun
  2022-03-25T07:23:39.490195Z  INFO boringtun_cli: BoringTun started successfully
    at boringtun-cli/src/main.rs:178

squat (Owner) commented Mar 25, 2022

Hmmm, can you please show a list of the devices available in the erroring Kilo Pod? (ip l)

hhstu (Contributor) commented Mar 25, 2022

@squat

root@RJYF-P-337:/etc/cni/net.d# kubectl  exec  -it    -n  kube-system  kilo-tgmnz  -c  kilo  ip a
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens160: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 00:50:56:95:d4:7c brd ff:ff:ff:ff:ff:ff
    inet 172.20.60.28/22 brd 172.20.63.255 scope global ens160
       valid_lft forever preferred_lft forever
    inet6 fe80::250:56ff:fe95:d47c/64 scope link
       valid_lft forever preferred_lft forever
3: kube-ipvs0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN
    link/ether ba:35:6f:9a:01:f1 brd ff:ff:ff:ff:ff:ff
    inet 10.2.0.10/32 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
    inet 10.2.0.1/32 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
4: kube-bridge: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1420 qdisc noqueue state UP qlen 1000
    link/ether 82:7f:e9:a4:2f:9f brd ff:ff:ff:ff:ff:ff
    inet 10.1.0.1/24 brd 10.1.0.255 scope global kube-bridge
       valid_lft forever preferred_lft forever
    inet6 fe80::a0ae:11ff:fee8:a477/64 scope link
       valid_lft forever preferred_lft forever
17: tunl0@NONE: <NOARP,UP,LOWER_UP> mtu 1480 qdisc noqueue state UNKNOWN qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
31: kilo0: <POINTOPOINT,MULTICAST,NOARP> mtu 1500 qdisc noop state DOWN qlen 500
    link/[65534]
32: vethd3527ba5@kube-bridge: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1420 qdisc noqueue master kube-bridge state UP
    link/ether 82:7f:e9:a4:2f:9f brd ff:ff:ff:ff:ff:ff
    inet6 fe80::807f:e9ff:fea4:2f9f/64 scope link
       valid_lft forever preferred_lft forever
root@RJYF-P-337:/etc/cni/net.d#

squat (Owner) commented Mar 25, 2022

Thanks @hhstu, so there is indeed a kilo0 interface available. Some things that come to mind:

What differences are there between this node and the one that is working? Different OS? OS version? Hardware? One uses boringtun and the other doesn't? Knowing the differences may help determine why this works on one machine but not the other.

hhstu (Contributor) commented Mar 25, 2022

This is a new kubeadm cluster. None of the Kilo pods work! I have never gotten it working. @squat

Here is my kubeadm config:

apiVersion: kubeadm.k8s.io/v1beta2
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 172.20.60.28
  bindPort: 6443
nodeRegistration:
  criSocket: /run/containerd/containerd.sock
---
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
kubernetesVersion: v1.20.9
controlPlaneEndpoint: apiserver.cluster.local:6443
imageRepository: 172.20.60.28:5000/ccs
networking:
  dnsDomain: cluster.local
  podSubnet: 10.1.0.0/16
  serviceSubnet: 10.2.0.0/16
apiServer:
  certSANs:
  - 127.0.0.1
  - apiserver.cluster.local
  - 172.20.60.28
  - 172.20.60.31
  - 172.20.60.32
  - 10.103.97.2
  extraArgs:
    feature-gates: TTLAfterFinished=true,RemoveSelfLink=false
    max-mutating-requests-inflight: "4000"
    max-requests-inflight: "8000"
    default-unreachable-toleration-seconds: "2"
  extraVolumes:
  - name: localtime
    hostPath: /etc/localtime
    mountPath: /etc/localtime
    readOnly: true
    pathType: File
controllerManager:
  extraArgs:
    bind-address: 0.0.0.0
    secure-port: "10257"
    port: "10252"
    kube-api-burst: "100"
    kube-api-qps: "50"
    feature-gates: TTLAfterFinished=true,RemoveSelfLink=false
    experimental-cluster-signing-duration: 876000h
  extraVolumes:
  - hostPath: /etc/localtime
    mountPath: /etc/localtime
    name: localtime
    readOnly: true
    pathType: File
scheduler:
  extraArgs:
    bind-address: 0.0.0.0
    kube-api-burst: "100"
    kube-api-qps: "50"
    port: "10251"
    secure-port: "10259"
    feature-gates: TTLAfterFinished=true,RemoveSelfLink=false
  extraVolumes:
  - hostPath: /etc/localtime
    mountPath: /etc/localtime
    name: localtime
    readOnly: true
    pathType: File
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: ipvs
metricsBindAddress: 0.0.0.0
bindAddress: 0.0.0.0
ipvs:
  syncPeriod: 30s
  minSyncPeriod: 5s
  scheduler: rr
  excludeCIDRs:
  - 10.103.97.2/32

---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
kubeAPIQPS: 40
kubeAPIBurst: 50
imageMinimumGCAge: 48h
imageGCHighThresholdPercent: 85
evictionHard:
  imagefs.available: 5%
  memory.available: 100Mi
  nodefs.available: 10%
  nodefs.inodesFree: 5%

squat (Owner) commented Mar 25, 2022

Also, @hhstu does this work if you pin Kilo to 0.3.1?

I wonder if this might be due to the switch to using a different WireGuard client library.
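
A sketch of one way to pin the version, assuming the stock DaemonSet (both the install-cni init container and the kilo container reference the squat/kilo image):

# kubectl -n kube-system edit ds kilo        # change image: squat/kilo to image: squat/kilo:0.3.1 in both containers
# kubectl -n kube-system rollout status ds/kilo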

hhstu (Contributor) commented Mar 25, 2022

After changing to 0.3.1:

root@RJYF-P-337:~# kubectl  logs  -f  -n  kube-system  kilo-8zvcp      kilo
{"caller":"main.go:221","msg":"Starting Kilo network mesh '0.3.1'.","ts":"2022-03-25T08:18:42.676269936Z"}
{"caller":"cni.go:60","component":"kilo","err":"failed to read IPAM config from CNI config list file: no IP ranges specified","level":"warn","msg":"failed to get CIDR from CNI file; overwriting it","ts":"2022-03-25T08:18:42.777749293Z"}
{"caller":"cni.go:68","component":"kilo","level":"info","msg":"CIDR in CNI file is empty","ts":"2022-03-25T08:18:42.777855805Z"}
{"CIDR":"10.1.2.0/24","caller":"cni.go:73","component":"kilo","level":"info","msg":"setting CIDR in CNI file","ts":"2022-03-25T08:18:42.777881579Z"}
{"caller":"mesh.go:459","component":"kilo","error":"failed to read the WireGuard dump output: Unable to access interface: Protocol not supported\n","level":"error","ts":"2022-03-25T08:18:42.903321326Z"}
{"caller":"mesh.go:297","component":"kilo","event":"add","level":"info","node":{"Endpoint":null,"Key":"","NoInternalIP":false,"InternalIP":null,"LastSeen":0,"Leader":false,"Location":"gcp","Name":"rjyf-p-337","PersistentKeepalive":0,"Subnet":{"IP":"10.1.0.0","Mask":"////AA=="},"WireGuardIP":null,"DiscoveredEndpoints":null,"AllowedLocationIPs":null,"Granularity":""},"ts":"2022-03-25T08:18:42.903400228Z"}
{"caller":"mesh.go:459","component":"kilo","error":"failed to read the WireGuard dump output: Unable to access interface: Protocol not supported\n","level":"error","ts":"2022-03-25T08:18:42.904926106Z"}
{"caller":"mesh.go:297","component":"kilo","event":"add","level":"info","node":{"Endpoint":null,"Key":"","NoInternalIP":false,"InternalIP":null,"LastSeen":0,"Leader":false,"Location":"","Name":"lc","PersistentKeepalive":0,"Subnet":{"IP":"10.1.3.0","Mask":"////AA=="},"WireGuardIP":null,"DiscoveredEndpoints":null,"AllowedLocationIPs":null,"Granularity":""},"ts":"2022-03-25T08:18:42.904978993Z"}
{"caller":"mesh.go:459","component":"kilo","error":"failed to read the WireGuard dump output: Unable to access interface: Protocol not supported\n","level":"error","ts":"2022-03-25T08:18:42.907857284Z"}
{"caller":"mesh.go:297","component":"kilo","event":"add","level":"info","node":{"Endpoint":null,"Key":"","NoInternalIP":false,"InternalIP":null,"LastSeen":0,"Leader":false,"Location":"gcp","Name":"rjyf-p-335","PersistentKeepalive":0,"Subnet":{"IP":"10.1.1.0","Mask":"////AA=="},"WireGuardIP":null,"DiscoveredEndpoints":null,"AllowedLocationIPs":null,"Granularity":""},"ts":"2022-03-25T08:18:42.908017109Z"}
{"caller":"mesh.go:459","component":"kilo","error":"failed to read the WireGuard dump output: Unable to access interface: Protocol not supported\n","level":"error","ts":"2022-03-25T08:18:42.909536967Z"}
{"caller":"mesh.go:297","component":"kilo","event":"update","level":"info","node":{"Endpoint":{"DNS":"","IP":"172.20.60.32","Port":51820},"Key":"dHRvZ2VyaEZtMk5sczBtUTN2M0x6bFBLWWZ4R2dDQ0JobEtHZEZKVGFtaz0=","NoInternalIP":false,"InternalIP":{"IP":"172.20.60.32","Mask":"///8AA=="},"LastSeen":1648196324,"Leader":false,"Location":"gcp","Name":"rjyf-p-335","PersistentKeepalive":0,"Subnet":{"IP":"10.1.1.0","Mask":"////AA=="},"WireGuardIP":null,"DiscoveredEndpoints":null,"AllowedLocationIPs":null,"Granularity":"full"},"ts":"2022-03-25T08:18:44.152516296Z"}
{"caller":"mesh.go:459","component":"kilo","error":"failed to read the WireGuard dump output: Unable to access interface: Protocol not supported\n","level":"error","ts":"2022-03-25T08:18:44.154150488Z"}
{"caller":"mesh.go:459","component":"kilo","error":"failed to read the WireGuard dump output: Unable to access interface: Protocol not supported\n","level":"error","ts":"2022-03-25T08:19:12.888451019Z"}
{"caller":"mesh.go:297","component":"kilo","event":"update","level":"info","node":{"Endpoint":{"DNS":"","IP":"172.10.97.10","Port":51820},"Key":"RzNzaS9qT2tsOTZMY0E1MjZVam8vTmZBalhDUjRRdDhaQUZjdXhOVVdXdz0=","NoInternalIP":false,"InternalIP":{"IP":"10.2.0.1","Mask":"/////w=="},"LastSeen":1648196362,"Leader":false,"Location":"","Name":"lc","PersistentKeepalive":0,"Subnet":{"IP":"10.1.3.0","Mask":"////AA=="},"WireGuardIP":null,"DiscoveredEndpoints":null,"AllowedLocationIPs":null,"Granularity":"full"},"ts":"2022-03-25T08:19:22.203607866Z"}
{"caller":"mesh.go:459","component":"kilo","error":"failed to read the WireGuard dump output: Unable to access interface: Protocol not supported\n","level":"error","ts":"2022-03-25T08:19:22.20554322Z"}
{"caller":"mesh.go:459","component":"kilo","error":"failed to read the WireGuard dump output: Unable to access interface: Protocol not supported\n","level":"error","ts":"2022-03-25T08:19:42.891505001Z"}
{"caller":"mesh.go:459","component":"kilo","error":"failed to read the WireGuard dump output: Unable to access interface: Protocol not supported\n","level":"error","ts":"2022-03-25T08:20:12.894562702Z"}
{"caller":"mesh.go:459","component":"kilo","error":"failed to read the WireGuard dump output: Unable to access interface: Protocol not supported\n","level":"error","ts":"2022-03-25T08:20:42.897019171Z"}
{"caller":"mesh.go:459","component":"kilo","error":"failed to read the WireGuard dump output: Unable to access interface: Protocol not supported\n","level":"error","ts":"2022-03-25T08:21:12.900503875Z"}

squat (Owner) commented Mar 25, 2022

Thanks @hhstu, those logs are a bit more helpful. "Unable to access interface: Protocol not supported" seems to be a pretty common symptom of WireGuard problems.
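
A rough way to narrow that down on the affected node (not authoritative; this error from wg usually means neither a kernel WireGuard module nor a reachable userspace implementation is serving that interface):

# modprobe wireguard && echo "kernel module available" || echo "no kernel module; userspace (e.g. boringtun) required"
# ls /var/run/wireguard/        # userspace implementations expose an <interface>.sock control socket here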

hhstu (Contributor) commented Mar 25, 2022

Thanks @squat, I will continue to check the problem.

hhstu (Contributor) commented Mar 29, 2022

Thanks @hhstu those logs are a bit more helpful. Unable to access interface: Protocol not supported seems to be a pretty common symptom of WireGuard problems.

Can I add some use cases such as kubeadm-userspace.yaml and kubeadm-flannel-userspace.yaml with a PR?

squat (Owner) commented Mar 29, 2022

Hi @hhstu, yes, I'd be very interested in taking a look at a PR for that 👍. I'm curious how/why it's different; our E2E tests run on KinD, which uses kubeadm, and we test userspace.

hhstu (Contributor) commented Mar 29, 2022

There is nothing different; it was just my mistake, I forgot to set the wireguard volume on the kilo container. I wanted to add the kubeadm-userspace and kubeadm-flannel-userspace use cases for the next person.
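
For anyone hitting the same symptom with boringtun and --create-interface=false: the kilo container presumably needs the same wireguard hostPath volume mounted so it can reach the userspace control socket. A minimal sketch of the piece that was missing from the DaemonSet above:

        volumeMounts:
        - name: wireguard              # same hostPath volume the boringtun container mounts
          mountPath: /var/run/wireguard
          readOnly: false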
