Node behind NAT always NotReady #314
Comments
Hi @idcmp, can you share the Kilo logs from the laptop VM? That's the most important piece of information for figuring out why the node is unready.
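For anyone following along, something like the following pulls those logs; the namespace and label selector are assumptions based on the stock kilo-kubeadm.yaml manifest:

```sh
# Find the Kilo pod running on the laptop VM node
kubectl -n kube-system get pods -l app.kubernetes.io/name=kilo -o wide

# Dump its logs; --previous shows output from the last crashed container, if any
kubectl -n kube-system logs <kilo-pod-name>
kubectl -n kube-system logs <kilo-pod-name> --previous
```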
Here are some logs from the kilo pod running on the laptop VM:
Looking at journalctl, there's a bunch of:
and of course
My
Hope this helps, just let me know if there are other logs you'd like!
Hi @idcmp, you mentioned that the Kilo pod on the laptop VM was crashlooping, but the logs you shared don't mention anything about a crash. Are there more logs that do include this?
I'll see if I can follow the logs and catch it in the act. Here's a describe of the pod:
kube-proxy configmap:
kilo configmap:
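For reference, these were presumably gathered with commands along these lines (namespace and resource names assumed from a stock kubeadm install plus kilo-kubeadm.yaml):

```sh
kubectl -n kube-system describe pod <kilo-pod-name>
kubectl -n kube-system get configmap kube-proxy -o yaml
kubectl -n kube-system get configmap kilo -o yaml
```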
Caught one:
Here are the events from the pod:
Okay, I'm still experiencing WireGuard dropouts (as described in Situation below), but both nodes think they're Ready.
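Node readiness here being what the API server reports, i.e.:

```sh
# Both nodes show Ready in the STATUS column despite the dropouts
kubectl get nodes -o wide
```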
Thanks @idcmp those logs are really helpful :)
Ah, it sounds like maybe it's related to this:
I'll try testing on k8s v1.24 (and updating our e2e tests). So far our tests have been on v1.23; I wonder if this has caused an issue.
I wanted to add that this issue sounds to me like it is a bit more fundamental than WireGuard/Kilo/networking. It seems like something is breaking with the container runtime and interrupting Kilo. Once we figure out why Kilo is being killed prematurely, we can investigate any networking issues. In other words, there is no reason why Kubernetes should kill the Kilo Pod just because the network (and thus the node) isn't yet ready. Could you share any details about the container runtime? Are there any differences between the VM's runtime and the EC2 node's?
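A sketch of how those runtime details can be gathered, assuming crictl is available on the nodes:

```sh
# Runtime name and version as the kubelet sees it (CONTAINER-RUNTIME column)
kubectl get nodes -o wide

# Directly on each node
containerd --version
crictl info                                     # the runtime's view of CNI readiness
journalctl -u containerd --since "1 hour ago"   # runtime-side restarts or kill events
```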
Nope, no changes from what was fetched. The EC2 instance is running Linux 5.10.109-104.500.amzn2.x86_64 with containerd 1.4.13; the laptop VM (which, as you've probably guessed, runs Fedora) runs Linux 5.17.5-200.fc35.x86_64 with containerd 1.6.2 in VirtualBox (as spun up by Vagrant). I really think the problem relates to the EC2 instance having an Endpoint= for the laptop VM in its Peer stanza (see the Description below).
This could be part of the explanation for why networking isn't working; however, it wouldn't explain why the Kilo Pod is getting killed and the node is unready.
Added --log-level=debug to the kilo DaemonSet; will post an update when it dies.
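One way to do that, assuming the DaemonSet name from kilo-kubeadm.yaml and that kilo is the first container in the pod spec:

```sh
# Append --log-level=debug to the kilo container's args
kubectl -n kube-system patch daemonset kilo --type json \
  -p '[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--log-level=debug"}]'
```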
Ran with --log-level=debug enabled. Crashing kilo:
Non-crashing kilo:
@squat, is there anything I can do to help?
This is a duplicate of #189. I think we should close this issue. |
Setup

- kubeadm init ...
- kubeadm join ... (not as control plane)
- manifests/kilo-kubeadm.yaml from this repo
- echo "module wireguard +p" > /sys/kernel/debug/dynamic_debug/control (see the note below on reading this output)

The laptop VM node stays NotReady with:

KubeletNotReady container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized
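With dynamic debug enabled as in the setup above, WireGuard's handshake and rekey messages land in the kernel log and can be followed with standard tools:

```sh
echo "module wireguard +p" > /sys/kernel/debug/dynamic_debug/control   # as in the setup
dmesg -wT | grep -i wireguard      # or: journalctl -kf | grep -i wireguard
```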
Situation

Relevant log on the laptop VM:

The Kilo pod on the laptop VM ends up in CrashLoopBackOff.

Description
The wg configuration for the EC2 instance has

Endpoint=my.external.laptop.ip:52412

listed in the Peer configuration stanza. I think when key rotation happens, the EC2 instance tries to connect back to the laptop VM (which it can't), and the handshake errors are from the mismatch of the laptop VM trying to connect to the EC2 instance at the same time. I think if the Endpoint= weren't listed in the Peer stanza, the laptop VM would just reconnect after rekeying and life would continue on. I'm hoping that would also fix why the laptop VM is still NotReady.
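To illustrate, the Peer stanza on the EC2 instance (e.g. from wg showconf) looks roughly like this; it's a sketch in which the key and AllowedIPs are placeholders, and only the Endpoint line comes from the report above:

```ini
[Peer]
PublicKey = <laptop-vm-public-key>
AllowedIPs = <laptop-vm-pod-cidr>, <laptop-vm-internal-ip>/32
# The problematic line: the laptop VM is behind NAT, so the EC2 instance
# cannot initiate a handshake to this address after rekeying.
Endpoint = my.external.laptop.ip:52412
```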