We run a Kubernetes cluster with Calico, and sporadically (roughly once a month) the cluster goes down. The Calico pod on one of the nodes goes into a perpetual CrashLoopBackOff state, and that node's loopback interface is down (the node cannot even ping itself). One thing to note is that we run roughly 200 pods on that node.
Expected Behavior
The cluster should be stable.
Current Behavior
Output of ip address show lo on the node:
lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state DOWN group default qlen 1000
Output of calicoctl node status on the node:
$ sudo calicoctl node status
Calico process is not running.
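The CrashLoopBackOff itself can be confirmed from the cluster side. A minimal sketch, assuming Calico is deployed into kube-system with the standard k8s-app=calico-node label (verify both for your installation):

$ # List calico-node pods with their status and node placement
$ kubectl -n kube-system get pods -l k8s-app=calico-node -o wide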
Possible Solution
Our workaround is to bring the loopback interface back up manually with ip link set dev lo up and then restart the Calico pod.
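A minimal sketch of that workaround, run on the affected node. It again assumes the standard kube-system namespace and k8s-app=calico-node label, and that the node's Kubernetes name matches its hostname; adjust all three for your deployment:

$ # Re-enable the loopback interface
$ sudo ip link set dev lo up
$ # Delete the crashing calico-node pod on this node so the
$ # DaemonSet recreates it ($(hostname) assumes node name == hostname)
$ kubectl -n kube-system delete pod -l k8s-app=calico-node \
    --field-selector spec.nodeName=$(hostname)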
Steps to Reproduce (for bugs)
The issue pops up sporadically.
Context
Our guess is that this is related to running more than the default Kubernetes limit of 110 pods per node.
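To check that hypothesis, the node's configured pod capacity can be compared against the number of pods actually scheduled on it. A hedged sketch; my-node is a placeholder for the affected node's name:

$ # Configured pod capacity of the node (defaults to 110)
$ kubectl get node my-node -o jsonpath='{.status.capacity.pods}'
$ # Pods currently scheduled on the node
$ kubectl get pods --all-namespaces \
    --field-selector spec.nodeName=my-node --no-headers | wc -l

If the higher pod count is intentional, the limit is normally raised explicitly via the kubelet's maxPods setting (KubeletConfiguration) rather than exceeded implicitly.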
Your Environment
- Calico version: 3.17.0
- Orchestrator version: Kubernetes 1.19.4
- Operating System and version: Ubuntu 18.04.4 LTS (GNU/Linux 4.15.0-126-generic x86_64)
- Link to your project (optional): https://github.com/gesiscss/orc