This repository has been archived by the owner on Feb 12, 2024. It is now read-only.
Provide remedy controller for replacing/repairing nodes with lost IP address in vCenter #180
How to categorize this issue?
/area robustness
/kind enhancement
/priority 3
/platform vsphere
What would you like to be added:
Handle nodes that lose their IP address in vCenter and, as a consequence, in the Kubernetes node object status.
Either restart such nodes, find some way to repair them, or at least move the calico-typha-deploy-... pod to another node. This seems to be a task for a remedy controller.
The solution to resolve the root cause would probably be a fix in vSphere/vCenter, but it is unclear how long we have to wait for that.
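The detection step of such a remedy controller could be sketched as follows. This is only an illustration, not an existing implementation: the function name `nodes_without_ip` is hypothetical, the input is assumed to be the parsed JSON of a node list (for example from `kubectl get nodes -o json`), and it is assumed that the lost address shows up as a missing `InternalIP`/`ExternalIP` entry in `status.addresses`.

```python
# Address types that carry an IP address in a node's status.addresses list.
IP_ADDRESS_TYPES = {"InternalIP", "ExternalIP"}


def nodes_without_ip(node_list):
    """Return the names of nodes that report no IP address in their status.

    node_list is a dict shaped like a Kubernetes NodeList (hypothetical
    helper; assumes the lost IP appears as a missing address entry).
    """
    affected = []
    for node in node_list.get("items", []):
        # status.addresses may be absent or null on a broken node.
        addresses = node.get("status", {}).get("addresses") or []
        has_ip = any(
            addr.get("type") in IP_ADDRESS_TYPES and addr.get("address")
            for addr in addresses
        )
        if not has_ip:
            affected.append(node["metadata"]["name"])
    return affected
```

A controller could run this check periodically and only act on nodes that stay in the list for some grace period, so that a node which is still starting up is not restarted by mistake.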
Why is this needed:
Sporadically, worker nodes lose their IP address in vCenter. In such a situation, the cloud-controller-manager can no longer provide the IP address for the Kubernetes node object.
Good case:
Bad case:
In this case, pods in the kube-system namespace that run on the node's host network lose their IP address. The node itself still has the IP address assigned by DHCP.
This issue can break any cluster if the node running the calico-typha-deploy-... pod loses its IP address in the node object, because as a consequence no calico-node pod starts successfully anymore.
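For the minimal remedy of moving the typha pod, a controller would first need to know which typha pods sit on affected nodes. A rough sketch, assuming the pod list is the parsed JSON of `kubectl get pods -n kube-system -o json` and that the list of affected node names comes from some separate detection step; the helper name `typha_pods_to_move` is hypothetical:

```python
# Name prefix of the typha pods as quoted in this issue.
TYPHA_POD_PREFIX = "calico-typha-deploy-"


def typha_pods_to_move(pod_list, affected_nodes):
    """Return (pod name, node name) pairs for typha pods on affected nodes.

    pod_list is a dict shaped like a Kubernetes PodList; affected_nodes is
    an iterable of node names that lost their IP (hypothetical helper).
    """
    affected = set(affected_nodes)
    result = []
    for pod in pod_list.get("items", []):
        name = pod.get("metadata", {}).get("name", "")
        # spec.nodeName is unset while a pod is still unscheduled.
        node_name = pod.get("spec", {}).get("nodeName")
        if name.startswith(TYPHA_POD_PREFIX) and node_name in affected:
            result.append((name, node_name))
    return result
```

Deleting the returned pods alone may not be enough, since the scheduler could place the replacement on the same node; cordoning the affected node first is probably needed.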