Nodes unavailable during bootkube start when --cloud-provider=gce #463
Launching a GCE cluster with on-host Kubelets running with `--cloud-provider=gce` hangs during `bootkube start`, including the bootstrap controller manager: Pending control plane components list "no nodes" as the reason they cannot be scheduled, though the nodes were registered successfully according to the apiserver logs.
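For reference, the stuck state looks roughly like this (a sketch; the kubeconfig path and pod names are illustrative, and exact event messages vary by version):

```shell
# Sketch: inspecting the stuck bootstrap control plane.
kubectl --kubeconfig=/etc/kubernetes/kubeconfig get pods -n kube-system
# kube-controller-manager and kube-scheduler pods stay Pending

# The pod's Events section shows the scheduler reporting no available nodes
# (pod name here is hypothetical).
kubectl --kubeconfig=/etc/kubernetes/kubeconfig describe pod \
    -n kube-system kube-scheduler-xxxx
```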
Editing the on-host Kubelet configuration to remove the cloud provider and restarting the Kubelet immediately produces "Setting Pod CIDR" in the kubelet logs, the bootstrap-scheduler sees the node, and the Pending pods can be scheduled, completing bootstrapping. That provides a clue.
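The workaround above can be sketched as follows, assuming the Kubelet runs as a systemd unit with its flags in a drop-in file (the drop-in path is illustrative; edit whichever file actually carries the flags on your hosts):

```shell
# Remove the cloud provider flag from the kubelet invocation
# (hypothetical drop-in path).
sudo sed -i 's/ --cloud-provider=gce//' \
    /etc/systemd/system/kubelet.service.d/10-kubelet.conf

sudo systemctl daemon-reload
sudo systemctl restart kubelet

# The kubelet log should now show "Setting Pod CIDR", after which the
# bootstrap-scheduler can place the Pending pods.
journalctl -u kubelet -f | grep "Setting Pod CIDR"
```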
I've not yet found kubelet or apiserver logs that are clearly causal.
I've tried various things and I was able to solve this by adding
Seems similar to the symptoms in kubernetes/kubernetes#44254, though the setup there is not self-hosted.
The behavior changed slightly in v1.6: a node now reports as "not ready" if its network is unavailable, so the nodes are registering but are not able to accept scheduled work. Our assumption is that flannel will eventually be deployed as a DaemonSet, whose pods are assigned to nodes directly (not by the scheduler); it then deploys the CNI config, after which the node should start reporting as ready (and accept scheduled work).
So there might be something in that process that is not occurring (e.g. is flannel being started? why does the node continue to think it's not ready?).
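One way to see where that process stalls (a sketch, assuming kubectl access to the bootstrap apiserver; `<node-name>` is a placeholder):

```shell
# Is flannel actually running on the node?
kubectl get pods -n kube-system -o wide | grep flannel

# Which condition is keeping the node not-ready?
kubectl describe node <node-name> | sed -n '/Conditions:/,/Addresses:/p'
# Look for Ready=False/Unknown and NetworkUnavailable=True
```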
I mentioned in #464 -- but we should not be using the
@colemickens I don't think there is any canonical place to check the value of
At this point:

- If you run `--network-plugin=cni`, then the `NetworkUnavailable` condition depends on your CNI implementation (or something else) to clear it, and none of them do this. Nodes stay unavailable forever, even if your pod-to-pod network is actually functional.
- Also, the (kubelet) nodeName, which is the hostname, and the GCE machine name must be the same; otherwise the routes keep being recreated in a loop.
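To check whether the `NetworkUnavailable` condition is what's blocking a given node (a sketch; `<node-name>` is a placeholder):

```shell
kubectl get node <node-name> \
  -o jsonpath='{.status.conditions[?(@.type=="NetworkUnavailable")].status}'
# "True" means nothing has cleared the condition, so the node stays
# unavailable even if pod-to-pod networking actually works.
```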
I managed to deploy a self-hosted Kubernetes with bootkube, with all components using `cloud-provider=gce`, by using kubenet and `--configure-cloud-routes=true`. This should probably be solved as part of the cloud-provider refactor out of the main Kubernetes tree, I guess (kubernetes/kubernetes#50811).
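The working combination can be sketched as the following flag layout (a sketch, not a full invocation; the `--cluster-cidr` value is illustrative): kubenet on the kubelet, CIDR allocation and GCE route programming on the controller manager, which is what clears `NetworkUnavailable` on the nodes.

```shell
# kubelet: use kubenet instead of CNI, keep the cloud provider
kubelet --cloud-provider=gce --network-plugin=kubenet ...

# controller manager: allocate pod CIDRs and program GCE routes
kube-controller-manager --cloud-provider=gce \
  --allocate-node-cidrs=true \
  --cluster-cidr=10.2.0.0/16 \
  --configure-cloud-routes=true ...
```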
Also, I'm not sure why this one got closed: kubernetes/kubernetes#34398.