Nodes unavailable during bootkube start when --cloud-provider=gce #463

Open
dghubble opened this Issue Apr 24, 2017 · 10 comments
@dghubble
Collaborator

dghubble commented Apr 24, 2017

When launching a GCE cluster with on-host Kubelets running --cloud-provider=gce, and creating assets with bootkube v0.4.0 and the --cloud-provider=gce render flag, bootkube start does not complete:

NAMESPACE     NAME                                                                        READY     STATUS             RESTARTS   AGE
kube-system   bootstrap-kube-apiserver-jakku-controller.c.dghubble-io.internal            1/1       Running            0          14m
kube-system   bootstrap-kube-controller-manager-jakku-controller.c.dghubble-io.internal   1/1       Running            0          13m
kube-system   bootstrap-kube-scheduler-jakku-controller.c.dghubble-io.internal            1/1       Running            0          14m
kube-system   kube-apiserver-9r5d4                                                        0/1       CrashLoopBackOff   10         14m
kube-system   kube-controller-manager-515890209-29xdg                                     0/1       Pending            0          14m
kube-system   kube-controller-manager-515890209-g08rc                                     0/1       Pending            0          14m
kube-system   kube-dns-2891576090-4q6pk                                                   0/3       Pending            0          14m
kube-system   kube-flannel-77r0p                                                          2/2       Running            0          14m
kube-system   kube-flannel-vn3wt                                                          2/2       Running            1          14m
kube-system   kube-proxy-j8ftt                                                            1/1       Running            0          14m
kube-system   kube-proxy-xnhbs                                                            1/1       Running            0          14m
kube-system   kube-scheduler-2387324182-7xvqb                                             0/1       Pending            0          14m
kube-system   kube-scheduler-2387324182-tfwkk                                             0/1       Pending            0          14m
kube-system   pod-checkpointer-f73x6                                                      1/1       Running            0          14m
kube-system   pod-checkpointer-f73x6-jakku-controller.c.dghubble-io.internal              1/1       Running            0          14m

bootstrap controller manager:

NodeController detected that all Nodes are not-Ready. Entering master disruption mode.
I0424 06:30:59.560436       1 taint_controller.go:180] Starting NoExecuteTaintManager
...
NodeController detected that some Nodes are Ready. Exiting master disruption mode.

The Pending control plane components list "no nodes available" as the reason they cannot be scheduled, even though the nodes registered successfully according to the apiserver logs.

bootstrap scheduler:

FailedScheduling        no nodes available to schedule pods

Editing the on-host Kubelet to remove the provider flag and restarting the Kubelet immediately allows "Setting Pod CIDR" to appear in the kubelet logs; the bootstrap-scheduler then sees the node, and the Pending pods can be scheduled so bootstrapping completes. That provides a clue.
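For reference, the kubelet change amounts to dropping the provider flag from the on-host invocation and restarting the unit. A minimal sketch, assuming a typical systemd-managed kubelet on Container Linux (the kubeconfig path and remaining flags are illustrative, not this cluster's exact config):

# Before: the node registers but stays NotReady while --cloud-provider=gce is set
kubelet \
  --kubeconfig=/etc/kubernetes/kubeconfig \
  --network-plugin=cni \
  --cloud-provider=gce

# After: drop --cloud-provider=gce, then `systemctl daemon-reload && systemctl restart kubelet`;
# "Setting Pod CIDR" appears in the kubelet logs and the bootstrap-scheduler can place pods
kubelet \
  --kubeconfig=/etc/kubernetes/kubeconfig \
  --network-plugin=cni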

I've not yet found kubelet or apiserver logs that are clearly causal.

Possible Solution

I've tried various things, and I was able to solve this by adding --cloud-provider=gce to the bootstrap-apiserver and bootstrap-controller-manager (it's not set there) and by setting --configure-cloud-routes=true on the bootstrap controller manager (alongside --allocate-node-cidrs=true, which is already set). This is not altogether satisfying, since "aws" supposedly works without this, which is puzzling.
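For concreteness, the flag changes amount to roughly the following on the bootstrap manifests rendered by bootkube (a sketch of only the relevant flags; everything else rendered stays as-is):

# bootstrap-apiserver: add the provider (the other rendered flags are unchanged)
kube-apiserver --cloud-provider=gce

# bootstrap-controller-manager: add the provider and enable cloud routes
# (--allocate-node-cidrs=true is already set by render; the other rendered flags are unchanged)
kube-controller-manager --cloud-provider=gce --allocate-node-cidrs=true --configure-cloud-routes=true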

Similar

Seems similar to the symptoms in kubernetes/kubernetes#44254, though the setup there is not self-hosted.

Versions

  • Hyperkube v1.6.1_coreos.0
  • bootkube v0.4.0 (both render and start)
  • Container Linux stable 1298.7.0
@aaronlevy


Member

aaronlevy commented Apr 24, 2017

The behavior changed slightly in v1.6: a node now reports as "not ready" if its network is unavailable, so the nodes are registering but cannot accept scheduled work. Our assumption is that flannel will eventually be deployed as a DaemonSet, which gets assigned to the nodes directly (not by the scheduler), then deploys the CNI config, and the node should then start reporting as ready (and accept scheduled work).

So there might be something in that process that is not happening (e.g. is flannel actually starting? why does the node continue to think it's not ready?).
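A couple of quick checks that might narrow that down (a sketch; the node name is taken from the pod listing above, and the second node's name will differ):

kubectl get nodes -o wide
kubectl describe node jakku-controller.c.dghubble-io.internal
# look at Conditions (Ready / NetworkUnavailable) and any "network plugin is not ready" style message
kubectl -n kube-system get daemonsets,pods -o wide
# confirms the flannel DaemonSet actually landed a pod on every node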

I mentioned this in #464, but we should not be using --configure-cloud-routes=true, because we assume a CNI plugin will be responsible for this.

@dghubble


Collaborator

dghubble commented Apr 24, 2017

Ok, --configure-cloud-routes=false being important sounds familiar. The flannel pods are running (though their logs may be interesting), and I believe the kubelet logs included some lines reporting the network as ready.

@aaronlevy

This comment has been minimized.

Member

aaronlevy commented Jun 22, 2017

@squeed


squeed commented Jun 22, 2017

Indeed - if you're thinking to yourself "it worked on AWS, dammit!?!?" - that's the reason.

@colemickens


Contributor

colemickens commented Oct 17, 2017

@aaronlevy Any ideas why that code upstream doesn't just check the value of configure-cloud-routes instead of inferring it from whether the cloud provider supports the Routes interface?

@aaronlevy


Member

aaronlevy commented Oct 17, 2017

@colemickens I don't think there is any canonical place to check the value of configure-cloud-routes. In the case of bootkube it's a self-hosted cluster, so I guess you could hypothetically read the flags of the controller-manager. But there's no API (to my knowledge) for "how is this component configured", and the controller-manager could be running as a binary on a random host.

@enxebre


enxebre commented Oct 18, 2017

@aaronlevy @colemickens
At the moment, because of https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/kubelet_network.go#L85, the following problematic scenario happens when running the Kubelet with --cloud-provider=gce: the node always sets the NetworkUnavailable condition to true, and this condition is only cleared by the cloud provider's route controller, which runs when --configure-cloud-routes=true is set on the controller manager (the default).

At this point:

  • If you run --network-plugin=kubenet, which requires cloud routes (so you must have set --configure-cloud-routes=true), NetworkUnavailable is cleared when the routes are created and you are good:
    https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/route/route_controller.go?utf8=%E2%9C%93#L134
    https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/route/route_controller.go?utf8=%E2%9C%93#L227
  • If you run --network-plugin=cni, then clearing NetworkUnavailable depends on your CNI implementation (or something else), and none of them do it, so the nodes stay unavailable forever, even if pod-to-pod networking is actually functional (a quick check for this is sketched below).
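A quick way to see that condition directly (a sketch; <node-name> is a placeholder):

kubectl get node <node-name> \
  -o jsonpath='{.status.conditions[?(@.type=="NetworkUnavailable")].status}{"\n"}'
# prints "True" for as long as nothing (route controller, CNI plugin, or otherwise) has cleared it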

Also, the kubelet's nodeName (which is the hostname) and the GCP machine name must be the same, otherwise the routes keep being recreated in a loop:
https://github.com/kubernetes/kubernetes/blob/master/pkg/cloudprovider/providers/gce/gce_routes.go?utf8=%E2%9C%93#L63
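A quick way to compare the two names on a node (a sketch; the URL is GCE's instance-name metadata endpoint, and --hostname-override is the standard kubelet flag for forcing the node name):

hostname
curl -s -H "Metadata-Flavor: Google" \
  http://metadata.google.internal/computeMetadata/v1/instance/name
# If these differ, the GCE route controller keeps re-creating routes; aligning them
# (for example via the kubelet's --hostname-override flag) avoids the loop.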

I managed to deploy a self-hosted Kubernetes cluster using bootkube, with all components using --cloud-provider=gce, by using kubenet and --configure-cloud-routes=true. This should probably be solved as part of the refactor that moves cloud providers out of the main Kubernetes tree, I guess: kubernetes/kubernetes#50811

Also, I'm not sure why this one got closed: kubernetes/kubernetes#34398

@aaronlevy


Member

aaronlevy commented Oct 18, 2017

For GCE you don't really need a network overlay - GCE networking supports the model that Kubernetes expects - so it's probably ideal to use --configure-cloud-routes=true and --network-plugin=kubenet.
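Roughly, that combination looks like the following (a sketch of only the relevant flags; the cluster CIDR value is just an example and must match the cluster's actual pod CIDR):

# kubelet, on each host
kubelet --cloud-provider=gce --network-plugin=kubenet

# controller manager
kube-controller-manager --cloud-provider=gce --allocate-node-cidrs=true \
  --configure-cloud-routes=true --cluster-cidr=10.2.0.0/16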

But agree - this could be better handled upstream.

@squeed


squeed commented Nov 2, 2017

As long as you don't hit your 300-route limit (or whatever your quota is)...
...
yeah.
