kubeadm 1.6.0 (only 1.6.0!!) is broken due to unconfigured CNI making kubelet NotReady #43815

Closed
jbeda opened this Issue Mar 29, 2017 · 211 comments

@jbeda
Contributor

jbeda commented Mar 29, 2017

Initial report in kubernetes/kubeadm#212.

I suspect that this was introduced in #43474.

What is going on (all on single master):

  1. kubeadm starts and configures a kubelet, then uses static pods to bring up a control plane
  2. kubeadm creates the node object and waits for the kubelet to join and become ready
  3. the kubelet never becomes ready, so kubeadm waits forever

In the conditions list for the node:

  Ready 		False 	Wed, 29 Mar 2017 15:54:04 +0000 	Wed, 29 Mar 2017 15:32:33 +0000 	KubeletNotReady 		runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized

Previous behavior was for the kubelet to join the cluster even with unconfigured CNI. The user will then typically run a DaemonSet with host networking to bootstrap CNI on all nodes. The fact that the node never joins means that, fundamentally, DaemonSets cannot be used to bootstrap CNI.

Edit by @mikedanese: please test patched debian amd64 kubeadm #43815 (comment) with fix
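For anyone triaging, the stuck state in step 3 can be confirmed from the master itself. A sketch (assumes kubectl is pointed at the half-initialized cluster; the lines are guarded so they degrade gracefully on a box where the tools or paths are absent):

```shell
# The kubelet flips NetworkReady=false simply because /etc/cni/net.d holds
# no CNI config yet; on an affected master both checks confirm it:
ls /etc/cni/net.d/ 2>/dev/null || echo "no CNI config installed yet"
kubectl get nodes \
  -o jsonpath='{.items[*].status.conditions[?(@.type=="Ready")].status}' \
  2>/dev/null || echo "kubectl not reachable here"
```

On a broken master the jsonpath query prints `False` and the kubelet journal contains the `cni config uninitialized` message from the condition above.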

@jbeda
Contributor

jbeda commented Mar 29, 2017

If we revert #43474 completely, we are in a situation again where we break 0.2.0 CNI plugins (see #43014)

Should we consider doing something like #43284?

Also /cc @thockin


@dcbw
Member

dcbw commented Mar 29, 2017

@jbeda can I get some kubelet logs with --loglevel=5?

@jbeda
Contributor

jbeda commented Mar 29, 2017

@yujuhong -- you mention that you think that this is working as intended. Regardless, kubeadm was depending on this behavior. We introduced a breaking change with #43474. We can talk about the right way to fix this for 1.7 but, for now, we need to get kubeadm working again.

@jbeda
Contributor

jbeda commented Mar 29, 2017

It looks like DaemonSets will still get scheduled even if the node is not ready. This is really, in this case, kubeadm being a little too paranoid.

The current plan that we are going to test out is to have kubeadm no longer wait for the master node to be ready but instead just have it be registered. This should be good enough to let a CNI DaemonSet be scheduled to set up CNI.

@kensimon is testing this out.
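If that plan holds, a registered-but-NotReady master should still receive CNI DaemonSet pods. A sketch of how to watch for that (assumes kubectl is configured against the cluster; guarded so the lines no-op elsewhere):

```shell
# The node shows NotReady while the DaemonSet controller, which ignores
# readiness, still creates a CNI pod for it:
kubectl get nodes 2>/dev/null || echo "kubectl not reachable here"
kubectl -n kube-system get daemonsets,pods -o wide 2>/dev/null || true
```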

@mausch referenced this issue in kubernetes/kubeadm Mar 29, 2017

Closed

kubeadm 1.5.6 package depends on kubelet 1.6.0 #213

@dcbw
Member

dcbw commented Mar 29, 2017

@jbeda yeah, looks like the DaemonSet controller will still enqueue them mainly because it's completely ignorant of network-iness. We should really fix this more generally. Is there anything immediate to do in kube or is it all in kubeadm for now?

@luhkevin

luhkevin commented Mar 29, 2017

I'm trying to install kubernetes with kubeadm on Ubuntu 16.04. Is there a quick fix for this?

@stevenbower

stevenbower commented Mar 29, 2017

@jbeda if you have a patched version, happy to test it.

@kensimon
Contributor

kensimon commented Mar 29, 2017

I have kubeadm getting past the node's NotReady status, but the dummy deployment it creates isn't working due to the node.alpha.kubernetes.io/notReady taint preventing it from running. Adding tolerations doesn't seem to help, I'm not exactly sure how to proceed at this point. Can anybody shed some light on how to deploy a pod that tolerates the notReady taint?

I'm exploring some other options like not marking the node as notReady, but it's not clear that's what we want to do.
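For context, the toleration under discussion would look roughly like this in a 1.6-era pod spec, where tolerations moved from annotations into spec fields. The pod name and image below are illustrative placeholders, not kubeadm's actual dummy deployment:

```shell
# Sketch of a pod spec tolerating the NoExecute notReady taint; pipe the
# manifest to `kubectl apply -f -` to try it on a stuck master.
manifest='
apiVersion: v1
kind: Pod
metadata:
  name: cni-bootstrap-test
spec:
  containers:
  - name: pause
    image: gcr.io/google_containers/pause:3.0
  tolerations:
  - key: node.alpha.kubernetes.io/notReady
    operator: Exists
    effect: NoExecute
'
printf '%s' "$manifest"
```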

@sbezverk
Contributor

sbezverk commented Mar 29, 2017

We worked around it by removing KUBELET_NETWORK_ARGS from the kubelet command line. After that, kubeadm init worked fine and we were able to install the canal CNI plugin.

@overip

overip commented Mar 29, 2017

@sbezverk would you please describe how to do that?

@coeki

coeki commented Mar 29, 2017

Can confirm @sbezverk's findings (good find :) ): adjusting /etc/systemd/system/10-kubeadm.conf and removing KUBELET_NETWORK_ARGS makes it run on CentOS. Tested with weave.

@sbezverk
Contributor

sbezverk commented Mar 29, 2017

@overip you need to edit /etc/systemd/system/kubelet.service.d/10-kubeadm.conf

ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_SYSTEM_PODS_ARGS $KUBELET_NETWORK_ARGS $KUBELET_DNS_ARGS $KUBELET_AUTHZ_ARGS $KUBELET_EXTRA_ARGS

remove $KUBELET_NETWORK_ARGS

and then restart kubelet; after that kubeadm init should work.
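The edit amounts to dropping the one variable from the ExecStart line. A sketch, demonstrated on a sample copy of the line so nothing is touched until you are ready; apply the same sed to /etc/systemd/system/kubelet.service.d/10-kubeadm.conf and then run `systemctl daemon-reload && systemctl restart kubelet`:

```shell
# Drop $KUBELET_NETWORK_ARGS from the kubelet ExecStart line (shown on a
# sample string; point the sed at the real drop-in file to apply it):
line='ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_SYSTEM_PODS_ARGS $KUBELET_NETWORK_ARGS $KUBELET_DNS_ARGS $KUBELET_AUTHZ_ARGS $KUBELET_EXTRA_ARGS'
printf '%s\n' "$line" | sed 's/ \$KUBELET_NETWORK_ARGS//'
```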

@jp557198

jp557198 commented Mar 29, 2017

This is what I did:

kubeadm reset

remove the ENV entries from:

/etc/systemd/system/kubelet.service.d/10-kubeadm.conf

reload systemd and the kube services:

systemctl daemon-reload
systemctl restart kubelet.service

re-run init:

kubeadm init

@coeki

coeki commented Mar 29, 2017

All correct, and while we're at it:

If you see this:
kubelet: error: failed to run Kubelet: failed to create kubelet: misconfiguration: kubelet cgroup driver: "cgroupfs" is different from docker cgroup driver: "systemd"

you have to edit your /etc/systemd/system/kubelet.service.d/10-kubeadm.conf and add the flag --cgroup-driver="systemd"

and do as above:

kubeadm reset
systemctl daemon-reload
systemctl restart kubelet.service
kubeadm init
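As a sketch of that flag edit, again on a sample ExecStart line rather than the live file (the sample is illustrative; apply the same sed to the real drop-in, and `docker info | grep -i 'cgroup driver'` shows which driver docker itself is using):

```shell
# Insert --cgroup-driver=systemd into the kubelet invocation (sample line;
# edit /etc/systemd/system/kubelet.service.d/10-kubeadm.conf for real, then
# run the reset/daemon-reload/restart/init sequence above):
line='ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_EXTRA_ARGS'
printf '%s\n' "$line" | sed 's|/usr/bin/kubelet|/usr/bin/kubelet --cgroup-driver=systemd|'
```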

@kensimon
Contributor

kensimon commented Mar 29, 2017

I'd be careful removing --network-plugin=cni from the kubelet CLI flags, this causes kubelet to use the no_op plugin by default... I would be surprised if common plugins like calico/weave would even work in this case (but then again my understanding of how these plugins operate underneath is a bit limited.)

@sbezverk
Contributor

sbezverk commented Mar 29, 2017

@kensimon hm, have not seen any issues on my setup, I deployed the canal CNI plugin and it worked fine.

@resouer
Member

resouer commented Mar 29, 2017

@sbezverk Is cross host networking also working well?

@sbezverk
Contributor

sbezverk commented Mar 29, 2017

@resouer cannot confirm, I have 1.6.0 only as All-In-One.

@coeki

coeki commented Mar 29, 2017

@resouer @sbezverk I successfully joined a machine.

[root@deploy-01 x86_64]# kubectl get nodes
NAME        STATUS    AGE       VERSION
deploy-01   Ready     51m       v1.6.0
master-01   Ready     4m        v1.6.0

NAME                                    READY     STATUS    RESTARTS   AGE
etcd-deploy-01                          1/1       Running   0          50m
kube-apiserver-deploy-01                1/1       Running   0          51m
kube-controller-manager-deploy-01       1/1       Running   0          50m
kube-dns-3913472980-6plgh               3/3       Running   0          51m
kube-proxy-mbvdh                        1/1       Running   0          4m
kube-proxy-rmp36                        1/1       Running   0          51m
kube-scheduler-deploy-01                1/1       Running   0          50m
kubernetes-dashboard-2396447444-fm8cz   1/1       Running   0          24m
weave-net-3t487                         2/2       Running   0          44m
weave-net-hhcqp                         2/2       Running   0          4m

@stevenbower

stevenbower commented Mar 29, 2017

The workaround works, but I can't get flannel going...

@sbezverk
Contributor

sbezverk commented Mar 29, 2017

@stevenbower worst case scenario, you can put back this setting and restart kubelet when you are done with kubeadm business.

@webwurst

webwurst commented Mar 29, 2017

I got a three node cluster with weave working. Not sure how stable this might be with this hack, but thanks anyway! 😃

@coeki

coeki commented Mar 29, 2017

On a side note, you can put back $KUBELET_NETWORK_ARGS after the init on the master passes. I actually did not remove it on the machine I joined, only the cgroup-driver flag, otherwise kubelet and docker won't work together.

But you don't have to kubeadm reset, just change /etc/systemd/system/kubelet.service.d/10-kubeadm.conf and do the systemctl dance:

systemctl daemon-reload
systemctl restart kubelet.service

kensimon pushed a commit to kensimon/aws-quickstart that referenced this issue Mar 29, 2017

Ken Simon
WIP: Initial support for 1.6
1.6 final is out, but there's still issues with kubeadm
(kubernetes/kubernetes#43815), this has a patched
version just for testing.

Still to do:

- Wait for kubernetes/kubernetes#43824 to be merged or
  for another solution to be available
- Deploy the AMI to the rest of the regions (this only works in us-west-1)

Signed-off-by: Ken Simon <ninkendo@gmail.com>
@ReSearchITEng

ReSearchITEng commented Apr 12, 2017

Adding "--cgroup-driver=systemd" to the kubelet causes a new issue on CentOS/RHEL 7.3 (fully up to date - aka docker 1.10.3):

Apr 12 14:23:25 machine01 kubelet[3026]: W0412 14:23:25.542322    3026 docker_service.go:196] No cgroup driver is set in Docker
Apr 12 14:23:25 machine01 kubelet[3026]: W0412 14:23:25.542343    3026 docker_service.go:197] Falling back to use the default driver: "cgroupfs"
Apr 12 14:23:25 machine01 kubelet[3026]: error: failed to run Kubelet: failed to create kubelet: misconfiguration: kubelet cgroup driver: "systemd" is different from docker cgroup driver: "cgroupfs"

while we can see clearly that native.cgroupdriver=systemd is set in the docker daemon:

ps -ef | grep -i docker
root      4365     1  3 14:30 ?        00:00:33 /usr/bin/docker-current daemon --authorization-plugin=rhel-push-plugin --exec-opt native.cgroupdriver=systemd --selinux-enabled --log-driver=journald --insecure-registry 172.30.0.0/16 --storage-driver devicemapper --storage-opt dm.fs=xfs --storage-opt dm.thinpooldev=/dev/mapper/vg.docker--pool --storage-opt dm.use_deferred_removal=true --storage-opt dm.use_deferred_deletion=true
@sbezverk
Contributor

sbezverk commented Apr 12, 2017

@ReSearchITEng why don't you update docker to 1.12.6? Works like a charm with this version.

@ReSearchITEng

ReSearchITEng commented Apr 12, 2017

@sbezverk: I just updated to 1.12.5 and now it's working! Many thanks!

@ReSearchITEng

ReSearchITEng commented Apr 14, 2017

Thanks all for the help.
Finally a fully working k8s 1.6.1 with flannel. Everything is now in ansible playbooks.
Tested on CentOS/RHEL. Preparations have started for Debian-based distros also (e.g. Ubuntu), but it might need some refining.

https://github.com/ReSearchITEng/kubeadm-playbook/blob/master/README.md

PS: work based on sjenning/kubeadm-playbook - many thanks @sjenning!

@thiagodasilva

thiagodasilva commented Apr 26, 2017

@sjenning @ReSearchITEng
Hi, I have a vagrant+ansible playbook [0] very similar to what you have created, but I'm still unable to get it working, although for me it is failing on the networking setup. I have tried calico, weave, and flannel, and all three fail (although with different symptoms).

I'm running these versions:

[vagrant@master ~]$ docker --version
Docker version 1.12.6, build 3a094bd/1.12.6
[vagrant@master ~]$ kubelet --version
Kubernetes v1.6.1
[vagrant@master ~]$ kubeadm version
kubeadm version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.1", GitCommit:"b0b7a323cc5a4a2019b2e9520c21c7830b7f708e", GitTreeState:"clean", BuildDate:"2017-04-03T20:33:27Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}

errors:

[vagrant@master ~]$ kubectl get all --all-namespaces
NAMESPACE     NAME                                           READY     STATUS             RESTARTS   AGE
kube-system   po/calico-etcd-gvrhd                           1/1       Running            0          47m
kube-system   po/calico-node-7jvs8                           1/2       CrashLoopBackOff   12         45m
kube-system   po/calico-node-7ljpn                           2/2       Running            0          47m
kube-system   po/calico-node-w15z3                           1/2       CrashLoopBackOff   12         45m
kube-system   po/calico-node-zq3zx                           1/2       CrashLoopBackOff   12         45m
kube-system   po/calico-policy-controller-1777954159-13x01   1/1       Running            0          47m
kube-system   po/etcd-master                                 1/1       Running            0          46m
kube-system   po/kube-apiserver-master                       1/1       Running            0          46m
kube-system   po/kube-controller-manager-master              1/1       Running            0          46m
kube-system   po/kube-dns-3913472980-16m01                   3/3       Running            0          47m
kube-system   po/kube-proxy-70bmf                            1/1       Running            0          45m
kube-system   po/kube-proxy-9642h                            1/1       Running            0          45m
kube-system   po/kube-proxy-jhtvm                            1/1       Running            0          45m
kube-system   po/kube-proxy-nb7q5                            1/1       Running            0          47m
kube-system   po/kube-scheduler-master                       1/1       Running            0          46m

NAMESPACE     NAME              CLUSTER-IP      EXTERNAL-IP   PORT(S)         AGE
default       svc/kubernetes    10.96.0.1       <none>        443/TCP         47m
kube-system   svc/calico-etcd   10.96.232.136   <none>        6666/TCP        47m
kube-system   svc/kube-dns      10.96.0.10      <none>        53/UDP,53/TCP   47m

NAMESPACE     NAME                              DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
kube-system   deploy/calico-policy-controller   1         1         1            1           47m
kube-system   deploy/kube-dns                   1         1         1            1           47m

NAMESPACE     NAME                                     DESIRED   CURRENT   READY     AGE
kube-system   rs/calico-policy-controller-1777954159   1         1         1         47m
kube-system   rs/kube-dns-3913472980                   1         1         1         47m
[vagrant@master ~]$ kubectl -n kube-system describe po/calico-node-zq3zx
Name:		calico-node-zq3zx
Namespace:	kube-system
Node:		node1/192.168.10.101
Start Time:	Wed, 26 Apr 2017 19:37:35 +0000
Labels:		k8s-app=calico-node
		pod-template-generation=1
Annotations:	kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"DaemonSet","namespace":"kube-system","name":"calico-node","uid":"844cd287-2ab7-11e7-b184-5254008815b6","ap...
		scheduler.alpha.kubernetes.io/critical-pod=
Status:		Running
IP:		192.168.10.101
Controllers:	DaemonSet/calico-node
Containers:
  calico-node:
    Container ID:	docker://ca00b0a73a073a2d2e39cb0cc315b8366eaa20e2e479002dd16134b0d1e94f0b
    Image:		quay.io/calico/node:v1.1.3
    Image ID:		docker-pullable://quay.io/calico/node@sha256:8e62eee18612a6ac7bcae90afaba0ed95265baba7bf3c0ab632b7b40ddfaf603
    Port:		
    State:		Waiting
      Reason:		CrashLoopBackOff
    Last State:		Terminated
      Reason:		Error
      Exit Code:	1
      Started:		Mon, 01 Jan 0001 00:00:00 +0000
      Finished:		Wed, 26 Apr 2017 20:21:09 +0000
    Ready:		False
    Restart Count:	12
    Requests:
      cpu:	250m
    Environment:
      ETCD_ENDPOINTS:				<set to the key 'etcd_endpoints' of config map 'calico-config'>	Optional: false
      CALICO_NETWORKING_BACKEND:		<set to the key 'calico_backend' of config map 'calico-config'>	Optional: false
      CALICO_DISABLE_FILE_LOGGING:		true
      FELIX_DEFAULTENDPOINTTOHOSTACTION:	ACCEPT
      CALICO_IPV4POOL_CIDR:			192.168.0.0/16
      CALICO_IPV4POOL_IPIP:			always
      FELIX_IPV6SUPPORT:			false
      FELIX_LOGSEVERITYSCREEN:			info
      IP:					
    Mounts:
      /lib/modules from lib-modules (ro)
      /var/run/calico from var-run-calico (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from calico-cni-plugin-token-5wnmg (ro)
  install-cni:
    Container ID:	docker://442c3adfa908f76654bb54070ef5ff638e4b68e0331ea0555ae877ce583ce858
    Image:		quay.io/calico/cni:v1.7.0
    Image ID:		docker-pullable://quay.io/calico/cni@sha256:3612ffb0bff609d65311b45f4bae57fa80a05d25e1580ceb83ba4162e2ceef9f
    Port:		
    Command:
      /install-cni.sh
    State:		Running
      Started:		Wed, 26 Apr 2017 19:38:29 +0000
    Ready:		True
    Restart Count:	0
    Environment:
      ETCD_ENDPOINTS:		<set to the key 'etcd_endpoints' of config map 'calico-config'>		Optional: false
      CNI_NETWORK_CONFIG:	<set to the key 'cni_network_config' of config map 'calico-config'>	Optional: false
    Mounts:
      /host/etc/cni/net.d from cni-net-dir (rw)
      /host/opt/cni/bin from cni-bin-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from calico-cni-plugin-token-5wnmg (ro)
Conditions:
  Type		Status
  Initialized 	True 
  Ready 	False 
  PodScheduled 	True 
Volumes:
  lib-modules:
    Type:	HostPath (bare host directory volume)
    Path:	/lib/modules
  var-run-calico:
    Type:	HostPath (bare host directory volume)
    Path:	/var/run/calico
  cni-bin-dir:
    Type:	HostPath (bare host directory volume)
    Path:	/opt/cni/bin
  cni-net-dir:
    Type:	HostPath (bare host directory volume)
    Path:	/etc/cni/net.d
  calico-cni-plugin-token-5wnmg:
    Type:	Secret (a volume populated by a Secret)
    SecretName:	calico-cni-plugin-token-5wnmg
    Optional:	false
QoS Class:	Burstable
Node-Selectors:	<none>
Tolerations:	CriticalAddonsOnly=:Exists
		node-role.kubernetes.io/master=:NoSchedule
		node.alpha.kubernetes.io/notReady=:Exists:NoExecute for 300s
		node.alpha.kubernetes.io/unreachable=:Exists:NoExecute for 300s
Events:
  FirstSeen	LastSeen	Count	From		SubObjectPath			Type		Reason		Message
  ---------	--------	-----	----		-------------			--------	------		-------
  46m		46m		1	kubelet, node1	spec.containers{calico-node}	Normal		Pulling		pulling image "quay.io/calico/node:v1.1.3"
  45m		45m		1	kubelet, node1	spec.containers{calico-node}	Normal		Pulled		Successfully pulled image "quay.io/calico/node:v1.1.3"
  45m		45m		1	kubelet, node1	spec.containers{calico-node}	Normal		Created		Created container with id e035a82202b2c8490e879cb9647773158ff05def6c60b31a001e23e6d288a977
  45m		45m		1	kubelet, node1	spec.containers{calico-node}	Normal		Started		Started container with id e035a82202b2c8490e879cb9647773158ff05def6c60b31a001e23e6d288a977
  45m		45m		1	kubelet, node1	spec.containers{install-cni}	Normal		Pulling		pulling image "quay.io/calico/cni:v1.7.0"
  45m		45m		1	kubelet, node1	spec.containers{install-cni}	Normal		Pulled		Successfully pulled image "quay.io/calico/cni:v1.7.0"
  45m		45m		1	kubelet, node1	spec.containers{install-cni}	Normal		Created		Created container with id 442c3adfa908f76654bb54070ef5ff638e4b68e0331ea0555ae877ce583ce858
  45m		45m		1	kubelet, node1	spec.containers{install-cni}	Normal		Started		Started container with id 442c3adfa908f76654bb54070ef5ff638e4b68e0331ea0555ae877ce583ce858
  44m		44m		1	kubelet, node1	spec.containers{calico-node}	Normal		Created		Created container with id 163a9073070aa52ce7ee98c798ffe130a581e4fdbbc503540ed5d3b79651c549
  44m		44m		1	kubelet, node1	spec.containers{calico-node}	Normal		Started		Started container with id 163a9073070aa52ce7ee98c798ffe130a581e4fdbbc503540ed5d3b79651c549
  44m		44m		1	kubelet, node1					Warning		FailedSync	Error syncing pod, skipping: failed to "StartContainer" for "calico-node" with CrashLoopBackOff: "Back-off 10s restarting failed container=calico-node pod=calico-node-zq3zx_kube-system(c983e5d0-2ab7-11e7-b184-5254008815b6)"

  44m	44m	1	kubelet, node1	spec.containers{calico-node}	Normal	Started		Started container with id 07453d944dfb9a4ebae57c83158e4b51f8870bcab94b4f706239f6c0b93bb62d
  44m	44m	1	kubelet, node1	spec.containers{calico-node}	Normal	Created		Created container with id 07453d944dfb9a4ebae57c83158e4b51f8870bcab94b4f706239f6c0b93bb62d
  43m	43m	2	kubelet, node1					Warning	FailedSync	Error syncing pod, skipping: failed to "StartContainer" for "calico-node" with CrashLoopBackOff: "Back-off 20s restarting failed container=calico-node pod=calico-node-zq3zx_kube-system(c983e5d0-2ab7-11e7-b184-5254008815b6)"

@sjenning @ReSearchITEng
Hi, I have a vagrant+ansible playbook [0] very similar to what you have created, but I'm still unable to get it working, although for me it is failing on the networking setup. I have tried with calico, weave and flannel, and all three fail (although with different symptoms).

I'm running these versions:
[vagrant@master ~]$ docker --version
Docker version 1.12.6, build 3a094bd/1.12.6
[vagrant@master ~]$ kubelet --version
Kubernetes v1.6.1
[vagrant@master ~]$ kubeadm version
kubeadm version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.1", GitCommit:"b0b7a323cc5a4a2019b2e9520c21c7830b7f708e", GitTreeState:"clean", BuildDate:"2017-04-03T20:33:27Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}

errors:

[vagrant@master ~]$ kubectl get all --all-namespaces
NAMESPACE     NAME                                           READY     STATUS             RESTARTS   AGE
kube-system   po/calico-etcd-gvrhd                           1/1       Running            0          47m
kube-system   po/calico-node-7jvs8                           1/2       CrashLoopBackOff   12         45m
kube-system   po/calico-node-7ljpn                           2/2       Running            0          47m
kube-system   po/calico-node-w15z3                           1/2       CrashLoopBackOff   12         45m
kube-system   po/calico-node-zq3zx                           1/2       CrashLoopBackOff   12         45m
kube-system   po/calico-policy-controller-1777954159-13x01   1/1       Running            0          47m
kube-system   po/etcd-master                                 1/1       Running            0          46m
kube-system   po/kube-apiserver-master                       1/1       Running            0          46m
kube-system   po/kube-controller-manager-master              1/1       Running            0          46m
kube-system   po/kube-dns-3913472980-16m01                   3/3       Running            0          47m
kube-system   po/kube-proxy-70bmf                            1/1       Running            0          45m
kube-system   po/kube-proxy-9642h                            1/1       Running            0          45m
kube-system   po/kube-proxy-jhtvm                            1/1       Running            0          45m
kube-system   po/kube-proxy-nb7q5                            1/1       Running            0          47m
kube-system   po/kube-scheduler-master                       1/1       Running            0          46m

NAMESPACE     NAME              CLUSTER-IP      EXTERNAL-IP   PORT(S)         AGE
default       svc/kubernetes    10.96.0.1       <none>        443/TCP         47m
kube-system   svc/calico-etcd   10.96.232.136   <none>        6666/TCP        47m
kube-system   svc/kube-dns      10.96.0.10      <none>        53/UDP,53/TCP   47m

NAMESPACE     NAME                              DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
kube-system   deploy/calico-policy-controller   1         1         1            1           47m
kube-system   deploy/kube-dns                   1         1         1            1           47m

NAMESPACE     NAME                                     DESIRED   CURRENT   READY     AGE
kube-system   rs/calico-policy-controller-1777954159   1         1         1         47m
kube-system   rs/kube-dns-3913472980                   1         1         1         47m
[vagrant@master ~]$ kubectl -n kube-system describe po/calico-node-zq3zx
Name:		calico-node-zq3zx
Namespace:	kube-system
Node:		node1/192.168.10.101
Start Time:	Wed, 26 Apr 2017 19:37:35 +0000
Labels:		k8s-app=calico-node
		pod-template-generation=1
Annotations:	kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"DaemonSet","namespace":"kube-system","name":"calico-node","uid":"844cd287-2ab7-11e7-b184-5254008815b6","ap...
		scheduler.alpha.kubernetes.io/critical-pod=
Status:		Running
IP:		192.168.10.101
Controllers:	DaemonSet/calico-node
Containers:
  calico-node:
    Container ID:	docker://ca00b0a73a073a2d2e39cb0cc315b8366eaa20e2e479002dd16134b0d1e94f0b
    Image:		quay.io/calico/node:v1.1.3
    Image ID:		docker-pullable://quay.io/calico/node@sha256:8e62eee18612a6ac7bcae90afaba0ed95265baba7bf3c0ab632b7b40ddfaf603
    Port:		
    State:		Waiting
      Reason:		CrashLoopBackOff
    Last State:		Terminated
      Reason:		Error
      Exit Code:	1
      Started:		Mon, 01 Jan 0001 00:00:00 +0000
      Finished:		Wed, 26 Apr 2017 20:21:09 +0000
    Ready:		False
    Restart Count:	12
    Requests:
      cpu:	250m
    Environment:
      ETCD_ENDPOINTS:				<set to the key 'etcd_endpoints' of config map 'calico-config'>	Optional: false
      CALICO_NETWORKING_BACKEND:		<set to the key 'calico_backend' of config map 'calico-config'>	Optional: false
      CALICO_DISABLE_FILE_LOGGING:		true
      FELIX_DEFAULTENDPOINTTOHOSTACTION:	ACCEPT
      CALICO_IPV4POOL_CIDR:			192.168.0.0/16
      CALICO_IPV4POOL_IPIP:			always
      FELIX_IPV6SUPPORT:			false
      FELIX_LOGSEVERITYSCREEN:			info
      IP:					
    Mounts:
      /lib/modules from lib-modules (ro)
      /var/run/calico from var-run-calico (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from calico-cni-plugin-token-5wnmg (ro)
  install-cni:
    Container ID:	docker://442c3adfa908f76654bb54070ef5ff638e4b68e0331ea0555ae877ce583ce858
    Image:		quay.io/calico/cni:v1.7.0
    Image ID:		docker-pullable://quay.io/calico/cni@sha256:3612ffb0bff609d65311b45f4bae57fa80a05d25e1580ceb83ba4162e2ceef9f
    Port:		
    Command:
      /install-cni.sh
    State:		Running
      Started:		Wed, 26 Apr 2017 19:38:29 +0000
    Ready:		True
    Restart Count:	0
    Environment:
      ETCD_ENDPOINTS:		<set to the key 'etcd_endpoints' of config map 'calico-config'>		Optional: false
      CNI_NETWORK_CONFIG:	<set to the key 'cni_network_config' of config map 'calico-config'>	Optional: false
    Mounts:
      /host/etc/cni/net.d from cni-net-dir (rw)
      /host/opt/cni/bin from cni-bin-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from calico-cni-plugin-token-5wnmg (ro)
Conditions:
  Type		Status
  Initialized 	True 
  Ready 	False 
  PodScheduled 	True 
Volumes:
  lib-modules:
    Type:	HostPath (bare host directory volume)
    Path:	/lib/modules
  var-run-calico:
    Type:	HostPath (bare host directory volume)
    Path:	/var/run/calico
  cni-bin-dir:
    Type:	HostPath (bare host directory volume)
    Path:	/opt/cni/bin
  cni-net-dir:
    Type:	HostPath (bare host directory volume)
    Path:	/etc/cni/net.d
  calico-cni-plugin-token-5wnmg:
    Type:	Secret (a volume populated by a Secret)
    SecretName:	calico-cni-plugin-token-5wnmg
    Optional:	false
QoS Class:	Burstable
Node-Selectors:	<none>
Tolerations:	CriticalAddonsOnly=:Exists
		node-role.kubernetes.io/master=:NoSchedule
		node.alpha.kubernetes.io/notReady=:Exists:NoExecute for 300s
		node.alpha.kubernetes.io/unreachable=:Exists:NoExecute for 300s
Events:
  FirstSeen	LastSeen	Count	From		SubObjectPath			Type		Reason		Message
  ---------	--------	-----	----		-------------			--------	------		-------
  46m		46m		1	kubelet, node1	spec.containers{calico-node}	Normal		Pulling		pulling image "quay.io/calico/node:v1.1.3"
  45m		45m		1	kubelet, node1	spec.containers{calico-node}	Normal		Pulled		Successfully pulled image "quay.io/calico/node:v1.1.3"
  45m		45m		1	kubelet, node1	spec.containers{calico-node}	Normal		Created		Created container with id e035a82202b2c8490e879cb9647773158ff05def6c60b31a001e23e6d288a977
  45m		45m		1	kubelet, node1	spec.containers{calico-node}	Normal		Started		Started container with id e035a82202b2c8490e879cb9647773158ff05def6c60b31a001e23e6d288a977
  45m		45m		1	kubelet, node1	spec.containers{install-cni}	Normal		Pulling		pulling image "quay.io/calico/cni:v1.7.0"
  45m		45m		1	kubelet, node1	spec.containers{install-cni}	Normal		Pulled		Successfully pulled image "quay.io/calico/cni:v1.7.0"
  45m		45m		1	kubelet, node1	spec.containers{install-cni}	Normal		Created		Created container with id 442c3adfa908f76654bb54070ef5ff638e4b68e0331ea0555ae877ce583ce858
  45m		45m		1	kubelet, node1	spec.containers{install-cni}	Normal		Started		Started container with id 442c3adfa908f76654bb54070ef5ff638e4b68e0331ea0555ae877ce583ce858
  44m		44m		1	kubelet, node1	spec.containers{calico-node}	Normal		Created		Created container with id 163a9073070aa52ce7ee98c798ffe130a581e4fdbbc503540ed5d3b79651c549
  44m		44m		1	kubelet, node1	spec.containers{calico-node}	Normal		Started		Started container with id 163a9073070aa52ce7ee98c798ffe130a581e4fdbbc503540ed5d3b79651c549
  44m		44m		1	kubelet, node1					Warning		FailedSync	Error syncing pod, skipping: failed to "StartContainer" for "calico-node" with CrashLoopBackOff: "Back-off 10s restarting failed container=calico-node pod=calico-node-zq3zx_kube-system(c983e5d0-2ab7-11e7-b184-5254008815b6)"

  44m	44m	1	kubelet, node1	spec.containers{calico-node}	Normal	Started		Started container with id 07453d944dfb9a4ebae57c83158e4b51f8870bcab94b4f706239f6c0b93bb62d
  44m	44m	1	kubelet, node1	spec.containers{calico-node}	Normal	Created		Created container with id 07453d944dfb9a4ebae57c83158e4b51f8870bcab94b4f706239f6c0b93bb62d
  43m	43m	2	kubelet, node1					Warning	FailedSync	Error syncing pod, skipping: failed to "StartContainer" for "calico-node" with CrashLoopBackOff: "Back-off 20s restarting failed container=calico-node pod=calico-node-zq3zx_kube-system(c983e5d0-2ab7-11e7-b184-5254008815b6)"

  43m	43m	1	kubelet, node1	spec.containers{calico-node}	Normal	Started		Started container with id 00f363848c16ff66743d54b87948133a87a97bfd32fbde2338622904d0990601
  43m	43m	1	kubelet, node1	spec.containers{calico-node}	Normal	Created		Created container with id 00f363848c16ff66743d54b87948133a87a97bfd32fbde2338622904d0990601
  42m	42m	3	kubelet, node1					Warning	FailedSync	Error syncing pod, skipping: failed to "StartContainer" for "calico-node" with CrashLoopBackOff: "Back-off 40s restarting failed container=calico-node pod=calico-node-zq3zx_kube-system(c983e5d0-2ab7-11e7-b184-5254008815b6)"

  41m	41m	1	kubelet, node1	spec.containers{calico-node}	Normal	Created		Created container with id a5aad1f1a57a361fafcaa2ee6aba244bf19925f56c5b46771cfd45e5e7fd884e
  41m	41m	1	kubelet, node1	spec.containers{calico-node}	Normal	Started		Started container with id a5aad1f1a57a361fafcaa2ee6aba244bf19925f56c5b46771cfd45e5e7fd884e
  41m	40m	6	kubelet, node1					Warning	FailedSync	Error syncing pod, skipping: failed to "StartContainer" for "calico-node" with CrashLoopBackOff: "Back-off 1m20s restarting failed container=calico-node pod=calico-node-zq3zx_kube-system(c983e5d0-2ab7-11e7-b184-5254008815b6)"

  40m	40m	1	kubelet, node1	spec.containers{calico-node}	Normal	Created		Created container with id 520ee97fe986fd726a0347cab6de5b2a8fba91f73df2d601e8b7625531ed2117
  40m	40m	1	kubelet, node1	spec.containers{calico-node}	Normal	Started		Started container with id 520ee97fe986fd726a0347cab6de5b2a8fba91f73df2d601e8b7625531ed2117
  39m	36m	12	kubelet, node1					Warning	FailedSync	Error syncing pod, skipping: failed to "StartContainer" for "calico-node" with CrashLoopBackOff: "Back-off 2m40s restarting failed container=calico-node pod=calico-node-zq3zx_kube-system(c983e5d0-2ab7-11e7-b184-5254008815b6)"

  36m	36m	1	kubelet, node1	spec.containers{calico-node}	Normal	Created		Created container with id 90be4da6fd2e8c111c3e2a91256d60656db80316c1497c29c4155b8f009f241f
  36m	36m	1	kubelet, node1	spec.containers{calico-node}	Normal	Started		Started container with id 90be4da6fd2e8c111c3e2a91256d60656db80316c1497c29c4155b8f009f241f
  31m	31m	1	kubelet, node1	spec.containers{calico-node}	Normal	Created		Created container with id bf0d93f45d5ffa2d2c42487851f80048757da5c767491f673bfecfa37fe76e48
  31m	31m	1	kubelet, node1	spec.containers{calico-node}	Normal	Started		Started container with id bf0d93f45d5ffa2d2c42487851f80048757da5c767491f673bfecfa37fe76e48
  44m	3m	12	kubelet, node1	spec.containers{calico-node}	Normal	Pulled		Container image "quay.io/calico/node:v1.1.3" already present on machine
  25m	3m	5	kubelet, node1	spec.containers{calico-node}	Normal	Started		(events with common reason combined)
  25m	3m	5	kubelet, node1	spec.containers{calico-node}	Normal	Created		(events with common reason combined)
  36m	15s	149	kubelet, node1					Warning	FailedSync	Error syncing pod, skipping: failed to "StartContainer" for "calico-node" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=calico-node pod=calico-node-zq3zx_kube-system(c983e5d0-2ab7-11e7-b184-5254008815b6)"

  44m	15s	173	kubelet, node1	spec.containers{calico-node}	Warning	BackOff	Back-off restarting failed container

This looks like key information, but I'm not sure how to fix it:

[vagrant@master ~]$ kubectl -n kube-system logs calico-node-zq3zx calico-node
Skipping datastore connection test
time="2017-04-26T20:20:39Z" level=info msg="NODENAME environment not specified - check HOSTNAME" 
time="2017-04-26T20:20:39Z" level=info msg="Loading config from environment" 
ERROR: Unable to access datastore to query node configuration
Terminating
time="2017-04-26T20:21:09Z" level=info msg="Unhandled error: client: etcd cluster is unavailable or misconfigured; error #0: client: endpoint http://10.96.232.136:6666 exceeded header timeout
" 
time="2017-04-26T20:21:09Z" level=info msg="Unable to query node configuration" Name=node1 error="client: etcd cluster is unavailable or misconfigured; error #0: client: endpoint http://10.96.232.136:6666 exceeded header timeout
" 
Calico node failed to start

Any help would be greatly appreciated...

[0]- https://github.com/thiagodasilva/kubernetes-swift/tree/master/roles
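The log above shows calico-node timing out against the calico-etcd Service at http://10.96.232.136:6666 (the ClusterIP from the svc listing above). A minimal first check, sketched here with that endpoint, is to probe the Service from the failing node; if it times out, the problem is Service/kube-proxy reachability rather than calico-node itself:

```shell
# Probe the calico-etcd endpoint taken from the logs above.
# etcd v2 serves a /health endpoint; -f makes curl fail on HTTP errors.
ETCD=${ETCD:-http://10.96.232.136:6666}
if curl -sf --max-time 5 "$ETCD/health"; then
  echo "etcd reachable"
else
  echo "etcd endpoint unreachable"
fi
```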

@ReSearchITEng


ReSearchITEng May 2, 2017

I could not identify what's wrong on your end.
I strongly suggest you try to create a separate installation using the playbooks here: https://github.com/ReSearchITEng/kubeadm-playbook and try to see what the difference is.
PS: my last tests are with 1.6.2, both kube* binaries and images, and it seems fine.

@thiagodasilva


thiagodasilva May 22, 2017

@ReSearchITEng sorry I forgot to report back, but I eventually got it to work, my vagrant+ansible files are here: https://github.com/thiagodasilva/kubernetes-swift


@jamiehannaford jamiehannaford referenced this issue in kubernetes/kubernetes-anywhere May 23, 2017

Closed

Add flannel #390

mintzhao pushed a commit to mintzhao/kubernetes that referenced this issue Jun 1, 2017

Merge pull request #43837 from mikedanese/automated-cherry-pick-of-#43835-release-1.6

Automatic merge from submit-queue

Automated cherry pick of #43835 release 1.6


fixes: kubernetes#43815
@frankruizhi


frankruizhi Jun 6, 2017

I also hit the same issue, but I just copied the CNI config from the master node to the corresponding location on the worker node, and then it became OK.

systemctl status kubelet.service -l

● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled)
Drop-In: /etc/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: active (running) since Tue 2017-06-06 10:42:00 CST; 18min ago
Docs: http://kubernetes.io/docs/
Main PID: 4414 (kubelet)
Memory: 43.0M
CGroup: /system.slice/kubelet.service
├─4414 /usr/bin/kubelet --kubeconfig=/etc/kubernetes/kubelet.conf --require-kubeconfig=true --pod-manifest-path=/etc/kubernetes/manifests --network-plugin=cni --cni-conf-dir=/etc/cni/net.d --cni-bin-dir=/opt/cni/bin --cluster-dns=10.96.0.10 --cluster-domain=cluster.local --authorizatio-ca-file=/etc/kubernetes/pki/ca.crt --cgroup-driver=cgroupfs
└─4493 journalctl -k -f

Jun 06 10:59:46 contiv1.com kubelet[4414]: W0606 10:59:46.215827 4414 cni.go:157] Unable to update cni config: No networks found in /etc/cni/net.d
Jun 06 10:59:46 contiv1.com kubelet[4414]: E0606 10:59:46.215972 4414 kubelet.go:2067] Container runtime network not ready: NetworkReady=false ready message:docker: network plugin is not ready: cni config uninitialized
Jun 06 10:59:51 contiv1.com kubelet[4414]: W0606 10:59:51.216843 4414 cni.go:157] Unable to update cni config: No networks found in /etc/cni/net.d
Jun 06 10:59:51 contiv1.com kubelet[4414]: E0606 10:59:51.216942 4414 kubelet.go:2067] Container runtime network not ready: NetworkReady=false ready message:docker: network plugin is not ready: cni config uninitialized
Jun 06 10:59:56 contiv1.com kubelet[4414]: W0606 10:59:56.217923 4414 cni.go:157] Unable to update cni config: No networks found in /etc/cni/net.d
Jun 06 10:59:56 contiv1.com kubelet[4414]: E0606 10:59:56.218113 4414 kubelet.go:2067] Container runtime network not ready: NetworkReady=false ready message:docker: network plugin is not ready: cni config uninitialized
Jun 06 11:00:01 contiv1.com kubelet[4414]: W0606 11:00:01.219251 4414 cni.go:157] Unable to update cni config: No networks found in /etc/cni/net.d
Jun 06 11:00:01 contiv1.com kubelet[4414]: E0606 11:00:01.219382 4414 kubelet.go:2067] Container runtime network not ready: NetworkReady=false ready message:docker: network plugin is not ready: cni config uninitialized
Jun 06 11:00:06 contiv1.com kubelet[4414]: W0606 11:00:06.220396 4414 cni.go:157] Unable to update cni config: No networks found in /etc/cni/net.d
Jun 06 11:00:06 contiv1.com kubelet[4414]: E0606 11:00:06.220575 4414 kubelet.go:2067] Container runtime network not ready: NetworkReady=false ready message:docker: network plugin is not ready: cni config uninitialized

The status of all nodes:
[root@swarm net.d]# kubectl get node
NAME STATUS AGE VERSION
contiv1.com Ready 1h v1.6.4
contiv2.com Ready 1h v1.6.4
swarm.com Ready 1h v1.6.4


@vaibhavjain882


vaibhavjain882 Jun 8, 2017

Any resolution on this? I was not able to get it working even after trying all the mentioned fixes.

@tirithen


tirithen Jun 12, 2017

Being new to setting up Kubernetes I get super confused. I tried following https://medium.com/@SystemMining/setup-kubenetes-cluster-on-ubuntu-16-04-with-kubeadm-336f4061d929, which uses weave-kube for networking, but I'm also stuck with the same issue. Any way to solve this?
Ready False Mon, 12 Jun 2017 16:55:16 +0200 Mon, 12 Jun 2017 12:22:45 +0200 KubeletNotReady runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized

@luxas luxas changed the title from kubeadm 1.6 is broken due to unconfigured CNI making kubelet NotReady to kubeadm 1.6.0 (only 1.6.0!!) is broken due to unconfigured CNI making kubelet NotReady Jun 12, 2017

@drajen


drajen Jun 13, 2017

Why is this still an issue? Ubuntu 16.04/CentOS 7.3 with latest updates using the official k8s repos with 1.6.4 and following the steps outlined here: https://kubernetes.io/docs/setup/independent/install-kubeadm/
Jun 13 09:57:21 tme-lnx1-centos kubelet: W0613 09:57:21.871413 10321 cni.go:157] Unable to update cni config: No networks found in /etc/cni/net.d
Jun 13 09:57:21 tme-lnx1-centos kubelet: E0613 09:57:21.871788 10321 kubelet.go:2067] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized


@luxas


luxas Jun 13, 2017

Member

@drajen No, this affected only v1.6.0. It's expected that kubelet doesn't find a network since you haven't installed any. For example, just run

kubectl apply -f https://git.io/weave-kube-1.6

to install Weave Net and those problems will go away. You can choose to install Flannel, Calico, Canal or whatever CNI network if you'd like


@drajen


drajen Jun 13, 2017

@luxas I keep seeing references to this, but how am I supposed to apply something to a cluster that is not running? I have nothing to connect to.


@zatricky


zatricky Jun 13, 2017

@drajen I think @luxas' point is that this is the wrong place to be asking about setup.
The various setup guides will get you past this point - the typical missing step in older guides, which luxas helpfully mentions, is that you need to apply a network configuration before everything will start working properly.

@luxas


luxas Jun 13, 2017

Member

Yeah, it might be non-obvious, and we're sorry for that, but we can't put one single provider's name there either.

Chatted with @drajen on Slack; the issue was cgroup-related: the kubelet was unhealthy and wasn't able to create any Pods, hence the issue.

@drajen


drajen Jun 13, 2017

Thanks to @luxas for wrestling my particular problem to the ground: kubernetes/kubeadm#302


@kris-nova


kris-nova Jul 3, 2017

Member

Still hitting this in arch on 1.7, is there a quick fix anywhere?


Edit:

kubectl apply -f https://git.io/weave-kube-1.6

did the trick, looks like we just needed CNI running


@ReSearchITEng


ReSearchITEng Jul 3, 2017

At least for CentOS/RHEL, make sure you update /etc/systemd/system/kubelet.service.d/10-kubeadm.conf and add the flag --cgroup-driver="systemd"

If you reinstall again on the same machine, this is a full proper reset:
https://github.com/ReSearchITEng/kubeadm-playbook/blob/master/reset.yml
(this is required especially if you use flanneld)
If you want to do all in one, you may want to use the entire project: https://github.com/ReSearchITEng/kubeadm-playbook/

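For reference, the drop-in edit described above would look roughly like this (a sketch only; the exact Environment variable the flag is appended to varies between kubeadm package versions):

```
# /etc/systemd/system/kubelet.service.d/10-kubeadm.conf (excerpt)
[Service]
Environment="KUBELET_EXTRA_ARGS=--cgroup-driver=systemd"
```

After editing, reload and restart the kubelet (systemctl daemon-reload && systemctl restart kubelet). The driver must match what `docker info` reports as its Cgroup Driver.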

@billmilligan


billmilligan Jul 24, 2017

I hit this issue, and absolutely nothing I read above worked. So I tried again with a much more controlled setup, switching from Ubuntu to the latest CoreOS, going to an earlier version of k8s to start with, and in general being very anal about every last thing installed into each VM. I am NOT using kubeadm, but instead a combination of vagrant and ansible.

(why? because I had no idea what was going on in kubeadm and figured this way at least I'd have control and be able to bypass any overzealous preflight checks, not to mention feeling like I had more automation control in general, and also not having to worry about the warning to do-not-apply-alpha-software-in-production.)

When I tried this setup with an older (1.4.3) edition of k8s, this approach was golden. I then tried upgrading to 1.7.1. Once again, I am STILL hitting this same issue in spite of using no kubeadm at all.

I have confirmed that I am running calico in each of my nodes, including the Master and the three potential Workers. ALL of my nodes are reporting as NotReady, so I'm not really sure how I could apply weave (or anything else) to get it running.

This whole thing just seems like a chicken-and-egg problem: I can't allocate a pod because networking is failing, but I need networking running to create a network config at /etc/cni/net.d in order to be able to allocate pods. And again, all this was working a few hours ago with k8s 1.4.3. I'm very frustrated!

I'd appreciate any insights anybody could give.


Footnotes:

On master: journalctl -r -u kubelet gives me

Jul 24 00:48:16 rogue-kube-master-01 kubelet-wrapper[7647]: E0724 00:48:16.592274 7647 kubelet.go:2136] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is no
Jul 24 00:48:16 rogue-kube-master-01 kubelet-wrapper[7647]: W0724 00:48:16.590588 7647 cni.go:189] Unable to update cni config: No networks found in /etc/cni/net.d

docker ps | grep calico

(truncated for readability)
cde... quay.io/calico/leader-elector@sha256:... "/run.sh --election=c" 8 hours ago Up 8 hours
f72... calico/kube-policy-controller@sha256:... "/dist/controller" 8 hours ago Up 8 hours
c47... gcr.io/google_containers/pause-amd64:3.0 "/pause" 8 hours ago Up 8 hours

There is no /etc/cni/net.d

From my kubectl box:
kubectl get nodes
10.0.0.111 NotReady,SchedulingDisabled 8h v1.7.1+coreos.0
10.0.0.121 NotReady 8h v1.7.1+coreos.0
10.0.0.122 NotReady 8h v1.7.1+coreos.0
10.0.0.123 NotReady 8h v1.7.1+coreos.0


kubectl apply -f https://git.io/weave-kube-1.6

DID NOT fix anything and in fact only seems to create more problems.

bill@rogue:~/vagrant_rogue/rogue-cluster$ kubectl apply -f https://git.io/weave-kube-1.6
serviceaccount "weave-net" created
clusterrolebinding "weave-net" created
daemonset "weave-net" created
Error from server (Forbidden): clusterroles.rbac.authorization.k8s.io "weave-net" is forbidden: attempt to grant extra privileges: [PolicyRule{Resources:["pods"], APIGroups:[""], Verbs:["get"]} PolicyRule{Resources:["pods"], APIGroups:[""], Verbs:["list"]} PolicyRule{Resources:["pods"], APIGroups:[""], Verbs:["watch"]} PolicyRule{Resources:["namespaces"], APIGroups:[""], Verbs:["get"]} PolicyRule{Resources:["namespaces"], APIGroups:[""], Verbs:["list"]} PolicyRule{Resources:["namespaces"], APIGroups:[""], Verbs:["watch"]} PolicyRule{Resources:["nodes"], APIGroups:[""], Verbs:["get"]} PolicyRule{Resources:["nodes"], APIGroups:[""], Verbs:["list"]} PolicyRule{Resources:["nodes"], APIGroups:[""], Verbs:["watch"]} PolicyRule{Resources:["networkpolicies"], APIGroups:["extensions"], Verbs:["get"]} PolicyRule{Resources:["networkpolicies"], APIGroups:["extensions"], Verbs:["list"]} PolicyRule{Resources:["networkpolicies"], APIGroups:["extensions"], Verbs:["watch"]}] user=&{kube-admin [system:authenticated] map[]} ownerrules=[] ruleResolutionErrors=[]

bill@rogue:~/vagrant_rogue/rogue-cluster$ kubectl get pods --namespace=kube-system
NAME READY STATUS RESTARTS AGE
kube-apiserver-10.0.0.111 1/1 Running 1 8h
kube-controller-manager-10.0.0.111 1/1 Running 1 8h
kube-dns-v20-fcl01 0/3 Pending 0 8h
kube-proxy-10.0.0.111 1/1 Running 1 8h
kube-proxy-10.0.0.121 1/1 Running 1 8h
kube-proxy-10.0.0.122 1/1 Running 1 8h
kube-proxy-10.0.0.123 1/1 Running 1 8h
kube-scheduler-10.0.0.111 1/1 Running 1 8h
kubernetes-dashboard-v1.4.1-29zzk 0/1 Pending 0 8h
weave-net-2lplj 1/2 CrashLoopBackOff 3 3m
weave-net-2nbgd 1/2 CrashLoopBackOff 3 3m
weave-net-fdr1v 2/2 Running 0 3m
weave-net-jzv50 1/2 CrashLoopBackOff 3 3m

Deeper investigation of the weave errors indicates that they either (a) cannot connect to the apiserver, or (b) in the case of the one marked "Running", complain that they cannot connect to themselves.


Paxa commented Jul 24, 2017

@billmilligan Having similar issues; DNS stops working a few minutes after the container starts.


luxas (Member) commented Jul 24, 2017

@Paxa @billmilligan If you want to get help, don't comment on this issue. Instead, open a new one in the kubeadm repo with sufficient detail.


billmilligan commented Jul 24, 2017

@luxas Respectfully, I have to question whether this is a new issue. Since I am getting the exact same result setting up k8s without kubeadm as everyone else is getting with kubeadm, this seems to eliminate kubeadm as the source of the problem. Perhaps this issue ought to be renamed accordingly?


kfox1111 commented Jul 24, 2017

@billmilligan Respectfully, since this issue is about kubeadm and you're able to reproduce the problem without kubeadm, isn't this the wrong place to file it? I think this thread solved the kubeadm issue; yours is a new one. It will get more attention as a new issue, since people on this thread think this one is already solved and are ignoring it.

I use kubeadm and was affected by this issue, and it has been solved since 1.6.1. I've deployed lots of k8s clusters since, so I really do think you have a separate issue.


billmilligan commented Jul 24, 2017

@kfox1111 Thanks for the feedback. I'll file a new issue, but the number of people who still seem to be facing this elsewhere in 1.7.x makes me wonder.


luxas (Member) commented Jul 24, 2017

TL;DR;

The error message

runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized

is NOT necessarily bad.

That error message tells you that you have to plug in a third-party implementation of the CNI spec.

What is CNI and how does it integrate with Kubernetes?

CNI stands for Container Network Interface and defines a specification that the kubelet uses for creating a network for the cluster. See this page for more information on how Kubernetes uses the CNI spec to create a network for the cluster.

Kubernetes doesn't care how the network is created as long as it satisfies the CNI spec.

kubelet is in charge of connecting new Pods to the network (can be an overlay network for instance).
kubelet reads a configuration directory (often /etc/cni/net.d) for CNI networks to use.
When a new Pod is created, the kubelet reads files in the configuration directory, exec's out to the CNI binary specified in the config file (the binary is often in /opt/cni/bin). The binary that will be executed belongs to and is installed by a third-party (like Weave, Flannel, Calico, etc.).
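As an illustration, here is the kind of network config file the kubelet scans for. This is a hypothetical example using the stock `bridge` and `host-local` reference plugins with a made-up subnet, not any particular provider's config; on a real node it would live in /etc/cni/net.d/, while a temp directory is used here:

```shell
# Hypothetical CNI network config of the kind the kubelet discovers.
cni_dir=$(mktemp -d)
cat > "$cni_dir/10-mynet.conf" <<'EOF'
{
  "cniVersion": "0.3.1",
  "name": "mynet",
  "type": "bridge",
  "bridge": "cni0",
  "isGateway": true,
  "ipMasq": true,
  "ipam": {
    "type": "host-local",
    "subnet": "10.244.0.0/16"
  }
}
EOF
# "type" names the plugin binary the kubelet will exec from /opt/cni/bin.
echo "wrote $(basename "$cni_dir"/*.conf)"
```

A network provider's DaemonSet drops a file like this (plus the matching binaries) onto every node, which is exactly the step kubeadm leaves to you.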

kubeadm is a generic tool to spin up Kubernetes clusters; it does not know which networking solution you want and doesn't favor any specific one. After kubeadm init has run, no such CNI binary or configuration exists. This means that kubeadm init IS NOT ENOUGH to get a fully working cluster up and running.

This means that, after kubeadm init, the kubelet logs will say

runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized

this is very much expected. If this wasn't the case, we would have favored a specific network provider.

So how do I "fix" this error?
The next step in the kubeadm getting started guide is "Installing a Pod network".
This means, kubectl apply a manifest from your preferred CNI network provider.

The DaemonSet will copy the needed CNI binaries to /opt/cni/bin and the needed configuration to /etc/cni/net.d/. It will also run the actual daemon that sets up the network between the Nodes (by writing iptables rules, for instance).

After the CNI provider is installed, the kubelet will notice that "oh I have some information how to set up the network", and will use the 3rd-party configuration and binaries.

And when the network is set up by the 3rd-party provider (by kubelet invoking it), the Node will mark itself Ready.

How is this issue related to kubeadm?

Late in the v1.6 cycle, a PR was merged that changed the way the kubelet reported its Ready/NotReady status. In earlier releases, the kubelet had always reported Ready, regardless of whether the CNI network was set up or not. This was actually kind of wrong, and was changed to respect the CNI network status: NotReady when CNI was uninitialized, Ready when initialized.

kubeadm in v1.6.0 wrongly waited for the master node to reach the Ready state before proceeding with the rest of the kubeadm init tasks. When the kubelet behavior changed to report NotReady while CNI was uninitialized, kubeadm would wait forever for the Node to become Ready.
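In shell terms, the deadlock amounted to something like the following. This is a simulated sketch, not kubeadm's actual code (which is Go); `node_ready` stands in for the real readiness check kubeadm performs against the API server:

```shell
# node_ready stands in for the real check, roughly:
#   kubectl get node "$NODE" \
#     -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'
node_ready() { echo "False"; }   # CNI uninitialized => kubelet reports NotReady

tries=0
until [ "$(node_ready)" = "True" ]; do
  tries=$((tries + 1))
  # kubeadm v1.6.0 had no bail-out here, so with CNI uninitialized the loop
  # spun forever; this sketch gives up after 5 polls for illustration.
  if [ "$tries" -ge 5 ]; then
    echo "node still NotReady after $tries checks; kubeadm 1.6.0 waited forever"
    break
  fi
done
```

The v1.6.1 fix was effectively to stop gating the rest of `kubeadm init` on the Ready condition, since Ready can only be reached after a CNI provider is installed, which in turn happens after `kubeadm init`.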

THAT MISTAKEN WAIT ON THE KUBEADM SIDE IS WHAT THIS ISSUE IS ABOUT

However, we quickly fixed the regression in v1.6.1 and released it some days after v1.6.0.

Please read the retrospective for more information about this, and why v1.6.0 could be released with this flaw.

So, what do you do if you think you see this issue in kubeadm v1.6.1+?

Well, I really think you don't. This issue is about kubeadm init deadlocking, and no users or maintainers have seen that in v1.6.1+.

What you WILL see though is

runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized

after every kubeadm init in all versions above v1.6, but that IS NOT BAD

Anyway, please open a new issue if you see something unexpected with kubeadm

Please do not comment more on this issue. Instead open a new one.

@billmilligan So you only have to kubectl apply a CNI provider's manifest to get your cluster up and running, I think.

I'm pretty much summarizing what has been said above, but hopefully in a clearer and more detailed way.
If you have questions about how CNI works, please refer to the normal support channels like StackOverflow, an issue, or Slack.

(Lastly, sorry for that much bold text, but I felt like it was needed to get people's attention.)


@kubernetes kubernetes locked and limited conversation to collaborators Jul 24, 2017
