kubeadm init stuck on "First node has registered, but is not ready yet" #212

Closed
jimmycuadra opened this Issue Mar 29, 2017 · 52 comments

@jimmycuadra
Member

jimmycuadra commented Mar 29, 2017

What keywords did you search in Kubernetes issues before filing this one? (If you have found any duplicates, you should instead reply there.): kubeadm

Is this a BUG REPORT or FEATURE REQUEST? (choose one): bug report

Kubernetes version (use kubectl version): 1.6.0

Environment:

  • Cloud provider or hardware configuration: Raspberry Pi 3 Model B
  • OS (e.g. from /etc/os-release): Hypriot 1.4.0 (with Docker manually downgraded to 1.12.6, Hypriot 1.4.0 ships with Docker 17.03.0-ce)
  • Kernel (e.g. uname -a): 4.4.50-hypriotos-v7+
  • Install tools: kubeadm
  • Others:

What happened:

Following the kubeadm getting started guide exactly:

# kubeadm init --apiserver-cert-extra-sans redacted --pod-network-cidr 10.244.0.0/16
[kubeadm] WARNING: kubeadm is in beta, please do not use it for production clusters.
[init] Using Kubernetes version: v1.6.0
[init] Using Authorization mode: RBAC
[preflight] Running pre-flight checks
[certificates] Generated CA certificate and key.
[certificates] Generated API server certificate and key.
[certificates] API Server serving cert is signed for DNS names [kube-01 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local redacted] and IPs [10.96.0.1 10.0.1.101]
[certificates] Generated API server kubelet client certificate and key.
[certificates] Generated service account token signing key and public key.
[certificates] Generated front-proxy CA certificate and key.
[certificates] Generated front-proxy client certificate and key.
[certificates] Valid certificates and keys now exist in "/etc/kubernetes/pki"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/scheduler.conf"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/admin.conf"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/kubelet.conf"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/controller-manager.conf"
[apiclient] Created API client, waiting for the control plane to become ready
[apiclient] All control plane components are healthy after 206.956919 seconds
[apiclient] Waiting for at least one node to register and become ready
[apiclient] First node has registered, but is not ready yet
[apiclient] First node has registered, but is not ready yet
[apiclient] First node has registered, but is not ready yet

That last message, "First node has registered, but is not ready yet", repeats indefinitely and kubeadm never finishes. I connected to the master server in another session to check whether all the Docker containers were running as expected, and they were:

$ docker ps
CONTAINER ID        IMAGE                                                                                                                          COMMAND                  CREATED             STATUS              PORTS               NAMES
54733aa1aae3        gcr.io/google_containers/kube-controller-manager-arm@sha256:22f30303212b276b6868b89c8e92c5fb2cb93641e59c312b254c6cb0fa111b2a   "kube-controller-mana"   10 minutes ago      Up 10 minutes                           k8s_kube-controller-manager_kube-controller-manager-kube-01_kube-system_d44abf63e3ab24853ab86643e0b96d81_0
55b6bf2cc09e        gcr.io/google_containers/etcd-arm@sha256:0ce1dcd85968a3242995dfc168abba2c3bc03d0e3955f52a0b1e79f90039dcf2                      "etcd --listen-client"   11 minutes ago      Up 11 minutes                           k8s_etcd_etcd-kube-01_kube-system_90ab26991bf9ad676a430c7592d08bee_0
bd0dc34d5e77        gcr.io/google_containers/kube-apiserver-arm@sha256:c54b8c609a6633b5397173c763aba0656c6cb2601926cce5a5b4870d58ba67bd            "kube-apiserver --ins"   12 minutes ago      Up 12 minutes                           k8s_kube-apiserver_kube-apiserver-kube-01_kube-system_4d99c225ec157dc715c26b59313aeac8_1
1c4c7b69a3eb        gcr.io/google_containers/kube-scheduler-arm@sha256:827449ef1f3d8c0a54d842af9d6528217ccd2d36cc2b49815d746d41c7302050            "kube-scheduler --kub"   13 minutes ago      Up 13 minutes                           k8s_kube-scheduler_kube-scheduler-kube-01_kube-system_3ef1979df7569495bb727d12ac1a7a6f_0
4fd0635f9439        gcr.io/google_containers/pause-arm:3.0                                                                                         "/pause"                 14 minutes ago      Up 14 minutes                           k8s_POD_kube-controller-manager-kube-01_kube-system_d44abf63e3ab24853ab86643e0b96d81_0
cfb4a758ad96        gcr.io/google_containers/pause-arm:3.0                                                                                         "/pause"                 14 minutes ago      Up 14 minutes                           k8s_POD_etcd-kube-01_kube-system_90ab26991bf9ad676a430c7592d08bee_0
a631d8b6c11c        gcr.io/google_containers/pause-arm:3.0                                                                                         "/pause"                 14 minutes ago      Up 14 minutes                           k8s_POD_kube-scheduler-kube-01_kube-system_3ef1979df7569495bb727d12ac1a7a6f_0
309b62fff122        gcr.io/google_containers/pause-arm:3.0                                                                                         "/pause"                 14 minutes ago      Up 14 minutes                           k8s_POD_kube-apiserver-kube-01_kube-system_4d99c225ec157dc715c26b59313aeac8_0

I copied the admin kubeconfig to my local machine and used kubectl (1.6.0) to see what was going on with the node kubeadm was claiming was registered:

$ kubectl describe node kube-01
Name:			kube-01
Role:
Labels:			beta.kubernetes.io/arch=arm
			beta.kubernetes.io/os=linux
			kubernetes.io/hostname=kube-01
Annotations:		node.alpha.kubernetes.io/ttl=0
			volumes.kubernetes.io/controller-managed-attach-detach=true
Taints:			<none>
CreationTimestamp:	Tue, 28 Mar 2017 22:06:40 -0700
Phase:
Conditions:
  Type			Status	LastHeartbeatTime			LastTransitionTime			Reason				Message
  ----			------	-----------------			------------------			------				-------
  OutOfDisk 		False 	Tue, 28 Mar 2017 22:17:24 -0700 	Tue, 28 Mar 2017 22:06:40 -0700 	KubeletHasSufficientDisk 	kubelet has sufficient disk space available
  MemoryPressure 	False 	Tue, 28 Mar 2017 22:17:24 -0700 	Tue, 28 Mar 2017 22:06:40 -0700 	KubeletHasSufficientMemory 	kubelet has sufficient memory available
  DiskPressure 		False 	Tue, 28 Mar 2017 22:17:24 -0700 	Tue, 28 Mar 2017 22:06:40 -0700 	KubeletHasNoDiskPressure 	kubelet has no disk pressure
  Ready 		False 	Tue, 28 Mar 2017 22:17:24 -0700 	Tue, 28 Mar 2017 22:06:40 -0700 	KubeletNotReady 		runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Addresses:		10.0.1.101,10.0.1.101,kube-01
Capacity:
 cpu:		4
 memory:	882632Ki
 pods:		110
Allocatable:
 cpu:		4
 memory:	780232Ki
 pods:		110
System Info:
 Machine ID:			9989a26f06984d6dbadc01770f018e3b
 System UUID:			9989a26f06984d6dbadc01770f018e3b
 Boot ID:			7a77e2e8-dd62-4989-b9e7-0fb52747162a
 Kernel Version:		4.4.50-hypriotos-v7+
 OS Image:			Raspbian GNU/Linux 8 (jessie)
 Operating System:		linux
 Architecture:			arm
 Container Runtime Version:	docker://1.12.6
 Kubelet Version:		v1.6.0
 Kube-Proxy Version:		v1.6.0
PodCIDR:			10.244.0.0/24
ExternalID:			kube-01
Non-terminated Pods:		(4 in total)
  Namespace			Name						CPU Requests	CPU Limits	Memory Requests	Memory Limits
  ---------			----						------------	----------	---------------	-------------
  kube-system			etcd-kube-01				0 (0%)		0 (0%)		0 (0%)		0 (0%)
  kube-system			kube-apiserver-kube-01			250m (6%)	0 (0%)		0 (0%)		0 (0%)
  kube-system			kube-controller-manager-kube-01		200m (5%)	0 (0%)		0 (0%)		0 (0%)
  kube-system			kube-scheduler-kube-01			100m (2%)	0 (0%)		0 (0%)		0 (0%)
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  CPU Requests	CPU Limits	Memory Requests	Memory Limits
  ------------	----------	---------------	-------------
  550m (13%)	0 (0%)		0 (0%)		0 (0%)
Events:
  FirstSeen	LastSeen	Count	From			SubObjectPath	Type		Reason			Message
  ---------	--------	-----	----			-------------	--------	------			-------
  14m		14m		1	kubelet, kube-01			Normal		Starting		Starting kubelet.
  14m		10m		55	kubelet, kube-01			Normal		NodeHasSufficientDisk	Node kube-01 status is now: NodeHasSufficientDisk
  14m		10m		55	kubelet, kube-01			Normal		NodeHasSufficientMemory	Node kube-01 status is now: NodeHasSufficientMemory
  14m		10m		55	kubelet, kube-01			Normal		NodeHasNoDiskPressure	Node kube-01 status is now: NodeHasNoDiskPressure

This uncovered the reason the kubelet was not ready:

"runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config"

In my experiments with kubeadm 1.5, CNI was not needed to bring up the master node, so this is surprising. Even the getting started guide suggests that kubeadm init should finish successfully before you move on to deploying a CNI plugin.

Anyway, I deployed flannel using kubectl from my local machine:

$ kubectl apply -f kube-flannel.yml

Where the contents of the file were:

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: flannel
  namespace: kube-system
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: kube-flannel-cfg
  namespace: kube-system
  labels:
    tier: node
    app: flannel
data:
  cni-conf.json: |
    {
      "name": "cbr0",
      "type": "flannel",
      "delegate": {
        "isDefaultGateway": true
      }
    }
  net-conf.json: |
    {
      "Network": "10.244.0.0/16",
      "Backend": {
        "Type": "vxlan"
      }
    }
---
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: kube-flannel-ds
  namespace: kube-system
  labels:
    tier: node
    app: flannel
spec:
  template:
    metadata:
      labels:
        tier: node
        app: flannel
    spec:
      hostNetwork: true
      nodeSelector:
        beta.kubernetes.io/arch: amd64
      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      serviceAccountName: flannel
      containers:
      - name: kube-flannel
        image: quay.io/coreos/flannel:v0.7.0-amd64
        command: [ "/opt/bin/flanneld", "--ip-masq", "--kube-subnet-mgr" ]
        securityContext:
          privileged: true
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        volumeMounts:
        - name: run
          mountPath: /run
        - name: flannel-cfg
          mountPath: /etc/kube-flannel/
      - name: install-cni
        image: quay.io/coreos/flannel:v0.7.0-amd64
        command: [ "/bin/sh", "-c", "set -e -x; cp -f /etc/kube-flannel/cni-conf.json /etc/cni/net.d/10-flannel.conf; while true; do sleep 3600; done" ]
        volumeMounts:
        - name: cni
          mountPath: /etc/cni/net.d
        - name: flannel-cfg
          mountPath: /etc/kube-flannel/
      volumes:
        - name: run
          hostPath:
            path: /run
        - name: cni
          hostPath:
            path: /etc/cni/net.d
        - name: flannel-cfg
          configMap:
            name: kube-flannel-cfg

But it never scheduled:

$ kubectl describe ds kube-flannel-ds -n kube-system
Name:		kube-flannel-ds
Selector:	app=flannel,tier=node
Node-Selector:	beta.kubernetes.io/arch=amd64
Labels:		app=flannel
		tier=node
Annotations:	kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"extensions/v1beta1","kind":"DaemonSet","metadata":{"annotations":{},"labels":{"app":"flannel","tier":"node"},"name":"kube-flannel-ds","n...
Desired Number of Nodes Scheduled: 0
Current Number of Nodes Scheduled: 0
Number of Nodes Scheduled with Up-to-date Pods: 0
Number of Nodes Scheduled with Available Pods: 0
Number of Nodes Misscheduled: 0
Pods Status:	0 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:		app=flannel
			tier=node
  Service Account:	flannel
  Containers:
   kube-flannel:
    Image:	quay.io/coreos/flannel:v0.7.0-amd64
    Port:
    Command:
      /opt/bin/flanneld
      --ip-masq
      --kube-subnet-mgr
    Environment:
      POD_NAME:		 (v1:metadata.name)
      POD_NAMESPACE:	 (v1:metadata.namespace)
    Mounts:
      /etc/kube-flannel/ from flannel-cfg (rw)
      /run from run (rw)
   install-cni:
    Image:	quay.io/coreos/flannel:v0.7.0-amd64
    Port:
    Command:
      /bin/sh
      -c
      set -e -x; cp -f /etc/kube-flannel/cni-conf.json /etc/cni/net.d/10-flannel.conf; while true; do sleep 3600; done
    Environment:	<none>
    Mounts:
      /etc/cni/net.d from cni (rw)
      /etc/kube-flannel/ from flannel-cfg (rw)
  Volumes:
   run:
    Type:	HostPath (bare host directory volume)
    Path:	/run
   cni:
    Type:	HostPath (bare host directory volume)
    Path:	/etc/cni/net.d
   flannel-cfg:
    Type:	ConfigMap (a volume populated by a ConfigMap)
    Name:	kube-flannel-cfg
    Optional:	false
Events:		<none>

I tried to join one of the other servers anyway, just to see what would happen. I used kubeadm token create to manually create a token that I could use from another machine. On the other machine:

kubeadm join --token $TOKEN 10.0.1.101:6443
[kubeadm] WARNING: kubeadm is in beta, please do not use it for production clusters.
[preflight] Running pre-flight checks
[discovery] Trying to connect to API Server "10.0.1.101:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://10.0.1.101:6443"
[discovery] Failed to request cluster info, will try again: [User "system:anonymous" cannot get configmaps in the namespace "kube-public". (get configmaps cluster-info)]
[discovery] Failed to request cluster info, will try again: [User "system:anonymous" cannot get configmaps in the namespace "kube-public". (get configmaps cluster-info)]
[discovery] Failed to request cluster info, will try again: [User "system:anonymous" cannot get configmaps in the namespace "kube-public". (get configmaps cluster-info)]

And the final message repeated forever.

What you expected to happen:

kubeadm init should complete and produce a bootstrap token.

@racingmars

racingmars commented Mar 29, 2017

Exact same thing is happening to me on Ubuntu 16.04.2, on both GCE and local VMware installations: Docker version 1.12.6, kernel 4.8.0-44-generic 47~16.04.1-Ubuntu SMP.

The kubelet log shows a warning about missing /etc/cni/net.d before the error that we see in jimmycuadra's report:

Mar 29 04:43:25 instance-1 kubelet[6800]: W0329 04:43:25.763117    6800 cni.go:157] Unable to update cni config: No networks found in /etc/cni/net.d
Mar 29 04:43:25 instance-1 kubelet[6800]: E0329 04:43:25.763515    6800 kubelet.go:2067] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
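
For anyone comparing symptoms, a quick way to check for the same messages (a sketch, assuming the kubelet runs as a systemd unit and that kubeadm wrote admin.conf to its default location):

# Scan the kubelet log for the CNI warning/error shown above
journalctl -u kubelet --no-pager | grep -i cni

# The node's Ready condition reports the same reason once the API server is up
kubectl --kubeconfig /etc/kubernetes/admin.conf describe node $(hostname) | grep -i ready
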
@csarora

csarora commented Mar 29, 2017

Same issue on Ubuntu AWS VM. Docker 1.12.5

root@ip-10-43-0-20:~# kubeadm version
kubeadm version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.0", GitCommit:"fff5156092b56e6bd60fff75aad4dc9de6b6ef37", GitTreeState:"clean", BuildDate:"2017-03-28T16:24:30Z", GoVersion:"go1.7.5"

root@ip-10-43-0-20:~# uname -a
Linux ip-10-43-0-20 4.4.0-45-generic #66-Ubuntu SMP Wed Oct 19 14:12:37 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

root@ip-10-43-0-20:~# kubeadm init --config cfg.yaml
[kubeadm] WARNING: kubeadm is in beta, please do not use it for production clusters.
[init] Using Kubernetes version: v1.6.0
[init] Using Authorization mode: RBAC
[init] WARNING: For cloudprovider integrations to work --cloud-provider must be set for all kubelets in the cluster.
(/etc/systemd/system/kubelet.service.d/10-kubeadm.conf should be edited for this purpose)
[preflight] Running pre-flight checks
[preflight] Starting the kubelet service
[certificates] Generated CA certificate and key.
[certificates] Generated API server certificate and key.
[certificates] API Server serving cert is signed for DNS names [ip-10-43-0-20 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 10.43.0.20]
[certificates] Generated API server kubelet client certificate and key.
[certificates] Generated service account token signing key and public key.
[certificates] Generated front-proxy CA certificate and key.
[certificates] Generated front-proxy client certificate and key.
[certificates] Valid certificates and keys now exist in "/etc/kubernetes/pki"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/admin.conf"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/kubelet.conf"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/controller-manager.conf"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/scheduler.conf"
[apiclient] Created API client, waiting for the control plane to become ready
[apiclient] All control plane components are healthy after 16.531681 seconds
[apiclient] Waiting for at least one node to register and become ready
[apiclient] First node has registered, but is not ready yet
[apiclient] First node has registered, but is not ready yet
[apiclient] First node has registered, but is not ready yet

@omazilov

omazilov commented Mar 29, 2017

++ the same issue (Ubuntu 16.04.1)

@antoinefinkelstein

antoinefinkelstein commented Mar 29, 2017

Same thing here on Ubuntu 16.04

@rmohr

rmohr commented Mar 29, 2017

On CentOS 7, I downgraded the kubelet to 1.5.4. That solved it for me. It seems like the ready check works differently in the 1.6.0 kubelet.
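
A rough sketch of that downgrade on CentOS 7 (the exact version string available in the Kubernetes yum repo may differ, so treat this as an outline rather than exact commands):

# Downgrade the kubelet to the 1.5.x series and restart it
sudo yum downgrade -y kubelet-1.5.4-0
sudo systemctl daemon-reload
sudo systemctl restart kubelet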

@vascofg

vascofg commented Mar 29, 2017

Same issue on CentOS 7 on bare metal x64 machine, since upgrading to k8s 1.6.0

@lowstz

lowstz commented Mar 29, 2017

Same issue on Ubuntu 16.04

@ctrlaltdel

ctrlaltdel commented Mar 29, 2017

Same issue on Ubuntu 16.04, manually downgrading the kubelet package solved the issue.

# apt install kubelet=1.5.6-00
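
If you take this route, it may also help to hold the package so a later apt-get upgrade doesn't pull 1.6.0 back in before a fixed release ships (a small sketch; release the hold again later with apt-mark unhold):

# Keep kubelet pinned at the downgraded version
sudo apt-mark hold kubelet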

mcapuccini pushed a commit to kubenow/KubeNow that referenced this issue Mar 29, 2017

@Scukerman

Scukerman commented Mar 29, 2017

@ctrlaltdel it didn't work for me.

@jbeda

jbeda commented Mar 29, 2017

I suspect this is a kubelet issue. It shouldn't mark the node as not ready when CNI is unconfigured. Only pods that require CNI should be marked as not ready.

@kristiandrucker

kristiandrucker commented Mar 29, 2017

@jbeda Do you know when this issue will be resolved?

@jbeda

jbeda commented Mar 29, 2017

@kristiandrucker -- no -- still figuring out what is going on. Need to root cause it first.

@kristiandrucker

kristiandrucker commented Mar 29, 2017

@jbeda Ok, but once the issue is resolved, then what? Rebuild the kubelet from source?

@jbeda

jbeda commented Mar 29, 2017

@kristiandrucker This'll have to go out in a point release of k8s if it is a kubelet issue.

I suspect that kubernetes/kubernetes#43474 is the root cause. Going to file a bug and follow up with the network people.

@dcbw You around?

@dcbw

Member

dcbw commented Mar 29, 2017

Looks like the issue is that a DaemonSet is not scheduled to nodes that have the NetworkReady:false condition, because the checks for scheduling pods are not fine-grained enough. We need to fix that; a pod that is hostNetwork:true should be scheduled on a node that is NetworkReady:false, but a hostNetwork:false pod should not.

As a workaround, does adding the scheduler.alpha.kubernetes.io/critical-pod annotation on your DaemonSet make things work again?
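
For anyone who wants to try that workaround, the annotation goes on the DaemonSet's pod template, roughly like this (a sketch based on the flannel manifest above; untested here):

# DaemonSet excerpt -- only the annotations block is new
spec:
  template:
    metadata:
      labels:
        tier: node
        app: flannel
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ""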

@kargakis

Member

kargakis commented Mar 29, 2017

@janetkuo @lukaszo can you triage the DS behavior?

@dewet22

dewet22 commented Mar 29, 2017

There is also an ongoing discussion in #sig-network on slack, btw.

@prapdm

prapdm commented Mar 29, 2017

Same issue CentOS 7 x64

@errordeveloper

Member

errordeveloper commented Mar 29, 2017

@prapdm this appears to be independent of which distro you are running.

@prapdm

prapdm commented Mar 29, 2017

CentOS Linux release 7.3.1611 (Core)

@lukaszo

Member

lukaszo commented Mar 29, 2017

I've tried it on one node with Ubuntu 16.04. It hangs with the "not ready yet" message. I also manually created the flannel DaemonSet, and in my case it scheduled one pod without any problem. The daemon pod itself went into CrashLoopBackOff with the error: E0329 22:57:03.065651 1 main.go:127] Failed to create SubnetManager: error retrieving pod spec for 'kube-system/kube-flannel-ds-z3xgn': the server does not allow access to the requested resource (get pods kube-flannel-ds-z3xgn)

I will try on CentOS as well, but I don't think the DaemonSet is to blame here; kubeadm hangs regardless.

@mikedanese

Member

mikedanese commented Mar 29, 2017

That is an RBAC permission error.

@MaximF

MaximF commented Apr 4, 2017

I think that since the root issue is solved and the related ticket is closed, we should close this one as well :)

@m4r10k

m4r10k commented Apr 4, 2017

Just for information: It is working for me now with the updated packages under Ubuntu 16.04.

@jimmycuadra

Member Author

jimmycuadra commented Apr 4, 2017

1.6.1 works for me! Thanks to everyone that helped get this fix out!

@jimmycuadra jimmycuadra closed this Apr 4, 2017

@eastcirclek

eastcirclek commented Apr 4, 2017

I successfully set up my Kubernetes cluster on centos-release-7-3.1611.el7.centos.x86_64 by taking the following steps (I assume Docker is already installed):

  1. (in /etc/yum.repos.d/kubernetes.repo) baseurl=http://yum.kubernetes.io/repos/kubernetes-el7-x86_64-unstable
    => To use the unstable repository for the latest Kubernetes 1.6.1
  2. yum install -y kubelet kubeadm kubectl kubernetes-cni
  3. (/etc/systemd/system/kubelet.service.d/10-kubeadm.conf) add "--cgroup-driver=systemd" at the end of the last line (see the sketch after this list).
    => This is because Docker uses systemd as its cgroup driver, while the kubelet defaults to cgroupfs.
  4. systemctl enable kubelet && systemctl start kubelet
  5. kubeadm init --pod-network-cidr 10.244.0.0/16
    => If you used to add --api-advertise-addresses, you need to use --apiserver-advertise-address instead.
  6. cp /etc/kubernetes/admin.conf $HOME/
    sudo chown $(id -u):$(id -g) $HOME/admin.conf
    export KUBECONFIG=$HOME/admin.conf
    => Without this step, you might get an error with kubectl get
    => I didn't do it with 1.5.2
  7. kubectl create -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel-rbac.yml
    => 1.6.0 introduces role-based access control, so you should add a ClusterRole and a ClusterRoleBinding before creating the flannel DaemonSet
  8. kubectl create -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
    => Create a Flannel daemonset
  9. (on every slave node) kubeadm join --token (your token) (ip):(port)
    => as shown in the result of kubeadm init
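
For reference, a sketch of what the step-3 edit ends up looking like (the variable names in the stock 10-kubeadm.conf vary slightly between package versions, so adjust to match your file):

# /etc/systemd/system/kubelet.service.d/10-kubeadm.conf (excerpt)
# append --cgroup-driver=systemd to the existing ExecStart line
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_SYSTEM_PODS_ARGS $KUBELET_NETWORK_ARGS $KUBELET_DNS_ARGS $KUBELET_AUTHZ_ARGS $KUBELET_EXTRA_ARGS --cgroup-driver=systemd

Then run systemctl daemon-reload before step 4 so the drop-in change is picked up.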

All of the above steps are the result of combining suggestions from various issues around Kubernetes 1.6.0, especially kubeadm ones.

Hope this saves you some time.

@xilu0

xilu0 commented Apr 4, 2017

@eastcirclek @Sliim You are great

rmohr added a commit to rmohr/kubevirt that referenced this issue Apr 4, 2017

@jralmaraz

jralmaraz commented Apr 4, 2017

@eastcirclek these were the exact steps that I had just pieced together by querying several forums too. A timezone difference, maybe? Thanks everyone, this topic was really helpful.

@overip

overip commented Apr 5, 2017

I have an Ubuntu 16.04 server on AWS and followed these steps:

  1. edit /etc/systemd/system/kubelet.service.d/10-kubeadm.conf and remove $KUBELET_NETWORK_ARGS
  2. kubeadm reset to clean up previous attempt to start it
  3. kubeadm init --token= --apiserver-advertise-address=

which apparently worked correctly, but then when I try to install Calico as the network plugin, I get the following error:
The connection to the server localhost:8080 was refused - did you specify the right host or port?

Is the k8s team working on a patch?

Thanks

@jimmycuadra

Member Author

jimmycuadra commented Apr 5, 2017

@overip I don't think any patch is required for that... You just need to specify the right kubeconfig file when using kubectl. kubeadm should have written it to /etc/kubernetes/admin.conf.

@overip

overip commented Apr 5, 2017

@jimmycuadra could you please explain the steps to do that?

@jimmycuadra

Member Author

jimmycuadra commented Apr 5, 2017

@overip The output of kubeadm init has the instructions:

To start using your cluster, you need to run (as a regular user):

  sudo cp /etc/kubernetes/admin.conf $HOME/
  sudo chown $(id -u):$(id -g) $HOME/admin.conf
  export KUBECONFIG=$HOME/admin.conf

Personally, I prefer to copy the file to $HOME/.kube/config, which is where kubectl will look for it by default. Then you don't need to set the KUBECONFIG environment variable.

If you are planning to use kubectl from your local machine, you can use scp (or even just copy paste the contents) to write it to ~/.kube/config on your own computer.

Search for "admin.conf" in this GitHub issue for more details. It's been mentioned a few times.

admiyo added a commit to admiyo/kubevirt that referenced this issue Apr 7, 2017

@ReSearchITEng

ReSearchITEng commented Apr 13, 2017

@eastcirclek - followed the steps, but for some reason the nodes are not able to install flannel properly.
(Note: on master everything is smooth.)

Apr 13 22:31:11 node2 kubelet[22893]: I0413 22:31:11.666206   22893 kuberuntime_manager.go:458] Container {Name:install-cni Image:quay.io/coreos/flannel:v0.7.0-amd64 Command:[/bin/sh -c set -e -x; cp -f /etc/kube-flannel/cni-conf.json /etc/cni/net.d/10-flannel.conf; while true; do sleep 3600; done] Args:[] WorkingDir: Ports:[] EnvFrom:[] Env:[] Resources:{Limits:map[] Requests:map[]} VolumeMounts:[{Name:cni ReadOnly:false MountPath:/etc/cni/net.d SubPath:} {Name:flannel-cfg ReadOnly:false MountPath:/etc/kube-flannel/ SubPath:} {Name:flannel-token-g65nf ReadOnly:true MountPath:/var/run/secrets/kubernetes.io/serviceaccount SubPath:}] LivenessProbe:nil ReadinessProbe:nil Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:IfNotPresent SecurityContext:nil Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it.
Apr 13 22:31:11 node2 kubelet[22893]: I0413 22:31:11.666280   22893 kuberuntime_manager.go:742] checking backoff for container "install-cni" in pod "kube-flannel-ds-3smf7_kube-system(2e6ad0f9-207f-11e7-8f34-0050569120ff)"
Apr 13 22:31:12 node2 kubelet[22893]: I0413 22:31:12.846325   22893 operation_generator.go:597] MountVolume.SetUp succeeded for volume "kubernetes.io/configmap/2e6ad0f9-207f-11e7-8f34-0050569120ff-flannel-cfg" (spec.Name: "flannel-cfg") pod "2e6ad0f9-207f-11e7-8f34-0050569120ff" (UID: "2e6ad0f9-207f-11e7-8f34-0050569120ff").
Apr 13 22:31:12 node2 kubelet[22893]: I0413 22:31:12.846373   22893 operation_generator.go:597] MountVolume.SetUp succeeded for volume "kubernetes.io/secret/2e6ad0f9-207f-11e7-8f34-0050569120ff-flannel-token-g65nf" (spec.Name: "flannel-token-g65nf") pod "2e6ad0f9-207f-11e7-8f34-0050569120ff" (UID: "2e6ad0f9-207f-11e7-8f34-0050569120ff").
@luckyfengyong

luckyfengyong commented Apr 14, 2017

Just to share my workaround. First, $KUBELET_NETWORK_ARGS is required, otherwise CNI is not enabled/configured, and removing and then restoring $KUBELET_NETWORK_ARGS seems too complicated.
When kubeadm init shows "[apiclient] First node has registered, but is not ready yet", the k8s cluster is actually already able to serve requests. At that point, you can simply move on to steps 3/4 of https://kubernetes.io/docs/getting-started-guides/kubeadm/, as follows.

To start using your cluster, you need to run (as a regular user):

  sudo cp /etc/kubernetes/admin.conf $HOME/
  sudo chown $(id -u):$(id -g) $HOME/admin.conf
  export KUBECONFIG=$HOME/admin.conf

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:  http://kubernetes.io/docs/admin/addons/

When installing the pod network, make sure its serviceaccount is granted enough permissions. Taking flannel as an example, I just bind the cluster-admin role to the flannel service account as follows. That may not be ideal; you could instead define a specific role for the flannel serviceaccount (see the scoped-role sketch below). BTW, when deploying other add-on services such as the dashboard, you also need to grant enough permissions to the related serviceaccount.

apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: flannel:daemonset
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: flannel
  namespace:  kube-system

After the pod network is ready, kubeadm init will show that the node is ready, and you can continue with the instructions.
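
If you would rather not hand the flannel service account cluster-admin, a scoped role looks approximately like this (an approximation of the upstream kube-flannel-rbac.yml mentioned in the next comment; treat that file as the authoritative version):

kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: flannel
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get"]
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["list", "watch"]
- apiGroups: [""]
  resources: ["nodes/status"]
  verbs: ["patch"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: flannel
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: flannel
subjects:
- kind: ServiceAccount
  name: flannel
  namespace: kube-system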

@kargakis

Member

kargakis commented Apr 14, 2017

Taking flannel as an example, I just bind the cluster-admin role to the flannel service account as follows. That may not be ideal; you could instead define a specific role for the flannel serviceaccount.

There is https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel-rbac.yml already

@ReSearchITEng

ReSearchITEng commented Apr 14, 2017

Thanks all for the help.
Finally, a fully working k8s 1.6.1 with flannel. Everything is now in Ansible playbooks.
Tested on CentOS/RHEL. Preparations have started for Debian-based distros (e.g. Ubuntu) as well, but those might need some refining.

https://github.com/ReSearchITEng/kubeadm-playbook/blob/master/README.md

PS: work based on sjenning/kubeadm-playbook - Many thanks @sjenning

k8s-github-robot pushed a commit to kubernetes/kubernetes that referenced this issue Jun 6, 2017

Kubernetes Submit Queue
Merge pull request #44125 from amacneil/kubeadm-instructions
Automatic merge from submit-queue

kubeadm: improve quickstart instructions

**What this PR does / why we need it**:

Improves instructional output following setup of a kubernetes master with kubeadm.

This helps prevent unnecessary support overhead such as: kubernetes/kubeadm#212 (comment)

**Example current output**:

```
To start using your cluster, you need to run (as a regular user):

  sudo cp /etc/kubernetes/admin.conf $HOME/
  sudo chown $(id -u):$(id -g) $HOME/admin.conf
  export KUBECONFIG=$HOME/admin.conf
```

**Example new output**:

```
To start using your cluster, you need to run (as a regular user):

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config
```

**Release note**:

```release-note
NONE
```
@joaquin386

joaquin386 commented May 10, 2018

Getting this when joining a node into a cluster:
[discovery] Created cluster-info discovery client, requesting info from "https://10.100.2.158:6443"
[discovery] Failed to request cluster info, will try again: [configmaps "cluster-info" is forbidden: User "system:anonymous" cannot get configmaps in the namespace "kube-public"]
[discovery] Failed to request cluster info, will try again: [configmaps "cluster-info" is forbidden: User "system:anonymous" cannot get configmaps in the namespace "kube-public"]

I started the node as SelfHosting.
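
Two things worth checking on the master when join fails with that error (a sketch; kubeadm token create --print-join-command requires kubeadm 1.9 or newer):

# Does the cluster-info ConfigMap the joining node is asking for actually exist?
kubectl --kubeconfig /etc/kubernetes/admin.conf -n kube-public get configmap cluster-info -o yaml

# Generate a fresh token along with the full join command (including the CA cert hash)
kubeadm token create --print-join-command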
