kubeadm init error marking master: timed out waiting for the condition #1092

Closed
heng-Yuan opened this Issue Sep 4, 2018 · 16 comments

heng-Yuan commented Sep 4, 2018

Versions

kubeadm version (use kubeadm version):
kubeadm version: &version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.1", GitCommit:"b1b29978270dc22fecc592ac55d903350454310a", GitTreeState:"clean", BuildDate:"2018-07-17T18:50:16Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Kubernetes version (use kubectl version):
    Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.1", GitCommit:"b1b29978270dc22fecc592ac55d903350454310a", GitTreeState:"clean", BuildDate:"2018-07-17T18:53:20Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
    Server Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.1", GitCommit:"b1b29978270dc22fecc592ac55d903350454310a", GitTreeState:"clean", BuildDate:"2018-07-17T18:43:26Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}

  • OS (e.g. from /etc/os-release):
    CentOS 7.1

  • Kernel (e.g. uname -a):
    Linux master1 3.10.0-229.el7.x86_64 #1 SMP Fri Mar 6 11:36:42 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

  • Docker
    Docker version 17.03.1-ce, build c6d412e

What happened?

When I used kubeadm init to create a single-master cluster, it ended with the following error.

[root@master1 kubeadm]# kubeadm init --apiserver-advertise-address=172.16.6.64 --kubernetes-version=v1.11.1 --pod-network-cidr=192.168.0.0/16
[init] using Kubernetes version: v1.11.1
[preflight] running pre-flight checks
I0904 14:29:33.474299   28529 kernel_validator.go:81] Validating kernel version
I0904 14:29:33.474529   28529 kernel_validator.go:96] Validating kernel config
[preflight/images] Pulling images required for setting up a Kubernetes cluster
[preflight/images] This might take a minute or two, depending on the speed of your internet connection
[preflight/images] You can also perform this action in beforehand using 'kubeadm config images pull'
[kubelet] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[preflight] Activating the kubelet service
[certificates] Generated ca certificate and key.
[certificates] Generated apiserver certificate and key.
[certificates] apiserver serving cert is signed for DNS names [master1 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 172.16.6.64]
[certificates] Generated apiserver-kubelet-client certificate and key.
[certificates] Generated sa key and public key.
[certificates] Generated front-proxy-ca certificate and key.
[certificates] Generated front-proxy-client certificate and key.
[certificates] Generated etcd/ca certificate and key.
[certificates] Generated etcd/server certificate and key.
[certificates] etcd/server serving cert is signed for DNS names [master1 localhost] and IPs [127.0.0.1 ::1]
[certificates] Generated etcd/peer certificate and key.
[certificates] etcd/peer serving cert is signed for DNS names [master1 localhost] and IPs [172.16.6.64 127.0.0.1 ::1]
[certificates] Generated etcd/healthcheck-client certificate and key.
[certificates] Generated apiserver-etcd-client certificate and key.
[certificates] valid certificates and keys now exist in "/etc/kubernetes/pki"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/admin.conf"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/kubelet.conf"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/controller-manager.conf"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/scheduler.conf"
[controlplane] wrote Static Pod manifest for component kube-apiserver to "/etc/kubernetes/manifests/kube-apiserver.yaml"
[controlplane] wrote Static Pod manifest for component kube-controller-manager to "/etc/kubernetes/manifests/kube-controller-manager.yaml"
[controlplane] wrote Static Pod manifest for component kube-scheduler to "/etc/kubernetes/manifests/kube-scheduler.yaml"
[etcd] Wrote Static Pod manifest for a local etcd instance to "/etc/kubernetes/manifests/etcd.yaml"
[init] waiting for the kubelet to boot up the control plane as Static Pods from directory "/etc/kubernetes/manifests" 
[init] this might take a minute or longer if the control plane images have to be pulled
[apiclient] All control plane components are healthy after 23.503472 seconds
[uploadconfig] storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.11" in namespace kube-system with the configuration for the kubelets in the cluster
[markmaster] Marking the node master1 as master by adding the label "node-role.kubernetes.io/master=''"
[markmaster] Marking the node master1 as master by adding the taints [node-role.kubernetes.io/master:NoSchedule]
error marking master: timed out waiting for the condition

However, all of the Docker containers appeared to be working fine.

[root@master1 kubeadm]# docker ps 
CONTAINER ID        IMAGE                  COMMAND                  CREATED             STATUS              PORTS               NAMES
53886ee1db02        272b3a60cd68           "kube-scheduler --..."   5 minutes ago       Up 5 minutes                            k8s_kube-scheduler_kube-schedu
05f9e74cb1ae        b8df3b177be2           "etcd --advertise-..."   5 minutes ago       Up 5 minutes                            k8s_etcd_etcd-master1_kube-sys
ac00773b050d        52096ee87d0e           "kube-controller-m..."   5 minutes ago       Up 5 minutes                            k8s_kube-controller-manager_ku
ebeae2ea255b        816332bd9d11           "kube-apiserver --..."   5 minutes ago       Up 5 minutes                            k8s_kube-apiserver_kube-apiser
74a0d0b1346e        k8s.gcr.io/pause:3.1   "/pause"                 5 minutes ago       Up 5 minutes                            k8s_POD_etcd-master1_kube-syst
b693b16e39cc        k8s.gcr.io/pause:3.1   "/pause"                 5 minutes ago       Up 5 minutes                            k8s_POD_kube-scheduler-master1
0ce92c0afa62        k8s.gcr.io/pause:3.1   "/pause"                 5 minutes ago       Up 5 minutes                            k8s_POD_kube-controller-manage
c43f05f27c01        k8s.gcr.io/pause:3.1   "/pause"                 5 minutes ago       Up 5 minutes                            k8s_POD_kube-apiserver-master

Strangely, when I added the --dry-run option, it worked.

[root@master1 kubeadm]# kubeadm init --apiserver-advertise-address 172.16.6.64 --pod-network-cidr=192.168.0.0/16 --node-name=master1 --dry-run --kubernetes-version=v1.11.1
[init] using Kubernetes version: v1.11.1
[preflight] running pre-flight checks
I0904 16:07:56.101221   23703 kernel_validator.go:81] Validating kernel version
I0904 16:07:56.101565   23703 kernel_validator.go:96] Validating kernel config
[preflight/images] Would pull the required images (like 'kubeadm config images pull')
[kubelet] Writing kubelet environment file with flags to file "/tmp/kubeadm-init-dryrun016982898/kubeadm-flags.env"
[kubelet] Writing kubelet configuration to file "/tmp/kubeadm-init-dryrun016982898/config.yaml"
[certificates] Generated ca certificate and key.
[certificates] Generated apiserver certificate and key.
[certificates] apiserver serving cert is signed for DNS names [master1 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 172.16.6.64]
[certificates] Generated apiserver-kubelet-client certificate and key.
[certificates] Generated sa key and public key.
[certificates] Generated front-proxy-ca certificate and key.
[certificates] Generated front-proxy-client certificate and key.
[certificates] Generated etcd/ca certificate and key.
[certificates] Generated etcd/server certificate and key.
[certificates] etcd/server serving cert is signed for DNS names [master1 localhost] and IPs [127.0.0.1 ::1]
[certificates] Generated etcd/peer certificate and key.
[certificates] etcd/peer serving cert is signed for DNS names [master1 localhost] and IPs [172.16.6.64 127.0.0.1 ::1]
[certificates] Generated etcd/healthcheck-client certificate and key.
[certificates] Generated apiserver-etcd-client certificate and key.
[certificates] valid certificates and keys now exist in "/tmp/kubeadm-init-dryrun016982898"
[kubeconfig] Wrote KubeConfig file to disk: "/tmp/kubeadm-init-dryrun016982898/admin.conf"
[kubeconfig] Wrote KubeConfig file to disk: "/tmp/kubeadm-init-dryrun016982898/kubelet.conf"
[kubeconfig] Wrote KubeConfig file to disk: "/tmp/kubeadm-init-dryrun016982898/controller-manager.conf"
[kubeconfig] Wrote KubeConfig file to disk: "/tmp/kubeadm-init-dryrun016982898/scheduler.conf"
[controlplane] wrote Static Pod manifest for component kube-apiserver to "/tmp/kubeadm-init-dryrun016982898/kube-apiserver.yaml"
[controlplane] wrote Static Pod manifest for component kube-controller-manager to "/tmp/kubeadm-init-dryrun016982898/kube-controller-manager.yaml"
[controlplane] wrote Static Pod manifest for component kube-scheduler to "/tmp/kubeadm-init-dryrun016982898/kube-scheduler.yaml"
[etcd] Wrote Static Pod manifest for a local etcd instance to "/tmp/kubeadm-init-dryrun016982898/etcd.yaml"
[dryrun] wrote certificates, kubeconfig files and control plane manifests to the "/tmp/kubeadm-init-dryrun016982898" directory
[dryrun] the certificates or kubeconfig files would not be printed due to their sensitive nature
[dryrun] please examine the "/tmp/kubeadm-init-dryrun016982898" directory for details about what would be written
[dryrun] Would write file "/etc/kubernetes/manifests/kube-apiserver.yaml" with content:
...

[markmaster] Marking the node master1 as master by adding the label "node-role.kubernetes.io/master=''"
[markmaster] Marking the node master1 as master by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[dryrun] Would perform action GET on resource "nodes" in API group "core/v1"
[dryrun] Resource name: "master1"
[dryrun] Would perform action PATCH on resource "nodes" in API group "core/v1"
[dryrun] Resource name: "master1"
[dryrun] Attached patch:
	{"metadata":{"labels":{"node-role.kubernetes.io/master":""}},"spec":{"taints":[{"effect":"NoSchedule","key":"node-role.kubernetes.io/master"}]}}
[patchnode] Uploading the CRI Socket information "/var/run/dockershim.sock" to the Node API object "master1" as an annotation
[dryrun] Would perform action GET on resource "nodes" in API group "core/v1"
[dryrun] Resource name: "master1"
[dryrun] Would perform action PATCH on resource "nodes" in API group "core/v1"
[dryrun] Resource name: "master1"
[dryrun] Attached patch:
	{"metadata":{"annotations":{"kubeadm.alpha.kubernetes.io/cri-socket":"/var/run/dockershim.sock"}}}
[bootstraptoken] using token: 3gvy0t.amka3xc9u1oljlla
[dryrun] Would perform action GET on resource "secrets" in API group "core/v1"
[dryrun] Resource name: "bootstrap-token-3gvy0t"
[dryrun] Would perform action CREATE on resource "secrets" in API group "core/v1"
[dryrun] Attached object:
	apiVersion: v1
	data:
	  auth-extra-groups: c3lzdGVtOmJvb3RzdHJhcHBlcnM6a3ViZWFkbTpkZWZhdWx0LW5vZGUtdG9rZW4=
	  description: VGhlIGRlZmF1bHQgYm9vdHN0cmFwIHRva2VuIGdlbmVyYXRlZCBieSAna3ViZWFkbSBpbml0Jy4=
	  expiration: MjAxOC0wOS0wNVQxNjowODowNSswODowMA==
	  token-id: M2d2eTB0
	  token-secret: YW1rYTN4Yzl1MW9samxsYQ==
	  usage-bootstrap-authentication: dHJ1ZQ==
	  usage-bootstrap-signing: dHJ1ZQ==
	kind: Secret
	metadata:
	  creationTimestamp: null
	  name: bootstrap-token-3gvy0t
	  namespace: kube-system
	type: bootstrap.kubernetes.io/token
[bootstraptoken] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[dryrun] Would perform action CREATE on resource "clusterrolebindings" in API group "rbac.authorization.k8s.io/v1"
[dryrun] Attached object:
	apiVersion: rbac.authorization.k8s.io/v1
	kind: ClusterRoleBinding
	metadata:
	  creationTimestamp: null
	  name: kubeadm:kubelet-bootstrap
	roleRef:
	  apiGroup: rbac.authorization.k8s.io
	  kind: ClusterRole
	  name: system:node-bootstrapper
	subjects:
	- kind: Group
	  name: system:bootstrappers:kubeadm:default-node-token
[bootstraptoken] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[dryrun] Would perform action CREATE on resource "clusterrolebindings" in API group "rbac.authorization.k8s.io/v1"
[dryrun] Attached object:
	apiVersion: rbac.authorization.k8s.io/v1
	kind: ClusterRoleBinding
	metadata:
	  creationTimestamp: null
	  name: kubeadm:node-autoapprove-bootstrap
	roleRef:
	  apiGroup: rbac.authorization.k8s.io
	  kind: ClusterRole
	  name: system:certificates.k8s.io:certificatesigningrequests:nodeclient
	subjects:
	- kind: Group
	  name: system:bootstrappers:kubeadm:default-node-token
[bootstraptoken] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[dryrun] Would perform action CREATE on resource "clusterrolebindings" in API group "rbac.authorization.k8s.io/v1"
[dryrun] Attached object:
	apiVersion: rbac.authorization.k8s.io/v1
	kind: ClusterRoleBinding
	metadata:
	  creationTimestamp: null
	  name: kubeadm:node-autoapprove-certificate-rotation
	roleRef:
	  apiGroup: rbac.authorization.k8s.io
	  kind: ClusterRole
	  name: system:certificates.k8s.io:certificatesigningrequests:selfnodeclient
	subjects:
	- kind: Group
	  name: system:nodes
[bootstraptoken] creating the "cluster-info" ConfigMap in the "kube-public" namespace
...

[dryrun] Attached object:
	apiVersion: rbac.authorization.k8s.io/v1
	kind: ClusterRoleBinding
	metadata:
	  creationTimestamp: null
	  name: kubeadm:node-proxier
	roleRef:
	  apiGroup: rbac.authorization.k8s.io
	  kind: ClusterRole
	  name: system:node-proxier
	subjects:
	- kind: ServiceAccount
	  name: kube-proxy
	  namespace: kube-system
[addons] Applied essential addon: kube-proxy
[dryrun] finished dry-running successfully. Above are the resources that would be created

What you expected to happen?

How can I solve this problem and create a single-master cluster with kubeadm?

rosti (Member) commented Sep 4, 2018

Hi @heng-Yuan and thanks for filing this issue!

Can you check the state and logs of kubelet and the API server container (of course you can filter out any information you deem sensitive):

systemctl status kubelet
journalctl -xeu kubelet
docker logs ebeae2ea255b

Note that ebeae2ea255b is your API server container ID.

heng-Yuan (Author) commented Sep 5, 2018

@rosti Thank you sincerely for your reply. I have reset the master using kubeadm reset and reinitialized it, but it still fails with the error above. I have also checked the state and logs as you mentioned.

[root@master1 kubeadm]# systemctl status kubelet.service -l
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-kubeadm.conf, 20-etcd-service-manager.conf
   Active: active (running) since Tue 2018-09-04 16:47:14 CST; 3min 3s ago
     Docs: http://kubernetes.io/docs/
 Main PID: 32505 (kubelet)
   Memory: 43.0M
   CGroup: /system.slice/kubelet.service
           └─32505 /usr/bin/kubelet --pod-manifest-path=/etc/kubernetes/manifests --allow-privileged=true

Sep 04 16:49:26 master1 kubelet[32505]: I0904 16:49:26.817311   32505 kubelet_node_status.go:269] Setting node annotation to enable volume controller attach/detach
Sep 04 16:49:36 master1 kubelet[32505]: I0904 16:49:36.850186   32505 kubelet_node_status.go:269] Setting node annotation to enable volume controller attach/detach
Sep 04 16:49:37 master1 kubelet[32505]: I0904 16:49:37.053103   32505 kubelet_node_status.go:269] Setting node annotation to enable volume controller attach/detach
Sep 04 16:49:46 master1 kubelet[32505]: I0904 16:49:46.880508   32505 kubelet_node_status.go:269] Setting node annotation to enable volume controller attach/detach
Sep 04 16:49:56 master1 kubelet[32505]: I0904 16:49:56.910928   32505 kubelet_node_status.go:269] Setting node annotation to enable volume controller attach/detach
Sep 04 16:50:06 master1 kubelet[32505]: I0904 16:50:06.941318   32505 kubelet_node_status.go:269] Setting node annotation to enable volume controller attach/detach
Sep 04 16:50:09 master1 kubelet[32505]: I0904 16:50:09.053222   32505 kubelet_node_status.go:269] Setting node annotation to enable volume controller attach/detach
Sep 04 16:50:09 master1 kubelet[32505]: I0904 16:50:09.053483   32505 kubelet_node_status.go:269] Setting node annotation to enable volume controller attach/detach
Sep 04 16:50:10 master1 kubelet[32505]: I0904 16:50:10.053315   32505 kubelet_node_status.go:269] Setting node annotation to enable volume controller attach/detach
Sep 04 16:50:16 master1 kubelet[32505]: I0904 16:50:16.979911   32505 kubelet_node_status.go:269] Setting node annotation to enable volume controller attach/detach

The docker logs of the API server container show lots of TLS handshake errors like this:

[root@master1 kubeadm]# docker logs 23bb9ca0598b
I0905 01:06:56.928270       1 logs.go:49] http: TLS handshake error from 172.16.6.65:37562: read tcp 172.16.6.64:6443->172.16.6.65:37562: read: connection reset by peer
I0905 01:07:01.930357       1 logs.go:49] http: TLS handshake error from 172.16.6.65:37565: read tcp 172.16.6.64:6443->172.16.6.65:37565: read: connection reset by peer
I0905 01:07:06.931092       1 logs.go:49] http: TLS handshake error from 172.16.6.65:37568: read tcp 172.16.6.64:6443->172.16.6.65:37568: read: connection reset by peer
I0905 01:07:11.932974       1 logs.go:49] http: TLS handshake error from 172.16.6.65:37571: read tcp 172.16.6.64:6443->172.16.6.65:37571: read: connection reset by peer

172.16.6.64 is the server where I ran kubeadm init, and 172.16.6.65 is another server that I intend to use as a worker node.

And in the Docker daemon log, I also see some errors like the following:

[root@master1 ~]# journalctl -u docker.service -f
-- Logs begin at Mon 2018-09-03 04:20:53 CST. --
Sep 04 16:47:14 master1 dockerd[28979]: time="2018-09-04T16:47:14.583834446+08:00" level=error msg="Handler for GET /v1.27/containers/k8s.gcr.io/pause:3.1/json returned error: No such container: k8s.gcr.io/pause:3.1"
Sep 04 16:47:14 master1 dockerd[28979]: time="2018-09-04T16:47:14.617667342+08:00" level=error msg="Handler for GET /v1.27/containers/k8s.gcr.io/etcd-amd64:3.2.18/json returned error: No such container: k8s.gcr.io/etcd-amd64:3.2.18"
Sep 04 16:47:14 master1 dockerd[28979]: time="2018-09-04T16:47:14.652892678+08:00" level=error msg="Handler for GET /v1.27/containers/k8s.gcr.io/coredns:1.1.3/json returned error: No such container: k8s.gcr.io/coredns:1.1.3"
Sep 04 16:47:16 master1 dockerd[28979]: time="2018-09-04T16:47:16.198158654+08:00" level=error msg="Handler for GET /containers/13430f7e8177925ec6f51b5881f9e27cae98868256c83653be03e8dc6467bf18/json returned error: No such container: 13430f7e8177925ec6f51b5881f9e27cae98868256c83653be03e8dc6467bf18"
Sep 04 16:47:24 master1 dockerd[28979]: time="2018-09-04T16:47:24.729158296+08:00" level=warning msg="Unknown healthcheck type 'NONE' (expected 'CMD') in container 23bb9ca0598bd9183e0a289cfd128367f261f2673e93b675a530e9a66ff4bc37"
Sep 04 16:47:24 master1 dockerd[28979]: time="2018-09-04T16:47:24.953014165+08:00" level=warning msg="Unknown healthcheck type 'NONE' (expected 'CMD') in container 30ba9554ad122c5317e54ba11d6e4b44ca50fe5bb497716a93f7a2fb85b9c808"
Sep 04 16:47:25 master1 dockerd[28979]: time="2018-09-04T16:47:25.754823422+08:00" level=warning msg="Unknown healthcheck type 'NONE' (expected 'CMD') in container d27dde76000f2261051ff29277658c09471810cae72802087554bef12b54bc1e"
Sep 04 16:47:26 master1 dockerd[28979]: time="2018-09-04T16:47:26.215033314+08:00" level=warning msg="Unknown healthcheck type 'NONE' (expected 'CMD') in container 45c934c67f9c555c8e3dbe3311f01766485d9a09859d96798e744a5830545b23"
Sep 04 16:55:12 master1 dockerd[28979]: time="2018-09-04T16:55:12.034057676+08:00" level=error msg="Error setting up exec command in container http:: No such container: http:"
Sep 04 16:55:12 master1 dockerd[28979]: time="2018-09-04T16:55:12.034173274+08:00" level=error msg="Handler for POST /v1.27/containers/http:/exec returned error: No such container: http:"

However, all of those images are present locally.

[root@master1 kubeadm]# docker image ls
REPOSITORY                                 TAG                 IMAGE ID            CREATED             SIZE
k8s.gcr.io/kube-proxy-amd64                v1.11.1             d5c25579d0ff        7 weeks ago         97.8 MB
k8s.gcr.io/kube-apiserver-amd64            v1.11.1             816332bd9d11        7 weeks ago         187 MB
k8s.gcr.io/kube-controller-manager-amd64   v1.11.1             52096ee87d0e        7 weeks ago         155 MB
k8s.gcr.io/kube-scheduler-amd64            v1.11.1             272b3a60cd68        7 weeks ago         56.8 MB
k8s.gcr.io/coredns                         1.1.3               b3b94275d97c        3 months ago        45.6 MB
k8s.gcr.io/etcd-amd64                      3.2.18              b8df3b177be2        4 months ago        219 MB
k8s.gcr.io/pause                           3.1                 da86e6ba6ca1        8 months ago        742 kB

neolit123 (Member) commented Sep 5, 2018

best to also include:
journalctl -xeu kubelet

heng-Yuan (Author) commented Sep 5, 2018

@neolit123 Thanks. As shown above in the systemctl status kubelet.service -l output, journalctl -xeu kubelet keeps printing Setting node annotation to enable volume controller attach/detach all the time.

[root@master1 kubeadm]# journalctl -xeu kubelet
Sep 05 10:25:30 master1 kubelet[32505]: I0905 10:25:30.053407   32505 kubelet_node_status.go:269] Setting node annotation to enable volume controller attach/
Sep 05 10:25:39 master1 kubelet[32505]: I0905 10:25:39.537799   32505 kubelet_node_status.go:269] Setting node annotation to enable volume controller attach/
Sep 05 10:25:49 master1 kubelet[32505]: I0905 10:25:49.575277   32505 kubelet_node_status.go:269] Setting node annotation to enable volume controller attach/
Sep 05 10:25:58 master1 kubelet[32505]: I0905 10:25:58.057656   32505 kubelet_node_status.go:269] Setting node annotation to enable volume controller attach/
Sep 05 10:25:59 master1 kubelet[32505]: I0905 10:25:59.613457   32505 kubelet_node_status.go:269] Setting node annotation to enable volume controller attach/
Sep 05 10:26:09 master1 kubelet[32505]: I0905 10:26:09.641498   32505 kubelet_node_status.go:269] Setting node annotation to enable volume controller attach/
Sep 05 10:26:19 master1 kubelet[32505]: I0905 10:26:19.674790   32505 kubelet_node_status.go:269] Setting node annotation to enable volume controller attach/
Sep 05 10:26:29 master1 kubelet[32505]: I0905 10:26:29.716115   32505 kubelet_node_status.go:269] Setting node annotation to enable volume controller attach/
Sep 05 10:26:39 master1 kubelet[32505]: I0905 10:26:39.749659   32505 kubelet_node_status.go:269] Setting node annotation to enable volume controller attach/
Sep 05 10:26:43 master1 kubelet[32505]: I0905 10:26:43.053217   32505 kubelet_node_status.go:269] Setting node annotation to enable volume controller attach/
Sep 05 10:26:44 master1 kubelet[32505]: I0905 10:26:44.053360   32505 kubelet_node_status.go:269] Setting node annotation to enable volume controller attach/

neolit123 (Member) commented Sep 5, 2018

are you sure you can advertise the API server on 172.16.6.64?
please check the connectivity to the address.

I0905 01:06:56.928270       1 logs.go:49] http: TLS handshake error from 172.16.6.65:37562: read tcp 172.16.6.64:6443->172.16.6.65:37562: read: connection reset by peer

these seem to be logs from another time.
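
If in doubt, something like this run from the other machine should confirm that the advertised endpoint is reachable (just a quick sketch, assuming nc and curl are installed; the address and port are taken from your output above):

# TCP-level check of the advertised API server endpoint
nc -vz 172.16.6.64 6443
# HTTPS check; -k skips certificate verification, which is fine for a reachability test
curl -k https://172.16.6.64:6443/healthz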

heng-Yuan (Author) commented Sep 5, 2018

@neolit123 Yes, I can connect to this address from another node (172.16.6.71).

[root@node1 ~]#  ping -c 2 172.16.6.64
PING 172.16.6.64 (172.16.6.64) 56(84) bytes of data.
64 bytes from 172.16.6.64: icmp_seq=1 ttl=64 time=0.473 ms
64 bytes from 172.16.6.64: icmp_seq=2 ttl=64 time=0.346 ms

--- 172.16.6.64 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 0.346/0.409/0.473/0.066 ms

And this TLS handshake error keeps appearing continuously after the apiserver container is created.

Furthermore, I can initialize a master on another server (172.16.6.65) with the same configuration.

[root@master kubernetes]# kubeadm init --pod-network-cidr=192.168.0.0/16 --apiserver-advertise-address=172.16.6.65 --kubernetes-version=v1.11.1 --node-name=master.XXX.com 
[init] using Kubernetes version: v1.11.1
[preflight] running pre-flight checks
I0905 10:40:52.988876   16823 kernel_validator.go:81] Validating kernel version
I0905 10:40:52.989249   16823 kernel_validator.go:96] Validating kernel config

...
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy

Your Kubernetes master has initialized successfully!
...
You can now join any number of machines by running the following on each node
as root:

  kubeadm join 172.16.6.65:6443 --token okt9xh.s12faifwcsXXXXXX --discovery-token-ca-cert-hash sha256:86b132ac50ffc055dacca29f86077d5fc09c5b6eb26f51696740a5d309b08351

Therefore, I think there is something I have overlooked.

heng-Yuan (Author) commented Sep 5, 2018

@neolit123 Hi neolit, when I use this server (172.16.6.64) as a worker node and try to join it to the master, it also fails with the error timed out waiting for the condition.

[root@node2 ~]# kubeadm join 172.16.6.65:6443 --token okt9xh.s12faifwcsqa1ly3 --discovery-token-ca-cert-hash sha256:86b132ac50ffc055dacca29f86077d5fc09c5b6eb26f51696740a5d309b08351
[preflight] running pre-flight checks
	[WARNING RequiredIPVSKernelModulesAvailable]: the IPVS proxier will not be used, because the following required kernel modules are not loaded: [ip_vs_rr ip_vs_wrr ip_vs_sh] or no builtin kernel ipvs support: map[ip_vs:{} ip_vs_rr:{} ip_vs_wrr:{} ip_vs_sh:{} nf_conntrack_ipv4:{}]
you can solve this problem with following methods:
 1. Run 'modprobe -- ' to load missing kernel modules;
2. Provide the missing builtin kernel ipvs support

I0905 11:48:51.819871   32477 kernel_validator.go:81] Validating kernel version
I0905 11:48:51.820073   32477 kernel_validator.go:96] Validating kernel config
[discovery] Trying to connect to API Server "172.16.6.65:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://172.16.6.65:6443"
[discovery] Requesting info from "https://172.16.6.65:6443" again to validate TLS against the pinned public key
[discovery] Cluster info signature and contents are valid and TLS certificate validates against pinned roots, will use API Server "172.16.6.65:6443"
[discovery] Successfully established connection with API Server "172.16.6.65:6443"
[kubelet] Downloading configuration for the kubelet from the "kubelet-config-1.11" ConfigMap in the kube-system namespace
[kubelet] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[preflight] Activating the kubelet service
[tlsbootstrap] Waiting for the kubelet to perform the TLS Bootstrap...

Unfortunately, an error has occurred:
	timed out waiting for the condition

This error is likely caused by:
	- The kubelet is not running
	- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
	- 'systemctl status kubelet'
	- 'journalctl -xeu kubelet'
timed out waiting for the condition

And the kubelet logs show:

[root@node2 ~]# journalctl -xeu kubelet
Sep 05 11:48:53 node2 kubelet[32577]: I0905 11:48:53.057056   32577 fs_resource_analyzer.go:66] Starting FS ResourceAnalyzer
Sep 05 11:48:53 node2 kubelet[32577]: I0905 11:48:53.057114   32577 status_manager.go:148] Kubernetes client is nil, not starting status manager.
Sep 05 11:48:53 node2 kubelet[32577]: I0905 11:48:53.057155   32577 kubelet.go:1758] Starting kubelet main sync loop.
Sep 05 11:48:53 node2 kubelet[32577]: I0905 11:48:53.057214   32577 kubelet.go:1775] skipping pod synchronization - [container runtime is down PLEG is not he
Sep 05 11:48:53 node2 kubelet[32577]: I0905 11:48:53.057256   32577 volume_manager.go:247] Starting Kubelet Volume Manager
Sep 05 11:48:53 node2 kubelet[32577]: E0905 11:48:53.057444   32577 kubelet.go:1261] Image garbage collection failed once. Stats initialization may not have 
Sep 05 11:48:53 node2 kubelet[32577]: I0905 11:48:53.057534   32577 server.go:302] Adding debug handlers to kubelet server.
Sep 05 11:48:53 node2 kubelet[32577]: I0905 11:48:53.057814   32577 desired_state_of_world_populator.go:130] Desired state populator starts to run
Sep 05 11:48:53 node2 kubelet[32577]: E0905 11:48:53.095989   32577 factory.go:340] devicemapper filesystem stats will not be reported: RHEL/Centos 7.x kerne
Sep 05 11:48:53 node2 kubelet[32577]: I0905 11:48:53.158177   32577 kubelet.go:1775] skipping pod synchronization - [container runtime is down]
Sep 05 11:48:53 node2 kubelet[32577]: I0905 11:48:53.294942   32577 kubelet_node_status.go:269] Setting node annotation to enable volume controller attach/de
Sep 05 11:48:53 node2 kubelet[32577]: I0905 11:48:53.298652   32577 cpu_manager.go:155] [cpumanager] starting with none policy
Sep 05 11:48:53 node2 kubelet[32577]: I0905 11:48:53.298700   32577 cpu_manager.go:156] [cpumanager] reconciling every 10s
Sep 05 11:48:53 node2 kubelet[32577]: I0905 11:48:53.298732   32577 policy_none.go:42] [cpumanager] none policy: Start
Sep 05 11:48:53 node2 kubelet[32577]: Starting Device Plugin manager
Sep 05 11:48:53 node2 kubelet[32577]: W0905 11:48:53.299950   32577 manager.go:496] Failed to retrieve checkpoint for "kubelet_internal_checkpoint": checkpoi
Sep 05 11:48:53 node2 kubelet[32577]: I0905 11:48:53.300381   32577 kubelet_node_status.go:269] Setting node annotation to enable volume controller attach/de
Sep 05 11:49:03 node2 kubelet[32577]: I0905 11:49:03.324526   32577 kubelet_node_status.go:269] Setting node annotation to enable volume controller attach/de
Sep 05 11:49:13 node2 kubelet[32577]: I0905 11:49:13.014114   32577 reconciler.go:154] Reconciler: start to sync state
Sep 05 11:49:13 node2 kubelet[32577]: I0905 11:49:13.354442   32577 kubelet_node_status.go:269] Setting node annotation to enable volume controller attach/de
Sep 05 11:49:23 node2 kubelet[32577]: I0905 11:49:23.385356   32577 kubelet_node_status.go:269] Setting node annotation to enable volume controller attach/de
Sep 05 11:49:33 node2 kubelet[32577]: I0905 11:49:33.415169   32577 kubelet_node_status.go:269] Setting node annotation to enable volume controller attach/de
Sep 05 11:49:43 node2 kubelet[32577]: I0905 11:49:43.445193   32577 kubelet_node_status.go:269] Setting node annotation to enable volume controller attach/de
Sep 05 11:49:53 node2 kubelet[32577]: I0905 11:49:53.478646   32577 kubelet_node_status.go:269] Setting node annotation to enable volume controller attach/de
Sep 05 11:50:03 node2 kubelet[32577]: I0905 11:50:03.517594   32577 kubelet_node_status.go:269] Setting node annotation to enable volume controller attach/de
Sep 05 11:50:13 node2 kubelet[32577]: I0905 11:50:13.542841   32577 kubelet_node_status.go:269] Setting node annotation to enable volume controller attach/de
Sep 05 11:50:23 node2 kubelet[32577]: I0905 11:50:23.578388   32577 kubelet_node_status.go:269] Setting node annotation to enable volume controller attach/de
Sep 05 11:50:33 node2 kubelet[32577]: I0905 11:50:33.605220   32577 kubelet_node_status.go:269] Setting node annotation to enable volume controller attach/de
Sep 05 11:50:43 node2 kubelet[32577]: I0905 11:50:43.629862   32577 kubelet_node_status.go:269] Setting node annotation to enable volume controller attach/de
Sep 05 11:50:53 node2 kubelet[32577]: I0905 11:50:53.668571   32577 kubelet_node_status.go:269] Setting node annotation to enable volume controller attach/de
Sep 05 11:51:03 node2 kubelet[32577]: I0905 11:51:03.701641   32577 kubelet_node_status.go:269] Setting node annotation to enable volume controller attach/de

rosti (Member) commented Sep 5, 2018

Hi @heng-Yuan, I think that the TLS handshake error is caused by the use of an FQDN for the --node-name parameter to kubeadm init. Can you reset your cluster and try specifying a simple host name instead?

You can also try supplying the host name via --apiserver-cert-extra-sans if you want to keep the FQDN for --node-name.
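
For example, something along these lines (a sketch only; the other flags are copied from your earlier command, and master1.example.com is just a placeholder for your actual FQDN):

kubeadm reset
kubeadm init --apiserver-advertise-address=172.16.6.64 \
  --kubernetes-version=v1.11.1 \
  --pod-network-cidr=192.168.0.0/16 \
  --node-name=master1 \
  --apiserver-cert-extra-sans=master1.example.com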

heng-Yuan (Author) commented Sep 5, 2018

@rosti I am aware of that issue and have already used a simple host name to init the master, but it still has the same problem.

kubernetes/kubernetes#64312

rosti (Member) commented Sep 5, 2018

@heng-Yuan can you verify that you have forwarding enabled?

cat /proc/sys/net/ipv4/ip_forward
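
If that prints 0, the usual way to enable it (a sketch, assuming a standard sysctl setup; the k8s.conf file name is just a convention):

sysctl -w net.ipv4.ip_forward=1
# make it persistent across reboots
echo "net.ipv4.ip_forward = 1" >> /etc/sysctl.d/k8s.conf
sysctl --system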

timothysc (Member) commented Sep 5, 2018

@heng-Yuan - I'd make certain SELinux is disabled FWIW.
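
For reference, a typical way to check it and set it to permissive on CentOS (this is what the kubeadm install docs suggest; the sed edit makes it persist across reboots):

getenforce
setenforce 0
sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' /etc/selinux/config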

heng-Yuan (Author) commented Sep 6, 2018

@rosti Yes, I have checked it, and forwarding is already enabled.

[root@master1 ~]# cat /proc/sys/net/ipv4/ip_forward
1

@timothysc Also, SELinux is disabled.

[root@master1 ~]# getenforce
Disabled

zt706 commented Sep 12, 2018

$ kubeadm reset
$ ifconfig cni0 down && ip link delete cni0
$ ifconfig flannel.1 down && ip link delete flannel.1
$ rm -rf /var/lib/cni/

good luck!

timothysc (Member) commented Oct 26, 2018

We are adding a separate timeout to the config in 1.13.
Closing this issue.
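
For anyone landing here later, that timeout should be settable through the kubeadm config file; a minimal sketch assuming the v1beta1 API in 1.13 (verify the exact schema with kubeadm config print init-defaults on your version):

apiVersion: kubeadm.k8s.io/v1beta1
kind: ClusterConfiguration
kubernetesVersion: v1.13.0
apiServer:
  timeoutForControlPlane: 8m0s

Then pass it with kubeadm init --config kubeadm-config.yaml.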

zaijianwutian commented Jan 7, 2019

@heng-Yuan Hi, did you fix the issue?

d10raghu commented Jan 9, 2019

@heng-Yuan Hi, did you fix the issue?
