
kubeadm init --kubernetes-version=v1.8.0 fail with connection refuse for Get http://localhost:10255/healthz #55281

Closed
chenqy930 opened this issue Nov 8, 2017 · 12 comments
Assignees
Labels
area/kubeadm needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one.

Comments

@chenqy930

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug

What happened:
kubeadm init for an initial cluster on version 1.8 fails with the error:

[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp [::1]:10255: getsockopt: connection refused.

and the output of "journalctl -f -u kubelet.service" was:

-- Logs begin at Tue 2017-11-07 15:17:36 CST. --
Nov 07 16:38:34 master systemd[1]: kubelet.service: Service hold-off time over, scheduling restart.
Nov 07 16:38:34 master systemd[1]: Stopped kubelet: The Kubernetes Node Agent.
Nov 07 16:38:34 master systemd[1]: Started kubelet: The Kubernetes Node Agent.
Nov 07 16:38:34 master kubelet[119508]: I1107 16:38:34.348135 119508 feature_gate.go:156] feature gates: map[]
Nov 07 16:38:34 master kubelet[119508]: I1107 16:38:34.348240 119508 controller.go:114] kubelet config controller: starting controller
Nov 07 16:38:34 master kubelet[119508]: I1107 16:38:34.348252 119508 controller.go:118] kubelet config controller: validating combination of defaults and flags
Nov 07 16:38:34 master kubelet[119508]: error: unable to load client CA file /etc/kubernetes/pki/ca.pem: open /etc/kubernetes/pki/ca.pem: no such file or directory
Nov 07 16:38:34 master systemd[1]: kubelet.service: Main process exited, code=exited, status=1/FAILURE
Nov 07 16:38:34 master systemd[1]: kubelet.service: Unit entered failed state.
Nov 07 16:38:34 master systemd[1]: kubelet.service: Failed with result 'exit-code'.
Nov 07 16:38:44 master systemd[1]: kubelet.service: Service hold-off time over, scheduling restart.
Nov 07 16:38:44 master systemd[1]: Stopped kubelet: The Kubernetes Node Agent.
Nov 07 16:38:44 master systemd[1]: Started kubelet: The Kubernetes Node Agent.
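The log points at a missing /etc/kubernetes/pki/ca.pem. A quick way to check what the kubelet is actually being told versus what is on disk (a sketch; paths are the kubeadm defaults, and note that kubeadm 1.8 generates the CA as ca.crt, not ca.pem):

```shell
# The kubelet on this node is started with --client-ca-file=/etc/kubernetes/pki/ca.pem,
# likely from a stale systemd drop-in. Compare the configured flag against the
# files kubeadm actually wrote.
grep -r "client-ca-file" /etc/systemd/system/kubelet.service.d/ 2>/dev/null || true
ls -l /etc/kubernetes/pki/ 2>/dev/null || true
```

If the drop-in references ca.pem while only ca.crt exists, fixing the flag (or regenerating the drop-in) should stop this particular restart loop.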

What you expected to happen:
kubeadm init succeeds.

How to reproduce it (as minimally and precisely as possible):
Lab description:

VMName  Role    IP Address
master  Master  192.168.41.136
node-1  Worker  192.168.41.133
node-2  Worker  192.168.41.134
node-3  Worker  192.168.41.135

Set up docker/kubectl/kubeadm/kubelet version 1.8.0 on all nodes.
On "kubernetes-ms", the start command was:

Anything else we need to know?:
Cannot reach gcr.io due to network limitations.

Environment:

  • Kubernetes version (use kubectl version):
    Client Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.2", GitCommit:"bdaeafa71f6c7c04636251031f93464384d54963", GitTreeState:"clean", BuildDate:"2017-10-24T19:48:57Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}

  • Cloud provider or hardware configuration:

  • OS (e.g. from /etc/os-release):
Ubuntu 16.04.3 LTS (Xenial Xerus)

  • Kernel (e.g. uname -a):
Linux master 4.10.0-28-generic #32~16.04.2-Ubuntu SMP Thu Jul 20 10:19:48 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

  • Install tools:

  • Others:

@k8s-github-robot

@chenqy930
There are no sig labels on this issue. Please add a sig label by:

  1. mentioning a sig: @kubernetes/sig-<group-name>-<group-suffix>
    e.g., @kubernetes/sig-contributor-experience-<group-suffix> to notify the contributor experience sig, OR

  2. specifying the label manually: /sig <label>
    e.g., /sig scalability to apply the sig/scalability label

Note: Method 1 will trigger an email to the group. See the group list and label list.
The <group-suffix> in the method 1 has to be replaced with one of these: bugs, feature-requests, pr-reviews, test-failures, proposals

@k8s-github-robot k8s-github-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Nov 8, 2017
@dixudx
Member

dixudx commented Nov 8, 2017

/area kubeadm

error: unable to load client CA file /etc/kubernetes/pki/ca.pem: open /etc/kubernetes/pki/ca.pem: no such file or directory

@chenqy930 Please place the right pem files for the kubelet.

@chenqy930
Author

I installed kubeadm/kubectl/kubelet/kubernetes-cni and loaded the images (kube-apiserver-amd64, etc.) into docker, then ran kubeadm init, and got the result above.

Are there any steps I missed? And how can I fix this?

@dixudx
Member

dixudx commented Nov 8, 2017

@chenqy930 Any logs from kubeadm init?

@chenqy930
Author

Here it is:
root@master:/home/k8s-1# kubeadm init
[kubeadm] WARNING: kubeadm is in beta, please do not use it for production clusters.
[init] Using Kubernetes version: v1.8.2
[init] Using Authorization modes: [Node RBAC]
[preflight] Running pre-flight checks
[preflight] WARNING: docker version is greater than the most recently validated version. Docker version: 17.10.0-ce. Max validated version: 17.03
[preflight] Starting the kubelet service
[kubeadm] WARNING: starting in 1.8, tokens expire after 24 hours by default (if you require a non-expiring token use --token-ttl 0)
[certificates] Generated ca certificate and key.
[certificates] Generated apiserver certificate and key.
[certificates] apiserver serving cert is signed for DNS names [master kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 192.168.41.136]
[certificates] Generated apiserver-kubelet-client certificate and key.
[certificates] Generated sa key and public key.
[certificates] Generated front-proxy-ca certificate and key.
[certificates] Generated front-proxy-client certificate and key.
[certificates] Valid certificates and keys now exist in "/etc/kubernetes/pki"
[kubeconfig] Wrote KubeConfig file to disk: "admin.conf"
[kubeconfig] Wrote KubeConfig file to disk: "kubelet.conf"
[kubeconfig] Wrote KubeConfig file to disk: "controller-manager.conf"
[kubeconfig] Wrote KubeConfig file to disk: "scheduler.conf"
[controlplane] Wrote Static Pod manifest for component kube-apiserver to "/etc/kubernetes/manifests/kube-apiserver.yaml"
[controlplane] Wrote Static Pod manifest for component kube-controller-manager to "/etc/kubernetes/manifests/kube-controller-manager.yaml"
[controlplane] Wrote Static Pod manifest for component kube-scheduler to "/etc/kubernetes/manifests/kube-scheduler.yaml"
[etcd] Wrote Static Pod manifest for a local etcd instance to "/etc/kubernetes/manifests/etcd.yaml"
[init] Waiting for the kubelet to boot up the control plane as Static Pods from directory "/etc/kubernetes/manifests"
[init] This often takes around a minute; or longer if the control plane images have to be pulled.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp 127.0.0.1:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp 127.0.0.1:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp 127.0.0.1:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp 127.0.0.1:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp 127.0.0.1:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp 127.0.0.1:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp 127.0.0.1:10255: getsockopt: connection refused.

Unfortunately, an error has occurred:
timed out waiting for the condition

This error is likely caused by that:
- The kubelet is not running
- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
- There is no internet connection; so the kubelet can't pull the following control plane images:
- gcr.io/google_containers/kube-apiserver-amd64:v1.8.2
- gcr.io/google_containers/kube-controller-manager-amd64:v1.8.2
- gcr.io/google_containers/kube-scheduler-amd64:v1.8.2

You can troubleshoot this for example with the following commands if you're on a systemd-powered system:
- 'systemctl status kubelet'
- 'journalctl -xeu kubelet'
couldn't initialize a Kubernetes cluster

@dixudx
Member

dixudx commented Nov 8, 2017

It seems like the kubelet isn't running or healthy.
There is no internet connection; so the kubelet can't pull the following control plane images:

  • gcr.io/google_containers/kube-apiserver-amd64:v1.8.2
  • gcr.io/google_containers/kube-controller-manager-amd64:v1.8.2
  • gcr.io/google_containers/kube-scheduler-amd64:v1.8.2

@chenqy930 The images defined in the manifests cannot be pulled.

Please find a way to download those needed images.
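One way to do that for an air-gapped master (a sketch, assuming a second machine that can reach gcr.io; the tarball name is arbitrary):

```shell
# Export the control-plane images on a connected machine, then load them on
# the air-gapped master. Image names are taken from the kubeadm error above.
IMAGES="gcr.io/google_containers/kube-apiserver-amd64:v1.8.2 \
gcr.io/google_containers/kube-controller-manager-amd64:v1.8.2 \
gcr.io/google_containers/kube-scheduler-amd64:v1.8.2"

# On the connected machine:
for img in $IMAGES; do docker pull "$img"; done
docker save $IMAGES -o k8s-control-plane-v1.8.2.tar

# Copy the tarball to the master, then:
docker load -i k8s-control-plane-v1.8.2.tar
```

etcd and pause images are needed too if they are not already loaded, as in the `docker images` listing below.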

@chenqy930
Author

I checked the loaded images as follows; it seems they are already included.
root@master:/home/k8s-1/kubernetes/cluster# docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
gcr.io/google_containers/pause-amd64 3.0 bc3585b6f659 About an hour ago 754kB
quay.io/coreos/flannel v0.9.0-amd64 6d62367f030b About an hour ago 52.2MB
gcr.io/google_containers/kubernetes-dashboard-amd64 v1.7.1 15c99f22cca8 About an hour ago 129MB
gcr.io/google_containers/kubernetes-dashboard-init-amd64 v1.0.1 81854976441a About an hour ago 255MB
gcr.io/google_containers/kube-proxy-amd64 v1.8.2 11d54b084210 About an hour ago 95MB
gcr.io/google_containers/kube-scheduler-amd64 v1.8.2 477906b10219 About an hour ago 55.1MB
gcr.io/google_containers/kube-controller-manager-amd64 v1.8.2 66cb019a2efb About an hour ago 129MB
gcr.io/google_containers/kube-apiserver-amd64 v1.8.2 79ecb7eeca03 About an hour ago 194MB
gcr.io/google_containers/k8s-dns-dnsmasq-nanny-amd64 1.14.5 fa404860cd46 About an hour ago 41.7MB
gcr.io/google_containers/k8s-dns-kube-dns-amd64 1.14.5 eada1b0bb596 About an hour ago 49.6MB
gcr.io/google_containers/k8s-dns-sidecar-amd64 1.14.5 046e2bcba500 About an hour ago 42.1MB
gcr.io/google_containers/heapster-influxdb-amd64 v1.3.3 7e8a3d9ffdf3 About an hour ago 12.8MB
gcr.io/google_containers/heapster-grafana-amd64 v4.4.3 bc6b2ce5b113 About an hour ago 155MB
gcr.io/google_containers/etcd-amd64 3.0.17 5ece253bdc32 About an hour ago 169MB
gcr.io/google_containers/heapster-amd64 v1.4.0 b753840aa6ce About an hour ago 73.4MB
hello-world latest 725dcfab7d63 4 days ago 1.84kB

@chenqy930
Author

I found someone has the same problem here:
https://groups.google.com/forum/#!topic/kubernetes-dev/_NheOuYtsKQ

He used a lower version of docker, and that solved the problem.
But that didn't work for me.
I installed docker-engine 1.11.2, and the problem is still the same.

Help.. T^T

@Lion-Wei

Lion-Wei commented Nov 9, 2017

Is there a proxy in your environment? If so, you may want to add no_proxy="127.0.0.1,localhost" to your environment and try again.
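For reference, a minimal sketch of that suggestion applied to the kubelet unit (the drop-in file name is hypothetical; adjust for your distro):

```shell
# If the kubelet runs behind an HTTP proxy, exempt loopback addresses so the
# healthz probe on localhost:10255 is not routed through the proxy.
mkdir -p /etc/systemd/system/kubelet.service.d
cat >/etc/systemd/system/kubelet.service.d/http-proxy.conf <<'EOF'
[Service]
Environment="NO_PROXY=127.0.0.1,localhost"
EOF
systemctl daemon-reload && systemctl restart kubelet
```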

@chenqy930
Author

I don't have any proxy.
I was trying to run kubeadm init with no internet connection.

@chenqy930
Author

It was due to the docker version.
I tried the same steps in a clean environment that had never had v17.10.xx-ce installed, and the problem was solved!
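For anyone hitting this on an existing machine, a sketch of checking and downgrading docker on Ubuntu 16.04 (the package pin shown is an assumption; list what your repository actually offers with apt-cache madison):

```shell
# kubeadm 1.8's preflight check validates docker only up to 17.03, so a
# too-new engine such as 17.10 is a plausible culprit.
docker version --format '{{.Server.Version}}'   # confirm the current version
apt-get remove -y docker-ce docker-engine       # remove the too-new package
apt-cache madison docker-engine                 # list installable versions
apt-get install -y docker-engine=1.12.6-0~ubuntu-xenial  # assumed version string
```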

@dixudx
Member

dixudx commented Nov 12, 2017

/close

Since this is not a kubernetes issue.
