Etcd doesn't start on master after reboot on AWS host #417
Comments
The problem is SELinux. As per https://kubernetes.io/docs/setup/independent/install-kubeadm/:
After you restart the box, SELinux is set back to 'Enforcing'; you can check whether that's the case on your machine. You should be able to turn off SELinux permanently on the box by changing its configuration.
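The exact commands were lost from the quoted comment; as a sketch based on the linked kubeadm docs, the check and the fix would look like the following. The `getenforce`/`setenforce` calls are shown commented out because they need a root shell on an SELinux host, and the permanent `/etc/selinux/config` edit is demonstrated on a scratch copy of the file:

```shell
# Check the current mode; after a reboot this prints "Enforcing" again:
#   getenforce
# Turn SELinux off until the next reboot:
#   sudo setenforce 0
# Permanent fix: set SELINUX=permissive in /etc/selinux/config.
# Demonstrated on a scratch copy so the edit is visible without root:
cfg=$(mktemp)
printf 'SELINUX=enforcing\nSELINUXTYPE=targeted\n' > "$cfg"
sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' "$cfg"
grep '^SELINUX=' "$cfg"    # prints: SELINUX=permissive
rm -f "$cfg"
```

On the real machine you would run the `sed` against `/etc/selinux/config` itself (with sudo) and then reboot, or pair it with `setenforce 0` to avoid the reboot.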
Great @tomciaaa, it seems like the issue got solved.
This is a very helpful discussion. I'd suggest adding these steps to the documentation: https://kubernetes.io/docs/setup/independent/install-kubeadm/#installing-kubeadm-kubelet-and-kubectl
My original issue turned out to be exactly the same problem as people are describing above. I'm experiencing exactly the same problems on CentOS 7.
I have installed:
Everything looks perfect and I have set up
So far so good, but there is a problem. When I am trying to access
but when I ssh to the actual node
I can easily access
and I get the response I am expecting.
HINT
CentOS 7: tail /var/log/messages. I have spent about 4 days on this. As you can see in my post from about 4 days ago, I was at least able to start up the cluster; suddenly, it is not possible anymore.
BUG REPORT
Versions
kubeadm version: &version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.4", GitCommit:"793658f2d7ca7f064d2bdf606519f9fe1229c381", GitTreeState:"clean", BuildDate:"2017-08-17T08:30:51Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
Environment:
kubectl version:
Client Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.4", GitCommit:"793658f2d7ca7f064d2bdf606519f9fe1229c381", GitTreeState:"clean", BuildDate:"2017-08-17T08:48:23Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
The connection to the server 10.0.19.21:6443 was refused - did you specify the right host or port?
PRETTY_NAME="Red Hat Enterprise Linux Server 7.2 (Maipo)"
uname -a:
Linux ip-10-0-19-21.eu-west-1.compute.internal 3.10.0-327.el7.x86_64 #1 SMP Thu Oct 29 17:29:29 EDT 2015 x86_64 x86_64 x86_64 GNU/Linux
What happened?
We've installed a Kubernetes cluster, both on a single Red Hat machine and on several, hosted on AWS, by following the guide https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm/. We used flannel for the pod network. Note that we did not make the cluster aware of the AWS environment, i.e. we did not follow the instructions in this doc: https://docs.google.com/document/d/17d4qinC_HnIwrK0GHnRlD1FKkTNdN__VO4TH9-EzbIY/edit to enable the AWS cloud provider, as we wanted to simulate an installation on bare metal.
What you expected to happen?
We would have expected that stopping the AWS master machine and starting it again (in both the single-host and the three-host setup) would start the Kubernetes master services successfully. Instead, we noticed that the etcd pod on the master did not start and its log contained the following error:
2017-09-01 11:03:50.267821 I | etcdmain: etcd Version: 3.0.17
2017-09-01 11:03:50.268234 I | etcdmain: Git SHA: cc198e2
2017-09-01 11:03:50.268239 I | etcdmain: Go Version: go1.6.4
2017-09-01 11:03:50.268246 I | etcdmain: Go OS/Arch: linux/amd64
2017-09-01 11:03:50.268250 I | etcdmain: setting maximum number of CPUs to 4, total number of available CPUs is 4
2017-09-01 11:03:50.268327 N | etcdmain: the server is already initialized as member before, starting as etcd member...
2017-09-01 11:03:50.269031 I | etcdmain: listening for peers on http://localhost:2380
2017-09-01 11:03:50.269075 I | etcdmain: listening for client requests on 127.0.0.1:2379
2017-09-01 11:03:50.269214 I | etcdmain: stopping listening for client requests on 127.0.0.1:2379
2017-09-01 11:03:50.269232 I | etcdmain: stopping listening for peers on http://localhost:2380
2017-09-01 11:03:50.269239 C | etcdmain: cannot access data directory: open /var/lib/etcd/.touch: permission denied
Moreover, the kubectl version command failed with the following error:
version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.4", GitCommit:"793658f2d7ca7f064d2bdf606519f9fe1229c381", GitTreeState:"clean", BuildDate:"2017-08-17T08:48:23Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
The connection to the server 10.0.19.21:6443 was refused - did you specify the right host or port?
The kubelet command fails with the following error:
I0901 13:10:01.759421 4969 feature_gate.go:144] feature gates: map[]
W0901 13:10:01.759797 4969 server.go:496] No API client: no api servers specified
error: failed to run Kubelet: error reading /var/run/kubernetes/kubelet.key, certificate and key must be supplied as a pair
The kube-api-server pod also fails to start because etcd is not up.
How to reproduce it (as minimally and precisely as possible)?
Install the kubernetes cluster on one or three machines as described above.
Stop and then start the AWS master machine.
SSH to the AWS master machine and:
sudo docker ps -a | grep "etcd"
sudo docker logs <etcd_id>
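As a hypothetical convenience for step 3, the fatal line can be filtered straight out of the container log, since etcd marks it with severity `C`. Sketched here against the log line quoted earlier rather than a live `docker logs` stream:

```shell
# In practice: sudo docker logs <etcd_id> 2>&1 | grep ' C | etcdmain:'
# Demonstrated on the quoted log line so it runs without a cluster:
log='2017-09-01 11:03:50.269239 C | etcdmain: cannot access data directory: open /var/lib/etcd/.touch: permission denied'
printf '%s\n' "$log" | grep ' C | etcdmain:'
```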
Anything else we need to know?
$ systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled)
Drop-In: /etc/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: active (running) since Vi 2017-09-01 11:05:28 CEST; 2h 6min ago
Docs: http://kubernetes.io/docs/
Main PID: 11149 (kubelet)
CGroup: /system.slice/kubelet.service
├─11149 /usr/bin/kubelet --kubeconfig=/etc/kubernetes/kubelet.conf --require-kubeconfig=true --pod-manifest-path=/etc/kubernetes/manifests --allow-privileged=true --network-plugin=cni --cni-...
└─11213 journalctl -k -f
sep 01 13:12:19 ip-10-0-19-21.eu-west-1.compute.internal kubelet[11149]: E0901 13:12:19.909863 11149 reflector.go:190] k8s.io/kubernetes/pkg/kubelet/kubelet.go:408: Failed to list *v1.Node: Get https...
sep 01 13:12:20 ip-10-0-19-21.eu-west-1.compute.internal kubelet[11149]: I0901 13:12:20.200246 11149 kubelet_node_status.go:247] Setting node annotation to enable volume controller attach/detach
sep 01 13:12:20 ip-10-0-19-21.eu-west-1.compute.internal kubelet[11149]: I0901 13:12:20.286776 11149 kubelet_node_status.go:82] Attempting to register node ip-10-0-19-21.eu-west-1.compute.internal
sep 01 13:12:20 ip-10-0-19-21.eu-west-1.compute.internal kubelet[11149]: E0901 13:12:20.287104 11149 kubelet_node_status.go:106] Unable to register node "ip-10-0-19-21.eu-west-1.compute.in...ion refused
sep 01 13:12:20 ip-10-0-19-21.eu-west-1.compute.internal kubelet[11149]: E0901 13:12:20.600222 11149 reflector.go:190] k8s.io/kubernetes/pkg/kubelet/kubelet.go:400: Failed to list *v1.Serv...ion refused
sep 01 13:12:20 ip-10-0-19-21.eu-west-1.compute.internal kubelet[11149]: E0901 13:12:20.601103 11149 reflector.go:190] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:46: Failed to list *v1.Pod: Ge...
sep 01 13:12:20 ip-10-0-19-21.eu-west-1.compute.internal kubelet[11149]: E0901 13:12:20.910412 11149 reflector.go:190] k8s.io/kubernetes/pkg/kubelet/kubelet.go:408: Failed to list *v1.Node: Get https...
sep 01 13:12:21 ip-10-0-19-21.eu-west-1.compute.internal kubelet[11149]: E0901 13:12:21.600833 11149 reflector.go:190] k8s.io/kubernetes/pkg/kubelet/kubelet.go:400: Failed to list *v1.Serv...ion refused
sep 01 13:12:21 ip-10-0-19-21.eu-west-1.compute.internal kubelet[11149]: E0901 13:12:21.601776 11149 reflector.go:190] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:46: Failed to list *v1.Pod: Ge...
sep 01 13:12:21 ip-10-0-19-21.eu-west-1.compute.internal kubelet[11149]: E0901 13:12:21.910936 11149 reflector.go:190] k8s.io/kubernetes/pkg/kubelet/kubelet.go:408: Failed to list *v1.Node: Get https...
Hint: Some lines were ellipsized, use -l to show in full.
$ sudo ls -la /var/lib/etcd
total 4
drwxr-xr-x. 3 root root 19 sep 1 10:52 .
drwxr-xr-x. 34 root root 4096 sep 1 12:22 ..
drwx------. 4 root root 27 sep 1 10:52 member
The docker daemon runs with root privileges.