
Kubeadm init stuck on [init] This might take a minute or longer if the control plane images have to be pulled. #61277

Closed
ChristianCandelaria opened this issue Mar 16, 2018 · 71 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle.

Comments

@ChristianCandelaria

ChristianCandelaria commented Mar 16, 2018

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug

/sig bug

What happened:
Kubeadm init hangs at:

[init] Using Kubernetes version: v1.9.4
[init] Using Authorization modes: [Node RBAC]
[preflight] Running pre-flight checks.
[WARNING FileExisting-crictl]: crictl not found in system path
[preflight] Starting the kubelet service
[certificates] Generated ca certificate and key.
[certificates] Generated apiserver certificate and key.
[certificates] apiserver serving cert is signed for DNS names [master kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 192.168.56.60]
[certificates] Generated apiserver-kubelet-client certificate and key.
[certificates] Generated sa key and public key.
[certificates] Generated front-proxy-ca certificate and key.
[certificates] Generated front-proxy-client certificate and key.
[certificates] Valid certificates and keys now exist in "/etc/kubernetes/pki"
[kubeconfig] Wrote KubeConfig file to disk: "admin.conf"
[kubeconfig] Wrote KubeConfig file to disk: "kubelet.conf"
[kubeconfig] Wrote KubeConfig file to disk: "controller-manager.conf"
[kubeconfig] Wrote KubeConfig file to disk: "scheduler.conf"
[controlplane] Wrote Static Pod manifest for component kube-apiserver to "/etc/kubernetes/manifests/kube-apiserver.yaml"
[controlplane] Wrote Static Pod manifest for component kube-controller-manager to "/etc/kubernetes/manifests/kube-controller-manager.yaml"
[controlplane] Wrote Static Pod manifest for component kube-scheduler to "/etc/kubernetes/manifests/kube-scheduler.yaml"
[etcd] Wrote Static Pod manifest for a local etcd instance to "/etc/kubernetes/manifests/etcd.yaml"
[init] Waiting for the kubelet to boot up the control plane as Static Pods from directory "/etc/kubernetes/manifests".
[init] This might take a minute or longer if the control plane images have to be pulled.
...

What you expected to happen:
Cluster initialized --> Kubernetes master initialized.

How to reproduce it (as minimally and precisely as possible):
Install Docker
Install kubeadm, kubelet, and kubectl
Run kubeadm init

Anything else we need to know?:
Since yesterday, initializing a cluster automatically uses Kubernetes version v1.9.4.
I tried forcing kubeadm to use --kubernetes-version=v1.9.3, but I still have the same issue.
Last week everything was fine when I reset my Kubernetes cluster and reinitialized it.
I found the issue yesterday when I reset my cluster again, tried to reinitialize it, and it got stuck.

I ran yum update to bring all my software up to date, but the issue remains.
I was using Kubernetes v1.9.3, and after updating today I'm on Kubernetes v1.9.4.

Environment:

  • Kubernetes version (use kubectl version):
    Client Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.4", GitCommit:"bee2d1505c4fe820744d26d41ecd3fdd4a3d6546", GitTreeState:"clean", BuildDate:"2018-03-12T16:29:47Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}

  • OS (e.g. from /etc/os-release):
    NAME="CentOS Linux"
    VERSION="7 (Core)"
    ID="centos"
    ID_LIKE="rhel fedora"
    VERSION_ID="7"
    PRETTY_NAME="CentOS Linux 7 (Core)"
    ANSI_COLOR="0;31"
    CPE_NAME="cpe:/o:centos:centos:7"
    HOME_URL="https://www.centos.org/"
    BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Mar 16, 2018
@depauna

depauna commented Mar 16, 2018

/kind bug

@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Mar 16, 2018
@tpepper
Member

tpepper commented Mar 16, 2018

/sig cluster-lifecycle

@k8s-ci-robot k8s-ci-robot added sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Mar 16, 2018
@depauna

depauna commented Mar 21, 2018

1.9.5 has the same issue.
Is it possible to investigate this?
What steps should we take in the future when an update comes out?

@Miyurz

Miyurz commented Mar 31, 2018

Same for v1.10.0 and v1.11.0 too.

# ./kubeadm init --kubernetes-version 1.10.0 
[init] Using Kubernetes version: v1.10.0
[init] Using Authorization modes: [Node RBAC]
[preflight] Running pre-flight checks.
	[WARNING SystemVerification]: docker version is greater than the most recently validated version. Docker version: 18.03.0-ce. Max validated version: 17.03
	[WARNING FileExisting-crictl]: crictl not found in system path
Suggestion: go get github.com/kubernetes-incubator/cri-tools/cmd/crictl
[preflight] Starting the kubelet service
[certificates] Using the existing ca certificate and key.
[certificates] Using the existing apiserver certificate and key.
[certificates] Using the existing apiserver-kubelet-client certificate and key.
[certificates] Using the existing etcd/ca certificate and key.
[certificates] Using the existing etcd/server certificate and key.
[certificates] Using the existing etcd/peer certificate and key.
[certificates] Using the existing etcd/healthcheck-client certificate and key.
[certificates] Using the existing apiserver-etcd-client certificate and key.
[certificates] Using the existing sa key.
[certificates] Using the existing front-proxy-ca certificate and key.
[certificates] Using the existing front-proxy-client certificate and key.
[certificates] Valid certificates and keys now exist in "/etc/kubernetes/pki"
[kubeconfig] Using existing up-to-date KubeConfig file: "/etc/kubernetes/admin.conf"
[kubeconfig] Using existing up-to-date KubeConfig file: "/etc/kubernetes/kubelet.conf"
[kubeconfig] Using existing up-to-date KubeConfig file: "/etc/kubernetes/controller-manager.conf"
[kubeconfig] Using existing up-to-date KubeConfig file: "/etc/kubernetes/scheduler.conf"
[controlplane] Wrote Static Pod manifest for component kube-apiserver to "/etc/kubernetes/manifests/kube-apiserver.yaml"
[controlplane] Wrote Static Pod manifest for component kube-controller-manager to "/etc/kubernetes/manifests/kube-controller-manager.yaml"
[controlplane] Wrote Static Pod manifest for component kube-scheduler to "/etc/kubernetes/manifests/kube-scheduler.yaml"
[etcd] Wrote Static Pod manifest for a local etcd instance to "/etc/kubernetes/manifests/etcd.yaml"
[init] Waiting for the kubelet to boot up the control plane as Static Pods from directory "/etc/kubernetes/manifests".
[init] This might take a minute or longer if the control plane images have to be pulled.

@Miyurz

Miyurz commented Mar 31, 2018

On a side note, does anyone know how to make the kubeadm logs more verbose, to understand what is actually broken and why? Many thanks.
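As a hedged sketch, two standard places to look (--v is the glog/klog verbosity flag kubeadm inherits, assuming your build exposes it; the journalctl command is the same one kubeadm itself suggests later in this thread):

kubeadm init --v=5           # raise kubeadm's own log verbosity
journalctl -xeu kubelet -f   # follow the kubelet, which runs the static pods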

@princerachit

I followed the advice from this link.
First, make sure you have switched off swap with sudo swapoff -a.
Then add the following line, if it does not already exist, to /etc/systemd/system/kubelet.service.d/10-kubeadm.conf:

Environment="KUBELET_EXTRA_ARGS=--fail-swap-on=false"

Restart the kubelet and docker services with systemctl restart docker && systemctl restart kubelet.service.

Now run kubeadm init
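A minimal consolidated sketch of the steps above (note that tee -a appends unconditionally, so check the drop-in first if the line may already exist):

sudo swapoff -a
echo 'Environment="KUBELET_EXTRA_ARGS=--fail-swap-on=false"' | \
  sudo tee -a /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
sudo systemctl daemon-reload
sudo systemctl restart docker kubelet
sudo kubeadm init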

@ChristianCandelaria
Author

ChristianCandelaria commented Apr 2, 2018

This issue tends to come back with each new release of Kubernetes.
I still don't know what causes it.
But since I opened this ticket, kubeadm init has gotten stuck on every version release,
and after 3 or 4 days kubeadm init works again, as if an angel fixed the problem.

@nikhilno1

I am new to Kubernetes and am getting the same error on CentOS 7 (v1.10.0):

Apr 5 09:50:27 es-nikhil kubelet: I0405 09:50:27.518727 2793 kubelet_node_status.go:82] Attempting to register node es-nikhil
Apr 5 09:50:27 es-nikhil kubelet: E0405 09:50:27.519031 2793 kubelet_node_status.go:106] Unable to register node "es-nikhil" with API server: Post https://10.193.104.43:6443/api/v1/nodes: dial tcp 10.193.104.43:6443: getsockopt: connection refused

I've tried everything that various posts have suggested but still can't get it to work.
Nothing is listening on port 6443. Complete log attached.
Any help appreciated. Thanks.
kube.log

@toolboc

toolboc commented Apr 6, 2018

I am getting a similar result to @nikhilno1 when attempting to initialize a master node on a raspberry pi. Any assistance is much appreciated.

kubeadm version: &version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.0", GitCommit:"fc32d2f3698e36b93322a3465f63a14e9f0eaead", GitTreeState:"clean", BuildDate:"2018-03-26T16:44:10Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/arm"}

journalctl -xeu kubelet:

Apr 06 19:45:26 k8s-master kubelet[4153]: E0406 19:45:26.314242    4153 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:451: Failed to list *v1.Service: Get https://192.168.1.184:6443/api/v1/services?limit=500&resourceVersion=0: net/http: TLS handshake timeout
Apr 06 19:45:26 k8s-master kubelet[4153]: E0406 19:45:26.316249    4153 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get https://192.168.1.184:6443/api/v1/pods?fieldSelector=spec.nodeName%3Dk8s-master&limit=500&resourceVersion=0: net/http: TLS handshake timeout
Apr 06 19:45:26 k8s-master kubelet[4153]: W0406 19:45:26.322755    4153 cni.go:171] Unable to update cni config: No networks found in /etc/cni/net.d
Apr 06 19:45:26 k8s-master kubelet[4153]: E0406 19:45:26.343319    4153 kubelet.go:2125] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Apr 06 19:45:26 k8s-master kubelet[4153]: E0406 19:45:26.597418    4153 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:460: Failed to list *v1.Node: Get https://192.168.1.184:6443/api/v1/nodes?fieldSelector=metadata.name%3Dk8s-master&limit=500&resourceVersion=0: net/http: TLS handshake timeout
Apr 06 19:45:31 k8s-master kubelet[4153]: W0406 19:45:31.352689    4153 cni.go:171] Unable to update cni config: No networks found in /etc/cni/net.d
Apr 06 19:45:31 k8s-master kubelet[4153]: E0406 19:45:31.353707    4153 kubelet.go:2125] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Apr 06 19:45:32 k8s-master kubelet[4153]: I0406 19:45:32.378350    4153 kubelet_node_status.go:271] Setting node annotation to enable volume controller attach/detach
Apr 06 19:45:33 k8s-master kubelet[4153]: E0406 19:45:33.954306    4153 kubelet_node_status.go:106] Unable to register node "k8s-master" with API server: Post https://192.168.1.184:6443/api/v1/nodes: net/http: TLS handshake timeout
Apr 06 19:45:35 k8s-master kubelet[4153]: E0406 19:45:35.554362    4153 eviction_manager.go:246] eviction manager: failed to get get summary stats: failed to get node info: node "k8s-master" not found
Apr 06 19:45:36 k8s-master kubelet[4153]: W0406 19:45:36.359610    4153 cni.go:171] Unable to update cni config: No networks found in /etc/cni/net.d
Apr 06 19:45:36 k8s-master kubelet[4153]: E0406 19:45:36.360561    4153 kubelet.go:2125] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Apr 06 19:45:37 k8s-master kubelet[4153]: E0406 19:45:37.322600    4153 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:451: Failed to list *v1.Service: Get https://192.168.1.184:6443/api/v1/services?limit=500&resourceVersion=0: net/http: TLS handshake timeout
Apr 06 19:45:37 k8s-master kubelet[4153]: E0406 19:45:37.325736    4153 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get https://192.168.1.184:6443/api/v1/pods?fieldSelector=spec.nodeName%3Dk8s-master&limit=500&resourceVersion=0: net/http: TLS handshake timeout
Apr 06 19:45:37 k8s-master kubelet[4153]: I0406 19:45:37.379039    4153 kubelet_node_status.go:271] Setting node annotation to enable volume controller attach/detach
Apr 06 19:45:38 k8s-master kubelet[4153]: E0406 19:45:38.633877    4153 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:460: Failed to list *v1.Node: Get https://192.168.1.184:6443/api/v1/nodes?fieldSelector=metadata.name%3Dk8s-master&limit=500&resourceVersion=0: net/http: TLS handshake timeout
Apr 06 19:45:40 k8s-master kubelet[4153]: E0406 19:45:40.762656    4153 event.go:209] Unable to write event: 'Post https://192.168.1.184:6443/api/v1/namespaces/default/events: net/http: TLS handshake timeout' (may retry after sleeping)
Apr 06 19:45:40 k8s-master kubelet[4153]: E0406 19:45:40.762836    4153 event.go:144] Unable to write event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"k8s-master.1522f003ac10794d", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Initializers:(*v1.Initializers)(nil), Finalizers:[]string(nil), ClusterName:""}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"k8s-master", UID:"k8s-master", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"NodeHasSufficientDisk", Message:"Node k8s-master status is now: NodeHasSufficientDisk", Source:v1.EventSource{Component:"kubelet", Host:"k8s-master"}, FirstTimestamp:v1.Time{Time:time.Time{wall:0xbeaa1110a1cb654d, ext:1600396290, loc:(*time.Location)(0x4547c98)}}, LastTimestamp:v1.Time{Time:time.Time{wall:0xbeaa1110a1cb654d, ext:1600396290, loc:(*time.Location)(0x4547c98)}}, Count:1, Type:"Normal", EventTime:v1.MicroTime{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}' (retry limit exceeded!)
Apr 06 19:45:40 k8s-master kubelet[4153]: I0406 19:45:40.955172    4153 kubelet_node_status.go:271] Setting node annotation to enable volume controller attach/detach
Apr 06 19:45:40 k8s-master kubelet[4153]: I0406 19:45:40.971118    4153 kubelet_node_status.go:82] Attempting to register node k8s-master
Apr 06 19:45:41 k8s-master kubelet[4153]: W0406 19:45:41.365472    4153 cni.go:171] Unable to update cni config: No networks found in /etc/cni/net.d
Apr 06 19:45:41 k8s-master kubelet[4153]: E0406 19:45:41.366689    4153 kubelet.go:2125] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Apr 06 19:45:42 k8s-master kubelet[4153]: I0406 19:45:42.462219    4153 kubelet_node_status.go:271] Setting node annotation to enable volume controller attach/detach
Apr 06 19:45:45 k8s-master kubelet[4153]: E0406 19:45:45.559326    4153 eviction_manager.go:246] eviction manager: failed to get get summary stats: failed to get node info: node "k8s-master" not found
Apr 06 19:45:46 k8s-master kubelet[4153]: W0406 19:45:46.372431    4153 cni.go:171] Unable to update cni config: No networks found in /etc/cni/net.d
Apr 06 19:45:46 k8s-master kubelet[4153]: E0406 19:45:46.376717    4153 kubelet.go:2125] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Apr 06 19:45:49 k8s-master kubelet[4153]: E0406 19:45:49.353873    4153 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get https://192.168.1.184:6443/api/v1/pods?fieldSelector=spec.nodeName%3Dk8s-master&limit=500&resourceVersion=0: net/http: TLS handshake timeout
Apr 06 19:45:50 k8s-master kubelet[4153]: E0406 19:45:50.713264    4153 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:460: Failed to list *v1.Node: Get https://192.168.1.184:6443/api/v1/nodes?fieldSelector=metadata.name%3Dk8s-master&limit=500&resourceVersion=0: net/http: TLS handshake timeout
Apr 06 19:45:51 k8s-master kubelet[4153]: W0406 19:45:51.383303    4153 cni.go:171] Unable to update cni config: No networks found in /etc/cni/net.d
Apr 06 19:45:51 k8s-master kubelet[4153]: E0406 19:45:51.392989    4153 kubelet.go:2125] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Apr 06 19:45:51 k8s-master kubelet[4153]: E0406 19:45:51.833722    4153 event.go:209] Unable to write event: 'Post https://192.168.1.184:6443/api/v1/namespaces/default/events: net/http: TLS handshake timeout' (may retry after sleeping)
Apr 06 19:45:51 k8s-master kubelet[4153]: E0406 19:45:51.993251    4153 kubelet_node_status.go:106] Unable to register node "k8s-master" with API server: Post https://192.168.1.184:6443/api/v1/nodes: net/http: TLS handshake timeout
Apr 06 19:45:55 k8s-master kubelet[4153]: E0406 19:45:55.562988    4153 eviction_manager.go:246] eviction manager: failed to get get summary stats: failed to get node info: node "k8s-master" not found
Apr 06 19:45:56 k8s-master kubelet[4153]: E0406 19:45:56.211931    4153 certificate_manager.go:299] Failed while requesting a signed certificate from the master: cannot create certificate signing request: Post https://192.168.1.184:6443/apis/certificates.k8s.io/v1beta1/certificatesigningrequests: net/http: TLS handshake timeout

@geoffgarside

I'm getting the same. Checking docker ps, I see the apiserver stay up for only about a minute before being restarted. The logs from the apiserver look like this:

Flag --admission-control has been deprecated, Use --enable-admission-plugins or --disable-admission-plugins instead. Will be removed in a future version.
Flag --insecure-port has been deprecated, This flag will be removed in a future version.
I0408 12:14:46.511153       1 server.go:135] Version: v1.10.0
I0408 12:14:46.511886       1 server.go:679] external host was not specified, using 192.168.1.200
I0408 12:15:39.952054       1 plugins.go:149] Loaded 9 admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,ServiceAccount,NodeRestriction,DefaultTolerationSeconds,DefaultStorageClass,MutatingAdmissionWebhook,ValidatingAdmissionWebhook,ResourceQuota.
I0408 12:15:39.990920       1 master.go:228] Using reconciler: master-count
W0408 12:15:42.294537       1 genericapiserver.go:342] Skipping API batch/v2alpha1 because it has no resources.
W0408 12:15:42.399900       1 genericapiserver.go:342] Skipping API rbac.authorization.k8s.io/v1alpha1 because it has no resources.
W0408 12:15:42.413353       1 genericapiserver.go:342] Skipping API storage.k8s.io/v1alpha1 because it has no resources.
W0408 12:15:42.522536       1 genericapiserver.go:342] Skipping API admissionregistration.k8s.io/v1alpha1 because it has no resources.
[restful] 2018/04/08 12:15:43 log.go:33: [restful/swagger] listing is available at https://192.168.1.200:6443/swaggerapi
[restful] 2018/04/08 12:15:43 log.go:33: [restful/swagger] https://192.168.1.200:6443/swaggerui/ is mapped to folder /swagger-ui/
[restful] 2018/04/08 12:16:02 log.go:33: [restful/swagger] listing is available at https://192.168.1.200:6443/swaggerapi
[restful] 2018/04/08 12:16:02 log.go:33: [restful/swagger] https://192.168.1.200:6443/swaggerui/ is mapped to folder /swagger-ui/
I0408 12:16:46.589461       1 serve.go:96] Serving securely on [::]:6443
I0408 12:16:46.590531       1 crd_finalizer.go:242] Starting CRDFinalizer
I0408 12:16:46.590739       1 crd_finalizer.go:246] Shutting down CRDFinalizer
I0408 12:16:46.610096       1 controller.go:84] Starting OpenAPI AggregationController
I0408 12:16:46.610214       1 controller.go:90] Shutting down OpenAPI AggregationController
I0408 12:16:46.610434       1 crdregistration_controller.go:110] Starting crd-autoregister controller
I0408 12:16:46.620929       1 controller_utils.go:1019] Waiting for caches to sync for crd-autoregister controller
E0408 12:16:46.621160       1 controller_utils.go:1022] Unable to sync caches for crd-autoregister controller
I0408 12:16:46.621246       1 crdregistration_controller.go:115] Shutting down crd-autoregister controller
I0408 12:16:46.642881       1 customresource_discovery_controller.go:174] Starting DiscoveryController
E0408 12:16:46.652046       1 customresource_discovery_controller.go:177] timed out waiting for caches to sync
I0408 12:16:46.652148       1 customresource_discovery_controller.go:178] Shutting down DiscoveryController
I0408 12:16:46.652288       1 naming_controller.go:276] Starting NamingConditionController
I0408 12:16:46.652380       1 naming_controller.go:280] Shutting down NamingConditionController
I0408 12:16:46.651594       1 apiservice_controller.go:90] Starting APIServiceRegistrationController
I0408 12:16:46.654368       1 cache.go:32] Waiting for caches to sync for APIServiceRegistrationController controller
E0408 12:16:46.654485       1 cache.go:35] Unable to sync caches for APIServiceRegistrationController controller
I0408 12:16:46.654559       1 apiservice_controller.go:94] Shutting down APIServiceRegistrationController
I0408 12:16:46.665820       1 available_controller.go:262] Starting AvailableConditionController
I0408 12:16:46.665943       1 cache.go:32] Waiting for caches to sync for AvailableConditionController controller
E0408 12:16:46.666085       1 cache.go:35] Unable to sync caches for AvailableConditionController controller
I0408 12:16:46.666172       1 available_controller.go:266] Shutting down AvailableConditionController
I0408 12:16:46.802095       1 serve.go:136] Stopped listening on [::]:6443
E0408 12:16:48.482319       1 storage_rbac.go:157] unable to initialize clusterroles: Get https://127.0.0.1:6443/apis/rbac.authorization.k8s.io/v1/clusterroles: dial tcp 127.0.0.1:6443: getsockopt: connection refused
E0408 12:16:48.916921       1 client_ca_hook.go:78] Post https://127.0.0.1:6443/api/v1/namespaces: dial tcp 127.0.0.1:6443: getsockopt: connection refused

This is while waiting for kubeadm init to finish; it eventually stalls and exits. I've tried applying a pod network manifest while init is waiting, which seems to be a suggestion in other issues, but that fails because the API server isn't available: either the address:port isn't being listened on, or the connection fails with a TLS handshake timeout.

The apiserver container is exiting with a code 137.
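Exit code 137 is 128 + 9, i.e. the container received SIGKILL, which is consistent with either the kubelet acting on a failed liveness probe or the OOM killer. A quick way to watch the restart cycle, as a sketch (the container names follow the dockershim k8s_ convention):

watch -n 2 'docker ps -a --format "{{.Names}}\t{{.Status}}" | grep -E "apiserver|etcd"'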

@rhuss

rhuss commented Apr 9, 2018

I'm experiencing the same issue with kubeadm 1.10.0 on a raspi3 (HypriotOS 1.8.0), but I also see the kube-apiserver and etcd processes running at full-blast CPU:

 PID  USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
24110 root      20   0  804136  26984  10284 R 217.9  2.8   1:01.00 etcd
24019 root      20   0  931116 168132  42464 S 170.1 17.7   2:07.47 kube-apiserver
....

Even after 20 minutes of running, the Pi's load average is 6.31, 5.16, 4.66.

This continues even after kubeadm init has bailed out. There's nothing suspicious in those containers' logs though.

@Namonic

Namonic commented Apr 10, 2018

I am experiencing the same issue; please add me to this. I am getting pretty much identical logs to what geoffgarside posted above. (I am on a Pi 3 B.)

  • Swap is off
  • There is no gateway/proxy between me and the internet
  • Networking is rock solid (we always say that, don't we?). I have verified DNS resolution works both outbound and back, and I can ping resources inside and outside my local network.

I am also seeing what rhuss describes: a huge spike in load as indicated by top, and my etcd and apiserver restarting every few minutes.

@ChristianCandelaria
Author

ChristianCandelaria commented Apr 11, 2018

It seems that etcd is causing the problem: the etcd pod doesn't start properly because of a certs issue, and the kube-apiserver pod crashes because it can't reach etcd.

You can get around it by configuring an external etcd on your master node.
These are the steps on CentOS 7.

Step 1: install etcd on your master node
yum install etcd

Step 2: configure /etc/etcd/etcd.conf
ETCD_LISTEN_PEER_URLS="http://MASTERIP:2380"
ETCD_LISTEN_CLIENT_URLS="http://MASTERIP:2379,http://localhost:2379"

Step 3: restart or start etcd
systemctl start etcd; systemctl enable etcd

Step 4: generate token
kubeadm token generate
output> 0a1b4e.8dd0c0e0ff3c3f7a

Step 5: create config.yaml file
Source for full settings: https://kubernetes.io/docs/reference/setup-tools/kubeadm/kubeadm-init/
Pod subnet: 172.168.0.0/16 (using Calico)
MasterIP: 192.168.56.60
Token: 0a1b4e.8dd0c0e0ff3c3f7a

File: config.yaml

apiVersion: kubeadm.k8s.io/v1alpha1
kind: MasterConfiguration
api:
  advertiseAddress: 192.168.56.60
  bindPort: 6443
etcd:
  endpoints:
  - "http://192.168.56.60:2379"
apiServerCertSANs:
- "192.168.56.60"
- "127.0.0.1"
networking:
  podSubnet: "172.168.0.0/16"
token: "0a1b4e.8dd0c0e0ff3c3f7a"
tokenTTL: "0"

Step 6: start kubeadm init
kubeadm init --config /path/to/config.yaml
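Before step 6 it is worth sanity-checking that the external etcd actually answers on its client URL, as a sketch (substitute your MASTERIP):

curl -s http://192.168.56.60:2379/health
# expected output: {"health": "true"}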

@geoffgarside

I'm not quite sure what is killing etcd though. I've got the exited etcd container's log below; I just need to work out the source of its "exit (0)" status.

2018-04-11 22:22:59.911070 W | etcdmain: running etcd on unsupported architecture "arm64" since ETCD_UNSUPPORTED_ARCH is set
2018-04-11 22:23:00.195791 W | pkg/flags: unrecognized environment variable ETCD_UNSUPPORTED_ARCH=arm64
2018-04-11 22:23:00.196870 I | etcdmain: etcd Version: 3.1.12
2018-04-11 22:23:00.197647 I | etcdmain: Git SHA: 918698a
2018-04-11 22:23:00.198133 I | etcdmain: Go Version: go1.8.5
2018-04-11 22:23:00.199211 I | etcdmain: Go OS/Arch: linux/arm64
2018-04-11 22:23:00.200036 I | etcdmain: setting maximum number of CPUs to 4, total number of available CPUs is 4
2018-04-11 22:23:00.201240 I | embed: peerTLS: cert = /etc/kubernetes/pki/etcd/peer.crt, key = /etc/kubernetes/pki/etcd/peer.key, ca = , trusted-ca = /etc/kubernetes/pki/etcd/ca.crt, client-cert-auth = true
2018-04-11 22:23:00.206591 W | embed: The scheme of peer url http://localhost:2380 is HTTP while peer key/cert files are presented. Ignored peer key/cert files.
2018-04-11 22:23:00.207050 W | embed: The scheme of peer url http://localhost:2380 is HTTP while client cert auth (--peer-client-cert-auth) is enabled. Ignored client cert auth for this url.
2018-04-11 22:23:00.243879 I | embed: listening for peers on http://localhost:2380
2018-04-11 22:23:00.245035 I | embed: listening for client requests on 127.0.0.1:2379
2018-04-11 22:23:01.248754 W | etcdserver: another etcd process is running with the same data dir and holding the file lock.
2018-04-11 22:23:01.248945 W | etcdserver: waiting for it to exit before starting...
2018-04-11 22:23:11.452299 I | warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
2018-04-11 22:23:11.497000 I | etcdserver: name = default
2018-04-11 22:23:11.497173 I | etcdserver: data dir = /var/lib/etcd
2018-04-11 22:23:11.497234 I | etcdserver: member dir = /var/lib/etcd/member
2018-04-11 22:23:11.497273 I | etcdserver: heartbeat = 100ms
2018-04-11 22:23:11.497310 I | etcdserver: election = 1000ms
2018-04-11 22:23:11.497352 I | etcdserver: snapshot count = 10000
2018-04-11 22:23:11.497571 I | etcdserver: advertise client URLs = https://127.0.0.1:2379
2018-04-11 22:23:11.497823 I | etcdserver: initial advertise peer URLs = http://localhost:2380
2018-04-11 22:23:11.498199 I | etcdserver: initial cluster = default=http://localhost:2380
2018-04-11 22:23:20.995857 W | wal: sync duration of 1.320885532s, expected less than 1s
2018-04-11 22:23:21.217165 I | etcdserver: starting member 8e9e05c52164694d in cluster cdf818194e3a8c32
2018-04-11 22:23:21.217423 I | raft: 8e9e05c52164694d became follower at term 0
2018-04-11 22:23:21.217575 I | raft: newRaft 8e9e05c52164694d [peers: [], term: 0, commit: 0, applied: 0, lastindex: 0, lastterm: 0]
2018-04-11 22:23:21.217657 I | raft: 8e9e05c52164694d became follower at term 1
2018-04-11 22:23:21.873572 I | warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
2018-04-11 22:23:21.880938 I | warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
2018-04-11 22:23:21.881174 I | etcdserver: starting server... [version: 3.1.12, cluster version: to_be_decided]
2018-04-11 22:23:21.881395 I | embed: ClientTLS: cert = /etc/kubernetes/pki/etcd/server.crt, key = /etc/kubernetes/pki/etcd/server.key, ca = , trusted-ca = /etc/kubernetes/pki/etcd/ca.crt, client-cert-auth = true
2018-04-11 22:23:21.890640 I | etcdserver/membership: added member 8e9e05c52164694d [http://localhost:2380] to cluster cdf818194e3a8c32
2018-04-11 22:23:21.919740 I | raft: 8e9e05c52164694d is starting a new election at term 1
2018-04-11 22:23:21.920156 I | raft: 8e9e05c52164694d became candidate at term 2
2018-04-11 22:23:21.920293 I | raft: 8e9e05c52164694d received MsgVoteResp from 8e9e05c52164694d at term 2
2018-04-11 22:23:21.920430 I | raft: 8e9e05c52164694d became leader at term 2
2018-04-11 22:23:21.920576 I | raft: raft.node: 8e9e05c52164694d elected leader 8e9e05c52164694d at term 2
2018-04-11 22:23:21.922068 I | etcdserver: setting up the initial cluster version to 3.1
2018-04-11 22:23:22.367497 N | etcdserver/membership: set the initial cluster version to 3.1
2018-04-11 22:23:22.367751 I | embed: ready to serve client requests
2018-04-11 22:23:22.368025 I | etcdserver: published {Name:default ClientURLs:[https://127.0.0.1:2379]} to cluster cdf818194e3a8c32
2018-04-11 22:23:22.368094 I | etcdserver/api: enabled capabilities for version 3.1
2018-04-11 22:23:22.368196 W | etcdserver: apply entries took too long [283.993281ms for 2 entries]
2018-04-11 22:23:22.368237 W | etcdserver: avoid queries with large range/delete range!
2018-04-11 22:23:22.391108 I | embed: serving client requests on 127.0.0.1:2379
2018-04-11 22:25:31.629821 N | pkg/osutil: received terminated signal, shutting down...
2018-04-11 22:25:31.630957 I | etcdserver: skipped leadership transfer for single member cluster

@toolboc

toolboc commented Apr 14, 2018

@geoffgarside ,

The issue may lie in the fact that the etcd version is not supported by the installed version of kubeadm.

Using kubeadm v1.10.1 with etcd v2.3.7 (latest available on Raspbian Stretch) and following the suggestion by @ChristianCandelaria to run an external etcd instance, I get the following:

[init] Using Kubernetes version: v1.10.1
[init] Using Authorization modes: [Node RBAC]
[preflight] Running pre-flight checks.
        [WARNING SystemVerification]: docker version is greater than the most recently validated version. Docker version: 18.04.0-ce. Max validated version: 17.03
        [WARNING FileExisting-crictl]: crictl not found in system path
Suggestion: go get github.com/kubernetes-incubator/cri-tools/cmd/crictl
[preflight] Some fatal errors occurred:
        [ERROR ExternalEtcdVersion]: this version of kubeadm only supports external etcd version >= 3.1.12. Current version: 2.3.

@geoffgarside

I'm running the etcd-arm64 3.1.12 container, so it's the accepted version of etcd. The issue seems to be the livenessProbe on the etcd pod, though I'm not sure how. The kubelet logs show it concluding the container is dead and restarting it. I've been trying the following:

docker exec $(docker ps --format "{{.ID}}" --filter="label=io.kubernetes.container.name=etcd") \
  /bin/sh -ec "ETCDCTL_API=3 etcdctl --endpoints=127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt --key=/etc/kubernetes/pki/etcd/healthcheck-client.key get foo; echo \$?"

and get the following output

2018-04-14 20:41:22.863186 I | warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
0

So I'm still not sure why it thinks the container is dead. If I run the etcd service manually with docker run, copying the settings from /etc/kubernetes/manifests/etcd.yaml, it seems to run: I can docker exec into that container and use the above style of command to get, put, and del values. I'm half wondering if the INFO warning is causing the problems with the livenessProbe.

@geoffgarside

I've turned up the logging verbosity on the kubelet and am now seeing these probe messages:

Apr 16 23:03:40 rpi3-01 kubelet[7270]: I0416 23:03:40.663747    7270 prober.go:111] Liveness probe for "etcd-rpi3-01_kube-system(6e46efd4679b5b164d314731383c326d):etcd" failed (failure): 2018-04-16 23:03:38.529485 I | warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
Apr 16 23:03:40 rpi3-01 kubelet[7270]: Error:  grpc: timed out when dialing

It looks like this might be related to healthcheck-client.crt not having a SAN entry for IP 127.0.0.1, but I'm not sure. I need to create a certificate similar to the existing one and try that; I'll check in the morning.
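One way to check that suspicion, as a sketch: dump the certificate and look for a Subject Alternative Name block.

openssl x509 -noout -text -in /etc/kubernetes/pki/etcd/healthcheck-client.crt | \
  grep -A1 'Subject Alternative Name' || echo 'no SAN block present'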

@novakg

novakg commented Apr 18, 2018

I've experienced the same, but only after rebooting the box. It was all OK when I created the CentOS instance in OpenStack and installed k8s on it. But when I tried to install k8s after rebooting the instance, or rebooted after the k8s install, k8s no longer worked / the install hung as described above.
The apiserver kept trying to come up, then timed out and stopped.
It turned out that the problem was SELinux, as described here: kubernetes/kubeadm#417
It all works fine after I set SELINUX=permissive in /etc/selinux/config.
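For reference, the usual pair of commands for that change (the first takes effect immediately, the second persists across reboots):

sudo setenforce 0
sudo sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' /etc/selinux/config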

@joesan

joesan commented Apr 21, 2018

I'm getting the same issue, running on a Raspberry Pi 3 Model B+

Here are the logs:

pi@master01:~/k8s $ sudo kubeadm init --config kubeadm_conf.yaml
[init] Using Kubernetes version: v1.10.1
[init] Using Authorization modes: [Node RBAC]
[preflight] Running pre-flight checks.
	[WARNING SystemVerification]: docker version is greater than the most recently validated version. Docker version: 18.04.0-ce. Max validated version: 17.03
	[WARNING FileExisting-crictl]: crictl not found in system path
Suggestion: go get github.com/kubernetes-incubator/cri-tools/cmd/crictl
[preflight] Starting the kubelet service
[certificates] Generated ca certificate and key.
[certificates] Generated apiserver certificate and key.
[certificates] apiserver serving cert is signed for DNS names [master01 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 192.168.0.227]
[certificates] Generated apiserver-kubelet-client certificate and key.
[certificates] Generated etcd/ca certificate and key.
[certificates] Generated etcd/server certificate and key.
[certificates] etcd/server serving cert is signed for DNS names [localhost] and IPs [127.0.0.1]
[certificates] Generated etcd/peer certificate and key.
[certificates] etcd/peer serving cert is signed for DNS names [master01] and IPs [192.168.0.227]
[certificates] Generated etcd/healthcheck-client certificate and key.
[certificates] Generated apiserver-etcd-client certificate and key.
[certificates] Generated sa key and public key.
[certificates] Generated front-proxy-ca certificate and key.
[certificates] Generated front-proxy-client certificate and key.
[certificates] Valid certificates and keys now exist in "/etc/kubernetes/pki"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/admin.conf"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/kubelet.conf"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/controller-manager.conf"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/scheduler.conf"
[controlplane] Wrote Static Pod manifest for component kube-apiserver to "/etc/kubernetes/manifests/kube-apiserver.yaml"
[controlplane] Wrote Static Pod manifest for component kube-controller-manager to "/etc/kubernetes/manifests/kube-controller-manager.yaml"
[controlplane] Wrote Static Pod manifest for component kube-scheduler to "/etc/kubernetes/manifests/kube-scheduler.yaml"
[etcd] Wrote Static Pod manifest for a local etcd instance to "/etc/kubernetes/manifests/etcd.yaml"
[init] Waiting for the kubelet to boot up the control plane as Static Pods from directory "/etc/kubernetes/manifests".
[init] This might take a minute or longer if the control plane images have to be pulled.

It hangs here and it is frustrating!

@12wrigja

I was having similar issues in a cluster I brought up from scratch (loosely based on kubernetes-the-hard-way and kubeadm), the biggest difference being that I was running etcd separately from the kubelet as its own non-k8s-managed container, with all communication secured using TLS. Upgrading the master node from v1.9.2 to v1.10.0 caused the machine (a Raspberry Pi 3B) to crash (with kernel:[164210.000398] Internal error: Oops: 80000007 [#1] SMP ARM) every time I attempted to add a new node to the cluster (including the master node itself).

@mkumatag
Member

If you are running on a non-x86 platform and k8s is using k8s.gcr.io/pause:3.1, then you are hitting the issue mentioned in #63057.

@ferrarimarco

ferrarimarco commented Apr 26, 2018

Same here :(

kubeadm 1.10.1 running on CentOS 7 on a x86 platform (Virtualbox VM managed with Vagrant).

Components cannot connect to the API server (connection refused and TLS handshake timeouts), plus CPU and disk usage are at 100%.

UPDATE: in my case this was due to low RAM (just 512 MB).

@geoffgarside

The k8s.gcr.io/pause:3.1 issue doesn't seem to be related. It looks like kubeadm init or the kubelet pulls down the arch-specific image anyway, in this case k8s.gcr.io/pause-arm64:3.1.

@12wrigja

I agree with Geoff here: all my other pods come up for the controller node, so I'd be surprised if there was something specifically wrong with the pause image.

I'm wondering if it's a memory issue, as Ferrari is alluding to. It's been hit or miss for me across a couple of reproduction attempts, and low RAM could certainly be part of the problem on the Pi.

@vjroxvijay1

I am facing the same issue on RHEL 7. Any solution would be appreciated. I tried disabling swap and am still facing the same issue.

@snowake4me

I'm having the same problems as articulated above, where kubeadm init hangs. I'm working on a Raspberry Pi 3B and have tried every variety of mitigation identified above that others claim resolved their problem. I get the same pre-flight warnings others have detailed:

[WARNING SystemVerification]: docker version is greater than the most recently validated version. Docker version: 18.05.0-ce. Max validated version: 17.03
	[WARNING FileExisting-crictl]: crictl not found in system path

Being a relative n00b to all this, I'm at a bit of a loss - but looking at /var/log/daemon.log, I see it spewing errors like the following:

kmaster kubelet[1554]: E0519 21:42:56.942912    1554 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:460: Failed to list *v1.Node: Get https://10.10.10.201:6443/api/v1/nodes?fieldSelector=metadata.name%3Dkmaster.gotwake.com&limit=500&resourceVersion=0: dial tcp 10.10.10.201:6443: getsockopt: connection refused

While I'm a k8s n00b, I'm pretty savvy with networking, daemons/listeners, certificates/PKI, etc., and this, at least, is an error I understand.

I'm assuming the API server is what's supposed to be listening on TCP 6443, but that does not appear to be the case on my Pi. No wonder all the connections are timing out. I'm barely following the discussion about spinning up a stand-alone etcd container, but assuming etcd is meant to be started by this process, where exactly would I look to confirm that it's at least being attempted, and ideally see the error that prevents it?

I got all excited by rongou's statement about a change to /etc/resolv.conf fixing his problem; alas, there's no search directive in my file.

The connection error above seems so straightforward, but I'm in over my head on how to proceed with the troubleshooting.
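To confirm the "nobody listening on 6443" observation and see whether the apiserver container even exists, a sketch:

sudo ss -tlnp | grep 6443            # who, if anyone, is bound to the port
docker ps -a | grep kube-apiserver   # is the container running, restarting, or exited?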

@rongou

rongou commented May 19, 2018

@snowake4me is 10.10.10.201 the correct IP address for your host?

@jmreicha

Seeing similar behavior (on a Pine64, not an RPi). The etcd container seems to be crash-looping, as does the API server. Here are the last few lines of the etcd logs before it crashes:

2018-05-20 03:13:34.792877 I | etcdserver/membership: added member 8e9e05c52164694d [http://localhost:2380] to cluster cdf818194e3a8c32
2018-05-20 03:13:34.793338 N | etcdserver/membership: set the initial cluster version to 3.1
2018-05-20 03:13:34.793505 I | etcdserver/api: enabled capabilities for version 3.1
2018-05-20 03:13:34.856940 I | raft: 8e9e05c52164694d is starting a new election at term 8
2018-05-20 03:13:34.857159 I | raft: 8e9e05c52164694d became candidate at term 9
2018-05-20 03:13:34.857236 I | raft: 8e9e05c52164694d received MsgVoteResp from 8e9e05c52164694d at term 9
2018-05-20 03:13:34.857321 I | raft: 8e9e05c52164694d became leader at term 9
2018-05-20 03:13:34.857360 I | raft: raft.node: 8e9e05c52164694d elected leader 8e9e05c52164694d at term 9
2018-05-20 03:13:34.858413 I | etcdserver: published {Name:default ClientURLs:[https://127.0.0.1:2379]} to cluster cdf818194e3a8c32
2018-05-20 03:13:34.858742 I | embed: ready to serve client requests
2018-05-20 03:13:34.860289 I | embed: serving client requests on 127.0.0.1:2379
2018-05-20 03:17:49.792779 N | pkg/osutil: received terminated signal, shutting down...
2018-05-20 03:17:49.792864 I | etcdserver: skipped leadership transfer for single member cluster

This is on Docker v17.12.1.

@toolboc

toolboc commented May 20, 2018

Hangs @ [init] This might take a minute or longer if the control plane images have to be pulled. on Raspberry Pi 3 B with Docker 18.05 and kubeadm=1.10.2-00 kubectl=1.10.2-00 kubelet=1.10.2-00

Fixed by downgrading kubeadm, kubectl, and kubelet to 1.9.6:
sudo apt-get install -qy kubeadm=1.9.6-00 kubectl=1.9.6-00 kubelet=1.9.6-00

AND

Downgrading to Docker 18.04:
sudo aptitude install -qy docker-ce=18.04.0~ce~3-0~raspbian

@jmreicha

@toolboc Just confirmed that downgrading to 1.9.6 works. I was able to downgrade only kubeadm with sudo apt-get install -qy kubeadm=1.9.6-00.

I was then able to bring up a 1.10.2 cluster with sudo kubeadm init --token-ttl=0 --pod-network-cidr=10.244.0.0/16 --kubernetes-version=1.10.2.

I still need to do more testing but it seems to be working.

@jmreicha

It turns out that I am still having issues with 1.10.2 clusters. I left the --kubernetes-version=1.10.2 flag off and brought up a 1.9.7 cluster and it is working.

@clabu

clabu commented May 20, 2018

In my case I didn't have any state I needed to keep so I reset my cluster and then downgraded from 1.10.2 to 1.9.7: sudo apt-get install kubeadm=1.9.7-00 kubectl=1.9.7-00 kubelet=1.9.7-00

After that I could init the cluster successfully. I'm on a Raspberry Pi 3 with Hypriotos: Linux pi-1 4.4.50-hypriotos-v7+

@woile

woile commented May 21, 2018

I'm also having the same issue on Raspbian Stretch on an RPi B+.

[init] This might take a minute or longer if the control plane images have to be pulled.

Unfortunately, an error has occurred:
	timed out waiting for the condition

This error is likely caused by:
	- The kubelet is not running
	- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
	- Either there is no internet connection, or imagePullPolicy is set to "Never",
	  so the kubelet cannot pull or find the following control plane images:
		- k8s.gcr.io/kube-apiserver-arm:v1.10.2
		- k8s.gcr.io/kube-controller-manager-arm:v1.10.2
		- k8s.gcr.io/kube-scheduler-arm:v1.10.2
		- k8s.gcr.io/etcd-arm:3.1.12 (only if no external etcd endpoints are configured)

If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
	- 'systemctl status kubelet'
	- 'journalctl -xeu kubelet'
couldn't initialize a Kubernetes cluster

Edit: fixed by downgrading the Docker version: sudo aptitude install docker-ce=17.12.1~ce-0~raspbian

@mcglinn

mcglinn commented May 24, 2018

Thanks all for your help!

I finally got past kubeadm init.

For any of you who are still having trouble, all I did was downgrade both Kubernetes and Docker.

If you follow the guide linked from @FFFEGO above, the only differences are:

  1. I downgraded Docker immediately after installation
  2. I forced 1.9.6-00 for kubeadm, kubectl, and kubelet

Here is a quick list for now (needs to be read alongside the guide: https://gist.github.com/aaronkjones/d996f1a441bc80875fd4929866ca65ad)

1. sudo vi /boot/cmdline.txt # Add three extra terms as per link
2. sudo vi /etc/dhcpcd.conf # Change to static IP - compared to guide just use same commands under interface eth0
3. sudo raspi-config # Change name and reboot
4. curl -sSL get.docker.com | sh && sudo usermod pi -aG docker
5. sudo aptitude install docker-ce=17.12.1~ce-0~raspbian
6. sudo dphys-swapfile swapoff && sudo dphys-swapfile uninstall && sudo update-rc.d dphys-swapfile remove
7. curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add - && echo "deb http://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list && sudo apt-get update -q && sudo apt-get install -qy kubeadm=1.9.6-00 kubectl=1.9.6-00 kubelet=1.9.6-00
8. sudo sed -i '/KUBELET_NETWORK_ARGS=/d' /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
9. sudo kubeadm init --token-ttl=0 --pod-network-cidr=10.244.0.0/16

I will share a better write up later if requested

@javafoot

I encountered the same issue when installing k8s 1.10.3. I had downloaded the k8s images, but init still failed. The root cause: I was using the outdated pause image gcr.io/google_containers/pause-amd64:3.0.
The set of images for k8s 1.10.3:

16:56:59@cd-arch-2104|~
root>docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
k8s.gcr.io/kube-proxy-amd64 v1.10.3 4261d315109d 3 days ago 97.1 MB
k8s.gcr.io/kube-apiserver-amd64 v1.10.3 e03746fe22c3 3 days ago 225 MB
k8s.gcr.io/kube-scheduler-amd64 v1.10.3 353b8f1d102e 3 days ago 50.4 MB
k8s.gcr.io/kube-controller-manager-amd64 v1.10.3 40c8d10b2d11 3 days ago 148 MB
k8s.gcr.io/etcd-amd64 3.1.12 52920ad46f5b 2 months ago 193 MB
k8s.gcr.io/pause-amd64 3.1 da86e6ba6ca1 5 months ago 742 kB

I finally got kubeadm init to succeed with these 6 images. It had been failing because the master couldn't download pause-amd64.

Hope this can help you. Thank you very much, Lord Jesus!
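A sketch for pre-pulling the same set so kubeadm init never has to reach the registry mid-run (tags taken from the listing above; swap the -amd64 suffix for your architecture):

for img in kube-apiserver-amd64:v1.10.3 kube-controller-manager-amd64:v1.10.3 \
           kube-scheduler-amd64:v1.10.3 kube-proxy-amd64:v1.10.3 \
           etcd-amd64:3.1.12 pause-amd64:3.1; do
  docker pull "k8s.gcr.io/$img"
done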

@jmreicha

jmreicha commented May 25, 2018

I dug a little bit deeper into this but am still stuck. It looks like the manifests/etcd.yaml and manifests/kube-apiserver.yaml configs were changed between 1.9 and 1.10.

I ran a --dry-run using both the 1.9 and 1.10 versions of kubeadm, and it looks like the etcd health check was changed and certificate auth was turned on. At this point I'm thinking this change is what is causing the issue. For example:

1.9 etcd.yaml

spec:
  containers:
  - command:
    - etcd
    - --advertise-client-urls=http://127.0.0.1:2379
    - --data-dir=/var/lib/etcd
    - --listen-client-urls=http://127.0.0.1:2379
    image: gcr.io/google_containers/etcd-arm64:3.1.11
    livenessProbe:
      failureThreshold: 8
      httpGet:
        host: 127.0.0.1
        path: /health
        port: 2379
        scheme: HTTP
      initialDelaySeconds: 15
      timeoutSeconds: 15
    name: etcd

1.10 etcd.yaml

spec:
  containers:
  - command:
    - etcd
    - --key-file=/etc/kubernetes/pki/etcd/server.key
    - --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
    - --advertise-client-urls=https://127.0.0.1:2379
    - --client-cert-auth=true
    - --peer-client-cert-auth=true
    - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    - --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
    - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    - --listen-client-urls=https://127.0.0.1:2379
    - --data-dir=/var/lib/etcd
    - --cert-file=/etc/kubernetes/pki/etcd/server.crt
    image: k8s.gcr.io/etcd-arm64:3.1.12
    livenessProbe:
      exec:
        command:
        - /bin/sh
        - -ec
        - ETCDCTL_API=3 etcdctl --endpoints=127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt
          --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt --key=/etc/kubernetes/pki/etcd/healthcheck-client.key
          get foo
      failureThreshold: 8
      initialDelaySeconds: 15
      timeoutSeconds: 15
    name: etcd

The kube-apiserver has also been updated to use this https etcd, instead of the http version that is used in 1.9.

I can get the kubeadm init to finish bootstrapping by creating a config file and overriding all of the etcd urls with http endpoints but the etcd and apiserver containers still crashloop.

Unfortunately I'm not sure how to fix this, but would love to get it figured out.

@geoffgarside

@jmreicha yeah, that's what I found with 1.10.2 as well. The switch of etcd from HTTP to HTTPS seems to be the main source of this issue. I tried starting an etcd instance using docker with the same options as the manifest, and it all ran fine. I was also able to docker exec into the container and run the health check command without issue.

Unfortunately, doing the same docker exec into the kubelet-managed container is more hit and miss: sometimes it would work, usually just after it had started, and sometimes it would error out with a grpc timeout. Usually when the grpc timeouts were happening, lsof would show a large number of connections between etcd and the apiserver, though the logs wouldn't suggest they were actually talking to each other. After a short period of time, I think the etcd liveness check failures cause the kubelet to shut down the etcd instance; the etcd logs suggest the instance is being instructed to shut down rather than crashing. I've never been able to make enough sense of the apiserver logs to work out what's actually going on with it.

I know the crypto on ARM64 is a bit slow (lacking ASM implementations). I believe the Go team is working on that at the moment, but it probably won't land until Go 1.11 at least, and it looks like etcd is still on Go 1.8.5. I've been wondering if the reduced speed of the crypto is therefore exceeding some hardwired TLS timeouts in etcd and k8s.
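One way to test that hypothesis, as a sketch: time the exact probe command the kubelet runs (reused from earlier in this thread; the k8s_etcd filter assumes the dockershim naming) and compare against the 15-second timeoutSeconds in the 1.10 manifest.

time docker exec $(docker ps -q --filter "name=k8s_etcd") \
  /bin/sh -ec 'ETCDCTL_API=3 etcdctl --endpoints=127.0.0.1:2379 \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
    --key=/etc/kubernetes/pki/etcd/healthcheck-client.key get foo'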

@jmreicha

Out of curiosity, I just hooked up a Pi 3B+ I forgot I had and tried installing the master on it. Interestingly, the master came up using k8s version 1.10.2 and Docker 18.04, but k8s 1.10.3 still seems to be broken on the RPi.

I was then able to join the remaining Pine64s to the cluster as workers. This isn't an ideal setup, but at least it gets me a 1.10 cluster for now. I still don't know what's different between the Pine64 and RPi packages/hardware, or why it decided to work on the RPi, but I thought it might be helpful for others.

@jiashenC

jiashenC commented Jun 5, 2018

I have the same issue on an AWS server, and it turned out my EC2 instance was blocking the incoming traffic. I resolved the issue by opening the required ports in the inbound rules.

@watsonl

watsonl commented Jun 24, 2018

OS: Ubuntu 16.04 and 18.04; kubeadm version 1.10.5.
Adding no_proxy to /etc/environment solved my problem!
no_proxy="localhost,127.0.0.1,........."

@pytimer
Contributor

pytimer commented Jul 30, 2018

I have the same issue on CentOS 7.4: if I have http_proxy set before running kubeadm init <args>, the issue always happens.

I unset http_proxy as @watsonl said, and the issue was solved. Can anyone tell me why? Thanks.

  • kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.1", GitCommit:"b1b29978270dc22fecc592ac55d903350454310a", GitTreeState:"clean", BuildDate:"2018-07-17T18:50:16Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
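The likely mechanism: with http_proxy/https_proxy exported, kubeadm's and the kubelet's requests to the local API server are routed to the proxy instead of the node itself, and the proxy cannot reach 127.0.0.1 or your node IP. Either exempt the local addresses or drop the proxy for the init (a sketch; the IP is an example, and CIDR support in no_proxy varies by tool):

export no_proxy="localhost,127.0.0.1,192.168.56.60"   # add your node IP
# or simply:
unset http_proxy https_proxy
kubeadm init <args>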

@loburm
Contributor

loburm commented Aug 18, 2018

Today I was trying to set up a Kubernetes cluster on a Raspberry Pi and encountered the same issue. I think the problem is that the apiserver consistently fails to finish its own startup within two minutes, so the kubelet keeps killing it, and after 4-5 minutes kubeadm times out as well.

To work around this I used the following strategy. As soon as kubeadm enters the init stage ("[init] This might take a minute or longer if the control plane images have to be pulled" is printed), I immediately update the kube-apiserver manifest file by running:

sed -i 's/failureThreshold: 8/failureThreshold: 20/g' /etc/kubernetes/manifests/kube-apiserver.yaml

Then I kill the current kube-apiserver container (docker kill).

After that it took nearly 3 minutes for the apiserver to actually start up, and kubeadm managed to continue its work.
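The "docker kill" step, concretized as a sketch (the container-name filter assumes the dockershim k8s_ naming convention):

docker kill $(docker ps -q --filter "name=k8s_kube-apiserver")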

@neolit123
Member

some notes:

  • 1.9.x is now outside of the supported versions and we don't have the bandwidth to help users that have issues with it
  • make sure that you have a dual core CPU + 2GB of ram for the control plane node (master)

if you find a bug in kubeadm please report it at the kubernetes/kubeadm repo, or use our support channels for help and questions:
https://github.com/kubernetes/community/blob/master/contributors/guide/issue-triage.md#user-support-response-example

thanks
/close

@k8s-ci-robot
Contributor

@neolit123: Closing this issue.


@loburm
Contributor

loburm commented Oct 29, 2018

@neolit123 Why can't we just increase the default failureThreshold from 8 to some more reasonable value? There are a lot of articles online about installing a Kubernetes cluster on a Raspberry Pi, and with the current configuration none of them actually work, due to the limited amount of resources.

@neolit123
Member

@loburm
we are adding a user-controllable timeout for the API server in 1.13.


@rakesh1533

kubeadm init --apiserver-advertise-address 192.168.33.11 --pod-network-cidr=192.168.0.0/16
[init] Using Kubernetes version: v1.13.1
[preflight] Running pre-flight checks
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR NumCPU]: the number of available CPUs 1 is less than the required 2

I am getting this error when running kubeadm init on the master or any other machine. Previously it worked well.

@neolit123
Member

@rakesh1533
we've added a check for CPU cores on control-plane nodes: control-plane nodes need 2 cores.

if you have fewer than 2 you are going to see this error.
to ignore it you can use --ignore-preflight-errors, but you might experience weird behavior depending on your setup.
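For completeness, the name in the error brackets is exactly what the flag takes:

sudo kubeadm init --ignore-preflight-errors=NumCPU   # not recommended outside test setups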
