Kubeadm init stuck on [init] This might take a minute or longer if the control plane images have to be pulled. #61277
/kind bug
/sig cluster-lifecycle
1.9.5 has the same issue
Same for v1.10.0 and v1.11.0 too.
On a side note, does anyone know how we can make the kubeadm logs more verbose, to understand what/why it's actually hosed? Many thanks.
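For reference, a minimal sketch of getting more detail out of this phase, assuming kubeadm accepts the klog-style --v flag (the level chosen is arbitrary); most of the interesting detail shows up in the kubelet journal while kubeadm waits:

```sh
# More verbose output from kubeadm itself
kubeadm init --v=5

# Follow what the kubelet does with the control-plane static pods
journalctl -u kubelet -f
```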
I followed the advice from this link. Now run:
This issue often comes back with each new Kubernetes release.
I am new to Kubernetes and getting the same error on CentOS 7 (v1.10.0). I've tried everything that many posts have suggested but am still not able to get it to work.
I am getting a similar result to @nikhilno1 when attempting to initialize a master node on a Raspberry Pi. Any assistance is much appreciated.
I'm getting the same. When checking, I see the apiserver container exiting with code 137 while kubeadm is waiting.
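As an aside, exit code 137 is 128+9, i.e. the container was SIGKILLed, typically by the OOM killer or by the kubelet after a failed liveness probe. A hedged check, assuming the labels the Docker-managed kubelet normally sets:

```sh
# Was the apiserver container OOM-killed, and what was its exact exit code?
docker inspect -f 'exit={{.State.ExitCode}} oomkilled={{.State.OOMKilled}}' \
  $(docker ps -a -q --filter "label=io.kubernetes.container.name=kube-apiserver" | head -n 1)
```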
I experience the same issue with kubeadm 1.10.0 on a Raspberry Pi 3 (HypriotOS 1.8.0), but I also see the kube-apiserver and etcd processes running at full-blast CPU. The load on the Pi is still high even after 20 minutes of running, and it continues even after that.
I am experiencing the same issue, please add me to this. I am getting pretty much identical logs to what geoffgarside noted above (I am on a Pi 3 B). My issue also includes what rhuss is seeing: a huge spike in load as indicated by top, as well as my etcd and apiserver restarting every few minutes.
It seems that etcd is causing the problem: the etcd node doesn't start properly because of a certs issue, and the kube-apiserver pod crashes because it can't locate an etcd pod. You can get around it by configuring an external etcd on your master node (see the sketch after this list).
Step 1: install etcd on your master node
Step 2: configure /etc/etcd/etcd.conf
Step 3: restart or start etcd
Step 4: generate a token
Step 5: create a config.yaml file
Step 6: start kubeadm init
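A rough sketch of those six steps on CentOS, assuming the v1alpha1 config format used by kubeadm 1.9/1.10; the etcd.conf keys, the endpoint URL, and the token placeholder are illustrative and need to be adapted:

```sh
# Step 1: install etcd on the master node
yum install -y etcd

# Step 2: in /etc/etcd/etcd.conf, listen on localhost, e.g.
#   ETCD_LISTEN_CLIENT_URLS="http://127.0.0.1:2379"
#   ETCD_ADVERTISE_CLIENT_URLS="http://127.0.0.1:2379"

# Step 3: (re)start etcd
systemctl enable etcd && systemctl restart etcd

# Step 4: generate a bootstrap token for kubeadm
kubeadm token generate

# Step 5: config.yaml pointing kubeadm at the external etcd
cat > config.yaml <<'EOF'
apiVersion: kubeadm.k8s.io/v1alpha1
kind: MasterConfiguration
etcd:
  endpoints:
  - http://127.0.0.1:2379
token: <token-from-step-4>
EOF

# Step 6: init with that config
kubeadm init --config config.yaml
```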
I'm not quite sure what is killing etcd though. I've got the exited etcd container log below; I just need to work out the source of its "exit (0)" status.
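For anyone trying to reproduce this, a sketch of pulling those exited-container logs, assuming the label the Docker-managed kubelet normally sets:

```sh
# List etcd containers, including exited ones, and read the newest one's log
docker ps -a --filter "label=io.kubernetes.container.name=etcd"
docker logs --tail 50 $(docker ps -a -q --filter "label=io.kubernetes.container.name=etcd" | head -n 1)
```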
The issue may lie in the fact that the etcd version is not supported by the installed version of kubeadm. Using kubeadm v1.10.1 with etcd v2.3.7 (the latest available on Raspbian Stretch) and following the suggestion by @ChristianCandelaria to run an external etcd instance, I get the following:
I'm running the etcd-arm64 3.1.12 container, so it's the accepted version of etcd. The issue seems to be the livenessProbe on the etcd pod, though I'm not sure how. The kubelet logs show it thinking the container is dead and so restarting it. I've been trying the following:

docker exec $(docker ps --format "{{.ID}}" --filter="label=io.kubernetes.container.name=etcd") \
  /bin/sh -ec "ETCDCTL_API=3 etcdctl --endpoints=127.0.0.1:2379 \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
    --key=/etc/kubernetes/pki/etcd/healthcheck-client.key get foo; echo \$?"

and get the following output:

2018-04-14 20:41:22.863186 I | warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
0

so I'm still not sure why it's thinking the container is dead. If I run the etcd service manually with
I've turned up the logging verbosity on the kubelet and am seeing these probe messages now. It looks like this might be related to the
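A sketch of how the kubelet verbosity can be raised on a kubeadm-installed node; the drop-in path assumes the stock deb/rpm packages:

```sh
# Append --v=4 to the kubelet flags in the kubeadm systemd drop-in
vi /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
systemctl daemon-reload
systemctl restart kubelet
journalctl -u kubelet -f | grep -i probe    # follow the liveness-probe messages
```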
I've experienced the same, but only after rebooting the box. It was all OK when I created the CentOS instance in OpenStack and installed k8s on it. But when I tried to install k8s after rebooting the instance, or rebooted after the k8s install, k8s did not work anymore / the install hung as described above.
I'm getting the same issue, running on a Raspberry Pi 3 Model B+. Here are the logs:
It hangs here and it is frustrating!
I was having similar issues in a cluster I brought up from scratch (loosely based on kubernetes-the-hard-way and kubeadm), the biggest difference being that I was running etcd separate from the kubelet as its own non-k8s-managed container, and all communication was secured using TLS. Upgrading the master node from v1.9.2 to v1.10.0 caused the machine (a Raspberry Pi 3B) to crash with kernel:[164210.000398] Internal error: Oops: 80000007 [#1] SMP ARM
If you are running on a non-x86 platform and k8s is using
Same here :( kubeadm 1.10.1 running on CentOS 7 on an x86 platform (a VirtualBox VM managed with Vagrant). Components cannot connect to the API server (connection refused and TLS handshake timeouts), plus CPU and disk usage at 100%. UPDATE: in my case this was due to low RAM (just 512 MB).
The
I agree with Geoff here - all my other pods come up for the controller node, so I'd be surprised if there was something specifically wrong with the pause image. I'm wondering if it's a memory issue as Ferrari is alluding to - it's been hit or miss for me in a couple of my reproduction attempts as to whether this works, and low RAM could certainly be part of the problem on the Pi.
I am facing the same issue on RHEL 7. Any solution would be appreciated. I tried removing the swap space and am still facing the same issue.
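For reference, the usual way to remove swap so the kubelet stops complaining; the fstab edit is just one common way to make it persistent:

```sh
# Turn swap off now...
swapoff -a
# ...and comment out swap entries so it stays off across reboots
sed -i '/ swap / s/^/#/' /etc/fstab
```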
I'm having the same problems as articulated above, where the kubeadm init hangs. I'm working on a Raspberry Pi 3B and have tried every variety of mitigation identified above that some claim to have resolved their problem. I get the same pre-flight warnings others have detailed:
Being a relative n00b to all this, I'm at a bit of a loss - but looking at /var/log/daemon.log, I see it spewing errors like the following:
While I'm a k8s n00b, I'm pretty savvy with networking, daemons/listeners, certificates/PKI, etc., and this, at least, is an error I understand. I'm assuming the API server is what's supposed to be listening on TCP 6443 - but this does not appear to be the case on my Pi. No wonder all the connections are timing out. I'm barely following the discussion above about "spinning up a stand-alone etcd container" - but, assuming it is intended to be instantiated through this process, where exactly would I look to confirm that it's at least being attempted, and ideally see an error there which explains what prevents it? I got all excited by rongou's statement about a change to /etc/resolv.conf fixing his problem; alas, there's no search directive contained in my file. The connection error above seems so straightforward - but I'm in over my head on how to proceed with the troubleshooting.
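A few diagnostics that narrow this down, assuming a Docker-based setup; they check whether anything listens on 6443 and whether the kubelet ever created the control-plane containers:

```sh
# Is anything listening on the API server port?
ss -tlnp | grep 6443

# Did the kubelet create (and perhaps kill) the control-plane containers at all?
docker ps -a | grep -E 'kube-apiserver|etcd'

# Logs from the most recent apiserver container, if one exists
docker logs $(docker ps -a -q --filter "label=io.kubernetes.container.name=kube-apiserver" | head -n 1)
```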
@snowake4me is 10.10.10.201 the correct IP address for your host?
Seeing similar behavior (on a Pine64, not an RPi). The etcd container seems to be crash looping, as well as the API server. Here are the last few lines of the etcd logs before it crashes.
This is on Docker
Hangs at the same step. Fixed by downgrading kubeadm, kubectl, and kubelet to 1.9.6, AND downgrading to Docker 18.04.
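A hedged sketch of that downgrade on an apt-based system; the "-00" package revisions and the docker-ce version string are assumptions and should be checked against what the repositories actually offer:

```sh
# See which builds the repositories offer
apt-cache madison kubeadm docker-ce

# Pin the Kubernetes packages back to 1.9.6
apt-get install -y --allow-downgrades kubeadm=1.9.6-00 kubelet=1.9.6-00 kubectl=1.9.6-00

# Downgrade Docker to an 18.04 build (exact string comes from the madison output)
apt-get install -y --allow-downgrades docker-ce=<18.04.x-ce build string>
```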
@toolboc Just confirmed that the downgrade works here too. I was then able to bring up a 1.10.2 cluster. I still need to do more testing but it seems to be working.
It turns out that I am still having issues with 1.10.2 clusters. I left the
In my case I didn't have any state I needed to keep, so I reset my cluster and then downgraded from 1.10.2 to 1.9.7. After that I could init the cluster successfully. I'm on a Raspberry Pi 3 with HypriotOS.
I'm also having the same issue on Raspbian Stretch on an RPi B+.
Edit: fixed by updating the Docker version.
Thanks all for your help! I finally got past kubeadm init. For any of you who are still having trouble, all I did was downgrade both Kubernetes and Docker. If you follow the guide linked by @FFFEGO above, the only differences are:
Here is a quick list for now (needs to be read alongside the guide: https://gist.github.com/aaronkjones/d996f1a441bc80875fd4929866ca65ad)
I will share a better write-up later if requested.
I encountered the same issue when installing k8s 1.10.3. I downloaded the k8s images, but it still failed. The fatal reason: I used the outdated pause image, gcr.io/google_containers/pause-amd64:3.0. I finally ran "kubeadm init" successfully with these 6 images. I had failed because the master couldn't download pause-amd64. Hope this can help you. Thank you very much, Lord Jesus!
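A sketch of that workaround when a node cannot reach gcr.io directly: pull the pause image through a reachable mirror and retag it to the name the kubelet expects. The mirror name below is only an example:

```sh
# Pull the pause image through a reachable registry (example name)
docker pull registry.example.com/google_containers/pause-amd64:3.1

# Retag it to the name kubeadm/kubelet will look for
docker tag registry.example.com/google_containers/pause-amd64:3.1 \
  gcr.io/google_containers/pause-amd64:3.1

docker images | grep pause    # confirm the expected tag is present locally
```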
I dug a little bit deeper into this but am still stuck. It looks like the etcd static pod manifest changed between versions; compare the two:
1.9 etcd.yaml
1.10 etcd.yaml
The kube-apiserver has also been updated to use this HTTPS etcd, instead of the HTTP version that is used in 1.9. Unfortunately I'm not sure how to fix this, but would love to get it figured out.
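For anyone comparing the two, a rough way to see the relevant difference on a running node; the flag names listed in the comments are from memory and should be treated as an approximation:

```sh
# Show the client-URL and TLS flags the manifest passes to etcd
grep -E 'listen-client-urls|advertise-client-urls|cert-file|key-file|trusted-ca-file|client-cert-auth' \
  /etc/kubernetes/manifests/etcd.yaml
# 1.9 manifests typically show  --listen-client-urls=http://127.0.0.1:2379 and no TLS flags;
# 1.10 manifests typically show --listen-client-urls=https://127.0.0.1:2379 plus
# --cert-file/--key-file/--trusted-ca-file/--client-cert-auth=true under /etc/kubernetes/pki/etcd/
```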
@jmreicha yeah, that's what I found with 1.10.2 as well. The switch of etcd from HTTP to HTTPS seems to be the main source of this issue. I tried starting an etcd instance using docker with the same options as the manifest and it all ran fine, and I was also able to query it. I know the crypto on ARM64 is a bit slow (lacking ASM implementations); I believe Golang is working on that at the moment, but that probably won't land until Go 1.11 at least, and it looks like etcd is still on Go 1.8.5. I've been wondering if the reduced speed of the crypto is therefore exceeding some hardwired TLS timeouts in etcd and k8s.
Out of curiosity, I just hooked up a Pi 3B+ I forgot I had and tried installing the master on it. Interestingly, the master came up using k8s version 1.10.2 and Docker 18.04, but k8s 1.10.3 seems to still be broken on the RPi. Then I was able to join the remaining Pine64s to the cluster as workers. This isn't an ideal setup but at least gets me a 1.10 cluster for now. I still don't know what's different between the Pine64 and RPi packages/hardware and why it decided to work on the RPi, but I thought it might be helpful for others.
I have the same issue on an AWS server, and it turns out my EC2 instance was blocking the incoming traffic. I resolved the issue by opening up more access in the inbound rules.
OS: Ubuntu 16.04 and 18.04, kubeadm version 1.10.5.
I have the same issue on CentOS 7.4. I had http_proxy set before running it; after I unset http_proxy like @watsonl said, the issue was solved. Can anyone tell me why? Thanks.
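The usual reason is that the proxy variables send the local health-check traffic to 127.0.0.1 (or the node IP) through the proxy. If a proxy must stay configured, excluding local and cluster addresses typically avoids this; the IP and CIDRs below are examples and must be replaced with your node IP, service CIDR, and pod CIDR:

```sh
export NO_PROXY=127.0.0.1,localhost,192.168.56.60,10.96.0.0/12,10.244.0.0/16
export no_proxy=$NO_PROXY
kubeadm init
```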
Today I was trying to set up a Kubernetes cluster on a Raspberry Pi and encountered the same issue. I think the problem is that the apiserver fails to finish its own configuration within two minutes, and as a result the kubelet keeps killing it. After 4-5 minutes kubeadm also times out. To fix this I used the following strategy: once kubeadm enters the init stage ("[init] This might take a minute or longer if the control plane images have to be pulled" is printed), I immediately update the kube-apiserver manifest file by running:
sed -i 's/failureThreshold: 8/failureThreshold: 20/g' /etc/kubernetes/manifests/kube-apiserver.yaml
Then I just killed the current kube-apiserver container (docker kill). After that it took about 3 minutes for the apiserver to actually start up, and kubeadm managed to continue its work.
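A sketch of that sequence as commands; the label filter used to find the container is the standard one set on Docker-managed kubelet containers:

```sh
# Relax the apiserver liveness probe while kubeadm is still waiting...
sed -i 's/failureThreshold: 8/failureThreshold: 20/g' /etc/kubernetes/manifests/kube-apiserver.yaml

# ...then kill the running kube-apiserver container so the kubelet
# recreates it from the updated manifest
docker kill $(docker ps -q --filter "label=io.kubernetes.container.name=kube-apiserver")
```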
Some notes:
If you find a bug in kubeadm, please report it at the kubernetes/kubeadm repo, or please use our support channels for help and questions.
Thanks.
@neolit123: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@neolit123 Why can't we just increase the default failureThreshold from 8 to some more reasonable value? There are a lot of articles online about how to install a Kubernetes cluster on a Raspberry Pi, and with the current configuration none of them actually work, due to the limited amount of resources.
@loburm
kubeadm init --apiserver-advertise-address 192.168.33.11 --pod-network-cidr=192.168.0.0/16
I am getting this error when I run kubeadm init on the master or any other machine. Previously it worked well.
@rakesh1533 If you have less than 2, you are going to see this error.
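If the "2" here refers to the CPU-count preflight check (an assumption), newer kubeadm versions let you acknowledge and skip just that check rather than changing the machine:

```sh
# Skip only the NumCPU preflight check (use with care on constrained hardware)
kubeadm init --ignore-preflight-errors=NumCPU \
  --apiserver-advertise-address 192.168.33.11 --pod-network-cidr=192.168.0.0/16
```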
Is this a BUG REPORT or FEATURE REQUEST?:
/kind bug
/sig bug
What happened:
Kubeadm init hangs at:
[init] Using Kubernetes version: v1.9.4
[init] Using Authorization modes: [Node RBAC]
[preflight] Running pre-flight checks.
[WARNING FileExisting-crictl]: crictl not found in system path
[preflight] Starting the kubelet service
[certificates] Generated ca certificate and key.
[certificates] Generated apiserver certificate and key.
[certificates] apiserver serving cert is signed for DNS names [master kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 192.168.56.60]
[certificates] Generated apiserver-kubelet-client certificate and key.
[certificates] Generated sa key and public key.
[certificates] Generated front-proxy-ca certificate and key.
[certificates] Generated front-proxy-client certificate and key.
[certificates] Valid certificates and keys now exist in "/etc/kubernetes/pki"
[kubeconfig] Wrote KubeConfig file to disk: "admin.conf"
[kubeconfig] Wrote KubeConfig file to disk: "kubelet.conf"
[kubeconfig] Wrote KubeConfig file to disk: "controller-manager.conf"
[kubeconfig] Wrote KubeConfig file to disk: "scheduler.conf"
[controlplane] Wrote Static Pod manifest for component kube-apiserver to "/etc/kubernetes/manifests/kube-apiserver.yaml"
[controlplane] Wrote Static Pod manifest for component kube-controller-manager to "/etc/kubernetes/manifests/kube-controller-manager.yaml"
[controlplane] Wrote Static Pod manifest for component kube-scheduler to "/etc/kubernetes/manifests/kube-scheduler.yaml"
[etcd] Wrote Static Pod manifest for a local etcd instance to "/etc/kubernetes/manifests/etcd.yaml"
[init] Waiting for the kubelet to boot up the control plane as Static Pods from directory "/etc/kubernetes/manifests".
[init] This might take a minute or longer if the control plane images have to be pulled.
...
What you expected to happen:
Cluster initialized. --> Kubernetes master initialized
How to reproduce it (as minimally and precisely as possible):
Install docker
Install kubeadm, kubelet, kubectl
use kubeadm init command
Anything else we need to know?:
Since yesterday, when initializing a cluster it automatically uses Kubernetes version v1.9.4.
I tried forcing kubeadm to use --kubernetes-version=v1.9.3 but I still have the same issue.
Last week it was fine when I reset my Kubernetes cluster and reinitialized it.
I found the issue yesterday when I wanted to reset my cluster again and reinitialize it, and it got stuck.
I tried yum update to update all my software, but still the same issue.
I used Kubernetes v1.9.3 and after updating today... I'm using Kubernetes v1.9.4.
Environment:
Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.4", GitCommit:"bee2d1505c4fe820744d26d41ecd3fdd4a3d6546", GitTreeState:"clean", BuildDate:"2018-03-12T16:29:47Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
OS (e.g. from /etc/os-release):
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
Kernel (e.g. uname -a):
Linux master 3.10.0-514.el7.x86_64 #1 SMP Tue Nov 22 16:42:41 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
Install tools: Kubeadm