Running pre-flight checks hang #1477

Closed
afagund opened this issue Apr 1, 2019 · 27 comments
Labels
kind/support Categorizes issue or PR as a support question. priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. sig/network Categorizes an issue or PR as relevant to SIG Network.
Milestone

Comments

@afagund

afagund commented Apr 1, 2019

What keywords did you search in kubeadm issues before filing this one?

preflight
hang
kubeadm join

BUG REPORT

Versions

kubeadm version (use kubeadm version):
kubeadm version: &version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.0", GitCommit:"641856db18352033a0d96dbc99153fa3b27298e5", GitTreeState:"clean", BuildDate:"2019-03-25T15:51:21Z", GoVersion:"go1.12.1", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Kubernetes version (use kubectl version):
    Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.0", GitCommit:"641856db18352033a0d96dbc99153fa3b27298e5", GitTreeState:"clean", BuildDate:"2019-03-25T15:53:57Z", GoVersion:"go1.12.1", Compiler:"gc", Platform:"linux/amd64"}

  • Cloud provider or hardware configuration:

  • OS (e.g. from /etc/os-release):
    NAME="CentOS Linux"
    VERSION="7 (Core)"
    ID="centos"
    ID_LIKE="rhel fedora"
    VERSION_ID="7"
    PRETTY_NAME="CentOS Linux 7 (Core)"
    ANSI_COLOR="0;31"
    CPE_NAME="cpe:/o:centos:centos:7"
    HOME_URL="https://www.centos.org/"
    BUG_REPORT_URL="https://bugs.centos.org/"

    CENTOS_MANTISBT_PROJECT="CentOS-7"
    CENTOS_MANTISBT_PROJECT_VERSION="7"
    REDHAT_SUPPORT_PRODUCT="centos"
    REDHAT_SUPPORT_PRODUCT_VERSION="7"

What happened?

There is a problem when joining a control-plane node. The process hangs with the message "Running pre-flight checks". See below:

[root@vm02 ~]# kubeadm join vm10.andrefagundes.org:6443 --token 07nh7g.v8p5fcs61fn3o2h4 --discovery-token-ca-cert-hash sha256:039a5f9229dafe39d4a51af6899c20adff1de5dda23f780ac9b896e95f95623a --experimental-control-plane --certificate-key 8afd066a7b8baa2abf86ba1b2d5e7f29625875d8f78a3e136f7fd35605b4775
[preflight] Running pre-flight checks

What you expected to happen?

I was expecting the node to be joined or a message indicating an error.

How to reproduce it (as minimally and precisely as possible)?

I am following the official documentation below.

https://kubernetes.io/docs/setup/independent/high-availability/#external-etcd-nodes

Anything else we need to know?

No.

@afagund
Author

afagund commented Apr 1, 2019

With the -v10 parameter:

[root@vm03 etcd]# kubeadm join vm10.andrefagundes.org:6443 --token 07nh7g.v8p5fcs61fn3o2h4 --discovery-token-ca-cert-hash sha256:039a5f9229dafe39d4a51af6899c20adff1de5dda23f780ac9b896e95f95623a --experimental-control-plane --certificate-key cf3c8ca4f74751bfe7fc9d3e00e03a37619d36a6d6fb79fb5ba3645d74dd7bf4 -v10
I0401 00:34:08.531961 16893 join.go:367] [preflight] found NodeName empty; using OS hostname as NodeName
I0401 00:34:08.532014 16893 join.go:371] [preflight] found advertiseAddress empty; using default interface's IP address as advertiseAddress
I0401 00:34:08.532048 16893 initconfiguration.go:105] detected and using CRI socket: /var/run/dockershim.sock
I0401 00:34:08.532179 16893 interface.go:384] Looking for default routes with IPv4 addresses
I0401 00:34:08.532187 16893 interface.go:389] Default route transits interface "eth0"
I0401 00:34:08.532324 16893 interface.go:196] Interface eth0 is up
I0401 00:34:08.532380 16893 interface.go:244] Interface "eth0" has 4 addresses :[192.168.122.103/24 fe80::a3c0:2a34:91f2:e0eb/64 fe80::8439:c3eb:5848:c1f2/64 fe80::4381:b4a5:5836:a0e1/64].
I0401 00:34:08.532399 16893 interface.go:211] Checking addr 192.168.122.103/24.
I0401 00:34:08.532407 16893 interface.go:218] IP found 192.168.122.103
I0401 00:34:08.532415 16893 interface.go:250] Found valid IPv4 address 192.168.122.103 for interface "eth0".
I0401 00:34:08.532421 16893 interface.go:395] Found active IP 192.168.122.103
[preflight] Running pre-flight checks
I0401 00:34:08.532495 16893 preflight.go:90] [preflight] Running general checks
I0401 00:34:08.532539 16893 checks.go:254] validating the existence and emptiness of directory /etc/kubernetes/manifests
I0401 00:34:08.532570 16893 checks.go:292] validating the existence of file /etc/kubernetes/kubelet.conf
I0401 00:34:08.532579 16893 checks.go:292] validating the existence of file /etc/kubernetes/bootstrap-kubelet.conf
I0401 00:34:08.532586 16893 checks.go:105] validating the container runtime
I0401 00:34:08.580885 16893 checks.go:131] validating if the service is enabled and active
I0401 00:34:08.638659 16893 checks.go:341] validating the contents of file /proc/sys/net/bridge/bridge-nf-call-iptables
I0401 00:34:08.638724 16893 checks.go:341] validating the contents of file /proc/sys/net/ipv4/ip_forward
I0401 00:34:08.638755 16893 checks.go:653] validating whether swap is enabled or not
I0401 00:34:08.638788 16893 checks.go:382] validating the presence of executable ip
I0401 00:34:08.638809 16893 checks.go:382] validating the presence of executable iptables
I0401 00:34:08.638824 16893 checks.go:382] validating the presence of executable mount
I0401 00:34:08.638837 16893 checks.go:382] validating the presence of executable nsenter
I0401 00:34:08.638849 16893 checks.go:382] validating the presence of executable ebtables
I0401 00:34:08.638860 16893 checks.go:382] validating the presence of executable ethtool
I0401 00:34:08.638871 16893 checks.go:382] validating the presence of executable socat
I0401 00:34:08.638883 16893 checks.go:382] validating the presence of executable tc
I0401 00:34:08.638894 16893 checks.go:382] validating the presence of executable touch
I0401 00:34:08.638914 16893 checks.go:524] running all checks
I0401 00:34:08.664826 16893 checks.go:412] checking whether the given node name is reachable using net.LookupHost
I0401 00:34:08.665583 16893 checks.go:622] validating kubelet version
I0401 00:34:08.709573 16893 checks.go:131] validating if the service is enabled and active
I0401 00:34:08.716270 16893 checks.go:209] validating availability of port 10250
I0401 00:34:08.716418 16893 checks.go:439] validating if the connectivity type is via proxy or direct
I0401 00:34:08.716444 16893 join.go:427] [preflight] Discovering cluster-info
I0401 00:34:08.716498 16893 token.go:200] [discovery] Trying to connect to API Server "vm10.andrefagundes.org:6443"
I0401 00:34:08.716961 16893 token.go:75] [discovery] Created cluster-info discovery client, requesting info from "https://vm10.andrefagundes.org:6443"
I0401 00:34:08.717031 16893 round_trippers.go:419] curl -k -v -XGET -H "Accept: application/json, */*" -H "User-Agent: kubeadm/v1.14.0 (linux/amd64) kubernetes/641856d" 'https://vm10.andrefagundes.org:6443/api/v1/namespaces/kube-public/configmaps/cluster-info'
I0401 00:34:08.722405 16893 round_trippers.go:438] GET https://vm10.andrefagundes.org:6443/api/v1/namespaces/kube-public/configmaps/cluster-info 403 Forbidden in 5 milliseconds
I0401 00:34:08.722423 16893 round_trippers.go:444] Response Headers:
I0401 00:34:08.722432 16893 round_trippers.go:447] Content-Type: application/json
I0401 00:34:08.722441 16893 round_trippers.go:447] X-Content-Type-Options: nosniff
I0401 00:34:08.722450 16893 round_trippers.go:447] Content-Length: 321
I0401 00:34:08.722458 16893 round_trippers.go:447] Date: Mon, 01 Apr 2019 03:34:08 GMT
I0401 00:34:08.722497 16893 request.go:942] Response Body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"configmaps \"cluster-info\" is forbidden: User \"system:anonymous\" cannot get resource \"configmaps\" in API group \"\" in the namespace \"kube-public\"","reason":"Forbidden","details":{"name":"cluster-info","kind":"configmaps"},"code":403}
I0401 00:34:08.722937 16893 token.go:83] [discovery] Failed to request cluster info, will try again: [configmaps "cluster-info" is forbidden: User "system:anonymous" cannot get resource "configmaps" in API group "" in the namespace "kube-public"]

@afagund
Author

afagund commented Apr 1, 2019

One more detail: vm10.andrefagundes.org is an HAProxy instance in front of my control plane.

@neolit123
Member

seems like a networking issue to me.
are you sure this joining node has connectivity to port 6443 on the LB and can resolve vm10.andrefagundes.org?
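
Both parts of that question (name resolution and TCP reachability of port 6443) can be checked from the joining node with a few lines of shell. This is only a minimal sketch, not part of kubeadm: `check_endpoint` is a hypothetical helper name, and it assumes bash (for `/dev/tcp`), `getent`, and `timeout` are available on the node.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a kubeadm command): report whether HOST resolves
# and whether HOST:PORT accepts a TCP connection.
check_endpoint() {
  local host="$1" port="${2:-6443}"

  # Step 1: name resolution through the system resolver (/etc/hosts, DNS).
  if ! getent hosts "$host" >/dev/null 2>&1; then
    echo "dns-failure"
    return 1
  fi

  # Step 2: plain TCP connect via bash's /dev/tcp, bounded to 5 seconds.
  if timeout 5 bash -c 'exec 3<>/dev/tcp/$1/$2' _ "$host" "$port" 2>/dev/null; then
    echo "reachable"
  else
    echo "unreachable"
    return 2
  fi
}
```

For the setup in this issue the call would be `check_endpoint vm10.andrefagundes.org 6443`. A "reachable" result only shows that something accepts connections on that port (here, the load balancer); it does not prove that the API server behind it is healthy.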

@neolit123 neolit123 added priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. sig/network Categorizes an issue or PR as relevant to SIG Network. labels Apr 1, 2019
@afagund
Author

afagund commented Apr 1, 2019

Yes. I also changed vm10 to point directly at the control plane, and while monitoring with tcpdump I saw the traffic arriving at the control plane.

@neolit123
Member

are you seeing any outstanding errors in the kubelet logs?

@afagund
Author

afagund commented Apr 1, 2019

There are several errors in the logs. I also tried to reinstall the cluster a few times, and each time I got different errors. I am giving up. We can close the case. Thanks!!

@neolit123
Member

does creating a single control plane node + some worker nodes work for you or does the problem only happen when joining additional control plane nodes?

@fabriziopandini
Member

User "system:anonymous" cannot get resource "configmaps" in API group "" in the namespace "kube-public"","reason":"Forbidden","details":{"name":"cluster-info","kind":"configmaps"},"code":403

It seems that kubeadm init didn't create or configure cluster-info properly.
Could you share the kubeadm init logs?

@timothysc timothysc added this to the Next milestone Apr 17, 2019
@meowlomoDevelopment

I get the same error after executing 'kubeadm join ...': it is stuck at "Running pre-flight checks". I have no idea how to handle it.

@AnneMarijke

I had the same issue. I needed to reboot the master and after that executing the 'kubeadm join ...' command again on the nodes worked for me.

@wotmshuaisi

I had the same issue with kubeadm v1.15; rebooting the master doesn't work for me.

@wotmshuaisi

I had the same issue with kubeadm v1.15; rebooting the master doesn't work for me.

Falling back to kubelet and kubeadm v1.13.1 fixed the issue.

@neolit123
Member

make sure you call kubeadm init/join with e.g. --v=2 to have more details on what's going on.

@hernando-garcia

hernando-garcia commented Jul 27, 2019

I bumped into the same issue. The problem was traced to network connectivity on my side: my keepalived and haproxy daemons were misconfigured, which prevented the hanging master node from joining the cluster via the API service VIP.

Worth pointing out that running kubeadm init/join with --v=2 was how I managed to resolve it.

@cclient

cclient commented Aug 2, 2019

make sure you call kubeadm init/join with e.g. --v=2 to have more details on what's going on.

kubeadm v1.15

kubeadm join .. --v=2

I0802 11:47:31.027812 359 token.go:202] [discovery] Failed to connect to API Server "": token id "r5uyqk" is invalid for this cluster or it has expired. Use "kubeadm token create" on the control-plane node to create a new valid token

kubeadm init phase upload-certs --upload-certs
kubeadm token create

After that, kubeadm join succeeded.
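
The two commands in this comment can be wrapped so that the control-plane node also prints a ready-made join command. A rough sketch under stated assumptions: `refresh_join_credentials` and the `KUBEADM` variable are hypothetical names introduced here (the variable only exists so the function can be dry-run with `echo`), and `--print-join-command` is a `kubeadm token create` flag that the comment itself did not use. As far as I know, on v1.14 the upload-certs flag was still spelled `--experimental-upload-certs`.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper; run on an existing control-plane node.
# KUBEADM can be overridden (e.g. KUBEADM=echo) to preview the commands.
KUBEADM="${KUBEADM:-kubeadm}"

refresh_join_credentials() {
  # Re-upload the control-plane certificates; kubeadm prints a new certificate key.
  "$KUBEADM" init phase upload-certs --upload-certs

  # Create a fresh bootstrap token and print the matching 'kubeadm join' command.
  "$KUBEADM" token create --print-join-command
}
```

When joining an additional control-plane node, the certificate key printed by the first command still has to be passed to `kubeadm join` together with the new token.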

@karamsahu

In my case, I was able to successfully join the node by stopping the firewall on the Master node.

systemctl stop firewall

@sambit15k

In my case, I was able to successfully join the node by stopping the firewall on the Master node.

systemctl stop firewall

This one worked like a charm.
[root@localhost ~]# kubeadm join 192.168.8.128:6443 --token 38lhr8.kxi5uy8aoy71dj17 --discovery-token-ca-cert-hash sha256:a12c805b8d98f42a256486d27e87463e22aaba190ab8f5bdce89bbb843fca983
[preflight] Running pre-flight checks
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
[WARNING SystemVerification]: this Docker version is not on the list of validated versions: 19.03.1. Latest validated version: 18.09
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[kubelet-start] Downloading configuration for the kubelet from the "kubelet-config-1.14" ConfigMap in the kube-system namespace
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Activating the kubelet service
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...

This node has joined the cluster:

  • Certificate signing request was sent to apiserver and a response was received.
  • The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the control-plane to see this node join the cluster.

@neolit123
Member

looking at the log in the OP again, this is not a "hang" in the preflight, but rather the cluster-info config map cannot be accessed. the only way this could happen is if the "bootstrap-token" phase of "init" is skipped.

looking at later reports, i see networking and expired token problems which fall under "support" items and not bugs.

/triage support
for questions, try stackoverflow, reddit or #kubeadm on k8s slack.

if you find a real bug, please open a new issue.
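
The three causes distinguished in this summary can be told apart from the verbose join output. The following is a rough sketch, not a kubeadm feature: `classify_join_failure` and its category labels are hypothetical, and the matched substrings are copied from the log lines posted earlier in this thread, so other kubeadm versions may word them differently.

```shell
#!/usr/bin/env bash
# Hypothetical classifier for a single line of 'kubeadm join --v=N' output.
classify_join_failure() {
  local line="$1"
  case "$line" in
    # API server answered, but anonymous access to cluster-info is denied.
    *"403 Forbidden"*|*"is forbidden"*)
      echo "cluster-info-forbidden" ;;
    # Bootstrap token unknown to the cluster or past its TTL.
    *"is invalid for this cluster or it has expired"*)
      echo "token-invalid-or-expired" ;;
    # No answer from the API server endpoint at all.
    *"Client.Timeout exceeded"*|*"request canceled while waiting for connection"*)
      echo "api-server-unreachable" ;;
    *)
      echo "unknown" ;;
  esac
}
```

One way to use it is to save the join output to a file and feed the `[discovery]` lines to the function one at a time.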

@k8s-ci-robot k8s-ci-robot added the kind/support Categorizes issue or PR as a support question. label Aug 22, 2019
@haanyip

haanyip commented Nov 22, 2019

In my case, I was able to successfully join the node by stopping the firewall on the Master node.

systemctl stop firewall

systemctl stop firewalld
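
Stopping firewalld removes all filtering on the host. A narrower alternative, offered here as an assumption rather than something reported in this thread, is to open only the ports the joining node needs, such as the API server port 6443 used in the join commands above. The sketch below only prints the `firewall-cmd` commands so they can be reviewed first; `open_port_cmds` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (not run) the firewalld commands that would
# permanently open the given TCP ports and then reload the rules.
open_port_cmds() {
  local port
  for port in "$@"; do
    echo "firewall-cmd --permanent --add-port=${port}/tcp"
  done
  echo "firewall-cmd --reload"
}
```

Calling `open_port_cmds 6443` prints two commands to run as root on the master node after review.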

@harish-singh01

I found that traffic was not allowed to reach the master node.

Adding rules to the security group (sg) solved my problem.

@mohak-aws

I get the same error after executing 'kubeadm join ...': it is stuck at "Running pre-flight checks". I have no idea how to handle it.

Did you find any solution?

@mohak-aws

I found that traffic was not allowed to reach the master node.

Adding rules to the security group (sg) solved my problem.

Which inbound port did you allow?

@copycharming

I was unable to join my node to the master with the kubeadm join command.

Here is what I get when I run it with --v=5:

sudo kubeadm join 172.31.41.122:6443 --token tinrm3.qo0tg18ml9lk3xuk --discovery-token-ca-cert-hash sha256:0961d71f484fe3ba64a2a170e86ba80846d61f7e4f2d0d006c301bf17b5eaaf2 --v=5
I0922 05:24:02.872327 3872 join.go:405] [preflight] found NodeName empty; using OS hostname as NodeName
I0922 05:24:02.872399 3872 initconfiguration.go:116] detected and using CRI socket: /var/run/dockershim.sock
[preflight] Running pre-flight checks
I0922 05:24:02.872461 3872 preflight.go:92] [preflight] Running general checks
I0922 05:24:02.872503 3872 checks.go:245] validating the existence and emptiness of directory /etc/kubernetes/manifests
I0922 05:24:02.872548 3872 checks.go:282] validating the existence of file /etc/kubernetes/kubelet.conf
I0922 05:24:02.872564 3872 checks.go:282] validating the existence of file /etc/kubernetes/bootstrap-kubelet.conf
I0922 05:24:02.872573 3872 checks.go:106] validating the container runtime
I0922 05:24:02.967028 3872 checks.go:132] validating if the "docker" service is enabled and active
I0922 05:24:02.983081 3872 checks.go:331] validating the contents of file /proc/sys/net/bridge/bridge-nf-call-iptables
I0922 05:24:02.983171 3872 checks.go:331] validating the contents of file /proc/sys/net/ipv4/ip_forward
I0922 05:24:02.983205 3872 checks.go:649] validating whether swap is enabled or not
I0922 05:24:02.983245 3872 checks.go:372] validating the presence of executable conntrack
I0922 05:24:02.983270 3872 checks.go:372] validating the presence of executable ip
I0922 05:24:02.983295 3872 checks.go:372] validating the presence of executable iptables
I0922 05:24:02.983323 3872 checks.go:372] validating the presence of executable mount
I0922 05:24:02.983356 3872 checks.go:372] validating the presence of executable nsenter
I0922 05:24:02.983375 3872 checks.go:372] validating the presence of executable ebtables
I0922 05:24:02.983395 3872 checks.go:372] validating the presence of executable ethtool
I0922 05:24:02.983416 3872 checks.go:372] validating the presence of executable socat
I0922 05:24:02.983435 3872 checks.go:372] validating the presence of executable tc
I0922 05:24:02.983454 3872 checks.go:372] validating the presence of executable touch
I0922 05:24:02.983473 3872 checks.go:520] running all checks
I0922 05:24:03.104309 3872 checks.go:403] checking whether the given node name is valid and reachable using net.LookupHost
I0922 05:24:03.104989 3872 checks.go:618] validating kubelet version
I0922 05:24:03.181742 3872 checks.go:132] validating if the "kubelet" service is enabled and active
I0922 05:24:03.191633 3872 checks.go:205] validating availability of port 10250
I0922 05:24:03.191786 3872 checks.go:282] validating the existence of file /etc/kubernetes/pki/ca.crt
I0922 05:24:03.191806 3872 checks.go:432] validating if the connectivity type is via proxy or direct
I0922 05:24:03.191860 3872 join.go:475] [preflight] Discovering cluster-info
I0922 05:24:03.191890 3872 token.go:80] [discovery] Created cluster-info discovery client, requesting info from "172.31.41.122:6443"
I0922 05:24:13.193677 3872 token.go:217] [discovery] Failed to request cluster-info, will try again: Get "https://172.31.41.122:6443/api/v1/namespaces/kube-public/configmaps/cluster-info?timeout=10s": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
I0922 05:24:29.103703 3872 token.go:217] [discovery] Failed to request cluster-info, will try again: Get "https://172.31.41.122:6443/api/v1/namespaces/kube-public/configmaps/cluster-info?timeout=10s": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
I0922 05:24:45.518929 3872 token.go:217] [discovery] Failed to request cluster-info, will try again: Get "https://172.31.41.122:6443/api/v1/namespaces/kube-public/configmaps/cluster-info?timeout=10s": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
I0922 05:25:01.522705 3872 token.go:217] [discovery] Failed to request cluster-info, will try again: Get "https://172.31.41.122:6443/api/v1/namespaces/kube-public/configmaps/cluster-info?timeout=10s": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
I0922 05:25:17.183833 3872 token.go:217] [discovery] Failed to request cluster-info, will try again: Get "https://172.31.41.122:6443/api/v1/namespaces/kube-public/configmaps/cluster-info?timeout=10s": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
I0922 05:25:32.825987 3872 token.go:217] [discovery] Failed to request cluster-info, will try again: Get "https://172.31.41.122:6443/api/v1/namespaces/kube-public/configmaps/cluster-info?timeout=10s": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
I0922 05:25:48.857077 3872 token.go:217] [discovery] Failed to request cluster-info, will try again: Get "https://172.31.41.122:6443/api/v1/namespaces/kube-public/configmaps/cluster-info?timeout=10s": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

Any help would be appreciated

@neolit123
Member

try asking for more help in the support channels:
https://github.com/kubernetes/kubeadm#support

the error above indicates that the worker node cannot reach https://172.31.41.122:6443
you have to debug this connectivity problem.

@hadi-mansoori

Read the following document and open the required ports on your server; that should solve the problem.
https://kubernetes.io/docs/reference/ports-and-protocols/

@ashisharyaa

ashisharyaa commented May 11, 2022

In my case, I was able to successfully join the node by allowing all traffic in the security group on the master node.

Note: allowing all traffic is only suitable for personal practice; do not do this at the organization level.

@santygcp

43

You need to create a firewall rule that opens the port.
