
Improve kubeadm join preflight #1128

Closed
fabriziopandini opened this issue Sep 18, 2018 · 12 comments · Fixed by kubernetes/kubernetes#69662
Labels: area/UX, help wanted, kind/bug, lifecycle/active, priority/important-soon
Milestone: v1.13

Comments

@fabriziopandini (Member)

Currently, kubeadm join preflight checks are executed without considering the ClusterConfiguration: the IPVS check is always executed, whether or not the cluster uses IPVS.

kubeadm join preflight checks should be improved by taking the cluster's ClusterConfiguration into account.
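
To make the proposal concrete, here is a minimal, self-contained Go sketch of the desired gating. The Check, ipvsCheck, and ClusterConfiguration types below are simplified stand-ins for kubeadm's internal types, not the real API:

    package main

    import "fmt"

    // Check mirrors the shape of a kubeadm preflight check: a named probe
    // that can be run against the node.
    type Check interface {
        Name() string
    }

    // ipvsCheck stands in for the IPVS kernel-modules check.
    type ipvsCheck struct{}

    func (ipvsCheck) Name() string { return "RequiredIPVSKernelModulesAvailable" }

    // ClusterConfiguration stands in for the configuration fetched from the
    // cluster; only the kube-proxy mode matters for this example.
    type ClusterConfiguration struct {
        KubeProxyMode string // "iptables" or "ipvs"
    }

    // joinChecks builds the preflight check list for kubeadm join, appending
    // the IPVS check only when the cluster actually uses IPVS, instead of
    // unconditionally as today.
    func joinChecks(cfg *ClusterConfiguration) []Check {
        var checks []Check
        // ...the common node checks would be appended here...
        if cfg != nil && cfg.KubeProxyMode == "ipvs" {
            checks = append(checks, ipvsCheck{})
        }
        return checks
    }

    func main() {
        fmt.Println(len(joinChecks(&ClusterConfiguration{KubeProxyMode: "iptables"}))) // 0
        fmt.Println(len(joinChecks(&ClusterConfiguration{KubeProxyMode: "ipvs"})))     // 1
    }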

@fabriziopandini added the kind/bug, help wanted, priority/important-soon, and area/UX labels on Sep 18, 2018
@fabriziopandini added this to the v1.13 milestone on Sep 18, 2018
@ereslibre (Contributor)

I can have a look at it @fabriziopandini. I'm not familiar with IPVS though.

@fabriziopandini (Member, Author)

/lifecycle active
Thanks @ereslibre. Please note that the problem here is not the IPVS check itself, but the fact that the check is always executed. To better understand this, compare the master preflight check, which runs the IPVS check only if IPVS is used, with the node preflight check.

@k8s-ci-robot added the lifecycle/active label on Oct 10, 2018
@ereslibre (Contributor)

@fabriziopandini Here's an approach: kubernetes/kubernetes#69662. In particular, I don't know whether preflight/checks.go is the right place for InitConfigurationFromJoinConfiguration and FetchInitConfiguration. The idea is to discover the cluster's InitConfiguration, based on the JoinConfiguration, when we perform the preflight checks, so that the checks run during join can be tailored to that InitConfiguration.
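
For readers following along, the discovery step looks roughly like the following self-contained client-go sketch. fetchClusterConfiguration and the hard-coded names are illustrative rather than the PR's actual code, and the data key under which the configuration is stored has changed across kubeadm versions:

    package main

    import (
        "context"
        "fmt"

        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/client-go/kubernetes"
        "k8s.io/client-go/tools/clientcmd"
    )

    // fetchClusterConfiguration reads the configuration that kubeadm init
    // stores in the kube-system/kubeadm-config ConfigMap.
    func fetchClusterConfiguration(kubeconfigPath string) (string, error) {
        restCfg, err := clientcmd.BuildConfigFromFlags("", kubeconfigPath)
        if err != nil {
            return "", err
        }
        client, err := kubernetes.NewForConfig(restCfg)
        if err != nil {
            return "", err
        }
        cm, err := client.CoreV1().ConfigMaps("kube-system").
            Get(context.TODO(), "kubeadm-config", metav1.GetOptions{})
        if err != nil {
            // This is where a "forbidden" error surfaces if the joining
            // identity has no RBAC access to the ConfigMap (see below).
            return "", err
        }
        return cm.Data["ClusterConfiguration"], nil
    }

    func main() {
        yaml, err := fetchClusterConfiguration("/etc/kubernetes/bootstrap-kubelet.conf")
        if err != nil {
            fmt.Println("unable to fetch kubeadm-config:", err)
            return
        }
        fmt.Println(yaml)
    }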

@ereslibre (Contributor) commented Oct 11, 2018

When joining the node with this approach, I'm hitting: Unable to fetch the kubeadm-config ConfigMap: configmaps "kubeadm-config" is forbidden: User "system:bootstrap:oe5abl" cannot get configmaps in the namespace "kube-system".

@ereslibre (Contributor)

I think I have to change the approach: we can't assume that the kubeadm-config ConfigMap can be downloaded before the join. Even moving the pre-flight checks to run after the join would rely on node auto-approval being enabled. So my question is: what's the expected approach here? As far as I can see there are no ComponentConfigs in JoinConfiguration, and we cannot download kubeadm-config before joining, so how could we do this on the client side of the joining machine?
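
For context, the reason the download can work at all is RBAC: recent kubeadm versions (to the best of my knowledge) have init grant the bootstrap-token group read access to the kubeadm-config ConfigMap, so a joining machine can fetch it before it has a node identity; that would also explain why recreating the control plane with a newer kubeadm (next comment) gets past this error. A hedged client-go sketch of such a grant, with illustrative object names; see kubernetes/kubernetes#69662 for what actually landed:

    // Package bootstraprbac: a sketch, not kubeadm's actual code.
    package bootstraprbac

    import (
        "context"

        rbacv1 "k8s.io/api/rbac/v1"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/client-go/kubernetes"
    )

    // GrantKubeadmConfigReadAccess lets bootstrap tokens `get` the
    // kubeadm-config ConfigMap in kube-system. The Role/RoleBinding names
    // and the subject group are illustrative.
    func GrantKubeadmConfigReadAccess(client kubernetes.Interface) error {
        role := &rbacv1.Role{
            ObjectMeta: metav1.ObjectMeta{Name: "kubeadm:nodes-kubeadm-config", Namespace: "kube-system"},
            Rules: []rbacv1.PolicyRule{{
                APIGroups:     []string{""},
                Resources:     []string{"configmaps"},
                ResourceNames: []string{"kubeadm-config"},
                Verbs:         []string{"get"},
            }},
        }
        if _, err := client.RbacV1().Roles("kube-system").
            Create(context.TODO(), role, metav1.CreateOptions{}); err != nil {
            return err
        }
        binding := &rbacv1.RoleBinding{
            ObjectMeta: metav1.ObjectMeta{Name: "kubeadm:nodes-kubeadm-config", Namespace: "kube-system"},
            RoleRef: rbacv1.RoleRef{
                APIGroup: rbacv1.GroupName,
                Kind:     "Role",
                Name:     "kubeadm:nodes-kubeadm-config",
            },
            Subjects: []rbacv1.Subject{{
                Kind:     rbacv1.GroupKind,
                APIGroup: rbacv1.GroupName,
                Name:     "system:bootstrappers:kubeadm:default-node-token",
            }},
        }
        _, err := client.RbacV1().RoleBindings("kube-system").
            Create(context.TODO(), binding, metav1.CreateOptions{})
        return err
    }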

@ereslibre (Contributor) commented Oct 11, 2018

I had created the first node with an older version of kubeadm (I missed that); my fault. Now the join output looks like this:

kubic-node-0:~ # ./kubeadm join 192.168.122.61:6443 --token mga0vo.zewbd3gyfzj7p6c3 --discovery-token-ca-cert-hash sha256:673f7fb12a588a75cedd6443d4ede01eb7b7590c2e653d9e4ea6f211a27e7811 --cri-socket=/var/run/crio/crio.sock --ignore-preflight-errors=all                                                                                                                          
I1011 14:09:12.268451    2345 join.go:270] [join] found NodeName empty; using OS hostname as NodeName
I1011 14:09:12.268698    2345 join.go:274] [join] found advertiseAddress empty; using default interface's IP address as advertiseAddress
[preflight] running pre-flight checks
I1011 14:09:12.269416    2345 join.go:288] [preflight] running various checks on all nodes
I1011 14:09:12.269666    2345 checks.go:920] [join] discovering cluster-info
[discovery] Trying to connect to API Server "192.168.122.61:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://192.168.122.61:6443"
[discovery] Requesting info from "https://192.168.122.61:6443" again to validate TLS against the pinned public key
[discovery] Cluster info signature and contents are valid and TLS certificate validates against pinned roots, will use API Server "192.168.122.61:6443"
[discovery] Successfully established connection with API Server "192.168.122.61:6443"
I1011 14:09:12.299473    2345 checks.go:927] [join] retrieving KubeConfig objects
[join] Reading configuration from the cluster...
[join] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
I1011 14:09:12.314690    2345 checks.go:250] validating the existence and emptiness of directory /etc/kubernetes/manifests
I1011 14:09:12.315077    2345 checks.go:288] validating the existence of file /etc/kubernetes/kubelet.conf
I1011 14:09:12.315218    2345 checks.go:288] validating the existence of file /etc/kubernetes/bootstrap-kubelet.conf
        [WARNING FileAvailable--etc-kubernetes-bootstrap-kubelet.conf]: /etc/kubernetes/bootstrap-kubelet.conf already exists
I1011 14:09:12.315679    2345 checks.go:109] validating the container runtime
I1011 14:09:12.333335    2345 checks.go:378] validating the presence of executable crictl
I1011 14:09:12.333559    2345 checks.go:337] validating the contents of file /proc/sys/net/bridge/bridge-nf-call-iptables
I1011 14:09:12.333824    2345 checks.go:337] validating the contents of file /proc/sys/net/ipv4/ip_forward
I1011 14:09:12.334095    2345 checks.go:649] validating whether swap is enabled or not
I1011 14:09:12.334296    2345 checks.go:378] validating the presence of executable ip
I1011 14:09:12.334492    2345 checks.go:378] validating the presence of executable iptables
I1011 14:09:12.334667    2345 checks.go:378] validating the presence of executable mount
I1011 14:09:12.334809    2345 checks.go:378] validating the presence of executable nsenter
I1011 14:09:12.334964    2345 checks.go:378] validating the presence of executable ebtables
I1011 14:09:12.335098    2345 checks.go:378] validating the presence of executable ethtool
I1011 14:09:12.335141    2345 checks.go:378] validating the presence of executable socat
I1011 14:09:12.335384    2345 checks.go:378] validating the presence of executable tc
I1011 14:09:12.335513    2345 checks.go:378] validating the presence of executable touch
I1011 14:09:12.335683    2345 checks.go:520] running all checks
I1011 14:09:12.357711    2345 checks.go:408] checking whether the given node name is reachable using net.LookupHost
        [WARNING Hostname]: hostname "kubic-node-0" could not be reached
        [WARNING Hostname]: hostname "kubic-node-0" lookup kubic-node-0 on 192.168.122.1:53: no such host
I1011 14:09:12.358415    2345 checks.go:618] validating kubelet version
I1011 14:09:12.611845    2345 checks.go:135] validating if the service is enabled and active
I1011 14:09:12.628866    2345 checks.go:213] validating availability of port 10250
I1011 14:09:12.629838    2345 checks.go:288] validating the existence of file /etc/kubernetes/pki/ca.crt
        [WARNING FileAvailable--etc-kubernetes-pki-ca.crt]: /etc/kubernetes/pki/ca.crt already exists
I1011 14:09:12.630507    2345 checks.go:435] validating if the connectivity type is via proxy or direct
I1011 14:09:12.630597    2345 join.go:300] [join] discovering cluster-info
[discovery] Trying to connect to API Server "192.168.122.61:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://192.168.122.61:6443"
[discovery] Requesting info from "https://192.168.122.61:6443" again to validate TLS against the pinned public key
[discovery] Cluster info signature and contents are valid and TLS certificate validates against pinned roots, will use API Server "192.168.122.61:6443"
[discovery] Successfully established connection with API Server "192.168.122.61:6443"
I1011 14:09:12.652950    2345 join.go:436] [join] writing bootstrap kubelet config file at /etc/kubernetes/bootstrap-kubelet.conf
I1011 14:09:12.884523    2345 join.go:461] Stopping the kubelet
[kubelet] Downloading configuration for the kubelet from the "kubelet-config-1.11" ConfigMap in the kube-system namespace
configmaps "kubelet-config-1.11" is forbidden: User "system:bootstrap:mga0vo" cannot get resource "configmaps" in API group "" in the namespace "kube-system"

My patch is on top of 8012b9583e5301c2f66db4118b38433d035b87c4; I wonder if I'm doing something wrong during the join.

@bart0sh commented Oct 11, 2018

[WARNING FileAvailable--etc-kubernetes-pki-ca.crt]: /etc/kubernetes/pki/ca.crt already exists

This looks suspicious. Do you want to use pre-created certificates, or are those just leftovers from a previous installation?

./kubeadm join 192.168.122.61:6443 --token mga0vo.zewbd3gyfzj7p6c3 --discovery-token-ca-cert-hash sha256:673f7fb12a588a75cedd6443d4ede01eb7b7590c2e653d9e4ea6f211a27e7811 --cri-socket=/var/run/crio/crio.sock --ignore-preflight-errors=all

It looks like --ignore-preflight-errors=all is becoming a common way to run kubeadm, which is quite a dangerous sign from my point of view. May I ask what the reason for using this option was?

@ereslibre (Contributor)

@bart0sh I ran this on a VM-based cluster just to test my PR. It was a re-run of the join: since the certificates were already there, I added --ignore-preflight-errors=all because I just wanted to exercise the part I was interested in (that the kubeadm-config ConfigMap could be downloaded). It's not something we do in "real life" environments at all; this was only in the context of trying this specific PR, where join had to be called again on that node.

@bart0sh commented Oct 11, 2018

@ereslibre do you have client certificates in /etc/kubernetes/pki? If you do, can you check which CN is in those certificates? I suspect that system:bootstrap:mga0vo comes from there. If using pre-created certs is not important to you, you can remove them; that could help the node join the cluster.
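
If it helps, here is a small standalone Go helper for exactly that check; the certificate path is just an example, and openssl x509 -in <cert> -noout -subject gives the same information:

    package main

    import (
        "crypto/x509"
        "encoding/pem"
        "fmt"
        "os"
    )

    func main() {
        // Adjust the path to whichever client certificate you want to inspect.
        data, err := os.ReadFile("/etc/kubernetes/pki/ca.crt")
        if err != nil {
            panic(err)
        }
        block, _ := pem.Decode(data)
        if block == nil {
            panic("no PEM data found")
        }
        cert, err := x509.ParseCertificate(block.Bytes)
        if err != nil {
            panic(err)
        }
        // The CN shows up in the subject, e.g. "CN=system:bootstrap:...".
        fmt.Println("Subject:", cert.Subject)
    }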

@ereslibre (Contributor)

@bart0sh Thanks for your help. I already destroyed the cluster and I won't be able to retry for some hours. I'll get back to this later today.

@Ramane19

[ERROR DirAvailable--etc-kubernetes-manifests]: /etc/kubernetes/manifests is not empty
[ERROR FileAvailable--etc-kubernetes-kubelet.conf]: /etc/kubernetes/kubelet.conf already exists
[ERROR FileAvailable--etc-kubernetes-pki-ca.crt]: /etc/kubernetes/pki/ca.crt already exists

[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...

How can I fix this issue?

@KptnKMan commented Jun 25, 2019

I'm also trying to understand the same issue.
I checked the documentation and put together this string, which works for me:

--ignore-preflight-errors='DirAvailable--etc-kubernetes-manifests,FileAvailable--etc-kubernetes-kubelet.conf,Port-10250,FileAvailable--etc-kubernetes-pki-ca.crt'

You can find a working example here; it works for me on v1.14.3.
