New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kubeadm join failing preflight checks #347

Closed
cgilmour opened this Issue Jul 17, 2017 · 21 comments

Comments

Projects
None yet
@cgilmour

cgilmour commented Jul 17, 2017

Is this a BUG REPORT or FEATURE REQUEST?

BUG REPORT

Versions

kubeadm version: &version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.1", GitCommit:"1dc5c66f5dd61da08412a74221ecc79208c2165b", GitTreeState:"clean", BuildDate:"2017-07-14T01:48:01Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Kubernetes version Client Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.1", GitCommit:"1dc5c66f5dd61da08412a74221ecc79208c2165b", GitTreeState:"clean", BuildDate:"2017-07-14T02:00:46Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration: AWS
  • OS (e.g. from /etc/os-release): CentOS Linux release 7.3.1611 (Core)

What happened?

[root@ip-192-168-99-11 ~]# kubeadm join --token random.16charactertoken 192.168.99.10:6443
[kubeadm] WARNING: kubeadm is in beta, please do not use it for production clusters.
[preflight] Running pre-flight checks
[preflight] WARNING: hostname "" could not be reached
[preflight] WARNING: hostname "" lookup : no such host
[preflight] Some fatal errors occurred:
	hostname "" a DNS-1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')
[preflight] If you know what you are doing, you can skip pre-flight checks with `--skip-preflight-checks`

What you expected to happen?

Normal completion of kubeadm join

How to reproduce it (as minimally and precisely as possible)?

Try to perform kubeadm join with v1.7..1

Anything else we need to know?

Skipping preflight checks allows the join to run normally.

I suspect the problem is this line:
https://github.com/kubernetes/kubernetes/blob/v1.7.1/cmd/kubeadm/app/preflight/checks.go#L544

It was not updated to match the changes to HostnameCheck in kubernetes/kubernetes@e20f5a3
So it always passes in an empty string and always fails the updated HostnameCheck.

@lwr20

This comment has been minimized.

Show comment
Hide comment
@lwr20

lwr20 Jul 17, 2017

I'm seeing this too.

lwr20 commented Jul 17, 2017

I'm seeing this too.

@laumair

This comment has been minimized.

Show comment
Hide comment
@laumair

laumair Jul 17, 2017

Yeah as a workaround you can skip the preflight checks but the problem persists in 1.7.1

laumair commented Jul 17, 2017

Yeah as a workaround you can skip the preflight checks but the problem persists in 1.7.1

@luxas luxas self-assigned this Jul 17, 2017

@erhudy

This comment has been minimized.

Show comment
Hide comment
@erhudy

erhudy Jul 17, 2017

Same is happening for me.

erhudy commented Jul 17, 2017

Same is happening for me.

@luxas luxas added this to the v1.7 milestone Jul 17, 2017

@rushabh268

This comment has been minimized.

Show comment
Hide comment
@rushabh268

rushabh268 Jul 17, 2017

Seeing this on 1.7.1. Even after providing node-name, it still complains with the same error. Workaround is to skip pre-flight checks. Looks like, the error is happening in this validation:

https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apimachinery/pkg/util/validation/validation.go#L110-L126

rushabh268 commented Jul 17, 2017

Seeing this on 1.7.1. Even after providing node-name, it still complains with the same error. Workaround is to skip pre-flight checks. Looks like, the error is happening in this validation:

https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apimachinery/pkg/util/validation/validation.go#L110-L126

@stevenbower

This comment has been minimized.

Show comment
Hide comment
@stevenbower

stevenbower commented Jul 18, 2017

Actually looks to be here:
https://github.com/kubernetes/kubernetes/blob/v1.7.1/cmd/kubeadm/app/preflight/checks.go#L544

Looks like the HostNameCheck for node join missed an update in kubernetes/kubernetes@e20f5a3

@luxas

This comment has been minimized.

Show comment
Hide comment
@luxas

luxas Jul 18, 2017

Member

Thanks @stevenbower for the PR!

Member

luxas commented Jul 18, 2017

Thanks @stevenbower for the PR!

@Caldas

This comment has been minimized.

Show comment
Hide comment
@Caldas

Caldas Jul 28, 2017

Problem persists even after v1.7.2

Caldas commented Jul 28, 2017

Problem persists even after v1.7.2

@erhudy

This comment has been minimized.

Show comment
Hide comment
@erhudy

erhudy Jul 28, 2017

Confirmed, problem is still present.

erhudy commented Jul 28, 2017

Confirmed, problem is still present.

@erhudy

This comment has been minimized.

Show comment
Hide comment
@erhudy

erhudy Jul 28, 2017

Looks like the PR to fix it hasn't been merged to master yet: kubernetes/kubernetes#49073

After it's merged, it will also need to make it onto a backport branch for a 1.7.x point release.

erhudy commented Jul 28, 2017

Looks like the PR to fix it hasn't been merged to master yet: kubernetes/kubernetes#49073

After it's merged, it will also need to make it onto a backport branch for a 1.7.x point release.

@mattmoyer

This comment has been minimized.

Show comment
Hide comment
@mattmoyer

mattmoyer Jul 28, 2017

Member

@erhudy: It's a little confusing, but that change was already cherry picked onto the 1.7 branch (in kubernetes/kubernetes#49094). The fix is released in 1.7.2.

It's still broken in master at the moment.

Edit: nope, I'm wrong.

Edit Edit: I wasn't completely wrong, the purported fix was backported but the issue is still in 1.7.2.

Member

mattmoyer commented Jul 28, 2017

@erhudy: It's a little confusing, but that change was already cherry picked onto the 1.7 branch (in kubernetes/kubernetes#49094). The fix is released in 1.7.2.

It's still broken in master at the moment.

Edit: nope, I'm wrong.

Edit Edit: I wasn't completely wrong, the purported fix was backported but the issue is still in 1.7.2.

@erhudy

This comment has been minimized.

Show comment
Hide comment
@erhudy

erhudy Jul 28, 2017

Oh, interesting. kubeadm 1.7.2 is still showing the behavior, so I'm going to take another look at it.

erhudy commented Jul 28, 2017

Oh, interesting. kubeadm 1.7.2 is still showing the behavior, so I'm going to take another look at it.

@mattmoyer

This comment has been minimized.

Show comment
Hide comment
@mattmoyer

mattmoyer Jul 28, 2017

Member

@erhudy ah, you're right. I crossed the streams.

Member

mattmoyer commented Jul 28, 2017

@erhudy ah, you're right. I crossed the streams.

@mattmoyer

This comment has been minimized.

Show comment
Hide comment
@mattmoyer

mattmoyer Jul 28, 2017

Member

Apologies for all the back and forth and confusion here. I did more digging and I do see the purported fix backported to v1.7.2: https://github.com/kubernetes/kubernetes/blob/v1.7.2/cmd/kubeadm/app/preflight/checks.go#L544

I also confirmed that 1.7.2 still has the problem. I think that means kubernetes/kubernetes#49073 is not a complete fix for the problem.

cc @stevenbower @luxas

Member

mattmoyer commented Jul 28, 2017

Apologies for all the back and forth and confusion here. I did more digging and I do see the purported fix backported to v1.7.2: https://github.com/kubernetes/kubernetes/blob/v1.7.2/cmd/kubeadm/app/preflight/checks.go#L544

I also confirmed that 1.7.2 still has the problem. I think that means kubernetes/kubernetes#49073 is not a complete fix for the problem.

cc @stevenbower @luxas

@erhudy

This comment has been minimized.

Show comment
Hide comment
@erhudy

erhudy Jul 28, 2017

As far as I can tell, the problem is that in v1.7.0, the hostname check had the following structure:

	hostname := node.GetHostname("")
	for _, msg := range validation.ValidateNodeName(hostname, false) {
		errors = append(errors, fmt.Errorf("hostname \"%s\" %s", hostname, msg))
	}
	addr, err := net.LookupHost(hostname)

The node.GetHostname("") call gets passed to a method which then returns the value of os.Hostname(). This comes back, gets looked up via net.LookupHost(hostname), and passes.

In v1.7.2, the structure of this check changed. The hostnameCheck struct now has a nodeName field that is populated with the value of cfg.NodeName, cfg being an instance of kubeadmapi.NodeConfiguration. cfg is initialized to an unconfigured instance of the struct in NewCmdJoin, from which we (presumably) see the node name being reported as "", which then fails the preflight check.

Based on this, I think the correct fix is probably to populate cfg.NodeName with the value of node.GetHostname("") prior to running the preflight checks, as v1.7.0 did.

Does anyone see a problem with this analysis? This is easy enough to fix, and also to add an additional test around to verify that we're not passing blank strings as the node name.

erhudy commented Jul 28, 2017

As far as I can tell, the problem is that in v1.7.0, the hostname check had the following structure:

	hostname := node.GetHostname("")
	for _, msg := range validation.ValidateNodeName(hostname, false) {
		errors = append(errors, fmt.Errorf("hostname \"%s\" %s", hostname, msg))
	}
	addr, err := net.LookupHost(hostname)

The node.GetHostname("") call gets passed to a method which then returns the value of os.Hostname(). This comes back, gets looked up via net.LookupHost(hostname), and passes.

In v1.7.2, the structure of this check changed. The hostnameCheck struct now has a nodeName field that is populated with the value of cfg.NodeName, cfg being an instance of kubeadmapi.NodeConfiguration. cfg is initialized to an unconfigured instance of the struct in NewCmdJoin, from which we (presumably) see the node name being reported as "", which then fails the preflight check.

Based on this, I think the correct fix is probably to populate cfg.NodeName with the value of node.GetHostname("") prior to running the preflight checks, as v1.7.0 did.

Does anyone see a problem with this analysis? This is easy enough to fix, and also to add an additional test around to verify that we're not passing blank strings as the node name.

@pipejakob

This comment has been minimized.

Show comment
Hide comment
@pipejakob

pipejakob Jul 28, 2017

Member

@erhudy If you're able to confirm that it actually fixes the problem you're seeing, I don't think you'll get any pushback for such a patch. :-)

Member

pipejakob commented Jul 28, 2017

@erhudy If you're able to confirm that it actually fixes the problem you're seeing, I don't think you'll get any pushback for such a patch. :-)

@erhudy

This comment has been minimized.

Show comment
Hide comment
@erhudy

erhudy Jul 28, 2017

Right, just writing it up here in the event that there was some grander reason for restructuring it that I don't know about. I will work up a PR.

erhudy commented Jul 28, 2017

Right, just writing it up here in the event that there was some grander reason for restructuring it that I don't know about. I will work up a PR.

@rmb938

This comment has been minimized.

Show comment
Hide comment
@rmb938

rmb938 Jul 29, 2017

Just to add an additional issue when using the --config flag and setting nodeName in the config (to try and get around the error) the following error is given:

[root@node1 ~]# kubeadm join --config=/etc/kubernetes/kubeadm.yaml
[kubeadm] WARNING: kubeadm is in beta, please do not use it for production clusters.
[preflight] Running pre-flight checks
can not mix '--config' with other arguments

kubeadm.yaml

kind: NodeConfiguration
apiVersion: kubeadm.k8s.io/v1alpha1
nodeName: node1.k8s1.vmw.rmb938.me
token: nope
discoveryTokenAPIServers:
  - master.rmb938.me:6443

rmb938 commented Jul 29, 2017

Just to add an additional issue when using the --config flag and setting nodeName in the config (to try and get around the error) the following error is given:

[root@node1 ~]# kubeadm join --config=/etc/kubernetes/kubeadm.yaml
[kubeadm] WARNING: kubeadm is in beta, please do not use it for production clusters.
[preflight] Running pre-flight checks
can not mix '--config' with other arguments

kubeadm.yaml

kind: NodeConfiguration
apiVersion: kubeadm.k8s.io/v1alpha1
nodeName: node1.k8s1.vmw.rmb938.me
token: nope
discoveryTokenAPIServers:
  - master.rmb938.me:6443
@erhudy

This comment has been minimized.

Show comment
Hide comment
@erhudy

erhudy Jul 29, 2017

Pushed a PR against master that fixes this issue, both the blank default hostname and not registering the presence of --node-name. The PR can be trivially backported to 1.7.x (I tested against both master and 1.7.2).

erhudy commented Jul 29, 2017

Pushed a PR against master that fixes this issue, both the blank default hostname and not registering the presence of --node-name. The PR can be trivially backported to 1.7.x (I tested against both master and 1.7.2).

@ejilay

This comment has been minimized.

Show comment
Hide comment
@ejilay

ejilay Jul 30, 2017

Resulting this issue kube nodes cannot join master node cause of they are has the equal domain name, eg "" and "". Possibly I should open new issue.

ejilay commented Jul 30, 2017

Resulting this issue kube nodes cannot join master node cause of they are has the equal domain name, eg "" and "". Possibly I should open new issue.

@koalalorenzo

This comment has been minimized.

Show comment
Hide comment
@koalalorenzo

koalalorenzo Jul 31, 2017

Do we have an ETA for this bug to be fixed/merged/approved? 👍

koalalorenzo commented Jul 31, 2017

Do we have an ETA for this bug to be fixed/merged/approved? 👍

luxas added a commit to luxas/kubernetes that referenced this issue Jul 31, 2017

Fixes kubernetes/kubeadm#347
Node name discovery failed on `kubeadm join`. If a node name
is not explicitly provided, it will be looked up.
@omkensey

This comment has been minimized.

Show comment
Hide comment
@omkensey

omkensey Jul 31, 2017

For the moment I downgraded my workers to kubeadm-1.7.0 and they joined a master initialized with 1.7.2 just fine, and even reported node version 1.7.2. (CentOS 7 using the Kubernetes yum repo)

omkensey commented Jul 31, 2017

For the moment I downgraded my workers to kubeadm-1.7.0 and they joined a master initialized with 1.7.2 just fine, and even reported node version 1.7.2. (CentOS 7 using the Kubernetes yum repo)

k8s-merge-robot added a commit to kubernetes/kubernetes that referenced this issue Aug 1, 2017

Merge pull request #49825 from erhudy/bugfix/kubeadm-337-empty-node-name
Automatic merge from submit-queue

Fixes kubernetes/kubeadm#347: empty node name when joining nodes with kubeadm

Node name discovery failed on `kubeadm join`. If a node name
is not explicitly provided, it will be looked up.

**What this PR does / why we need it**: `kubeadm join` fails because the preflight checks always see an empty node name. This corrects that issue.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes kubernetes/kubeadm#347

**Special notes for your reviewer**:

**Release note**:

```release-note
kubeadm: Fix join preflight check false negative
```

k8s-merge-robot added a commit to kubernetes/kubernetes that referenced this issue Aug 1, 2017

Merge pull request #49893 from luxas/automated-cherry-pick-of-#49825-…
…upstream-release-1.7

Automatic merge from submit-queue

Automated cherry pick of #49825

Cherry pick of #49825 on release-1.7.

#49825: Fixes kubernetes/kubeadm#347
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment