Indefinite wait on host with uppercase hostname #160

Closed
aaliddell opened this Issue Mar 5, 2019 · 8 comments

5 participants
Contributor

aaliddell commented Mar 5, 2019

Describe the bug
On kubernetes, nodes with upper case hostnames are not permitted: kubernetes/kubernetes#71140
When running with k3s, the hostname appears to be converted to lower case in some places but not others. The agent registers the node under the lowercase name, which complies with the above constraint, so registration succeeds. However, k3s then waits indefinitely for the uppercase version of the name here:

logrus.Infof("waiting for node %s: %v", nodeName, err)

This may be related to the problems people are reporting in #91.
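To illustrate the mismatch described above, here is a minimal sketch, not the k3s code itself. It assumes the lookup matches node names exactly; `lookupNode`, `waitForNode`, and the `registered` map are hypothetical stand-ins for the real API call, and the retry count is bounded only so the sketch terminates.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound stands in for the "not found" error a node lookup would return.
var errNotFound = errors.New("not found")

// lookupNode is a hypothetical stand-in for the node lookup. Names are
// matched exactly, so "VM" does not match a node registered as "vm".
func lookupNode(registered map[string]bool, name string) error {
	if !registered[name] {
		return fmt.Errorf("nodes %q %w", name, errNotFound)
	}
	return nil
}

// waitForNode polls for the node at most maxAttempts times and reports
// whether it was found. The real loop in the report has no such bound.
func waitForNode(registered map[string]bool, name string, maxAttempts int) bool {
	for i := 0; i < maxAttempts; i++ {
		err := lookupNode(registered, name)
		if err == nil {
			return true
		}
		fmt.Printf("waiting for node %s: %v\n", name, err)
	}
	return false
}

func main() {
	// The node registered itself under the lowercase name.
	registered := map[string]bool{"vm": true}

	fmt.Println(waitForNode(registered, "VM", 3)) // never matches
	fmt.Println(waitForNode(registered, "vm", 3)) // matches on the first attempt
}
```

With an unbounded loop, the first call would never return, which matches the symptom reported here.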

To Reproduce
Use a system with an uppercase hostname. In this example the hostname is VM.

The node successfully starts with the lowercase name:

kubectl get nodes
NAME   STATUS   ROLES    AGE   VERSION
vm     Ready    <none>   16m   v1.13.3-k3s.6

But all pods are stuck on ContainerCreating, as k3s is waiting for the node to come up:

kubectl get --all-namespaces pod
NAMESPACE     NAME                         READY   STATUS              RESTARTS   AGE
kube-system   coredns-7748f7f6df-nj7lh     0/1     ContainerCreating   0          18m
kube-system   helm-install-traefik-b4ztr   0/1     ContainerCreating   0          18m

Looking at the output from journalctl -u k3s, we can see that it is waiting for the uppercase node name that will never appear:

Mar 05 17:27:57 VM k3s[806]: time="2019-03-05T17:27:57.301529728Z" level=info msg="waiting for node VM: nodes \"VM\" not found"
... repeated indefinitely

Expected behavior
Use consistent naming to ensure k3s is waiting for the correct name to start.
Perhaps convert to lower case just after hostname is retrieved:

nodeName, nodeIP, err := getHostnameAndIP(*envInfo)

If desired I can open a PR for this, but it's effectively a one-liner:

// Use lower case hostname to comply with kubernetes constraint:
// https://github.com/kubernetes/kubernetes/issues/71140
nodeName = strings.ToLower(nodeName)
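As a standalone illustration of that one-liner, here is a small sketch. `normalizeNodeName` is a hypothetical helper name, and `os.Hostname` stands in for however k3s actually retrieves the hostname; neither is taken from the k3s source.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// normalizeNodeName lowercases a hostname so the name that is waited on
// matches the name the node registers under. Hypothetical helper.
func normalizeNodeName(hostname string) string {
	return strings.ToLower(hostname)
}

func main() {
	// os.Hostname is an assumption here; k3s may obtain the name differently.
	hostname, err := os.Hostname()
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot read hostname:", err)
		os.Exit(1)
	}
	fmt.Println(normalizeNodeName(hostname))
}
```

Normalizing once, right after the hostname is retrieved, means every later use sees the same value.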

Although care needs to be taken to also ensure that if an uppercase NODE_NAME env is passed, this is also fixed or raises an error:

os.Setenv("NODE_NAME", nodeConfig.AgentConfig.NodeName)
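The two options mentioned above for an uppercase NODE_NAME (fix it or raise an error) could be sketched as follows. `resolveNodeName` and its `strict` flag are hypothetical and only show the choice; they do not describe how k3s handles the variable.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// resolveNodeName picks the override (e.g. from NODE_NAME) when it is set,
// otherwise the hostname. In strict mode an override containing uppercase
// characters is rejected; otherwise it is lowercased silently.
// Hypothetical helper, not part of k3s.
func resolveNodeName(override, hostname string, strict bool) (string, error) {
	name := hostname
	if override != "" {
		if strict && override != strings.ToLower(override) {
			return "", fmt.Errorf("NODE_NAME %q must be lower case", override)
		}
		name = override
	}
	return strings.ToLower(name), nil
}

func main() {
	hostname, err := os.Hostname()
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot read hostname:", err)
		os.Exit(1)
	}

	name, err := resolveNodeName(os.Getenv("NODE_NAME"), hostname, false)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}

	// Export the normalized value so later readers of NODE_NAME agree with it.
	if err := os.Setenv("NODE_NAME", name); err != nil {
		fmt.Fprintln(os.Stderr, "cannot set NODE_NAME:", err)
		os.Exit(1)
	}
	fmt.Println(name)
}
```

Silently lowercasing is more forgiving, while the strict mode surfaces a misconfiguration early. Which is preferable is a judgment call for the maintainers.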

Member

ibuildthecloud commented Mar 5, 2019

@aaliddell PR is welcomed. I can then put it into a rc build and then you can see if it fully works end to end.

Contributor Author

aaliddell commented Mar 5, 2019

Ok, it'll be up shortly. However, I can't currently get k3s to build locally with Go, so the PR will be somewhat blind as to whether the build succeeds...

Contributor Author

aaliddell commented Mar 5, 2019

Seemingly resolved in 0.2.0rc5

Output is:

INFO[2019-03-05T19:59:21.236816926Z] waiting for node vm: nodes "vm" not found
INFO[2019-03-05T19:59:23.241217702Z] waiting for node vm CIDR not assigned yet

After these lines there is no more output; the node is available and the DNS pod starts.

However, the helm pod does not start and shows a DNS resolution failure in its logs. I'm not sure whether that is related.

Member

ibuildthecloud commented Mar 5, 2019

That does seem concerning. I'd probably say this isn't fully resolved quite yet. Might be another place the hostname matters.

Contributor Author

aaliddell commented Mar 5, 2019

After a reboot, all pods come up successfully. I shall test on a totally clean machine to see if it was just something lingering from a previous failure.

Contributor Author

aaliddell commented Mar 5, 2019

On a totally fresh install on a machine with an uppercase hostname, everything comes up successfully.

Member

ibuildthecloud commented Mar 5, 2019

Awesome thanks! I'll mark this resolved.

@cjellick cjellick added this to the v0.2.0 milestone Mar 8, 2019

@cjellick cjellick added this to Backlog in K3S Development via automation Mar 8, 2019


daxmc99 commented Mar 8, 2019

Steps taken to test:

  1. Set hostname to UPPERCASE
ubuntu@UPPERCASE:~$ hostname
UPPERCASE
  2. Start k3s server: sudo k3s server &
  3. Run kubectl commands
ubuntu@UPPERCASE:~$ kubectl get nodes
NAME        STATUS   ROLES    AGE   VERSION
uppercase   Ready    <none>   79s   v1.13.4-k3s.1
ubuntu@UPPERCASE:~$ kubectl get pods --all-namespaces
NAMESPACE     NAME                             READY   STATUS    RESTARTS   AGE
kube-system   coredns-7748f7f6df-4dnzt         1/1     Running   0          104s
kube-system   helm-install-traefik-ch8vk       1/1     Running   0          53s
kube-system   svclb-traefik-6f5db68b9c-bhjwn   2/2     Running   0          94s
kube-system   traefik-6876857645-kchzv         0/1     Running   1          94s

Looks like traefik is failing for reasons besides hostname

  4. Run redis example
ubuntu@UPPERCASE:~$ kubectl get pods --all-namespaces
NAMESPACE     NAME                             READY   STATUS             RESTARTS   AGE
default       redis-master                     2/2     Running            0          27s
default       redis-sentinel-mlppn             1/1     Running            0          11s
kube-system   coredns-7748f7f6df-4dnzt         1/1     Running            2          4m54s
kube-system   helm-install-traefik-ch8vk       1/1     Running            3          4m3s
kube-system   svclb-traefik-6f5db68b9c-bhjwn   2/2     Running            0          4m44s
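The manual check in the steps above (compare `hostname` with the NAME column of `kubectl get nodes`) could be automated roughly like this. It is a sketch that assumes the default table output shown in this thread; `parseNodeNames` and `registeredAsLowercase` are hypothetical helpers, and the sample output is copied from the steps above.

```go
package main

import (
	"fmt"
	"strings"
)

// parseNodeNames extracts the first column (NAME) from table-style
// `kubectl get nodes` output, skipping the header row and blank lines.
func parseNodeNames(output string) []string {
	var names []string
	for i, line := range strings.Split(strings.TrimSpace(output), "\n") {
		fields := strings.Fields(line)
		if i == 0 || len(fields) == 0 {
			continue
		}
		names = append(names, fields[0])
	}
	return names
}

// registeredAsLowercase reports whether the lowercased hostname appears
// among the node names in the given output.
func registeredAsLowercase(hostname, output string) bool {
	want := strings.ToLower(hostname)
	for _, name := range parseNodeNames(output) {
		if name == want {
			return true
		}
	}
	return false
}

func main() {
	output := `NAME        STATUS   ROLES    AGE   VERSION
uppercase   Ready    <none>   79s   v1.13.4-k3s.1`

	fmt.Println(registeredAsLowercase("UPPERCASE", output))
}
```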

@erikwilson erikwilson closed this Mar 8, 2019

K3S Development automation moved this from Backlog to Done Mar 8, 2019
