Rancher Worker nodes can not connect to rancher server #14426

Closed
palbiez opened this issue Jul 6, 2018 · 1 comment


palbiez commented Jul 6, 2018

Rancher versions:
rancher/server or rancher/rancher: v2.0.3
rancher/agent or rancher/rancher-agent: v2.0.3

Infrastructure Stack versions:
healthcheck:
ipsec:
network-services:
scheduler:
kubernetes (if applicable):
Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.0", GitCommit:"91e7b4fd31fcd3d5f436da26c980becec37ceefe", GitTreeState:"clean", BuildDate:"2018-06-27T20:17:28Z", GoVersion:"go1.10.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.3", GitCommit:"2bba0127d85d5a46ab4b778548be28623b32d0b0", GitTreeState:"clean", BuildDate:"2018-05-21T09:05:37Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}

Rancher Server
Docker version: (docker version,docker info preferred)
Containers: 23
Running: 11
Paused: 0
Stopped: 12
Images: 34
Server Version: 1.13.1
Storage Driver: aufs
Root Dir: /var/lib/docker/aufs
Backing Filesystem: extfs
Dirs: 255
Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: (expected: aa8187dbd3b7ad67d8e5e3a15115d3eef43a7ed1)
runc version: N/A (expected: 9df8b306d01f59d3a8029be411de015b7304dd8f)
init version: N/A (expected: 949e6facb77383876aeff8a6944dde66b3089574)
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 4.4.0-130-generic
Operating System: Ubuntu 16.04.4 LTS
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 23.55 GiB
Name: rancher
ID: RL5U:Y5KA:UGP3:QWKX:O4NW:YTAD:AAQM:WYTV:QEUB:JYX3:UEQC:FSBD
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false

Operating system and kernel: (cat /etc/os-release, uname -r preferred)
NAME="Ubuntu"
VERSION="16.04.4 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.4 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial

ifconfig
docker0 Link encap:Ethernet HWaddr 02:42:df:fa:5f:37
inet addr:172.17.0.1 Bcast:0.0.0.0 Mask:255.255.0.0
inet6 addr: fe80::42:dfff:fefa:5f37/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:721327 errors:0 dropped:0 overruns:0 frame:0
TX packets:1090783 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:374541385 (374.5 MB) TX bytes:351801021 (351.8 MB)

ens19 Link encap:Ethernet HWaddr 96:61:8c:89:67:8a
inet addr:10.0.2.19 Bcast:10.0.2.255 Mask:255.255.255.0
inet6 addr: fe80::9461:8cff:fe89:678a/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:1089724 errors:0 dropped:0 overruns:0 frame:0
TX packets:812560 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:232671596 (232.6 MB) TX bytes:470907793 (470.9 MB)

flannel.1 Link encap:Ethernet HWaddr 9a:b6:94:2d:7f:f3
inet addr:10.42.0.0 Bcast:0.0.0.0 Mask:255.255.255.255
inet6 addr: fe80::98b6:94ff:fe2d:7ff3/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1450 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:6779 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)

lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:65536 Metric:1
RX packets:7642150 errors:0 dropped:0 overruns:0 frame:0
TX packets:7642150 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1
RX bytes:2308046052 (2.3 GB) TX bytes:2308046052 (2.3 GB)

veth72ffd04 Link encap:Ethernet HWaddr 3e:b0:d0:f0:93:73
inet6 addr: fe80::3cb0:d0ff:fef0:9373/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:721327 errors:0 dropped:0 overruns:0 frame:0
TX packets:1097630 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:384639963 (384.6 MB) TX bytes:354236353 (354.2 MB)

Rancher Worker Nodes
Docker version: (docker version,docker info preferred)
Containers: 23
Running: 4
Paused: 0
Stopped: 19
Images: 14
Server Version: 17.06.2-ce
Storage Driver: overlay
Backing Filesystem: extfs
Supports d_type: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 6e23458c129b551d5c9871e5174f6b1b7f6d1170
runc version: 810190ceaa507aa2727d7ae6f4790c76ec150bd2
init version: 949e6fa
Security Options:
seccomp
Profile: default
Kernel Version: 4.9.75-rancher
Operating System: RancherOS v1.1.3
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 31.42GiB
Name: rancherworker1
ID: EOMO:FDJN:LI6K:MHSU:D2RC:5C3N:KFTB:6CQO:GWAD:D5PW:DU2P:VJO2
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false

Operating system and kernel: (cat /etc/os-release, uname -r preferred)
NAME="RancherOS"
VERSION=v1.1.3
ID=rancheros
ID_LIKE=
VERSION_ID=v1.1.3
PRETTY_NAME="RancherOS v1.1.3"
HOME_URL="http://rancher.com/rancher-os/"
SUPPORT_URL="https://forums.rancher.com/c/rancher-os"
BUG_REPORT_URL="https://github.com/rancher/os/issues"
BUILD_ID=

ifconfig
docker-sys Link encap:Ethernet HWaddr 00:00:00:00:00:00
inet addr:172.18.42.2 Bcast:172.18.255.255 Mask:255.255.0.0
inet6 addr: fe80::c4b0:73ff:fedc:89b1/64 Scope:Link
UP BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:1 errors:0 dropped:0 overruns:0 frame:0
TX packets:3 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:76 (76.0 B) TX bytes:258 (258.0 B)

docker0 Link encap:Ethernet HWaddr 02:42:36:41:7D:4A
inet addr:172.17.0.1 Bcast:0.0.0.0 Mask:255.255.0.0
UP BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)

eth0 Link encap:Ethernet HWaddr 36:4D:5C:AD:2D:D9
inet addr:10.0.1.20 Bcast:10.0.1.255 Mask:255.255.255.0
inet6 addr: fe80::344d:5cff:fead:2dd9/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:114628 errors:0 dropped:0 overruns:0 frame:0
TX packets:131580 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:40313455 (38.4 MiB) TX bytes:67792662 (64.6 MiB)

lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:65536 Metric:1
RX packets:99281 errors:0 dropped:0 overruns:0 frame:0
TX packets:99281 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1
RX bytes:30721406 (29.2 MiB) TX bytes:30721406 (29.2 MiB)

Type/provider of hosts: (VirtualBox/Bare-metal/AWS/GCE/DO)
VMs
Setup details: (single node rancher vs. HA rancher, internal DB vs. external DB)
Multi-node Rancher: 1 server, 3 worker nodes
Environment Template: (Cattle/Kubernetes/Swarm/Mesos)
Kubernetes
firewall rules
The firewall is open on all ports in both directions: from rancher to the rancherworkers and from the rancherworkers to rancher.

Steps to Reproduce:
After a reboot, none of the rancherworkers can connect to the rancher server.
Rancher shows the message "Kubelet stopped posting node status."
cacerts is set to null:
https://rancher.acando.tech/v3/settings/cacerts

{

    "actions": { },
    "baseType": "setting",
    "created": "2018-06-25T09:26:51Z",
    "createdTS": 1529918811000,
    "creatorId": null,
    "customized": true,
    "default": "",
    "id": "cacerts",
    "links": {
        "remove": "…/v3/settings/cacerts",
        "self": "…/v3/settings/cacerts",
        "update": "…/v3/settings/cacerts"
    },
    "name": "cacerts",
    "type": "setting",
    "uuid": "e4596afa-7859-11e8-a141-0242ac110002",
    "value": null
}

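To make the cacerts check repeatable, here is a minimal Python sketch that parses a saved copy of the /v3/settings/cacerts response and reports whether the value is unset. The function name `cacerts_is_unset` and the abridged `SAMPLE` string are assumptions made for illustration. The sketch does not tell you whether a null value is wrong for this setup, only that it is null.

```python
import json


def cacerts_is_unset(setting_json: str) -> bool:
    """Return True when the setting's "value" is null or an empty string."""
    setting = json.loads(setting_json)
    if setting.get("id") != "cacerts":
        raise ValueError("not the cacerts setting")
    return not setting.get("value")


# Abridged copy of the response shown above (only the fields used here).
SAMPLE = '{"id": "cacerts", "customized": true, "default": "", "value": null}'

if __name__ == "__main__":
    print("cacerts unset:", cacerts_is_unset(SAMPLE))
```
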
The docker logs for rancher/rancher show no errors regarding the worker nodes.
The docker logs for rancher/rancher-agent:v2.0.3 (name priceless_minsky) show no errors regarding the worker nodes.
The docker logs for the kubelet on the Rancher server show no errors regarding the worker nodes.
The docker logs for rancher/rancher-agent:v2.0.3 (name unruffled_banach) on rancherworker1 show the following errors:
time="2018-07-06T11:51:57Z" level=info msg="Connecting to proxy" url="wss://rancher.acando.tech/v3/connect"
time="2018-07-06T11:54:34Z" level=info msg="Starting plan monitor"
time="2018-07-06T11:54:34Z" level=error msg="Failed to connect to proxy" error="websocket: close 1006 unexpected EOF"
time="2018-07-06T11:54:44Z" level=info msg="Connecting to wss://rancher.acando.tech/v3/connect with token c6ftcfvqfdxqhrbfwdsf5vhtgp44jhslnlz6xlq7m6646fcmdmtvtk"
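
The agent log above uses key=value pairs with quoted values, so the error entries can be pulled out mechanically. The following is a minimal Python sketch, assuming every entry starts with a `time=` key as it does in the excerpt. `parse_entries` and `SAMPLE` are names introduced here for illustration, and the sample leaves out the entry that contains the token.

```python
import re

# key=value, where value is either a double-quoted string or a bare word.
PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_entries(text: str) -> list:
    """Split agent log text into one dict per entry.

    A new entry starts at every `time` key, so this also works when
    several entries were pasted onto a single line.
    """
    entries = []
    for key, value in PAIR.findall(text):
        if value.startswith('"'):
            value = value[1:-1]  # drop the surrounding quotes
        if key == "time" or not entries:
            entries.append({})
        entries[-1][key] = value
    return entries


SAMPLE = (
    'time="2018-07-06T11:51:57Z" level=info msg="Connecting to proxy" '
    'url="wss://rancher.acando.tech/v3/connect" '
    'time="2018-07-06T11:54:34Z" level=info msg="Starting plan monitor" '
    'time="2018-07-06T11:54:34Z" level=error msg="Failed to connect to proxy" '
    'error="websocket: close 1006 unexpected EOF"'
)

if __name__ == "__main__":
    for entry in parse_entries(SAMPLE):
        if entry.get("level") == "error":
            print(entry["time"], entry["msg"], "->", entry.get("error"))
```
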
The rancher/rancher-agent:v2.0.3 container named share-mnt shows the following errors:
Error: failed to start containers: kubelet
Error response from daemon: {"message":"oci runtime error: container_linux.go:262: starting container process caused \"process_linux.go:339: container init caused \\\"rootfs_linux.go:57: mounting \\\\\\\"/opt/rke/var/lib/kubelet\\\\\\\" to rootfs \\\\\\\"/var/lib/docker/overlay/3639606f6a985b0192874bf2f22cde0528fdcd1a88e444b8f2aea259de0a8dcd/merged\\\\\\\" at \\\\\\\"/var/lib/docker/overlay/3639606f6a985b0192874bf2f22cde0528fdcd1a88e444b8f2aea259de0a8dcd/merged/opt/rke/var/lib/kubelet\\\\\\\" caused \\\\\\\"no space left on device\\\\\\\"\\\"\"\n"}
Error: failed to start containers: kubelet
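
The innermost cause in that message is "no space left on device". One thing that can be ruled in or out quickly is whether the filesystems involved have run out of blocks or inodes. Below is a minimal Python sketch using `os.statvfs`. The helper names and the 1 MiB threshold are assumptions, and a clean result here would not rule out other sources of the same error.

```python
import os


def space_report(path: str) -> dict:
    """Return free/total bytes and inodes for the filesystem holding path."""
    st = os.statvfs(path)
    return {
        "free_bytes": st.f_bavail * st.f_frsize,
        "total_bytes": st.f_blocks * st.f_frsize,
        "free_inodes": st.f_favail,
        "total_inodes": st.f_files,
    }


def looks_full(report: dict, min_free_bytes: int = 1 << 20) -> bool:
    """True if the filesystem is nearly out of blocks or has no free inodes."""
    out_of_blocks = report["free_bytes"] < min_free_bytes
    # Some filesystems report 0 total inodes; skip the inode check for those.
    out_of_inodes = report["total_inodes"] > 0 and report["free_inodes"] == 0
    return out_of_blocks or out_of_inodes


if __name__ == "__main__":
    # Paths taken from the error message; adjust for your node.
    for path in ("/var/lib/docker", "/opt/rke/var/lib/kubelet"):
        if os.path.exists(path):
            report = space_report(path)
            print(path, report, "FULL" if looks_full(report) else "ok")
```
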

The NFS mount points are available at /persistentstore:
[rancher@rancherworker1 persistentstore]$ ls -lt
total 24
drwxr-xr-x 2 100 101 4096 Jun 29 08:07 themecustom
drw-rw-rw- 8 root root 4096 Jun 29 08:05 xibo
drwx------ 2 root root 16384 Jun 25 11:43 lost+found
[rancher@rancherworker1 persistentstore]$ less /etc/mtab | grep 3639606f6a985b0192874bf2f22cde0528fdcd1a88e444b8f2aea259de0a8dcd
overlay /var/lib/docker/overlay/3639606f6a985b0192874bf2f22cde0528fdcd1a88e444b8f2aea259de0a8dcd/merged overlay rw,relatime,lowerdir=/var/lib/docker/overlay/9e6554d89047066533109868b12c9d7736b19cfb22c67206ec9cd99b2deeea98/root,upperdir=/var/lib/docker/overlay/3639606f6a985b0192874bf2f22cde0528fdcd1a88e444b8f2aea259de0a8dcd/upper,workdir=/var/lib/docker/overlay/3639606f6a985b0192874bf2f22cde0528fdcd1a88e444b8f2aea259de0a8dcd/work 0 0
-bash: cd: /var/lib/docker/overlay/3639606f6a985b0192874bf2f22cde0528fdcd1a88e444b8f2aea259de0a8dcd/upper: Permission denied
[root@rancherworker1 3639606f6a985b0192874bf2f22cde0528fdcd1a88e444b8f2aea259de0a8dcd]# ls -l
total 16
-rw-r--r-- 1 root root 64 Jun 25 10:04 lower-id
drwxr-xr-x 1 root root 4096 Jun 25 10:04 merged
drwxr-xr-x 10 root root 4096 Jun 25 10:04 upper
drwx------ 3 root root 4096 Jul 6 12:05 work
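
The /etc/mtab entry is hard to read as a single line. The following minimal Python sketch splits such an entry into its fields and overlay options (lowerdir, upperdir, workdir). The function name is an assumption, and the sample line uses shortened, hypothetical directory IDs (`aaaa`, `bbbb`) in place of the real hashes.

```python
def parse_mount_line(line: str) -> dict:
    """Parse one /etc/mtab (or /proc/mounts) line into its six fields.

    Options of the form key=value are stored under their key; bare
    flags such as `rw` are stored with the value True.
    """
    device, mountpoint, fstype, options, _dump, _passno = line.split()
    opts = {}
    for item in options.split(","):
        key, sep, value = item.partition("=")
        opts[key] = value if sep else True
    return {
        "device": device,
        "mountpoint": mountpoint,
        "fstype": fstype,
        "options": opts,
    }


# Hypothetical, shortened version of the entry shown above.
SAMPLE_MOUNT = (
    "overlay /var/lib/docker/overlay/aaaa/merged overlay "
    "rw,relatime,lowerdir=/var/lib/docker/overlay/bbbb/root,"
    "upperdir=/var/lib/docker/overlay/aaaa/upper,"
    "workdir=/var/lib/docker/overlay/aaaa/work 0 0"
)

if __name__ == "__main__":
    mount = parse_mount_line(SAMPLE_MOUNT)
    for name in ("lowerdir", "upperdir", "workdir"):
        print(name, "=", mount["options"][name])
```
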
Results:
We have no idea why the kubelet container fails to start, or why these directories are owned by root and not rancher.
Is it possible that this issue is caused by the wrong permissions?
What can we do here? Our cloud-config.yml doesn't contain anything about Docker containers.

Thanks in advance

superseb (Contributor) commented Jul 6, 2018

Pretty sure this is a duplicate of #14376

superseb closed this as completed Jul 6, 2018