Nodes not hosting the API VIP fail with: dial tcp 192.168.123.5:6443: connect: invalid argument
#344
Comments
Thanks for the report. Can you provide the keepalived logs from all masters, please?
@hardys Logs from the keepalived containers.
Worth mentioning that both master and worker nodes end up in NotReady status:
oc get nodes
NAME       STATUS     ROLES    AGE     VERSION
master-0   NotReady   master   3d2h    v1.13.4+1ad602308
master-1   NotReady   master   3d2h    v1.13.4+1ad602308
master-2   NotReady   master   3d2h    v1.13.4+1ad602308
worker-0   NotReady   worker   2d22h   v1.13.4+1ad602308
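When several nodes flip to NotReady at once, it can help to pull the affected names out of `oc get nodes` automatically. The sketch below is a hypothetical helper (`not_ready_nodes` is not part of `oc` or any library); it assumes the default plain-text column layout shown above (NAME, STATUS, ROLES, AGE, VERSION). Parsing `oc get nodes -o json` would be more robust against layout changes.

```python
from typing import List


def not_ready_nodes(oc_get_nodes_output: str) -> List[str]:
    """Return the names of nodes whose STATUS does not include "Ready".

    Hypothetical helper: expects the plain-text output of `oc get nodes`,
    with a header row containing NAME and STATUS columns.
    """
    lines = [ln for ln in oc_get_nodes_output.splitlines() if ln.strip()]
    if not lines:
        return []
    header = lines[0].split()
    name_idx = header.index("NAME")
    status_idx = header.index("STATUS")
    not_ready = []
    for line in lines[1:]:
        cols = line.split()
        # STATUS can carry extra flags, e.g. "Ready,SchedulingDisabled".
        conditions = cols[status_idx].split(",")
        if "Ready" not in conditions:
            not_ready.append(cols[name_idx])
    return not_ready


# Usage with the output reported in this issue.
SAMPLE = """NAME       STATUS     ROLES    AGE     VERSION
master-0   NotReady   master   3d2h    v1.13.4+1ad602308
master-1   NotReady   master   3d2h    v1.13.4+1ad602308
master-2   NotReady   master   3d2h    v1.13.4+1ad602308
worker-0   NotReady   worker   2d22h   v1.13.4+1ad602308"""

print(not_ready_nodes(SAMPLE))  # ['master-0', 'master-1', 'master-2', 'worker-0']
```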
I hit this issue in my deployment too; only master-2 went NotReady. It is exactly the issue described here.
@yprokule, it seems like there's an L2 connectivity issue for 192.168.123.5 (ping doesn't work):
master-0:
[root@master-0 ~]# ping -c1 192.168.123.6
PING 192.168.123.6 (192.168.123.6) 56(84) bytes of data.
64 bytes from 192.168.123.6: icmp_seq=1 ttl=64 time=0.029 ms
--- 192.168.123.6 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.029/0.029/0.029/0.000 ms
[root@master-0 ~]# ping -c1 192.168.123.5
connect: Invalid argument

master-1:
[root@master-1 ~]# ping -c1 192.168.123.5
PING 192.168.123.5 (192.168.123.5) 56(84) bytes of data.
64 bytes from 192.168.123.5: icmp_seq=1 ttl=64 time=0.174 ms
--- 192.168.123.5 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.174/0.174/0.174/0.000 ms
[root@master-1 ~]# ping -c1 192.168.123.6
PING 192.168.123.6 (192.168.123.6) 56(84) bytes of data.
64 bytes from 192.168.123.6: icmp_seq=1 ttl=64 time=0.213 ms
--- 192.168.123.6 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.213/0.213/0.213/0.000 ms

master-2:
[root@master-2 ~]# ping -c1 192.168.123.6
PING 192.168.123.6 (192.168.123.6) 56(84) bytes of data.
64 bytes from 192.168.123.6: icmp_seq=1 ttl=64 time=0.163 ms
--- 192.168.123.6 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.163/0.163/0.163/0.000 ms
[root@master-2 ~]# ping -c1 192.168.123.5
PING 192.168.123.5 (192.168.123.5) 56(84) bytes of data.
64 bytes from 192.168.123.5: icmp_seq=1 ttl=64 time=0.030 ms
--- 192.168.123.5 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.030/0.030/0.030/0.000 ms
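The "connect: Invalid argument" on master-0 is worth distinguishing from ordinary packet loss: it is the text for errno EINVAL, which the local kernel returns from connect() before any packet goes out, so it points at local state (such as routing) rather than the wire. The same errno surfaces in Go as the "dial tcp 192.168.123.5:6443: connect: invalid argument" from the issue title. The sketch below is a hypothetical diagnostic (`classify_dial_error` and `probe` are illustrative names, not part of any tool) that tries one TCP connection and buckets the failure by errno.

```python
import errno
import socket


def classify_dial_error(exc: OSError) -> str:
    """Roughly classify a failed TCP connect by the errno it carries."""
    if isinstance(exc, (socket.timeout, TimeoutError)) or exc.errno == errno.ETIMEDOUT:
        # The SYN (probably) left the host but nothing answered.
        return "timeout"
    if exc.errno == errno.EINVAL:
        # Rejected by the local kernel at connect() time: inspect the local
        # routing table rather than the network path.
        return "local-einval"
    if exc.errno == errno.ECONNREFUSED:
        # Usually a TCP reset: the host is reachable but the port is closed.
        return "refused"
    if exc.errno in (errno.EHOSTUNREACH, errno.ENETUNREACH):
        return "unreachable"
    return "other"


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Attempt one TCP connection; return "ok" or a failure class."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except OSError as exc:
        return classify_dial_error(exc)


if __name__ == "__main__":
    # The API VIP and port from this issue; adjust for your environment.
    print(probe("192.168.123.5", 6443))
```

Given the ping result on master-0, one would expect `probe` to report "local-einval" there, but that is an inference, not something verified on the affected cluster.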
On the other nodes:
Apr 15 16:15:41 master-2 hyperkube[121307]: E0415 16:15:41.413449 121307 kubelet.go:2273] node "master-2" not found
Apr 15 16:15:41 master-2 hyperkube[121307]: E0415 16:15:41.497345 121307 reflector.go:125] k8s.io/kubernetes/pkg/kubelet/kubelet.go:444: Failed to list *v1.Service: services is forbidden: User "system:anonymous" cannot list resource "services" in API group "" at the cluster scope
Apr 15 16:15:41 master-2 hyperkube[121307]: E0415 16:15:41.498449 121307 reflector.go:125] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: pods is forbidden: User "system:anonymous" cannot list resource "pods" in API group "" at the cluster scope
@yprokule, I think I found something. Master-0 doesn't hold the API VIP (192.168.123.5), but I can still see a host entry for 192.168.123.5 in its routing table. So when master-0 tries to send any packet to 192.168.123.5, the network stack fails with 'connect: Invalid argument'. I deleted the 192.168.123.5 route from master-0, and now I'm able to ping 192.168.123.5:
[core@master-0 ~]$ sudo ip route del 192.168.123.5/32
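The check described here (a host route for the VIP on a node that does not actually own the VIP) can be automated. The sketch below is a hypothetical helper, `stale_vip_routes`, that scans the plain-text output of `ip route show` and flags any route whose destination is exactly the VIP while the VIP is absent from the node's local addresses. It assumes you gather the local addresses separately (for example from `ip -o addr show`); the route lines in the usage example, including the interface name `ens4` and the address 192.168.123.20, are made up for illustration and are not taken from the affected nodes.

```python
import ipaddress
from typing import Iterable, List

# Route-type keywords that `ip route` may print before the destination.
ROUTE_TYPES = {"unicast", "local", "broadcast", "multicast", "anycast",
               "blackhole", "unreachable", "prohibit", "throw", "nat"}


def stale_vip_routes(route_output: str, local_addrs: Iterable[str], vip: str) -> List[str]:
    """Return route lines targeting `vip` when this node does not hold `vip`."""
    vip_ip = ipaddress.ip_address(vip)
    if vip_ip in {ipaddress.ip_address(a) for a in local_addrs}:
        return []  # This node owns the VIP, so a host route is expected.
    vip_net = ipaddress.ip_network(vip)  # /32 for IPv4, /128 for IPv6
    stale = []
    for line in route_output.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0] in ROUTE_TYPES and len(tokens) > 1:
            tokens = tokens[1:]
        dest = tokens[0]
        if dest == "default":
            continue
        try:
            # A bare address such as "192.168.123.5" parses as a /32.
            net = ipaddress.ip_network(dest, strict=False)
        except ValueError:
            continue
        if net == vip_net:
            stale.append(line.strip())
    return stale


# Hypothetical `ip route show` output on a node that does NOT hold the VIP.
ROUTES = """default via 192.168.123.1 dev ens4 proto dhcp metric 100
192.168.123.0/24 dev ens4 proto kernel scope link src 192.168.123.20 metric 100
192.168.123.5 dev ens4 scope link"""

print(stale_vip_routes(ROUTES, ["192.168.123.20"], "192.168.123.5"))
# ['192.168.123.5 dev ens4 scope link']
```

Any line it reports is a candidate for the manual `ip route del 192.168.123.5/32` used in this comment; whether deleting it is safe on a given node is a judgment call the helper does not make.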
Yep, deleting the incorrect route fixed the NotReady state of the node, which simply could not reach the API to report its status.
This seems like an RHCOS/RHEL bug; I filed a BZ for it: https://bugzilla.redhat.com/show_bug.cgi?id=1700415
A different workaround is here: #377
We've got an open bug tracking the kernel issue. In the meantime, we've updated our config so that the undeleted route no longer causes a problem. See #377.
After the cluster has been up for some time, it starts to fail with errors like:
Attempting to ping this IP from any of the other master nodes (except the one that hosts the IP) fails:
And from master-0, which holds the API VIP: