Pod cannot ping each other in multi-host scenario - failed to add vxlanRoute (XXX -> X.Y.0.0): invalid argument #844

senwangrockets · 2017-10-18T18:39:50Z

Pods on different hosts cannot ping each other.
Flannel logs:

I1018 17:58:53.498781       1 main.go:470] Determining IP address of default interface
I1018 17:58:53.499196       1 main.go:483] Using interface with name eth0 and address 172.28.249.156
I1018 17:58:53.499243       1 main.go:500] Defaulting external address to interface address (172.28.249.156)
I1018 17:58:53.517275       1 kube.go:130] Waiting 10m0s for node controller to sync
I1018 17:58:53.517332       1 kube.go:283] Starting kube subnet manager
I1018 17:58:54.517591       1 kube.go:137] Node controller sync successful
I1018 17:58:54.517652       1 main.go:235] Created subnet manager: Kubernetes Subnet Manager - scarif-admin-2
I1018 17:58:54.517661       1 main.go:238] Installing signal handlers
I1018 17:58:54.517821       1 main.go:348] Found network config - Backend type: vxlan
I1018 17:58:54.517912       1 vxlan.go:119] VXLAN config: VNI=1 Port=0 GBP=false DirectRouting=false
I1018 17:58:54.573370       1 main.go:295] Wrote subnet file to /run/flannel/subnet.env
I1018 17:58:54.573408       1 main.go:299] Running backend.
I1018 17:58:54.573427       1 main.go:317] Waiting for all goroutines to exit
I1018 17:58:54.573496       1 vxlan_network.go:56] watching for new subnet leases
E1018 17:58:54.573780       1 vxlan_network.go:158] failed to add vxlanRoute (172.16.0.0/24 -> 172.16.0.0): invalid argument
I1018 17:58:54.577620       1 ipmasq.go:75] Some iptables rules are missing; deleting and recreating rules
I1018 17:58:54.577673       1 ipmasq.go:97] Deleting iptables rule: -s 172.16.0.0/16 -d 172.16.0.0/16 -j RETURN
I1018 17:58:54.579324       1 ipmasq.go:97] Deleting iptables rule: -s 172.16.0.0/16 ! -d 224.0.0.0/4 -j MASQUERADE
I1018 17:58:54.580870       1 ipmasq.go:97] Deleting iptables rule: ! -s 172.16.0.0/16 -d 172.16.1.0/24 -j RETURN
I1018 17:58:54.582349       1 ipmasq.go:97] Deleting iptables rule: ! -s 172.16.0.0/16 -d 172.16.0.0/16 -j MASQUERADE
I1018 17:58:54.583900       1 ipmasq.go:85] Adding iptables rule: -s 172.16.0.0/16 -d 172.16.0.0/16 -j RETURN
I1018 17:58:54.587553       1 ipmasq.go:85] Adding iptables rule: -s 172.16.0.0/16 ! -d 224.0.0.0/4 -j MASQUERADE
I1018 17:58:54.591290       1 ipmasq.go:85] Adding iptables rule: ! -s 172.16.0.0/16 -d 172.16.1.0/24 -j RETURN
I1018 17:58:54.595032       1 ipmasq.go:85] Adding iptables rule: ! -s 172.16.0.0/16 -d 172.16.0.0/16 -j MASQUERADE

Your Environment

Flannel version: 0.9
Backend used (e.g. vxlan or udp): vxlan
Etcd version:
Kubernetes version (if used): 1.8
Operating System and version: CentOS 7.3, Docker 17.06


senwangrockets · 2017-10-19T13:49:58Z

What I think is interesting is this line:
E1018 17:58:54.573780 1 vxlan_network.go:158] failed to add vxlanRoute (172.16.0.0/24 -> 172.16.0.0): invalid argument

tomdee · 2017-10-20T19:22:42Z

Yes, that line is the smoking gun. What other nodes do you have? Can you post the flannel annotations on your nodes (something like kubectl get nodes -o yaml | grep flannel.alpha)?

I suspect one of your nodes somehow has a PublicIP of 172.16.0.0, which it shouldn't. The 172.16/16 range should be reserved for the vxlan network.
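A quick way to test tomdee's hypothesis is to scan the annotation output for a public IP that falls inside the pod network. This is a minimal Python sketch, assuming you saved the output of kubectl get nodes -o yaml | grep flannel.alpha as text; the function name public_ips_inside_pod_network is hypothetical and not part of flannel or kubectl.

```python
import ipaddress
import re
from typing import List

# Matches annotation lines such as
#   flannel.alpha.coreos.com/public-ip: 10.1.130.165
PUBLIC_IP_RE = re.compile(r"flannel\.alpha\.coreos\.com/public-ip:\s*(\S+)")


def public_ips_inside_pod_network(annotations: str, pod_network: str) -> List[str]:
    """Return public-ip annotation values that fall inside the pod network.

    The public IP is the host (underlay) address VXLAN packets are sent to,
    so it should never come from the overlay range used for pods.
    """
    network = ipaddress.ip_network(pod_network)
    return [value for value in PUBLIC_IP_RE.findall(annotations)
            if ipaddress.ip_address(value) in network]


if __name__ == "__main__":
    sample = """
      flannel.alpha.coreos.com/public-ip: 172.28.249.156
      flannel.alpha.coreos.com/public-ip: 172.16.0.0
    """
    # 172.28.249.156 is the address from the log above; 172.16.0.0 is a
    # hypothetical bad value of the kind tomdee suspects.
    print(public_ips_inside_pod_network(sample, "172.16.0.0/16"))  # ['172.16.0.0']
```

An empty list means no node advertises an overlay address as its public IP, which rules this particular cause out.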

camflan · 2017-10-24T16:01:53Z

I have a similar issue with the same flannel and k8s versions. I'm using vxlan; flannel is up and running with no errors in the logs (not even the error above).

kubeadm 1.8.1
k8s 1.8.0
flannel 0.9
ubuntu 16.04
docker 17.03ce

I've tried combinations of k8s as far back as 1.6 and flannel as far back as 0.8, all with the same results.

I'm able to connect pod <-> pod and host <-> pod as long as the pods are on that host. All hosts can communicate with each other without issues. I've spent almost a month fiddling with iptables, routes, etc., and cannot figure this out. I'm seeing traffic via tcpdump on the cni0 bridge, but my pods aren't getting it. IIRC, last night I was using iptstate and was seeing UDP traffic on the bridge when I expected TCP. Maybe this is the issue? It's also possible I was seeing something else...

Should I open another ticket, or piggyback on this one?

jhorwit2 · 2017-10-27T03:26:23Z

I'm running into the same issue it seems.

I1026 22:38:06.797811     208 vxlan_network.go:56] watching for new subnet leases
I1026 22:38:06.800429     208 ipmasq.go:75] Some iptables rules are missing; deleting and recreating rules
I1026 22:38:06.800450     208 ipmasq.go:97] Deleting iptables rule: -s 172.17.0.0/16 -d 172.17.0.0/16 -j RETURN
I1026 22:38:06.801507     208 ipmasq.go:97] Deleting iptables rule: -s 172.17.0.0/16 ! -d 224.0.0.0/4 -j MASQUERADE
I1026 22:38:06.802527     208 ipmasq.go:97] Deleting iptables rule: ! -s 172.17.0.0/16 -d 172.17.9.0/24 -j RETURN
I1026 22:38:06.803535     208 ipmasq.go:97] Deleting iptables rule: ! -s 172.17.0.0/16 -d 172.17.0.0/16 -j MASQUERADE
I1026 22:38:06.804543     208 ipmasq.go:85] Adding iptables rule: -s 172.17.0.0/16 -d 172.17.0.0/16 -j RETURN
I1026 22:38:06.806706     208 ipmasq.go:85] Adding iptables rule: -s 172.17.0.0/16 ! -d 224.0.0.0/4 -j MASQUERADE
I1026 22:38:06.808932     208 ipmasq.go:85] Adding iptables rule: ! -s 172.17.0.0/16 -d 172.17.9.0/24 -j RETURN
I1026 22:38:06.811148     208 ipmasq.go:85] Adding iptables rule: ! -s 172.17.0.0/16 -d 172.17.0.0/16 -j MASQUERADE
E1026 22:38:11.064786     208 vxlan_network.go:158] failed to add vxlanRoute (172.17.0.0/24 -> 172.17.0.0): invalid argument
E1027 02:51:24.265565     208 vxlan_network.go:158] failed to add vxlanRoute (172.17.0.0/24 -> 172.17.0.0): invalid argument

@tomdee none of my nodes have that as the public ip annotation (they're all correct).

jhorwit2 · 2017-10-27T03:31:34Z

I don't see a route for 172.17.0.0/24 on any of my hosts.

$ ip route
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
172.17.1.0/24 via 172.17.1.0 dev flannel.1 onlink
172.17.2.0/24 via 172.17.2.0 dev flannel.1 onlink
172.17.3.0/24 via 172.17.3.0 dev flannel.1 onlink
172.17.4.0/24 via 172.17.4.0 dev flannel.1 onlink
172.17.5.0/24 via 172.17.5.0 dev flannel.1 onlink
172.17.6.0/24 via 172.17.6.0 dev flannel.1 onlink
172.17.7.0/24 via 172.17.7.0 dev flannel.1 onlink
172.17.8.0/24 via 172.17.8.0 dev flannel.1 onlink
172.17.9.2 dev cali299270d87b6 scope link
172.17.9.3 dev calib63aee49779 scope link
172.17.9.4 dev cali12d4a061371 scope link

$ arp -a
...
? (172.17.0.0) at <incomplete> on flannel.1
...
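jhorwit2's route table contains a likely clue: 172.17.0.0/16 is a kernel-installed (proto kernel, scope link) route on docker0, so the gateway flannel wants for the remote lease, 172.17.0.0, is the network address of a subnet attached directly to this host. The Python sketch below scans ip route output for such connected routes on devices other than flannel.1. It assumes, without confirmation from this thread, that this kind of overlap is what makes the kernel reject the route with "invalid argument"; it only understands the simple line format shown above, and the function name is hypothetical.

```python
import ipaddress
from typing import List


def conflicting_connected_routes(ip_route_output: str, gateway: str,
                                 vxlan_dev: str = "flannel.1") -> List[str]:
    """Return directly connected routes on other devices that contain gateway.

    Only understands simple `ip route` lines of the form
    "<prefix> [via <gw>] dev <name> ... scope link ...".
    """
    gw = ipaddress.ip_address(gateway)
    hits = []
    for line in ip_route_output.splitlines():
        fields = line.split()
        if "dev" not in fields:
            continue
        dev_pos = fields.index("dev")
        if dev_pos + 1 >= len(fields):
            continue
        dev = fields[dev_pos + 1]
        # Connected routes have no "via" gateway and link scope. Routes on the
        # VXLAN device itself are flannel's own, so they are not conflicts.
        if dev == vxlan_dev or "via" in fields or "scope link" not in line:
            continue
        try:
            prefix = ipaddress.ip_network(fields[0], strict=False)
        except ValueError:
            continue  # first token is not a prefix (e.g. "default")
        if gw in prefix:
            hits.append(line.strip())
    return hits
```

Fed the table above with gateway 172.17.0.0, this reports the docker0 line, which points at the same overlap tomdee later identifies.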

Flannel logs

I1027 12:53:29.439503     166 vxlan_network.go:138] adding subnet: 172.17.0.0/24 PublicIP: 10.65.27.18 VtepMAC: 46:ee:d0:82:55:a4
I1027 12:53:29.439524     166 device.go:179] calling AddARP: 172.17.0.0, 46:ee:d0:82:55:a4
I1027 12:53:29.439591     166 device.go:156] calling AddFDB: <hostip>, 46:ee:d0:82:55:a4
E1027 12:53:29.439668     166 vxlan_network.go:158] failed to add vxlanRoute (172.17.0.0/24 -> 172.17.0.0): invalid argument
I1027 12:53:29.439706     166 device.go:190] calling DelARP: 172.17.0.0, 46:ee:d0:82:55:a4
I1027 12:53:29.439751     166 device.go:168] calling DelFDB: <hostip>, 46:ee:d0:82:55:a4

DominicDV · 2017-10-27T14:44:55Z

I had this error too when upgrading from 1.7.5 to 1.8.2.
A reboot solved it for me.
(For completeness: before that I had deleted the swap entry from fstab, because kubelet requires that the system doesn't swap. Not sure if this is related.)

tomdee · 2017-11-04T00:04:04Z

@camflan please open a different issue. I suspect you just need "iptables -P FORWARD ACCEPT".
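tomdee's suggestion targets the default policy of the iptables FORWARD chain: if it is DROP, traffic forwarded between cni0 and other hosts can be dropped. (Docker 1.13 and later set this policy to DROP, and camflan runs Docker 17.03.) A small Python sketch that reads the policy from the output of iptables -S FORWARD, whose policy line has the form "-P FORWARD ACCEPT"; the function name forward_policy is illustrative.

```python
from typing import Optional


def forward_policy(iptables_s_output: str) -> Optional[str]:
    """Return the FORWARD chain's default policy from `iptables -S FORWARD` output."""
    for line in iptables_s_output.splitlines():
        fields = line.split()
        # The policy line looks like "-P FORWARD DROP".
        if len(fields) >= 3 and fields[0] == "-P" and fields[1] == "FORWARD":
            return fields[2]
    return None
```

If this returns "DROP", the command tomdee quotes switches the policy to ACCEPT; on its own that change does not persist across reboots.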

tomdee · 2017-11-04T00:13:02Z

@jhorwit2 @senwangrockets I think the problem could be that you have the same IP range configured for your Docker bridge as you do for flannel. If you're using kubeadm, did you specify --pod-network-cidr 10.244.0.0/16?
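tomdee's diagnosis boils down to an overlap test between two CIDRs: the Network in flannel's net-conf.json and the Docker bridge range (the "bip" setting in Docker's daemon.json, or Docker's default 172.17.0.1/16 when bip is unset). A minimal Python sketch, assuming both files are available as JSON strings; the helper name flannel_overlaps_docker is hypothetical.

```python
import ipaddress
import json

# Docker's default bridge address when daemon.json does not set "bip".
DOCKER_DEFAULT_BIP = "172.17.0.1/16"


def flannel_overlaps_docker(net_conf_json: str, daemon_json: str = "{}") -> bool:
    """Return True if flannel's Network overlaps the Docker bridge subnet.

    net_conf_json: contents of flannel's net-conf.json
    daemon_json:   contents of Docker's daemon.json (may omit "bip")
    """
    flannel_net = ipaddress.ip_network(json.loads(net_conf_json)["Network"])
    bip = json.loads(daemon_json).get("bip", DOCKER_DEFAULT_BIP)
    # "bip" is an address with a prefix length (e.g. 172.17.0.1/16), so take
    # the network it belongs to before comparing.
    docker_net = ipaddress.ip_interface(bip).network
    return flannel_net.overlaps(docker_net)


if __name__ == "__main__":
    conf_172 = '{"Network": "172.17.0.0/16", "Backend": {"Type": "vxlan"}}'
    conf_10 = '{"Network": "10.244.0.0/16", "Backend": {"Type": "vxlan"}}'
    print(flannel_overlaps_docker(conf_172))  # True: same range as docker0
    print(flannel_overlaps_docker(conf_10))   # False
```

When using kubeadm, the --pod-network-cidr value should also match the Network in net-conf.json, which is why tomdee asks about 10.244.0.0/16.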

jhorwit2 · 2017-11-04T01:45:06Z

@tomdee that was my issue. Sorry, I forgot to post after I realized that.

kumarganesh2814 · 2017-12-10T09:40:14Z

@tomdee
Hi Tom,

I initialised my cluster with the same kubeadm command:
kubeadm init --pod-network-cidr 10.244.0.0/16
but I still see errors in the flannel pods:

E1210 07:10:45.198903 1 vxlan_network.go:158] failed to add vxlanRoute (10.244.2.0/24 -> 10.244.2.0): invalid argument

I have a 4-host cluster. 2 of the hosts work fine, but on the other 2, containers fail to start and stay in the "ContainerCreating" state.

The errors I see are:

Dec 10 01:39:14 kongapi-poc-db1 kubelet: E1210 01:39:14.554032   58034 cni.go:250] Error while adding to cni network: "cni0" already has an IP address different from 10.244.3.1/24
Dec 10 01:39:14 kongapi-poc-db1 kernel: cni0: port 1(veth7b12c96f) entered disabled state
Dec 10 01:39:14 kongapi-poc-db1 kernel: device veth7b12c96f left promiscuous mode
Dec 10 01:39:14 kongapi-poc-db1 kernel: cni0: port 1(veth7b12c96f) entered disabled state
Dec 10 01:39:14 kongapi-poc-db1 NetworkManager[702]: <info>  [1512898754.6477] device (veth7b12c96f): released from master device cni0
Dec 10 01:39:14 kongapi-poc-db1 kubelet: E1210 01:39:14.655974   58034 remote_runtime.go:92] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "tomcat-d6b5b9647-prq9w_tomcat" network: "cni0" already has an IP address different from 10.244.3.1/24
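The kubelet error says the existing cni0 bridge address differs from the one expected for this node's lease (10.244.3.1/24). Flannel records the lease in /run/flannel/subnet.env (the file it reports writing in the first log in this issue) as lines like FLANNEL_SUBNET=10.244.3.1/24. A minimal Python sketch comparing the two, assuming you pass in the current cni0 address yourself (for example copied from ip -4 addr show cni0); the helper names are illustrative.

```python
import ipaddress
from typing import Dict


def parse_subnet_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines like those flannel writes to /run/flannel/subnet.env."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line and not line.startswith("#") and "=" in line:
            key, value = line.split("=", 1)
            env[key] = value
    return env


def cni0_matches_lease(subnet_env_text: str, cni0_addr: str) -> bool:
    """Return True if cni0's address equals FLANNEL_SUBNET (e.g. 10.244.3.1/24)."""
    lease = ipaddress.ip_interface(parse_subnet_env(subnet_env_text)["FLANNEL_SUBNET"])
    return ipaddress.ip_interface(cni0_addr) == lease


if __name__ == "__main__":
    # Illustrative subnet.env contents for the failing node.
    env_text = """FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.3.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true"""
    # Hypothetical stale cni0 address left over from an earlier lease.
    print(cni0_matches_lease(env_text, "10.244.2.1/24"))  # False
```

A False result matches the symptom in the log; one remedy suggested later in this thread is deleting the stale cni0 interface and letting it be recreated.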

eroji · 2018-02-09T01:45:50Z

Having the same problem. 4 nodes: 2 masters and 2 workers. The .167 and .168 nodes are the workers, and .167 is the one that has issues adding the route.

Output of: kubectl get nodes -o yaml |grep flannel.alpha

      flannel.alpha.coreos.com/backend-data: '{"VtepMAC":"d2:28:18:cd:1d:82"}'
      flannel.alpha.coreos.com/backend-type: vxlan
      flannel.alpha.coreos.com/kube-subnet-manager: "true"
      flannel.alpha.coreos.com/public-ip: 10.1.130.165
      flannel.alpha.coreos.com/backend-data: '{"VtepMAC":"b6:67:12:1c:d9:c4"}'
      flannel.alpha.coreos.com/backend-type: vxlan
      flannel.alpha.coreos.com/kube-subnet-manager: "true"
      flannel.alpha.coreos.com/public-ip: 10.1.130.166
      flannel.alpha.coreos.com/backend-data: '{"VtepMAC":"aa:e0:31:6e:d1:ef"}'
      flannel.alpha.coreos.com/backend-type: vxlan
      flannel.alpha.coreos.com/kube-subnet-manager: "true"
      flannel.alpha.coreos.com/public-ip: 10.1.130.167
      flannel.alpha.coreos.com/backend-data: '{"VtepMAC":"16:13:d5:7c:c5:e2"}'
      flannel.alpha.coreos.com/backend-type: vxlan
      flannel.alpha.coreos.com/kube-subnet-manager: "true"
      flannel.alpha.coreos.com/public-ip: 10.1.130.168

BSWANG · 2018-03-01T12:18:45Z

Could it be that Linux treats these invalid gateway addresses as multicast addresses?
Flannel's own subnet allocation skips these addresses: https://github.com/coreos/flannel/blob/master/subnet/config.go#L86-L93. But the podCIDR allocated by the controller manager does not skip the first subnet.
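To make BSWANG's point concrete: with Network 10.244.0.0/16 and /24 subnets, the first block is 10.244.0.0/24, whose network address is also the network address of the whole pod range. BSWANG reports that flannel's own allocator skips that block while controller-manager podCIDRs can start at it. The Python sketch below models both orders; it is a simplified illustration under that description, not flannel's code, and candidate_subnets is a hypothetical name.

```python
import ipaddress
from itertools import islice


def candidate_subnets(network: str, subnet_len: int, skip_first: bool):
    """Yield the /subnet_len blocks of network in ascending order.

    skip_first=True models flannel's own allocator as BSWANG describes it
    (the first block is never handed out); skip_first=False models a
    controller-manager-style allocation that can start at the first block.
    """
    blocks = ipaddress.ip_network(network).subnets(new_prefix=subnet_len)
    if skip_first:
        next(blocks)  # e.g. drop 10.244.0.0/24, whose address equals the network's
    yield from blocks


if __name__ == "__main__":
    for skip in (True, False):
        first_two = [str(n) for n in islice(candidate_subnets("10.244.0.0/16", 24, skip), 2)]
        print(skip, first_two)
```

Note that several reports in this thread fail on subnets other than the first (for example 10.244.2.0/24), so this is at most one of the contributing causes.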

@tomdee

nabheet · 2018-12-03T18:22:30Z

I'm not sure this will help, but you might want to delete all the network/bridge devices before initializing k8s again. I had similar issues, which I resolved by destroying and recreating the VMs. However, the issues might not be the same.

After reading the flannel documentation, it was not obvious to me that flannel works with only one CIDR. After the change, things are much better, although I have other issues.

leogoing · 2019-07-22T09:12:15Z

@senwangrockets @kumarganesh2814 I have the same problem. Have you solved it?

Voxis · 2020-10-31T01:59:42Z

I had the same problem; here is how I resolved it. I have a setup with 1 master and 2 worker nodes, all of them VMs with fixed IPs and hostnames on my local area network. The master and one worker node are fine; the other worker node has this problem.

When I see something like vxlan_network.go:158] failed to add vxlanRoute (10.244.2.0/24 -> 10.244.2.0): invalid argument, I log onto that machine and check the IP address of cni0; it could be a different address. You could delete the interface and let the cluster regenerate it. But in my case, I realized the flannel.1 interface had not been created.

So I deleted the node, manually deleted the associated pods from the master, ran kubeadm reset on the problematic worker node, and rejoined it, but flannel.1 never appeared. In the end, I deleted the node from the master, did a reset, restarted the VM, and joined the master as normal, and flannel.1 appeared. Then I did a deployment on the master, and cni0 and the veth interfaces appeared on the worker node.

TL;DR (not sure whether it will work for you): delete the worker node from the master, run kubeadm reset on the worker node, clean up, restart the VM, and join the master node as normal.
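Voxis's root cause was a missing flannel.1 device, which is easy to overlook. A small Python check, assuming a Linux host where network interfaces appear under /sys/class/net; the sysfs root is a parameter only so the function can be exercised off-host, and the default names are the ones seen in this thread (flannel.1 for the vxlan backend with VNI 1, cni0 for the bridge). The function name is hypothetical.

```python
import os
from typing import Iterable, List


def missing_interfaces(names: Iterable[str] = ("flannel.1", "cni0"),
                       sysfs_net: str = "/sys/class/net") -> List[str]:
    """Return the names from `names` that have no entry under sysfs_net."""
    return [name for name in names
            if not os.path.exists(os.path.join(sysfs_net, name))]


if __name__ == "__main__":
    missing = missing_interfaces()
    if missing:
        print("missing interfaces:", ", ".join(missing))
    else:
        print("flannel.1 and cni0 are present")
```

As Voxis observed, cni0 may only appear once a pod is scheduled on the node, so a missing cni0 on an idle node is less alarming than a missing flannel.1.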

Queetinliu · 2021-08-03T04:49:57Z

I also faced this problem. It was because the network interfaces that flanneld used could not reach each other; I switched to another network interface, and that solved it.

rthamrin · 2022-01-27T08:07:36Z

Mine is weird: my master has flannel.alpha.coreos.com/public-ip: 10.0.3.15, and now my master cannot ping the other nodes over flannel. What actually happened here, and how do I edit the flannel.alpha annotation on my master?

kubectl get nodes -o yaml | grep flannel.alpha

      flannel.alpha.coreos.com/backend-data: '{"VNI":1,"VtepMAC":"16:cb:5c:78:57:cb"}'
      flannel.alpha.coreos.com/backend-type: vxlan
      flannel.alpha.coreos.com/kube-subnet-manager: "true"
      flannel.alpha.coreos.com/public-ip: 192.168.14.3
      flannel.alpha.coreos.com/backend-data: '{"VNI":1,"VtepMAC":"7e:1e:e8:f6:8f:77"}'
      flannel.alpha.coreos.com/backend-type: vxlan
      flannel.alpha.coreos.com/kube-subnet-manager: "true"
      flannel.alpha.coreos.com/public-ip: 192.168.14.4
      flannel.alpha.coreos.com/backend-data: '{"VNI":1,"VtepMAC":"06:cd:6a:ba:6b:54"}'
      flannel.alpha.coreos.com/backend-type: vxlan
      flannel.alpha.coreos.com/kube-subnet-manager: "true"
      flannel.alpha.coreos.com/public-ip: 10.0.3.15
      flannel.alpha.coreos.com/backend-data: '{"VNI":1,"VtepMAC":"96:71:0e:48:52:4d"}'
      flannel.alpha.coreos.com/backend-type: vxlan
      flannel.alpha.coreos.com/kube-subnet-manager: "true"
      flannel.alpha.coreos.com/public-ip: 192.168.14.2
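In this output three nodes advertise public IPs in 192.168.14.0/24 while the master advertises 10.0.3.15, which suggests flannel picked a different interface on the master, similar to what Queetinliu describes above. A Python sketch that flags such outliers automatically, assuming (hypothetically) that all nodes are meant to talk over one shared /24; the function name public_ip_outliers is illustrative.

```python
import ipaddress
import re
from collections import Counter
from typing import List

# Matches lines such as "flannel.alpha.coreos.com/public-ip: 192.168.14.3".
PUBLIC_IP_RE = re.compile(r"flannel\.alpha\.coreos\.com/public-ip:\s*(\S+)")


def public_ip_outliers(annotations: str, prefix_len: int = 24) -> List[str]:
    """Flag public IPs whose /prefix_len network differs from the majority's.

    Assumes all nodes should share one underlay subnet of size /prefix_len.
    """
    ips = [ipaddress.ip_address(v) for v in PUBLIC_IP_RE.findall(annotations)]
    nets = [ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False) for ip in ips]
    if not nets:
        return []
    majority, _count = Counter(nets).most_common(1)[0]
    return [str(ip) for ip, net in zip(ips, nets) if net != majority]
```

For a flagged node, flanneld's --iface option can be used to pin the interface it should use for inter-host traffic.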

dale1202 · 2022-03-20T11:26:02Z

Check whether flannel.1 conflicts with docker0's IP. If it does, change the subnet's IP range.

rthamrin · 2022-03-21T00:03:53Z

Sorry, whose question was your answer addressed to?

dale1202 · 2022-03-21T01:49:06Z

@rthamrin I was answering this question: "failed to add vxlanRoute (10.244.2.0/24 -> 10.244.2.0): invalid argument"

stale · 2023-01-25T20:23:00Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

tomdee added component/kubernetes components/backend/vxlan labels Oct 20, 2017

tomdee added the reviewed/needs more information label Oct 20, 2017

tomdee changed the title from "Pod cannot ping each other in multi-host scenario" to "Pod cannot ping each other in multi-host scenario - failed to add vxlanRoute (XXX -> X.Y.0.0): invalid argument" Nov 4, 2017

tomdee closed this as completed Nov 8, 2017

tomdee reopened this Dec 12, 2017

yasker mentioned this issue Jan 3, 2018

Network communicate issue sometime happens due to failed to add vxlanRoute in flannel rancher/rke#194 (Closed)

stale bot added the wontfix label Jan 25, 2023

stale bot closed this as completed Feb 15, 2023
