This repository has been archived by the owner on Jul 25, 2019. It is now read-only.

Timeout on first connection attempt in multi-tenant mode after some time of inactivity (origin 1.1.0.1) #231

Closed
barkbay opened this issue Dec 20, 2015 · 11 comments

@barkbay

barkbay commented Dec 20, 2015

Hi all,

I've been stuck for several days on an issue where, after a fairly long period of inactivity, the first connection between two pods times out (subsequent connections are fine):

lenaic-node-compute-1ba08.example.com origin-node[2055]: I1220 15:08:33.591561    2055 proxysocket.go:96] Mapped service "postgresql2/postgresql:postgresql" to endpoint 10.1.1.2:5432
lenaic-node-compute-1ba08.example.com origin-node[2055]: E1220 15:08:34.591741    2055 proxysocket.go:104] Dial failed: dial tcp 10.1.1.2:5432: i/o timeout
lenaic-node-compute-1ba08.example.com origin-node[2055]: I1220 15:08:34.591788    2055 roundrobin.go:130] NextEndpoint for service "postgresql2/postgresql:postgresql", srcAddr=10.1.2.2:37328: endpoints: [10.1.1.2:5432]
lenaic-node-compute-1ba08.example.com origin-node[2055]: I1220 15:08:34.591806    2055 proxysocket.go:96] Mapped service "postgresql2/postgresql:postgresql" to endpoint 10.1.1.2:5432
lenaic-node-compute-1ba08.example.com origin-node[2055]: E1220 15:08:36.592010    2055 proxysocket.go:104] Dial failed: dial tcp 10.1.1.2:5432: i/o timeout
lenaic-node-compute-1ba08.example.com origin-node[2055]: E1220 15:08:36.592043    2055 proxysocket.go:138] Failed to connect to balancer: failed to connect to an endpoint.

It seems to happen if no network frame has been transmitted for a period equal to the "hard_timeout" value of the Open vSwitch learn table (table 8).
I have decreased this value from 900 seconds to 60 seconds, and I can consistently reproduce the issue if I wait 60 seconds between connection attempts.
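
A quick, hedged way to watch those learned flows age out (the bridge name br0 is an assumption, taken from openshift-sdn's default):

# Dump the learned MAC -> tunnel-destination flows and their hard_timeout
ovs-ofctl dump-flows br0 "table=8"
# Re-run after the hard_timeout has elapsed with no traffic: the entries are gone,
# and the next connection attempt through the service proxy times out again
watch -n 10 'ovs-ofctl dump-flows br0 "table=8"'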

Here is a small diagram of my setup : https://www.dropbox.com/s/esu2gtiz8y6js1x/use_case.png?dl=1
Here is the result of the debug.sh script : https://www.dropbox.com/s/k8x4i2308wpkqlu/openshift-sdn-debug-2015-12-20.tgz?dl=1

I would say it is related to an ARP issue; here is what I see if I run tcpdump on the VXLAN connection (client side):

[root@lenaic-node-compute-1ba08 openshift]# tcpdump -i eth0 -nn -v udp port 4789 
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes 
14:11:22.222814 IP (tos 0x0, ttl 64, id 1049, offset 0, flags [DF], proto UDP (17), length 110) 
   192.168.55.233.36826 > 192.168.55.175.4789: VXLAN, flags [I] (0x08), vni 0 
IP (tos 0x0, ttl 64, id 3905, offset 0, flags [DF], proto TCP (6), length 60) 
   10.1.2.1.60708 > 10.1.1.2.5432: Flags [S], cksum 0x7bfd (correct), seq 2334882139, win 14100, options [mss 1410,sackOK,TS val 18773530 ecr 0,nop,wscale 7], length 0 
                                    [... cut a lot of SYN packets ...]
14:11:24.976028 IP (tos 0x0, ttl 64, id 1053, offset 0, flags [DF], proto UDP (17), length 110) 
   192.168.55.233.35083 > 192.168.55.175.4789: VXLAN, flags [I] (0x08), vni 0 
IP (tos 0x0, ttl 64, id 3443, offset 0, flags [DF], proto TCP (6), length 60) 
   10.1.2.1.60715 > 10.1.1.2.5432: Flags [S], cksum 0xb13d (correct), seq 1392144771, win 14100, options [mss 1410,sackOK,TS val 18776284 ecr 0,nop,wscale 7], length 0 
14:11:27.236170 IP (tos 0x0, ttl 64, id 1054, offset 0, flags [DF], proto UDP (17), length 78) 
   192.168.55.233.39892 > 192.168.55.175.4789: VXLAN, flags [I] (0x08), vni 0 
ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.1.1.2 tell 10.1.2.1, length 28 
14:11:27.236741 IP (tos 0x0, ttl 64, id 17103, offset 0, flags [DF], proto UDP (17), length 78) 
   192.168.55.175.37297 > 192.168.55.233.4789: VXLAN, flags [I] (0x08), vni 0 
ARP, Ethernet (len 6), IPv4 (len 4), Reply 10.1.1.2 is-at 02:42:0a:01:01:02, length 28

(By the way, you can see that the vni is 0; I'm not sure whether that is expected in a multi-tenant environment.)
The interesting point here is that once the ARP exchange has happened, subsequent connections are successful!

In the meantime, these flows are created in the OVS flow table on the client:

cookie=0x0, duration=14.426s, table=8, n_packets=16, n_bytes=1390, hard_timeout=60, priority=200,dl_dst=02:42:0a:01:02:02 actions=load:0->NXM_NX_TUN_IPV4_DST[],output:4 
cookie=0x0, duration=14.426s, table=8, n_packets=3, n_bytes=126, hard_timeout=60, priority=200,dl_dst=02:9e:4d:7f:00:a3 actions=load:0->NXM_NX_TUN_IPV4_DST[],output:2 
cookie=0x0, duration=9.419s, table=8, n_packets=0, n_bytes=0, hard_timeout=60, priority=200,dl_dst=02:42:0a:01:01:02 actions=load:0xc0a837af->NXM_NX_TUN_IPV4_DST[],output:1

and these flows are created on the server:

cookie=0x0, duration=10.301s, table=8, n_packets=1, n_bytes=42, hard_timeout=60, priority=200,dl_dst=02:9e:4d:7f:00:a3 actions=load:0xc0a837e9->NXM_NX_TUN_IPV4_DST[],output:1 
cookie=0x0, duration=5.291s, table=8, n_packets=0, n_bytes=0, hard_timeout=60, priority=200,dl_dst=02:42:0a:01:01:02 actions=load:0->NXM_NX_TUN_IPV4_DST[],output:4

Once they have expired, the problem appears again.

As a side note, there is no issue if I use the pod IP (10.1.1.2) instead of the service IP (172.30.184.188).
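
For reference, a minimal way to reproduce the difference from the client side (addresses taken from the logs above; nc assumed to be available):

# First attempt after the idle period: the service IP times out, the pod IP does not
nc -z -v -w 5 172.30.184.188 5432   # service IP: i/o timeout on the first try
nc -z -v -w 5 10.1.1.2 5432         # pod IP: connects immediately
nc -z -v -w 5 172.30.184.188 5432   # service IP works again, until the learned flows expire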

Let me know if you need any other information; this problem prevents us from enabling multi-tenant mode.

Thank you !

@barkbay
Author

barkbay commented Dec 21, 2015

On the VXLAN interface, the VNID is lost when the service IP (172.30.184.188) is used:

[root@lenaic-node-compute-1ba08 openshift]# tcpdump -vvv -nn -i eth0 udp|grep vni
192.168.55.233.59854 > 192.168.55.175.4789: [no cksum] VXLAN, flags [I] (0x08), vni 0
192.168.55.175.48585 > 192.168.55.233.4789: [no cksum] VXLAN, flags [I] (0x08), vni 13
192.168.55.233.59854 > 192.168.55.175.4789: [no cksum] VXLAN, flags [I] (0x08), vni 0
192.168.55.233.59854 > 192.168.55.175.4789: [no cksum] VXLAN, flags [I] (0x08), vni 0
[....]

If I use the pod IP (10.1.1.2), everything is OK:

[root@lenaic-node-compute-1ba08 openshift]# tcpdump -vvv -nn -i eth0 udp|grep vni
192.168.55.233.43037 > 192.168.55.175.4789: [no cksum] VXLAN, flags [I] (0x08), vni 13
192.168.55.175.44208 > 192.168.55.233.4789: [no cksum] VXLAN, flags [I] (0x08), vni 13
192.168.55.233.43037 > 192.168.55.175.4789: [no cksum] VXLAN, flags [I] (0x08), vni 13
[...]

I think this is the root of the problem.
Any ideas?

@rajatchopra
Contributor

'vnid 0' is expected because the service proxy has global access.
Not sure why the ARP learning takes so long that the connection times out.
/cc @danwinship

@barkbay
Author

barkbay commented Dec 21, 2015

Why do we have to wait for an ARP request?
Isn't it the purpose of table 6 to send the data to the right port?

cookie=0x0, duration=3756.657s, table=6, n_packets=82, n_bytes=6511, priority=200,ip,reg0=0 actions=goto_table:8
cookie=0x0, duration=3751.984s, table=6, n_packets=0, n_bytes=0, priority=100,ip,reg0=0xd,nw_dst=10.1.1.2 actions=output:4

Port 4 is the pod I want to connect to.
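
For reference, a hedged way to check which OVS port number maps to which interface (bridge name br0 assumed):

# List the bridge ports with their OpenFlow port numbers; port 4 should be the veth of the target pod
ovs-ofctl show br0
# Cross-check the interface names attached to the bridge
ovs-vsctl list-ports br0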

@danwinship
Contributor

Why do we have to wait for an ARP request?
Isn't it the purpose of table 6 to send the data to the right port?

Yes, but the pods don't know about that; they think they are connected to an Ethernet network, so the source pod can't send any data to the destination until it knows the destination's MAC address (since it has to write that address into the packet).
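
A rough way to see this from the node (the pod's PID is a placeholder; nsenter assumed to be available):

# Inspect the client pod's neighbour/ARP cache from the host
nsenter -t <client-pod-pid> -n ip neigh show | grep 10.1.1.2
# A REACHABLE or STALE entry means the sender writes the cached MAC into the frame
# without re-ARPing, even if OVS has already forgotten which port that MAC lives on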

@danwinship
Contributor

@barkbay Try editing /usr/bin/openshift-sdn-ovs-setup.sh on the node with the service; search for the rule that starts with "table=3" and ends with "goto_table:7", and change that to "goto_table:6" (and restart the node). Fixed?
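
Roughly, as a sketch (file path from the comment above; the node service name is inferred from the origin-node logs at the top of this issue):

# Locate the table=3 rule that ends in goto_table:7
grep -n 'table=3' /usr/bin/openshift-sdn-ovs-setup.sh
# Change goto_table:7 to goto_table:6 in that one rule, then restart the node
systemctl restart origin-node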

@barkbay
Author

barkbay commented Jan 6, 2016

@danwinship Thanks for the tip.
FYI, here is the fix I'm currently testing on my cluster: https://gist.github.com/barkbay/cf57b2b4022bbc2d02ba
So far it also seems to solve the problem.

I'll try your suggestion and update the issue.

@barkbay
Author

barkbay commented Jan 13, 2016

/usr/bin/openshift-sdn-ovs-setup.sh does not exist in OpenShift v1.1.0.1

@danwinship
Contributor

/usr/bin/openshift-sdn-ovs-setup.sh does not exist in OpenShift v1.1.0.1

Hm... the bug fixed in #236 was introduced after openshift-sdn-kube-subnet-setup.sh and openshift-sdn-multitenant-setup.sh were merged.

Looking at your patch, it is basically bypassing the ARP cache table. So if that fixes things, that suggests that the problem is that the sender still has the MAC address cached, so doesn't need to send an ARP, but OVS has dropped it from its cache. And looking at the rules, that definitely seems like it could happen...
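
A hedged way to confirm that mismatch on the client node (bridge name br0 is an assumption; the MAC is the one from the ARP reply captured above):

# The sender (here the node running the service proxy) still has the destination MAC cached...
ip neigh show | grep 10.1.1.2
# ...while the learned flow for that MAC has already expired from table 8
ovs-ofctl dump-flows br0 "table=8,dl_dst=02:42:0a:01:01:02"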

@barkbay
Author

barkbay commented Jan 20, 2016

Looking at your patch, it is basically bypassing the ARP cache table. So if that fixes things, that suggests that the problem is that the sender still has the MAC address cached, so doesn't need to send an ARP, but OVS has dropped it from its cache. And looking at the rules, that definitely seems like it could happen...

Congratulations 👍
Indeed, if I regularly delete the ARP entries on the hosts, the problem no longer appears.
I need a battle plan. Do you think my patch is an acceptable solution?
Let me know how I can help.
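
A rough sketch of that periodic workaround (crude, and a workaround only; it needs to run more often than the learn table's hard_timeout):

# Flush the host's neighbour entries so the next proxied connection has to re-ARP,
# which repopulates the learned flows in table 8
ip -s neigh flush all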

@danwinship
Contributor

Your patch leaves redundant rules, so it's not quite right. Between this and #239, it looks like we'll need some heavy rewriting of the OVS rules. I'm working on it.

@mmaier2

mmaier2 commented Jan 22, 2016

It seems we're facing the same problem on OpenShift Enterprise 3.0.2, between an OpenShift PHP pod running WordPress and a MySQL DB pod. At times - not really reproducible - WordPress complains that it cannot connect to MySQL (service via cluster IP).
And indeed, any connection (curl, telnet, etc.) to the cluster IP and port of the MySQL service leads to a timeout. But after a connection is made to the IP of the MySQL pod itself, the service cluster IP works again, i.e. the MySQL server responds on the cluster IP.
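
A hedged reproduction sketch of the sequence described above (the IPs are placeholders; 3306 is the default MySQL port):

# First attempt after an idle period: the service cluster IP times out
nc -z -v -w 5 <mysql-cluster-ip> 3306
# Connect to the pod IP directly...
nc -z -v -w 5 <mysql-pod-ip> 3306
# ...after which the cluster IP responds again
nc -z -v -w 5 <mysql-cluster-ip> 3306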
