IPIP mode does not support ECMP routing #4462

Open
JimmyMa opened this issue Mar 11, 2021 · 7 comments

Comments

@JimmyMa commented Mar 11, 2021

In my cluster, ipipMode is Always for the IP pool 198.19.0.0/16, as shown below:

apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: controlplane-services-cidr
spec:
  blockSize: 26
  cidr: 198.19.0.0/16
  disabled: true
  ipipMode: Always
  nodeSelector: all()
  vxlanMode: Never

When there are multiple next hops, the following routes are generated, and they do not use tunl0:

198.19.0.0/16 proto bird
	nexthop via 10.240.1.57 dev ens3 weight 1
	nexthop via 10.240.1.58 dev ens3 weight 1

When there is only one next hop, the following route is generated, and it does use tunl0:

198.19.0.0/16 via 10.240.3.49 dev tunl0 proto bird onlink
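
For reference, both kernel states above can be inspected directly on a node with iproute2 (the sample destination address below is illustrative):

# show the route installed for the pool CIDR, then resolve a sample address inside it
ip route show 198.19.0.0/16
ip route get 198.19.0.50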

Expected Behavior

When there are multiple next hops, I would expect routes that use tunl0 to be generated, like this:

198.19.0.0/16 proto bird
	nexthop via 10.240.1.57 dev tunl0 weight 1
	nexthop via 10.240.1.58 dev tunl0 weight 1
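
For reference, the route I am hoping for could be written by hand with iproute2, roughly as below. This is only a sketch using the addresses from my cluster; I have not verified that the kernel accepts or correctly encapsulates a multipath route over tunl0, which is exactly what I would like Calico/BIRD to program:

# sketch: desired multipath route over the IPIP device; onlink is needed because
# the BGP next hops are not on a tunl0 subnet
ip route replace 198.19.0.0/16 proto static \
	nexthop via 10.240.1.57 dev tunl0 onlink weight 1 \
	nexthop via 10.240.1.58 dev tunl0 onlink weight 1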

Context

I have two Kubernetes clusters, and each cluster has a node acting as a route reflector; the two route reflectors are peered with each other. Each cluster advertises its service CIDR to the other cluster, and I need all of that traffic to be IPIP-encapsulated.
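
For reference, the peering and service CIDR advertisement can be sketched with Calico BGP resources roughly as below; the peer address, AS number, resource names, and node label are illustrative, not my actual values:

apiVersion: projectcalico.org/v3
kind: BGPConfiguration
metadata:
  name: default
spec:
  # advertise this cluster's service CIDR over BGP
  serviceClusterIPs:
  - cidr: 198.19.0.0/16
---
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: other-cluster-rr
spec:
  # peer this cluster's route reflector node with the other cluster's route reflector
  nodeSelector: route-reflector == 'true'
  peerIP: 10.240.2.10
  asNumber: 64513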

Your Environment

  • Calico version: 3.17
  • Orchestrator version (e.g. kubernetes, mesos, rkt): kubernetes
  • Operating System and version: 4.15.0-109-generic #110-Ubuntu SMP Tue Jun 23 02:39:32 UTC 2020 x86_64 Linux
@caseydavenport (Member) commented:

@JimmyMa interesting. I think this scenario is a bit outside the set of use-cases that Calico typically handles, but it might be workable.

If I understand correctly, you have two clusters with the same Service CIDR, and you want to advertise ECMP routes for that Service CIDR so that traffic to a service IP is split equally between the two clusters?

If I had to guess, I would say we probably haven't implemented IPIP route programming for ECMP routes in our BIRD code, since I don't think we expected ECMP routing to ever occur for IPIP mode.

CC @neiljerram

@nelljerram (Member) commented:

@caseydavenport We have internal tracking for this at https://tigera.atlassian.net/browse/CNX-10379. I'm afraid @JimmyMa won't be able to see that directly, but the summary is exactly as you say: we made some Calico-specific patches to BIRD to handle IP-IP routes, and unfortunately those patches don't work in the ECMP case.

@JimmyMa (Author) commented Mar 25, 2021

@caseydavenport @neiljerram thank you for the comments. I think this change, projectcalico/confd#379, enabled ECMP in BIRD, but the routes it programs into the kernel are incorrect for ipipMode.

@nelljerram (Member) commented:

@JimmyMa Yes, that confd change enables BIRD to program ECMP routes into the kernel. But we are still missing some support in the BIRD code.

@nelljerram (Member) commented:

@JimmyMa I've created projectcalico/bird#90 (an issue in our BIRD fork repo) to publish everything that we know about why the ECMP + IP-IP combination does not work. Please take a look and feel free to comment or to contribute towards possible solutions.

@caseydavenport caseydavenport changed the title The generated route is not using tunl0 if there are multiple next hops when ipipMode is Always IPIP mode does not support ECMP routing Sep 21, 2021
@nelljerram (Member) commented:

@JimmyMa I have been thinking about this problem again, and I am wondering whether the apparent problem is in fact solved by the routing for 10.240.3.49. In other words, if there is a single-path route for the pod block

198.19.0.0/16 via 10.240.3.49 dev tunl0 proto bird onlink

but there is an ECMP route to get to 10.240.3.49, such as

10.240.3.49 proto bird
	nexthop via 10.240.1.57 dev ens3 weight 1
	nexthop via 10.240.1.58 dev ens3 weight 1

then perhaps we would see repeated connections to a pod in 198.19.0.0/16 (with different source ports, and assuming fib_multipath_hash_policy=1) using both underlying ECMP paths, as a result of the 2-step routing resolution.

If so, doesn't that mean that the single path IPIP route here is actually fine?
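
A rough way to see that two-step resolution on a node would be the following (the sample destination address is illustrative):

# hash multipath routes on the 5-tuple rather than on source/destination IP only
sysctl -w net.ipv4.fib_multipath_hash_policy=1

# first step: the pool route resolves to tunl0 and the BGP next hop
ip route get 198.19.0.50

# second step: the route to that next hop is the ECMP one over the physical NICs
ip route get 10.240.3.49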

@nelljerram (Member) commented:

I tested this last week, although with a VXLAN overlay instead of IPIP, and it broadly appears to work as suggested in my previous comment.

I created a test server pod running

nc -l -k 10.244.195.197 8888

and a test client pod running

for sp in 30001 30002 30003 30004 30005 30006 30007 30008 30009; do echo hello$sp | nc -N -p $sp 10.244.195.197 8888; done

and used tcpdump to observe traffic through the two NICs of the client pod's node

tcpdump -i eth0 -n -v  udp port 4789 and dst 172.31.20.3
tcpdump -i eth1 -n -v  udp port 4789 and dst 172.31.20.3

10.244.195.197 is the IP of the server pod and 172.31.20.3 is the stable IP of the server pod's node.

Good observations:

  1. tcpdumps showed that both NICs were being used.
  2. When I disabled eth0 on the source node and repeated the test, all the connections succeeded using eth1.

However, I expected that the connection for a given source port would reliably use the same NIC for all of its outbound packets, and that was not the case. Instead, for example, I saw the outbound SYN go through eth1, but then the outbound SYN-ACK went through eth0, and the data packet for that connection also went through eth0.

More research is needed to understand why that happens, instead of seeing a reliable association from 4-tuple to NIC.
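
One follow-up check I have in mind (a sketch only; the node address is from the test above and the outer source port is made up) is to ask the kernel which path it would pick for a given outer flow and compare that with what tcpdump shows. With fib_multipath_hash_policy=1, and a reasonably recent kernel and iproute2, the choice should depend only on the outer 5-tuple:

# the VXLAN outer UDP source port is derived from the inner flow, so the same
# inner connection should always hash to the same NIC; 54321 stands in for the
# port that the encapsulation actually chose
ip route get 172.31.20.3 ipproto udp sport 54321 dport 4789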
