Submariner doesn't work with Calico CNI #407

Closed
roytman opened this issue Feb 29, 2020 · 18 comments

@roytman
Contributor

roytman commented Feb 29, 2020

Calico CNI supports several networking modes.

  • First, I checked the IP in IP or VXLAN encapsulation.

After a successful Submariner installation (with disable-nat), I was able to ping from a pod colocated with the Submariner gateway to pods on another cluster, but was not able to ping services.
Pings from a pod that is not colocated with the GW reached neither pods nor services on the other cluster.

  • After that I checked the BGP peering mode.
    In this case I was able to ping only from a pod colocated with a GW to pods colocated with another GW on the second cluster.

Looks like Calico puts its rules before submariner's in iptables.

Chain POSTROUTING (policy ACCEPT)
target     prot opt source               destination         
cali-POSTROUTING  all  --  anywhere             anywhere             /* cali:O3lYWMrLQYEMJtB5 */
SUBMARINER-POSTROUTING  all  --  anywhere             anywhere            
KUBE-POSTROUTING  all  --  anywhere             anywhere             /* kubernetes postrouting rules */
MASQUERADE  all  --  172.17.0.0/16        anywhere  
roytman changed the title from "Calico" to "Submariner doesn't work with Calico CNI" on Feb 29, 2020
@roytman
Contributor Author

roytman commented Feb 29, 2020

Possibly related to #272

@paurosello

Did you get around this? I am facing the same problem: connecting to services works fine even if the pods are on non-gateway nodes, but pod-to-pod communication fails if the destination pod is not on the gateway node.

@roytman
Contributor Author

roytman commented Mar 4, 2020

Not yet. In the meantime we have moved to another CNI; we will probably return to Calico later.

@caseydavenport

Hm, yeah Calico is pretty aggressive in inserting its rules at the top, in order to prevent other applications from bypassing network policy enforcement, which would be bad.

I'm not too familiar with submariner - what do the submariner rules do / look like? We might be able to make changes to make these two play nicer together.

@dfarrell07
Member

Welcome @caseydavenport! /cc @sridhargaddam @mangelajo

@deanlorenz
Contributor

I had problems disabling Calico NAT only for cross-cluster communication. What seems to be needed:

  1. Be able to tell Calico that there is a direct peer on the other cluster, so although traffic is external (and normally NATed) it should bypass some of the Calico rules:

    • Do not SNAT/DNAT at all -- keep pod (private) IP as source
    • Allow traffic from other cluster (via tunnel) to be forwarded to internal pods
  2. Route cross-cluster communication through a specific GW node. Currently submariner attempts to intercept this traffic and tunnel to GW, but Calico prevents that

  3. Do custom NAT for cross-cluster

    • SNAT/DNAT to global IP space if cluster CIDRs overlap
    • use internal cluster IPs as source IP for pods on host network (so the other cluster will see it as cross-cluster communication rather than external)

@sridhargaddam
Member

Basically, in order to support cross-cluster communication and to preserve the source IP of the pods across clusters, Submariner programs certain iptables rules (please find the list of rules in the attached file). Submariner requires these rules to be applied before the rules programmed by the respective CNI. Also, FYI, the rules programmed by Submariner only affect inter-cluster traffic and do not affect local cluster traffic.

So, when Submariner is first deployed, it installs certain rules ahead of the existing rules programmed by the CNI.

Example:
Chain POSTROUTING (policy ACCEPT 2 packets, 120 bytes)
num pkts bytes target prot opt in out source destination
1 8537 576K SUBMARINER-POSTROUTING all -- * * 0.0.0.0/0 0.0.0.0/0
2 16254 1039K cali-POSTROUTING all -- * * 0.0.0.0/0 0.0.0.0/0
3 1685K 110M KUBE-POSTROUTING all -- * * 0.0.0.0/0 0.0.0.0/0

However, after a short time, the SUBMARINER-POSTROUTING rule ceases to be the first rule in the POSTROUTING chain, and this breaks the Submariner use cases.

Chain POSTROUTING (policy ACCEPT 2 packets, 120 bytes)
num pkts bytes target prot opt in out source destination
1 16254 1039K cali-POSTROUTING all -- * * 0.0.0.0/0 0.0.0.0/0
2 8537 576K SUBMARINER-POSTROUTING all -- * * 0.0.0.0/0 0.0.0.0/0
3 1685K 110M KUBE-POSTROUTING all -- * * 0.0.0.0/0 0.0.0.0/0
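
To make the reordering easy to spot, here is a minimal sketch that parses listings in the format shown above and reports whether SUBMARINER-POSTROUTING is still the first jump target. The function names are hypothetical and the parser only assumes the column layout of the two listings in this comment (`num pkts bytes target ...`); it is not part of Submariner or Calico.

```python
def chain_order(listing):
    """Return the jump targets in rule order from a numbered iptables listing."""
    targets = []
    for line in listing.splitlines():
        fields = line.split()
        # Rule rows start with a rule number; the "Chain ..." and header rows do not.
        if len(fields) >= 4 and fields[0].isdigit():
            targets.append(fields[3])  # columns: num, pkts, bytes, target
    return targets


def submariner_is_first(listing):
    """True if SUBMARINER-POSTROUTING is the first rule in the listed chain."""
    order = chain_order(listing)
    return bool(order) and order[0] == "SUBMARINER-POSTROUTING"


before = """\
Chain POSTROUTING (policy ACCEPT 2 packets, 120 bytes)
num pkts bytes target prot opt in out source destination
1 8537 576K SUBMARINER-POSTROUTING all -- * * 0.0.0.0/0 0.0.0.0/0
2 16254 1039K cali-POSTROUTING all -- * * 0.0.0.0/0 0.0.0.0/0
3 1685K 110M KUBE-POSTROUTING all -- * * 0.0.0.0/0 0.0.0.0/0
"""

after = """\
Chain POSTROUTING (policy ACCEPT 2 packets, 120 bytes)
num pkts bytes target prot opt in out source destination
1 16254 1039K cali-POSTROUTING all -- * * 0.0.0.0/0 0.0.0.0/0
2 8537 576K SUBMARINER-POSTROUTING all -- * * 0.0.0.0/0 0.0.0.0/0
3 1685K 110M KUBE-POSTROUTING all -- * * 0.0.0.0/0 0.0.0.0/0
"""

print(submariner_is_first(before))  # True
print(submariner_is_first(after))   # False
```

Feeding it the output captured at two points in time on the same node should show the flip from True to False once Calico re-inserts its rule at the top.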

iptables-save.txt

@sridhargaddam
Member

> Hm, yeah Calico is pretty aggressive in inserting its rules at the top, in order to prevent other applications from bypassing network policy enforcement, which would be bad.
>
> I'm not too familiar with submariner - what do the submariner rules do / look like? We might be able to make changes to make these two play nicer together.

Thank you @caseydavenport for looking into it. Please see my comment above. Also, if you are interested in taking a look at the Submariner source code, here is the link:

func (r *Controller) programIptableRulesForInterClusterTraffic(remoteCidrBlock string) error {

@caseydavenport

-A SUBMARINER-POSTROUTING -s 240.0.0.0/8 -o vx-submariner -j SNAT --to-source 10.242.0.1
-A SUBMARINER-POSTROUTING -s 10.242.0.0/16 -d 100.93.0.0/16 -j ACCEPT
-A SUBMARINER-POSTROUTING -s 100.93.0.0/16 -d 10.242.0.0/16 -j ACCEPT
-A SUBMARINER-POSTROUTING -s 10.242.0.0/16 -d 10.243.0.0/16 -j ACCEPT
-A SUBMARINER-POSTROUTING -s 10.243.0.0/16 -d 10.242.0.0/16 -j ACCEPT

Right, so in the example file you gave above it looks like these are the rules that submariner programs (more or less, probably varies a bit by cluster).

The first seems to SNAT the cross cluster traffic, like you're describing. This is probably OK - it's only performed on the encapsulated packet, leaving the original pod source IP intact, IIUC?

The other rules afterwards appear to be allowing traffic from other clusters that are destined to local cluster CIDRs - is that right? Those ones are more worrying to me, because if they are put in front of Calico's rules they will accept all pod-to-pod traffic, bypassing Calico's policy enforcement. What are those rules for?

@sridhargaddam
Member

Sorry, @caseydavenport, I missed your comment. Please see below.

When you deploy a KIND-based setup with Submariner, it deploys three clusters: one as an independent broker (cluster1) and the other two as data clusters (cluster2 and cluster3).

Cluster2 network details:
  Network plugin:  weave-net
  Service CIDRs: [100.92.0.0/16]
  POD/Cluster CIDRs: [10.242.0.0/16]

Cluster3 network details:
   Network plugin:  weave-net
   Service CIDRs: [100.93.0.0/16]
   POD/Cluster CIDRs: [10.243.0.0/16]

Submariner programs the following rules in the SUBMARINER-POSTROUTING chain of the nat table on Cluster2 nodes.

Chain SUBMARINER-POSTROUTING (1 references)
num   pkts bytes target     prot opt in     out     source               destination         
1        0     0 SNAT       all  --  *      vx-submariner  240.0.0.0/8   0.0.0.0/0     to:10.242.224.0
2        0     0 ACCEPT     all  --  *      *       10.242.0.0/16        100.93.0.0/16       
3        0     0 ACCEPT     all  --  *      *       10.242.0.0/16        10.243.0.0/16
4        0     0 ACCEPT     all  --  *      *       100.93.0.0/16        10.242.0.0/16       
<SNIP> 

In the above SUBMARINER-POSTROUTING chain, rule 1 supports connectivity from the host itself (or a pod that uses HostNetworking=true) to a remote service in another Submariner-connected cluster.
For more details, please see this issue: #298

Rules 2, 3, and 4 mainly allow inter-cluster traffic (i.e., pod to pod or pod to remote service) between the Submariner-connected clusters.
Submariner preserves the source IP of inter-cluster traffic.
As you can see, the source/destination IPs correspond to the pod/service CIDRs of the connected clusters.
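
As a rough illustration of what the ACCEPT rules match, the sketch below models rules 2 to 4 from the listing above as (source CIDR, destination CIDR) pairs and checks a few example packets against them. The helper name and the sample addresses are made up for illustration, and the rule list is only the part shown before `<SNIP>`, so treat it as a partial model rather than the full chain.

```python
import ipaddress

# Rules 2-4 of SUBMARINER-POSTROUTING on Cluster2, as (source, destination) CIDRs.
ACCEPT_RULES = [
    ("10.242.0.0/16", "100.93.0.0/16"),  # local pods -> remote services
    ("10.242.0.0/16", "10.243.0.0/16"),  # local pods -> remote pods
    ("100.93.0.0/16", "10.242.0.0/16"),  # remote service range -> local pods
]


def accepted_by_submariner(src, dst):
    """Return True if the (src, dst) pair matches one of the ACCEPT rules."""
    s = ipaddress.ip_address(src)
    d = ipaddress.ip_address(dst)
    return any(
        s in ipaddress.ip_network(rule_src) and d in ipaddress.ip_network(rule_dst)
        for rule_src, rule_dst in ACCEPT_RULES
    )


print(accepted_by_submariner("10.242.1.5", "10.243.7.9"))   # True: local pod to remote pod
print(accepted_by_submariner("10.242.1.5", "100.93.0.10"))  # True: local pod to remote service
print(accepted_by_submariner("10.242.1.5", "10.242.9.9"))   # False: traffic inside the local cluster
print(accepted_by_submariner("10.242.1.5", "8.8.8.8"))      # False: traffic leaving for the internet
```

A packet that hits ACCEPT in a nat-table chain stops traversing that chain's remaining rules, which is why these entries keep later MASQUERADE rules from rewriting the pod source IP, and also why their position relative to the CNI's rules matters.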

Regarding network policies: as you know, not all CNIs support NetworkPolicies. We verified Submariner with Kubernetes NetworkPolicies on Weave Net as well as OpenShift SDN, and it works as expected.
The iptables rules programmed by Submariner do not bypass the NetworkPolicy rules (at least with Weave Net and OpenShift SDN).

Let me take Weave Net as an example. In Weave, NetworkPolicies are translated to iptables rules in the filter table, under the FORWARD chain (e.g., WEAVE-NPC-INGRESS).
Since the FORWARD chain is hit before SUBMARINER-POSTROUTING, NetworkPolicies work as expected.
Does Calico program policy enforcement rules similarly to Weave, or is it totally different?

iptables rules with Weave Net:

Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
num   pkts bytes target     prot opt in     out     source               destination         
1       36 12631 ACCEPT     all  --  *      vx-submariner  0.0.0.0/0            0.0.0.0/0           
2      486 87079 WEAVE-NPC-EGRESS  all  --  weave  *       0.0.0.0/0            0.0.0.0/0            /* NOTE: this must go before '-j KUBE-FORWARD' */
3      408  164K WEAVE-NPC  all  --  *      weave   0.0.0.0/0            0.0.0.0/0            /* NOTE: this must go before '-j KUBE-FORWARD' */
4       10   600 NFLOG      all  --  *      weave   0.0.0.0/0            0.0.0.0/0            state NEW nflog-group 86
<SNIP>

Chain WEAVE-NPC (1 references)
num   pkts bytes target     prot opt in     out     source               destination         
1      374  162K ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0            state RELATED,ESTABLISHED
2        0     0 ACCEPT     all  --  *      *       0.0.0.0/0            224.0.0.0/4         
3       13   802 ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0            PHYSDEV match --physdev-out vethwe-bridge --physdev-is-bridged
4       21  1260 WEAVE-NPC-DEFAULT  all  --  *      *       0.0.0.0/0            0.0.0.0/0            state NEW
5       17  1020 WEAVE-NPC-INGRESS  all  --  *      *       0.0.0.0/0            0.0.0.0/0            state NEW

Chain WEAVE-NPC-INGRESS (1 references)
num   pkts bytes target     prot opt in     out     source               destination         
1        3   180 ACCEPT     tcp  --  *      *       10.243.224.1         0.0.0.0/0            match-set weave-KN[_+Gl.dlb1q$;v4h!E_Sg)( dst tcp dpt:80 /* cidr: 10.243.224.1/32 -> pods: namespace: default, selector: run=nginx (ingress) */
2        2   120 ACCEPT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            match-set weave-jn=@U_tr~?=n2^eQx*wmb{|o2 src match-set weave-KN[_+Gl.dlb1q$;v4h!E_Sg)( dst tcp dpt:80 /* pods: namespace: default, selector: run=netshoot-2-1 -> pods: namespace: default, selector: run=nginx (ingress) */
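
The ordering argument above (filter FORWARD runs before nat POSTROUTING) can be written down as a tiny sketch. The list is a simplified view of the standard netfilter traversal for a packet forwarded through a node; it leaves out the security table and conntrack, and the helper name is hypothetical.

```python
# Simplified netfilter traversal for a forwarded packet, as (table, chain) pairs.
FORWARDED_PATH = [
    ("raw", "PREROUTING"),
    ("mangle", "PREROUTING"),
    ("nat", "PREROUTING"),
    ("mangle", "FORWARD"),
    ("filter", "FORWARD"),      # WEAVE-NPC policy chains hang off this hook
    ("mangle", "POSTROUTING"),
    ("nat", "POSTROUTING"),     # SUBMARINER-POSTROUTING hangs off this hook
]


def runs_before(first, second):
    """True if hook `first` is traversed before hook `second` for forwarded traffic."""
    return FORWARDED_PATH.index(first) < FORWARDED_PATH.index(second)


print(runs_before(("filter", "FORWARD"), ("nat", "POSTROUTING")))  # True
```

So a packet dropped by a policy rule in the filter table never reaches the ACCEPT entries in SUBMARINER-POSTROUTING, at least for CNIs that enforce policy in FORWARD.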

@nyechiel
Member

@sridhargaddam @caseydavenport can you share what is the latest status of this issue? We just had one more user asking about Calico support on Slack. Thanks!

@sridhargaddam
Member

@caseydavenport, let me know if you need any additional info regarding Submariner. I am (along with the Submariner team) available on the #submariner Slack channel on Kubernetes. If required, we can also have a BlueJeans session. Thanks.

@caseydavenport

Really sorry for the long delay on this one guys... too much going on for as long as I can remember 😅

> Does Calico program policy enforcement rules similar to weave or is it totally different?

No, you're correct here in that Calico does most of its policy enforcement in the FILTER table when running with the iptables dataplane driver. So at that level it is fairly similar to how weavenet is programming its policy.

There are three classes of traffic that Calico handles in the cali-POSTROUTING chain.

  1. cali-fip-snat - Calico's "floating IPs" feature - typically will do nothing unless you've explicitly enabled this via an annotation, so not what we're hitting here. Additionally, this is only needed for pods that end up sending traffic to themselves via such a floating IP (described more here).

  2. cali-nat-outgoing - rules which match traffic sourced from pods within the cluster to destinations outside of the cluster and performs NAT. Rules look something like this:

    [1:60] -A cali-nat-outgoing -m comment --comment "cali:flqWnvo8yq4ULQLa" -m set --match-set cali40masq-ipam-pools src -m set ! --match-set cali40all-ipam-pools dst -j MASQUERADE
    

    Whether or not this occurs is configured on Calico's IP pool API via the NAT outgoing configuration option on particular CIDRs. Calico won't perform NAT on traffic destined to CIDRs that it is told about via this IP pool API.

  3. Tunnel specific NAT - Calico will SNAT locally originated packets that are being sent down Calico's own VXLAN / IPIP / Wireguard tunnels to ensure the correct source address is used. More on that in the code comment here.

Given that:

  • the cali-fip-snat rules (1 above) are not relevant here I think, and probably don't even exist on the cluster.
  • the cali-nat-outgoing rules (2 above) should only apply to packets from within a Calico IP pool destined to something outside of a Calico IP pool and will masquerade the traffic. This could occur for packets from pods in the local cluster to pods/services in another cluster connected via Submariner, and is probably not desired. You could try creating disabled Calico IP pools which include the CIDRs of the other clusters to see if that mitigates this rule, as a quick fix. Disabling them prevents them from being used for local pods.
  • the tunnel specific NAT (3 above), which I think is also not relevant here because it should only match traffic from LOCAL addresses that are destined out the Calico tunnel interface(s) - in other words, to a Kubernetes pod.
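
To make the cali-nat-outgoing condition concrete, here is a minimal model of the match in the rule quoted above: source in a NAT-outgoing pool and destination outside every known pool. The function name and sample addresses are hypothetical, the pool values reuse the KIND example CIDRs from earlier in the thread, and it assumes (as the suggestion above implies) that disabled pools still count toward the "all pools" set.

```python
import ipaddress


def would_masquerade(src, dst, masq_pools, all_pools):
    """Model of cali-nat-outgoing: src in a masq pool AND dst not in any pool."""
    s = ipaddress.ip_address(src)
    d = ipaddress.ip_address(dst)
    src_in_masq_pool = any(s in ipaddress.ip_network(p) for p in masq_pools)
    dst_in_any_pool = any(d in ipaddress.ip_network(p) for p in all_pools)
    return src_in_masq_pool and not dst_in_any_pool


# Assumed local pod pool with NAT outgoing enabled.
masq_pools = ["10.242.0.0/16"]
all_pools = ["10.242.0.0/16"]

# Without a pool for the remote cluster, cross-cluster traffic is masqueraded.
print(would_masquerade("10.242.1.5", "10.243.7.9", masq_pools, all_pools))  # True

# Adding (disabled) pools for the remote pod and service CIDRs suppresses the NAT.
with_remote = all_pools + ["10.243.0.0/16", "100.93.0.0/16"]
print(would_masquerade("10.242.1.5", "10.243.7.9", masq_pools, with_remote))   # False
print(would_masquerade("10.242.1.5", "100.93.0.10", masq_pools, with_remote))  # False
```

This is only a model of the ipset match, not Calico code, but it shows why telling Calico about the remote CIDRs should be enough to keep the pod source IP intact.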

Does that make sense? Am I missing something? I joined the #submariner channel in Slack so we can continue the conversation there if you'd like!

@sridhargaddam
Member

> Really sorry for the long delay on this one guys... too much going on for as long as I can remember

No problem @caseydavenport and thank you for the detailed info.

> Does that make sense? Am I missing something? I joined the #submariner channel in Slack so we can continue the conversation there if you'd like!

Yes, it makes sense and we can give it a try to see if it works.
Let's take a simple example of two clusters with the following configuration.
[Screenshot from 2020-08-14 23-28-37: configuration of the two example clusters]

So, if I understood your suggestion properly, you are suggesting that we can create the following IPPool on WestCluster for allowing connectivity to Services in EastCluster (similarly on the East cluster). Is this correct?
In case something is missing/wrong, please let us know. Thanks.

cat > remoteClusterSubnets.yaml <<EOF
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: remotesubnets
spec:
  cidr: 100.93.0.0/16
  natOutgoing: false
  disabled: true
EOF

calicoctl create -f remoteClusterSubnets.yaml
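
If there are several remote CIDRs to cover (pod and service ranges per remote cluster), the same manifest can be generated rather than hand-written. This is a minimal sketch: the pool names and the 10.243.0.0/16 pod CIDR are assumptions for illustration (only 100.93.0.0/16 comes from the manifest above), and it assumes calicoctl accepts several resources in one file as separate YAML documents.

```python
import ipaddress

IPPOOL_TEMPLATE = """\
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: {name}
spec:
  cidr: {cidr}
  natOutgoing: false
  disabled: true
"""


def remote_ippools(cidrs):
    """Render one disabled IPPool document per remote CIDR as a YAML stream."""
    docs = []
    for name, cidr in cidrs.items():
        ipaddress.ip_network(cidr)  # raises ValueError on a malformed CIDR
        docs.append(IPPOOL_TEMPLATE.format(name=name, cidr=cidr))
    return "---\n".join(docs)


# Hypothetical names; the pod CIDR is assumed, not taken from the screenshot.
rendered = remote_ippools({
    "east-services": "100.93.0.0/16",
    "east-pods": "10.243.0.0/16",
})
print(rendered)
```

The printed output can be saved to a file and passed to `calicoctl create -f`, the same way as the single-pool manifest above.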

@caseydavenport

@sridhargaddam yes, that IP pool looks right to me! Assuming my understanding of the issue is correct, that should configure Calico to stop performing NAT on traffic destined to pods in the other cluster.

@sridhargaddam sridhargaddam self-assigned this Aug 17, 2020
@mangelajo mangelajo added this to the 0.6.0 milestone Aug 21, 2020
@sridhargaddam
Member

@caseydavenport, as discussed, I configured the necessary IPPools with the remote CIDRs in two KIND Calico clusters joined with Submariner, and I can see that a pod in the West cluster is able to talk to a pod/service in the East cluster :-)
Also, as expected, the IP address of the source pod is preserved when the traffic reaches the destination pod.
Thanks a lot, @caseydavenport for all the inputs.

However, when I tried to deploy Submariner Globalnet (a feature that allows joining clusters with overlapping CIDRs) with the default globalnet-cidr, which is 169.254.0.0/16, I was not able to create the IPPool with the 169.254.0.0/16 CIDR. I get the following error:

"Failed to create 'IPPool' resource: [error with field IPPool.Spec.CIDR = '169.254.0.0/19' (IPPool CIDR overlaps with IPv4 Link Local range 169.254.0.0/16)]"

This is not a major issue since we can override the --globalnet-cidr (to non-IPv4 LLA) when installing Submariner. But I'm interested to know if there is a way to allow this in Calico. Thanks once again.
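
For anyone picking an override value, a quick pre-check like the sketch below can catch the link-local overlap before calicoctl does. It only mirrors the one condition named in the error message (overlap with 169.254.0.0/16), not the rest of Calico's IPPool validation; the function name and the 100.64.0.0/16 candidate are arbitrary examples, not recommendations.

```python
import ipaddress

LINK_LOCAL_V4 = ipaddress.ip_network("169.254.0.0/16")


def outside_link_local(cidr):
    """True if the CIDR does not overlap the IPv4 link-local range."""
    return not ipaddress.ip_network(cidr).overlaps(LINK_LOCAL_V4)


print(outside_link_local("169.254.0.0/16"))  # False: the default globalnet-cidr
print(outside_link_local("169.254.0.0/19"))  # False: the CIDR from the error above
print(outside_link_local("100.64.0.0/16"))   # True: an arbitrary non-link-local candidate
```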

@caseydavenport

Hm, interesting. Calico uses some link-local addresses itself so it prevents them from being allocated as pod IPs.

I'll do some more reading on the Globalnet functionality, but for now selecting a different CIDR seems like a reasonable workaround.

@nyechiel
Member

nyechiel commented Oct 7, 2020

This was tested with v0.6, see https://submariner.io/deployment/calico/
