Support Calico CNI for ambient #40973

Closed
irisdingbj opened this issue Sep 14, 2022 · 16 comments
Labels
Ambient Beta (Must have for Beta of Ambient Mesh), area/ambient (Issues related to ambient mesh)

Comments

@irisdingbj
Member

irisdingbj commented Sep 14, 2022

Bug Description

  1. Install the ambient profile on a single-node Kubernetes cluster:
NAME                                    READY   STATUS    RESTARTS   AGE
istio-cni-node-7h47k                    1/1     Running   0          5d18h
istio-ingressgateway-77ff84cdf6-2mdbp   1/1     Running   0          5d18h
istiod-576df488d5-j857h                 1/1     Running   0          5d18h
ztunnel-hd4wj                           1/1     Running   0          5d18h
  2. Deploy the sample applications in the default namespace, following the blog post:
$ kubectl apply -f samples/bookinfo/platform/kube/bookinfo.yaml
$ kubectl apply -f https://raw.githubusercontent.com/linsun/sample-apps/main/sleep/sleep.yaml
$ kubectl apply -f https://raw.githubusercontent.com/linsun/sample-apps/main/sleep/notsleep.yaml
  3. Before the namespace joins the ambient mesh, the sleep pod can reach productpage without issue:
$ kubectl exec deploy/sleep -- curl -s http://productpage:9080/ | head -n1
<!DOCTYPE html>
  4. Add the default namespace to the ambient mesh:
$ kubectl label namespace default istio.io/dataplane-mode=ambient
namespace/default labeled
  5. The sleep pod can no longer reach productpage by service name:
$ kubectl exec deploy/sleep -- curl -s -vvv http://productpage:9080/ | head -n1
* Could not resolve host: productpage
* Closing connection 0
command terminated with exit code 6
  6. The sleep pod cannot reach productpage by pod IP either:
$ kubectl exec deploy/sleep -- curl -s -vvv http://172.16.138.12:9080/
*   Trying 172.16.138.12:9080...
* connect to 172.16.138.12 port 9080 failed: Operation timed out
* Failed to connect to 172.16.138.12 port 9080: Operation timed out
* Closing connection 0
command terminated with exit code 28

Version

istioctl version
client version: 0.0.0-ambient.191fe680b52c1754ee72a06b3e0d3f9d116f2e82
control plane version: 0.0.0
data plane version: 0.0.0-ambient.191fe680b52c1754ee72a06b3e0d3f9d116f2e82 (2 proxies)

Additional Information

No response

@istio-policy-bot added the area/ambient label Sep 14, 2022
@irisdingbj
Member Author

Thanks to @howardjohn, who pointed out to me that this happens because the Calico CNI reverts the iptables rules.

@howardjohn changed the title from "Pods on the same node CAN NOT communicate when joined ambient mesh" to "Support Calico CNI for ambient" Sep 15, 2022
@PlatformLC
Contributor

@dhawton, is there any detail or status update on supporting Calico or another bridging CNI plugin besides the kind CNI? Any materials or information you can share would be appreciated.

@dhawton
Member

dhawton commented Oct 17, 2022

> @dhawton, is there any detail or status update on supporting Calico or another bridging CNI plugin besides the kind CNI? Any materials or information you can share would be appreciated.

Not at this point. We do support the CNIs in GKE (non-Calico and non-DPv2) and EKS... but I do not have anything more I can share at this moment.

@dcw329

dcw329 commented Nov 25, 2022

I saw a note that GKE with Calico does not work. I believe the default deployment uses Calico. Is there an alternative that should be used in dev to test ambient?

@howardjohn
Member

howardjohn commented Nov 25, 2022 via email

@istio-policy-bot added the lifecycle/stale label Feb 24, 2023
@dhawton removed the lifecycle/stale label Feb 24, 2023
@istio-policy-bot added the lifecycle/stale label Feb 24, 2023
@dhawton added the lifecycle/staleproof label Feb 24, 2023
@istio-policy-bot removed the lifecycle/stale label Feb 24, 2023
@howardjohn removed the P1 label Jul 18, 2023
@hzxuzhonghu added the Ambient Beta label and removed the lifecycle/staleproof label Sep 12, 2023
@istio-policy-bot added the lifecycle/automatically-closed label Sep 12, 2023
@dhawton removed the lifecycle/automatically-closed label Sep 12, 2023
@dhawton reopened this Sep 12, 2023
@DANic-git

Hello,

Is there any way to use ambient with calico?

@chenwng

chenwng commented Oct 17, 2023

> Hello,
>
> Is there any way to use ambient with calico?

The following may help.

  • Upgrade Calico to the latest version.
  • Set chainInsertMode to Append and workloadSourceSpoofing to Any in the Calico FelixConfiguration.
  • Add the annotation cni.projectcalico.org/allowedSourcePrefixes: '["0.0.0.0/0"]' to your workload.
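
For reference, steps 2 and 3 could be applied roughly as follows. This is a minimal sketch, not a verified procedure: it assumes the cluster-wide FelixConfiguration is named default and can be patched through kubectl, and it uses the sleep deployment in the default namespace from the issue description as a stand-in for your own workload. The run helper and the APPLY variable are made up for this example; by default the script only prints the commands.

```shell
# Sketch only. Assumptions: the cluster-wide FelixConfiguration is named "default",
# and deploy/sleep in the "default" namespace stands in for your own workload.

# Step 2: merge patch that sets both Felix fields named in the list above.
FELIX_PATCH='{"spec":{"chainInsertMode":"Append","workloadSourceSpoofing":"Any"}}'

# Step 3: the annotation value is a JSON list encoded as a string,
# set on the pod template so that the pods themselves carry it.
ANNOTATION_PATCH='{"spec":{"template":{"metadata":{"annotations":{"cni.projectcalico.org/allowedSourcePrefixes":"[\"0.0.0.0/0\"]"}}}}}'

# Hypothetical helper: print each command unless APPLY=1 is set.
run() {
  if [ "${APPLY:-0}" = "1" ]; then
    "$@"
  else
    printf 'would run: %s\n' "$*"
  fi
}

run kubectl patch felixconfiguration default --type merge -p "$FELIX_PATCH"
run kubectl -n default patch deployment sleep --type merge -p "$ANNOTATION_PATCH"
```

Set APPLY=1 to run the commands against the current kube context. Patching the pod template triggers a rollout of the deployment, so the pods are recreated with the annotation.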

@WoodyWoodsta

I'm attempting to layer Istio over Calico in my clusters. Unfortunately, Calico only supports Istio 1.15, and it appears that ambient would remove a lot of the knitting needed to integrate the two technologies.

Would be great if ambient supported the current calico project!

@KfreeZ
Contributor

KfreeZ commented Oct 24, 2023

> The following may help.
>
>   • Upgrade Calico to the latest version.
>   • Set chainInsertMode to Append and workloadSourceSpoofing to Any in the Calico FelixConfiguration.
>   • Add the annotation cni.projectcalico.org/allowedSourcePrefixes: '["0.0.0.0/0"]' to your workload.

It works for me. I see the annotation is included in https://github.com/istio/istio/blob/master/manifests/charts/ztunnel/templates/daemonset.yaml, so this step is not mandatory for the latest release.

@chenwng

chenwng commented Oct 24, 2023

> It works for me. I see the annotation is included in https://github.com/istio/istio/blob/master/manifests/charts/ztunnel/templates/daemonset.yaml, so this step is not mandatory for the latest release.

Do you mean the 3rd step is not necessary?

The issue I encountered was not with the ztunnel pods but with my workloads. Without the annotation on my workloads, egress traffic from my workload pods was dropped by the reverse-path filter (RPF) check. I believe ip rule 103 and the routes in table 100 were causing this RPF failure, so I had to add the annotation to my workloads to bypass Calico's RPF check.

ip rule 
0:      from all lookup local
100:    from all fwmark 0x200/0x200 goto 32766
101:    from all fwmark 0x100/0x100 lookup 101
102:    from all fwmark 0x40/0x40 lookup 102
103:    from all lookup 100
...
ip route sh table 100 
192.168.1.69 dev cali30566341ea4 scope link 
192.168.1.76 via 192.168.126.2 dev istioin src 172.18.0.5

I suppose ip rule 103 should be modified, as it makes all egress traffic from pods in ambient mode fail the RPF check.
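
One way to see which policy-routing rules apply to every packet, regardless of fwmark, is to filter the ip rule output. The sketch below runs such a filter over the rule list pasted above (the lines elided by "..." are left out). The catch_all_rules helper is a made-up name for this example. It only lists the rules and does not by itself confirm the RPF hypothesis.

```shell
# ip rule output copied from the comment above; the elided "..." lines are omitted.
RULES='0:      from all lookup local
100:    from all fwmark 0x200/0x200 goto 32766
101:    from all fwmark 0x100/0x100 lookup 101
102:    from all fwmark 0x40/0x40 lookup 102
103:    from all lookup 100'

# Hypothetical helper: print "<priority> <table>" for rules of the form
# "from all lookup <table>", i.e. rules with no fwmark or other selector.
catch_all_rules() {
  awk '$2 == "from" && $3 == "all" && $4 == "lookup" { sub(":", "", $1); print $1, $5 }'
}

printf '%s\n' "$RULES" | catch_all_rules
# prints:
# 0 local
# 103 100
```

On a node, the same filter can be fed live data with `ip rule | catch_all_rules`. In the pasted list, rule 103 is the only rule besides the local table lookup that matches unmarked traffic, which is consistent with table 100 being consulted for all pod traffic.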

@KfreeZ
Contributor

KfreeZ commented Oct 24, 2023

> Do you mean the 3rd step is not necessary?

Yes, I did not apply step 3 because I am using the eBPF redirection mode, so I think the ip rules are not a problem for me. Maybe you could give it a try :)

@chenwng

chenwng commented Oct 24, 2023

> Yes, I did not apply step 3 because I am using the eBPF redirection mode, so I think the ip rules are not a problem for me. Maybe you could give it a try :)

Thanks. I was using the default iptables mode and haven't tried the eBPF mode.

@craigbox
Contributor

Will #48212 address this?

@dhawton
Member

dhawton commented Dec 21, 2023

> Will #48212 address this?

Yes

@linsun
Member

linsun commented Jan 29, 2024

This is resolved. Please try the latest master or a 1.21 release build, or wait for the official 1.21 release. See #48212.

@linsun closed this as completed Jan 29, 2024
@linsun
Member

linsun commented Jan 29, 2024

Sorry, I just got confirmation from Eric that the change is not in beta.0, but it should be in beta.1, since it was merged after beta.0 was released. I expect the new build to be available in a day or two.
