Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Cilium CNI for ambient #44198

Closed
bleggett opened this issue Mar 31, 2023 · 12 comments
Closed

Support Cilium CNI for ambient #44198

bleggett opened this issue Mar 31, 2023 · 12 comments
Assignees
Labels
area/ambient Issues related to ambient mesh area/environments area/networking lifecycle/automatically-closed Indicates a PR or issue that has been closed automatically.

Comments

@bleggett
Copy link
Contributor

bleggett commented Mar 31, 2023

Bug Description

Currently Ambient (whether with default iptables or experimental eBPF redir) doesn't work with Cilium as the CNI layer.

We should support Cilium with at least one form of CNI - it's supported and used by both GKE and AKS, at least.

Slack context: https://istio.slack.com/archives/C049TCZMPCP/p1679416770902909

@bleggett
Copy link
Contributor Author

I believe @costinm is looking at this, so will tag him for now.

@costinm
Copy link
Contributor

costinm commented Apr 1, 2023

I am looking more on how to support ambient on unsupported CNI providers ( no admin permissions, interception problems like Cilium, etc), by continuing to inject ztunnel as a L4 proxy and using the old iptables modes.

As mentioned in slack, one option we may explore for such CNI providers is to modify the default route on the pod so egress goes to ztunnel pod, and in ztunnel pod use the 'old style' interception so all traffic is captured. It may work for
all CNIs since it's not doing anything on the host level - unless they mess with the routing.

@yuval-k
Copy link
Contributor

yuval-k commented Apr 25, 2023

disadvantage of changing the default route of the pod as that you'll need to update it in all pods if ztunnel were to restart

@costinm
Copy link
Contributor

costinm commented Apr 25, 2023 via email

@bleggett
Copy link
Contributor Author

bleggett commented Apr 25, 2023

As mentioned in slack, one option we may explore for such CNI providers is to modify the default route on the pod so egress goes to ztunnel pod, and in ztunnel pod use the 'old style' interception so all traffic is captured. It may work for
all CNIs since it's not doing anything on the host level - unless they mess with the routing.

Yeah - aside from the update problem @yuval-k mentioned, I also don't like this because any pod with sufficient local perms can self-bypass redirection.

That's the price paid for no host rules, and that feels like a pretty steep price - I think redirection has to be "forcible" in the sense that it can be guaranteed to happen for everything leaving the pod netns regardless of intended dest/what the pod thinks is happening, or has local netns/cgroup-level permissions to alter.

AFAICT redirection rules (iptables or ebpf or whatever) must live outside the security context of the pod - or at least the ability to block/drop traffic that isn't tagged for redirection must live outside the security context of the pod.

@bleggett
Copy link
Contributor Author

bleggett commented Apr 25, 2023

Fundamentally, we cannot have two things fighting over host node redirection rules (eBPF or iptables, it doesn't really matter - it's the same problem in either case and CNI plugin chaining doesn't help with this problem).

Really the only options I see are:

  1. Have Istio CNI skip setting up certain host-side redirection things if Cilium (or other self-redirecting CNI) is in use, and document how the self-redirecting CNI can be used to set up/enforce the redirection Istio needs, or create a separate Cilium-only Istio CNI plugin (kludgy)
  2. Tell people using Istio and Cilium to avoid the use of some Cilium APIs (e.g. LocalRedirectPolicy) (also kludgy, slightly gross)
  3. Convince GAMMA + Istio + Cilium to take this up as a shared API (ideal, but that could take years as has been discussed)
  4. Come up with a DIY standard (basically (3) but without GAMMA) for making sure overlapping node-level redirection rules don't create problems and getting Istio and Cilium to both support it (eBPF chaining versus CNI chaining - technically possible, but there's no standard for how to do it - or some sort of generic API).

Just creating a completely standalone/dedicated Cilium Istio CNI flavor that perhaps interfaces with Cilium APIs and asks Cilium to set up the node-local redirection we want might be the best/simplest/most maintainable option (for both OSS and vendors that want the two to play nice) - nothing in Istio really cares who or what sets up pod->ztunnel redirection, we just need it to be in place.

@danehans
Copy link
Contributor

danehans commented Apr 25, 2023

disadvantage of changing the default route of the pod as that you'll need to update it in all pods if ztunnel were to restart

xref #43642 (comment) for using VRRP to provide ztunnel HA. Since the VRRP VIP can be used in the rules, no updates are required when switching from the active to the backup ztunnel.

I personally like 'simplest' option, and cordon and node upgrade seems to
be not only simplest but also safest,
and will be consistent with future 'ztunnel built into native CNI' or even
'ztunnel is the CNI' cases we discussed in
the past. Not sure if any vendor is doing live-upgades of the CNI.

+1 until the use case arises for a more elaborate solution.

@istio-policy-bot istio-policy-bot added the lifecycle/stale Indicates a PR or issue hasn't been manipulated by an Istio team member for a while label Aug 4, 2023
@istio-policy-bot istio-policy-bot added the lifecycle/automatically-closed Indicates a PR or issue that has been closed automatically. label Aug 9, 2023
@bleggett bleggett reopened this Aug 9, 2023
@istio-policy-bot istio-policy-bot removed the lifecycle/stale Indicates a PR or issue hasn't been manipulated by an Istio team member for a while label Aug 9, 2023
@Joffref
Copy link

Joffref commented Aug 10, 2023

Hello everyone, I'm currently aiming to implement this feature inside Ambient. I've proposed a draft pull request focused on the implementation of what we can call an Ambient Plugin, this is an interface wrapping the call made by Ambient to the underlying CNI.

Also, I'm willing to implement Cilium support too along this interface as a PoC. I'm open to help on this one and if you have any recommendations feel free to comment under the PR

@therealmitchconnors
Copy link
Contributor

Reference to #46470

@illrill
Copy link

illrill commented Dec 8, 2023

Gloo Mesh by Solo.io did it. Perhaps we can expect a backport soon?

@istio-policy-bot istio-policy-bot added the lifecycle/stale Indicates a PR or issue hasn't been manipulated by an Istio team member for a while label Dec 11, 2023
@bleggett
Copy link
Contributor Author

Backport PRs tracked here: #48212

@istio-policy-bot istio-policy-bot removed the lifecycle/stale Indicates a PR or issue hasn't been manipulated by an Istio team member for a while label Dec 11, 2023
@linsun
Copy link
Member

linsun commented Jan 29, 2024

Can i close this now? should be fixed now due to #48212

@linsun linsun closed this as completed Jan 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/ambient Issues related to ambient mesh area/environments area/networking lifecycle/automatically-closed Indicates a PR or issue that has been closed automatically.
Projects
Status: Done
Development

No branches or pull requests

9 participants