
Networking breaks when using cilium with strict kube-proxy replacement #27619

Closed
StefanCenusa opened this issue Sep 29, 2020 · 31 comments
Labels
area/networking kind/docs lifecycle/staleproof

Comments

@StefanCenusa

StefanCenusa commented Sep 29, 2020

Bug description
This issue is related to a thread I've started on istio's slack channel.

It affects inter-service telemetry, but it might impact other features as well, because traffic is not treated as HTTP where it should be.

My setup is a kubernetes 1.18 cluster with cilium 1.8 configured with kubeProxyReplacement=strict. This means that the kube-proxy component from kubernetes is replaced and cilium handles its duties.
I'm not an expert in how cilium works, but this mode should improve service-to-service communication (and other networking paths) by leveraging eBPF functionality.

I have noticed (using tcpdump) that when this mode is enabled, if from one pod I make requests to another service (eg. curl http://servicename.namespace), the connections are "magically" made directly to the destination pod (pod-ip:target-port), rather than going through the ClusterIP of the destination service. I don't know the internals of how istio-proxy is configured or how the metadata filter works, but this behaviour seems to make istio-proxy think that requests go directly to pod-ip:container-port, thus no route from the istio config is matched and traffic goes through some default TCP path.
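A rough way to reproduce this observation (all names below are placeholders, not literal values from my cluster):

   # ClusterIP of the destination service, for comparison
   kubectl -n namespace get svc servicename -o jsonpath='{.spec.clusterIP}'

   # From the client pod, call the service by name
   kubectl -n namespace exec deploy/client -c app -- curl -s http://servicename.namespace

   # On the node hosting the client pod, watch the outgoing SYNs; with
   # kubeProxyReplacement=strict the destination is the backend pod IP,
   # not the ClusterIP printed above
   tcpdump -n -i any 'tcp[tcpflags] & tcp-syn != 0 and port 80'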

[ ] Docs
[ ] Installation
[X] Networking
[ ] Performance and Scalability
[X] Extensions and Telemetry
[ ] Security
[ ] Test and Release
[ ] User Experience
[ ] Developer Infrastructure

Expected behavior
Istio metadata headers should be added to requests even when using Cilium's kube-proxy replacement.

Steps to reproduce the bug
To test this bug, I installed kube-proxy and changed Cilium's config to kubeProxyReplacement=partial, and inter-service telemetry started to work with no changes to the Istio setup at all. Moreover, when inspecting traffic between pods using tcpdump, I was able to see some packets being sent to the ClusterIP of the destination service (this didn't happen previously).
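(For reference, a sketch of how the mode can be switched with Helm; the exact flags depend on how Cilium was installed:)

   helm upgrade cilium cilium/cilium --namespace kube-system --reuse-values \
     --set kubeProxyReplacement=partial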

Version (include the output of istioctl version --remote and kubectl version --short and helm version if you used Helm)

client version: 1.7.2
control plane version: 1.7.2
data plane version: 1.7.2 (1225 proxies)

How was Istio installed?
istioctl and istio-operator

Environment where bug was observed (cloud vendor, OS, etc)
aws, kops cluster

I am not able to share a config dump of my proxies, but this issue can be reproduced on any test cluster with cilium configured as described.

@howardjohn
Member

This essentially bypasses Istio load balancing. We do not support routing "direct to pod ip" traffic like they are doing

@StefanCenusa
Author

@howardjohn Thanks for the quick reply! It makes sense to be this way. Unfortunately I've lost a lot of time debugging this issue 😔 Maybe this info will be useful to other istio users that face similar issues or have similar setups.
I would suggest adding this information somewhere in istio's "getting started" or "setup" docs 😄

@howardjohn
Member

@StefanCenusa I think that makes sense. We should add the "direct to pod ip" to https://istio.io/latest/docs/ops/deployment/requirements/ and the cilium specific part (not sure where). I wasn't aware that is what the cilium no kube-proxy mode did, so good to know

@howardjohn added the kind/need more info label and removed the area/extensions and telemetry label on Sep 29, 2020
@StefanCenusa
Author

Regarding Cilium, I wasn't aware either. But now I found this in their docs:

Host-reachable services act transparent to Cilium’s lower layer datapath in that upon connect system call (TCP, connected UDP) or sendmsg as well as recvmsg (UDP) the destination IP is checked for an existing service IP and one of the service backends is selected as a target, meaning, while the application is assuming its connection to the service address, the corresponding kernel’s socket is actually connected to the backend address and therefore no additional lower layer NAT is required.

Thanks again! Should I close this issue or leave it open for the docs improvement part?

@howardjohn added the kind/docs label and removed the kind/need more info label on Sep 29, 2020
@howardjohn
Member

Sorry, added the wrong label - let's keep this open.

@tgraf

tgraf commented Sep 29, 2020

We have just been made aware of this. We can definitely add a note to our docs to disable host-services in order to skip the load-balancing decision. Before doing that, I would like to understand a bit better why translating the ClusterIP to a pod IP before Istio sees it is causing a problem.

Context: I'm a Cilium developer.

@howardjohn
Member

@tgraf Istio routing is based on:

  • HTTP traffic - we look only at host header, by binding to 0.0.0.0:<PORT> and doing routing in Envoy Routes.
  • Non HTTP traffic - we create listeners for <SERVICE_VIP>:<PORT>.
  • Auto detected traffic (no protocol specified) is also using <SERVICE_VIP>:<PORT> listeners.

So explicitly declared HTTP traffic will work (both in theory and as I confirmed), since the routing doesn't take the IP into account, but everything else is not going to hit the appropriate configuration.
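For reference, one hypothetical way to see this split on a sidecar-injected pod (pod name and namespace are placeholders):

   istioctl proxy-config listeners <pod-name> -n <namespace>
   # HTTP services show up as 0.0.0.0:<PORT> listeners routed by Host header,
   # while TCP / auto-detected services are bound to their ClusterIP; traffic sent
   # straight to a pod IP matches neither and falls through to the passthrough path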

@mandarjog
Contributor

mandarjog commented Sep 29, 2020

@tgraf does disabling host services translation significantly reduce functionality provided by cilium?

@tgraf

tgraf commented Sep 29, 2020

@tgraf does disabling host services translation significantly reduce functionality provided by cilium?

No, it does not. The functionality when implemented on the network level is the same, just with a much higher overhead as each network packet has to be altered.

We will point out in the Cilium documentation that host services are currently not compatible with Istio due to the way Istio does its redirect.

On that note, what if Cilium did the redirect of all traffic? Similar to how host services can currently bypass Istio, they can also be used in favor of Istio to redirect traffic to the sidecar.

Non HTTP traffic - we create listeners for <SERVICE_VIP>:<PORT>.

This then means that headless services are not compatible either if the DNS directly returns pod IPs?

@howardjohn
Member

This then means that headless services are not compatible either if the DNS directly returns pod IPs?

Good point - for headless services specifically we create listeners for each pod IP. The assumption we are depending on here, and why we do this only for headless services, is that users will not have massive headless services as this will lead to scalability bottlenecks.

@costinm
Contributor

costinm commented Sep 29, 2020

How would Cilium redirect all traffic to Istio? We do need some metadata - like the original DST:port - that we get using a system call. We have discussed different ways to get this metadata - like a 'proxy protocol' prefix, or tunneling it in H2 CONNECT - but it'll take some time.

@tgraf

tgraf commented Oct 9, 2020

How would Cilium redirect all traffic to Istio? We do need some metadata - like the original DST:port - that we get using a system call. We have discussed different ways to get this metadata - like a 'proxy protocol' prefix, or tunneling it in H2 CONNECT - but it'll take some time.

We have already built this using TPROXY so you will see the original network headers. This is also how Cilium redirects packets to Envoy when Istio is not in play. In the past, we have also shared metadata with Envoy and have built an eBPF metadata reader in Envoy to share arbitrary data between the Cilium datapath and Envoy.

@mandarjog
Contributor

@tgraf is this available in OSS and does it require CAP_NET_ADMIN to read eBPF metadata?

@tgraf

tgraf commented Oct 9, 2020

@tgraf is this available in OSS

Yes, all of this is available in Cilium today.

and does it require CAP_NET_ADMIN to read eBPF metadata?

Yes, when using TPROXY with IP_TRANSPARENT it will require CAP_NET_ADMIN. We also used to have code to redirect to a known listener port on Envoy which does not require CAP_NET_ADMIN.

@howardjohn added this to P1 in Prioritization on Oct 23, 2020
@istio-policy-bot added the lifecycle/stale label on Jan 8, 2021
@howardjohn
Member

howardjohn commented Jan 8, 2021 via email

@istio-policy-bot removed the lifecycle/stale label on Jan 8, 2021
@howardjohn added the lifecycle/staleproof label on Feb 10, 2021
@brb

brb commented Oct 18, 2021

Just a heads-up - we released the fix (cilium/cilium#17154) in Cilium v1.10.5, which allows Cilium's KPR to cooperate with Istio's dataplane. The cilium-agent option is named --bpf-lb-sock-hostns-only, while in Helm it is hostServices.hostNamespaceOnly. You need to set it to true in order to prevent Cilium's socket LB from translating service connections. Otherwise, the Istio sidecar proxy won't be able to determine the original destination.
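For reference, a sketch of how this could be set with Helm (values as named above; adjust to your install method):

   helm upgrade cilium cilium/cilium --namespace kube-system --reuse-values \
     --set hostServices.hostNamespaceOnly=true
   # equivalent cilium-agent flag / ConfigMap entry: bpf-lb-sock-hostns-only=true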

@dntosas
Contributor

dntosas commented Oct 19, 2021

Just a heads-up - we released the fix (cilium/cilium#17154) in Cilium v1.10.5, which allows Cilium's KPR to cooperate with Istio's dataplane. The cilium-agent option is named --bpf-lb-sock-hostns-only, while in Helm it is hostServices.hostNamespaceOnly. You need to set it to true in order to prevent Cilium's socket LB from translating service connections. Otherwise, the Istio sidecar proxy won't be able to determine the original destination.

Applied the above configuration on our clusters; everything works like a charm 🚀

@howardjohn changed the title from "Telemetry v2 breaks when using cilium with strict kube-proxy replacement" to "Networking breaks when using cilium with strict kube-proxy replacement" on Aug 24, 2022
@howardjohn
Member

Maybe we can add a check to istioctl precheck to ensure compatible cilium settings are used (bpf-lb-sock-hostns-only)
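Until such a precheck exists, a manual inspection along those lines might look like this (assuming the default cilium-config ConfigMap name):

   kubectl -n kube-system get configmap cilium-config \
     -o jsonpath='{.data.bpf-lb-sock-hostns-only}'
   # empty or "false" means the socket LB still translates service IPs inside pods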

@brb

brb commented Aug 25, 2022

Maybe we can add a check to istioctl precheck to ensure compatible cilium settings are used (bpf-lb-sock-hostns-only)

👍 We could make the setting accessible through Cilium's Config API.

@alfsch

alfsch commented Dec 20, 2022

Any news on this?

@howardjohn
Member

@alfsch The only thing Istio can ever do about this is warn that it's enabled. The option fundamentally breaks Istio. No news on a warning being added.

@brb

brb commented Feb 2, 2023

@howardjohn Is there an easy way to detect when Cilium runs with Istio?

@howardjohn
Member

Statically we can try to read Cilium's config and parse the ConfigMap for this value, but I don't know how reliable (or stable) that is. I do wonder if we could add a runtime check. We have a validation container that tests that our iptables rules actually work; I wonder if we could extend this... something like sending a request to kubernetes.default and asserting we get the ClusterIP instead of some other IP when we capture. WDYT?

One issue with that is that we want to remove the validation container, but I think there will always be a place we can run it; it just may move.
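A very rough sketch of that idea (purely illustrative; it inspects the kernel socket rather than capturing packets, and assumes bash, getent and ss are available in the container):

   expected=$(getent hosts kubernetes.default.svc | awk '{print $1}')  # ClusterIP from DNS
   exec 3<>/dev/tcp/kubernetes.default.svc/443                         # open a TCP connection
   ss -tn state established | grep -q "$expected" \
     && echo "connect() still targets the ClusterIP" \
     || echo "connect() was rewritten to a backend IP (socket LB active)"
   exec 3<&- 3>&-                                                      # close the connection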

@brb

brb commented Feb 3, 2023

I'm more interested in detecting the presence of Istio from Cilium's PoV. I'd like to add a warning if KPR / socket LB is enabled when running with Istio.

@howardjohn
Member

Ah got it. At install time you probably have Cilium before Istio, so I would imagine you want it at runtime.

Some possible detection mechanisms:

  • istio-system namespace exists
  • Pod has istio-proxy container
  • pod has a container running as UID 1337 with name "envoy"?
  • Pod has 2 containers and some specific iptables rules
  • Pod has 2 containers and listens on some specific ports

I am not sure what type of data you have where you would run this, but I'd imagine we could find an approach that works

One note: this would break all sidecar service meshes, so you might want a more generic approach.
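A rough sketch of the first two detection bullets (purely illustrative):

   kubectl get ns istio-system >/dev/null 2>&1 && echo "istio-system namespace present"
   kubectl get pods -A -o jsonpath='{range .items[*]}{range .spec.containers[*]}{.name}{"\n"}{end}{end}' \
     | grep -qx istio-proxy && echo "pods with an istio-proxy sidecar found"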

@alfsch

alfsch commented Feb 15, 2023

For all who have trouble with Cilium in strict mode without kube-proxy:

From my point of view, the root cause of the issues was our Cilium version < 1.12.x and Linux kernel < 5.7, due to the use of Ubuntu 20.04 as the Kubernetes node image.

We had the same issues as described here with HTTP(S) communication across namespaces using services whose ports map to different destination ports. Using the same ports mitigated the problem for some pods, but not for all;
e.g. a Java-based service worked well, but Python services with an async lib had trouble.

After updating to Cilium 1.12.6 we still faced the issues, but when looking into the logs of the Cilium pods, I found warnings in the startup section about possible problems with Istio when using a kernel lower than 5.7.

We updated the cluster nodes to a current Ubuntu 22.04 with kernel 5.15.x and voilà, all our issues were gone. We don't even need the port mapping change to use the same destination ports.

The command line that works for us to render the Cilium chart is:

   helm template cilium cilium/cilium --version 1.12.6 --namespace kube-system --set cni.exclusive=false --set operator.replicas=1 \
     --set hubble.tls.auto.method=cronJob --set hubble.relay.enabled=true --set hubble.ui.enabled=true \
     --set socketLB.hostNamespaceOnly=true --set nodePort.enabled=true --set hostPort.enabled=true \
     --set externalIPs.enabled=false --set bpf.masquerade=false \
     --set kubeProxyReplacement=strict --set k8sServiceHost=CHANGEME --set k8sServicePort=CHANGEME

kubernetes node:

System Info:
  Kernel Version:             5.15.0-60-generic
  OS Image:                   Ubuntu 22.04.1 LTS
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  containerd://1.6.16
  Kubelet Version:            v1.22.16

I found this because I saw a warning during startup. @brb Is the feature you want already implemented?

@boriscosic

boriscosic commented Apr 28, 2023

Resolved: need kernel 5.7 at least
k logs -f cilium-vd6mb | grep cookie
Defaulted container "cilium-agent" out of: cilium-agent, config (init), mount-cgroup (init), apply-sysctl-overwrites (init), mount-bpf-fs (init), wait-for-node-init (init), clean-cilium-state (init), install-cni-binaries (init)
level=warning msg="Without network namespace cookie lookup functionality, BPF datapath cannot distinguish root and non-root namespace, skipping socket-level loadbalancing will not work. Istio routing chains will be missed. Needs kernel version >= 5.7" subsys=daemon

Still seeing this issue on AWS EKS 1.23:

Linux 5.4.238-148.347.amzn2.x86_64 #1 SMP Thu Apr 6 19:42:57 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

Minimal install:

helm install cilium cilium/cilium \
    --version 1.13.2 \
    --namespace kube-system \
    --set kubeProxyReplacement=strict \
    --set socketLB.hostNamespaceOnly=true

On the left-hand side is the application pod and on the right-hand side is the istio-proxy sidecar of said application pod. tcpdump on istio-proxy shows requests going to the endpoints of coredns, instead of kube-dns.

Prior to replacing kube-proxy we were seeing correct calls:

20:20:22.043611 IP test-cf8977485-gfh4n.33835 > kube-dns.kube-system.svc.cluster.local.53: 10927+ PTR? 0.100.51.198.in-addr.arpa. (43)
20:20:22.046156 IP kube-dns.kube-system.svc.cluster.local.53 > test-cf8977485-gfh4n.33835: 10927 NXDomain 0/1/0 (125)

After replacing kube-proxy:

05:35:36.927550 IP test-cf8977485-xx76b.46530 > 10-0-0-15.kube-dns.kube-system.svc.cluster.local.domain: 24413+ A? address.internal.default.svc.cluster.local. (60)
05:35:36.927600 IP test-cf8977485-xx76b.46530 > 10-0-0-15.kube-dns.kube-system.svc.cluster.local.domain: 46683+ AAAA? address.internal.default.svc.cluster.local. (60)
05:35:36.928428 IP test-cf8977485-xx76b.48588 > ip-10-0-0-202.us-west-2.compute.internal.domain: 9899+ A? address.internal.svc.cluster.local. (52)

Any thoughts on what else to try? After installing cilium, nodes were terminated to ensure all configurations applied correctly.

(Screenshot: application pod on the left, istio-proxy sidecar on the right.)

@gyutaeb

gyutaeb commented Jun 12, 2023

Kernel v5.7 includes the patch torvalds/linux@f318903.
Cilium needs this patch when --set socketLB.hostNamespaceOnly=true is used.
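A quick way to check whether the nodes meet that requirement (standard node status fields):

   kubectl get nodes -o custom-columns=NAME:.metadata.name,KERNEL:.status.nodeInfo.kernelVersion
   # every node should report a kernel >= 5.7 for socketLB.hostNamespaceOnly to take effect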

@christian-posta
Contributor

Seems like we can close this?

@boriscosic

Thanks @christian-posta, @gyutaeb. Confirmed with EKS 1.24 (kernel v5.10) that this issue is resolved.

@keithmattix
Contributor

Closing
