
Unable to use Linkerd-CNI #7945

Open
BobyMCbobs opened this issue Feb 23, 2022 · 24 comments

Comments

@BobyMCbobs

What is the issue?

When Linkerd is installed with CNI enabled, Pod sandboxes fail to create.

How can it be reproduced?

  linkerd install-cni | kubectl apply -f -
  linkerd install --linkerd-cni-enabled | kubectl apply -f -

Logs, error output, etc

  Normal   Scheduled               37s   default-scheduler  Successfully assigned linkerd/linkerd-destination-54c8fb86c8-gwz6k to talos-192-168-122-140
  Warning  FailedCreatePodSandBox  36s   kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "c0b4a8286046ccbfd565b4d74731bd12b43b5a6b5ad43558f5d3f30d198ad517": plugin type="linkerd-cni" name="linkerd-cni" failed (add): exec: "nsenter": executable file not found in $PATH
  Warning  FailedCreatePodSandBox  25s   kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "bb75adddf6aa1a08bf372c422257b5fcf70c5aa4d510a78f82c5c17f361b3c55": plugin type="linkerd-cni" name="linkerd-cni" failed (add): exec: "nsenter": executable file not found in $PATH
  Warning  FailedCreatePodSandBox  9s    kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "5f159b05d91979644c44d84aa8f384295721e42e8802439649f6a9cbaeef7c2f": plugin type="linkerd-cni" name="linkerd-cni" failed (add): exec: "nsenter": executable file not found in $PATH

output of linkerd check -o short

Linkerd core checks
===================

linkerd-existence
-----------------
× control plane pods are ready
    No running pods for "linkerd-destination"
    see https://linkerd.io/2/checks/#l5d-api-control-ready for hints

Status check results are ×

Environment

  • Kubernetes Version: v1.23.3
  • Cluster Environment: Bare metal
  • Host OS: Talos v0.15.0-alpha.2
  • Linkerd version: edge-22.2.2

Possible solution

No response

Additional context

Using Cilium as the CNI. Using Flannel makes no difference.

This happens both on amd64 in a VM and arm64 on Raspberry Pis.

My goal is to improve app start time by using the CNI plugin instead of the init containers.

If I run

linkerd upgrade --linkerd-cni-enabled=false | kubectl apply -f -

the CNI isn't used, and the Linkerd pods become healthy again.

Would you like to work on fixing this bug?

No response

@BobyMCbobs BobyMCbobs added the bug label Feb 23, 2022
@mateiidavid
Member

mateiidavid commented Feb 23, 2022

Hey @BobyMCbobs, thanks for raising this. Is the util-linux package present on your hosts? Linkerd's CNI plugin sets up iptables rules in a pod's network namespace. Afaict, the network namespace is passed in as an argument by the CNI runtime, and that's how we know where to run the firewall configuration.

If you find it odd for Talos not to have nsenter, you can ssh onto the host and run echo $PATH; which nsenter; perhaps the path has been set wrong? I unfortunately can't help with a repro since we don't have access to any Talos hosts.

For reference, I found this and associated issues: siderolabs/talos#4194, might be worth having a look through them?

@BobyMCbobs
Author

Hey @mateiidavid,

Thank you for your reply!
Talos doesn't include nsenter on the host.

May I please have a link to how the CNI plugin uses nsenter?

@olix0r
Member

olix0r commented Feb 23, 2022

@BobyMCbobs
Author

@mateiidavid mateiidavid added area/cni and removed bug labels Feb 24, 2022
@BobyMCbobs
Author

@olix0r, why is nsenter needed to call iptables?

I'm taking a look at the implementation, to expand on what you said: is it correct that the CNI plugin uses nsenter on the host to exec into the network namespace of the Pod and set iptables rules in it?

https://github.com/linkerd/linkerd2/blob/main/cni-plugin/main.go#L241 -> https://github.com/linkerd/linkerd2-proxy-init/blob/a556ca400132106db279ce8c3a79003a766bf707/iptables/iptables.go#L212-L228

@mateiidavid
Member

mateiidavid commented Feb 28, 2022

Hey @BobyMCbobs, this is how I understand things. When a pod is scheduled on a node, the container runtime (CRI) is responsible for creating, starting, and stopping the pod. After a pod is first created (i.e. the CRI creates its sandbox, in other words its Linux namespaces, including the network namespace), its networking stack has to be set up. For a pod to accept and send traffic without NAT, it needs to communicate with the host through a veth interface and get an IP address assigned to it.

The CNI does all of this, and more. A networking stack (or, simply put, the network namespace in our case) is first created as a blank canvas: there are no routes, no rules, no devices; they're all added in by the different plugins. CNI plugins simply configure everything.

In our case, we need to set up iptables, and to do that we need to enter the network namespace of the pod that's just been created. If we simply execute iptables commands without entering the namespace, they'll be applied to the host. So, to directly answer the question: without using nsenter, there isn't really a way to guarantee we set the rules for the pod, since the agent/daemonset runs on the host. Does this make sense? Feel free to correct any of my points if I'm wrong.

Now, on to the solution: I think we'd be open to bundling util-linux with our CNI plugin. I suspect it'd be pretty easy, just add util-linux in the Dockerfile.

  RUN apt-get update && apt-get install -y --no-install-recommends \
      iptables \
      jq && \
      rm -rf /var/lib/apt/lists/*
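Concretely, the change under discussion would look something like this (a sketch only, untested against the actual image; it assumes the image is Debian-based, as the apt-get line suggests):

```dockerfile
# Sketch: add util-linux (which ships nsenter) next to iptables and jq.
RUN apt-get update && apt-get install -y --no-install-recommends \
    iptables \
    jq \
    util-linux && \
    rm -rf /var/lib/apt/lists/*
```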

There's no easy way for us to test this solution with Talos, so we'd need some additional help here, which is why we'd appreciate it a lot if you could contribute :D

To test, we could do the following:

  1. Fork, make changes to the image
  2. bin/docker-build to build the images
  3. Push to a registry, or if you can, push to your cluster's image registry
  4. Install linkerd-cni & linkerd, verify all works well (might have to override the registry here).

Wdyt?

@BobyMCbobs
Author

Hey @mateiidavid,
Thank you for your reply.

Now, on to the solution: I think we'd be open to bundling util-linux with our CNI plugin. I suspect it'd be pretty easy, just add util-linux in the Dockerfile.

  RUN apt-get update && apt-get install -y --no-install-recommends \
      iptables \
      jq && \
      rm -rf /var/lib/apt/lists/*

There's no easy way for us to test this solution with Talos, so we'd need some additional help here, which is why we'd appreciate it a lot if you could contribute :D

To test, we could do the following:

  1. Fork, make changes to the image
  2. bin/docker-build to build the images
  3. Push to a registry, or if you can, push to your cluster's image registry
  4. Install linkerd-cni & linkerd, verify all works well (might have to override the registry here).

Wdyt?

I gave this a go, and there doesn't appear to be any difference.
Since the CNI runs on the host through the kubelet, it will depend on the host binaries.
Took a look at how Cilium uses network namespaces and iptables:

I know for sure that if it were possible to have the CNI binary itself contain everything it needs, it would work whatever the environment. I'll keep looking around for what's possible.

Keen to have this work! I'm more than happy to contribute what I can!

@mateiidavid
Member

No probs :) To be clear, the way I understand CNIs: the plugin is a binary on the host that gets called by the kubelet; in that sense, our plugin will also call the iptables binary on the host, it doesn't do it in a container. We need to run it in the pod's network namespace though, which is different. I guess that's why the initial solution didn't really work. You're right that packaging it with the container won't work (unless we copy the binary onto the host).

Cilium might have a different use case for iptables and firewall configuration, and perhaps that's why it is run in the host's namespace. For example, looking at the docs it seems to be used for kube-proxy interop (kube-proxy in most cases is just a big collection of iptables rules itself). It would make sense that in these scenarios you can run it in the host ns.

For us though, the use case for iptables is different. We want to make sure that we set up routing rules for each pod's network ns in such a way that allows the proxy to take over packets; we do not want, however, the host to have the same config. Running in the same network ns as the pods is a bit of a necessity, as far as I can tell.

We could programmatically enter the namespace, as opposed to using the nsenter wrapper. I'm a bit apprehensive to go down this route though; I think we started using the wrapper for a good reason: the folks from Weave published an article about Go not working extremely well with network namespaces here. Idk, maybe we can think of something here, but I'd avoid it if we can.

Hm, with all this being said, not sure what we can do as a solution here. Our container that runs as an agent on the host is basically a bash script that copies the plugin binary over to the right location (and creates a network config file). Wonder if there's anything we can do in the install script 🤔

@frezbo

frezbo commented May 26, 2022

Assuming that the linkerd pod runs with CAP_NET_ADMIN, it could directly do a nsenter from the pod itself into other pods' network namespaces, removing the need for nsenter to be present on the host. Is this a limitation due to how CNI binaries pass information through stdin/stdout? Trying to understand the need for nsenter when it could be done from the pod itself.

@mateiidavid
Member

mateiidavid commented May 26, 2022

@frezbo that's true, the CNI binary itself could enter the namespace programmatically; however, there are two points to consider here:

  • First, we'd ideally avoid giving the CNI DaemonSet any special permissions, such as CAP_NET_ADMIN. A big part of why people opt to go for CNI plugins (as opposed to init containers) is that they can avoid granting it any special permissions. As far as I understand the landscape though, this wouldn't be a big issue. The plugin itself will run on the host (the CNI DaemonSet/Linkerd pod that we run will not actually be responsible for setting up any iptables specific rules, it will just copy over the plugin binary and relevant config on the host). I'm not sure what permissions CNI plugin binaries have, but I'd say the situation is very permissive since they have to set up network interfaces? If you have any material I can read here re: permissions, it'd be super appreciated. If we could enter the network namespace from the pod itself, the solution would be much easier: we could just bundle up nsenter in the container ourselves.

  • Second, switching Linux namespaces in Go seems to be an unsafe operation. I think this is the main reason for calling nsenter as opposed to doing it ourselves. I'm not sure if the problem still persists -- it has been a while since the article was written -- but given that the issue surfaced from how Go's goroutine scheduling works, I'd say it's still relevant?

Does this make sense and line up with what you know about the space? We'd still be very happy to fix this.

@smira

smira commented May 26, 2022

On Go and Linux namespaces: this should not be a problem anymore with Go, e.g. it's possible to switch to some network namespace and perform actions:

@stale

stale bot commented Aug 31, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

@stale

stale bot commented Dec 2, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Dec 2, 2022
@Cowboy-coder

We also ran into this issue (with Talos). Any chance that changing to use Go for switching namespaces #7945 (comment) would solve this issue?

@jeremychase jeremychase added this to the stable-2.13.0 milestone Feb 23, 2023
@jeremychase jeremychase added the priority/P2 Nice-to-have for Release label Feb 23, 2023
@stale

stale bot commented Jun 30, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Jun 30, 2023
@blabu23

blabu23 commented Jul 10, 2023

seems to be the same issue as my problem described here: #10413

@kflynn
Member

kflynn commented Oct 26, 2023

For the record, yup, we are looking into this...

@NigelVanHattum

Any updates on this topic?

@the-wondersmith
Contributor

@BobyMCbobs @NigelVanHattum @Cowboy-coder Talos supports system extensions and in fact maintains an official extension for util-linux (the package that normally supplies the nsenter binary). The official extension does not currently include nsenter when it builds, but I've just opened a PR that should fix that.

Once the PR is merged, simply using the extension should resolve this issue.

@the-wondersmith
Contributor

the-wondersmith commented May 23, 2024

@BobyMCbobs @NigelVanHattum @Cowboy-coder

Update: the PR for including nsenter in the util-linux extension has been merged 😁.

@djryanj

djryanj commented Jul 3, 2024

Hey all, just wondering if any headway has been made here.

I'm running Talos 1.7.4 with the util-linux extension and trying to install linkerd-cni so I can avoid needing to mark all namespaces where Linkerd needs to run as privileged, but the linkerd-network-validator container fails to come up properly, with logs like

2024-07-02T20:03:06.514474Z  INFO linkerd_network_validator: Listening for connections on 0.0.0.0:4140
2024-07-02T20:03:06.514493Z DEBUG linkerd_network_validator: token="<redacted>\n"
2024-07-02T20:03:06.514500Z  INFO linkerd_network_validator: Connecting to 1.1.1.1:20001
2024-07-02T20:03:06.514929Z DEBUG connect: linkerd_network_validator: Connected client.addr=10.244.1.51:34290
2024-07-02T20:03:16.515844Z ERROR linkerd_network_validator: Failed to validate networking configuration. Please ensure iptables rules are rewriting traffic as expected. timeout=10s

With the last line seemingly the biggest hint. I'm at a loss as to how to proceed here, and I haven't found a single thing anywhere explaining how I can get linkerd running on Talos.

@djryanj

djryanj commented Jul 4, 2024

So I discovered that my problem was actually that I was using cilium and had set "cni.exclusive=false" in the helm chart install for that. This caused any attempted use of linkerd-cni to fail. As soon as I set that flag to true, linkerd-cni works in Talos as expected.
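For anyone else landing here, the Cilium Helm setting in question looks like this in values form (a sketch reflecting only the report above; check your Cilium chart's documentation for the exact semantics of the flag):

```yaml
# Cilium Helm values fragment: per the report above, cni.exclusive had to be
# true for linkerd-cni to work on this Talos cluster.
cni:
  exclusive: true
```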

@the-wondersmith
Contributor

So I discovered that my problem was actually that I was using cilium and had set "cni.exclusive=false" in the helm chart install for that. This caused any attempted use of linkerd-cni to fail. As soon as I set that flag to true, linkerd-cni works in Talos as expected.

@BobyMCbobs with this verification, would you mind terribly marking this issue as resolved?

@wmorgan
Member

wmorgan commented Jul 8, 2024

@djryanj Sorry you ran into that. We are learning about that (bizarre) flag for Cilium ourselves. We JUST merged a docs PR that mentions this so hopefully future Cilium + Linkerd + CNI users will be able to avoid the issue. linkerd/website#1794
