Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ability to provide customise intra-host communication #7754

Closed
Tracked by #7561
jleeh opened this issue Sep 14, 2023 · 15 comments · Fixed by #8012
Closed
Tracked by #7561

Ability to provide customise intra-host communication #7754

jleeh opened this issue Sep 14, 2023 · 15 comments · Fixed by #8012
Assignees

Comments

@jleeh
Copy link

jleeh commented Sep 14, 2023

Feature Request

Add the ability within Talos configuration to provide interfaces that are used as default for intra-host communication.

Description

By default, Flannel will select the interface that has the default route for the interface for inter-host communication as per their docs:
https://github.com/flannel-io/flannel/blob/master/Documentation/configuration.md#key-command-line-options

Within the bare-metal Talos cluster, I have two interfaces to each host: public and private. Since the public interface has the default route, then flannel uses this interface for all intra-host communication.

I would like the ability to specify in Talos configuration which interfaces is used by default within Flannel to be used for intra-communication so I don't mix public & private traffic within the network.

Eg:

network:
    # The CNI used.
    cni:
        name: flannel # Name of CNI to use.
        flannelConfig: # Only used if CNI name is flannel
          interface: eth0
          interfaceRegex:
          interfaceCanReach:
          publicInterface: eth1

The result of the following config would pass in the following CLI arguments into Flannel:

--iface=eth0 --public-ip="1.1.1.1"

Since flannel requires an IP not an interface for the public-ip argument, Talos would have to grab the assigned IP of the interface and pass into flannel.

I've also seen this issue describing a similar request, but this is different due to their NAT with Wireguard:
#6134

There should be the ability to customise Flannel without needed to replace the CNI.

@smira
Copy link
Member

smira commented Sep 14, 2023

we don't plan to implement a way to customize Flannel so far, but you can disable Flannel in Talos and bring your own manifest with the desired configuration

@jleeh
Copy link
Author

jleeh commented Sep 15, 2023

Would you be open to this change if I worked on it @smira?

@smira
Copy link
Member

smira commented Sep 15, 2023

It's more of a question how far we want to go. There are many ways to customize Flannel, I almost wonder if we could be more flexible by allowing to update some of the bootstrap manifests (override them). I don't think that templating Flannel manifest even more would help to solve all cases.

Do you have a diff of standard Talos manifest vs. what it should be to support the custom interface option?

@jleeh
Copy link
Author

jleeh commented Sep 15, 2023

Overriding the bootstrap manifests would be nice, and more configurable for other cases. I'd like to avoid using our own manifest as then there's potential for drift on Talos upgrades that we don't track, especially when it's for something as simple as extra args. It seems a bit counter intuitive to have so much customisability of the networking in general but then force users to replace the whole CNI manifest if they want to change anything for their own setup.

In terms of diff, I feel covering most cases would be allowing extra arguments here.

In our case, it would just be adding --iface="eth0" as our default route is on eth1 which is the public interface without NAT.

@smira
Copy link
Member

smira commented Sep 15, 2023

Let me think about it to understand what can be done here.

@smira
Copy link
Member

smira commented Sep 26, 2023

I haven't forgotten about the issue, but I still don't have any good ideas

@tuhtah
Copy link

tuhtah commented Oct 10, 2023

Thank you both for discussing this and @smira for considering it!

@jleeh Would the following suggestion work for your use case as well?

Adding --iface-can-reach=<private subnet IP> to spec.containers.args in the kube-flannel DaemonSet worked as a selector for my hybrid VM/bare-metal setup (with different NIC numbers and names) to get flannel to latch on the private interface.

With Talos v1.5 introducing predictable Network Interface Names (incl. Mac addressed based NIC names), the --iface-can-reach argument based on the routing table is likely the most straightforward option to configure this.

--iface-can-reach="":
           detect interface to use (IP or name) for inter-host communication
           based on which will be used for provided IP. This is exactly the interface to
           use of command "ip route get <ip-address>" (example: --iface-can-reach=192.168.1.1
           results the interface can be reached to 192.168.1.1 will be selected)

I was also looking into a way to bind the apid port to solely the private network interface on the public facing worker nodes (related to #1836).

Hmm, could it be an idea to introduce a single bindIfaceCanReach Talos config option in the cluster.network section to handle both use cases, i.e. apid/trustd and flannel socket binding? 🤔

Could there be a reason why one would want to configure the overlay network and apid/trustd daemons to use different inter-host communication paths?

The potential flannel DaemonSet template change:

diff --git a/pkg/flannel/template.go b/pkg/flannel/template.go
index f768939..42adc0a 100644
--- a/pkg/flannel/template.go
+++ b/pkg/flannel/template.go
@@ -147,6 +147,9 @@ spec:
       - args:
         - --ip-masq
         - --kube-subnet-mgr
+        {{- if .bindIfaceCanReach }}
+        - --iface-can-reach={{ .bindIfaceCanReach }}
+        {{- end }}
         command:
         - /opt/bin/flanneld
         env:

@smira
Copy link
Member

smira commented Oct 10, 2023

@tuhtah thanks for the suggestions, the apid/trustd and other services protection on public networks should come with #4421

as for the flannel, I wonder if we just need to provide a single flag/config value for extraArgs which would solve some of this issues

@tuhtah
Copy link

tuhtah commented Oct 10, 2023

@smira Ah, thank you for pointing out: #4421. Didn't see that one, that's even better! 🎉

Right, passing it through extraArgs on cluster.network.cni makes it generic. This way we now have access to all available flannel flags.

I like your approach. 👍 Feels like the cleanest and most correct way to go about it.

@jleeh
Copy link
Author

jleeh commented Oct 10, 2023

Thanks @tuhtah, having extraArgs would solve the problem.

After this change, we would no longer have any label/spec of the public IP assigned to the node. Currently Flannel adds the IP it communicates over as a label and that's how we fetch the public IP of nodes, needed for some of the apps we run.

Flannel has the public IP flag, but that accepts only an IP vs interface in the case of NAT.

It's outside the scope of this ticket, but it'd be good for Talos to have support to add external IPs to nodes like found with public subnets in cloud k8s.

@smira
Copy link
Member

smira commented Oct 10, 2023

It's outside the scope of this ticket, but it'd be good for Talos to have support to add external IPs to nodes like found with public subnets in cloud k8s.

That is Kubernetes cloud controller manager job, not something Talos should do.

@jleeh
Copy link
Author

jleeh commented Oct 10, 2023

That is Kubernetes cloud controller manager job, not something Talos should do.

Sure. Will revert to static IP with manual labels, seems overkill to develop a custom CCM just for that. Open to it though if there's any potential other usecase.

@tuhtah
Copy link

tuhtah commented Oct 10, 2023

@jleeh On my setup, I found the public IP address recorded in the node resource as an "InternalIP":

$ kubectl get node mynode -o yaml
[...]
status:
  addresses:
  - address: XXX.XXX.XXX.XXX
    type: InternalIP
  - address: mynode
    type: Hostname
[...]

To get at it one could use:

$ kubectl get node zinc -o jsonpath='{.status.addresses[?(@.type=="InternalIP")].address}'

Now we need to spawn a one time job for each node that invokes that kubectl above for each node to add the desired node label. This we could achieve by specifying the yaml in the Talos worker node config under cluster.extraManifests.

A bit around the block, but maybe this solves your usecase?

@tuhtah
Copy link

tuhtah commented Oct 11, 2023

Please disregard my last message.

The kubelet needed to be configured (machine.kubelet.nodeIP.validSubnets) for the InternalIP to actually show my internal IP (and not the public one). 🤦

@jleeh Maybe scripting something around curl -4s ifconfig.me could get you there? Not sure about your setup and what your actual use case for it is? (Came across this: kubernetes/kubernetes/issues/42125, KEP-3705)

Are we all happy with the cluster.network.cni.extraArgs approach? If yes, @smira would you accept a PR for it? I might find some time for it this weekend.

@smira
Copy link
Member

smira commented Oct 11, 2023

Are we all happy with the cluster.network.cni.extraArgs approach? If yes, @smira would you accept a PR for it? I might find some time for it this weekend.

I don't have any exact idea, but this should probably be cluster.network.flannel.extraArgs

@smira smira self-assigned this Dec 1, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jun 9, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants