Bug Report
Running a dual stack cluster with Talos is still problematic, and the documentation is still sparse.
Description
The last bug report I created turned into a nice opportunity for many to contribute ideas on how to achieve dual stacking with Talos. Thanks to everyone who contributed to that one.
Still, there are more limitations, so I am creating this new issue in the hope that it will also help improve the documentation and push development towards a better solution for dual stacking.
A-Talos's VIP is single stack
In the machine config, you can create a VIP that is shared by the control planes to avoid the need for an external load balancer. Unfortunately, that option is single stack only: if you put both an IPv4 and an IPv6 address, the config fails to parse.
machine:
  network:
    interfaces:
      - interface: eth0
        vip:
          ip: 1.2.3.4,2001:0DB8:1234:56::30
Fails because the value does not parse as a valid IP address.
machine:
  network:
    interfaces:
      - interface: eth0
        vip:
          ip: 1.2.3.4
          ip: 2001:0DB8:1234:56::30
Fails because there are two IP lines.
machine:
  network:
    interfaces:
      - interface: eth0
        vip:
          ip: 2001:0DB8:1234:56::30
or
machine:
  network:
    interfaces:
      - interface: eth0
        vip:
          ip: 1.2.3.4
Either of these will work, but the VIP will be single stack.
Another point here is that the Talos docs recommend not using that VIP as the endpoint for talosctl. So I created a DNS name that resolves to each of the control planes' IPs. Unfortunately, should the first address fail, the command exits and does not try the others. Since that mechanism fails to move even from one IPv4 address to another, it will of course also fail to move from an IPv4 address to an IPv6 one. So here again, Talos is single stack only.
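As a side note, a possible workaround I have not fully verified (so take it as a sketch, not a confirmed fix) would be to list every control plane address explicitly as endpoints in the talosconfig instead of relying on a single DNS name, and let talosctl's own endpoint handling pick one; whether that behaves any better across mixed v4/v6 endpoints is exactly the kind of thing that should be documented. The context name below is mine:
context: kube64
contexts:
  kube64:
    endpoints:
      # every control plane, both families, instead of one round-robin DNS name
      - 172.24.136.161
      - 172.24.136.162
      - 172.24.136.163
      - 2001:0DB8:1234:c0::31
      - 2001:0DB8:1234:c0::32
      - 2001:0DB8:1234:c0::33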
B-Dual stack IPv6 / IPv4 is not the same as dual stack IPv4 / IPv6
In this post, @bernardgut explained that the ordering of the IP ranges in the config is important. If the order does not match from one setting to another, the config will fail. It is also important to understand, explain and document that if IPv4 is listed first, the dual stack cluster will be IPv4 / IPv6, and if IPv6 is listed first, the cluster will be IPv6 / IPv4.
Here, I am deploying Longhorn as a CSI. It does not support IPv6 and its frontend service must be IPv4. Yet its Helm chart does not specify an IP family, so the service ends up single stack using the cluster's first IP family. For that reason, you will not reach Longhorn's UI if you are single stack v6 or dual stack v6 / v4. Again, nowhere did I see any documentation about this kind of impact.
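For reference, Kubernetes itself lets a Service be pinned to one family regardless of the cluster's ordering; below is a minimal sketch of the fields that matter if one were to override Longhorn's frontend that way (longhorn-frontend / longhorn-system are the chart's usual defaults, and whether the chart actually exposes these fields is an assumption on my part):
apiVersion: v1
kind: Service
metadata:
  name: longhorn-frontend      # assumed default name from the Longhorn chart
  namespace: longhorn-system   # assumed default namespace
spec:
  ipFamilyPolicy: SingleStack  # keep the Service single stack...
  ipFamilies:
    - IPv4                     # ...but pinned to IPv4, whatever the cluster's first family is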
C-And some other minor points...
As for the service subnet, I am the one who suspected that /112 was the largest usable prefix, but indeed /108 is fine and I am using it myself. Thanks to @nazarewk for that one, and forget about my /112.
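If I read the apiserver's allocation limit correctly (my own interpretation, not something I found documented for this), the service CIDR is capped at 20 host bits per family, which is why /108 mirrors the /12 I use for IPv4 while my old /112 was simply smaller than it needed to be:
 32 - 12  = 20 host bits -> 2^20 = 1,048,576 service IPs (10.96.0.0/12)
128 - 108 = 20 host bits -> 2^20 = 1,048,576 service IPs (a /108)
128 - 112 = 16 host bits -> 2^16 =    65,536 service IPs (my old /112)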
Because I ended up forced to run IPv4 / IPv6 to accommodate Longhorn, I chose to re-enable KubePrism. The cluster being IPv4 first, I did not detect any problem with it. Still, I cannot prove that it fully works as expected, or whether it just ends up surviving like Longhorn's frontend, saved by the fact that the cluster is IPv4 first.
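For completeness, this is the knob I mean by re-enabling it; a minimal sketch, with 7445 being the default port documented by Talos:
machine:
  features:
    kubePrism:
      enabled: true
      port: 7445   # default port; the local API server load balancer listens on localhost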
So indeed, there are ways to run a dual stack cluster with Talos, but there are still many things that are obscure, uncertain or even confirmed as non-functional.
Overall, I am using these files:
machine:
  install:
    extraKernelArgs:
      - net.ifnames=0
  sysctls:
    vm.nr_hugepages: "2048"
  time:
    disabled: false # Indicates if the time service is disabled for the machine.
    servers:
      - time.localdomain
    bootTimeout: 2m0s # Specifies the timeout when the node time is considered to be in sync unlocking the boot sequence.
  kubelet:
    nodeIP:
      validSubnets:
        - 172.24.136.128/26
        - 2001:0DB8:1234:c0::/64
cluster:
  apiServer:
    extraArgs:
      bind-address: "::"
  controllerManager:
    extraArgs:
      bind-address: "::1"
      node-cidr-mask-size-ipv6: "80"
  scheduler:
    extraArgs:
      bind-address: "::1"
  network:
    podSubnets:
      - 10.244.0.0/16
      - 2001:0DB8:1234:c1::/64
    serviceSubnets:
      - 10.96.0.0/12
      - 2001:0DB8:1234:c3::10:0/108
  etcd:
    advertisedSubnets:
      - 172.24.136.128/26
      - 2001:0DB8:1234:c0::/64
  proxy:
    extraArgs:
      ipvs-strict-arp: true
For the control planes, I add this:
machine:
  network:
    interfaces:
      - interface: eth0
        vip:
          ip: 2001:0DB8:1234:c0::30
  certSANs:
    - k64ctl.localdomain
    - kube64-ctl.localdomain
    - kube64-c1.localdomain
    - kube64-c2.localdomain
    - kube64-c3.localdomain
    - 172.24.136.161
    - 172.24.136.162
    - 172.24.136.163
    - 2001:0DB8:1234:c0::30
    - 2001:0DB8:1234:c0::31
    - 2001:0DB8:1234:c0::32
    - 2001:0DB8:1234:c0::33
    - ::1
Each control plane also has its own unique file:
machine:
  network:
    hostname: kube64-c1.localdomain
    interfaces:
      - interface: eth0
        addresses:
          - 172.24.136.161/26
          - 2001:0DB8:1234:c0::31/64
        routes:
          - network: 0.0.0.0/0
            gateway: 172.24.136.129
    nameservers:
      - 172.24.136.132
      - 172.24.136.135
just like each worker:
machine:
  disks:
    - device: /dev/sdb
      partitions:
        - mountpoint: /var/mnt/sdb
  kubelet:
    extraMounts:
      - destination: /var/mnt/sdb
        type: bind
        source: /var/mnt/sdb
        options:
          - bind
          - rshared
          - rw
  network:
    hostname: kube64-w1.localdomain
    interfaces:
      - interface: eth0
        addresses:
          - 172.24.136.164/26
          - 2001:0DB8:1234:c0::41/64
        routes:
          - network: 0.0.0.0/0
            gateway: 172.24.136.129
    nameservers:
      - 172.24.136.132
      - 172.24.136.135
Logs
Environment
- Talos version: 1.9.4
- Kubernetes version: 1.32.2
- Platform: VMs built from ISO running in Proxmox 8.3