Inconsistent behavior between public-ipv6 annotations and public-ipv6 cli option #1813

dvgt · 2023-10-11T14:54:08Z

Expected Behavior

The node annotation flannel.alpha.coreos.com/public-ipv6, or flannel.alpha.coreos.com/public-ipv6-overwrite if set, should have the same effect as setting the --public-ipv6 option of the flanneld binary.

Current Behavior

When one or both annotations are set to a specific address and the --public-ipv6 CLI option is not given, the external address used by flannel always seems to be the first address on the interface that carries the annotated address.
When the --public-ipv6 CLI option is used, the external address used by flannel is always the address given by that option.
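For reference, the annotation described above can be set per node with kubectl. This is a minimal sketch: the helper name `annotate_public_ipv6` and the `KUBECTL` variable are introduced here for illustration and are not part of flannel or RKE2; the annotation key is the one named in this issue.

```shell
# Hypothetical helper: set the public-ipv6-overwrite annotation on one node.
# $1 = node name, $2 = IPv6 address.
# KUBECTL can be overridden (for example KUBECTL=echo) for a dry run.
annotate_public_ipv6() {
  "${KUBECTL:-kubectl}" annotate node "$1" --overwrite \
    "flannel.alpha.coreos.com/public-ipv6-overwrite=$2"
}

# Dry run that only prints the kubectl arguments:
#   KUBECTL=echo; annotate_public_ipv6 worker-node 2001:db8::105
```

Whether flannel then uses that address as its external address depends on the flannel build; with the version reported in this issue it does not.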

Possible Solution

Steps to Reproduce (for bugs)

Set up a dual-stack cluster with at least two nodes (we use Rancher RKE2) using canal as the network plugin.

Example config for a master node (/etc/rancher/rke2/config.yaml):

server: https://10.50.147.218:9345
container-runtime-endpoint: /run/containerd/containerd.sock
write-kubeconfig: /etc/rancher/rke2/rke2.yaml
write-kubeconfig-mode: 0644
debug: False
tls-san:
  - 10.50.147.218
  - 2001:x:x:x:x:x:x:218
data-dir: /var/lib/rancher/rke2
cluster-cidr: 10.44.0.0/16,2001:x:x:y::/108
service-cidr: 10.43.0.0/16,2001:x:x:z::/112
service-node-port-range: 30000-32767
cluster-dns: 10.43.0.10
cluster-domain: cluster.local
node-name: master-node
node-external-ip: 10.50.147.218,2001:x:x:x:x:x:x:218
node-ip: 10.50.147.218,2001:x:x:x:x:x:x:218
node-taint:
  - node-role.kubernetes.io/etcd=true:NoExecute
selinux: False
disable:
  - rke2-ingress-nginx
disable-cloud-controller: False
etcd-expose-metrics: True
etcd-disable-snapshots: False
etcd-snapshot-name: etcd-snapshot
etcd-snapshot-schedule-cron: 0 */1 * * *
etcd-snapshot-retention: 12
etcd-snapshot-dir: /var/lib/rancher/rke2/server/db/snapshots
kube-apiserver-arg:
  - kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
  - disable-admission-plugins=AlwaysPullImages
  - authorization-mode=Node,RBAC enable-admission-plugins=NamespaceLifecycle,LimitRanger,ServiceAccount,DefaultStorageClass,DefaultTolerationSeconds,MutatingAdmissionWebhook,ValidatingAdmissionWebhook,ResourceQuota,NodeRestriction,Priority,TaintNodesByCondition,PersistentVolumeClaimResize
kube-controller-manager-arg:
  - node-cidr-mask-size-ipv4=24
  - node-cidr-mask-size-ipv6=116
disable-scheduler: False
kubelet-arg:
  - fail-swap-on=true
  - max-pods=110
disable-kube-proxy: False
cni: canal

Example config for a worker node (/etc/rancher/rke2/config.yaml):

server: https://10.50.147.221:9345
token: <MASTER_TOKEN>
container-runtime-endpoint: /run/containerd/containerd.sock
debug: False
data-dir: /var/lib/rancher/rke2
node-name: worker-node
node-external-ip: 10.50.147.105,2001:x:x:x:x:x:x:105
node-ip: 10.50.147.105,2001:x:x:x:x:x:x:105
selinux: False
kubelet-arg:
  - fail-swap-on=true
  - max-pods=110
disable-kube-proxy: False

kubectl get cm -n kube-system rke2-canal-config -o yaml
apiVersion: v1
data:
  canal_iface: ""
  cni_network_config: |-
    {
      "name": "k8s-pod-network",
      "cniVersion": "0.3.1",
      "plugins": [
        {
          "type": "calico",
          "log_level": "info",
          "datastore_type": "kubernetes",
          "nodename": "__KUBERNETES_NODE_NAME__",
          "mtu": __CNI_MTU__,
          "ipam": {
              "type": "host-local",
              "ranges": [
                  [
                      {
                          "subnet": "usePodCidr"
                      }
                  ],
                  [
                      {
                          "subnet": "usePodCidrIPv6"
                      }
                  ]
              ]
          },
          "policy": {
              "type": "k8s"
          },
          "kubernetes": {
              "kubeconfig": "__KUBECONFIG_FILEPATH__"
          }
        },
        {
          "type": "portmap",
          "snat": true,
          "capabilities": {"portMappings": true}
        },
        {
          "type": "bandwidth",
          "capabilities": {"bandwidth": true}
        }
      ]
    }
  masquerade: "true"
  net-conf.json: |
    {
      "Network": "10.44.0.0/16",
      "IPv6Network": "2001:x:x:y::/108",
      "EnableIPv6": true,
      "Backend": {
        "Type": "vxlan"
      }
    }
  typha_service_name: none
  veth_mtu: "1450"
[...]

On the worker node, add an extra IPv6 address to the interface that is used for inter-pod traffic
ip a add 2001:x:x:x:x:x:x:229/64 dev ens192
Delete the canal pod that's running on the worker node
kubectl delete pod rke2-canal-... -n kube-system
Observe that the flannel logs of the worker node mention the wrong external address (2001:x:x:x:x:x:x:229 instead of 2001:x:x:x:x:x:x:105)

kubectl logs rke2-canal-..

I1004 16:30:13.143547       1 main.go:543] Found network config - Backend type: vxlan
I1004 16:30:13.143620       1 match.go:206] Determining IP address of default interface
I1004 16:30:13.146384       1 match.go:259] Using interface with name ens192 and address 10.50.147.105
I1004 16:30:13.146440       1 match.go:262] Using interface with name ens192 and v6 address 2001:x:x:x:x:x:x:229
I1004 16:30:13.146463       1 match.go:281] Defaulting external address to interface address (10.50.147.105)
I1004 16:30:13.146486       1 match.go:294] Defaulting external v6 address to interface address (2001:x:x:x:x:x:x:229)
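
The check against the log above can be scripted. A minimal sketch, assuming the log message wording stays as shown in this excerpt; the helper names `flannel_external_v6` and `check_external_v6` are hypothetical and introduced only for illustration.

```shell
# Hypothetical helper: print the external IPv6 address that flannel reports
# in its startup log. Reads log text on stdin and matches the match.go
# message "Defaulting external v6 address to interface address (...)".
flannel_external_v6() {
  sed -n 's/.*Defaulting external v6 address to interface address (\([^)]*\)).*/\1/p' | tail -n 1
}

# Hypothetical helper: compare the reported address with the expected one.
# $1 = expected IPv6 address; returns non-zero on mismatch.
check_external_v6() {
  expected="$1"
  actual="$(flannel_external_v6)"
  if [ "$actual" = "$expected" ]; then
    echo "ok: flannel uses $actual"
  else
    echo "mismatch: expected $expected, flannel uses ${actual:-none}"
    return 1
  fi
}

# Usage (pod name elided as in the issue; container name taken from the daemonset):
#   kubectl logs -n kube-system rke2-canal-... -c kube-flannel | check_external_v6 2001:db8::105
```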

Deploy a pod on each node using the overlay network and start a ping between them

NODE1="master-node"
NODE2="worker-node"
for NODE in ${NODE1} ${NODE2}; do \
kubectl run --restart=Never  --overrides="{ \"spec\": { \"nodeSelector\": { \"kubernetes.io/hostname\": \"${NODE}\" } } }" --image=docker.io/library/busybox:1.28 busybox-${NODE} -- sh -c 'sleep 36000'; \
done
sleep 10
IPV6POD2=$(kubectl get pods busybox-${NODE2} -o custom-columns=IP:".status.podIPs[1].ip" --no-headers); echo ${IPV6POD2}
kubectl exec -it busybox-${NODE1} -- ping -6 ${IPV6POD2}

Observe the inter-node traffic on a node

tcpdump -i ens192 -pnnev ip6
16:37:52.833187 aa:bb:cc:dd:ee:01 > aa:bb:cc:dd:ee:02, ethertype IPv6 (0x86dd), length 188: (hlim 64, next-header UDP (17) payload length: 134) 2001:x:x:x:x:x:x:218.59599 > 2001:x:x:x:x:x:x:105.8472: [bad udp cksum 0x187f -> 0xcd86!] OTV, flags [I] (0x08), overlay 0, instance 1
86:8a:9a:80:a1:c5 > f6:18:3e:eb:4d:18, ethertype IPv6 (0x86dd), length 118: (flowlabel 0xda0ae, hlim 63, next-header ICMPv6 (58) payload length: 64) 2001:x:x:y::301f > 2001:x:x:y::4023: [icmp6 sum ok] ICMP6, echo request, seq 7
16:37:52.833575 aa:bb:cc:dd:ee:02 > aa:bb:cc:dd:ee:01, ethertype IPv6 (0x86dd), length 188: (hlim 64, next-header UDP (17) payload length: 134) 2001:x:x:x:x:x:x:229.54910 > 2001:x:x:x:x:x:x:218.8472: [udp sum ok] OTV, flags [I] (0x08), overlay 0, instance 1
f6:18:3e:eb:4d:18 > 86:8a:9a:80:a1:c5, ethertype IPv6 (0x86dd), length 118: (flowlabel 0xd15b5, hlim 63, next-header ICMPv6 (58) payload length: 64) 2001:x:x:y::4023 > 2001:x:x:y::301f: [icmp6 sum ok] ICMP6, echo reply, seq 7

The echo reply leaves the worker node with outer source address 2001:x:x:x:x:x:x:229 instead of 2001:x:x:x:x:x:x:105.
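
To spot the unexpected source address without reading the capture by eye, the outer source addresses of the VXLAN packets can be extracted from the tcpdump text. A minimal sketch, assuming the tcpdump output format shown above and UDP port 8472 as seen in the capture; the helper name `vxlan_sources` is hypothetical.

```shell
# Hypothetical helper: list the distinct outer source addresses of packets
# sent to the flannel VXLAN port (8472). Reads tcpdump text on stdin.
vxlan_sources() {
  awk '{
    for (i = 2; i < NF; i++) {
      # Look for "src.port > dst.8472:"; the MAC pair earlier in the line
      # also contains ">", but its right-hand side does not end in ".8472:".
      if ($i == ">" && $(i + 1) ~ /\.8472:$/) {
        src = $(i - 1)
        sub(/\.[0-9]+$/, "", src)   # strip the trailing ".port"
        print src
      }
    }
  }' | sort -u
}

# Usage (run on the node as root):
#   tcpdump -i ens192 -pnnev -c 20 'ip6 and udp port 8472' | vxlan_sources
```

Any address in the output other than the intended node addresses indicates that a node is sourcing tunnel traffic from an unexpected address.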

Context

We have a dual-stack cluster set up with Rancher RKE2. On the interface that is used for inter-node Kubernetes traffic there are multiple IPv6 addresses. We want to use one specific address, not necessarily the first one, so that inter-node Kubernetes packets have that source address. In Rancher RKE2, flannel (as part of canal) is deployed via a Rancher-provided helm chart, which we don't want to modify manually. The only option to force the use of a specific public IPv6 address is to set the public-ipv6 annotations, which don't seem to have the expected behavior.
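
To see which IPv6 address is listed first on the interface, the global addresses can be printed in the order `ip` reports them. A minimal sketch: the helper name `global_v6_addrs` is hypothetical, and it is an assumption that the order printed by `ip` matches the order flannel sees when it defaults to the first address.

```shell
# Hypothetical helper: print the global-scope IPv6 addresses from
# `ip -6 -o addr show` output (one address per line, prefix length removed),
# keeping the order in which they are listed.
global_v6_addrs() {
  awk '$3 == "inet6" && /scope global/ { sub(/\/[0-9]+$/, "", $4); print $4 }'
}

# Usage:
#   ip -6 -o addr show dev ens192 | global_v6_addrs
```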

Your Environment

Flannel version: rancher/hardened-flannel:v0.22.0-build20230609
Backend used (e.g. vxlan or udp): vxlan
Etcd version: rancher/hardened-etcd:v3.5.7-k3s1-build20230406
Kubernetes version (if used): v1.25.11+rke2r1
Operating System and version: Oracle Linux Server 8.8

Edit: Added rke2-canal-config config map data.

rbrtbnfgl · 2023-10-16T13:51:58Z

Hi, thanks for reporting this. From the code, public-ip is used only to select the interface, so what you are saying is right. We can rework the code to force the use of the defined IP when that interface has multiple IPs.
Edit: I think I misunderstood what you wrote. The public-ip annotation is not used by flannel to select the IP; it is set by the CNI itself. In the current implementation, the flannel configuration is taken only from the CLI options, not from the annotations.
Edit 2: You can configure the helm chart values with RKE2 without editing the chart itself.

dvgt · 2023-10-17T14:11:30Z

Thanks for your response. Let me clarify a bit more.

The public-ipv6 annotation seems to be always set correctly. While this is inconsistent with the real IP used by flannel at the moment, I don't think that plays a role for us. What would be more helpful is that setting the public-ipv6-overwrite annotation would force flannel to use the IP from the annotation instead of the first IP of the interface. In other words, we expected that the public-ipv6-overwrite annotation would do the same as the --public-ipv6 CLI option, which would then make the value of the public-ipv6 annotation also match.

I'm guessing this is the behavior you mentioned that could be updated in the code, right? That would be very helpful, because this is a setting we can't control with helm chart values in RKE2. Flannel is deployed as a container that calls /opt/bin/flanneld directly, so there is no way to customize that per node.

kubectl describe ds -n kube-system rke2-canal
[...]
   kube-flannel:
    Image:      rancher/hardened-flannel:v0.22.0-build20230609-custom1
    Port:       <none>
    Host Port:  <none>
    Command:
      /opt/bin/flanneld
    Args:
      --ip-masq
      --kube-subnet-mgr
    Environment:
      POD_NAME:           (v1:metadata.name)
      POD_NAMESPACE:      (v1:metadata.namespace)
      FLANNELD_IFACE:    <set to the key 'canal_iface' of config map 'rke2-canal-config'>  Optional: false
      FLANNELD_IP_MASQ:  <set to the key 'masquerade' of config map 'rke2-canal-config'>   Optional: false
    Mounts:
      /etc/kube-flannel/ from flannel-cfg (rw)
      /run/xtables.lock from xtables-lock (rw)
  Volumes:
   flannel-cfg:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      rke2-canal-config
    Optional:  false

rbrtbnfgl · 2023-10-17T14:17:18Z

Yes, you are right. I'm inspecting the code to check whether there is a possible solution for your issue.

rbrtbnfgl · 2023-10-18T15:07:36Z

OK, you are right. As you wrote in the issue, there is a public-ip-overwrite but no public-ipv6-overwrite; we didn't add it when IPv6 support was introduced.
Edit: According to the docs, the overwrite is meant for the destination of the VXLAN tunnel, not the source. You can force the public IP with the FLANNELD_PUBLIC_IPV6 environment variable. I can check whether it's feasible to use different environment settings for canal in RKE2.

dvgt · 2023-10-19T16:52:51Z

Initially, I found a reference to public-ipv6-overwrite in the below code reference, which is why we started using it. This is an easy way to modify the behavior per node.

flannel/pkg/subnet/kube/annotations.go

Line 68 in 44f5584

BackendPublicIPv6Overwrite: prefix + "public-ipv6-overwrite",

Regarding the FLANNELD_PUBLIC_IPV6 environment variable:

The only way I can see to set this environment variable from within the daemonset is to expose pod information via a fieldRef (https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.27/#envvarsource-v1-core). In a dual-stack cluster, however, no field contains just the IPv6 address: status.podIP contains the IPv4 address and status.podIPs contains a list of IPs. So I don't see a way to make that work.
We would rather not change anything in the RKE2-deployed components (helm charts or components post-deploy) except for helm chart values. So if you know of, or are able to implement, a method that accomplishes this, that would be great!
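
To illustrate the point about status.podIPs: picking the IPv6 entry out of that list needs a filtering step, which a fieldRef alone does not provide. A minimal sketch of such a filter, run from outside the pod; the helper name `first_ipv6` is hypothetical, and treating any address that contains a colon as IPv6 is a simplifying assumption.

```shell
# Hypothetical helper: print the first IPv6 address among its arguments.
# An address is treated as IPv6 if it contains a colon.
# Returns non-zero if no argument looks like an IPv6 address.
first_ipv6() {
  for ip in "$@"; do
    case "$ip" in
      *:*) echo "$ip"; return 0 ;;
    esac
  done
  return 1
}

# Usage (jsonpath prints the list entries separated by spaces):
#   first_ipv6 $(kubectl get pod busybox-worker-node -o jsonpath='{.status.podIPs[*].ip}')
```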

Regarding the meaning of the overwrite annotation:

What I understand from your explanation is that the annotation on nodeX is used by nodeY as the destination address for inter-node packets sent from nodeY to nodeX, right?
I tested this for IPv4, but I don't really observe this behavior:

Set public-ip-overwrite on k8s-6-5 (10.50.147.106) to 10.50.147.229. This IP is also present on the same interface as 10.50.147.106

# kubectl annotate node k8s-6-5 --overwrite "flannel.alpha.coreos.com/public-ip-overwrite=10.50.147.229"

[k8s-6-5 ~]# ip a show ens192
2: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether aa:bb:cc:dd:ee:01 brd ff:ff:ff:ff:ff:ff
    altname enp11s0
    inet 10.50.147.106/24 brd 10.50.147.255 scope global noprefixroute ens192
       valid_lft forever preferred_lft forever
    inet 10.50.147.229/32 scope global ens192
       valid_lft forever preferred_lft forever
    inet6 2001:x:x:x:x:x:x:229/128 scope global nodad deprecated 
       valid_lft forever preferred_lft 0sec
    inet6 2001:x:x:x:x:x:x:106/64 scope global noprefixroute 
       valid_lft forever preferred_lft forever
    inet6 fe80::x:x:x:x/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever

[k8s-6-5 ~]# ip route
default via 10.50.147.1 dev ens192
10.0.0.0/8 via 10.50.147.1 dev ens192 proto static metric 100
10.50.147.0/24 dev ens192 scope link

Delete the canal pod running on node k8s-6-5.
Logs of the new canal pod running on node k8s-6-5 show the IPv4 public-ip was overwritten.

I1019 16:06:51.799238       1 main.go:232] Created subnet manager: Kubernetes Subnet Manager - k8s-6-5
I1019 16:06:51.799246       1 main.go:235] Installing signal handlers
I1019 16:06:51.799445       1 main.go:543] Found network config - Backend type: vxlan
I1019 16:06:51.799515       1 match.go:206] Determining IP address of default interface
I1019 16:06:51.802271       1 match.go:259] Using interface with name ens192 and address 10.50.147.106
I1019 16:06:51.802338       1 match.go:262] Using interface with name ens192 and v6 address 2001:x:x:x:x:x:x:229
I1019 16:06:51.802361       1 match.go:281] Defaulting external address to interface address (10.50.147.106)
I1019 16:06:51.802384       1 match.go:294] Defaulting external v6 address to interface address (2001:x:x:x:x:x:x:229)
I1019 16:06:51.802478       1 vxlan.go:141] VXLAN config: VNI=1 Port=0 GBP=false Learning=false DirectRouting=false
I1019 16:06:51.803638       1 kube.go:386] Overriding public ip with '10.50.147.229' from node annotation 'flannel.alpha.coreos.com/public-ip-overwrite'  <===
W1019 16:06:51.900358       1 main.go:596] no subnet found for key: FLANNEL_SUBNET in file: /run/flannel/subnet.env
I1019 16:06:51.900393       1 main.go:482] Current network or subnet (10.44.0.0/16, 10.44.4.0/24) is not equal to previous one (0.0.0.0/0, 0.0.0.0/0), trying to recycle old iptables rules
I1019 16:06:52.003024       1 main.go:357] Setting up masking rules
W1019 16:06:52.007518       1 main.go:631] no subnet found for key: FLANNEL_IPV6_SUBNET in file: /run/flannel/subnet.env
I1019 16:06:52.007548       1 main.go:508] Current ipv6 network or subnet (2001:x:x:x::/108, 2001:x:x:x::4000/116) is not equal to previous one (::/0, ::/0), trying to recycle old ip6tables rules

Ping between pod on k8s-6-5 and other node and dump traffic on k8s-6-5.

[k8s-6-5 ~]# tcpdump -i ens192 -pnnev 'host 10.50.147.105'
18:13:25.938561 aa:bb:cc:dd:ee:01 > aa:bb:cc:dd:ee:02, ethertype IPv4 (0x0800), length 148: (tos 0x0, ttl 64, id 1648, offset 0, flags [none], proto UDP (17), length 134) 
    10.50.147.105.34695 > 10.50.147.106.8472: OTV, flags [I] (0x08), overlay 0, instance 1
e2:47:8f:a7:72:20 > 86:ea:5b:ca:6d:dc, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 63, id 53072, offset 0, flags [DF], proto ICMP (1), length 84)
    10.44.3.31 > 10.44.4.35: ICMP echo request, id 3328, seq 79, length 64

18:13:25.938755 aa:bb:cc:dd:ee:02 > aa:bb:cc:dd:ee:01, ethertype IPv4 (0x0800), length 148: (tos 0x0, ttl 64, id 11106, offset 0, flags [none], proto UDP (17), length 134)
    10.50.147.106.57518 > 10.50.147.105.8472: OTV, flags [I] (0x08), overlay 0, instance 1
86:ea:5b:ca:6d:dc > e2:47:8f:a7:72:20, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 63, id 44374, offset 0, flags [none], proto ICMP (1), length 84)
    10.44.4.35 > 10.44.3.31: ICMP echo reply, id 3328, seq 79, length 64

The node with IP 10.50.147.105 still uses 10.50.147.106 as destination.
The IPv4 address used by k8s-6-5 as source is also still the first IP on the interface, despite the "Overriding public ip" log message from kube-flannel.

Am I missing something here?

rbrtbnfgl · 2023-10-20T15:01:49Z

I found the original issue: #712.
I have to check why flannel behaves differently now; it could be related to code changes made over the years.

dvgt · 2023-11-06T13:47:32Z

Any update on this?

rbrtbnfgl · 2023-11-07T10:26:10Z

Hi, I tested it and you are right. The override seems to be noticed when flannel starts, but the actual value is not updated. I am trying to find a fix for it.

dvgt · 2024-02-02T10:04:24Z

Just to keep this active: we would still like to have a fix for this. We're happy to help with testing :).

dvergotes · 2024-05-10T13:29:22Z

Just tested the fix and it works. Thanks for the effort.
Note: I had to revert 2092b83
See new issue: #1968

dvgt · 2024-05-14T07:00:30Z

Fixed, thanks!

rbrtbnfgl mentioned this issue Jan 2, 2024

Hello, does Flannel (--iface) have a way to configure the network interface through NodeInternalIP like Calico, without specifying the --iface parameter in a multi-network interface environment? #1849

Open

rbrtbnfgl mentioned this issue Apr 17, 2024

Added configuration for pulic-ip through node annotation #1948

Merged

dvergotes mentioned this issue May 10, 2024

Crash at startup of flannel when setting IPv6 masq rules #1968

Closed

dvgt closed this as completed May 14, 2024
