
Inconsistent behavior between public-ipv6 annotations and public-ipv6 cli option #1813

Closed
dvgt opened this issue Oct 11, 2023 · 11 comments

@dvgt

dvgt commented Oct 11, 2023

Expected Behavior

The node annotation flannel.alpha.coreos.com/public-ipv6, or flannel.alpha.coreos.com/public-ipv6-overwrite if it is set, should have the same effect as setting the --public-ipv6 option of the flanneld binary.

Current Behavior

  • When one or both annotations are set to a specific address and the --public-ipv6 cli option is not specified, the external address used by flannel always appears to be the first address of the interface that also carries the address from the annotations.
  • When using the --public-ipv6 cli option, the external address used by flannel is always the address given by the cli option.
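To make the difference between the two behaviors concrete, here is a minimal Go sketch. The helpers `firstGlobalV6` (the observed "first address wins" behavior) and `selectPublicV6` (the expected "requested address wins" behavior) are hypothetical and are not flannel's API; the sketch assumes the interface addresses are available as CIDR strings like those printed by `ip a`, and it uses documentation addresses (2001:db8::/32) in place of the redacted 2001:x:... ones.

```go
package main

import (
	"fmt"
	"net"
)

// firstGlobalV6 models the observed behavior: return the first global-unicast
// IPv6 address on the interface, ignoring any requested address.
func firstGlobalV6(ifaceAddrs []string) net.IP {
	for _, cidr := range ifaceAddrs {
		ip, _, err := net.ParseCIDR(cidr)
		if err != nil {
			continue
		}
		// Skip IPv4 and link-local/loopback/multicast addresses.
		if ip.To4() == nil && ip.IsGlobalUnicast() {
			return ip
		}
	}
	return nil
}

// selectPublicV6 models the expected behavior: if the requested address
// (from --public-ipv6 or an annotation) is configured on the interface,
// use it; otherwise fall back to the first global IPv6 address.
func selectPublicV6(ifaceAddrs []string, requested string) net.IP {
	if want := net.ParseIP(requested); want != nil {
		for _, cidr := range ifaceAddrs {
			ip, _, err := net.ParseCIDR(cidr)
			if err == nil && ip.Equal(want) {
				return ip
			}
		}
	}
	return firstGlobalV6(ifaceAddrs)
}

func main() {
	// Order as on the worker node after `ip a add ...::229`: the extra
	// address is listed before the node's own address.
	addrs := []string{
		"10.50.147.105/24",
		"2001:db8::229/64",
		"2001:db8::105/64",
		"fe80::1/64",
	}
	fmt.Println("observed:", firstGlobalV6(addrs))
	fmt.Println("expected:", selectPublicV6(addrs, "2001:db8::105"))
}
```

The observed behavior depends only on address order, which is why adding a second address to ens192 changes the external address flannel reports.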

Possible Solution

Steps to Reproduce (for bugs)

  1. Set up a dual-stack cluster with at least two nodes (we use Rancher RKE2), using canal as the network plugin.

Example config for a master node (/etc/rancher/rke2/config.yaml):

server: https://10.50.147.218:9345
container-runtime-endpoint: /run/containerd/containerd.sock
write-kubeconfig: /etc/rancher/rke2/rke2.yaml
write-kubeconfig-mode: 0644
debug: False
tls-san:
  - 10.50.147.218
  - 2001:x:x:x:x:x:x:218
data-dir: /var/lib/rancher/rke2
cluster-cidr: 10.44.0.0/16,2001:x:x:y::/108
service-cidr: 10.43.0.0/16,2001:x:x:z::/112
service-node-port-range: 30000-32767
cluster-dns: 10.43.0.10
cluster-domain: cluster.local
node-name: master-node
node-external-ip: 10.50.147.218,2001:x:x:x:x:x:x:218
node-ip: 10.50.147.218,2001:x:x:x:x:x:x:218
node-taint:
  - node-role.kubernetes.io/etcd=true:NoExecute
selinux: False
disable:
  - rke2-ingress-nginx
disable-cloud-controller: False
etcd-expose-metrics: True
etcd-disable-snapshots: False
etcd-snapshot-name: etcd-snapshot
etcd-snapshot-schedule-cron: 0 */1 * * *
etcd-snapshot-retention: 12
etcd-snapshot-dir: /var/lib/rancher/rke2/server/db/snapshots
kube-apiserver-arg:
  - kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
  - disable-admission-plugins=AlwaysPullImages
  - authorization-mode=Node,RBAC enable-admission-plugins=NamespaceLifecycle,LimitRanger,ServiceAccount,DefaultStorageClass,DefaultTolerationSeconds,MutatingAdmissionWebhook,ValidatingAdmissionWebhook,ResourceQuota,NodeRestriction,Priority,TaintNodesByCondition,PersistentVolumeClaimResize
kube-controller-manager-arg:
  - node-cidr-mask-size-ipv4=24
  - node-cidr-mask-size-ipv6=116
disable-scheduler: False
kubelet-arg:
  - fail-swap-on=true
  - max-pods=110
disable-kube-proxy: False
cni: canal

Example config for a worker node (/etc/rancher/rke2/config.yaml):

server: https://10.50.147.221:9345
token: <MASTER_TOKEN>
container-runtime-endpoint: /run/containerd/containerd.sock
debug: False
data-dir: /var/lib/rancher/rke2
node-name: worker-node
node-external-ip: 10.50.147.105,2001:x:x:x:x:x:x:105
node-ip: 10.50.147.105,2001:x:x:x:x:x:x:105
selinux: False
kubelet-arg:
  - fail-swap-on=true
  - max-pods=110
disable-kube-proxy: False
kubectl get cm -n kube-system rke2-canal-config -o yaml
apiVersion: v1
data:
  canal_iface: ""
  cni_network_config: |-
    {
      "name": "k8s-pod-network",
      "cniVersion": "0.3.1",
      "plugins": [
        {
          "type": "calico",
          "log_level": "info",
          "datastore_type": "kubernetes",
          "nodename": "__KUBERNETES_NODE_NAME__",
          "mtu": __CNI_MTU__,
          "ipam": {
              "type": "host-local",
              "ranges": [
                  [
                      {
                          "subnet": "usePodCidr"
                      }
                  ],
                  [
                      {
                          "subnet": "usePodCidrIPv6"
                      }
                  ]
              ]
          },
          "policy": {
              "type": "k8s"
          },
          "kubernetes": {
              "kubeconfig": "__KUBECONFIG_FILEPATH__"
          }
        },
        {
          "type": "portmap",
          "snat": true,
          "capabilities": {"portMappings": true}
        },
        {
          "type": "bandwidth",
          "capabilities": {"bandwidth": true}
        }
      ]
    }
  masquerade: "true"
  net-conf.json: |
    {
      "Network": "10.44.0.0/16",
      "IPv6Network": "2001:x:x:y::/108",
      "EnableIPv6": true,
      "Backend": {
        "Type": "vxlan"
      }
    }
  typha_service_name: none
  veth_mtu: "1450"
[...]
  2. On the worker node, add an extra IPv6 address to the interface that carries inter-pod traffic:
    ip a add 2001:x:x:x:x:x:x:229/64 dev ens192

  3. Delete the canal pod that's running on the worker node:
    kubectl delete pod rke2-canal-... -n kube-system

  4. Observe that the flannel logs of the worker node report the wrong external address (2001:x:x:x:x:x:x:229 instead of 2001:x:x:x:x:x:x:105):

kubectl logs rke2-canal-..

I1004 16:30:13.143547       1 main.go:543] Found network config - Backend type: vxlan
I1004 16:30:13.143620       1 match.go:206] Determining IP address of default interface
I1004 16:30:13.146384       1 match.go:259] Using interface with name ens192 and address 10.50.147.105
I1004 16:30:13.146440       1 match.go:262] Using interface with name ens192 and v6 address 2001:x:x:x:x:x:x:229
I1004 16:30:13.146463       1 match.go:281] Defaulting external address to interface address (10.50.147.105)
I1004 16:30:13.146486       1 match.go:294] Defaulting external v6 address to interface address (2001:x:x:x:x:x:x:229)
  5. Deploy a pod on each node using the overlay network and start a ping between them:
NODE1="master-node"
NODE2="worker-node"
for NODE in ${NODE1} ${NODE2}; do \
kubectl run --restart=Never  --overrides="{ \"spec\": { \"nodeSelector\": { \"kubernetes.io/hostname\": \"${NODE}\" } } }" --image=docker.io/library/busybox:1.28 busybox-${NODE} -- sh -c 'sleep 36000'; \
done
sleep 10
IPV6POD2=$(kubectl get pods busybox-${NODE2} -o custom-columns=IP:".status.podIPs[1].ip" --no-headers); echo ${IPV6POD2}
kubectl exec -it busybox-${NODE1} -- ping -6 ${IPV6POD2}
  6. Observe the inter-node traffic on a node:
tcpdump -i ens192 -pnnev ip6
16:37:52.833187 aa:bb:cc:dd:ee:01 > aa:bb:cc:dd:ee:02, ethertype IPv6 (0x86dd), length 188: (hlim 64, next-header UDP (17) payload length: 134) 2001:x:x:x:x:x:x:218.59599 > 2001:x:x:x:x:x:x:105.8472: [bad udp cksum 0x187f -> 0xcd86!] OTV, flags [I] (0x08), overlay 0, instance 1
86:8a:9a:80:a1:c5 > f6:18:3e:eb:4d:18, ethertype IPv6 (0x86dd), length 118: (flowlabel 0xda0ae, hlim 63, next-header ICMPv6 (58) payload length: 64) 2001:x:x:y::301f > 2001:x:x:y::4023: [icmp6 sum ok] ICMP6, echo request, seq 7
16:37:52.833575 aa:bb:cc:dd:ee:02 > aa:bb:cc:dd:ee:01, ethertype IPv6 (0x86dd), length 188: (hlim 64, next-header UDP (17) payload length: 134) 2001:x:x:x:x:x:x:229.54910 > 2001:x:x:x:x:x:x:218.8472: [udp sum ok] OTV, flags [I] (0x08), overlay 0, instance 1
f6:18:3e:eb:4d:18 > 86:8a:9a:80:a1:c5, ethertype IPv6 (0x86dd), length 118: (flowlabel 0xd15b5, hlim 63, next-header ICMPv6 (58) payload length: 64) 2001:x:x:y::4023 > 2001:x:x:y::301f: [icmp6 sum ok] ICMP6, echo reply, seq 7

The echo reply is sent with source address 2001:x:x:x:x:x:x:229 instead of 2001:x:x:x:x:x:x:105 on the worker node.

Context

We have a dual-stack cluster set up with Rancher RKE2. The interface that is used for inter-node Kubernetes traffic has multiple IPv6 addresses. We want to use one specific address, not necessarily the first one, so that inter-node Kubernetes packets carry that source address. In Rancher RKE2, flannel (as part of canal) is deployed via a Rancher-provided Helm chart, which we don't want to modify manually. The only option we have to force the use of a specific public IPv6 address is to set the public-ipv6 annotations, which don't seem to have the expected behavior.

Your Environment

  • Flannel version: rancher/hardened-flannel:v0.22.0-build20230609
  • Backend used (e.g. vxlan or udp): vxlan
  • Etcd version: rancher/hardened-etcd:v3.5.7-k3s1-build20230406
  • Kubernetes version (if used): v1.25.11+rke2r1
  • Operating System and version: Oracle Linux Server 8.8

Edit: Added rke2-canal-config config map data.

@rbrtbnfgl
Contributor

rbrtbnfgl commented Oct 16, 2023

Hi, thanks for reporting this. Checking the code, the public-ip is used only to select the interface, so what you are saying is right. We could rework the code to force the use of the configured IP when there are multiple IPs on that interface.
Edit: I think I misunderstood what you wrote. The public-ip annotation is not used by flannel to select the IP; it is configured by the CNI itself. In the current implementation, the flannel configuration is specified only by the cli options, not by the annotations.
Edit 2: you can configure the Helm chart values with RKE2 without editing the chart itself.

@dvgt
Author

dvgt commented Oct 17, 2023

Thanks for your response. Let me clarify a bit more.

The public-ipv6 annotation always seems to be set correctly. While this is currently inconsistent with the IP actually used by flannel, I don't think that matters for us. What would be more helpful is if setting the public-ipv6-overwrite annotation forced flannel to use the IP from the annotation instead of the first IP of the interface. In other words, we expected the public-ipv6-overwrite annotation to do the same as the --public-ipv6 cli option, which would then also make the value of the public-ipv6 annotation match.

I'm guessing this is the behavior you mentioned could be updated in the code, right? That would be very helpful, because this is a setting we can't control with Helm chart values in RKE2. Flannel is deployed as a container that calls /opt/bin/flanneld directly, so there is no way to customize that per node.

kubectl describe ds -n kube-system rke2-canal
[...]
   kube-flannel:
    Image:      rancher/hardened-flannel:v0.22.0-build20230609-custom1
    Port:       <none>
    Host Port:  <none>
    Command:
      /opt/bin/flanneld
    Args:
      --ip-masq
      --kube-subnet-mgr
    Environment:
      POD_NAME:           (v1:metadata.name)
      POD_NAMESPACE:      (v1:metadata.namespace)
      FLANNELD_IFACE:    <set to the key 'canal_iface' of config map 'rke2-canal-config'>  Optional: false
      FLANNELD_IP_MASQ:  <set to the key 'masquerade' of config map 'rke2-canal-config'>   Optional: false
    Mounts:
      /etc/kube-flannel/ from flannel-cfg (rw)
      /run/xtables.lock from xtables-lock (rw)
  Volumes:
   flannel-cfg:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      rke2-canal-config
    Optional:  false

@rbrtbnfgl
Contributor

Yes, you are right. I'm inspecting the code to check whether there is a possible solution for your issue.

@rbrtbnfgl
Contributor

rbrtbnfgl commented Oct 18, 2023

OK, you are right. As you wrote in the issue, there is a public-ip-overwrite annotation but no public-ipv6-overwrite; we didn't add it when IPv6 support was introduced.
Edit: According to the docs, the purpose of the overwrite is to set the destination of the VXLAN tunnel, not the source. You can force the public IP using the environment variable FLANNELD_PUBLIC_IPV6. I can check whether it's feasible to use different environment settings for canal in RKE2.

@dvgt
Author

dvgt commented Oct 19, 2023

Initially, I found a reference to public-ipv6-overwrite in the code below, which is why we started using it. It is an easy way to modify the behavior per node.

BackendPublicIPv6Overwrite: prefix + "public-ipv6-overwrite",
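The quoted line builds the annotation name by concatenating a prefix with a fixed suffix. Assuming the prefix is `flannel.alpha.coreos.com/` (the prefix of the annotation names used throughout this issue), the Go sketch below derives the four public-IP keys mentioned in the thread and shows the lookup order the reporter expected: the `-overwrite` key, when present, takes precedence over the plain key. The `annotations` struct and the `publicIPv6` helper are hypothetical stand-ins that mirror the quoted field name; they are not flannel's actual types.

```go
package main

import "fmt"

// annotations is a hypothetical holder for fully qualified annotation keys.
type annotations struct {
	BackendPublicIP            string
	BackendPublicIPOverwrite   string
	BackendPublicIPv6          string
	BackendPublicIPv6Overwrite string
}

// newAnnotations builds the keys the same way as the quoted line:
// prefix + suffix.
func newAnnotations(prefix string) annotations {
	return annotations{
		BackendPublicIP:            prefix + "public-ip",
		BackendPublicIPOverwrite:   prefix + "public-ip-overwrite",
		BackendPublicIPv6:          prefix + "public-ipv6",
		BackendPublicIPv6Overwrite: prefix + "public-ipv6-overwrite",
	}
}

// publicIPv6 returns the overwrite value if it is set, otherwise the plain
// value, together with the key the value was read from.
func publicIPv6(a annotations, nodeAnnotations map[string]string) (string, string) {
	if v := nodeAnnotations[a.BackendPublicIPv6Overwrite]; v != "" {
		return v, a.BackendPublicIPv6Overwrite
	}
	return nodeAnnotations[a.BackendPublicIPv6], a.BackendPublicIPv6
}

func main() {
	a := newAnnotations("flannel.alpha.coreos.com/")
	node := map[string]string{
		a.BackendPublicIPv6:          "2001:db8::229",
		a.BackendPublicIPv6Overwrite: "2001:db8::105",
	}
	val, key := publicIPv6(a, node)
	fmt.Println(key, "=", val)
}
```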

Regarding the FLANNELD_PUBLIC_IPV6 environment variable:

  • The only way I can think of to set this environment variable from within the daemonset is to expose pod information via a fieldRef (https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.27/#envvarsource-v1-core), but in a dual-stack cluster there is no field that contains just the IPv6 address: status.podIP contains the IPv4 address and status.podIPs contains a list of IPs. So I don't see a way to make that work.
  • We would rather not change anything in the RKE2-deployed components (Helm charts or components post-deploy) other than Helm chart values. So if you know of, or are able to implement, a method that accomplishes this, that would be great!
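The first point comes down to the fact that status.podIPs holds both address families, so something has to pick out the IPv6 entry. A minimal Go sketch of that selection step, assuming the value arrives as a comma-separated string (an assumption of this sketch, not something stated in the thread); the helper `firstIPv6` is hypothetical.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// firstIPv6 returns the first IPv6 address in a comma-separated list such as
// "10.50.147.105,2001:db8::105", or "" if the list has none.
func firstIPv6(list string) string {
	for _, s := range strings.Split(list, ",") {
		ip := net.ParseIP(strings.TrimSpace(s))
		// To4() is nil only for genuine IPv6 addresses.
		if ip != nil && ip.To4() == nil {
			return ip.String()
		}
	}
	return ""
}

func main() {
	podIPs := "10.50.147.105,2001:db8::105" // hypothetical dual-stack value
	fmt.Println(firstIPv6(podIPs))
}
```

A wrapper would then have to export the result as FLANNELD_PUBLIC_IPV6 before starting /opt/bin/flanneld, which is exactly the kind of change to RKE2-deployed components the reporter wants to avoid.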

Regarding the meaning of the overwrite annotation:

What I understand from your explanation is that the annotation on nodeX is used by nodeY as the destination address for inter-node packets sent from nodeY to nodeX, right?
I tested this for IPv4, but I don't really observe this behavior:

  • Set public-ip-overwrite on k8s-6-5 (10.50.147.106) to 10.50.147.229. This IP is also present on the same interface as 10.50.147.106.
# kubectl annotate node k8s-6-5 --overwrite "flannel.alpha.coreos.com/public-ip-overwrite=10.50.147.229"

[k8s-6-5 ~]# ip a show ens192
2: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether aa:bb:cc:dd:ee:01 brd ff:ff:ff:ff:ff:ff
    altname enp11s0
    inet 10.50.147.106/24 brd 10.50.147.255 scope global noprefixroute ens192
       valid_lft forever preferred_lft forever
    inet 10.50.147.229/32 scope global ens192
       valid_lft forever preferred_lft forever
    inet6 2001:x:x:x:x:x:x:229/128 scope global nodad deprecated 
       valid_lft forever preferred_lft 0sec
    inet6 2001:x:x:x:x:x:x:106/64 scope global noprefixroute 
       valid_lft forever preferred_lft forever
    inet6 fe80::x:x:x:x/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever

[k8s-6-5 ~]# ip route
default via 10.50.147.1 dev ens192
10.0.0.0/8 via 10.50.147.1 dev ens192 proto static metric 100
10.50.147.0/24 dev ens192 scope link
  • Delete the canal pod running on node k8s-6-5.
  • The logs of the new canal pod running on node k8s-6-5 show that the IPv4 public IP was overwritten.
I1019 16:06:51.799238       1 main.go:232] Created subnet manager: Kubernetes Subnet Manager - k8s-6-5
I1019 16:06:51.799246       1 main.go:235] Installing signal handlers
I1019 16:06:51.799445       1 main.go:543] Found network config - Backend type: vxlan
I1019 16:06:51.799515       1 match.go:206] Determining IP address of default interface
I1019 16:06:51.802271       1 match.go:259] Using interface with name ens192 and address 10.50.147.106
I1019 16:06:51.802338       1 match.go:262] Using interface with name ens192 and v6 address 2001:x:x:x:x:x:x:229
I1019 16:06:51.802361       1 match.go:281] Defaulting external address to interface address (10.50.147.106)
I1019 16:06:51.802384       1 match.go:294] Defaulting external v6 address to interface address (2001:x:x:x:x:x:x:229)
I1019 16:06:51.802478       1 vxlan.go:141] VXLAN config: VNI=1 Port=0 GBP=false Learning=false DirectRouting=false
I1019 16:06:51.803638       1 kube.go:386] Overriding public ip with '10.50.147.229' from node annotation 'flannel.alpha.coreos.com/public-ip-overwrite'  <===
W1019 16:06:51.900358       1 main.go:596] no subnet found for key: FLANNEL_SUBNET in file: /run/flannel/subnet.env
I1019 16:06:51.900393       1 main.go:482] Current network or subnet (10.44.0.0/16, 10.44.4.0/24) is not equal to previous one (0.0.0.0/0, 0.0.0.0/0), trying to recycle old iptables rules
I1019 16:06:52.003024       1 main.go:357] Setting up masking rules
W1019 16:06:52.007518       1 main.go:631] no subnet found for key: FLANNEL_IPV6_SUBNET in file: /run/flannel/subnet.env
I1019 16:06:52.007548       1 main.go:508] Current ipv6 network or subnet (2001:x:x:x::/108, 2001:x:x:x::4000/116) is not equal to previous one (::/0, ::/0), trying to recycle old ip6tables rules
  • Ping between a pod on k8s-6-5 and a pod on another node, and dump the traffic on k8s-6-5.
[k8s-6-5 ~]# tcpdump -i ens192 -pnnev 'host 10.50.147.105'
18:13:25.938561 aa:bb:cc:dd:ee:01 > aa:bb:cc:dd:ee:02, ethertype IPv4 (0x0800), length 148: (tos 0x0, ttl 64, id 1648, offset 0, flags [none], proto UDP (17), length 134) 
    10.50.147.105.34695 > 10.50.147.106.8472: OTV, flags [I] (0x08), overlay 0, instance 1
e2:47:8f:a7:72:20 > 86:ea:5b:ca:6d:dc, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 63, id 53072, offset 0, flags [DF], proto ICMP (1), length 84)
    10.44.3.31 > 10.44.4.35: ICMP echo request, id 3328, seq 79, length 64

18:13:25.938755 aa:bb:cc:dd:ee:02 > aa:bb:cc:dd:ee:01, ethertype IPv4 (0x0800), length 148: (tos 0x0, ttl 64, id 11106, offset 0, flags [none], proto UDP (17), length 134)
    10.50.147.106.57518 > 10.50.147.105.8472: OTV, flags [I] (0x08), overlay 0, instance 1
86:ea:5b:ca:6d:dc > e2:47:8f:a7:72:20, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 63, id 44374, offset 0, flags [none], proto ICMP (1), length 84)
    10.44.4.35 > 10.44.3.31: ICMP echo reply, id 3328, seq 79, length 64

The node with IP 10.50.147.105 still uses 10.50.147.106 as the destination.
Also, the source IPv4 address used by k8s-6-5 is still the first IP on the interface, despite the "Overriding public ip" log line in kube-flannel.

Am I missing something here?

@rbrtbnfgl
Copy link
Contributor

I found the original issue, #712.
I have to check why flannel is behaving differently now; it could be related to code changes made over the years.

@dvgt
Author

dvgt commented Nov 6, 2023

Any update on this?

@rbrtbnfgl
Copy link
Contributor

Hi, I tested it and you are right. The override is noticed when flannel starts, but the actual value is not updated. I am trying to find a fix for it.

@dvgt
Copy link
Author

dvgt commented Feb 2, 2024

Just to keep this active, we would still like to have a fix for this. We're happy to help testing :).

@dvergotes

Just tested the fix and it works. Thanks for the effort.
Note: I had to revert 2092b83
See new issue: #1968

@dvgt
Copy link
Author

dvgt commented May 14, 2024

Fixed, thanks!

@dvgt dvgt closed this as completed May 14, 2024