
ipv6 + Calico + out of tree cloud-provider is broken #3401

Closed
CecileRobertMichon opened this issue Feb 22, 2023 · 38 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@CecileRobertMichon
Contributor

While moving all CAPZ templates to out-of-tree CCM, I was not able to get a reliably passing test for ipv6. This test is very flaky (it fails most of the time). Looking into it, I found that the error is that kubelet can't patch the status of one of the kube-system pods (sometimes kube-apiserver, sometimes kube-scheduler, sometimes kube-controller-manager). One example of the error can be seen here. The error is:

	Pod \"kube-controller-manager-capz-e2e-4laer3-ipv6-control-plane-6v7dn\" is invalid: [status.podIPs: Invalid value: []core.PodIP{core.PodIP{IP:\"2001:1234:5678:9abc::5\"}, core.PodIP{IP:\"2001:1234:5678:9abc::5\"}}: may specify no more than one IP for each IP family, status.podIPs[1]: Duplicate value: core.PodIP{IP:\"2001:1234:5678:9abc::5\"}]"

The same error does not repro with the in-tree cloud-provider (see passing ipv6 test with CAPZ and in-tree cloud-provider here).

The cloud-provider-azure test for ipv6 is also failing (although the failure is different: https://testgrid.k8s.io/provider-azure-cloud-provider-azure#cloud-provider-azure-master-ipv6-capz)

@feiskyer
Member

/assign @lzhecheng

@feiskyer
Member

The error is about invalid Pod IPs, which doesn't seem to be related to CCM/KCM. But let's check the logs to see why there were duplicate IPs in the pod status.

@CecileRobertMichon
Contributor Author

I agree it doesn't seem related to CCM; I think it might have to do with Calico somehow. The weird thing is, I can't reproduce this error with the exact same template (Calico CNI too) when cloud-provider is "azure", but as soon as I switch cloud-provider to "external" and install CCM I can repro easily.

@lzhecheng
Contributor

I created a cluster with 1) CAPZ from your branch (oot-templates); 2) templates/cluster-template-ipv6.yaml; 3) ci-entrypoint.sh. The cluster looks good. Could you please share how to reproduce locally?

zhechengli@devbox:~$ kgpa
NAMESPACE          NAME                                                         READY   STATUS    RESTARTS   AGE     IP                       NODE
                       NOMINATED NODE   READINESS GATES
calico-apiserver   calico-apiserver-78c5b59ff9-8v9r4                            1/1     Running   0          3m51s   2001:1234:5678:9a40::6   zhecheng-223-5-control-plane-xrhvp   <none>           <none>
calico-apiserver   calico-apiserver-78c5b59ff9-v7hg9                            1/1     Running   0          3m51s   2001:1234:5678:9a42::3   zhecheng-223-5-md-0-5cqbp            <none>           <none>
calico-system      calico-kube-controllers-f7574cc46-jn87s                      1/1     Running   0          6m5s    2001:1234:5678:9a40::5   zhecheng-223-5-control-plane-xrhvp   <none>           <none>
calico-system      calico-node-cts6q                                            1/1     Running   0          6m5s    2001:1234:5678:9abc::4   zhecheng-223-5-control-plane-xrhvp   <none>           <none>
calico-system      calico-node-pv52g                                            1/1     Running   0          4m51s   2001:1234:5678:9abd::4   zhecheng-223-5-md-0-5cqbp            <none>           <none>
calico-system      calico-node-qxvsn                                            1/1     Running   0          4m55s   2001:1234:5678:9abd::5   zhecheng-223-5-md-0-rb7nl            <none>           <none>
calico-system      calico-typha-7859969dd9-rgxsx                                1/1     Running   0          6m6s    2001:1234:5678:9abc::4   zhecheng-223-5-control-plane-xrhvp   <none>           <none>
calico-system      calico-typha-7859969dd9-z6ck2                                1/1     Running   0          4m47s   2001:1234:5678:9abd::5   zhecheng-223-5-md-0-rb7nl            <none>           <none>
calico-system      csi-node-driver-c7gtq                                        2/2     Running   0          4m10s   2001:1234:5678:9a42::2   zhecheng-223-5-md-0-5cqbp            <none>           <none>
calico-system      csi-node-driver-dgl62                                        2/2     Running   0          4m14s   2001:1234:5678:9a41::2   zhecheng-223-5-md-0-rb7nl            <none>           <none>
calico-system      csi-node-driver-h855n                                        2/2     Running   0          5m27s   2001:1234:5678:9a40::4   zhecheng-223-5-control-plane-xrhvp   <none>           <none>
calico-system      tigera-operator-64db64cb98-p4k66                             1/1     Running   0          6m14s   2001:1234:5678:9abc::4   zhecheng-223-5-control-plane-xrhvp   <none>           <none>
kube-system        cloud-controller-manager-5c447487c8-j9695                    1/1     Running   0          6m10s   2001:1234:5678:9abc::4   zhecheng-223-5-control-plane-xrhvp   <none>           <none>
kube-system        cloud-node-manager-5rqwt                                     1/1     Running   0          4m55s   2001:1234:5678:9abd::5   zhecheng-223-5-md-0-rb7nl            <none>           <none>
kube-system        cloud-node-manager-dg8mh                                     1/1     Running   0          6m10s   2001:1234:5678:9abc::4   zhecheng-223-5-control-plane-xrhvp   <none>           <none>
kube-system        cloud-node-manager-jrpzb                                     1/1     Running   0          4m51s   2001:1234:5678:9abd::4   zhecheng-223-5-md-0-5cqbp            <none>           <none>
kube-system        coredns-565d847f94-d2btx                                     1/1     Running   0          6m23s   2001:1234:5678:9a40::2   zhecheng-223-5-control-plane-xrhvp   <none>           <none>
kube-system        coredns-565d847f94-q9cx4                                     1/1     Running   0          6m23s   2001:1234:5678:9a40::3   zhecheng-223-5-control-plane-xrhvp   <none>           <none>
kube-system        etcd-zhecheng-223-5-control-plane-xrhvp                      1/1     Running   0          6m29s   2001:1234:5678:9abc::4   zhecheng-223-5-control-plane-xrhvp   <none>           <none>
kube-system        kube-apiserver-zhecheng-223-5-control-plane-xrhvp            1/1     Running   0          6m28s   2001:1234:5678:9abc::4   zhecheng-223-5-control-plane-xrhvp   <none>           <none>
kube-system        kube-controller-manager-zhecheng-223-5-control-plane-xrhvp   1/1     Running   0          6m28s   2001:1234:5678:9abc::4   zhecheng-223-5-control-plane-xrhvp   <none>           <none>
kube-system        kube-proxy-j4bnp                                             1/1     Running   0          4m51s   2001:1234:5678:9abd::4   zhecheng-223-5-md-0-5cqbp            <none>           <none>
kube-system        kube-proxy-r9bzj                                             1/1     Running   0          4m55s   2001:1234:5678:9abd::5   zhecheng-223-5-md-0-rb7nl            <none>           <none>
kube-system        kube-proxy-wftz6                                             1/1     Running   0          6m23s   2001:1234:5678:9abc::4   zhecheng-223-5-control-plane-xrhvp   <none>           <none>
kube-system        kube-scheduler-zhecheng-223-5-control-plane-xrhvp            1/1     Running   0          6m28s   2001:1234:5678:9abc::4   zhecheng-223-5-control-plane-xrhvp   <none>           <none>
...
zhechengli@devbox:~$ kgno -owide
NAME                                 STATUS   ROLES           AGE     VERSION   INTERNAL-IP              EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
zhecheng-223-5-control-plane-xrhvp   Ready    control-plane   7m8s    v1.25.3   2001:1234:5678:9abc::4   <none>        Ubuntu 22.04.1 LTS   5.15.0-1021-azure   containerd://1.6.2
zhecheng-223-5-md-0-5cqbp            Ready    <none>          5m28s   v1.25.3   2001:1234:5678:9abd::4   <none>        Ubuntu 22.04.1 LTS   5.15.0-1021-azure   containerd://1.6.2
zhecheng-223-5-md-0-rb7nl            Ready    <none>          5m32s   v1.25.3   2001:1234:5678:9abd::5   <none>        Ubuntu 22.04.1 LTS   5.15.0-1021-azure   containerd://1.6.2

@CecileRobertMichon
Contributor Author

@lzhecheng in my experience this doesn't repro every time but chances of reproing are a lot higher with 3+ control planes. I am usually able to repro easily within the first 4-5 clusters built.

@lzhecheng
Contributor

This time I created clusters with 3 control-plane nodes, 3 times. They all look good. Is the problem still there?


@lzhecheng
Contributor

Update:
There's another issue: the MachineDeployment cannot become Available even though all pods and nodes are ready. E.g. https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/kubernetes-sigs_cluster-api-provider-azure/3221/pull-cluster-api-provider-azure-e2e/1631562894430703616

I observed this issue locally as well.

Status:
  Conditions:
    Last Transition Time:  2023-03-03T06:48:48Z
    Message:               Minimum availability requires 1 replicas, current 0 available
    Reason:                WaitingForAvailableMachines
    Severity:              Warning
    Status:                False
    Type:                  Ready
    Last Transition Time:  2023-03-03T06:48:48Z
    Message:               Minimum availability requires 1 replicas, current 0 available
    Reason:                WaitingForAvailableMachines
    Severity:              Warning
    Status:                False
    Type:                  Available
  Observed Generation:     1
  Phase:                   ScalingUp
  Replicas:                1
  Selector:                cluster.x-k8s.io/cluster-name=zhecheng-303-1,cluster.x-k8s.io/deployment-name=zhecheng-303-1-md-0
  Unavailable Replicas:    1
  Updated Replicas:        1

@CecileRobertMichon
Contributor Author

CecileRobertMichon commented Mar 3, 2023

@lzhecheng you're right, this was a different error on the worker node, because I forgot to add back the "external" cloud-provider configuration in the worker node kubelet config (fixed in kubernetes-sigs/cluster-api-provider-azure@051ae83)

Now that that's fixed, it is still failing, and this time it repro'd the same issue I was seeing before (on the first try): one of the control planes has a pending pod for one of the k8s components, in this case the second control plane's scheduler pod: https://storage.googleapis.com/kubernetes-jenkins/pr-logs/pull/kubernetes-sigs_cluster-api-provider-azure/3221/pull-cluster-api-provider-azure-e2e/1631738500053209088/artifacts/clusters/bootstrap/resources/capz-e2e-6l4j4b/Machine/capz-e2e-6l4j4b-ipv6-control-plane-m8ktt.yaml

In the kubelet log on that VM, we can see:

Mar 03 20:15:26.475236 capz-e2e-6l4j4b-ipv6-control-plane-d5d4j kubelet[1476]: I0303 20:15:26.475205 1476 status_manager.go:691] "Failed to update status for pod" pod="kube-system/kube-scheduler-capz-e2e-6l4j4b-ipv6-control-plane-d5d4j" err="failed to patch status \"{\\\"metadata\\\":{\\\"uid\\\":\\\"0881123a-ea6b-401b-9459-8d7c2214f0d9\\\"},\\\"status\\\":{\\\"conditions\\\":[{\\\"lastProbeTime\\\":null,\\\"lastTransitionTime\\\":\\\"2023-03-03T19:47:56Z\\\",\\\"status\\\":\\\"True\\\",\\\"type\\\":\\\"Initialized\\\"},{\\\"lastProbeTime\\\":null,\\\"lastTransitionTime\\\":\\\"2023-03-03T19:48:15Z\\\",\\\"status\\\":\\\"True\\\",\\\"type\\\":\\\"Ready\\\"},{\\\"lastProbeTime\\\":null,\\\"lastTransitionTime\\\":\\\"2023-03-03T19:48:15Z\\\",\\\"status\\\":\\\"True\\\",\\\"type\\\":\\\"ContainersReady\\\"},{\\\"lastProbeTime\\\":null,\\\"lastTransitionTime\\\":\\\"2023-03-03T19:47:56Z\\\",\\\"status\\\":\\\"True\\\",\\\"type\\\":\\\"PodScheduled\\\"}],\\\"containerStatuses\\\":[{\\\"containerID\\\":\\\"containerd://3bb38c78e25b21aa91ebaa30ca67397e453970b68cb666648d81d4f94dd1625a\\\",\\\"image\\\":\\\"registry.k8s.io/kube-scheduler:v1.25.6\\\",\\\"imageID\\\":\\\"registry.k8s.io/kube-scheduler@sha256:f41301881252779d21dde86aac5a45e9acfe560643b5a28cef1286eabb187e26\\\",\\\"lastState\\\":{},\\\"name\\\":\\\"kube-scheduler\\\",\\\"ready\\\":true,\\\"restartCount\\\":0,\\\"started\\\":true,\\\"state\\\":{\\\"running\\\":{\\\"startedAt\\\":\\\"2023-03-03T19:48:01Z\\\"}}}],\\\"hostIP\\\":\\\"10.0.0.5\\\",\\\"phase\\\":\\\"Running\\\",\\\"podIP\\\":\\\"2001:1234:5678:9abc::5\\\",\\\"podIPs\\\":[{\\\"ip\\\":\\\"2001:1234:5678:9abc::5\\\"},{\\\"ip\\\":\\\"2001:1234:5678:9abc::5\\\"}],\\\"startTime\\\":\\\"2023-03-03T19:47:56Z\\\"}}\" for pod \"kube-system\"/\"kube-scheduler-capz-e2e-6l4j4b-ipv6-control-plane-d5d4j\": Pod \"kube-scheduler-capz-e2e-6l4j4b-ipv6-control-plane-d5d4j\" is invalid: [status.podIPs: Invalid value: []core.PodIP{core.PodIP{IP:\"2001:1234:5678:9abc::5\"}, core.PodIP{IP:\"2001:1234:5678:9abc::5\"}}: may specify no more than one IP for each IP family, status.podIPs[1]: Duplicate value: core.PodIP{IP:\"2001:1234:5678:9abc::5\"}]"

@lzhecheng
Contributor

@CecileRobertMichon thank you for the fix.
I have a question: is the 3rd control plane node not coming up a separate issue, or a result of this "invalid podIPs" issue? Do you think it is expected?

@CecileRobertMichon
Contributor Author

@lzhecheng the 3rd control plane node is not coming up because of the "invalid pod IPs" issue on the 2nd control plane: Cluster API checks the health of all control plane components before scaling up, and in this case the check is failing because the scheduler pod is not healthy.

@lzhecheng
Contributor

@CecileRobertMichon Update: I ran ci-e2e.sh locally several times and have 2 findings:

  1. It succeeded most of the time when only running the ipv6 test.
  2. When running all 3 required tests, I saw logs like this:
E0308 10:04:59.429233       1 controller.go:326]  "msg"="Reconciler error" "error"="admission webhook \"validation.azurecluster.infrastructure.cluster.x-k8s.io\" denied the request: AzureCluster.infrastructure.cluster.x-k8s.io \"zhecheng-308-1\" is invalid: spec.networkSpec.subnets[1].CIDRBlocks: Invalid value: []string{\"10.1.0.0/16\", \"2001:1234:5678:9abd::/64\"}: field is immutable" "AzureCluster"={"name":"zhecheng-308-1","namespace":"capz-e2e-c0c9z3"} "controller"="azurecluster" "controllerGroup"="infrastructure.cluster.x-k8s.io" "controllerKind"="AzureCluster" "name"="zhecheng-308-1" "namespace"="capz-e2e-c0c9z3" "reconcileID"="c8f083d8-d0ef-420d-b950-025ee090b76d"

Here capz-e2e-c0c9z3 is not the namespace of the ipv6 cluster.
My guess is that there's a bug when creating clusters simultaneously. I am not 100% sure, so I just want to highlight the possibility.
Could you please:

  1. Rebase your PR. Then I can also test it with the CI pipeline by cherry-picking your commits.
  2. Do experiments with the CI pipeline: try to only run the ipv6 test, or run the 3 required tests sequentially.

Thank you.

@CecileRobertMichon
Contributor Author

CecileRobertMichon commented Mar 8, 2023

I just rebased the PR so it now only has the diff for switching ipv6. I'm running the job again; then I will disable everything but ipv6 in a test commit and run that.

@lzhecheng
Contributor

Update: Reproduced on a manually created cluster; restarting these static-manifest Pods makes them recover.

@lzhecheng
Contributor

@CecileRobertMichon is it possible for me to create a cluster with a locally built kubelet binary?

kubelet seems to update the Pod status with unexpected host IPs, and I'd like to add logs to check it.

@CecileRobertMichon
Contributor Author

@lzhecheng yes, but you'll need to create a template based on the custom-builds.yaml template that uses out-of-tree + ipv6 (unfortunately, that's not a combination of features we have in the templates in the repo): https://capz.sigs.k8s.io/developers/kubernetes-developers.html#kubernetes-117

@lzhecheng
Contributor

Update:
I think I found the root cause: kubelet is not properly processing host IPs. The OOT cloud-provider behaves a little differently from before, which directly leads to the failure, but I think it is kubelet that should handle the situation.
I will do further verification tomorrow, and then make a fix and a detailed root cause analysis.

@lzhecheng
Contributor

lzhecheng commented Mar 23, 2023

Root cause: kubelet always adds the second host IP into PodIPs without checking whether it is the same IP as PodIPs[0] or from the same IP family. Meanwhile, the order of hostIPs is not guaranteed.

			if kubecontainer.IsHostNetworkPod(pod) {
				// Primary IP is not set
				if s.PodIP == "" {
					s.PodIP = hostIPs[0].String()
					s.PodIPs = []v1.PodIP{{IP: s.PodIP}}
				}
				// Secondary IP is not set #105320
				if len(hostIPs) == 2 && len(s.PodIPs) == 1 {
					s.PodIPs = append(s.PodIPs, v1.PodIP{IP: hostIPs[1].String()})
				}
			}

https://github.com/kubernetes/kubernetes/blob/v1.25.6/pkg/kubelet/kubelet_pods.go#L1534-L1544
In our case, the IPv6 address is added to the Node first, so hostIPs: [<IPv6>]. Then the IPv4 one is added, so hostIPs: [<IPv4>, <IPv6>]. As a result, the IPv6 address is added to PodIPs twice. I added more logging to kubelet; the result:

# hostIPs: IPv6
Mar 22 11:48:08 zhecheng-322-18-control-plane-759sz kubelet[1479]: I0322 11:48:08.326325    1479 kubelet_pods.go:1536] DEBUG hostIPs [2001:1234:5678:9abc::6]
Mar 22 11:48:08 zhecheng-322-18-control-plane-759sz kubelet[1479]: I0322 11:48:08.326434    1479 kubelet_pods.go:1546] DEBUG s.PodIPs [{2001:1234:5678:9abc::6}]

# hostIPs: IPv4, IPv6
Mar 22 11:50:05 zhecheng-322-18-control-plane-759sz kubelet[1479]: I0322 11:50:05.222657    1479 kubelet_pods.go:1536] DEBUG hostIPs [10.0.0.6 2001:1234:5678:9abc::6]
Mar 22 11:50:05 zhecheng-322-18-control-plane-759sz kubelet[1479]: I0322 11:50:05.222673    1479 kubelet_pods.go:1546] DEBUG s.PodIPs [{2001:1234:5678:9abc::6} {2001:1234:5678:9abc::6}]

The fix: Check IP family before adding the IP into PodIPs
kubernetes/kubernetes#116879
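To make the idea concrete, here is a minimal, self-contained Go sketch of the kind of family check described above. This is not the actual kubelet patch; mergeHostIPs and isIPv4 are hypothetical helpers written only for this illustration.

package main

import (
	"fmt"
	"net"
)

// isIPv4 reports whether addr parses as an IPv4 address.
func isIPv4(addr string) bool {
	ip := net.ParseIP(addr)
	return ip != nil && ip.To4() != nil
}

// mergeHostIPs mimics the host-network branch quoted above: podIPs may already
// hold a primary IP cached from an earlier sync (when only the IPv6 host IP was
// known), and the secondary host IP is appended only when it belongs to the
// other IP family, which avoids the duplicate seen in the kubelet logs.
func mergeHostIPs(podIPs, hostIPs []string) []string {
	if len(podIPs) == 0 && len(hostIPs) > 0 {
		podIPs = []string{hostIPs[0]}
	}
	if len(hostIPs) == 2 && len(podIPs) == 1 && isIPv4(podIPs[0]) != isIPv4(hostIPs[1]) {
		podIPs = append(podIPs, hostIPs[1])
	}
	return podIPs
}

func main() {
	// First sync: only the IPv6 host IP is known.
	podIPs := mergeHostIPs(nil, []string{"2001:1234:5678:9abc::5"})
	fmt.Println(podIPs) // [2001:1234:5678:9abc::5]

	// Later sync: the metadata now also reports the IPv4 address, ordered IPv4 first.
	// Without the family check, hostIPs[1] (the IPv6 address) would be appended
	// again, producing the "Duplicate value" validation error from this issue.
	podIPs = mergeHostIPs(podIPs, []string{"10.0.0.5", "2001:1234:5678:9abc::5"})
	fmt.Println(podIPs) // still [2001:1234:5678:9abc::5]
}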

Why the problem appears after switching to the OOT cloud-provider-azure:
I think the order from the in-tree cloud-provider is hostIPs: [<IPv6>, <IPv4>], so the problem is not revealed. It can be further investigated.

@CecileRobertMichon @feiskyer

@CecileRobertMichon
Contributor Author

I think the order from the in-tree cloud-provider is hostIPs: [<IPv6>, <IPv4>], so the problem is not revealed. It can be further investigated.

Would it be possible to switch the order in out of tree cloud-provider too so the order is consistent with in tree cloud-provider?

@lzhecheng
Contributor

I think the order from the in-tree cloud-provider is hostIPs: [<IPv6>, <IPv4>], so the problem is not revealed. It can be further investigated.

Would it be possible to switch the order in out of tree cloud-provider too so the order is consistent with in tree cloud-provider?

Yes, I am also working on it.

@lzhecheng
Contributor

@CecileRobertMichon this is the PR to fix the order issue in the cloud-provider-azure code: #3643
But I wonder if we can make the fix simpler. As you can see, the problem is that IPv4 is always handled first, then IPv6.

if len(netInterface.IPV4.IPAddress) > 0 && len(netInterface.IPV4.IPAddress[0].PrivateIP) > 0 {
	address := netInterface.IPV4.IPAddress[0]
	addresses = append(addresses, v1.NodeAddress{
		Type:    v1.NodeInternalIP,
		Address: address.PrivateIP,
	})
	if len(address.PublicIP) > 0 {
		addresses = append(addresses, v1.NodeAddress{
			Type:    v1.NodeExternalIP,
			Address: address.PublicIP,
		})
	}
}
if len(netInterface.IPV6.IPAddress) > 0 && len(netInterface.IPV6.IPAddress[0].PrivateIP) > 0 {
	address := netInterface.IPV6.IPAddress[0]
	addresses = append(addresses, v1.NodeAddress{
		Type:    v1.NodeInternalIP,
		Address: address.PrivateIP,
	})
	if len(address.PublicIP) > 0 {
		addresses = append(addresses, v1.NodeAddress{
			Type:    v1.NodeExternalIP,
			Address: address.PublicIP,
		})
	}
}

I want to add logic there to avoid reordering later. So here's the question:
Is the process of adding addresses to the network interface the same for dual-stack and IPv6? For an IPv6 cluster it is IPv6 first, then IPv4. Is a dual-stack cluster IPv4 first, then IPv6? If so, we have to use my fix above. If not, meaning the process is the same, then as long as we have 2 internal IPs we can always put IPv6 first.
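For illustration, here is a small Go sketch of the "always put IPv6 first when there are two internal IPs" option weighed above. The nodeAddress type and ipv6InternalFirst helper are hypothetical stand-ins for v1.NodeAddress handling; this is not the change in #3643, and the thread later concludes CNM does not need to change.

package main

import (
	"fmt"
	"net"
)

// nodeAddress stands in for v1.NodeAddress in this sketch.
type nodeAddress struct {
	Type    string // "Hostname", "InternalIP", "ExternalIP", ...
	Address string
}

// ipv6InternalFirst reorders InternalIP entries so that IPv6 precedes IPv4,
// keeping all non-InternalIP entries (Hostname, ExternalIP, ...) in front in
// their original order.
func ipv6InternalFirst(addrs []nodeAddress) []nodeAddress {
	var others, v6, v4 []nodeAddress
	for _, a := range addrs {
		if a.Type != "InternalIP" {
			others = append(others, a)
			continue
		}
		ip := net.ParseIP(a.Address)
		if ip != nil && ip.To4() == nil {
			v6 = append(v6, a)
		} else {
			v4 = append(v4, a)
		}
	}
	out := append(others, v6...)
	return append(out, v4...)
}

func main() {
	// Ordering reported for the dual-stack node example later in this thread.
	addrs := []nodeAddress{
		{Type: "Hostname", Address: "dual-stack-25113-md-0-478wl"},
		{Type: "InternalIP", Address: "10.1.0.4"},
		{Type: "InternalIP", Address: "2001:1234:5678:9abd::4"},
	}
	fmt.Println(ipv6InternalFirst(addrs))
	// [{Hostname dual-stack-25113-md-0-478wl} {InternalIP 2001:1234:5678:9abd::4} {InternalIP 10.1.0.4}]
}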

@CecileRobertMichon
Contributor Author

CecileRobertMichon commented Mar 24, 2023

Is the process of adding addresses to the network interface the same for dual-stack and IPv6? For an IPv6 cluster it is IPv6 first, then IPv4. Is a dual-stack cluster IPv4 first, then IPv6?

I don't know the answer to this. Maybe @aramase does?

Edit: I just realized you meant in CAPZ. For network interfaces, it looks like IPv4 is always first: https://github.com/kubernetes-sigs/cluster-api-provider-azure/blob/main/azure/services/networkinterfaces/spec.go#L201

For an IPv6 cluster it is IPv6 first, then IPv4

where do you see this?

@lzhecheng
Contributor

lzhecheng commented Mar 25, 2023

@CecileRobertMichon it is interesting. According to the logs in CNM, IPv6 comes first and then IPv4. These IPs come from instance metadata.

return az.getLocalInstanceNodeAddresses(metadata.Network.Interface, string(name))

# I added some logs in CNM to help debug
I0324 11:46:49.170166       1 nodemanager.go:284] DEBUG old nodeAddresses [{InternalIP 2001:1234:5678:9abc::5} {Hostname zhecheng-324-6-control-plane-gbmtr}]
I0324 11:46:49.170399       1 nodemanager.go:285] DEBUG new nodeAddresses [{Hostname zhecheng-324-6-control-plane-gbmtr} {InternalIP 2001:1234:5678:9abc::5} {InternalIP 10.0.0.5}]

I am not familiar with the CAPZ code, but is it possible s.IPConfigs is an empty slice in your reference?

@jackfrancis
Contributor

This sounds like a symptom similar to one we discovered when using secondary IPs with Azure CNI.

@CecileRobertMichon
Contributor Author

I am not familiar with the CAPZ code, but is it possible s.IPConfigs is an empty slice in your reference?

No, we always add the ipv4 ip config as the primary IP: https://github.com/kubernetes-sigs/cluster-api-provider-azure/blob/main/azure/services/networkinterfaces/spec.go#L151

@lzhecheng
Contributor

I am not familiar with the CAPZ code, but is it possible s.IPConfigs is an empty slice in your reference?

No, we always add the ipv4 ip config as the primary IP: https://github.com/kubernetes-sigs/cluster-api-provider-azure/blob/main/azure/services/networkinterfaces/spec.go#L151

If that's the case, I think IMDS is not working properly.

@lzhecheng
Contributor

On the CNM side, it gets host IPs from the InstanceMetadataService, which always returns IPv4 first and then IPv6. If there were no case where CNM first gets a single IPv6 address and only later both IPs, this issue wouldn't happen.

@aojea

aojea commented Mar 27, 2023

@lzhecheng is kubelet configuring the --node-ip address in these clusters? Because if that is the case, the controller must return node.status.addresses in the same order as the --node-ip family.
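As a tiny illustration of that contract, here is a hypothetical Go helper, orderForNodeIP (not from kubelet or cloud-provider-azure code), that orders a node's internal IPs so the family of the configured --node-ip comes first:

package main

import (
	"fmt"
	"net"
)

// sameFamily reports whether a and b are both IPv4 or both IPv6.
func sameFamily(a, b string) bool {
	pa, pb := net.ParseIP(a), net.ParseIP(b)
	if pa == nil || pb == nil {
		return false
	}
	return (pa.To4() != nil) == (pb.To4() != nil)
}

// orderForNodeIP puts the internal IPs whose family matches --node-ip first.
func orderForNodeIP(nodeIP string, internalIPs []string) []string {
	var matching, others []string
	for _, ip := range internalIPs {
		if sameFamily(nodeIP, ip) {
			matching = append(matching, ip)
		} else {
			others = append(others, ip)
		}
	}
	return append(matching, others...)
}

func main() {
	ips := []string{"10.0.0.5", "2001:1234:5678:9abc::5"}
	// With kubelet started with --node-ip="::" (IPv6), IPv6 must come first.
	fmt.Println(orderForNodeIP("::", ips))
	// With an IPv4 --node-ip, the IPv4 address should come first.
	fmt.Println(orderForNodeIP("10.0.0.5", ips))
}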

@CecileRobertMichon
Contributor Author

Here is an example of a dual-stack cluster node:

On the Azure portal, the NIC shows the IPv4 address first:

[screenshot: Azure portal NIC IP configurations for the dual-stack node]

And same thing in node addresses:

status:
  addresses:
  - address: dual-stack-25113-md-0-478wl
    type: Hostname
  - address: 10.1.0.4
    type: InternalIP
  - address: 2001:1234:5678:9abd::4
    type: InternalIP

And here is an ipv6 single stack cluster:

[screenshot: Azure portal NIC IP configurations for the IPv6 node]

But note that this time the ipv6 address shows up first in the node IPs:

status:
  addresses:
  - address: ipv6-4530-md-0-f7sdx
    type: Hostname
  - address: 2001:1234:5678:9abd::4
    type: InternalIP
  - address: 10.1.0.4
    type: InternalIP

@CecileRobertMichon
Contributor Author

is kubelet configuring the --node-ip address in these clusters

For the ipv6 cluster, kubelet --node-ip is set to ::

For the dual stack cluster, it is not set (assuming it is getting defaulted)

@lzhecheng
Contributor

is kubelet configuring the --node-ip address in these clusters

For the ipv6 cluster, kubelet --node-ip is set to ::

For the dual stack cluster, it is not set (assuming it is getting defaulted)

Maybe for IPv6, --node-ip should not be set either, since it is actually a dual-stack cluster as far as the Nodes are concerned? When someday the cluster is truly IPv6-only, this option should be set.

@CecileRobertMichon
Contributor Author

Let me try without it. cc @aramase

@aojea

aojea commented Mar 28, 2023

@CecileRobertMichon @lzhecheng please check https://github.com/kubernetes/enhancements/tree/master/keps/sig-network/3705-cloud-node-ips. This is WIP and a known, complicated problem; we also have some amendments because of the complexity: kubernetes/enhancements#3898

Please check the KEP to see if your case is covered, and if not, please let us know.

@CecileRobertMichon
Contributor Author

Let me try without it

update: the e2e test passed several times in a row when node-ip is not set kubernetes-sigs/cluster-api-provider-azure#3221

@lzhecheng
Contributor

Let me try without it

update: the e2e test passed several times in a row when node-ip is not set kubernetes-sigs/cluster-api-provider-azure#3221

Glad to know that. I think CNM doesn't need to change because, as you said, IPv4 first and then IPv6 is the expected behaviour for CAPZ dual-stack and IPv6-only clusters.

@CecileRobertMichon
Contributor Author

kubernetes-sigs/cluster-api-provider-azure#3221 is ready for review/merge

@lzhecheng
Contributor

/close
The remaining k/k PR is an improvement, not the direct fix for this issue. The actual fix is on the CAPZ side and has been applied.

@k8s-ci-robot
Contributor

@lzhecheng: Closing this issue.

In response to this:

/close
The remaining k/k PR is an improvement, not the direct fix for this issue. The actual fix is on the CAPZ side and has been applied.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
