Pods are allocated IPs that don't belong to podSubnet in v1.12.1 #70785

Open
Lentil1016 opened this Issue Nov 8, 2018 · 5 comments


Lentil1016 commented Nov 8, 2018

What happened:

Hi, I have three nodes that are prepared for deploying Kubernetes and have password-less SSH login to each other.

role IP
master-01 10.130.29.80
master-02 10.130.29.81
master-03 10.130.29.82

I'm trying to deploy a Kubernetes v1.12.1 HA master cluster on them with my script. I set my CIDR to 10.244.0.0/16. But after the script finished its job, I found that all pods on master-02 and master-03 have IPs that don't belong to the CIDR I set, but instead belong to 172.17.0.0/16 (a.k.a. the docker0 CIDR). Here is an example:

kubectl get pods -o wide -n kube-system                                  
NAME                                              READY   STATUS    RESTARTS   AGE     IP             NODE                    NOMINATED NODE
calico-node-hc6k7                                 1/2     Running   0          3m33s   10.130.29.82   centos-7-x86-64-29-82   <none>
calico-node-jksjw                                 1/2     Running   0          3m33s   10.130.29.81   centos-7-x86-64-29-81   <none>
calico-node-xbblx                                 1/2     Running   1          3m33s   10.130.29.80   centos-7-x86-64-29-80   <none>
coredns-576cbf47c7-mb5xc                          1/1     Running   0          5m45s   10.244.0.4     centos-7-x86-64-29-80   <none>
coredns-576cbf47c7-tzw72                          1/1     Running   0          5m45s   10.244.0.5     centos-7-x86-64-29-80   <none>
etcd-centos-7-x86-64-29-80                        1/1     Running   0          4m46s   10.130.29.80   centos-7-x86-64-29-80   <none>
etcd-centos-7-x86-64-29-81                        1/1     Running   0          102s    10.130.29.81   centos-7-x86-64-29-81   <none>
etcd-centos-7-x86-64-29-82                        1/1     Running   0          2m2s    10.130.29.82   centos-7-x86-64-29-82   <none>
heapster-v1.5.4-65ff99c48d-25xvw                  2/2     Running   0          3m17s   172.17.0.2     centos-7-x86-64-29-81   <none>
kube-apiserver-centos-7-x86-64-29-80              1/1     Running   0          2m51s   10.130.29.80   centos-7-x86-64-29-80   <none>
kube-apiserver-centos-7-x86-64-29-81              1/1     Running   3          2m46s   10.130.29.81   centos-7-x86-64-29-81   <none>
kube-apiserver-centos-7-x86-64-29-82              1/1     Running   0          106s    10.130.29.82   centos-7-x86-64-29-82   <none>
kube-controller-manager-centos-7-x86-64-29-80     1/1     Running   2          4m48s   10.130.29.80   centos-7-x86-64-29-80   <none>
kube-controller-manager-centos-7-x86-64-29-81     1/1     Running   1          2m46s   10.130.29.81   centos-7-x86-64-29-81   <none>
kube-controller-manager-centos-7-x86-64-29-82     1/1     Running   0          104s    10.130.29.82   centos-7-x86-64-29-82   <none>
kube-proxy-4l7fg                                  1/1     Running   0          3m55s   10.130.29.81   centos-7-x86-64-29-81   <none>
kube-proxy-hd2k5                                  1/1     Running   0          3m39s   10.130.29.82   centos-7-x86-64-29-82   <none>
kube-proxy-mgttp                                  1/1     Running   0          5m45s   10.130.29.80   centos-7-x86-64-29-80   <none>
kube-scheduler-centos-7-x86-64-29-80              1/1     Running   2          4m46s   10.130.29.80   centos-7-x86-64-29-80   <none>
kube-scheduler-centos-7-x86-64-29-81              1/1     Running   1          3m17s   10.130.29.81   centos-7-x86-64-29-81   <none>
kube-scheduler-centos-7-x86-64-29-82              1/1     Running   0          107s    10.130.29.82   centos-7-x86-64-29-82   <none>
kubernetes-dashboard-6b475b66b5-d7d78             1/1     Running   0          28s     10.244.0.6     centos-7-x86-64-29-80   <none>
monitoring-influxdb-grafana-v4-65cc9bb8c8-2zcbz   2/2     Running   0          3m17s   172.17.0.3     centos-7-x86-64-29-81   <none>
traefik-ingress-controller-2wkn9                  1/1     Running   0          3m22s   10.130.29.82   centos-7-x86-64-29-82   <none>
traefik-ingress-controller-jkxvg                  1/1     Running   0          3m22s   10.130.29.81   centos-7-x86-64-29-81   <none>
traefik-ingress-controller-kb9v4                  1/1     Running   0          2m19s   10.130.29.80   centos-7-x86-64-29-80   <none>
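
A quick way to see which addresses in a listing like the one above fall outside the configured range is to test each one against the pod CIDR. The sketch below uses only Python's standard `ipaddress` module. The helper name `classify_ip` and the labels it returns are made up for illustration; the three node IPs and the two CIDRs are taken from this report.

```python
import ipaddress

POD_SUBNET = ipaddress.ip_network("10.244.0.0/16")  # podSubnet from this report
DOCKER0 = ipaddress.ip_network("172.17.0.0/16")     # docker0 range named in this report
NODE_IPS = {"10.130.29.80", "10.130.29.81", "10.130.29.82"}


def classify_ip(ip: str) -> str:
    """Label a pod IP (hypothetical helper, not part of kubectl or Calico)."""
    if ip in NODE_IPS:
        # Pod reports the node's own address, i.e. it runs on the host network.
        return "host-network"
    addr = ipaddress.ip_address(ip)
    if addr in POD_SUBNET:
        return "pod-subnet"
    if addr in DOCKER0:
        return "docker0"
    return "unknown"


for ip in ("10.244.0.4", "172.17.0.2", "10.130.29.81"):
    print(ip, classify_ip(ip))
```

Only the addresses labelled `docker0` are the unexpected ones; pods that report a node IP are on the host network and are not allocated from podSubnet in the first place.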

I'm not sure whether I should post this issue to Kubernetes or Calico, but I previously tested Kubernetes v1.11.0 with the same Calico manifest and did not run into this issue, so I think I should post it here first.

# On my v1.11.0 cluster
kubectl get pods -o wide -n kube-system
NAME                                            READY     STATUS    RESTARTS   AGE       IP             NODE
calico-node-bzw4n                               2/2       Running   0          3m        10.130.29.80   centos-7-x86-64-29-80
calico-node-l55xb                               2/2       Running   0          3m        10.130.29.82   centos-7-x86-64-29-82
calico-node-x47qr                               2/2       Running   0          3m        10.130.29.81   centos-7-x86-64-29-81
coredns-78fcdf6894-nrzh7                        1/1       Running   0          6m        10.244.1.2     centos-7-x86-64-29-81
coredns-78fcdf6894-pgqmj                        1/1       Running   0          6m        10.244.1.5     centos-7-x86-64-29-81
etcd-centos-7-x86-64-29-80                      1/1       Running   0          5m        10.130.29.80   centos-7-x86-64-29-80
etcd-centos-7-x86-64-29-81                      1/1       Running   0          3m        10.130.29.81   centos-7-x86-64-29-81
etcd-centos-7-x86-64-29-82                      1/1       Running   0          1m        10.130.29.82   centos-7-x86-64-29-82
heapster-55f6bc464-9kk25                        1/1       Running   0          3m        10.244.1.3     centos-7-x86-64-29-81
kube-apiserver-centos-7-x86-64-29-80            1/1       Running   0          2m        10.130.29.80   centos-7-x86-64-29-80
kube-apiserver-centos-7-x86-64-29-81            1/1       Running   0          2m        10.130.29.81   centos-7-x86-64-29-81
kube-apiserver-centos-7-x86-64-29-82            1/1       Running   0          1m        10.130.29.82   centos-7-x86-64-29-82
kube-controller-manager-centos-7-x86-64-29-80   1/1       Running   1          4m        10.130.29.80   centos-7-x86-64-29-80
kube-controller-manager-centos-7-x86-64-29-81   1/1       Running   0          3m        10.130.29.81   centos-7-x86-64-29-81
kube-controller-manager-centos-7-x86-64-29-82   1/1       Running   0          2m        10.130.29.82   centos-7-x86-64-29-82
kube-proxy-9thnt                                1/1       Running   0          6m        10.130.29.80   centos-7-x86-64-29-80
kube-proxy-j6p67                                1/1       Running   0          4m        10.130.29.81   centos-7-x86-64-29-81
kube-proxy-sblcd                                1/1       Running   0          3m        10.130.29.82   centos-7-x86-64-29-82
kube-scheduler-centos-7-x86-64-29-80            1/1       Running   1          5m        10.130.29.80   centos-7-x86-64-29-80
kube-scheduler-centos-7-x86-64-29-81            1/1       Running   0          3m        10.130.29.81   centos-7-x86-64-29-81
kube-scheduler-centos-7-x86-64-29-82            1/1       Running   0          2m        10.130.29.82   centos-7-x86-64-29-82
kubernetes-dashboard-f76dc5995-lf75f            1/1       Running   0          3m        10.244.1.7     centos-7-x86-64-29-81
monitoring-grafana-74f7679848-pll4r             1/1       Running   0          3m        10.244.1.6     centos-7-x86-64-29-81
monitoring-influxdb-84b498559d-7xhfs            1/1       Running   0          3m        10.244.1.4     centos-7-x86-64-29-81
traefik-ingress-controller-kdh9f                1/1       Running   0          3m        10.130.29.81   centos-7-x86-64-29-81
traefik-ingress-controller-ljh52                1/1       Running   0          3m        10.130.29.80   centos-7-x86-64-29-80
traefik-ingress-controller-mg6lm                1/1       Running   0          3m        10.130.29.82   centos-7-x86-64-29-82

Sorry, I don't yet clearly understand how this happened. I'm not sure whether it is caused by a misconfiguration on my side or by a bug. I've tried reading the controller logs and the Calico logs, but I failed to find any clue. Is there any more information I can provide, or any lead I can follow to continue debugging? Thanks. ❤️

What you expected to happen:

Pods get IPs that belong to the podSubnet I set.

How to reproduce it (as minimally and precisely as possible):

Prerequisite:

  • Three nodes (CentOS/Fedora) with kubeadm and Kubernetes v1.12.1 installed.
  • They can log in to each other over SSH without a password.

Reproduce:

$ echo """
CP0_IP=10.130.29.80
CP0_HOSTNAME=centos-7-x86-64-29-80
CP1_IP=10.130.29.81
CP1_HOSTNAME=centos-7-x86-64-29-81
CP2_IP=10.130.29.82
CP2_HOSTNAME=centos-7-x86-64-29-82
VIP=10.130.29.83
NET_IF=ens32
CIDR=10.244.0.0/16 """ > ./cluster-info
$ bash -c "$(curl -fsSL https://raw.githubusercontent.com/Lentil1016/kubeadm-ha/1.12.1/kubeha-gen.sh)"
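
Before running the generator script, it can help to sanity-check the `cluster-info` file. The sketch below is only an illustration: it assumes the file is a flat list of `KEY=VALUE` lines as shown above, and the helper names `parse_cluster_info` and `addresses_inside_pod_cidr` are hypothetical. It strips surrounding whitespace (the `echo` form above leaves a leading blank line and a trailing space after the CIDR) and reports any node IP or VIP that falls inside the pod CIDR.

```python
import ipaddress


def parse_cluster_info(text: str) -> dict:
    """Parse KEY=VALUE lines, ignoring blank lines and surrounding whitespace."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        info[key.strip()] = value.strip()
    return info


def addresses_inside_pod_cidr(info: dict) -> list:
    """Return the *_IP / VIP keys whose address lies inside CIDR."""
    pod_net = ipaddress.ip_network(info["CIDR"])
    return [
        key
        for key, value in info.items()
        if (key.endswith("_IP") or key == "VIP")
        and ipaddress.ip_address(value) in pod_net
    ]


# Same content that the echo command above writes to ./cluster-info.
sample = """
CP0_IP=10.130.29.80
CP0_HOSTNAME=centos-7-x86-64-29-80
CP1_IP=10.130.29.81
CP1_HOSTNAME=centos-7-x86-64-29-81
CP2_IP=10.130.29.82
CP2_HOSTNAME=centos-7-x86-64-29-82
VIP=10.130.29.83
NET_IF=ens32
CIDR=10.244.0.0/16 """

info = parse_cluster_info(sample)
print(info["CIDR"], addresses_inside_pod_cidr(info))
```

For the values in this report, no node address or VIP overlaps the pod CIDR, so the input file itself does not look like the source of the 172.17.x.x addresses.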

Environment:

  • Kubernetes version (use kubectl version): v1.12.1
  • Cloud provider or hardware configuration: Virtual machine on private cloud
  • OS (e.g. from /etc/os-release): CentOS Linux release 7.2.1511 (Core)
  • Kernel (e.g. uname -a): Linux centos-7-x86-64-29-80 4.18.12-1.el7.elrepo.x86_64 #1 SMP Thu Oct 4 09:36:51 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
  • Install tools: kubeadm
  • Others:

/kind bug
/sig Network


Lentil1016 commented Nov 8, 2018

BTW

They are only a little bit different.


Member

anfernee commented Nov 9, 2018

I've noticed that your calico-node pods aren't fully healthy: only 1 of 2 containers is ready. Most likely it's install-cni.

calico-node-hc6k7                                 1/2     Running   0          3m33s   10.130.29.82   centos-7-x86-64-29-82   <none>
calico-node-jksjw                                 1/2     Running   0          3m33s   10.130.29.81   centos-7-x86-64-29-81   <none>
calico-node-xbblx                                 1/2     Running   1          3m33s   10.130.29.80   centos-7-x86-64-29-80   <none>

Lentil1016 commented Nov 9, 2018

I've noticed that your calico-node pods aren't fully healthy: only 1 of 2 containers is ready. Most likely it's install-cni.

Thanks for the reminder 😄. I noticed that too, but I'm not sure it is the cause of my issue. If I boot the v1.12.1 cluster with my Calico manifest downgraded from 3.2.3 to 3.1.3, the calico-node DaemonSet is fully healthy, but the IP issue remains the same. The unhealthy container is calico-node; that's another issue I'm looking into.

I also just noticed another unexpected behavior. When I join master-02 to the cluster and mark it as a master, CoreDNS has already been allocated an IP before the network plugin (Calico in my case) is installed. I tried this procedure twice, and it's reproducible.

$ ./kubeha-gen.sh
# init master-01 logs....
# add master-02 blahblahblah....
[markmaster] Marking the node centos-7-x86-64-29-81 as master by adding the label "node-role.kubernetes.io/master=''"
[markmaster] Marking the node centos-7-x86-64-29-81 as master by adding the taints [node-role.kubernetes.io/master:NoSchedule]
^C

$ kubectl get pods -o wide -n kube-system
NAME                                            READY   STATUS    RESTARTS   AGE    IP             NODE                    NOMINATED NODE
coredns-576cbf47c7-57vc4                        1/1     Running   0          107s   172.17.0.2     centos-7-x86-64-29-81   <none>
coredns-576cbf47c7-pstzx                        1/1     Running   0          107s   172.17.0.3     centos-7-x86-64-29-81   <none>
etcd-centos-7-x86-64-29-80                      1/1     Running   0          51s    10.130.29.80   centos-7-x86-64-29-80   <none>
kube-apiserver-centos-7-x86-64-29-80            1/1     Running   0          62s    10.130.29.80   centos-7-x86-64-29-80   <none>
kube-controller-manager-centos-7-x86-64-29-80   1/1     Running   1          72s    10.130.29.80   centos-7-x86-64-29-80   <none>
kube-proxy-rz2wq                                1/1     Running   0          107s   10.130.29.80   centos-7-x86-64-29-80   <none>
kube-scheduler-centos-7-x86-64-29-80            1/1     Running   1          75s    10.130.29.80   centos-7-x86-64-29-80   <none>

Information

kubeadm alpha commands to add master-02

  ssh ${host} "
    kubeadm alpha phase certs all --config /etc/kubernetes/kubeadm-config.yaml
    kubeadm alpha phase kubeconfig controller-manager --config /etc/kubernetes/kubeadm-config.yaml
    kubeadm alpha phase kubeconfig scheduler --config /etc/kubernetes/kubeadm-config.yaml
    kubeadm alpha phase kubelet config write-to-disk --config /etc/kubernetes/kubeadm-config.yaml
    kubeadm alpha phase kubelet write-env-file --config /etc/kubernetes/kubeadm-config.yaml
    kubeadm alpha phase kubeconfig kubelet --config /etc/kubernetes/kubeadm-config.yaml
    systemctl restart kubelet
    kubeadm alpha phase etcd local --config /etc/kubernetes/kubeadm-config.yaml
    kubeadm alpha phase kubeconfig all --config /etc/kubernetes/kubeadm-config.yaml
    kubeadm alpha phase controlplane all --config /etc/kubernetes/kubeadm-config.yaml
    kubeadm alpha phase kubelet config annotate-cri --config /etc/kubernetes/kubeadm-config.yaml
    kubeadm alpha phase mark-master --config /etc/kubernetes/kubeadm-config.yaml"
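
One thing that might be worth ruling out (an assumption on my part, not something confirmed in this thread) is whether the kubelet on the joined master is actually started with the CNI network plugin. As far as I know, the `write-env-file` phase above writes a `KUBELET_KUBEADM_ARGS=...` line to `/var/lib/kubelet/kubeadm-flags.env`. The sketch below checks such a line for `--network-plugin=cni`; the helper name `kubelet_uses_cni` is hypothetical and the sample strings are illustrative, not captured from the reporter's nodes.

```python
def kubelet_uses_cni(env_text: str) -> bool:
    """Return True if KUBELET_KUBEADM_ARGS contains --network-plugin=cni."""
    for line in env_text.splitlines():
        line = line.strip()
        if not line.startswith("KUBELET_KUBEADM_ARGS="):
            continue
        value = line.split("=", 1)[1]
        # The value may or may not be wrapped in quotes; drop them before splitting.
        args = value.strip().strip("\"'").split()
        return "--network-plugin=cni" in args
    return False


# Illustrative inputs only (assumed file format, not real output from this cluster).
with_cni = "KUBELET_KUBEADM_ARGS=--cgroup-driver=cgroupfs --network-plugin=cni"
without_cni = "KUBELET_KUBEADM_ARGS=--cgroup-driver=cgroupfs"

print(kubelet_uses_cni(with_cni), kubelet_uses_cni(without_cni))
```

On a real node the text would come from reading the env file, for example `open("/var/lib/kubelet/kubeadm-flags.env").read()`, and comparing the result between master-01 and master-02 could show whether the two kubelets are configured differently.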

kubeadm-config-m0.yaml

apiVersion: kubeadm.k8s.io/v1alpha3
kind: ClusterConfiguration
kubernetesVersion: v1.12.1
apiServerCertSANs:
- 10.130.29.80
- 10.130.29.81
- 10.130.29.82
- centos-7-x86-64-29-80
- centos-7-x86-64-29-81
- centos-7-x86-64-29-82
- 10.130.29.83
etcd:
  local:
    extraArgs:
      listen-client-urls: https://127.0.0.1:2379,https://10.130.29.80:2379
      advertise-client-urls: https://10.130.29.80:2379
      listen-peer-urls: https://10.130.29.80:2380
      initial-advertise-peer-urls: https://10.130.29.80:2380
      initial-cluster: centos-7-x86-64-29-80=https://10.130.29.80:2380
      initial-cluster-state: new
    serverCertSANs:
      - centos-7-x86-64-29-80
      - 10.130.29.80
    peerCertSANs:
      - centos-7-x86-64-29-80
      - 10.130.29.80
networking:
  # This CIDR is a Calico default. Substitute or remove for your CNI provider.
  podSubnet: 10.244.0.0/16
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: ipvs

kubeadm-config-m1.yaml

apiVersion: kubeadm.k8s.io/v1alpha3
kind: ClusterConfiguration
kubernetesVersion: v1.12.1
apiServerCertSANs:
- 10.130.29.80
- 10.130.29.81
- 10.130.29.82
- centos-7-x86-64-29-80
- centos-7-x86-64-29-81
- centos-7-x86-64-29-82
- 10.130.29.83
etcd:
  local:
    extraArgs:
      listen-client-urls: https://127.0.0.1:2379,https://10.130.29.81:2379
      advertise-client-urls: https://10.130.29.81:2379
      listen-peer-urls: https://10.130.29.81:2380
      initial-advertise-peer-urls: https://10.130.29.81:2380
      initial-cluster: centos-7-x86-64-29-80=https://10.130.29.80:2380,centos-7-x86-64-29-81=https://10.130.29.81:2380
      initial-cluster-state: existing
    serverCertSANs:
      - centos-7-x86-64-29-81
      - 10.130.29.81
    peerCertSANs:
      - centos-7-x86-64-29-81
      - 10.130.29.81
networking:
  # This CIDR is a Calico default. Substitute or remove for your CNI provider.
  podSubnet: 10.244.0.0/16
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: ipvs
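
Since each master gets its own config file, a small consistency check can confirm that they all declare the same pod subnet. This is a minimal sketch using a regular expression rather than a YAML parser (to stay within the standard library); the helper name `pod_subnets` is hypothetical, and the sample strings contain only the `networking` section of the two files above.

```python
import re


def pod_subnets(config_text: str) -> list:
    """Return every podSubnet value found in a kubeadm config text."""
    return re.findall(r"^\s*podSubnet:\s*(\S+)\s*$", config_text, flags=re.MULTILINE)


# Only the networking section of kubeadm-config-m0.yaml / -m1.yaml is reproduced here.
m0 = """networking:
  # This CIDR is a Calico default. Substitute or remove for your CNI provider.
  podSubnet: 10.244.0.0/16
---
"""
m1 = """networking:
  # This CIDR is a Calico default. Substitute or remove for your CNI provider.
  podSubnet: 10.244.0.0/16
---
"""

print(pod_subnets(m0), pod_subnets(m0) == pod_subnets(m1))
```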

Lentil1016 commented Nov 9, 2018

I just rebuilt my cluster with Calico 3.1.3, and all containers are healthy now. My podSubnet is still 10.244.0.0/16, but pods on master-02 and master-03 still obtain IPs from 172.17.0.0/16. I've checked /etc/kubernetes/manifests on all three nodes, but I failed to find anything related to 172.17.

$ kubectl get ds -n kube-system calico-node -o yaml|grep 'value: 10.244' 
          value: 10.244.0.0/16

$ cat /etc/kubernetes/manifests/kube-controller-manager.yaml|grep 10.244
    - --cluster-cidr=10.244.0.0/16

$ kubectl get pods -o wide -n kube-system                  
NAME                                              READY   STATUS    RESTARTS   AGE     IP             NODE                    NOMINATED NODE
calico-node-77fcr                                 2/2     Running   0          6m52s   10.130.29.82   centos-7-x86-64-29-82   <none>
calico-node-8gl9s                                 2/2     Running   0          6m52s   10.130.29.80   centos-7-x86-64-29-80   <none>
calico-node-mvbvd                                 2/2     Running   0          6m52s   10.130.29.81   centos-7-x86-64-29-81   <none>
coredns-576cbf47c7-lb6sh                          1/1     Running   0          9m3s    172.17.0.2     centos-7-x86-64-29-81   <none>
coredns-576cbf47c7-mv5bd                          1/1     Running   0          9m3s    172.17.0.3     centos-7-x86-64-29-81   <none>
etcd-centos-7-x86-64-29-80                        1/1     Running   0          8m7s    10.130.29.80   centos-7-x86-64-29-80   <none>
etcd-centos-7-x86-64-29-81                        1/1     Running   0          5m32s   10.130.29.81   centos-7-x86-64-29-81   <none>
etcd-centos-7-x86-64-29-82                        1/1     Running   0          5m14s   10.130.29.82   centos-7-x86-64-29-82   <none>
heapster-v1.5.4-65ff99c48d-9l9x8                  2/2     Running   0          6m43s   172.17.0.4     centos-7-x86-64-29-81   <none>
kube-apiserver-centos-7-x86-64-29-80              1/1     Running   0          6m18s   10.130.29.80   centos-7-x86-64-29-80   <none>
kube-apiserver-centos-7-x86-64-29-81              1/1     Running   0          5m23s   10.130.29.81   centos-7-x86-64-29-81   <none>
kube-apiserver-centos-7-x86-64-29-82              1/1     Running   0          5m29s   10.130.29.82   centos-7-x86-64-29-82   <none>
kube-controller-manager-centos-7-x86-64-29-80     1/1     Running   2          8m35s   10.130.29.80   centos-7-x86-64-29-80   <none>
kube-controller-manager-centos-7-x86-64-29-81     1/1     Running   0          6m36s   10.130.29.81   centos-7-x86-64-29-81   <none>
kube-controller-manager-centos-7-x86-64-29-82     1/1     Running   0          5m18s   10.130.29.82   centos-7-x86-64-29-82   <none>
kube-proxy-bpfn8                                  1/1     Running   0          7m11s   10.130.29.81   centos-7-x86-64-29-81   <none>
kube-proxy-n7l89                                  1/1     Running   0          9m3s    10.130.29.80   centos-7-x86-64-29-80   <none>
kube-proxy-p25ss                                  1/1     Running   0          6m55s   10.130.29.82   centos-7-x86-64-29-82   <none>
kube-scheduler-centos-7-x86-64-29-80              1/1     Running   2          8m6s    10.130.29.80   centos-7-x86-64-29-80   <none>
kube-scheduler-centos-7-x86-64-29-81              1/1     Running   0          5m25s   10.130.29.81   centos-7-x86-64-29-81   <none>
kube-scheduler-centos-7-x86-64-29-82              1/1     Running   0          5m8s    10.130.29.82   centos-7-x86-64-29-82   <none>
kubernetes-dashboard-6b475b66b5-2xk6d             1/1     Running   1          6m42s   172.17.0.6     centos-7-x86-64-29-81   <none>
monitoring-influxdb-grafana-v4-65cc9bb8c8-vrnr6   2/2     Running   0          6m43s   172.17.0.5     centos-7-x86-64-29-81   <none>
traefik-ingress-controller-68j8f                  1/1     Running   0          6m45s   10.130.29.81   centos-7-x86-64-29-81   <none>
traefik-ingress-controller-lx9nw                  1/1     Running   0          6m45s   10.130.29.82   centos-7-x86-64-29-82   <none>
traefik-ingress-controller-m2bp8                  1/1     Running   0          6m      10.130.29.80   centos-7-x86-64-29-80   <none>
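
To see at a glance which nodes hand out docker0 addresses, the `kubectl get pods -o wide` text can be grouped by node. The sketch below is an illustration only: it assumes the whitespace-separated column layout shown above, the helper name `pods_on_docker0` is hypothetical, and the sample contains just four rows copied from the listing.

```python
import ipaddress
from collections import defaultdict

DOCKER0 = ipaddress.ip_network("172.17.0.0/16")  # docker0 range named in this report


def pods_on_docker0(table: str) -> dict:
    """Map node name -> pod names whose IP lies in the docker0 range."""
    lines = [line for line in table.splitlines() if line.strip()]
    header = lines[0].split()
    # index() returns the first "NODE" column, not the trailing "NOMINATED NODE".
    ip_col, node_col = header.index("IP"), header.index("NODE")
    by_node = defaultdict(list)
    for row in lines[1:]:
        cols = row.split()
        try:
            addr = ipaddress.ip_address(cols[ip_col])
        except ValueError:
            continue  # e.g. "<none>" for pods that have no IP yet
        if addr in DOCKER0:
            by_node[cols[node_col]].append(cols[0])
    return dict(by_node)


sample = """
NAME                                READY   STATUS    RESTARTS   AGE     IP             NODE                    NOMINATED NODE
calico-node-mvbvd                   2/2     Running   0          6m52s   10.130.29.81   centos-7-x86-64-29-81   <none>
coredns-576cbf47c7-lb6sh            1/1     Running   0          9m3s    172.17.0.2     centos-7-x86-64-29-81   <none>
heapster-v1.5.4-65ff99c48d-9l9x8    2/2     Running   0          6m43s   172.17.0.4     centos-7-x86-64-29-81   <none>
kube-proxy-n7l89                    1/1     Running   0          9m3s    10.130.29.80   centos-7-x86-64-29-80   <none>
"""

print(pods_on_docker0(sample))
```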

Lentil1016 commented Nov 12, 2018

@anfernee Hi, sorry to bother you. Any chance you could give me some feedback? Thanks. In the past few days I also tried to create my cluster manually, following the official guide Creating Highly Available Clusters with kubeadm, and I got a broken cluster almost the same as the one I created with my script: right after I run kubeadm alpha phase kubelet config annotate-cri --config kubeadm-config.yaml on my second master, the CoreDNS pods obtain IPs from 172.17.0.0/16.

$ kubectl get pods -n kube-system -o wide   
NAME                                            READY   STATUS             RESTARTS   AGE     IP             NODE                    NOMINATED NODE
coredns-576cbf47c7-fpxm7                        1/1     Running            0          16m     172.17.0.3     centos-7-x86-64-29-81   <none>
coredns-576cbf47c7-mxchm                        1/1     Running            0          16m     172.17.0.2     centos-7-x86-64-29-81   <none>
etcd-centos-7-x86-64-29-80                      1/1     Running            0          9m15s   10.130.29.80   centos-7-x86-64-29-80   <none>
etcd-centos-7-x86-64-29-81                      1/1     Running            0          13m     10.130.29.81   centos-7-x86-64-29-81   <none>
kube-apiserver-centos-7-x86-64-29-80            1/1     Running            0          9m15s   10.130.29.80   centos-7-x86-64-29-80   <none>
kube-apiserver-centos-7-x86-64-29-81            1/1     Running            0          12m     10.130.29.81   centos-7-x86-64-29-81   <none>
kube-controller-manager-centos-7-x86-64-29-80   1/1     Running            0          9m15s   10.130.29.80   centos-7-x86-64-29-80   <none>
kube-controller-manager-centos-7-x86-64-29-81   0/1     CrashLoopBackOff   7          14m     10.130.29.81   centos-7-x86-64-29-81   <none>
kube-proxy-q96sv                                1/1     Running            0          16m     10.130.29.80   centos-7-x86-64-29-80   <none>
kube-proxy-ssmmq                                1/1     Running            0          14m     10.130.29.81   centos-7-x86-64-29-81   <none>
kube-scheduler-centos-7-x86-64-29-80            1/1     Running            0          9m15s   10.130.29.80   centos-7-x86-64-29-80   <none>
kube-scheduler-centos-7-x86-64-29-81            0/1     CrashLoopBackOff   7          14m     10.130.29.81   centos-7-x86-64-29-81   <none>

The only differences between this guide and my script are:

  • The official guide doesn't create the controller-manager and scheduler kubeconfigs (kubeconfig controller-manager and kubeconfig scheduler), which I guess is why the scheduler and controller manager are in CrashLoopBackOff on the second node. Their failure reason is: invalid configuration: no configuration has been provided
  • kubeadm-config.yaml is different. But I failed to find anything wrong with my configuration; I pasted it in my third comment.
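
The first point above can be checked directly on the second master by looking for the kubeconfig files the two components load. The sketch below assumes the usual kubeadm file names `controller-manager.conf` and `scheduler.conf` under `/etc/kubernetes`; the helper name `missing_kubeconfigs` is hypothetical.

```python
from pathlib import Path

# File names kubeadm normally writes for these two components (assumed defaults).
EXPECTED = ("controller-manager.conf", "scheduler.conf")


def missing_kubeconfigs(directory: str = "/etc/kubernetes") -> list:
    """Return the expected kubeconfig file names that are absent in `directory`."""
    base = Path(directory)
    return [name for name in EXPECTED if not (base / name).is_file()]


if __name__ == "__main__":
    # On a node where both files exist this prints an empty list.
    print(missing_kubeconfigs())
```

If either name is reported as missing on the second master, that would be consistent with the "no configuration has been provided" failure described above.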

So I'm really confused. Am I the only one who has run into this?
