Pod calico-node Hit error connecting to datastore - i/o timeout #3092

Closed
puzzloholic opened this issue Dec 26, 2019 · 25 comments

@puzzloholic
Expected Behavior

Pod calico-node-xxx up and running on each node

Current Behavior

The calico-node pods on the worker nodes are in the 'CrashLoopBackOff' state.
This is the output from one of those pods:

~$ kubectl logs -n kube-system calico-node-92lqd
2019-12-26 10:49:12.798 [INFO][8] startup.go 259: Early log level set to info
2019-12-26 10:49:12.798 [INFO][8] startup.go 275: Using NODENAME environment for node name
2019-12-26 10:49:12.798 [INFO][8] startup.go 287: Determined node name: ip-10-10-20-38.ap-southeast-1.compute.internal
2019-12-26 10:49:12.799 [INFO][8] k8s.go 228: Using Calico IPAM
2019-12-26 10:49:12.799 [INFO][8] startup.go 319: Checking datastore connection
2019-12-26 10:49:42.800 [INFO][8] startup.go 334: Hit error connecting to datastore - retry error=Get https://172.20.0.1:443/api/v1/nodes/foo: dial tcp 172.20.0.1:443: i/o timeout
2019-12-26 10:50:13.800 [INFO][8] startup.go 334: Hit error connecting to datastore - retry error=Get https://172.20.0.1:443/api/v1/nodes/foo: dial tcp 172.20.0.1:443: i/o timeout

Steps to Reproduce (for bugs)

  1. I followed the steps from this Medium article and the official Calico docs here and here
  2. Create a new nodegroup from which aws-node is excluded, using nodeAffinity (see the sketch after this list)
  3. kubectl apply -f calico.yaml on that new nodegroup, also using nodeAffinity
  4. Error
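
For reference, a minimal sketch of one way to keep aws-node off the new nodegroup (this uses a plain nodeSelector rather than the nodeAffinity rule from the article, and the cni=aws-vpc label is an invented example):

# Hypothetical: pin the AWS VPC CNI DaemonSet to nodes carrying a label that the new
# Calico nodegroup does not have. The label key/value are assumptions, not from the article.
kubectl -n kube-system patch daemonset aws-node --type merge \
  -p '{"spec":{"template":{"spec":{"nodeSelector":{"cni":"aws-vpc"}}}}}'
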
This is my edited calico.yaml

---
# Source: calico/templates/calico-config.yaml
# This ConfigMap is used to configure a self-hosted Calico installation.
kind: ConfigMap
apiVersion: v1
metadata:
  name: calico-config
  namespace: kube-system
data:
  # Typha is disabled.
  typha_service_name: "none"
  # Configure the backend to use.
  calico_backend: "bird"

  # Configure the MTU to use
  veth_mtu: "1440"

  # The CNI network configuration to install on each node.  The special
  # values in this config will be automatically populated.
  cni_network_config: |-
    {
      "name": "k8s-pod-network",
      "cniVersion": "0.3.1",
      "plugins": [
        {
          "type": "calico",
          "log_level": "info",
          "datastore_type": "kubernetes",
          "nodename": "__KUBERNETES_NODE_NAME__",
          "mtu": __CNI_MTU__,
          "ipam": {
              "type": "calico-ipam"
          },
          "policy": {
              "type": "k8s"
          },
          "kubernetes": {
              "kubeconfig": "__KUBECONFIG_FILEPATH__"
          }
        },
        {
          "type": "portmap",
          "snat": true,
          "capabilities": {"portMappings": true}
        }
      ]
    }

---
# Source: calico/templates/kdd-crds.yaml
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: felixconfigurations.crd.projectcalico.org
spec:
  scope: Cluster
  group: crd.projectcalico.org
  version: v1
  names:
    kind: FelixConfiguration
    plural: felixconfigurations
    singular: felixconfiguration
---

apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: ipamblocks.crd.projectcalico.org
spec:
  scope: Cluster
  group: crd.projectcalico.org
  version: v1
  names:
    kind: IPAMBlock
    plural: ipamblocks
    singular: ipamblock

---

apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: blockaffinities.crd.projectcalico.org
spec:
  scope: Cluster
  group: crd.projectcalico.org
  version: v1
  names:
    kind: BlockAffinity
    plural: blockaffinities
    singular: blockaffinity

---

apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: ipamhandles.crd.projectcalico.org
spec:
  scope: Cluster
  group: crd.projectcalico.org
  version: v1
  names:
    kind: IPAMHandle
    plural: ipamhandles
    singular: ipamhandle

---

apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: ipamconfigs.crd.projectcalico.org
spec:
  scope: Cluster
  group: crd.projectcalico.org
  version: v1
  names:
    kind: IPAMConfig
    plural: ipamconfigs
    singular: ipamconfig

---

apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: bgppeers.crd.projectcalico.org
spec:
  scope: Cluster
  group: crd.projectcalico.org
  version: v1
  names:
    kind: BGPPeer
    plural: bgppeers
    singular: bgppeer

---

apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: bgpconfigurations.crd.projectcalico.org
spec:
  scope: Cluster
  group: crd.projectcalico.org
  version: v1
  names:
    kind: BGPConfiguration
    plural: bgpconfigurations
    singular: bgpconfiguration

---

apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: ippools.crd.projectcalico.org
spec:
  scope: Cluster
  group: crd.projectcalico.org
  version: v1
  names:
    kind: IPPool
    plural: ippools
    singular: ippool

---

apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: hostendpoints.crd.projectcalico.org
spec:
  scope: Cluster
  group: crd.projectcalico.org
  version: v1
  names:
    kind: HostEndpoint
    plural: hostendpoints
    singular: hostendpoint

---

apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: clusterinformations.crd.projectcalico.org
spec:
  scope: Cluster
  group: crd.projectcalico.org
  version: v1
  names:
    kind: ClusterInformation
    plural: clusterinformations
    singular: clusterinformation

---

apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: globalnetworkpolicies.crd.projectcalico.org
spec:
  scope: Cluster
  group: crd.projectcalico.org
  version: v1
  names:
    kind: GlobalNetworkPolicy
    plural: globalnetworkpolicies
    singular: globalnetworkpolicy

---

apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: globalnetworksets.crd.projectcalico.org
spec:
  scope: Cluster
  group: crd.projectcalico.org
  version: v1
  names:
    kind: GlobalNetworkSet
    plural: globalnetworksets
    singular: globalnetworkset

---

apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: networkpolicies.crd.projectcalico.org
spec:
  scope: Namespaced
  group: crd.projectcalico.org
  version: v1
  names:
    kind: NetworkPolicy
    plural: networkpolicies
    singular: networkpolicy

---

apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: networksets.crd.projectcalico.org
spec:
  scope: Namespaced
  group: crd.projectcalico.org
  version: v1
  names:
    kind: NetworkSet
    plural: networksets
    singular: networkset
---
# Source: calico/templates/rbac.yaml

# Include a clusterrole for the kube-controllers component,
# and bind it to the calico-kube-controllers serviceaccount.
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: calico-kube-controllers
rules:
  # Nodes are watched to monitor for deletions.
  - apiGroups: [""]
    resources:
      - nodes
    verbs:
      - watch
      - list
      - get
  # Pods are queried to check for existence.
  - apiGroups: [""]
    resources:
      - pods
    verbs:
      - get
  # IPAM resources are manipulated when nodes are deleted.
  - apiGroups: ["crd.projectcalico.org"]
    resources:
      - ippools
    verbs:
      - list
  - apiGroups: ["crd.projectcalico.org"]
    resources:
      - blockaffinities
      - ipamblocks
      - ipamhandles
    verbs:
      - get
      - list
      - create
      - update
      - delete
  # Needs access to update clusterinformations.
  - apiGroups: ["crd.projectcalico.org"]
    resources:
      - clusterinformations
    verbs:
      - get
      - create
      - update
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: calico-kube-controllers
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: calico-kube-controllers
subjects:
- kind: ServiceAccount
  name: calico-kube-controllers
  namespace: kube-system
---
# Include a clusterrole for the calico-node DaemonSet,
# and bind it to the calico-node serviceaccount.
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: calico-node
rules:
  # The CNI plugin needs to get pods, nodes, and namespaces.
  - apiGroups: [""]
    resources:
      - pods
      - nodes
      - namespaces
    verbs:
      - get
  - apiGroups: [""]
    resources:
      - endpoints
      - services
    verbs:
      # Used to discover service IPs for advertisement.
      - watch
      - list
      # Used to discover Typhas.
      - get
  - apiGroups: [""]
    resources:
      - nodes/status
    verbs:
      # Needed for clearing NodeNetworkUnavailable flag.
      - patch
      # Calico stores some configuration information in node annotations.
      - update
  # Watch for changes to Kubernetes NetworkPolicies.
  - apiGroups: ["networking.k8s.io"]
    resources:
      - networkpolicies
    verbs:
      - watch
      - list
  # Used by Calico for policy information.
  - apiGroups: [""]
    resources:
      - pods
      - namespaces
      - serviceaccounts
    verbs:
      - list
      - watch
  # The CNI plugin patches pods/status.
  - apiGroups: [""]
    resources:
      - pods/status
    verbs:
      - patch
  # Calico monitors various CRDs for config.
  - apiGroups: ["crd.projectcalico.org"]
    resources:
      - globalfelixconfigs
      - felixconfigurations
      - bgppeers
      - globalbgpconfigs
      - bgpconfigurations
      - ippools
      - ipamblocks
      - globalnetworkpolicies
      - globalnetworksets
      - networkpolicies
      - networksets
      - clusterinformations
      - hostendpoints
      - blockaffinities
    verbs:
      - get
      - list
      - watch
  # Calico must create and update some CRDs on startup.
  - apiGroups: ["crd.projectcalico.org"]
    resources:
      - ippools
      - felixconfigurations
      - clusterinformations
    verbs:
      - create
      - update
  # Calico stores some configuration information on the node.
  - apiGroups: [""]
    resources:
      - nodes
    verbs:
      - get
      - list
      - watch
  # These permissions are only required for upgrade from v2.6, and can
  # be removed after upgrade or on fresh installations.
  - apiGroups: ["crd.projectcalico.org"]
    resources:
      - bgpconfigurations
      - bgppeers
    verbs:
      - create
      - update
  # These permissions are required for Calico CNI to perform IPAM allocations.
  - apiGroups: ["crd.projectcalico.org"]
    resources:
      - blockaffinities
      - ipamblocks
      - ipamhandles
    verbs:
      - get
      - list
      - create
      - update
      - delete
  - apiGroups: ["crd.projectcalico.org"]
    resources:
      - ipamconfigs
    verbs:
      - get
  # Block affinities must also be watchable by confd for route aggregation.
  - apiGroups: ["crd.projectcalico.org"]
    resources:
      - blockaffinities
    verbs:
      - watch
  # The Calico IPAM migration needs to get daemonsets. These permissions can be
  # removed if not upgrading from an installation using host-local IPAM.
  - apiGroups: ["apps"]
    resources:
      - daemonsets
    verbs:
      - get
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: calico-node
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: calico-node
subjects:
- kind: ServiceAccount
  name: calico-node
  namespace: kube-system

---
# Source: calico/templates/calico-node.yaml
# This manifest installs the calico-node container, as well
# as the CNI plugins and network config on
# each master and worker node in a Kubernetes cluster.
kind: DaemonSet
apiVersion: apps/v1
metadata:
  name: calico-node
  namespace: kube-system
  labels:
    k8s-app: calico-node
spec:
  selector:
    matchLabels:
      k8s-app: calico-node
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
  template:
    metadata:
      labels:
        k8s-app: calico-node
      annotations:
        # This, along with the CriticalAddonsOnly toleration below,
        # marks the pod as a critical add-on, ensuring it gets
        # priority scheduling and that its resources are reserved
        # if it ever gets evicted.
        scheduler.alpha.kubernetes.io/critical-pod: ''
    spec:
      nodeSelector:
        beta.kubernetes.io/os: linux
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: app.kubernetes.io/purpose
                operator: In
                values:
                - calico
      hostNetwork: true
      tolerations:
        # Make sure calico-node gets scheduled on all nodes.
        - effect: NoSchedule
          operator: Exists
        # Mark the pod as a critical add-on for rescheduling.
        - key: CriticalAddonsOnly
          operator: Exists
        - effect: NoExecute
          operator: Exists
      serviceAccountName: calico-node
      # Minimize downtime during a rolling upgrade or deletion; tell Kubernetes to do a "force
      # deletion": https://kubernetes.io/docs/concepts/workloads/pods/pod/#termination-of-pods.
      terminationGracePeriodSeconds: 0
      priorityClassName: system-node-critical
      initContainers:
        # This container performs upgrade from host-local IPAM to calico-ipam.
        # It can be deleted if this is a fresh installation, or if you have already
        # upgraded to use calico-ipam.
        - name: upgrade-ipam
          image: calico/cni:v3.11.1
          command: ["/opt/cni/bin/calico-ipam", "-upgrade"]
          env:
            - name: KUBERNETES_NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            - name: CALICO_NETWORKING_BACKEND
              valueFrom:
                configMapKeyRef:
                  name: calico-config
                  key: calico_backend
          volumeMounts:
            - mountPath: /var/lib/cni/networks
              name: host-local-net-dir
            - mountPath: /host/opt/cni/bin
              name: cni-bin-dir
          securityContext:
            privileged: true
        # This container installs the CNI binaries
        # and CNI network config file on each node.
        - name: install-cni
          image: calico/cni:v3.11.1
          command: ["/install-cni.sh"]
          env:
            # Name of the CNI config file to create.
            - name: CNI_CONF_NAME
              value: "10-calico.conflist"
            # The CNI network config to install on each node.
            - name: CNI_NETWORK_CONFIG
              valueFrom:
                configMapKeyRef:
                  name: calico-config
                  key: cni_network_config
            # Set the hostname based on the k8s node name.
            - name: KUBERNETES_NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            # CNI MTU Config variable
            - name: CNI_MTU
              valueFrom:
                configMapKeyRef:
                  name: calico-config
                  key: veth_mtu
            # Prevents the container from sleeping forever.
            - name: SLEEP
              value: "false"
          volumeMounts:
            - mountPath: /host/opt/cni/bin
              name: cni-bin-dir
            - mountPath: /host/etc/cni/net.d
              name: cni-net-dir
          securityContext:
            privileged: true
        # Adds a Flex Volume Driver that creates a per-pod Unix Domain Socket to allow Dikastes
        # to communicate with Felix over the Policy Sync API.
        - name: flexvol-driver
          image: calico/pod2daemon-flexvol:v3.11.1
          volumeMounts:
          - name: flexvol-driver-host
            mountPath: /host/driver
          securityContext:
            privileged: true
      containers:
        # Runs calico-node container on each Kubernetes node.  This
        # container programs network policy and routes on each
        # host.
        - name: calico-node
          image: calico/node:v3.11.1
          env:
            # Use Kubernetes API as the backing datastore.
            - name: DATASTORE_TYPE
              value: "kubernetes"
            # Wait for the datastore.
            - name: WAIT_FOR_DATASTORE
              value: "true"
            # Set based on the k8s node name.
            - name: NODENAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            # Choose the backend to use.
            - name: CALICO_NETWORKING_BACKEND
              valueFrom:
                configMapKeyRef:
                  name: calico-config
                  key: calico_backend
            # Cluster type to identify the deployment type
            - name: CLUSTER_TYPE
              value: "k8s,bgp"
            # Auto-detect the BGP IP address.
            - name: IP
              value: "autodetect"
            - name: IP_AUTODETECTION_METHOD
              value: "interface=eth.*"
            # Enable IPIP
            - name: CALICO_IPV4POOL_IPIP
              value: "CrossSubnet"
            # Set MTU for tunnel device used if ipip is enabled
            - name: FELIX_IPINIPMTU
              valueFrom:
                configMapKeyRef:
                  name: calico-config
                  key: veth_mtu
            # The default IPv4 pool to create on startup if none exists. Pod IPs will be
            # chosen from this range. Changing this value after installation will have
            # no effect. This should fall within `--cluster-cidr`.
            - name: CALICO_IPV4POOL_CIDR
              value: "10.11.0.0/16"
            # Disable file logging so `kubectl logs` works.
            - name: CALICO_DISABLE_FILE_LOGGING
              value: "true"
            # Set Felix endpoint to host default action to ACCEPT.
            - name: FELIX_DEFAULTENDPOINTTOHOSTACTION
              value: "ACCEPT"
            # Disable IPv6 on Kubernetes.
            - name: FELIX_IPV6SUPPORT
              value: "false"
            # Set Felix logging to "info"
            - name: FELIX_LOGSEVERITYSCREEN
              value: "info"
            - name: FELIX_HEALTHENABLED
              value: "true"
            - name: CALICO_IPV4POOL_NAT_OUTGOING
              value: "true"
          securityContext:
            privileged: true
          resources:
            requests:
              cpu: 250m
          livenessProbe:
            exec:
              command:
              - /bin/calico-node
              - -felix-live
              - -bird-live
            periodSeconds: 10
            initialDelaySeconds: 10
            failureThreshold: 6
          readinessProbe:
            exec:
              command:
              - /bin/calico-node
              - -felix-ready
              - -bird-ready
            periodSeconds: 10
          volumeMounts:
            - mountPath: /lib/modules
              name: lib-modules
              readOnly: true
            - mountPath: /run/xtables.lock
              name: xtables-lock
              readOnly: false
            - mountPath: /var/run/calico
              name: var-run-calico
              readOnly: false
            - mountPath: /var/lib/calico
              name: var-lib-calico
              readOnly: false
            - name: policysync
              mountPath: /var/run/nodeagent
      volumes:
        # Used by calico-node.
        - name: lib-modules
          hostPath:
            path: /lib/modules
        - name: var-run-calico
          hostPath:
            path: /var/run/calico
        - name: var-lib-calico
          hostPath:
            path: /var/lib/calico
        - name: xtables-lock
          hostPath:
            path: /run/xtables.lock
            type: FileOrCreate
        # Used to install CNI.
        - name: cni-bin-dir
          hostPath:
            path: /opt/cni/bin
        - name: cni-net-dir
          hostPath:
            path: /etc/cni/net.d
        # Mount in the directory for host-local IPAM allocations. This is
        # used when upgrading from host-local to calico-ipam, and can be removed
        # if not using the upgrade-ipam init container.
        - name: host-local-net-dir
          hostPath:
            path: /var/lib/cni/networks
        # Used to create per-pod Unix Domain Sockets
        - name: policysync
          hostPath:
            type: DirectoryOrCreate
            path: /var/run/nodeagent
        # Used to install Flex Volume Driver
        - name: flexvol-driver-host
          hostPath:
            type: DirectoryOrCreate
            path: /usr/libexec/kubernetes/kubelet-plugins/volume/exec/nodeagent~uds
---

apiVersion: v1
kind: ServiceAccount
metadata:
  name: calico-node
  namespace: kube-system

---
# Source: calico/templates/calico-kube-controllers.yaml

# See https://github.com/projectcalico/kube-controllers
apiVersion: apps/v1
kind: Deployment
metadata:
  name: calico-kube-controllers
  namespace: kube-system
  labels:
    k8s-app: calico-kube-controllers
spec:
  # The controllers can only have a single active instance.
  replicas: 1
  selector:
    matchLabels:
      k8s-app: calico-kube-controllers
  strategy:
    type: Recreate
  template:
    metadata:
      name: calico-kube-controllers
      namespace: kube-system
      labels:
        k8s-app: calico-kube-controllers
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
    spec:
      nodeSelector:
        beta.kubernetes.io/os: linux
      tolerations:
        # Mark the pod as a critical add-on for rescheduling.
        - key: CriticalAddonsOnly
          operator: Exists
        - key: node-role.kubernetes.io/master
          effect: NoSchedule
      serviceAccountName: calico-kube-controllers
      priorityClassName: system-cluster-critical
      containers:
        - name: calico-kube-controllers
          image: calico/kube-controllers:v3.11.1
          env:
            # Choose which controllers to run.
            - name: ENABLED_CONTROLLERS
              value: node
            - name: DATASTORE_TYPE
              value: kubernetes
          readinessProbe:
            exec:
              command:
              - /usr/bin/check-status
              - -r

---

apiVersion: v1
kind: ServiceAccount
metadata:
  name: calico-kube-controllers
  namespace: kube-system
---
# Source: calico/templates/calico-etcd-secrets.yaml

---
# Source: calico/templates/calico-typha.yaml

---
# Source: calico/templates/configure-canal.yaml

These are the customizations I made to the manifest:

  • Add nodeAffinity on calico-node DaemonSet
  • Add env IP_AUTODETECTION_METHOD="interface=eth.*"
  • Change env CALICO_IPV4POOL_IPIP="CrossSubnet"
  • Change env CALICO_IPV4POOL_CIDR="10.11.0.0/16"
  • Add env CALICO_IPV4POOL_NAT_OUTGOING="true"

Context

I am trying to bypass the AWS EKS per-node maximum-pods limitation by disabling the AWS VPC CNI and using the Calico CNI for the pod IP pool instead.

Your Environment

  • Calico version: v3.11
  • Orchestrator version: k8s 1.14.7
  • Operating System and version: Amazon Linux 2 (Linux 4.14.146-119.123.amzn2.x86_64)
@tmjd
Member

tmjd commented Dec 30, 2019

It seems like the current problem is that the node cannot reach the API server. Do you have any idea why that would be the case?
Have you tried accessing the apiserver IP 172.20.0.1:443 from a node? Does that work?
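
For example, something like the following, run from an affected worker node (using the kubernetes service IP from the logs above), would show whether the API server is reachable at all:

# 172.20.0.1:443 is the kubernetes service IP from the logs above; adjust for your cluster.
curl -k --max-time 5 https://172.20.0.1:443/version   # expect an HTTP/JSON response, not a timeout
# Raw TCP check if curl is not available on the node:
nc -zv -w 5 172.20.0.1 443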

@Goeldeepesh

Hi Erik, I'm also going through the same issue. The difference is that I have done the setup on bare metal, and the calico-node pods keep crashing on every node. I have already modified the calico file with the auto-detection method and the pod CIDR, but nothing worked.

@tmjd
Member

tmjd commented Jan 17, 2020

@Goeldeepesh Can you provide some logs? Have you tried what I suggested? Can you access the apiserver from a node?

@Goeldeepesh

So, I have set up a Kubernetes HA cluster with an external etcd cluster and an HAProxy in front of the masters. While searching for the Calico issue I came to know that we need to define the etcd endpoints in Calico, so I did that and applied the etcd-datastore Calico manifest. After this my calico-node pods came up, but calico-kube-controllers and CoreDNS are not working.

### calico-kube-controllers logs

I0117 18:12:19.957435 1 trace.go:116] Trace[1357217039]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.0.0-20191114101535-6c5935290e33/tools/cache/reflector.go:96 (started: 2020-01-17 18:11:49.94948954 +0000 UTC m=+1520.210744032) (total time: 30.007913221s):
Trace[1357217039]: [30.007913221s] [30.007913221s] END
E0117 18:12:19.957454 1 reflector.go:123] pkg/mod/k8s.io/client-go@v0.0.0-20191114101535-6c5935290e33/tools/cache/reflector.go:96: Failed to list *v1.ServiceAccount: Get https://10.96.0.1:443/api/v1/serviceaccounts?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
I0117 18:12:20.001412 1 trace.go:116] Trace[208164637]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.0.0-20191114101535-6c5935290e33/tools/cache/reflector.go:96 (started: 2020-01-17 18:11:49.990258604 +0000 UTC m=+1520.251513054) (total time: 30.011114497s):
Trace[208164637]: [30.011114497s] [30.011114497s] END
E0117 18:12:20.001434 1 reflector.go:123] pkg/mod/k8s.io/client-go@v0.0.0-20191114101535-6c5935290e33/tools/cache/reflector.go:96: Failed to list *v1.Pod: Get https://10.96.0.1:443/api/v1/pods?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
I0117 18:12:20.031412 1 trace.go:116] Trace[31841533]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.0.0-20191114101535-6c5935290e33/tools/cache/reflector.go:96 (started: 2020-01-17 18:11:50.029252365 +0000 UTC m=+1520.290506835) (total time: 30.002128911s):
Trace[31841533]: [30.002128911s] [30.002128911s] END
E0117 18:12:20.031433 1 reflector.go:123] pkg/mod/k8s.io/client-go@v0.0.0-20191114101535-6c5935290e33/tools/cache/reflector.go:96: Failed to list *v1.Namespace: Get https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
I0117 18:12:20.143713 1 trace.go:116] Trace[1253775561]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.0.0-20191114101535-6c5935290e33/tools/cache/reflector.go:96 (started: 2020-01-17 18:11:50.141284559 +0000 UTC m=+1520.402539008) (total time: 30.00239707s):
Trace[1253775561]: [30.00239707s] [30.00239707s] END
E0117 18:12:20.143733 1 reflector.go:123] pkg/mod/k8s.io/client-go@v0.0.0-20191114101535-6c5935290e33/tools/cache/reflector.go:96: Failed to list *v1.NetworkPolicy: Get https://10.96.0.1:443/apis/networking.k8s.io/v1/networkpolicies?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
2020-01-17 18:12:21.993 [ERROR][1] main.go 234: Failed to reach apiserver error=

### kubectl describe pod coredns-6955765f44-5472t -n kube-system

Name: coredns-6955765f44-5472t
Namespace: kube-system
Priority: 2000000000
Priority Class Name: system-cluster-critical
Node: n4tenl-depa0598/10.5.30.101
Start Time: Fri, 17 Jan 2020 18:19:46 +0530
Labels: k8s-app=kube-dns
pod-template-hash=6955765f44
Annotations:
Status: Pending
IP:
IPs:
Controlled By: ReplicaSet/coredns-6955765f44
Containers:
coredns:
Container ID:
Image: k8s.gcr.io/coredns:1.6.5
Image ID:
Ports: 53/UDP, 53/TCP, 9153/TCP
Host Ports: 0/UDP, 0/TCP, 0/TCP
Args:
-conf
/etc/coredns/Corefile
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Limits:
memory: 170Mi
Requests:
cpu: 100m
memory: 70Mi
Liveness: http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
Readiness: http-get http://:8181/ready delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
Mounts:
/etc/coredns from config-volume (ro)
/var/run/secrets/kubernetes.io/serviceaccount from coredns-token-knrx7 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: coredns
Optional: false
coredns-token-knrx7:
Type: Secret (a volume populated by a Secret)
SecretName: coredns-token-knrx7
Optional: false
QoS Class: Burstable
Node-Selectors: beta.kubernetes.io/os=linux
Tolerations: CriticalAddonsOnly
node-role.kubernetes.io/master:NoSchedule
node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message


Warning FailedCreatePodSandBox 45m (x40 over 66m) kubelet, n4tenl-depa0598 (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "a612a10cc2e91572690e05ad7e69e6d63aa95c79558631d9fcc373212572f6e5" network for pod "coredns-6955765f44-5472t": networkPlugin cni failed to set up pod "coredns-6955765f44-5472t_kube-system" network: Get https://[10.96.0.1]:443/api/v1/namespaces/kube-system: dial tcp 10.96.0.1:443: i/o timeout
Warning NetworkNotReady 31m (x152 over 36m) kubelet, n4tenl-depa0598 network is not ready: runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Warning FailedCreatePodSandBox 6m21s (x37 over 26m) kubelet, n4tenl-depa0598 (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "1d049e6b6c261b196be3daad3e4e203d1b912f708a8b4108f653c3c932d8b7e8" network for pod "coredns-6955765f44-5472t": networkPlugin cni failed to set up pod "coredns-6955765f44-5472t_kube-system" network: Get https://[10.96.0.1]:443/api/v1/namespaces/kube-system: dial tcp 10.96.0.1:443: i/o timeout
Normal SandboxChanged 85s (x55 over 27m) kubelet, n4tenl-depa0598 Pod sandbox changed, it will be killed and re-created.

### kubectl logs kube-proxy-8j5tv -n kube-system

E0117 18:16:23.343854 1 reflector.go:153] k8s.io/client-go/informers/factory.go:135: Failed to list *v1.Endpoints: Get https://dockers.airtel.com:6443/api/v1/endpoints?labelSelector=%21service.kubernetes.io%2Fheadless%2C%21service.kubernetes.io%2Fservice-proxy-name&limit=500&resourceVersion=0: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes")
E0117 18:16:23.843862 1 reflector.go:153] k8s.io/client-go/informers/factory.go:135: Failed to list *v1.Service: Get https://dockers.airtel.com:6443/api/v1/services?labelSelector=%21service.kubernetes.io%2Fheadless%2C%21service.kubernetes.io%2Fservice-proxy-name&limit=500&resourceVersion=0: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes")
E0117 18:16:24.363192 1 reflector.go:153] k8s.io/client-go/informers/factory.go:135: Failed to list *v1.Endpoints: Get https://dockers.airtel.com:6443/api/v1/endpoints?labelSelector=%21service.kubernetes.io%2Fheadless%2C%21service.kubernetes.io%2Fservice-proxy-name&limit=500&resourceVersion=0: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes")
E0117 18:16:24.857807 1 reflector.go:153] k8s.io/client-go/informers/factory.go:135: Failed to list *v1.Service: Get https://dockers.airtel.com:6443/api/v1/services?labelSelector=%21service.kubernetes.io%2Fheadless%2C%21service.kubernetes.io%2Fservice-proxy-name&limit=500&resourceVersion=0: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes")

### kubeadm config

apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
apiServer:
  certSANs:
    - "docker.airtel.com"
networking:
  podSubnet: 192.168.0.0/16
kubernetesVersion: stable
controlPlaneEndpoint: "dockers.airtel.com:6443"
etcd:
  external:
    endpoints:
      - http://10.5.30.111:2379
      - http://10.5.30.112:2379
      - http://10.5.30.113:2379

I do not see anything running on port 443, only on 6443. I am unable to understand where this 10.96.0.1 comes from. I had provided pod-network-cidr as 192.168.0.0/16.

Please Erik, I really need to sort this issue out, as I have been struggling with this setup for two weeks now and am still on the same page.

@tmjd
Member

tmjd commented Jan 17, 2020

@Goeldeepesh your issue seems to be quite different. From your kube-proxy logs it looks to be a problem there. Please resolve the issue with kube-proxy before you try to fix any problems with Calico.
It looks like your certificates are not set up correctly for kube-proxy. I can't offer any help on that.

@Goeldeepesh

@tmjd Hmm... Thanks anyway.

@caseydavenport
Member

E0117 18:16:23.843862 1 reflector.go:153] k8s.io/client-go/informers/factory.go:135: Failed to list *v1.Service: Get https://dockers.airtel.com:6443/api/v1/services?labelSelector=%21service.kubernetes.io%2Fheadless%2C%21service.kubernetes.io%2Fservice-proxy-name&limit=500&resourceVersion=0: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes")

Yeah, this definitely looks like a kube-proxy issue. Calico relies on a functioning kube-proxy to access the Kubernetes API server.
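
A quick sketch of how to confirm kube-proxy has actually programmed that service IP on a node (this assumes kube-proxy is running in its default iptables mode):

# The NAT table should contain rules for the default/kubernetes service:
sudo iptables-save -t nat | grep 'default/kubernetes'
# And a kube-proxy pod should be running on every node:
kubectl -n kube-system get pods -o wide | grep kube-proxy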

@mohammadasim

I know this issue has been closed, but I am facing the same problem. I am running a k8s cluster in AWS using kops.
kops version 1.16.0
kubectl version --short: Client Version v1.15.7, Server Version v1.16.7
calicoctl version: Client Version v3.13.1, Git commit eb796e31, Cluster Version v3.9.3, Cluster Type kops,bgp,kdd,k8s
When I install a helm chart, the pods take forever to get created. This is especially bad for cronjobs that run every five minutes; sometimes I have three pods stuck in the creating state because they take so long to come up.
When I run kubectl describe pod, I always see the following error:
kubelet, ip-10-10-94-150.eu-west-1.compute.internal Failed create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "b7ce32bd98cd80dc141fbe28a8322acdbba7b94bbdcd85a39f53cd09a5e6b558" network for pod "service-name-1584711300-sg8hp": networkPlugin cni failed to set up pod "service-name-1584711300-sg8hp_default" network: error getting ClusterInformation: Get https://[100.64.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 100.64.0.1:443: i/o timeout, failed to clean up sandbox container "b7ce32bd98cd80dc141fbe28a8322acdbba7b94bbdcd85a39f53cd09a5e6b558" network for pod "service-name-1584711300-sg8hp": networkPlugin cni failed to teardown pod "service-name-1584711300-sg8hp_default" network: Get https://[100.64.0.1]:443/apis/crd.projectcalico.org/v1/ipamhandles/k8s-pod-network.b7ce32bd98cd80dc141fbe28a8322acdbba7b94bbdcd85a39f53cd09a5e6b558: dial tcp 100.64.0.1:443: i/o timeout]
When I try to connect to the IP 100.64.0.1 (which is also in the config file in the /etc/cni/net.d folder) from a calico-node, the connection sometimes works and sometimes does not, as shown below.
root@ip-10-10-94-150:/# telnet 100.64.0.1 443
Trying 100.64.0.1...
telnet: Unable to connect to remote host: Connection timed out
root@ip-10-10-94-150:/# telnet 100.64.0.1 443
Trying 100.64.0.1...
Connected to 100.64.0.1.
Escape character is '^]'.
^CConnection closed by foreign host.
root@ip-10-10-94-150:/# telnet 100.64.0.1 443
Trying 100.64.0.1...
^C
root@ip-10-10-94-150:/# telnet 100.64.0.1 443
Trying 100.64.0.1...
Connected to 100.64.0.1.
Escape character is '^]'.
The logs from a newly started calico node pod are as follows.
2020-03-20 13:56:41.753 [INFO][8] startup.go 255: Early log level set to info
2020-03-20 13:56:41.753 [INFO][8] startup.go 271: Using NODENAME environment for node name
2020-03-20 13:56:41.753 [INFO][8] startup.go 283: Determined node name: ip-10-10-111-135.eu-west-1.compute.internal
2020-03-20 13:56:41.754 [INFO][8] k8s.go 228: Using Calico IPAM
2020-03-20 13:56:41.754 [INFO][8] startup.go 315: Checking datastore connection
2020-03-20 13:57:11.755 [INFO][8] startup.go 330: Hit error connecting to datastore - retry error=Get https://100.64.0.1:443/api/v1/nodes/foo: dial tcp 100.64.0.1:443: i/o timeout
2020-03-20 13:57:12.764 [INFO][8] startup.go 339: Datastore connection verified
2020-03-20 13:57:12.764 [INFO][8] startup.go 94: Datastore is ready
2020-03-20 13:57:12.771 [INFO][8] startup.go 583: Using autodetected IPv4 address on interface ens5: 10.10.111.135/19
2020-03-20 13:57:12.771 [INFO][8] startup.go 646: No AS number configured on node resource, using global value
2020-03-20 13:57:12.771 [INFO][8] startup.go 148: Setting NetworkUnavailable to False
2020-03-20 13:57:12.797 [INFO][8] startup.go 529: FELIX_IPV6SUPPORT is false through environment variable
2020-03-20 13:57:12.803 [INFO][8] startup.go 180: Using node name: ip-10-10-111-135.eu-west-1.compute.internal
2020-03-20 13:57:12.823 [INFO][74] k8s.go 228: Using Calico IPAM
2020-03-20 13:57:42.823 [FATAL][74] allocateip.go 62: failed to fetch node resource 'ip-10-10-111-135.eu-west-1.compute.internal' error=Get https://100.64.0.1:443/api/v1/nodes/ip-10-10-111-135.eu-west-1.compute.internal: dial tcp 100.64.0.1:443: i/o timeout
Calico node failed to start
These logs also show a similar problem. Any help will be highly appreciated.
Thanks,

@Vitorspk

I know this ticket is closed, but just to help people who find this post: I solved the problem when I changed my pod-network-cidr to 10.244.0.0/16.

@zimmertr

zimmertr commented Apr 2, 2020

Neither of the posted solutions is working for me with Packer/Vagrant and RHEL 7, using kubeadm to bootstrap a v1.18.0 cluster.

Some network information:

LAN CIDR: 192.168.30.0/24
VPN CIDR: 172.27.0.0/16
Docker Engine CIDR: 192.168.65.0/24
Kubernetes Pod CIDR: 172.16.0.0/16
Kubernetes Service CIDR: 172.32.0.0/16
Calico IPV4 Pool CIDR: 172.16.0.0/16

The calico-node daemonset also has IP_AUTODETECTION_METHOD set to interface=eth1. This is because my Vagrant VMs come up with eth0 populated with 10.0.2.15/24. I'm not sure why. But eth1 has the proper IP address supplied via Vagrant.

The calico-node pods running on each worker node are producing the following logs repeatedly:

[INFO][8] startup.go 365: Hit error connecting to datastore - retry error=Get https://172.32.0.1:443/api/v1/nodes/foo: dial tcp 172.32.0.1:443: connect: connection refused

I can curl that endpoint from the master node but not from the worker nodes, which communicate with it via the LAN CIDR mentioned above: 192.168.30.0/24.

Any ideas?

@tmjd
Member

tmjd commented Apr 3, 2020

As mentioned before, each calico-node pod and the CNI plugin need to be able to reach the Kubernetes API server, and kube-proxy is responsible for setting up the rules so that the kubernetes service IP redirects correctly.

@mohammadasim Are there multiple Kubernetes masters, and is it possible they are not all reachable or healthy? That might explain why the connection works sometimes and sometimes does not. You've identified that it is not just a Calico issue, so you'll need to dig into why connecting fails. I'd suggest looking at the kube-proxy logs and the Kubernetes API service endpoints.
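
For instance, those two checks could look something like this (the k8s-app=kube-proxy label is what standard kubeadm/kops installs use, so treat it as an assumption for other setups):

# Find the kube-proxy pods and pull recent logs from one of them:
kubectl -n kube-system get pods -l k8s-app=kube-proxy -o wide
kubectl -n kube-system logs <one-of-the-kube-proxy-pods> --tail=50
# The kubernetes service should list an endpoint for every healthy API server:
kubectl get endpoints kubernetes -o wide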

@zimmertr I would guess that your API server is not configured to use the proper address because of the vagrant creating multiple interfaces and you'll need to change or set the IP address that the API server is listening on.

@zimmertr

zimmertr commented Apr 3, 2020

API server is not configured to use the proper address

If I understand the documentation correctly, --apiserver-advertise-address does what you are suggesting, and I have set that correctly. This Ansible code correctly references eth1:

- name: Initializing the Kubernetes Control Plane.
  # command: kubeadm init --config /tmp/master.yml
  command: kubeadm init --node-name master \
                       --kubernetes-version="{{ K8S_VERSION }}" \
                       --apiserver-advertise-address="{{ hostvars[inventory_hostname]['ansible_eth1']['ipv4']['address'] }}" \
                       --apiserver-cert-extra-sans="{{ hostvars[inventory_hostname]['ansible_eth1']['ipv4']['address'] }}"  \
                       --pod-network-cidr={{ K8S_POD_CIDR }}

This works on CentOS 7 but does not work on RHEL 7.

Alternatively, if I use kubeadm configuration files with the same arguments, it doesn't work on either operating system.

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
metadata:
  name: kubelet-config
cgroupDriver: systemd
---
apiVersion: kubeadm.k8s.io/v1beta2
kind: InitConfiguration
metadata:
  name: master-init
bootstrapTokens:
  - apiServerEndpoint: "{{ hostvars[inventory_hostname]['ansible_eth1']['ipv4']['address'] }}:6443"
    token: "{{ K8S_JOIN_TOKEN }}"
    ttl: "0"
LocalAPIEndpoint:
  AdvertiseAddress: "{{ hostvars[inventory_hostname]['ansible_eth1']['ipv4']['address'] }}"
nodeRegistration:
  kubeletExtraArgs:
    node-ip: "{{ hostvars[inventory_hostname]['ansible_eth1']['ipv4']['address'] }}"
  node-name: "{{ ansible_hostname }}"
---
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
metadata:
  name: cluster-config
apiServerCertSANs:
  - "{{ hostvars[inventory_hostname]['ansible_eth1']['ipv4']['address'] }}"
  - "{{ ansible_hostname }}"
clusterName: "{{ K8S_CLUSTER_NAME }}"
controlPlaneEndpoint: "{{ hostvars[inventory_hostname]['ansible_eth1']['ipv4']['address'] }}:6443"
image-repository: "{{ K8S_IMAGE_REPOSITORY }}"
kubernetesVersion: "{{ K8S_VERSION }}"
networking:
  podSubnet: "{{ K8S_POD_CIDR }}"
  # serviceSubnet: "{{ K8S_SERVICE_CIDR }}"
skip-certificate-key-print: true
skip-token-print: true
etcd:
  local:
    imageRepository: "{{ K8S_IMAGE_REPOSITORY }}"

Lastly, here is the Kustomize patch I'm applying to the calico-node DaemonSet to ensure it's correctly configured for my Kubernetes pod network CIDR:

kind: DaemonSet
apiVersion: apps/v1
metadata:
  name: calico-node
spec:
  template:
    spec:
      containers:
        - name: calico-node
          env:
            - name: CALICO_IPV4POOL_CIDR
              value: "{{ K8S_POD_CIDR }}"
            - name: IP_AUTODETECTION_METHOD
              value: interface=eth1

@josejuanmontiel

In my case, after reading a lot about it... my main problem was the missing bridge-utils package (https://wiki.debian.org/es/KVM).

@oldthreefeng

In my case, net.ipv4.conf.all.rp_filter needed to be set to 0. Ubuntu 20 sets the default to 2.
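
A minimal sketch of applying that on a node (the drop-in file name here is an arbitrary choice):

# Apply immediately, then persist it across reboots:
sudo sysctl -w net.ipv4.conf.all.rp_filter=0
echo 'net.ipv4.conf.all.rp_filter = 0' | sudo tee /etc/sysctl.d/99-rp-filter.conf
sudo sysctl --system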

@itsecforu

In my case, net.ipv4.conf.all.rp_filter needed to be set to 0. Ubuntu 20 sets the default to 2.

On all nodes?

@oldthreefeng

Yes. @itsecforu

@itsecforu

@oldthreefeng doesn't work for me

@oldthreefeng

@itsecforu what is the value of net.ipv4.conf.all.rp_filter on your nodes?

@itsecforu

itsecforu commented Jun 3, 2021

@oldthreefeng set to 0
net.ipv4.conf.all.rp_filter=0

full:

net.ipv4.conf.all.rp_filter=0
net.bridge.bridge-nf-call-iptables=1
net.bridge.bridge-nf-call-arptables=1
net.bridge.bridge-nf-call-ip6tables=1

@oldthreefeng

Are there any error logs in your calico-node pods?
And do other pods like coredns run healthy? @itsecforu

@itsecforu

On one of my master nodes:

2021-06-02 13:11:35.873 [ERROR][55] client.go 95: Failed to query current BGP settings: Get https://10.233.0.1:443/apis/crd.projectcalico.org/v1/bgpconfigurations/default: net/http: TLS handshake timeout
2021-06-02 13:11:35.965 [ERROR][181] resource.go 288: Error from checkcmd "bird6 -p -c /etc/calico/confd/config/.bird6.cfg964169467": ""
2021-06-02 13:11:45.992 [ERROR][181] resource.go 302: Error from reloadcmd: "2021-06-02 13:11:35.988 [INFO][198] k8s.go 228: Using Calico IPAM\n2021-06-02 13:11:45.990 [FATAL][198] allocateip.go 62: failed to fetch node resource 'node1' error=Get https://10.233.0.1:443/api/v1/nodes/node1: net/http: TLS handshake timeout\n"
2021-06-02 13:11:45.992 [ERROR][181] run.go 52: exit status 1

The others are OK.

@itsecforu

itsecforu commented Jun 3, 2021

coredns on master node:

[ERROR] plugin/errors: 2 . NS: read udp 10.233.96.4:54018->8.8.8.8:53: i/o timeout
E0424 12:38:22.615161       1 reflector.go:178] pkg/mod/k8s.io/client-go@v0.18.3/tools/cache/reflector.go:125: Failed to list *v1.Endpoints: endpoints is forbidden: User "system:serviceaccount:kube-system:coredns" cannot list resource "endpoints" in API group "" at the cluster scope

@usersina

In my case, after reading a lot about it... my main problem was the missing bridge-utils package (https://wiki.debian.org/es/KVM).

Can you elaborate more on what you did please?

@josejuanmontiel

In my case, after reading a lot about it... my main problem was the missing bridge-utils package (https://wiki.debian.org/es/KVM).

Can you elaborate more on what you did please?

Reviewing my old notes... in the end, the problem was no communication between the elements (master and nodes).

During the configuration of the VMs (in my case with KVM) I set up 3 (one master, 2 nodes).

Each VM needs the swap / bridge / ip-forward settings (https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/):

cat <<EOF > /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
net.ipv4.conf.all.rp_filter = 1
EOF
sysctl --system

/etc/modules
br_netfilter

And the "trick" came here: https://wiki.debian.org/KVM

If you use libvirt to manage your VMs, libvirt provides a NATed bridged network named "default" that allows the host to communicate with the guests.
This network is available only for the system domains (that is VMs created by root or using the qemu:///system connection URI).
...
In order for things to work this way you need to have the recommended packages dnsmasq-base, bridge-utils and iptables installed.
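
On a Debian/Ubuntu KVM host, installing those recommended packages would be something like (package names taken from the wiki text quoted above):

# Packages the Debian KVM wiki recommends for the default NATed bridge network to work:
sudo apt-get install -y dnsmasq-base bridge-utils iptables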

Hope it helps, @usersina

@usersina

@josejuanmontiel Thanks for sharing! I haven't been able to get it to work this way, though, since I'm using microk8s, which does most of the networking behind the scenes. More on this if anyone is interested.
