
Controller failed to renew lease (leader election) due to timeout until pod restart #11287

Closed
satdeveloping opened this issue Apr 19, 2024 · 6 comments
Labels
• needs-kind: Indicates a PR lacks a `kind/foo` label and requires one.
• needs-priority
• needs-triage: Indicates an issue or PR lacks a `triage/foo` label and requires one.
• triage/needs-information: Indicates an issue needs more information in order to work on it.

Comments

satdeveloping commented Apr 19, 2024

What happened:

We deployed a new version of our application; at roughly the same time, the ingress-nginx controller failed to renew its lease in the leader election. The controller was then unable to update the configmap, and our applications stopped receiving traffic.

The controller logs showed that requests were still being forwarded to pods that no longer existed.

Issuing the command `kubectl -n ingress-controller rollout restart deployment ingress-nginx-controller` resolved the issue.
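
For reference, the lease state and the restart can be checked with standard kubectl commands; a minimal sketch, assuming the namespace is `ingress-nginx` and the election ID is `ingress-nginx-leader` (both match the controller args further below):

```shell
# Show the leader-election Lease the controller renews; spec.holderIdentity and
# spec.renewTime indicate who holds leadership and when it was last renewed.
kubectl -n ingress-nginx get lease ingress-nginx-leader -o yaml

# Restart the controller Deployment (the workaround that resolved this incident).
kubectl -n ingress-nginx rollout restart deployment ingress-nginx-controller
```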

E0418 12:58:38.117452       7 leaderelection.go:327] error retrieving resource lock ingress-nginx/ingress-nginx-leader: Get "https://10.100.0.1:443/apis/coordination.k8s.io/v1/namespaces/ingress-nginx/leases/ingress-nginx-leader": context deadline exceeded
I0418 12:58:38.117496       7 leaderelection.go:280] failed to renew lease ingress-nginx/ingress-nginx-leader: timed out waiting for the condition
I0418 12:58:38.120536       7 leaderelection.go:245] attempting to acquire leader lease ingress-nginx/ingress-nginx-leader...
E0418 12:58:38.120726       7 status.go:104] "error running poll" err="timed out waiting for the condition"
2024/04/18 12:58:57 [error] 26#26: *1900 upstream timed out (110: Operation timed out) while connecting to upstream, client: 172.32.107.61, server: ......

# ... upstream errors

W0418 12:59:00.751625       7 reflector.go:456] k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: watch of *v1.IngressClass ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
W0418 12:59:00.751646       7 reflector.go:456] k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: watch of *v1.Secret ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
W0418 12:59:00.751682       7 reflector.go:456] k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: watch of *v1.Service ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
W0418 12:59:00.751680       7 reflector.go:456] k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: watch of *v1.Ingress ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
W0418 12:59:00.751714       7 reflector.go:456] k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: watch of *v1.EndpointSlice ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
W0418 12:59:00.751747       7 reflector.go:456] k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: watch of *v1.ConfigMap ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
E0418 12:59:00.751747       7 leaderelection.go:327] error retrieving resource lock ingress-nginx/ingress-nginx-leader: Get "https://10.100.0.1:443/apis/coordination.k8s.io/v1/namespaces/ingress-nginx/leases/ingress-nginx-leader": http2: client connection lost

# .... more upstream errors

W0418 12:59:31.582583       7 reflector.go:533] k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: failed to list *v1.EndpointSlice: Get "https://10.100.0.1:443/apis/discovery.k8s.io/v1/endpointslices?resourceVersion=639340348": dial tcp 10.100.0.1:443: i/o timeout
I0418 12:59:31.582670       7 trace.go:219] Trace[953552949]: "Reflector ListAndWatch" name:k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231 (18-Apr-2024 12:59:01.580) (total time: 30002ms):
Trace[953552949]: ---"Objects listed" error:Get "https://10.100.0.1:443/apis/discovery.k8s.io/v1/endpointslices?resourceVersion=639340348": dial tcp 10.100.0.1:443: i/o timeout 30002ms (12:59:31.582)
Trace[953552949]: [30.002274375s] [30.002274375s] END
E0418 12:59:31.582693       7 reflector.go:148] k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: Failed to watch *v1.EndpointSlice: failed to list *v1.EndpointSlice: Get "https://10.100.0.1:443/apis/discovery.k8s.io/v1/endpointslices?resourceVersion=639340348": dial tcp 10.100.0.1:443: i/o timeout
W0418 12:59:31.772843       7 reflector.go:533] k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: failed to list *v1.IngressClass: Get "https://10.100.0.1:443/apis/networking.k8s.io/v1/ingressclasses?resourceVersion=639340265": dial tcp 10.100.0.1:443: i/o timeout
I0418 12:59:31.773134       7 trace.go:219] Trace[2067075918]: "Reflector ListAndWatch" name:k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231 (18-Apr-2024 12:59:01.771) (total time: 30001ms):
Trace[2067075918]: ---"Objects listed" error:Get "https://10.100.0.1:443/apis/networking.k8s.io/v1/ingressclasses?resourceVersion=639340265": dial tcp 10.100.0.1:443: i/o timeout 30000ms (12:59:31.772)
Trace[2067075918]: [30.001113797s] [30.001113797s] END
E0418 12:59:31.773182       7 reflector.go:148] k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: Failed to watch *v1.IngressClass: failed to list *v1.IngressClass: Get "https://10.100.0.1:443/apis/networking.k8s.io/v1/ingressclasses?resourceVersion=639340265": dial tcp 10.100.0.1:443: i/o timeout
W0418 12:59:31.972234       7 reflector.go:533] k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: failed to list *v1.ConfigMap: Get "https://10.100.0.1:443/api/v1/configmaps?labelSelector=OWNER%21%3DTILLER&resourceVersion=639340267": dial tcp 10.100.0.1:443: i/o timeout
I0418 12:59:31.972334       7 trace.go:219] Trace[1069569260]: "Reflector ListAndWatch" name:k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231 (18-Apr-2024 12:59:01.971) (total time: 30001ms):
Trace[1069569260]: ---"Objects listed" error:Get "https://10.100.0.1:443/api/v1/configmaps?labelSelector=OWNER%21%3DTILLER&resourceVersion=639340267": dial tcp 10.100.0.1:443: i/o timeout 30000ms (12:59:31.972)
Trace[1069569260]: [30.001006391s] [30.001006391s] END
E0418 12:59:31.972355       7 reflector.go:148] k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: Failed to watch *v1.ConfigMap: failed to list *v1.ConfigMap: Get "https://10.100.0.1:443/api/v1/configmaps?labelSelector=OWNER%21%3DTILLER&resourceVersion=639340267": dial tcp 10.100.0.1:443: i/o timeout
W0418 12:59:32.073657       7 reflector.go:533] k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: failed to list *v1.Service: Get "https://10.100.0.1:443/api/v1/services?resourceVersion=639340154": dial tcp 10.100.0.1:443: i/o timeout
I0418 12:59:32.073932       7 trace.go:219] Trace[203255947]: "Reflector ListAndWatch" name:k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231 (18-Apr-2024 12:59:02.072) (total time: 30001ms):
Trace[203255947]: ---"Objects listed" error:Get "https://10.100.0.1:443/api/v1/services?resourceVersion=639340154": dial tcp 10.100.0.1:443: i/o timeout 30001ms (12:59:32.073)
Trace[203255947]: [30.001091746s] [30.001091746s] END
E0418 12:59:32.073989       7 reflector.go:148] k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://10.100.0.1:443/api/v1/services?resourceVersion=639340154": dial tcp 10.100.0.1:443: i/o timeout
W0418 12:59:32.188828       7 reflector.go:533] k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: failed to list *v1.Secret: Get "https://10.100.0.1:443/api/v1/secrets?fieldSelector=%2Ctype%21%3Dhelm.sh%2Frelease.v1&resourceVersion=639340020": dial tcp 10.100.0.1:443: i/o timeout
I0418 12:59:32.189005       7 trace.go:219] Trace[346248655]: "Reflector ListAndWatch" name:k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231 (18-Apr-2024 12:59:02.188) (total time: 30000ms):
Trace[346248655]: ---"Objects listed" error:Get "https://10.100.0.1:443/api/v1/secrets?fieldSelector=%2Ctype%21%3Dhelm.sh%2Frelease.v1&resourceVersion=639340020": dial tcp 10.100.0.1:443: i/o timeout 30000ms (12:59:32.188)
Trace[346248655]: [30.000680426s] [30.000680426s] END
E0418 12:59:32.189372       7 reflector.go:148] k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: Failed to watch *v1.Secret: failed to list *v1.Secret: Get "https://10.100.0.1:443/api/v1/secrets?fieldSelector=%2Ctype%21%3Dhelm.sh%2Frelease.v1&resourceVersion=639340020": dial tcp 10.100.0.1:443: i/o timeout
W0418 12:59:32.340149       7 reflector.go:533] k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: failed to list *v1.Ingress: Get "https://10.100.0.1:443/apis/networking.k8s.io/v1/ingresses?resourceVersion=639340236": dial tcp 10.100.0.1:443: i/o timeout
I0418 12:59:32.340236       7 trace.go:219] Trace[768964155]: "Reflector ListAndWatch" name:k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231 (18-Apr-2024 12:59:02.338) (total time: 30001ms):
Trace[768964155]: ---"Objects listed" error:Get "https://10.100.0.1:443/apis/networking.k8s.io/v1/ingresses?resourceVersion=639340236": dial tcp 10.100.0.1:443: i/o timeout 30001ms (12:59:32.340)
Trace[768964155]: [30.001520958s] [30.001520958s] END
E0418 12:59:32.340743       7 reflector.go:148] k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: Failed to watch *v1.Ingress: failed to list *v1.Ingress: Get "https://10.100.0.1:443/apis/networking.k8s.io/v1/ingresses?resourceVersion=639340236": dial tcp 10.100.0.1:443: i/o timeout

# ...

E0418 12:59:46.422583       7 leaderelection.go:327] error retrieving resource lock ingress-nginx/ingress-nginx-leader: Get "https://10.100.0.1:443/apis/coordination.k8s.io/v1/namespaces/ingress-nginx/leases/ingress-nginx-leader": dial tcp 10.100.0.1:443: i/o timeout

 # ...

W0418 13:00:03.894097       7 reflector.go:533] k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: failed to list *v1.ConfigMap: Get "https://10.100.0.1:443/api/v1/configmaps?labelSelector=OWNER%21%3DTILLER&resourceVersion=639340267": dial tcp 10.100.0.1:443: i/o timeout

# .... Repeat until the pod is killed

What you expected to happen:

I would expect the controller to re-establish the lease without the pod having to be recreated. As soon as the pod was recreated, the controller worked as expected.
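
From the outside, a successful re-establishment would show up as the Lease's `renewTime` advancing again and the controller logging that it re-acquired leadership, with no pod restart. A rough way to observe this during such an incident (a sketch, assuming the same namespace and election ID as above):

```shell
# Watch Lease renewals; renewTime should advance every few seconds while leadership is held.
kubectl -n ingress-nginx get lease ingress-nginx-leader -o yaml --watch

# Follow the controller logs for the leader-election messages
# ("attempting to acquire leader lease ...", "successfully acquired lease ...").
kubectl -n ingress-nginx logs deploy/ingress-nginx-controller -f | grep -i lease
```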

NGINX Ingress controller version

-------------------------------------------------------------------------------
NGINX Ingress controller
  Release:       v1.9.3
  Build:         be93503b57a0ba2ea2e0631031541ca07515913a
  Repository:    https://github.com/kubernetes/ingress-nginx
  nginx version: nginx/1.21.6

-------------------------------------------------------------------------------

Kubernetes version

Client Version: v1.29.2
Server Version: v1.29.1-eks-b9c9ed7

Environment:

  • Cloud provider or hardware configuration:
    EKS 1.29

  • Install tools:
    EKS

  • Basic cluster related info:

  • How was the ingress-nginx-controller installed:

We took the 1.9.3 chart and modified it for our needs, creating a second LoadBalancer Service for internal traffic.

Controller Deployment Manifest

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    helm.sh/chart: ingress-nginx-4.8.2
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/version: "1.9.3"
    app.kubernetes.io/part-of: ingress-nginx
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/component: controller
  name: ingress-nginx-controller
  namespace: ingress-nginx
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: ingress-nginx
      app.kubernetes.io/instance: ingress-nginx
      app.kubernetes.io/component: controller
  replicas: 1
  revisionHistoryLimit: 10
  minReadySeconds: 0
  template:
    metadata:
      labels:
        helm.sh/chart: ingress-nginx-4.8.2
        app.kubernetes.io/name: ingress-nginx
        app.kubernetes.io/instance: ingress-nginx
        app.kubernetes.io/version: "1.9.3"
        app.kubernetes.io/part-of: ingress-nginx
        app.kubernetes.io/managed-by: Helm
        app.kubernetes.io/component: controller
    spec:
      dnsPolicy: ClusterFirst
      containers:
        - name: controller
          image: "registry.k8s.io/ingress-nginx/controller:v1.9.3@sha256:8fd21d59428507671ce0fb47f818b1d859c92d2ad07bb7c947268d433030ba98"
          imagePullPolicy: IfNotPresent
          lifecycle:
            preStop:
              exec:
                command:
                  - /wait-shutdown
          args:
            - /nginx-ingress-controller
            - --publish-service=$(POD_NAMESPACE)/ingress-nginx-controller
            - --election-id=ingress-nginx-leader
            - --controller-class=k8s.io/ingress-nginx
            - --ingress-class=nginx
            - --configmap=$(POD_NAMESPACE)/ingress-nginx-controller
            - --validating-webhook=:8443
            - --validating-webhook-certificate=/usr/local/certificates/cert
            - --validating-webhook-key=/usr/local/certificates/key
            - --enable-ssl-passthrough
          securityContext:
            capabilities:
              drop:
                - ALL
              add:
                - NET_BIND_SERVICE
            runAsUser: 101
            allowPrivilegeEscalation: true
          env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            - name: LD_PRELOAD
              value: /usr/local/lib/libmimalloc.so
          livenessProbe:
            failureThreshold: 5
            httpGet:
              path: /healthz
              port: 10254
              scheme: HTTP
            initialDelaySeconds: 10
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 1
          readinessProbe:
            failureThreshold: 3
            httpGet:
              path: /healthz
              port: 10254
              scheme: HTTP
            initialDelaySeconds: 10
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 1
          ports:
            - name: http
              containerPort: 80
              protocol: TCP
            - name: https
              containerPort: 443
              protocol: TCP
            - name: metrics
              containerPort: 10254
              protocol: TCP
            - name: webhook
              containerPort: 8443
              protocol: TCP
          volumeMounts:
            - name: webhook-cert
              mountPath: /usr/local/certificates/
              readOnly: true
          resources:
            requests:
              cpu: 100m
              memory: 90Mi
      nodeSelector:
        kubernetes.io/os: linux
      serviceAccountName: ingress-nginx
      terminationGracePeriodSeconds: 300
      volumes:
        - name: webhook-cert
          secret:
            secretName: ingress-nginx-admission
```
  • Current State of the controller:
    • kubectl describe ingressclasses
Name:         nginx
Labels:       app.kubernetes.io/component=controller
              app.kubernetes.io/instance=ingress
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=ingress-nginx
              app.kubernetes.io/part-of=ingress-nginx
              app.kubernetes.io/version=1.9.3
              helm.sh/chart=ingress-nginx-4.8.2
Annotations:  <none>
Controller:   k8s.io/ingress-nginx
Events:       <none>
  • kubectl -n <ingresscontrollernamespace> get all -o wide
NAME                                            READY   STATUS    RESTARTS   AGE   IP               NODE                                           NOMINATED NODE   READINESS GATES
pod/ingress-nginx-controller-696d4ff7c5-f7cgj   1/1     Running   0          22h   172.32.184.183   ip-172-32-168-225.eu-west-1.compute.internal   <none>           <none>

NAME                                         TYPE           CLUSTER-IP       EXTERNAL-IP                                                                     PORT(S)                      AGE    SELECTOR
service/ingress-nginx-controller             LoadBalancer   10.100.17.246    adeexxx.elb.eu-west-1.amazonaws.com   80:30417/TCP,443:30515/TCP   513d   app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx
service/ingress-nginx-controller-admission   ClusterIP      10.100.249.207   <none>                                                                          443/TCP                      513d   app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx
service/ingress-nginx-controller-internal    LoadBalancer   10.100.228.144   a54cxxx.elb.eu-west-1.amazonaws.com   80:31515/TCP,443:32422/TCP   162d   app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx
service/ingress-nginx-controller-metrics     ClusterIP      10.100.25.177    <none>                                                                          10254/TCP                    84d    app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx

NAME                                       READY   UP-TO-DATE   AVAILABLE   AGE    CONTAINERS   IMAGES                                                                                                                    SELECTOR
deployment.apps/ingress-nginx-controller   1/1     1            1           513d   controller   registry.k8s.io/ingress-nginx/controller:v1.9.3@sha256:8fd21d59428507671ce0fb47f818b1d859c92d2ad07bb7c947268d433030ba98   app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx

NAME                                                  DESIRED   CURRENT   READY   AGE    CONTAINERS   IMAGES                                                                                                                    SELECTOR
replicaset.apps/ingress-nginx-controller-696d4ff7c5   1         1         1       22h    controller   registry.k8s.io/ingress-nginx/controller:v1.9.3@sha256:8fd21d59428507671ce0fb47f818b1d859c92d2ad07bb7c947268d433030ba98   app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx,pod-template-hash=696d4ff7c5
replicaset.apps/ingress-nginx-controller-7b8cc457d7   0         0         0       84d    controller   registry.k8s.io/ingress-nginx/controller:v1.9.5@sha256:b3aba22b1da80e7acfc52b115cae1d4c687172cbf2b742d5b502419c25ff340e   app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx,pod-template-hash=7b8cc457d7
replicaset.apps/ingress-nginx-controller-bddc448b7    0         0         0       80d    controller   registry.k8s.io/ingress-nginx/controller:v1.9.3@sha256:8fd21d59428507671ce0fb47f818b1d859c92d2ad07bb7c947268d433030ba98   app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx,pod-template-hash=bddc448b7
  • kubectl -n <ingresscontrollernamespace> describe po <ingresscontrollerpodname>
Name:             ingress-nginx-controller-696d4ff7c5-f7cgj
Namespace:        ingress-nginx
Priority:         0
Service Account:  ingress-nginx
Node:             ip-172-32-168-225.eu-west-1.compute.internal/172.32.168.225
Start Time:       Thu, 18 Apr 2024 19:35:09 +0100
Labels:           app.kubernetes.io/component=controller
                  app.kubernetes.io/instance=ingress-nginx
                  app.kubernetes.io/managed-by=Helm
                  app.kubernetes.io/name=ingress-nginx
                  app.kubernetes.io/part-of=ingress-nginx
                  app.kubernetes.io/version=1.9.3
                  helm.sh/chart=ingress-nginx-4.8.2
                  pod-template-hash=696d4ff7c5
Annotations:      kubectl.kubernetes.io/restartedAt: 2024-04-18T19:35:09+01:00
Status:           Running
IP:               172.32.184.183
IPs:
  IP:           172.32.184.183
Controlled By:  ReplicaSet/ingress-nginx-controller-696d4ff7c5
Containers:
  controller:
    Container ID:  containerd://672b2ac13c8eae21ea4e7003695dbfbf53fe34dfaf6cc16159d1b82492be1571
    Image:         registry.k8s.io/ingress-nginx/controller:v1.9.3@sha256:8fd21d59428507671ce0fb47f818b1d859c92d2ad07bb7c947268d433030ba98
    Image ID:      registry.k8s.io/ingress-nginx/controller@sha256:8fd21d59428507671ce0fb47f818b1d859c92d2ad07bb7c947268d433030ba98
    Ports:         80/TCP, 443/TCP, 10254/TCP, 8443/TCP
    Host Ports:    0/TCP, 0/TCP, 0/TCP, 0/TCP
    Args:
      /nginx-ingress-controller
      --publish-service=$(POD_NAMESPACE)/ingress-nginx-controller
      --election-id=ingress-nginx-leader
      --controller-class=k8s.io/ingress-nginx
      --ingress-class=nginx
      --configmap=$(POD_NAMESPACE)/ingress-nginx-controller
      --validating-webhook=:8443
      --validating-webhook-certificate=/usr/local/certificates/cert
      --validating-webhook-key=/usr/local/certificates/key
      --enable-ssl-passthrough
    State:          Running
      Started:      Thu, 18 Apr 2024 19:35:16 +0100
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:      100m
      memory:   90Mi
    Liveness:   http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=5
    Readiness:  http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
    Environment:
      POD_NAME:       ingress-nginx-controller-696d4ff7c5-f7cgj (v1:metadata.name)
      POD_NAMESPACE:  ingress-nginx (v1:metadata.namespace)
      LD_PRELOAD:     /usr/local/lib/libmimalloc.so
    Mounts:
      /usr/local/certificates/ from webhook-cert (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-5fdtk (ro)
Conditions:
  Type                        Status
  PodReadyToStartContainers   True 
  Initialized                 True 
  Ready                       True 
  ContainersReady             True 
  PodScheduled                True 
Volumes:
  webhook-cert:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  ingress-nginx-admission
    Optional:    false
  kube-api-access-5fdtk:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              kubernetes.io/os=linux
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:                      <none>
  • kubectl -n <ingresscontrollernamespace> describe svc <ingresscontrollerservicename>
Name:                     ingress-nginx-controller
Namespace:                ingress-nginx
Labels:                   app.kubernetes.io/component=controller
                          app.kubernetes.io/instance=ingress
                          app.kubernetes.io/managed-by=Helm
                          app.kubernetes.io/name=ingress-nginx
                          app.kubernetes.io/part-of=ingress-nginx
                          app.kubernetes.io/version=1.9.3
                          helm.sh/chart=ingress-nginx-4.8.2
Annotations:              service.beta.kubernetes.io/aws-load-balancer-access-log-enabled: true
                          service.beta.kubernetes.io/aws-load-balancer-access-log-s3-bucket-name: xxx
                          service.beta.kubernetes.io/aws-load-balancer-access-log-s3-bucket-prefix: xxx
                          service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout: 60
                          service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: true
                          service.beta.kubernetes.io/aws-load-balancer-ssl-cert: arn:aws:acm:eu-west-1:xxx:certificate/xxx
                          service.beta.kubernetes.io/aws-load-balancer-ssl-negotiation-policy: ELBSecurityPolicy-TLS13-1-2-Res-2021-06
                          service.beta.kubernetes.io/aws-load-balancer-ssl-ports: https
                          service.beta.kubernetes.io/aws-load-balancer-target-group-attributes: preserve_client_ip.enabled=false
                          service.beta.kubernetes.io/aws-load-balancer-type: nlb
Selector:                 app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx
Type:                     LoadBalancer
IP Family Policy:         SingleStack
IP Families:              IPv4
IP:                       10.100.17.246
IPs:                      10.100.17.246
LoadBalancer Ingress:     adeexxx.elb.eu-west-1.amazonaws.com
Port:                     http  80/TCP
TargetPort:               http/TCP
NodePort:                 http  30417/TCP
Endpoints:                172.32.184.183:80
Port:                     https  443/TCP
TargetPort:               http/TCP
NodePort:                 https  30515/TCP
Endpoints:                172.32.184.183:80
Session Affinity:         None
External Traffic Policy:  Cluster
Events:
  Type    Reason               Age                    From                Message
  ----    ------               ----                   ----                -------
  Normal  UpdatedLoadBalancer  2s (x1195 over 6d12h)  service-controller  Updated load balancer with new hosts

How to reproduce this issue:

Unknown

satdeveloping added the kind/bug label on Apr 19, 2024.
k8s-ci-robot added the needs-triage label on Apr 19, 2024.
@k8s-ci-robot (Contributor)

This issue is currently awaiting triage.

If Ingress contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@longwuyuan (Contributor)

/remove-kind bug

  • Post output of kubectl get events -A
  • Post logs of all the controller pods (example commands below)

Any idea why a re-election was triggered?
Any other info at cluster level or on the nodes?

/triage needs-information
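
The requested output can be gathered with commands along these lines (a sketch; the namespace and the chart's component label are assumed from the manifest above):

```shell
# Cluster-wide events, oldest first. Event retention is short by default,
# so this needs to run close to the incident window.
kubectl get events -A --sort-by=.metadata.creationTimestamp

# Logs of every controller pod, selected by the chart's component label.
# --tail=-1 is needed because kubectl defaults to 10 lines per pod when a selector is used.
kubectl -n ingress-nginx logs -l app.kubernetes.io/component=controller --all-containers --tail=-1
```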

k8s-ci-robot added the triage/needs-information and needs-kind labels and removed the kind/bug label on Apr 19, 2024.
satdeveloping (Author) commented Apr 21, 2024

> Post output of kubectl get events -A

All there is now is:

LAST SEEN   TYPE     REASON                OBJECT                                      MESSAGE
4m25s       Normal   UpdatedLoadBalancer   service/ingress-nginx-controller-internal   Updated load balancer with new hosts
4m24s       Normal   UpdatedLoadBalancer   service/ingress-nginx-controller            Updated load balancer with new hosts

> Post logs of all the controller pods

There was only one controller pod at the time of the incident.

I tried looking at the host logs for the node where the pod was scheduled. Between 2024-04-18T12:56:14 and 13:07:42 there was only one message:

{
    "host": "ip-172-32-107-61",
    "ident": "rsyslogd",
    "message": "imjournal: 182979 messages lost due to rate-limiting",
    "az": "eu-west-1a",
    "ec2_instance_id": "i-057706eb5156f775d"
}

There were some logs from kubelet.service that looked like it was restarting. It was printing out the configured flags and saying that "such-and-such Service is Starting".

Perhaps everything just got into a twist?

The controller pod was Running long before and after the time of the incident.
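
In case it helps correlate with node-level issues, these are the kinds of checks that could be run (a sketch; `<node-name>` is a placeholder, and the journalctl command assumes shell access to the EKS node):

```shell
# Node conditions and recent events as seen from the cluster.
kubectl describe node <node-name>

# On the node itself: kubelet activity around the incident window.
journalctl -u kubelet --since "2024-04-18 12:55" --until "2024-04-18 13:10"
```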

@strongjz (Member)

It looks like the controller couldn't reach the API server, given all the `dial tcp 10.100.0.1:443: i/o timeout` errors.

The restart fixed the issue, so I'm going to close this for now; if it happens again, feel free to reopen it.
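
If it does recur, confirming whether the API server is reachable from inside the controller pod at that moment would help narrow it down (a sketch; assumes curl is available in the controller image, and 10.100.0.1 is the `kubernetes` Service ClusterIP seen in the logs):

```shell
# Probe the API server endpoint the controller uses, from inside the controller pod.
# Any HTTP response (even 401/403) proves the network path works; a timeout does not.
kubectl -n ingress-nginx exec deploy/ingress-nginx-controller -- \
  curl -sk --max-time 5 https://10.100.0.1:443/version

# Cross-check the kubernetes Service and its EndpointSlices at the cluster level.
kubectl get svc kubernetes -n default
kubectl get endpointslices -n default -l kubernetes.io/service-name=kubernetes
```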

@strongjz (Member)

/close

@k8s-ci-robot (Contributor)

@strongjz: Closing this issue.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
