Controller failed to renew lease (leader election) due to timeout until pod restart #11287
Comments
This issue is currently awaiting triage. If Ingress contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/remove-kind bug
Any idea why a re-election was triggered?

/triage needs-information
All there is now is:

LAST SEEN   TYPE     REASON                OBJECT                                       MESSAGE
4m25s       Normal   UpdatedLoadBalancer   service/ingress-nginx-controller-internal   Updated load balancer with new hosts
4m24s       Normal   UpdatedLoadBalancer   service/ingress-nginx-controller            Updated load balancer with new hosts
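For reference, a minimal sketch of how such an event listing can be regenerated, assuming the controller runs in the ingress-nginx namespace (the report does not show the exact command used):

```shell
# Sketch only: list recent events in the controller namespace, oldest first.
# The namespace is an assumption, not stated in the report at this point.
kubectl -n ingress-nginx get events --sort-by=.lastTimestamp
```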
There was only one controller pod at the time of the incident. I tried looking at host logs for the node where the pod was scheduled. Between the relevant timestamps, all I found was:

{
  "host": "ip-172-32-107-61",
  "ident": "rsyslogd",
  "message": "imjournal: 182979 messages lost due to rate-limiting",
  "az": "eu-west-1a",
  "ec2_instance_id": "i-057706eb5156f775d"
}

There were some logs from the … Perhaps everything just got into a twist? The controller pod was Running long before and after the time of the incident.
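Since rsyslogd reports that imjournal dropped messages due to rate-limiting, the node's journal may still contain entries that never reached rsyslog. A hedged sketch for pulling kubelet logs for the incident window directly from journald (the unit name and timestamps are placeholders, not taken from the report):

```shell
# Sketch: read kubelet logs for a given window straight from journald on the node.
# The unit name and the time window below are assumptions for illustration only.
journalctl -u kubelet --since "2024-04-24 09:00:00" --until "2024-04-24 10:00:00"
```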
It looks like the controller couldn't reach the API server, with all the timeouts.

The restart fixed the issue; I'm going to close it for now; if it does happen again, feel free to reopen it.
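If this happens again, one quick way to tell whether the pod can reach the API server at all is an in-pod request against the kubernetes Service; a sketch, assuming curl is available in the controller image and the namespace from the manifest below:

```shell
# Sketch: probe the API server health endpoint from inside the controller pod.
# Even an HTTP error response proves the network path works; a hang or timeout suggests it doesn't.
kubectl -n ingress-nginx exec deploy/ingress-nginx-controller -- \
  curl -skm 5 https://kubernetes.default.svc/healthz
```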
/close
@strongjz: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
What happened:
We deployed a new version of our application and, at roughly the same time, the ingress-nginx controller failed to renew its lease in the leader election. The controller was then unable to update the configmap, and our applications stopped receiving traffic.
Examining the controller logs showed that requests were being forwarded to pods that no longer existed.
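For the stale-upstream symptom, the backends nginx is actually routing to can be compared against the live Endpoints of the affected Service; a sketch, assuming the /dbg helper shipped in the controller image and placeholder names for the application Service:

```shell
# Sketch: dump the backends the controller is currently programming into nginx
# and compare them with the Endpoints Kubernetes reports for the Service.
# Namespaces and the Service name below are placeholders, not from the report.
kubectl -n ingress-nginx exec deploy/ingress-nginx-controller -- /dbg backends all
kubectl -n my-app get endpoints my-service -o wide
```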
Issuing the command
kubectl -n ingress-controller rollout restart deployment ingress-nginx-controller
resolved the issue.

What you expected to happen:
I would expect the controller to reestablish the lease, without having to recreate the pod. As soon as the pod was recreated, the controller worked as expected.
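For context, the leader-election state the controller failed to renew lives in a coordination.k8s.io Lease named after the --election-id flag in the manifest below; a sketch for checking its holder and last renew time (namespace assumed to match the deployment):

```shell
# Sketch: inspect the leader-election Lease (name from --election-id=ingress-nginx-leader).
# holderIdentity and renewTime show who holds the lock and when it was last renewed.
kubectl -n ingress-nginx get lease ingress-nginx-leader -o yaml
```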
NGINX Ingress controller version:
Kubernetes version:
Environment:
Cloud provider or hardware configuration:
EKS 1.29
Install tools:
EKS
Basic cluster related info:
How was the ingress-nginx-controller installed:
We took the 1.9.3 chart and modified it for our needs, creating a second LoadBalancer service for internal traffic.
Controller Deployment Manifest
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    helm.sh/chart: ingress-nginx-4.8.2
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/version: "1.9.3"
    app.kubernetes.io/part-of: ingress-nginx
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/component: controller
  name: ingress-nginx-controller
  namespace: ingress-nginx
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: ingress-nginx
      app.kubernetes.io/instance: ingress-nginx
      app.kubernetes.io/component: controller
  replicas: 1
  revisionHistoryLimit: 10
  minReadySeconds: 0
  template:
    metadata:
      labels:
        helm.sh/chart: ingress-nginx-4.8.2
        app.kubernetes.io/name: ingress-nginx
        app.kubernetes.io/instance: ingress-nginx
        app.kubernetes.io/version: "1.9.3"
        app.kubernetes.io/part-of: ingress-nginx
        app.kubernetes.io/managed-by: Helm
        app.kubernetes.io/component: controller
    spec:
      dnsPolicy: ClusterFirst
      containers:
        - name: controller
          image: "registry.k8s.io/ingress-nginx/controller:v1.9.3@sha256:8fd21d59428507671ce0fb47f818b1d859c92d2ad07bb7c947268d433030ba98"
          imagePullPolicy: IfNotPresent
          lifecycle:
            preStop:
              exec:
                command:
                  - /wait-shutdown
          args:
            - /nginx-ingress-controller
            - --publish-service=$(POD_NAMESPACE)/ingress-nginx-controller
            - --election-id=ingress-nginx-leader
            - --controller-class=k8s.io/ingress-nginx
            - --ingress-class=nginx
            - --configmap=$(POD_NAMESPACE)/ingress-nginx-controller
            - --validating-webhook=:8443
            - --validating-webhook-certificate=/usr/local/certificates/cert
            - --validating-webhook-key=/usr/local/certificates/key
            - --enable-ssl-passthrough
          securityContext:
            capabilities:
              drop:
                - ALL
              add:
                - NET_BIND_SERVICE
            runAsUser: 101
            allowPrivilegeEscalation: true
          env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            - name: LD_PRELOAD
              value: /usr/local/lib/libmimalloc.so
          livenessProbe:
            failureThreshold: 5
            httpGet:
              path: /healthz
              port: 10254
              scheme: HTTP
            initialDelaySeconds: 10
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 1
          readinessProbe:
            failureThreshold: 3
            httpGet:
              path: /healthz
              port: 10254
              scheme: HTTP
            initialDelaySeconds: 10
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 1
          ports:
            - name: http
              containerPort: 80
              protocol: TCP
            - name: https
              containerPort: 443
              protocol: TCP
            - name: metrics
              containerPort: 10254
              protocol: TCP
            - name: webhook
              containerPort: 8443
              protocol: TCP
          volumeMounts:
            - name: webhook-cert
              mountPath: /usr/local/certificates/
              readOnly: true
          resources:
            requests:
              cpu: 100m
              memory: 90Mi
      nodeSelector:
        kubernetes.io/os: linux
      serviceAccountName: ingress-nginx
      terminationGracePeriodSeconds: 300
      volumes:
        - name: webhook-cert
          secret:
            secretName: ingress-nginx-admission
```

kubectl describe ingressclasses
kubectl -n <ingresscontrollernamespace> get all -o wide
kubectl -n <ingresscontrollernamespace> describe po <ingresscontrollerpodname>
kubectl -n <ingresscontrollernamespace> describe svc <ingresscontrollerservicename>
How to reproduce this issue:
Unknown