Target does not get registered if using readinessGate #1205

Closed
vedatappsamurai opened this issue Mar 25, 2020 · 34 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@vedatappsamurai

I've upgraded to v1.1.6 to make use of the pod readiness gate feature to reduce 502/504s during HPA scaling. I then proceeded to update my deployment per this document.
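For reference, the readiness gate update amounts to adding a readinessGates entry to the deployment's pod template spec. A minimal sketch (deployment, image, ingress, and service names here are placeholders; the condition type has to follow the controller's <ingress>_<service>_<service-port> pattern shown in the status below):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                   # placeholder
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      readinessGates:
      # conditionType must match <ingress>_<service>_<service-port>
      - conditionType: target-health.alb.ingress.k8s.aws/my-ingress_my-service_80
      containers:
      - name: app
        image: my-app:latest     # placeholder
        ports:
        - containerPort: 80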

After I updated my deployment to have the readiness gate, the first pod that had the readinessGate spec got registered by the controller just fine, with the following status:

status:
  conditions:
  - lastProbeTime: "2020-03-25T07:17:09Z"
    lastTransitionTime: "2020-03-25T07:13:48Z"
    status: "True"
    type: target-health.alb.ingress.k8s.aws/<ingress>_<service>_<service-port>

The next pod of the same deployment had its status stuck like this:

status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2020-03-25T07:10:29Z"
    message: corresponding condition of pod readiness gate "target-health.alb.ingress.k8s.aws/<ingress>_<service>_<service-port>"
      does not exist.
    reason: ReadinessGatesNotReady
    status: "False"
    type: Ready

I waited a good 5 minutes for that pod to get registered into the target group, but it never was. I could reproduce this by performing multiple rollouts of the same deployment. However, if I delete the controller pod and let it restart, it recognizes these pods and registers them into the target group. After those registrations, it stops registering new pods again.

Is there anything wrong with the approach I took, or is this an issue on the controller's end?

@ivanmp91

ivanmp91 commented Mar 25, 2020

I'm not sure if it's the same issue, but it at least looks like I'm experiencing similar behaviour. I also started testing the new 1.1.6 version in order to use the pod readiness gate, and for me it happens the other way around: when a new pod is launched using the readiness gate, it takes around 5-10 minutes to update the pod readiness gate status and register the new pod in the target group. I've also found cases where the pod never gets updated and I have to delete it so a new one is created. When a new pod is launched, I get exactly the same status reported in this issue:

status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2020-03-25T11:13:10Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2020-03-25T11:13:10Z"
    message: corresponding condition of pod readiness gate "target-health.alb.ingress.k8s.aws/link-handling-puma_link-handling-puma_80"
      does not exist.
    reason: ReadinessGatesNotReady
    status: "False"
    type: Ready

During the time before the readiness gate is updated for the new pod, I've been checking the target group: the new pod is not registered there and no activity is shown in the controller logs. Only after waiting several minutes does the controller generate these logs, successfully update the readiness gate status for the pod, and register it in the target group:

I0325 11:21:09.174585       1 controller.go:236] kubebuilder/controller "level"=1 "msg"="Successfully Reconciled"  "controller"="alb-ingress-controller" "request"={"Namespace":"link-handling","Name":"link-handling-puma"}
I0325 11:21:10.580955       1 controller.go:236] kubebuilder/controller "level"=1 "msg"="Successfully Reconciled"  "controller"="alb-ingress-controller" "request"={"Namespace":"link-handling","Name":"link-handling-puma"}
I0325 11:21:51.091404       1 controller.go:236] kubebuilder/controller "level"=1 "msg"="Successfully Reconciled"  "controller"="alb-ingress-controller" "request"={"Namespace":"ckapp-qa-web","Name":"web-qa-ckapp-web"}
I0325 11:21:51.591457       1 controller.go:236] kubebuilder/controller "level"=1 "msg"="Successfully Reconciled"  "controller"="alb-ingress-controller" "request"={"Namespace":"ckapp-qa-web","Name":"admin-web-qa-ckapp-web"}
I0325 11:21:52.159385       1 controller.go:236] kubebuilder/controller "level"=1 "msg"="Successfully Reconciled"  "controller"="alb-ingress-controller" "request"={"Namespace":"ckapp-web","Name":"admin-web-staging-ckapp-web"}
I0325 11:21:52.651093       1 controller.go:236] kubebuilder/controller "level"=1 "msg"="Successfully Reconciled"  "controller"="alb-ingress-controller" "request"={"Namespace":"ckapp-web","Name":"web-staging-ckapp-web"}
I0325 11:21:52.855920       1 targethealth.go:149] link-handling/link-handling-puma: Starting reconciliation of pod condition status for target group: arn:aws:elasticloadbalancing:us-east-2:XXXXXX:targetgroup/dc11cbf1-537c2cee1b9ff018039/6a88d59c627c58b1
I0325 11:21:52.855952       1 targets.go:93] link-handling/link-handling-puma: Adding targets to arn:aws:elasticloadbalancing:us-east-2:XXXXXX:targetgroup/XXXXXX-XXXXXXXXXXXX: 10.11.93.9:80
I0325 11:21:53.305202       1 controller.go:236] kubebuilder/controller "level"=1 "msg"="Successfully Reconciled"  "controller"="alb-ingress-controller" "request"={"Namespace":"link-handling","Name":"link-handling-puma"}

I0325 11:22:13.285831       1 targethealth.go:174] link-handling/link-handling-puma: Stopping reconciliation of pod condition status for target group: arn:aws:elasticloadbalancing:us-east-2:XXXXXX:targetgroup/XXXXXX-XXXXXX/XXXXXX
I0325 11:22:13.768052       1 controller.go:236] kubebuilder/controller "level"=1 "msg"="Successfully Reconciled"  "controller"="alb-ingress-controller" "request"={"Namespace":"link-handling","Name":"link-handling-puma"}

As you can see in this sample, it took exactly 8 minutes to register the new pod. If it helps, here are the ingress and service definitions:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    alb.ingress.kubernetes.io/actions.ssl-redirect: '{"Type": "redirect", "RedirectConfig":
      { "Protocol": "HTTPS", "Port": "443", "StatusCode": "HTTP_301"}}'
    alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:us-east-2:XXXXXX:certificate/XXXXXX-XXXXXX,arn:aws:acm:us-east-2:XXXXXX:certificate/XXXXXX-XXXXXX,arn:aws:acm:us-east-2:XXXXXX:certificate/XXXXXX-XXXXXX,arn:aws:acm:us-east-2:XXXXXX:certificate/XXXXXX-XXXXXX
    alb.ingress.kubernetes.io/healthcheck-interval-seconds: "10"
    alb.ingress.kubernetes.io/healthcheck-path: /healthcheck
    alb.ingress.kubernetes.io/healthcheck-port: "80"
    alb.ingress.kubernetes.io/healthcheck-protocol: HTTP
    alb.ingress.kubernetes.io/healthcheck-timeout-seconds: "5"
    alb.ingress.kubernetes.io/healthy-threshold-count: "2"
    alb.ingress.kubernetes.io/inbound-cidrs: 0.0.0.0/0
    alb.ingress.kubernetes.io/ip-address-type: ipv4
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}, {"HTTPS":443}]'
    alb.ingress.kubernetes.io/load-balancer-attributes: deletion_protection.enabled=true,routing.http2.enabled=true,idle_timeout.timeout_seconds=60
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/success-codes: "200"
    alb.ingress.kubernetes.io/target-group-attributes: deregistration_delay.timeout_seconds=30,load_balancing.algorithm.type=round_robin
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/unhealthy-threshold-count: "3"
    kubernetes.io/ingress.class: alb
  labels:
  name: link-handling-puma
  namespace: link-handling
spec:
  rules:
  - http:
      paths:
      - backend:
          serviceName: ssl-redirect
          servicePort: use-annotation
        path: /*
      - backend:
          serviceName: link-handling-puma
          servicePort: 80
        path: /*

apiVersion: v1
kind: Service
metadata:
  labels:
  name: link-handling-puma
  namespace: link-handling
spec:
  clusterIP: 172.20.13.140
  ports:
  - name: puma
    port: 80
    protocol: TCP
    targetPort: nginx
  selector:
    app.kubernetes.io/component: puma
    app.kubernetes.io/instance: link-handling
    app.kubernetes.io/name: link-handling
  sessionAffinity: None
  type: ClusterIP

@M00nF1sh
Collaborator

M00nF1sh commented Mar 25, 2020

@vedatappsamurai @ivanmp91 could you attach the logs from the controller pod during the deployment?
It seems like a controller bug if pods belonging to the same deployment get different behavior.

@vedatappsamurai
Author

I have the following deployment strategy on my deployment:

strategy:
  rollingUpdate:
    maxSurge: 25%
    maxUnavailable: 1
  type: RollingUpdate

And ALB Ingress Controller left the following logs when I started the deployment:

I0326 06:48:30.713813       1 targets.go:108] default/<ingress-name>: Removing targets from arn:aws:elasticloadbalancing:<region>:<account-id>:targetgroup/<target-group-id>: 10.2.42.0:80
I0326 06:48:38.065220       1 controller.go:236] kubebuilder/controller "level"=1 "msg"="Successfully Reconciled"  "controller"="alb-ingress-controller" "request"={"Namespace":"default","Name":"<ingress-name>"}
I0326 06:48:49.118161       1 controller.go:236] kubebuilder/controller "level"=1 "msg"="Successfully Reconciled"  "controller"="alb-ingress-controller" "request"={"Namespace":"default","Name":"<ingress-name>"}

Since I have maxUnavailable: 1, it deregisters the one pod being terminated by the deployment, but it never registers the new pod. I continuously run kubectl get pods -o wide to monitor the pod, and the result looks like this:

NAME                                             READY   STATUS    RESTARTS   AGE     IP            NODE                                      NOMINATED NODE   READINESS GATES
<pod-id>                                         2/2     Running   0          4m16s   10.2.4.113    <node-id>                                 <none>           0/1

And describe shows the following status, as mentioned before:

status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2020-03-26T06:48:30Z"
    message: corresponding condition of pod readiness gate "target-health.alb.ingress.k8s.aws/<ingress>_<service>_<service-port>"
      does not exist.
    reason: ReadinessGatesNotReady
    status: "False"
    type: Ready

At the time of writing I was waiting for the readiness gate to see if it would pass; it eventually did, but it took 10 minutes. Notice the timestamps in the log:

I0326 06:48:30.713813       1 targets.go:108] default/<ingress-name>: Removing targets from arn:aws:elasticloadbalancing:<region>:<account-id>:targetgroup/<target-group-id>: 10.2.42.0:80
I0326 06:48:38.065220       1 controller.go:236] kubebuilder/controller "level"=1 "msg"="Successfully Reconciled"  "controller"="alb-ingress-controller" "request"={"Namespace":"default","Name":"<ingress-name>"}
I0326 06:48:49.118161       1 controller.go:236] kubebuilder/controller "level"=1 "msg"="Successfully Reconciled"  "controller"="alb-ingress-controller" "request"={"Namespace":"default","Name":"<ingress-name>"}
I0326 06:58:23.650176       1 targethealth.go:149] default/<ingress-name>: Starting reconciliation of pod condition status for target group: arn:aws:elasticloadbalancing:<region>:<account-id>:targetgroup/<target-group-id>
I0326 06:58:23.650203       1 targets.go:93] default/<ingress-name>: Adding targets to arn:aws:elasticloadbalancing:<region>:<account-id>:targetgroup/<target-group-id>: 10.2.4.113:80
I0326 06:58:31.211503       1 controller.go:236] kubebuilder/controller "level"=1 "msg"="Successfully Reconciled"  "controller"="alb-ingress-controller" "request"={"Namespace":"default","Name":"<ingress-name>"}
I0326 06:58:43.981059       1 targethealth.go:174] default/<ingress-name>: Stopping reconciliation of pod condition status for target group: arn:aws:elasticloadbalancing:<region>:<account-id>:targetgroup/<target-group-id>
I0326 06:58:49.713082       1 controller.go:236] kubebuilder/controller "level"=1 "msg"="Successfully Reconciled"  "controller"="alb-ingress-controller" "request"={"Namespace":"default","Name":"<ingress-name>"}

It removed the old target at 06:48:30.713813 but only added the new target at 06:58:23.650203.

@ivanmp91

In my case, I've just tested increasing the number of pod replicas by 1. Here you can see the status of the pods: the new pod is up and running, but its readiness gate status still hasn't been updated:

➜ kubectl get pods -o wide
NAME                                     READY   STATUS    RESTARTS   AGE     IP             NODE                                         NOMINATED NODE   READINESS GATES
link-handling-puma-fd676dc5c-72kwt       2/2     Running   0          23h     10.11.86.200   ip-10-11-88-116.us-east-2.compute.internal   <none>           1/1
link-handling-puma-fd676dc5c-hl857       2/2     Running   0          2m47s   10.11.28.255   ip-10-11-29-240.us-east-2.compute.internal   <none>           0/1

And this is the status condition of the new pod launched:

status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2020-03-26T10:49:47Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2020-03-26T10:49:47Z"
    message: corresponding condition of pod readiness gate "target-health.alb.ingress.k8s.aws/link-handling-puma_link-handling-puma_80"
      does not exist.
    reason: ReadinessGatesNotReady
    status: "False"
    type: Ready

After waiting for 23 minutes the readiness gate status was updated:

NAME                                     READY   STATUS    RESTARTS   AGE   IP             NODE                                         NOMINATED NODE   READINESS GATES
link-handling-puma-fd676dc5c-72kwt       2/2     Running   0          23h   10.11.86.200   ip-10-11-88-116.us-east-2.compute.internal   <none>           1/1
link-handling-puma-fd676dc5c-hl857       2/2     Running   0          23m   10.11.28.255   ip-10-11-29-240.us-east-2.compute.internal   <none>           1/1 

These are the full controller logs over that whole period, from when I increased the deployment's replica count by 1 until the readiness gate status was updated:

I0326 10:49:40.580117       1 controller.go:236] kubebuilder/controller "level"=1 "msg"="Successfully Reconciled"  "controller"="alb-ingress-controller" "request"={"Namespace":"link-handling","Name":"link-handling-puma"}
I0326 10:49:41.161758       1 controller.go:236] kubebuilder/controller "level"=1 "msg"="Successfully Reconciled"  "controller"="alb-ingress-controller" "request"={"Namespace":"ckapp-web","Name":"admin-web-staging-ckapp-web"}
I0326 10:49:41.701416       1 controller.go:236] kubebuilder/controller "level"=1 "msg"="Successfully Reconciled"  "controller"="alb-ingress-controller" "request"={"Namespace":"ckapp-web","Name":"web-staging-ckapp-web"}
I0326 10:49:42.205008       1 controller.go:236] kubebuilder/controller "level"=1 "msg"="Successfully Reconciled"  "controller"="alb-ingress-controller" "request"={"Namespace":"ckapp-qa-web","Name":"web-qa-ckapp-web"}
I0326 10:49:42.732805       1 controller.go:236] kubebuilder/controller "level"=1 "msg"="Successfully Reconciled"  "controller"="alb-ingress-controller" "request"={"Namespace":"ckapp-qa-web","Name":"admin-web-qa-ckapp-web"}
I0326 10:50:07.826808       1 controller.go:236] kubebuilder/controller "level"=1 "msg"="Successfully Reconciled"  "controller"="alb-ingress-controller" "request"={"Namespace":"link-handling","Name":"link-handling-puma"}
I0326 11:11:43.865086       1 targethealth.go:149] link-handling/link-handling-puma: Starting reconciliation of pod condition status for target group: arn:aws:elasticloadbalancing:us-east-2:XXXXXXXXXX:targetgroup/XXXXXXXXXX-XXXXXXXXXX/XXXXXXXXXX
I0326 11:11:43.865113       1 targets.go:93] link-handling/link-handling-puma: Adding targets to arn:aws:elasticloadbalancing:us-east-2:XXXXXXXXXX:targetgroup/XXXXXXXXXX-XXXXXXXXXX/XXXXXXXXXX: 10.11.28.255:80
I0326 11:11:44.563524       1 controller.go:236] kubebuilder/controller "level"=1 "msg"="Successfully Reconciled"  "controller"="alb-ingress-controller" "request"={"Namespace":"link-handling","Name":"link-handling-puma"}
I0326 11:11:45.144717       1 controller.go:236] kubebuilder/controller "level"=1 "msg"="Successfully Reconciled"  "controller"="alb-ingress-controller" "request"={"Namespace":"ckapp-web","Name":"admin-web-staging-ckapp-web"}
I0326 11:11:45.650006       1 controller.go:236] kubebuilder/controller "level"=1 "msg"="Successfully Reconciled"  "controller"="alb-ingress-controller" "request"={"Namespace":"ckapp-web","Name":"web-staging-ckapp-web"}
I0326 11:11:46.201310       1 controller.go:236] kubebuilder/controller "level"=1 "msg"="Successfully Reconciled"  "controller"="alb-ingress-controller" "request"={"Namespace":"ckapp-qa-web","Name":"admin-web-qa-ckapp-web"}
I0326 11:11:46.705419       1 controller.go:236] kubebuilder/controller "level"=1 "msg"="Successfully Reconciled"  "controller"="alb-ingress-controller" "request"={"Namespace":"ckapp-qa-web","Name":"web-qa-ckapp-web"}
I0326 11:12:04.215478       1 targethealth.go:174] link-handling/link-handling-puma: Stopping reconciliation of pod condition status for target group: arn:aws:elasticloadbalancing:us-east-2:XXXXXXXXXX:targetgroup/XXXXXXXXXX-XXXXXXXXXX/XXXXXXXXXX
I0326 11:12:04.492053       1 controller.go:236] kubebuilder/controller "level"=1 "msg"="Successfully Reconciled"  "controller"="alb-ingress-controller" "request"={"Namespace":"link-handling","Name":"link-handling-puma"}

@jtnz

jtnz commented Mar 27, 2020

I can confirm we have the same problem. New pods come up as part of the deployment, but the controller doesn't seem to add them to the target group until some (seemingly) random period of time later (on the order of minutes).

This only applies to deployments using the readinessGate feature of v1.1.6.

Our unchanged deployments (that use the various health check/lifecycle hacks) are working as per usual.

@cupello

cupello commented Mar 30, 2020

Hi! I have the same problem, and after some tests I found that removing the readiness probe from my deployment fixes it. Is there some restriction on using readiness gates and a readiness probe in the same deployment with the ALB Ingress Controller?
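For context, the combination being asked about looks roughly like this (a minimal sketch with placeholder names; the probe path and timings are assumptions):

spec:
  template:
    spec:
      readinessGates:
      - conditionType: target-health.alb.ingress.k8s.aws/my-ingress_my-service_80
      containers:
      - name: app
        image: my-app:latest          # placeholder
        readinessProbe:               # keeps the container not-ready until the check passes
          httpGet:
            path: /healthcheck        # assumed path
            port: 80
          initialDelaySeconds: 10
          periodSeconds: 10

Nothing in the readiness gate design requires removing the probe; as the root-cause analysis further down in this thread explains, the probe only matters because it keeps the containers not-ready at the moment the Endpoints object is first updated, which is exactly the event the controller ends up ignoring.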

@jorihardman

I can confirm the same issue. I created a new deployment (2 replicas total) and the first pod to come up in the ReplicaSet got stuck in the "ReadinessGatesNotReady" state. I tried removing my readiness probes per @cupello's suggestion, but that did not allow my deployment to progress. I rebooted the ingress controller, which allowed the stuck pod to progress to ready. HOWEVER, the next pod to boot in the ReplicaSet then got stuck in the same state. It seems the new pod is not being registered as a target on the ALB. Here are my controller logs after the first reboot:

-------------------------------------------------------------------------------
AWS ALB Ingress controller
  Release:    v1.1.6
  Build:      git-95ee2ac8
  Repository: https://github.com/kubernetes-sigs/aws-alb-ingress-controller.git
-------------------------------------------------------------------------------
W0331 15:23:04.900241       1 client_config.go:549] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I0331 15:23:08.061157       1 controller.go:121] kubebuilder/controller "level"=0 "msg"="Starting EventSource"  "controller"="alb-ingress-controller" "source"={"Type":{"metadata":{"creationTimestamp":null}}}
I0331 15:23:08.061353       1 controller.go:121] kubebuilder/controller "level"=0 "msg"="Starting EventSource"  "controller"="alb-ingress-controller" "source"={"Type":{"metadata":{"creationTimestamp":null},"spec":{},"status":{"loadBalancer":{}}}}
I0331 15:23:08.061410       1 controller.go:121] kubebuilder/controller "level"=0 "msg"="Starting EventSource"  "controller"="alb-ingress-controller" "source"=
I0331 15:23:08.061535       1 controller.go:121] kubebuilder/controller "level"=0 "msg"="Starting EventSource"  "controller"="alb-ingress-controller" "source"={"Type":{"metadata":{"creationTimestamp":null},"spec":{},"status":{"loadBalancer":{}}}}
I0331 15:23:08.061566       1 controller.go:121] kubebuilder/controller "level"=0 "msg"="Starting EventSource"  "controller"="alb-ingress-controller" "source"=
I0331 15:23:08.061671       1 controller.go:121] kubebuilder/controller "level"=0 "msg"="Starting EventSource"  "controller"="alb-ingress-controller" "source"={"Type":{"metadata":{"creationTimestamp":null}}}
I0331 15:23:08.061924       1 controller.go:121] kubebuilder/controller "level"=0 "msg"="Starting EventSource"  "controller"="alb-ingress-controller" "source"={"Type":{"metadata":{"creationTimestamp":null},"spec":{},"status":{"daemonEndpoints":{"kubeletEndpoint":{"Port":0}},"nodeInfo":{"machineID":"","systemUUID":"","bootID":"","kernelVersion":"","osImage":"","containerRuntimeVersion":"","kubeletVersion":"","kubeProxyVersion":"","operatingSystem":"","architecture":""}}}}
I0331 15:23:08.062127       1 leaderelection.go:205] attempting to acquire leader lease  kube-system/ingress-controller-leader-alb...
I0331 15:23:25.129355       1 leaderelection.go:214] successfully acquired lease kube-system/ingress-controller-leader-alb
I0331 15:23:25.129510       1 recorder.go:53] kubebuilder/manager/events "level"=1 "msg"="Normal"  "message"="alb-ingress-controller-6ff568d65c-ttg64_84750dc1-7363-11ea-a14e-66f84451d774 became leader" "object"={"kind":"ConfigMap","namespace":"kube-system","name":"ingress-controller-leader-alb","uid":"573e0c1e-1dbe-11ea-8bfb-0e1d9427aea1","apiVersion":"v1","resourceVersion":"118836213"} "reason"="LeaderElection"
I0331 15:23:25.231649       1 controller.go:134] kubebuilder/controller "level"=0 "msg"="Starting Controller"  "controller"="alb-ingress-controller"
I0331 15:23:25.331774       1 controller.go:154] kubebuilder/controller "level"=0 "msg"="Starting workers"  "controller"="alb-ingress-controller" "worker count"=1
I0331 15:23:29.442755       1 controller.go:236] kubebuilder/controller "level"=1 "msg"="Successfully Reconciled"  "controller"="alb-ingress-controller" "request"={"Namespace":"default","Name":"staging-graphql-api-ingress"}
I0331 15:23:29.746935       1 targethealth.go:149] default/staging-rails-server-web-ingress: Starting reconciliation of pod condition status for target group: arn:aws:elasticloadbalancing:us-east-1:XXXX:targetgroup/bd528925-9af27d2c227753e1a41/152a9fe94eb2ac9a
I0331 15:23:29.746964       1 targets.go:93] default/staging-rails-server-web-ingress: Adding targets to arn:aws:elasticloadbalancing:us-east-1:XXXX:targetgroup/bd528925-9af27d2c227753e1a41/152a9fe94eb2ac9a: 172.18.242.141:8080
I0331 15:23:30.415652       1 controller.go:236] kubebuilder/controller "level"=1 "msg"="Successfully Reconciled"  "controller"="alb-ingress-controller" "request"={"Namespace":"default","Name":"staging-rails-server-web-ingress"}
I0331 15:23:45.176710       1 targets.go:108] default/staging-rails-server-web-ingress: Removing targets from arn:aws:elasticloadbalancing:us-east-1:XXXX:targetgroup/bd528925-9af27d2c227753e1a41/152a9fe94eb2ac9a: 172.18.166.195:8080
I0331 15:23:45.176787       1 targethealth.go:174] default/staging-rails-server-web-ingress: Stopping reconciliation of pod condition status for target group: arn:aws:elasticloadbalancing:us-east-1:XXXX:targetgroup/bd528925-9af27d2c227753e1a41/152a9fe94eb2ac9a
I0331 15:23:45.625641       1 controller.go:236] kubebuilder/controller "level"=1 "msg"="Successfully Reconciled"  "controller"="alb-ingress-controller" "request"={"Namespace":"default","Name":"staging-rails-server-web-ingress"}
I0331 15:23:46.321509       1 controller.go:236] kubebuilder/controller "level"=1 "msg"="Successfully Reconciled"  "controller"="alb-ingress-controller" "request"={"Namespace":"default","Name":"staging-rails-server-web-ingress"}
I0331 15:23:47.055137       1 controller.go:236] kubebuilder/controller "level"=1 "msg"="Successfully Reconciled"  "controller"="alb-ingress-controller" "request"={"Namespace":"default","Name":"staging-rails-server-web-ingress"}

I noticed the "Stopping reconciliation of pod condition status for target group" at the end there - could that be related?

@a6kme

a6kme commented Apr 2, 2020

I have the same problem. I work around it by restarting the alb-ingress-controller pod whenever I notice rollouts getting stuck. Linking the earlier issue #1131 that I raised some time ago.

Attaching some logs from before and after restarting the ALB Ingress Controller, which might give a hint.

Existing running pod and newly created pod of the deployment; the new pod is not ready even after 6 minutes:

events-api-events-api-6f6945bcfd-ggm49                           1/1     Running     0          47h     10.10.92.192   ip-10-10-79-119.ec2.internal   <none>           1/1
events-api-events-api-75ddbc8bc7-2xq7d                           1/1     Running     0          6m37s   10.10.92.33    ip-10-10-79-119.ec2.internal   <none>           0/1

Pre-restart logs

I0402 13:55:05.722808       1 controller.go:236] kubebuilder/controller "level"=1 "msg"="Successfully Reconciled"  "controller"="alb-ingress-controller" "request"={"Namespace":"pushowl-backend","Name":"pushowl-backend-ingress-alb"}
I0402 13:55:06.377859       1 controller.go:236] kubebuilder/controller "level"=1 "msg"="Successfully Reconciled"  "controller"="alb-ingress-controller" "request"={"Namespace":"pushowl-backend","Name":"pushowl-backend-ingress-alb"}
W0402 13:58:43.365978       1 reflector.go:270] pkg/mod/k8s.io/client-go@v0.0.0-20181213151034-8d9ed539ba31/tools/cache/reflector.go:95: watch of *v1.Endpoints ended with: too old resource version: 5461814 (5462767)
W0402 14:13:21.370567       1 reflector.go:270] pkg/mod/k8s.io/client-go@v0.0.0-20181213151034-8d9ed539ba31/tools/cache/reflector.go:95: watch of *v1.Endpoints ended with: too old resource version: 5464961 (5465449)
W0402 14:26:36.375179       1 reflector.go:270] pkg/mod/k8s.io/client-go@v0.0.0-20181213151034-8d9ed539ba31/tools/cache/reflector.go:95: watch of *v1.Endpoints ended with: too old resource version: 5467439 (5467792)
I0402 14:34:42.674948       1 controller.go:236] kubebuilder/controller "level"=1 "msg"="Successfully Reconciled"  "controller"="alb-ingress-controller" "request"={"Namespace":"pushowl-backend","Name":"pushowl-backend-ingress-alb"}

Post-restart logs

-------------------------------------------------------------------------------
AWS ALB Ingress controller
  Release:    v1.1.6
  Build:      git-95ee2ac8
  Repository: https://github.com/kubernetes-sigs/aws-alb-ingress-controller.git
-------------------------------------------------------------------------------

W0402 14:43:19.783472       1 client_config.go:549] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I0402 14:43:23.006003       1 controller.go:121] kubebuilder/controller "level"=0 "msg"="Starting EventSource"  "controller"="alb-ingress-controller" "source"={"Type":{"metadata":{"creationTimestamp":null}}}
I0402 14:43:23.006262       1 controller.go:121] kubebuilder/controller "level"=0 "msg"="Starting EventSource"  "controller"="alb-ingress-controller" "source"={"Type":{"metadata":{"creationTimestamp":null},"spec":{},"status":{"loadBalancer":{}}}}
I0402 14:43:23.006318       1 controller.go:121] kubebuilder/controller "level"=0 "msg"="Starting EventSource"  "controller"="alb-ingress-controller" "source"=
I0402 14:43:23.006445       1 controller.go:121] kubebuilder/controller "level"=0 "msg"="Starting EventSource"  "controller"="alb-ingress-controller" "source"={"Type":{"metadata":{"creationTimestamp":null},"spec":{},"status":{"loadBalancer":{}}}}
I0402 14:43:23.006480       1 controller.go:121] kubebuilder/controller "level"=0 "msg"="Starting EventSource"  "controller"="alb-ingress-controller" "source"=
I0402 14:43:23.006595       1 controller.go:121] kubebuilder/controller "level"=0 "msg"="Starting EventSource"  "controller"="alb-ingress-controller" "source"={"Type":{"metadata":{"creationTimestamp":null}}}
I0402 14:43:23.006818       1 controller.go:121] kubebuilder/controller "level"=0 "msg"="Starting EventSource"  "controller"="alb-ingress-controller" "source"={"Type":{"metadata":{"creationTimestamp":null},"spec":{},"status":{"daemonEndpoints":{"kubeletEndpoint":{"Port":0}},"nodeInfo":{"machineID":"","systemUUID":"","bootID":"","kernelVersion":"","osImage":"","containerRuntimeVersion":"","kubeletVersion":"","kubeProxyVersion":"","operatingSystem":"","architecture":""}}}}
I0402 14:43:23.007038       1 leaderelection.go:205] attempting to acquire leader lease  kube-system/ingress-controller-leader-alb...
I0402 14:43:39.178794       1 leaderelection.go:214] successfully acquired lease kube-system/ingress-controller-leader-alb
I0402 14:43:39.178929       1 recorder.go:53] kubebuilder/manager/events "level"=1 "msg"="Normal"  "message"="alb-ingress-controller-7b76cf75b4-9xzng_4ba51478-74f0-11ea-9828-6a688ffcef0a became leader" "object"={"kind":"ConfigMap","namespace":"kube-system","name":"ingress-controller-leader-alb","uid":"39944e91-6457-11ea-92c8-12bcfda3de27","apiVersion":"v1","resourceVersion":"5473020"} "reason"="LeaderElection"
I0402 14:43:39.279089       1 controller.go:134] kubebuilder/controller "level"=0 "msg"="Starting Controller"  "controller"="alb-ingress-controller"
I0402 14:43:39.379248       1 controller.go:154] kubebuilder/controller "level"=0 "msg"="Starting workers"  "controller"="alb-ingress-controller" "worker count"=1
I0402 14:43:43.983815       1 targethealth.go:149] pushowl-backend/pushowl-backend-ingress-alb: Starting reconciliation of pod condition status for target group: arn:aws:elasticloadbalancing:us-east-1:480681643791:targetgroup/60a14aa0-ff40d97fa59e9381bb7/635fcfe05b0b0107
I0402 14:43:43.983842       1 targets.go:93] pushowl-backend/pushowl-backend-ingress-alb: Adding targets to arn:aws:elasticloadbalancing:us-east-1:480681643791:targetgroup/60a14aa0-ff40d97fa59e9381bb7/635fcfe05b0b0107: 10.10.92.33:8000
I0402 14:43:44.785796       1 controller.go:236] kubebuilder/controller "level"=1 "msg"="Successfully Reconciled"  "controller"="alb-ingress-controller" "request"={"Namespace":"pushowl-backend","Name":"pushowl-backend-ingress-alb"}

@metacyclic

metacyclic commented Apr 3, 2020

We are experiencing the same thing. Registration sometimes doesn't happen, and deregistration also sometimes doesn't happen, even when the pods have no running containers.

Restarting the ALB ingress controller pod seems to fix the issue, which is tough to manage during HPA scaling.

@nirnanaaa

I discovered that this is basically always true.

When new targets are added to a target group, this is also somehow 0.

@alfredkrohmer
Contributor

@nirnanaaa and I found the underlying issue.

When all containers in a pod have started, the pod IP appears in the NotReadyAddresses field of the corresponding Endpoints object. This update of the Endpoints object triggers a reconciliation loop; however, we ignore it if not all containers in the pod are ready at that point in time (which can be the case if there is a readinessProbe defined). Then, once all containers in the pod become ready, the pod still doesn't count as ready because the readiness gate is not yet fulfilled: its IP is not moved from NotReadyAddresses to Addresses, the Endpoints object doesn't get updated, thus no reconcile is triggered again for this Endpoints object, and new pods might just sit there until another update happens on the Endpoints object (for whatever reason).
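To illustrate, this is roughly what the corresponding Endpoints object looks like while the new pod is held back (a reconstructed sketch reusing the pod names and IPs from the kubectl output earlier in this thread, not actual captured output): the stuck pod's IP sits in notReadyAddresses, and because the readiness gate keeps it there, no further Endpoints update, and therefore no reconcile, is triggered.

apiVersion: v1
kind: Endpoints
metadata:
  name: link-handling-puma          # same name as the backing Service
  namespace: link-handling
subsets:
- addresses:                        # pods that are fully Ready
  - ip: 10.11.86.200
    targetRef:
      kind: Pod
      name: link-handling-puma-fd676dc5c-72kwt
  notReadyAddresses:                # new pod stuck here until the readiness gate is fulfilled
  - ip: 10.11.28.255
    targetRef:
      kind: Pod
      name: link-handling-puma-fd676dc5c-hl857
  ports:
  - name: puma
    port: 80
    protocol: TCP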

The solution to this would be to add a pod reconciler which triggers endpoints reconciliations for any associated Endpoints objects. However, this will likely have other implications for the controller (more CPU & memory requirements because it will keep all the pod objects in memory). @M00nF1sh any thoughts?

@bpineau

bpineau commented Apr 6, 2020

@devkid We came to the same conclusion (as seen in #1214). I confirm we're back to fast rollouts with that change.

AFAIK we already implicitly have informers hooked on Pod objects (see for instance the failure messages in #1209).

@nirnanaaa

@bpineau your PR works very well. Testing it right now at scale.

bpineau added a commit to DataDog/aws-alb-ingress-controller that referenced this issue Apr 7, 2020
We're relying on endpoints events to re-trigger reconciliations during
rollouts, and we're considering pod's containers status (eg. are all pod's
containers ContainerReady?) to either act upon, or swallow those events.

Both (pods changes, ep changes) can be out of sync. For instance: a starting
pod whose containers aren't all ContainerReady might have its addresses
registered to an ep subset's NotReadyAddresses, kicking a reconcile event
which won't propagate to the TargetGroups (+ conditions updates) since the pod
is evaluated as not ready. Further pod changes won't kick an endpoint change
(due to readiness gates, the pod's address will stay in NotReadyAddresses
until we do something). As probably seen in  kubernetes-sigs#1205 .

In order to react reliably on pods changes, we have to hook in a pod watch.
Doing so is slightly expensive as we have to map pod -> [service ->]
endpoint -> ingress on pods events, though we limit the search to pod's ns
(services can only reference pods from their own ns, and ingress services from
their ns).
@alfredkrohmer
Contributor

This should be fixed on master now.

@casret

casret commented Apr 13, 2020

Can we get a point release for this bugfix? It affects us as well.

@a6kme

a6kme commented Apr 14, 2020

@casret - FWIW, I built my own image from the master branch by following the build instructions, and it's working like a charm. Feel free to use it if you like. It is hosted at pushowl/aws-alb-ingress-controller:v1.1.6.

@M00nF1sh
Collaborator

@casret we plan to do a release this week. We will try to include #1211, and will release without it if we cannot merge it before Friday.

@bgbraga

bgbraga commented Apr 15, 2020

@M00nF1sh please add this to this week's release. It may be the most important fix.

@metacyclic

Seems like this went into 1.1.7?

@mailjunze

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 20, 2020
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 19, 2020
@k8s-ci-robot k8s-ci-robot added the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Aug 19, 2020
@MaximF

MaximF commented Aug 26, 2020

Bumping this up.
I'm seeing the same issue with alb-ingress-controller 1.1.9 and EKS 1.17.
These are basically the latest versions available at the time of writing.

@larrywax

larrywax commented Sep 1, 2020

Experiencing the same with alb-ingress-controller v1.1.8 and EKS 1.17

@clayvan

clayvan commented Sep 2, 2020

Pretty sure I'm still seeing this on alb-ingress-controller v1.1.9 and EKS 1.16.

@clayvan

clayvan commented Sep 2, 2020

Well maybe I found something?

It turns out that if I use the named port as described in the docs, the target never gets added. However, if I flip my annotation back to using the port number, everything works as expected.
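The comment doesn't show which annotation or field was flipped, so purely as a hypothetical illustration of the named-port vs numeric-port difference, following the Ingress/Service shape posted earlier in this thread (the http port name and service name are assumptions):

# Service exposing a named port
apiVersion: v1
kind: Service
metadata:
  name: my-service                  # placeholder
spec:
  ports:
  - name: http
    port: 80
    targetPort: http

# Ingress backend referencing the port by name (the form that reportedly never got registered):
# - backend:
#     serviceName: my-service
#     servicePort: http
#
# Ingress backend referencing the port by number (the form that reportedly worked):
# - backend:
#     serviceName: my-service
#     servicePort: 80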

@M00nF1sh I don't think anything has changed regarding the port naming requirement, has it? Is this expected behavior?

@johnniee

Also having the same issue (corresponding condition of pod readiness gate "target-health.alb.ingress.k8s.aws/" does not exist) with aws-alb-ingress-controller:v1.1.8 & EKS 1.17.

@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@wildersachin

/reopen

@k8s-ci-robot
Contributor

@wildersachin: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@alamsyahho

For people who might still have this issue with the readiness gate: it seems we have to label the namespace before we can use the readiness gate. It's documented here, but this documentation is not visible under the main branch.
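For the v2 controller (aws-load-balancer-controller), the namespace labeling mentioned above looks roughly like this; a sketch assuming the elbv2.k8s.aws/pod-readiness-gate-inject label from the linked documentation and a placeholder namespace name:

apiVersion: v1
kind: Namespace
metadata:
  name: my-namespace                # placeholder
  labels:
    elbv2.k8s.aws/pod-readiness-gate-inject: enabled

Equivalently: kubectl label namespace my-namespace elbv2.k8s.aws/pod-readiness-gate-inject=enabled. Since the gate is injected by a mutating webhook at pod creation time, existing pods need to be recreated to pick it up.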

@kishorj
Collaborator

kishorj commented Aug 26, 2021

@alamsyahho, pod readiness gate documentation is available in the live docs as well - https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.2/deploy/pod_readiness_gate/
From the live docs page https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.2/, click on the Deployment tab in the top left section, then click on the Pod Readiness Gate item in the left menu. If you do not see the tab on the live docs, you might have to click on the three dashes in the top left corner.

@alamsyahho

@kishorj Thanks for pointing it out for me 👍

alebedev87 pushed a commit to alebedev87/aws-load-balancer-controller that referenced this issue Oct 26, 2023
We're relying on endpoints events to re-trigger reconciliations during
rollouts, and we're considering pod's containers status (eg. are all pod's
containers ContainerReady?) to either act upon, or swallow those events.

Both (pods changes, ep changes) can be out of sync. For instance: a starting
pod whose containers aren't all ContainerReady might have its addresses
registered to an ep subset's NotReadyAddresses, kicking a reconcile event
which won't propagate to the TargetGroups (+ conditions updates) since the pod
is evaluated as not ready. Further pod changes won't kick an endpoint change
(due to readiness gates, the pod's address will stay in NotReadyAddresses
until we do something). As probably seen in  kubernetes-sigs#1205 .

In order to react reliably on pods changes, we have to hook in a pod watch.
Doing so is slightly expensive as we have to map pod -> [service ->]
endpoint -> ingress on pods events, though we limit the search to pod's ns
(services can only reference pods from their own ns, and ingress services from
their ns).