Closed
Labels
lifecycle/stale: Denotes an issue or PR has remained open with no activity and has become stale.
Description
Version: 1.1.2
We encountered a deadlock in the ALB ingress controller caused by AWS API rate limits (the retry backoff likely made the rate limiting worse), which led to an outage.
The deadlock was triggered by this error:
E1015 14:45:11.579790 1 :0] kubebuilder/controller "msg"="Reconciler error" "error"="error getting web acl for load balancer arn:aws:elasticloadbalancing:us-east-1:062151437226:loadbalancer/app/fffb5690-default-broadcast-bf66/a84fdc267491f064: ThrottlingException: Rate exceeded\n\tstatus code: 400, request id: 9f03f8c3-f087-4c5c-88e3-545fb3ae5c47" "controller"="alb-ingress-controller" "request"={"Namespace":"default","Name":"broadcaster-job-ui"}
This blocked the reconciler for 10 minutes.
During a rolling deploy, new pods are normally added to the ALB and old pods removed from it one by one. Because of the deadlock, all of the pods became unhealthy until the controller finally updated them all at the same time:
I1015 14:45:16.861165 1 targets.go:80] default/accounts-rest: Adding targets to arn:aws:elasticloadbalancing:us-east-1:062151437226:targetgroup/fffb5690-77f48c3689d480d89a0/39213728bb743224: 10.128.0.90:3000, 10.128.15.161:3000, 10.128.9.102:3000
I1015 14:45:17.148664 1 targets.go:95] default/accounts-rest: Removing targets from arn:aws:elasticloadbalancing:us-east-1:062151437226:targetgroup/fffb5690-77f48c3689d480d89a0/39213728bb743224: 10.128.10.211:3000, 10.128.1.169:3000, 10.128.0.191:3000
Normally the controller adds a single target, then removes a single target, and repeats until all three are done.
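To spot this pattern in controller logs, a small script can flag target changes that touch more than one target in a single call. The sketch below is only an illustration: the regular expression is inferred from the two log lines quoted above, and the helper names `parse_target_change` and `batched_changes` are hypothetical, not part of the controller.

```python
import re

# Pattern inferred from the "Adding targets to" / "Removing targets from"
# lines quoted in this report; other controller versions may log differently.
TARGET_LINE = re.compile(
    r"\] (?P<ingress>\S+): (?P<action>Adding targets to|Removing targets from) "
    r"(?P<arn>\S+): (?P<targets>.+)$"
)


def parse_target_change(line):
    """Return (ingress, action, arn, targets) or None if the line does not match."""
    m = TARGET_LINE.search(line.strip())
    if m is None:
        return None
    action = "add" if m.group("action").startswith("Adding") else "remove"
    targets = [t.strip() for t in m.group("targets").split(",") if t.strip()]
    return m.group("ingress"), action, m.group("arn"), targets


def batched_changes(lines, max_batch=1):
    """Yield parsed changes that touch more than max_batch targets at once."""
    for line in lines:
        change = parse_target_change(line)
        if change is not None and len(change[3]) > max_batch:
            yield change


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python find_batches.py < controller.log
    for ingress, action, arn, targets in batched_changes(sys.stdin):
        print(f"{ingress}: {action} of {len(targets)} targets in one call on {arn}")
```

During a healthy rolling deploy this should print nothing, since each add or remove touches a single target. Any output is a hint that the reconciler fell behind and applied several pod changes at once.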