There is too long a delay between the controller UPDATE operation and the ModifyRule action (which is causing downtime) #3588
Comments
@nessa829, hi, I'm wondering how many resources (Ingresses/Services/TGs) there are in your VPC. Since the controller uses an event queue to handle the CRUD events, it may have latency if there is a large number of resources/events. In our latest version, v2.7.1, we improved the performance of the controller by adding an ELB cache. You can also consider enabling the ResourceGroupsTagging API via the corresponding controller flag, as sketched below.
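For readers who install the controller via its Helm chart, a minimal sketch of where such feature gates are usually set; this assumes the chart exposes them under `controllerConfig.featureGates` (check the values file of your chart version), and the gate name below is a placeholder, not the actual flag referenced above:

```yaml
# values.yaml sketch for the aws-load-balancer-controller Helm chart.
# The gate name below is a hypothetical placeholder for illustration only;
# take the real flag/feature-gate name from the controller documentation.
clusterName: my-cluster                      # hypothetical cluster name
controllerConfig:
  featureGates:
    SomeResourceGroupsTaggingGate: true      # placeholder, not a real gate name
```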
@oliviassss hi, thank you for the reply. I also tried rollout
As I wrote above, the rollout ordered the weight to be 100:0 on time, and the lb-controller also received the request on time. I only have < 30 LBs, < 300 TGs, and < 100 listeners per LB. Could that be too much for the LB APIs? Thank you.
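For context, the kind of setup described here (a canary whose final step shifts the weights to 100:0) is typically expressed roughly as in the sketch below. This assumes Argo Rollouts with ALB traffic routing, which the comments suggest; all resource names, the image, and the step values are illustrative, not taken from this issue:

```yaml
# Sketch of an Argo Rollouts canary that asks the AWS Load Balancer Controller
# to shift listener-rule weights between a stable and a canary service.
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: demo-rollout
spec:
  replicas: 4
  selector:
    matchLabels:
      app: demo
  template:
    metadata:
      labels:
        app: demo
    spec:
      containers:
        - name: app
          image: nginx:1.25            # placeholder image
          ports:
            - containerPort: 80
  strategy:
    canary:
      stableService: demo-stable       # hypothetical stable Service
      canaryService: demo-canary       # hypothetical canary Service
      trafficRouting:
        alb:
          ingress: demo-ingress        # Ingress managed by the controller
          servicePort: 80
      steps:
        - setWeight: 20
        - pause: {duration: 60}
        - setWeight: 100               # final shift that triggers ModifyRule on the ALB
```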
@nessa829, thanks, it should be fine at your scale level and with all the mitigations. Would you be able to provide the controller logs during the time window when you saw the issue, so we can take a further look? It would also help if you could provide the ARN. You can send them via email to k8s-alb-controller-triage AT amazon.com, or reach out to me on Kubernetes Slack (oliviassss). Thanks
@oliviassss Thank you for your interest. Thank you!
@oliviassss It's the identical problem that I am having. When I dug deeper, I discovered that there is a shift delay on the load balancer side. Because of this, @nessa829, I currently fixed this by disabling the …
Hi @Mufaddal5253110 @oliviassss @nessa829, I am also stuck with a similar problem: as soon as the canary deployment completes and the last set of pods in the stable rollout is marked for deletion, we see a lot of 503s being thrown. In our case, we are supposed to use …
@deathsurgeon1 By adjusting the canary steps you can reduce that, but you cannot eliminate it until this issue is fixed on the ALB controller side. Here is a blog I have written explaining the entire scenario we faced and the solution we tried to fix it.
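One mitigation commonly used for this class of 503s (not necessarily the one described in the blog above) is to keep the old pods alive and draining for a while after the weight shift is requested, so the ALB rule change and target deregistration have time to propagate. A minimal sketch, with illustrative names and timings:

```yaml
# Sketch: delay pod shutdown so in-flight traffic is not dropped while the
# ALB listener rule / target group changes propagate. Timings are examples.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo
spec:
  replicas: 2
  selector:
    matchLabels:
      app: demo
  template:
    metadata:
      labels:
        app: demo
    spec:
      terminationGracePeriodSeconds: 90    # must exceed the preStop sleep
      containers:
        - name: app
          image: nginx:1.25                # placeholder image
          lifecycle:
            preStop:
              exec:
                # Keep serving while the ALB stops sending new requests.
                command: ["sh", "-c", "sleep 60"]
```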
Describe the bug
A concise description of what the bug is.
Ultimately, the problem seems to be that there is a time difference between the lb-controller's update operation request for the weight change and the actual LB listener rule modification (ModifyRule) API call.
I don't see any other errors in the lb-controller logs, nor any limit-related issues.
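To make the description concrete, the listener rule being modified is typically generated from a weighted forward action on the Ingress, roughly like the sketch below (the actions annotation format is the controller's documented one; the names, ports, and weights are illustrative). When the weights in this action change, the controller issues the ModifyRule call whose delay is reported in this issue.

```yaml
# Sketch of an Ingress whose weighted forward action the weight shift rewrites.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: demo-ingress
  annotations:
    alb.ingress.kubernetes.io/actions.demo-root: >
      {"type":"forward","forwardConfig":{"targetGroups":[
        {"serviceName":"demo-stable","servicePort":"80","weight":100},
        {"serviceName":"demo-canary","servicePort":"80","weight":0}]}}
spec:
  ingressClassName: alb
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: demo-root          # must match the actions.<name> suffix
                port:
                  name: use-annotation   # tells the controller to use the action above
```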
Steps to reproduce
As above.
Expected outcome
A concise description of what you expected to happen.
No downtime when the weight is changed.
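As a related note on avoiding downtime during rollouts, the controller documents pod readiness gate injection, which makes new pods count as Ready only once they are healthy in their ALB target group. It is enabled per namespace with a label; a minimal sketch (the namespace name is illustrative):

```yaml
# Sketch: enable the controller's pod readiness gate injection for a namespace.
apiVersion: v1
kind: Namespace
metadata:
  name: demo                                        # illustrative namespace
  labels:
    elbv2.k8s.aws/pod-readiness-gate-inject: enabled
```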
Environment
- AWS Load Balancer controller version: v2.6.2
- Kubernetes version: 1.27
- Using EKS (yes/no), if so version?: Yes, 1.27
Additional Context: