Ensure Azure load balancer cleaned up on 404 or 403 #75256
@feiskyer: GitHub didn't allow me to request PR reviews from the following users: weinong. Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.
/test pull-kubernetes-e2e-aks-engine-azure
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: feiskyer
@andyzhangx Could you help take a look?
/lgtm
Thanks @feiskyer, we just discovered this issue on 1.13.4. Some users set an invalid idle timeout (3, which isn't a valid value) and another user set a wrong resource group name. In both cases, the service creation failed and the service got stuck in the Deleting state, with the same root cause. Users tried to delete and recreate the same service (same name and same resource group name), and the controller manager ignored the creation request since it was stuck trying to delete. After wiping the kube-controller-manager's memory (restarting the service), all was good. I'm guessing this PR and its cherry picks are there to fix that exact issue?
@djsly Yep, cherry picking the fixes to all stable releases.
…56-upstream-release-1.12 Automated cherry pick of #75256: Ensure Azure load balancer cleaned up on 404 or 403
…56-upstream-release-1.13 Automated cherry pick of #75256: Ensure Azure load balancer cleaned up on 404 or 403
…56-upstream-release-1.11 Automated cherry pick of #75256: Ensure Azure load balancer cleaned up on 404 or 403
What type of PR is this?
What this PR does / why we need it:
When deleting LoadBalancer services, Azure may return 404 or 403. This is usually caused by wrong annotations configured on the service spec.
Currently, EnsureLoadBalancerDeleted() reports an error in this case so that the service controller retries the deletion. However, when 404 or 403 is reported, the retry won't succeed, and creating a new service with the same name will always fail. This PR fixes the issue by checking the response codes and reporting nil on 404 and 403.
Which issue(s) this PR fixes:
Fixes #75198
Special notes for your reviewer:
Does this PR introduce a user-facing change?:
/sig azure
/kind bug
/priority critical-urgent
/milestone v1.14
/cc @andyzhangx @khenidak @weinong