kube-controller-manager is stuck deleting Azure LB because of wrong idle timeout set #75356

Closed
feiskyer opened this Issue Mar 14, 2019 · 6 comments

Comments

@feiskyer
Member

feiskyer commented Mar 14, 2019

What happened:

When the idle timeout annotation is set to "3", which is not a supported value, kube-controller-manager gets stuck deleting the Azure LB. A new service with the same name cannot be created afterwards, even without the annotation.

What you expected to happen:

The new service should be created successfully.

How to reproduce it (as minimally and precisely as possible):

kind: Service
apiVersion: v1
metadata:
  name: nginx
  annotations:
    service.beta.kubernetes.io/azure-load-balancer-tcp-idle-timeout: "3"
spec:
  type: LoadBalancer
  loadBalancerIP: "51.145.136.123"
  ports:
    - name: http
      port: 80
      targetPort: http
    - name: https
      port: 443
      targetPort: https
  selector:
    app: nginx

kubectl create -f nginx.yaml
kubectl get svc

kubectl delete -f nginx.yaml
kubectl get svc

kubectl create -f nginx.yaml

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version):
  • Cloud provider or hardware configuration:
  • OS (e.g: cat /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:

/sig azure
/kind bug

@feiskyer feiskyer added the kind/bug label Mar 14, 2019

@feiskyer feiskyer added this to Backlog in Cloud Provider Azure via automation Mar 14, 2019

@yastij

Member

yastij commented Mar 14, 2019

/assign

Since I'm seeing this on some of our clusters

@yastij

Member

yastij commented Mar 14, 2019

/priority important-soon

@feiskyer

Member Author

feiskyer commented Mar 15, 2019

@yastij Thanks

@yastij

Member

yastij commented Mar 19, 2019

@feiskyer - The problem seems to be that when EnsureLoadBalancerDeleted is called (while processing a delete event), it reconciles the load balancer, which in turn tries to get the idleTimeout from the annotation and fails with an error.
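To make the failure concrete, here is a minimal sketch of how reading the annotation could reject "3". The helper name `parseIdleTimeout` is hypothetical, and the 4 to 30 minute bounds are an assumption about Azure's supported range that this issue does not state; only the annotation key comes from the manifest above.

```go
package main

import (
	"fmt"
	"strconv"
)

// Annotation key copied from the reproduction manifest.
const idleTimeoutAnnotation = "service.beta.kubernetes.io/azure-load-balancer-tcp-idle-timeout"

// Assumed bounds in minutes; they are not stated in this issue.
const (
	minIdleTimeout = 4
	maxIdleTimeout = 30
)

// parseIdleTimeout is a hypothetical helper, not the provider's real function.
// It returns nil when the annotation is absent and an error when the value
// is not an integer inside the assumed range.
func parseIdleTimeout(annotations map[string]string) (*int32, error) {
	raw, ok := annotations[idleTimeoutAnnotation]
	if !ok {
		return nil, nil
	}
	v, err := strconv.ParseInt(raw, 10, 32)
	if err != nil {
		return nil, fmt.Errorf("idle timeout %q is not an integer: %w", raw, err)
	}
	if v < minIdleTimeout || v > maxIdleTimeout {
		return nil, fmt.Errorf("idle timeout %d is outside [%d, %d]", v, minIdleTimeout, maxIdleTimeout)
	}
	t := int32(v)
	return &t, nil
}

func main() {
	// The value from the manifest is rejected under the assumed bounds.
	_, err := parseIdleTimeout(map[string]string{idleTimeoutAnnotation: "3"})
	fmt.Println("error for \"3\":", err)
}
```

If the delete path runs the same parsing and returns on error, the cleanup never gets to run, which matches the behavior described here.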

One strategy is to still construct our diff in the delete case and exercise the dirtyLb code path at the end. This basically translates into ignoring the error when getting idleTimeout, and excluding it from the check for whether the rule is dirty, when we are !wantLb. Thoughts?
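A rough sketch of that strategy, under stated assumptions: `reconcile` and `getIdleTimeout` below are simplified, hypothetical stand-ins rather than the Azure provider's actual functions, and the 4 to 30 range is assumed. The point is only that the parsing error aborts the operation when wantLb is true and is ignored when it is false.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

var errBadIdleTimeout = errors.New("unsupported idle timeout annotation")

// getIdleTimeout is a hypothetical stand-in for the annotation parsing.
// The 4..30 bounds are an assumption, not taken from this issue.
func getIdleTimeout(raw string) (int, error) {
	v, err := strconv.Atoi(raw)
	if err != nil || v < 4 || v > 30 {
		return 0, errBadIdleTimeout
	}
	return v, nil
}

// reconcile is a simplified stand-in for the load balancer reconcile step.
// wantLb is true for create/update and false for delete.
func reconcile(rawIdleTimeout string, wantLb bool) (string, error) {
	timeout, err := getIdleTimeout(rawIdleTimeout)
	if err != nil && wantLb {
		// Creating or updating: an invalid value must still be reported.
		return "", err
	}
	if !wantLb {
		// Deleting: the timeout is irrelevant, so the error is ignored
		// and cleanup proceeds.
		return "rules removed", nil
	}
	return fmt.Sprintf("rules ensured with idle timeout %d", timeout), nil
}

func main() {
	_, err := reconcile("3", true)
	fmt.Println("create:", err)

	result, err := reconcile("3", false)
	fmt.Println("delete:", result, err)
}
```

With this shape, a service carrying an invalid annotation still fails on create, but deleting it no longer blocks on the same validation.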

Generally speaking, whenever a load balancer is updated with unsupported annotation values, ensuring the load balancer fails, which leaves behind an LB that is never cleaned up. This is a cross-provider issue.

/sig cloud-provider

cc @andrewsykim @cheftako

@andrewsykim

Member

andrewsykim commented Mar 19, 2019

@feiskyer

Member Author

feiskyer commented Mar 19, 2019

One strategy is to still construct our diff in the delete case and exercise the dirtyLb code path at the end. This basically translates into ignoring the error when getting idleTimeout, and excluding it from the check for whether the rule is dirty, when we are !wantLb. Thoughts?

LGTM. The delete operation should continue with its cleanup even when users provide invalid options.
