kubectl/drain lib leaks goroutines and waits forever for the pods to be deleted #81333

Open
vikaschoudhary16 opened this issue Aug 13, 2019 · 0 comments

Comments

@vikaschoudhary16
Member

commented Aug 13, 2019

What happened:
Two issues in the drain lib:

  • Leaking goroutines:
    In the drain logic, a goroutine is started for each pod on the node to evict it. If the eviction fails with the error "too many requests", the goroutine keeps making eviction requests forever, sleeping 5 seconds between attempts. After the global timeout, Drain() returns but the goroutine does not.
    If drain.Drain() is invoked again and again as retries after the global timeout, each call creates new goroutines, which increases the number of eviction requests per second hitting the apiserver.
    Ex: https://bugzilla.redhat.com/show_bug.cgi?id=1732929

  • Waits forever to verify for pod deletion:
    Here it waits forever for pod deletion. Sometimes pods cannot be deleted, e.g. statefulset pods, or when the kubelet is down. In such cases the wait gets stuck, and the goroutine leak above makes this a bigger problem.
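
To make the first issue concrete, here is a minimal stand-alone Go sketch of a retry loop that stops once the caller's context expires. It is not the drain lib's code: evictWithRetry, evictFunc, errTooManyRequests and demoEviction are hypothetical names, the eviction call is simulated, and the intervals are shortened so the example finishes quickly.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// errTooManyRequests is a hypothetical stand-in for the "too many requests"
// error returned while a PodDisruptionBudget blocks the eviction.
var errTooManyRequests = errors.New("too many requests")

// evictFunc is a hypothetical stand-in for a single eviction request.
type evictFunc func(pod string) error

// evictWithRetry retries evict until it succeeds, fails with a non-retryable
// error, or ctx is done. It returns the number of attempts made.
func evictWithRetry(ctx context.Context, pod string, evict evictFunc, interval time.Duration) (int, error) {
	attempts := 0
	for {
		attempts++
		err := evict(pod)
		if err == nil {
			return attempts, nil
		}
		if !errors.Is(err, errTooManyRequests) {
			return attempts, err
		}
		// Sleep between retries, but return as soon as the caller gives up.
		// A plain time.Sleep here is what keeps the goroutine alive forever.
		select {
		case <-ctx.Done():
			return attempts, ctx.Err()
		case <-time.After(interval):
		}
	}
}

// demoEviction runs one eviction that is always blocked, under a global timeout.
func demoEviction() (int, error) {
	ctx, cancel := context.WithTimeout(context.Background(), 100*time.Millisecond)
	defer cancel()
	alwaysBlocked := func(string) error { return errTooManyRequests }
	return evictWithRetry(ctx, "nginx", alwaysBlocked, 10*time.Millisecond)
}

func main() {
	attempts, err := demoEviction()
	fmt.Printf("stopped after %d attempts: %v\n", attempts, err)
}
```

With this shape, a goroutine running the loop returns when the global timeout fires, so repeated Drain() retries would not pile up eviction requests.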

What you expected to happen:

  • At global timeout, all eviction requesting goroutines return.
  • WaitForDelete waits for pod deletion only up to a reasonable timeout.
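
A bounded wait could look roughly like the sketch below. This is only an illustration under assumptions, not a proposed patch: waitForDeleteWithTimeout, podExistsFunc and demoWait are hypothetical names, the pod lookup is simulated, and the durations are arbitrary.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// podExistsFunc is a hypothetical stand-in for looking up the pod;
// it reports whether the pod still exists.
type podExistsFunc func(pod string) bool

// waitForDeleteWithTimeout polls until the pod is gone or timeout elapses.
func waitForDeleteWithTimeout(pod string, exists podExistsFunc, interval, timeout time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	for {
		if !exists(pod) {
			return nil
		}
		select {
		case <-ctx.Done():
			// Give up instead of blocking forever, e.g. when the kubelet is down.
			return fmt.Errorf("pod %q still present after %v: %w", pod, timeout, ctx.Err())
		case <-ticker.C:
		}
	}
}

// demoWait exercises both outcomes: a pod that never goes away and a pod
// that disappears after a few checks.
func demoWait() (stuckErr, goneErr error) {
	neverDeleted := func(string) bool { return true }

	checks := 0
	deletedAfterThree := func(string) bool {
		checks++
		return checks <= 3
	}

	stuckErr = waitForDeleteWithTimeout("stuck-pod", neverDeleted, 10*time.Millisecond, 100*time.Millisecond)
	goneErr = waitForDeleteWithTimeout("nginx", deletedAfterThree, 10*time.Millisecond, time.Second)
	return stuckErr, goneErr
}

func main() {
	stuckErr, goneErr := demoWait()
	fmt.Println("stuck pod:", stuckErr)
	fmt.Println("deleted pod:", goneErr)
}
```

The caller gets an error for the stuck pod and can decide whether to retry or report, rather than hanging.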

How to reproduce it (as minimally and precisely as possible):
The issue is more likely to occur naturally when the drain lib is used from higher-level tooling such as cluster-api, but it can also be reproduced using kubectl:

  1. Create the pdb with minAvailable 1:
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: nginx-pdb
spec:
  minAvailable: 1
  selector:
    matchLabels:
      "app": "nginx"
  2. Create a replicaset with 1 replica and the label "app": "nginx".
  3. Use the kubectl command to drain the node. It will get stuck because of the pdb. Run the same command again and again from other shell sessions. Each kubectl command will result in more and more goroutines making eviction requests.

Steps for reproducing second issue:

  1. Create a pod
  2. Bring the kubelet down
  3. Run the kubectl command to drain the node. It will get stuck at waitForDelete, which waits forever

/kind bug
/sig CLI
/cc @eparis @smarterclayton @derekwaynecarr @justinsb @vincepri @michaelgugino @enxebre
