Cannot force delete busted, empty statefulset #72598

Closed
ohthehugemanatee opened this Issue Jan 5, 2019 · 3 comments


ohthehugemanatee commented Jan 5, 2019

What happened:
In playing with the metacontroller service per pod example for a blog post, I found I couldn't delete my StatefulSet.

I had applied the example manifests (see the reproduction steps below).

kubectl delete sts nginx gives a success message, but never returns me to the prompt. In a separate terminal - or if I SIGINT out - I can still describe the happy STS and its pods, with no indication that anything is terminating.

I thought I must have made a mistake in a finalizer or something similar that blocked a pod from shutting down. But I could delete the pods individually, no problem. The STS stayed at Replicas: 3 desired | 0 total, without ever scaling back up. The Pods status line was accurate too: 0 Running / 0 Waiting / 0 Succeeded / 0 Failed, so it doesn't seem like the STS was waiting on a phantom pod.

I eventually deleted everything else - all PVCs, PVs, pods, namespaces, services, deployments, and configmaps - and set replicas: 0 on the STS. Still, with no child objects to terminate, it hangs on delete. Force delete makes no difference.
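
Roughly, the cleanup looked like this (a sketch from memory, not a verbatim transcript; exact resource names are illustrative):

# Scale the STS to zero and clear out everything else it could be waiting on.
$ kubectl scale sts nginx --replicas=0
$ kubectl delete pvc,pods,svc,deploy,configmap --all
$ kubectl delete pv --all
# Still hangs, with or without --force:
$ kubectl delete sts nginx --force=true --grace-period=0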

I enabled diagnostic logs for the cluster (I'm on Azure), and I can confirm that the HTTP DELETE is logged as received and returns a 200. But when I SSH into the master node, even journalctl for the whole system shows no output when the DELETE is sent.

What you expected to happen:

I expected the STS to be deleted, at least when using --force=true --grace-period=0.

How to reproduce it (as minimally and precisely as possible):
(the manifests used are shown in the commands below)

# Create the statefulset.
$ kubectl apply -f statefulset-example-with-lb.yml
statefulset.apps/nginx created
service/nginx created

# Install metacontroller.
$ kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/metacontroller/master/manifests/metacontroller-rbac.yaml
namespace/metacontroller created
serviceaccount/metacontroller created
clusterrole.rbac.authorization.k8s.io/metacontroller created
clusterrolebinding.rbac.authorization.k8s.io/metacontroller created
$ kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/metacontroller/master/manifests/metacontroller.yaml
customresourcedefinition.apiextensions.k8s.io/compositecontrollers.metacontroller.k8s.io created
customresourcedefinition.apiextensions.k8s.io/decoratorcontrollers.metacontroller.k8s.io created
customresourcedefinition.apiextensions.k8s.io/controllerrevisions.metacontroller.k8s.io created
statefulset.apps/metacontroller created

# Create metacontroller hooks.
$ kubectl create configmap service-per-pod-hooks -n metacontroller --from-file=hooks
configmap/service-per-pod-hooks created

# Create service-per-pod service, deployment, decorator controllers from example code.
$ kubectl apply -f https://github.com/GoogleCloudPlatform/metacontroller/raw/master/examples/service-per-pod/service-per-pod.yaml
decoratorcontroller.metacontroller.k8s.io/service-per-pod created
decoratorcontroller.metacontroller.k8s.io/pod-name-label created
deployment.apps/service-per-pod created
service/service-per-pod created

# Try to delete the STS.
$ kubectl delete sts nginx
statefulset.apps "nginx" deleted
# Wait 30 minutes, then:
^C
$ kubectl delete sts nginx --force=true --grace-period=0
statefulset.apps "nginx" deleted
# Take your dog to the park, then:
^C
$ kubectl delete sts nginx --force=true --grace-period=0 -v=7
I0105 01:11:35.146909   20216 loader.go:359] Config loaded from file /home/ohthehugemanatee/.kube/config
I0105 01:11:35.148177   20216 loader.go:359] Config loaded from file /home/ohthehugemanatee/.kube/config
I0105 01:11:35.149683   20216 loader.go:359] Config loaded from file /home/ohthehugemanatee/.kube/config
I0105 01:11:35.153556   20216 loader.go:359] Config loaded from file /home/ohthehugemanatee/.kube/config
I0105 01:11:35.154574   20216 loader.go:359] Config loaded from file /home/ohthehugemanatee/.kube/config
I0105 01:11:35.155609   20216 loader.go:359] Config loaded from file /home/ohthehugemanatee/.kube/config
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
I0105 01:11:35.155840   20216 round_trippers.go:383] DELETE https://foobar.hcp.westeurope.azmk8s.io:443/apis/apps/v1/namespaces/default/statefulsets/nginx
I0105 01:11:35.155847   20216 round_trippers.go:390] Request Headers:
I0105 01:11:35.155851   20216 round_trippers.go:393]     Authorization: Bearer 1234567891011121314151617181920
I0105 01:11:35.155854   20216 round_trippers.go:393]     Content-Type: application/json
I0105 01:11:35.155857   20216 round_trippers.go:393]     Accept: application/json
I0105 01:11:35.155860   20216 round_trippers.go:393]     User-Agent: kubectl/v1.11.3 (linux/amd64) kubernetes/a452946
I0105 01:11:35.631186   20216 round_trippers.go:408] Response Status: 200 OK in 475 milliseconds
statefulset.apps "nginx" force deleted
I0105 01:11:35.638433   20216 round_trippers.go:383] GET https://foobar.hcp.westeurope.azmk8s.io:443/apis/apps/v1/namespaces/default/statefulsets/nginx
I0105 01:11:35.638470   20216 round_trippers.go:390] Request Headers:
I0105 01:11:35.638491   20216 round_trippers.go:393]     User-Agent: kubectl/v1.11.3 (linux/amd64) kubernetes/a452946
I0105 01:11:35.638526   20216 round_trippers.go:393]     Accept: application/json
I0105 01:11:35.638552   20216 round_trippers.go:393]     Authorization: Bearer 1234567891011121314151617181920
I0105 01:11:35.746535   20216 round_trippers.go:408] Response Status: 200 OK in 107 milliseconds
I0105 01:11:35.771051   20216 round_trippers.go:383] GET https://foobar.hcp.westeurope.azmk8s.io:443/apis/apps/v1/namespaces/default/statefulsets?fieldSelector=metadata.name%3Dnginx&resourceVersion=2557&watch=true
I0105 01:11:35.771112   20216 round_trippers.go:390] Request Headers:
I0105 01:11:35.771145   20216 round_trippers.go:393]     Accept: application/json
I0105 01:11:35.771171   20216 round_trippers.go:393]     User-Agent: kubectl/v1.11.3 (linux/amd64) kubernetes/a452946
I0105 01:11:35.771196   20216 round_trippers.go:393]     Authorization: Bearer 1234567891011121314151617181920
I0105 01:11:35.856290   20216 round_trippers.go:408] Response Status: 200 OK in 85 milliseconds
# Eat dinner, go for a nice walk afterwards, binge watch some Doctor Who, and check back. The command still hasn't returned.

Anything else we need to know?:

I think this is different from #65754, because kubectl delete sts nginx -v=9 shows a DELETE request to /apis/apps/v1/namespaces/default/statefulsets/nginx with a 200 SUCCESS response, not a 404 error. Also, my kubectl and server versions are both 1.11.

I think this is different from #36333, because the STS has no pods and seems to have no trouble detecting that there are no pods left.

I think this is different from #59867, because I don't get any error messages that I can find.

Environment:

  • Kubernetes version (use kubectl version):
$ kubectl version              
Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.3", GitCommit:"a4529464e4629c21224b3d52edfe0ea91b072862", GitTreeState:"clean", BuildDate:"2018-09-09T18:02:47Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.4", GitCommit:"bf9a868e8ea3d3a8fa53cbb22f566771b3f8068b", GitTreeState:"clean", BuildDate:"2018-10-25T19:06:30Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}

Also replicated with client version 1.11.4, just in case.

  • Cloud provider or hardware configuration: Azure AKS
  • OS (e.g. from /etc/os-release): Ubuntu 18.04 (my client machine)
  • Kernel (e.g. uname -a): Linux simba 4.19.11-041911-generic #201812191931 SMP Wed Dec 19 19:33:33 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux (my client machine)
  • Install tools: downloaded kubectl binary directly.
  • Others:

/kind bug


k8s-ci-robot commented Jan 5, 2019

@ohthehugemanatee: There are no sig labels on this issue. Please add a sig label by either:

  1. mentioning a sig: @kubernetes/sig-<group-name>-<group-suffix>
    e.g., @kubernetes/sig-contributor-experience-<group-suffix> to notify the contributor experience sig, OR

  2. specifying the label manually: /sig <group-name>
    e.g., /sig scalability to apply the sig/scalability label

Note: Method 1 will trigger an email to the group. See the group list.
The <group-suffix> in method 1 has to be replaced with one of these: bugs, feature-requests, pr-reviews, test-failures, proposals.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.


liggitt commented Jan 5, 2019

Are there finalizers on the statefulset? (get from the apiserver with -o yaml to check)

If so, those must be removed by the controller responsible for them before the API object can be deleted. kubectl delete will wait for that to happen unless you specify --wait=false
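
In practice that check looks something like this (a minimal sketch; the resource name is the one from this issue):

# Does the STS carry finalizers, and is it already marked for deletion?
$ kubectl get sts nginx -o jsonpath='{.metadata.deletionTimestamp}{"\n"}{.metadata.finalizers}{"\n"}'
# A set deletionTimestamp plus a non-empty finalizers list means the API server has
# accepted the delete (hence the 200) and is waiting for the owning controller to
# remove its finalizer before the object actually goes away.
# To get your prompt back instead of blocking on that:
$ kubectl delete sts nginx --wait=false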


ohthehugemanatee commented Jan 7, 2019

Thank you @liggitt! That was just the input I needed to find the problem: my DecoratorController had a sync webhook with a typo in the name. When I fixed the typo, everything resolved instantly.

For posterity: you can validate the finalizers on your StatefulSet with kubectl get sts nginx -o yaml, and then check the logs of the controller behind those finalizers with kubectl logs -n metacontroller <finalizer pod name>.
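
In commands, that looks roughly like this (a sketch; <finalizer pod name> stands for whichever pod serves the hook in your setup, and the final patch is only a last resort if the owning controller can't be fixed):

# See which finalizers are blocking the delete.
$ kubectl get sts nginx -o jsonpath='{.metadata.finalizers}{"\n"}'
# Check the logs of the controller pod behind that finalizer.
$ kubectl logs -n metacontroller <finalizer pod name>
# Last resort only, if the owning controller is gone or unfixable: clear the finalizers by hand.
$ kubectl patch sts nginx --type=merge -p '{"metadata":{"finalizers":null}}'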

Feature suggestion: add a state for the STS like "waiting for finalizers" (or similar) that makes this kind of hang explicit. This was unnecessarily hard to track down!
