Kubernetes namespaces stuck in terminating state #19317

Closed
paralin opened this Issue Jan 6, 2016 · 34 comments

@paralin (Contributor) commented Jan 6, 2016

I tried to delete some namespaces from my kubernetes cluster, but they've been stuck in Terminating state for over a month.

kubectl get ns
NAME              LABELS    STATUS        AGE
myproject         <none>    Active        12d
default           <none>    Active        40d
anotherproject    <none>    Terminating   40d
openshift         <none>    Terminating   40d
openshift-infra   <none>    Terminating   40d

The openshift namespaces were made as part of the example in this repo for running Openshift under Kube.

There's nothing in any of these namespaces (I used get on every resource type and they're all empty).

So what's holding up the terminate?

The kube cluster is healthy:

NAME                 STATUS    MESSAGE              ERROR
etcd-1               Healthy   {"health": "true"}   
controller-manager   Healthy   ok                   
scheduler            Healthy   ok                   
etcd-0               Healthy   {"health": "true"}   

The versions are:

Client Version: version.Info{Major:"1", Minor:"2+", GitVersion:"v1.2.0-alpha.4.208+c39262c9915b0b", GitCommit:"c39262c9915b0b1c493de66f37c49f3ef587cd97", GitTreeState:"clean"}
Server Version: version.Info{Major:"1", Minor:"2+", GitVersion:"v1.2.0-alpha.4.166+d9ab692edc08a2", GitCommit:"d9ab692edc08a279396b29efb4d7b1e6248dfb60", GitTreeState:"clean"}

The server version corresponds to this commit: paralin@d9ab692

Compiled from source. Cluster was built using kube-up to GCE with the following env:

export KUBERNETES_PROVIDER=gce
export KUBE_GCE_ZONE=us-central1-b
export MASTER_SIZE=n1-standard-1
export MINION_SIZE=n1-standard-2
export NUM_MINIONS=3

export KUBE_ENABLE_NODE_AUTOSCALER=true
export KUBE_AUTOSCALER_MIN_NODES=3
export KUBE_AUTOSCALER_MAX_NODES=3

export KUBE_ENABLE_DAEMONSETS=true
export KUBE_ENABLE_DEPLOYMENTS=true

export KUBE_ENABLE_INSECURE_REGISTRY=true

Any ideas?
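One way to double-check that a namespace really is empty is to enumerate every namespaced, listable resource type and get each of them in the namespace; this is only a sketch, and assumes a kubectl new enough to have api-resources (much newer than the client above):

# list every namespaced resource type that supports "list", then get each one in the namespace
$ kubectl api-resources --verbs=list --namespaced -o name \
    | xargs -n 1 kubectl get --show-kind --ignore-not-found -n <namespace>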

@ncdc (Member) commented Jan 8, 2016

cc @derekwaynecarr. Do you think the namespace controller is in some sort of infinite loop?

@derekwaynecarr (Member) commented Jan 8, 2016

Can you paste the output for:

kubectl get namespace/openshift -o json

I assume openshift is no longer running on your cluster? Is there any
content in that namespace?


@lavalamp (Member) commented Jan 26, 2016

Smells like different components have different ideas about the finalizer list? Does rebooting controller-manager change anything?

@paralin (Contributor) commented Jan 26, 2016

kubectl get ns openshift -o json

{
    "kind": "Namespace",
    "apiVersion": "v1",
    "metadata": {
        "name": "openshift",
        "selfLink": "/api/v1/namespaces/openshift",
        "uid": "0a659292-94af-11e5-855c-42010af00002",
        "resourceVersion": "14645862",
        "creationTimestamp": "2015-11-27T02:32:01Z",
        "deletionTimestamp": "2015-12-25T03:20:25Z",
        "annotations": {
            "openshift.io/sa.scc.mcs": "s0:c6,c0",
            "openshift.io/sa.scc.supplemental-groups": "1000030000/10000",
            "openshift.io/sa.scc.uid-range": "1000030000/10000"
        }
    },
    "spec": {
        "finalizers": [
            "openshift.io/origin"
        ]
    },
    "status": {
        "phase": "Terminating"
    }
}

Interestingly, the finalizer is set to openshift.io/origin.

I tried deleting the finalizer from the namespace with kubectl edit, but it still shows up on a subsequent get.

@paralin (Contributor) commented Jan 26, 2016

This also happens with the one other namespace I manually created in OpenShift with the projects system:

Error from server: Namespace "dotabridge-dev" cannot be updated: The system is ensuring all content is removed from this namespace.  Upon completion, this namespace will automatically be purged by the system.

I'm not actually using OpenShift anymore so these namespaces are pretty much stuck in my prod cluster until I can figure out how to get past this.

@paralin (Contributor) commented Jan 26, 2016

Deleted the controller-manager pod and the associated pause pod and restarted kubelet on the master. The containers were re-created, kubectl get cs shows everything as healthy, but the namespaces remain.

@davidopp davidopp added this to the v1.2 milestone Feb 4, 2016

@davidopp (Member) commented Feb 4, 2016

@derekwaynecarr derekwaynecarr self-assigned this Feb 4, 2016

@derekwaynecarr (Member) commented Feb 4, 2016

@paralin - there is no code issue here, but maybe I can improve the openshift example clean-up scripts or document the steps. When you created a project in openshift, it created a namespace for that project and added a finalizer token to it, which says that before the namespace can be deleted, an external agent must confirm it has finished its own clean-up and remove that token. Since you are no longer running openshift, its agent never removed the token and never took part in the termination flow.

A quick fix:

# find each namespace impacted
$ kubectl get namespaces -o json | grep "openshift.io/origin"
$ kubectl get namespace <ns> -o json > temp.json
# vi temp.json and remove the finalizer entry for "openshift.io/origin"
# for example
{
    "kind": "Namespace",
    "apiVersion": "v1",
    "metadata": {
        "name": "testing",
        "selfLink": "/api/v1/namespaces/testing",
        "uid": "33074e57-cb72-11e5-9d3d-28d2444e470d",
        "resourceVersion": "234",
        "creationTimestamp": "2016-02-04T19:05:04Z",
        "deletionTimestamp": "2016-02-04T19:05:54Z"
    },
    "spec": {
        "finalizers": [
            "openshift.io/org"  <--- remove me
        ]
    },
    "status": {
        "phase": "Terminating"
    }
}

$ curl -H "Content-Type: application/json" -X PUT --data-binary @temp.json http://127.0.0.1:8080/api/v1/namespaces/<name_of_namespace>/finalize
# wait a moment, and you should see your namespace removed
$ kubectl get namespaces 

That will remove the lock that blocks the namespace from being completely terminated, and you should quickly see that the namespace is removed from your system.

Closing the issue, but feel free to comment if you continue to have problems or hit me up on slack.
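For clusters where the apiserver's insecure port (8080) is not reachable, a variant of the same fix is to go through kubectl proxy; this is a sketch, assuming kubectl is configured with sufficient rights and jq is installed:

# start an authenticated local proxy to the apiserver (kubectl's default proxy port is 8001)
$ kubectl proxy &
# empty out spec.finalizers and PUT the result to the namespace's finalize subresource
$ kubectl get namespace openshift -o json | jq '.spec.finalizers = []' > temp.json
$ curl -H "Content-Type: application/json" -X PUT --data-binary @temp.json \
    http://127.0.0.1:8001/api/v1/namespaces/openshift/finalize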

@bkmagnetron commented Jul 12, 2016

I'm facing the same issue

# oc version
oc v1.3.0-alpha.2
kubernetes v1.3.0-alpha.1-331-g0522e63

I deleted the project named "gitlab" via the OpenShift Origin web console, but it was not removed.

As suggested by @derekwaynecarr, I did the following:

# kubectl get namespace gitlab -o json > temp.json
# cat temp.json
{
    "kind": "Namespace",
    "apiVersion": "v1",
    "metadata": {
        "name": "gitlab",
        "selfLink": "/api/v1/namespaces/gitlab",
        "uid": "cd86c372-481e-11e6-aebc-408d5c676116",
        "resourceVersion": "3115",
        "creationTimestamp": "2016-07-12T10:53:01Z",
        "deletionTimestamp": "2016-07-12T11:11:36Z",
        "annotations": {
            "openshift.io/description": "",
            "openshift.io/display-name": "GitLab",
            "openshift.io/requester": "developer",
            "openshift.io/sa.scc.mcs": "s0:c8,c7",
            "openshift.io/sa.scc.supplemental-groups": "1000070000/10000",
            "openshift.io/sa.scc.uid-range": "1000070000/10000"
        }
    },
    "spec": {
        "finalizers": [
            "kubernetes"   <---removed
        ]
    },
    "status": {
        "phase": "Terminating"
    }
}

and

# curl -k -H "Content-Type: application/json" -X PUT --data-binary @temp.json https://10.28.27.65:8443/api/v1/namespaces/gitlab/finalize
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "User \"system:anonymous\" cannot update namespaces/finalize in project \"gitlab\"",
  "reason": "Forbidden",  <--- seems like nothing happened
  "details": {
    "name": "gitlab",
    "kind": "namespaces/finalize"
  },
  "code": 403
}

but it is removed.
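The Forbidden response suggests the request was sent unauthenticated (as system:anonymous). A possible workaround, sketched on the assumption that an oc login session with rights on the project exists, is to pass that session's bearer token:

# reuse the current OpenShift session token so the finalize call is not anonymous
TOKEN=$(oc whoami -t)
curl -k -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
    -X PUT --data-binary @temp.json \
    https://10.28.27.65:8443/api/v1/namespaces/gitlab/finalize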

@jchauncey commented Jul 26, 2016

I'm facing the same problem on GKE. Bouncing the cluster definitely fixes the issue (the namespaces are immediately terminated).

@linfan commented Aug 2, 2016

I believe this issue still exists in the v1.3 release.

Manually removing the finalizer doesn't seem to help.

$ curl -H "Content-Type: application/json" -X PUT --data-binary @temp.json http://ip-172-31-14-177:8080/api/v1/namespaces/limit/finalize
{
  "kind": "Namespace",
  "apiVersion": "v1",
  "metadata": {
    "name": "limit",
    "selfLink": "/api/v1/namespaces/limit/finalize",
    "uid": "caf5daa5-57f8-11e6-9e7e-0ad69bcef303",
    "resourceVersion": "10171",
    "creationTimestamp": "2016-08-01T15:01:14Z",
    "deletionTimestamp": "2016-08-02T04:30:24Z"
  },
  "spec": {},
  "status": {
    "phase": "Terminating"
  }
}

Several hours later, it still remains.

$ kubectl get namespaces
NAME                STATUS         AGE
...                 ...
limit               Terminating   13h

Only after I completely restarted the master server were all the "Terminating" namespaces gone...

@monaka commented Oct 10, 2016

Still happening in v1.4.0 as well...

$ kubectl get ns openmct -o json
{
    "kind": "Namespace",
    "apiVersion": "v1",
    "metadata": {
        "name": "openmct",
        "selfLink": "/api/v1/namespaces/openmct",
        "uid": "34124209-8e8d-11e6-8260-000d3a505da6",
        "resourceVersion": "11957259",
        "creationTimestamp": "2016-10-10T01:59:39Z",
        "deletionTimestamp": "2016-10-10T02:13:46Z",
        "labels": {
            "heritage": "deis"
        }
    },
    "spec": {
        "finalizers": [
            "kubernetes"
        ]
    },
    "status": {
        "phase": "Terminating"
    }
}
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"4", GitVersion:"v1.4.0", GitCommit:"a16c0a7f71a6f93c7e0f222d961f4675cd97a46b", GitTreeState:"clean", BuildDate:"2016-09-26T18:16:57Z", GoVersion:"go1.6.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"4", GitVersion:"v1.4.0+coreos.2", GitCommit:"672d0ab602ada99c100e7f18ecbbdcea181ef008", GitTreeState:"clean", BuildDate:"2016-09-30T05:49:34Z", GoVersion:"go1.6.3", Compiler:"gc", Platform:"linux/amd64"}

@jsloyer commented Oct 10, 2016

I'm hitting the error with 1.3.5 as well...

$kubectl version
Client Version: version.Info{Major:"1", Minor:"3", GitVersion:"v1.3.5", GitCommit:"b0deb2eb8f4037421077f77cb163dbb4c0a2a9f5", GitTreeState:"clean", BuildDate:"2016-08-11T20:29:08Z", GoVersion:"go1.6.2", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"3", GitVersion:"v1.3.6", GitCommit:"ae4550cc9c89a593bcda6678df201db1b208133b", GitTreeState:"clean", BuildDate:"2016-09-22T01:52:27Z", GoVersion:"go1.6.3", Compiler:"gc", Platform:"linux/amd64"}

@derekwaynecarr can we reopen this?

@monaka commented Oct 17, 2016

At least in my case, it might be an API issue...?

$ kubectl get ns | grep Terminating | wc -l
7

kube-apiserver:

E1017 01:55:35.954834       1 errors.go:63] apiserver received an error that is not an unversioned.Status: no kind "DeleteOptions" is registered for version "net.alpha.kubernetes.io/v1alpha1"
E1017 01:55:35.959011       1 errors.go:63] apiserver received an error that is not an unversioned.Status: no kind "DeleteOptions" is registered for version "net.alpha.kubernetes.io/v1alpha1"
E1017 01:55:36.772335       1 errors.go:63] apiserver received an error that is not an unversioned.Status: no kind "DeleteOptions" is registered for version "net.alpha.kubernetes.io/v1alpha1"
E1017 01:55:37.248079       1 errors.go:63] apiserver received an error that is not an unversioned.Status: no kind "DeleteOptions" is registered for version "net.alpha.kubernetes.io/v1alpha1"
E1017 01:55:38.254651       1 errors.go:63] apiserver received an error that is not an unversioned.Status: no kind "DeleteOptions" is registered for version "net.alpha.kubernetes.io/v1alpha1"
E1017 01:55:38.584616       1 errors.go:63] apiserver received an error that is not an unversioned.Status: no kind "DeleteOptions" is registered for version "net.alpha.kubernetes.io/v1alpha1"
E1017 01:55:39.171880       1 errors.go:63] apiserver received an error that is not an unversioned.Status: no kind "DeleteOptions" is registered for version "net.alpha.kubernetes.io/v1alpha1"

kube-controller-manager

E1017 01:50:36.002533       1 namespace_controller.go:163] no kind "DeleteOptions" is registered for version "net.alpha.kubernetes.io/v1alpha1"
E1017 01:50:36.040668       1 namespace_controller.go:163] no kind "DeleteOptions" is registered for version "net.alpha.kubernetes.io/v1alpha1"
E1017 01:50:37.066455       1 namespace_controller.go:163] no kind "DeleteOptions" is registered for version "net.alpha.kubernetes.io/v1alpha1"
E1017 01:50:37.102275       1 namespace_controller.go:163] no kind "DeleteOptions" is registered for version "net.alpha.kubernetes.io/v1alpha1"
E1017 01:50:38.229602       1 namespace_controller.go:163] no kind "DeleteOptions" is registered for version "net.alpha.kubernetes.io/v1alpha1"
E1017 01:50:38.602775       1 namespace_controller.go:163] no kind "DeleteOptions" is registered for version "net.alpha.kubernetes.io/v1alpha1"
E1017 01:50:39.181639       1 namespace_controller.go:163] no kind "DeleteOptions" is registered for version "net.alpha.kubernetes.io/v1alpha1"

@monaka commented Nov 7, 2016

In my case, a ThirdPartyResource had been left behind in etcd. The stuck namespaces were removed after deleting it like this:

etcdctl rm /registry/thirdpartyresources/default/network-policy.net.alpha.kubernetes.io
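
Before editing etcd directly, it may be worth confirming which ThirdPartyResources are still registered; a sketch, assuming a TPR-era (pre-1.8) cluster and the same etcd2-style registry keys used in the rm command above:

# list TPRs still known to the API server
kubectl get thirdpartyresources
# or inspect what is left under the etcd registry prefix
etcdctl ls --recursive /registry/thirdpartyresources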

@zhouhaibing089 (Contributor) commented Nov 22, 2016

The problem with thirdpartyresources is not the same as the original one; I think we need to create a new issue.

@zhouhaibing089 (Contributor) commented Nov 22, 2016

created: #37278

@pidah commented Nov 22, 2016

We are hitting this issue at the moment on 1.4.6.
Edit: actually our issue is #37278

@hectorj2f commented Jan 9, 2017

we are hitting this issue atm on 1.5.2

#37554
#37278

@hectorj2f commented Jan 17, 2017

I am using v1.5.2 and the problem seems to be fixed. I am able to delete namespaces.

@nikhita (Member) commented Mar 28, 2017

This is fixed in v1.5.2. Please see: #37278 (comment)

@paultiplady commented Oct 19, 2017

I'm hitting this issue on v1.6.10-gke.1 -- a namespace stuck in 'Terminating' for over a day. Looks like a regression. Manually deleting the finalizer fixed the problem. Here's a dump of my namespace data:

$ kubectl get ns review-bugfix-cb-response-logging-qwil-2210 -o yaml                                             
apiVersion: v1                                                                 
kind: Namespace                                                                
metadata:                                                                      
  creationTimestamp: 2017-10-17T19:45:14Z                                      
  deletionTimestamp: 2017-10-18T22:56:53Z                                      
  name: review-bugfix-cb-response-logging-qwil-2210                            
  resourceVersion: "53696648"                                                  
  selfLink: /api/v1/namespacesreview-bugfix-cb-response-logging-qwil-2210      
  uid: b1fd43a1-b373-11e7-96b9-42010a80000b                                    
spec:                                                                          
  finalizers:                                                                  
  - kubernetes                                                                 
status:                                                                        
  phase: Terminating                                                           

$ kubectl describe ns review-bugfix-cb-response-logging-qwil-2210                                                                                                                                                                                                               
Name:           review-bugfix-cb-response-logging-qwil-2210                    
Labels:         <none>                                                         
Annotations:    <none>                                                         
Status:         Terminating                                                    

No resource quota.                                                             

No resource limits.                                   

@krmayankk (Contributor) commented Oct 20, 2017

I am seeing the same on 1.7.4 as well. Is the root cause the presence of TPR resources?

@paultiplady commented Oct 20, 2017

I did have a TPR in all of my stuck namespaces.

@krmayankk (Contributor) commented Oct 22, 2017

@nikhita what are the symptoms to look for to confirm the terminating issue is indeed due to TPR resources? I am seeing this issue in 1.7.4 and want to rule that out. Any log line to look for?

@nikhita (Member) commented Nov 2, 2017

@krmayankk Sorry for the late reply. You can find some info here on how it is fixed: #37554 (comment).
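
As for concrete log lines, the namespace controller's errors are the main symptom; this is a sketch, assuming a control plane where kube-controller-manager runs as a pod in kube-system (the pod name below is hypothetical):

# errors like the "no kind \"DeleteOptions\" is registered for version ..." lines earlier in this
# thread mean the namespace controller cannot list/delete some (third-party) resource type
kubectl -n kube-system logs kube-controller-manager-master-0 | grep namespace_controller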

@nikhita (Member) commented Nov 2, 2017

Related #55002, #37554.

@attila123 commented Mar 10, 2018

Hi, I got the same problem with Kubernetes 1.9.0 run by Minikube v0.24.1.
I started too many pods, my laptop kept swapping and stopped responding, so I powered it off.
Upon restart lots of pods were stuck, but I could delete them with kubectl -n some_namespace delete pod --all --grace-period=0 --force (first I deleted all the deployments, services, etc.).
I also installed rook (0.7.0) and its namespace got stuck, even after a minikube restart (stop + start).

[vagrant@localhost ~]$ kubectl get ns rook -o json
{
    "apiVersion": "v1",
    "kind": "Namespace",
    "metadata": {
        "annotations": {
            "kubectl.kubernetes.io/last-applied-configuration": "{\"apiVersion\":\"v1\",\"kind\":\"Namespace\",\"metadata\":{\"annotations\":{},\"name\":\"rook\",\"namespace\":\"\"}}\n"
        },
        "creationTimestamp": "2018-03-09T16:13:08Z",
        "deletionTimestamp": "2018-03-10T07:54:03Z",
        "name": "rook",
        "resourceVersion": "29736",
        "selfLink": "/api/v1/namespaces/rook",
        "uid": "c1a4f8e6-23b4-11e8-8129-525400ad3b43"
    },
    "spec": {
        "finalizers": [
            "kubernetes"
        ]
    },
    "status": {
        "phase": "Terminating"
    }
}
[vagrant@localhost ~]$ kubectl delete ns rook --grace-period=0 --force
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
Error from server (Conflict): Operation cannot be fulfilled on namespaces "rook": The system is ensuring all content is removed from this namespace.  Upon completion, this namespace will automatically be purged by the system.

I needed to move on with my work, so I stopped minikube, stopped the host VM, and created a full clone of it in VirtualBox, so if there is anything I can check, any log, etc., I can provide it.
Then I ran minikube delete on my cluster. If I remember correctly, no resource was present in the rook namespace.

@davidmaitland commented Mar 15, 2018

@attila123 I was having the same issue. In the case of rook, it creates a finalizer which was causing the namespace to get stuck.

Directly from the cleanup docs:

The operator is responsible for removing the finalizer after the mounts have been cleaned up. If for some reason the operator is not able to remove the finalizer (ie. the operator is not running anymore), you can delete the finalizer manually.

kubectl -n rook edit cluster rook

Look for the finalizers element and delete the following line:

  - cluster.rook.io

Now save the changes and exit the editor. Within a few seconds you should see that the cluster CRD has been deleted and will no longer block other cleanup such as deleting the rook namespace.
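
A non-interactive equivalent of that edit might be a merge patch that empties the finalizer list; this is only a sketch, assuming the resource name from the quoted docs and that the rook cluster object accepts a JSON merge patch on this cluster version:

# clear metadata.finalizers on the rook cluster object so it (and then the namespace) can be deleted
kubectl -n rook patch cluster rook --type=merge -p '{"metadata":{"finalizers":[]}}'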

@iftachsc commented Apr 6, 2018

Following @derekwaynecarr's workaround - which helped me a lot - I wrote this script, which deletes all terminating projects; a sort of cleanup script. Please, someone tell me: I am on Red Hat OpenShift Enterprise 3.7 and facing this issue. It seems absurd that even if I just create a project and immediately delete it, I stumble on this issue. Isn't that crazy? The enterprise version costs a fortune.
Please just tell me I am not crazy.

This is the script (uses jq, tested with Red Hat OpenShift OCP 3.7):

# open an authenticated local proxy to the apiserver
kubectl proxy &
serverPID=$!
for row in $(oc get ns -o json | jq -r '.items[] | select(.status.phase=="Terminating") | .metadata.name'); do
    echo "force deleting namespace ${row}"
    oc project $row
    # delete whatever is left in the project
    oc delete --all all,secret,pvc > /dev/null
    oc get ns $row -o json > tempns
    # strip the "kubernetes" finalizer line from the dump (BSD/macOS sed; on GNU sed drop the '' argument)
    sed -i '' '/"kubernetes"/d' ./tempns
    # PUT the edited object to the finalize subresource through the local proxy
    curl --silent --output /dev/null -H "Content-Type: application/json" -X PUT --data-binary @tempns http://127.0.0.1:8001/api/v1/namespaces/$row/finalize
done
kill -9 $serverPID

@yevgeniyo commented May 10, 2018

Hi, I got this:

curl -H "Content-Type: application/json" -X PUT --data-binary @temp.json http://127.0.0.1:8001/api/v1/namespaces/cattle-system/finalize/
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "Operation cannot be fulfilled on namespaces \"cattle-system\": the object has been modified; please apply your changes to the latest version and try again",
  "reason": "Conflict",
  "details": {
    "name": "cattle-system",
    "kind": "namespaces"
  },
  "code": 409
}

UPDATE: for those who hit this, remove the following line from the JSON:

"resourceVersion": "somenumber",

and rerun the curl.
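
A sketch of making that edit non-interactively, assuming temp.json is the namespace dump described above and jq is available:

# drop resourceVersion so the PUT is not rejected with a 409 Conflict
jq 'del(.metadata.resourceVersion)' temp.json > temp-clean.json
curl -H "Content-Type: application/json" -X PUT --data-binary @temp-clean.json \
    http://127.0.0.1:8001/api/v1/namespaces/cattle-system/finalize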

@akram (Contributor) commented May 15, 2018

I faced this issue while running oadm diagnostics NetworkChecks.

The diagnostic created many projects and kept them even after a successful run.
Deletion was not possible, and the projects remained in the Terminating state.

The API call with curl worked for me as well.

@medanasslim commented May 16, 2018

I have the same problem. I restarted the virtual machine and also ran "kubectl delete ns ***** --force --grace-period=0".
The namespace concerned is still stuck in Terminating.

@yan234280533 (Contributor) commented Nov 1, 2018

I found that my event etcd had crashed. After I restarted the event etcd, the namespaces were deleted soon after.
