
Custom resources with finalizers can "deadlock" customresourcecleanup.apiextensions.k8s.io finalizer #60538

Closed
liggitt opened this issue Feb 28, 2018 · 35 comments
Labels: area/custom-resources, kind/bug, priority/critical-urgent, sig/api-machinery
Milestone: v1.10

Comments

liggitt (Member) commented Feb 28, 2018

/kind bug
/sig api-machinery
/assign @sttts @nikhita

  1. Create a CRD
  2. Create a custom resource with a finalizer
  3. Delete the custom resource, leaving it pending finalization
  4. Delete the CRD, leaving it pending InstanceDeletionInProgress

As soon as the CRD enters InstanceDeletionInProgress, writes to custom resource objects are disallowed.

That means the finalizer cannot be removed, and the custom resource cannot be deleted, which blocks deletion of the CRD.
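The two-sided nature of the hang can be sketched in a few lines of Python (a toy model, not Kubernetes code): the CRD's cleanup finalizer waits for every custom resource instance to be gone, while the terminating CRD rejects the very write needed to clear the instance's finalizer.

```python
# Toy model of the deadlock: the CRD's cleanup finalizer waits for all CR
# instances to be deleted, while the terminating CRD rejects the write
# needed to clear the CR's own finalizer.

crd = {"finalizers": ["customresourcecleanup.apiextensions.k8s.io"], "terminating": True}
cr = {"finalizers": ["foo.bar.baz"], "deletionTimestamp": "2018-02-28T02:38:19Z"}

def write_allowed(crd):
    # Pre-fix behavior: once the CRD is Terminating, all writes to its
    # custom resources are rejected (the fix was to allow update/patch).
    return not crd["terminating"]

def try_remove_cr_finalizer(crd, cr):
    if not write_allowed(crd):
        return "MethodNotAllowed"  # the finalizer can never be cleared...
    cr["finalizers"].clear()
    return "patched"

def crd_cleanup_can_finish(cr):
    # ...so the instance is never fully deleted, and the CRD's own
    # cleanup finalizer is never removed: a deadlock.
    return not cr["finalizers"]

print(try_remove_cr_finalizer(crd, cr))  # MethodNotAllowed
print(crd_cleanup_can_finish(cr))        # False
```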

crd.yaml:

```yaml
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: crontabs.stable.example.com
spec:
  group: stable.example.com
  version: v1
  scope: Namespaced
  names:
    plural: crontabs
    singular: crontab
    kind: CronTab
    shortNames:
    - ct
---
apiVersion: "stable.example.com/v1"
kind: CronTab
metadata:
  name: my-new-cron-object
  finalizers:
  - foo.bar.baz
spec:
  cronSpec: "* * * * /5"
  image: my-awesome-cron-image
```
```shell
$ kubectl create -f crd.yaml
customresourcedefinition.apiextensions.k8s.io "crontabs.stable.example.com" created
crontab.stable.example.com "my-new-cron-object" created

$ kubectl delete -f crd.yaml
customresourcedefinition.apiextensions.k8s.io "crontabs.stable.example.com" deleted
crontab.stable.example.com "my-new-cron-object" deleted
```

```shell
$ kubectl get -f crd.yaml -o yaml
apiVersion: v1
items:
- apiVersion: apiextensions.k8s.io/v1beta1
  kind: CustomResourceDefinition
  metadata:
    creationTimestamp: 2018-02-28T02:38:07Z
    deletionTimestamp: 2018-02-28T02:38:19Z
    finalizers:
    - customresourcecleanup.apiextensions.k8s.io
    generation: 1
    name: crontabs.stable.example.com
    namespace: ""
    resourceVersion: "1562"
    selfLink: /apis/apiextensions.k8s.io/v1beta1/customresourcedefinitions/crontabs.stable.example.com
    uid: 6877df14-1c30-11e8-9148-acbc32c1ca87
  spec:
    group: stable.example.com
    names:
      kind: CronTab
      listKind: CronTabList
      plural: crontabs
      shortNames:
      - ct
      singular: crontab
    scope: Namespaced
    version: v1
  status:
    acceptedNames:
      kind: CronTab
      listKind: CronTabList
      plural: crontabs
      shortNames:
      - ct
      singular: crontab
    conditions:
    - lastTransitionTime: 2018-02-28T02:38:07Z
      message: no conflicts found
      reason: NoConflicts
      status: "True"
      type: NamesAccepted
    - lastTransitionTime: 2018-02-28T02:38:07Z
      message: the initial names have been accepted
      reason: InitialNamesAccepted
      status: "True"
      type: Established
    - lastTransitionTime: 2018-02-28T02:38:19Z
      message: CustomResource deletion is in progress
      reason: InstanceDeletionInProgress
      status: "True"
      type: Terminating
- apiVersion: stable.example.com/v1
  kind: CronTab
  metadata:
    clusterName: ""
    creationTimestamp: 2018-02-28T02:38:07Z
    deletionGracePeriodSeconds: 0
    deletionTimestamp: 2018-02-28T02:38:19Z
    finalizers:
    - foo.bar.baz
    generation: 2
    name: my-new-cron-object
    namespace: default
    resourceVersion: "1563"
    selfLink: /apis/stable.example.com/v1/namespaces/default/crontabs/my-new-cron-object
    uid: 687ac990-1c30-11e8-9148-acbc32c1ca87
  spec:
    cronSpec: '* * * * /5'
    image: my-awesome-cron-image
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
```

```shell
$ kubectl patch crontab/my-new-cron-object -p '{"metadata":{"finalizers":[]}}' --type=merge
Error from server (MethodNotAllowed): the server does not allow this method on the requested resource (patch crontabs.stable.example.com my-new-cron-object)
```
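For context on why an empty list works as the patch body: `--type=merge` uses JSON Merge Patch (RFC 7386), in which a list in the patch replaces the target's list wholesale. A minimal sketch of the semantics (not the API server's actual code path):

```python
import json

# Minimal RFC 7386 JSON Merge Patch, enough for this patch shape:
# dicts merge recursively, everything else (including lists) is replaced.
def merge_patch(target, patch):
    if not isinstance(patch, dict):
        return patch
    result = dict(target) if isinstance(target, dict) else {}
    for k, v in patch.items():
        if v is None:
            result.pop(k, None)  # null deletes the key
        else:
            result[k] = merge_patch(result.get(k), v)
    return result

cr = {"metadata": {"name": "my-new-cron-object", "finalizers": ["foo.bar.baz"]}}
patch = json.loads('{"metadata":{"finalizers":[]}}')
print(merge_patch(cr, patch))
# {'metadata': {'name': 'my-new-cron-object', 'finalizers': []}}
```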
@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Feb 28, 2018
(A comment from @liggitt was marked as outdated.)

@liggitt liggitt changed the title Custom resources with finalizers can "deadlock" InstanceDeletionInProgress Custom resources with finalizers can "deadlock" customresourcecleanup.apiextensions.k8s.io finalizer Feb 28, 2018
nikhita (Member) commented Feb 28, 2018

An integration test for this would be a nice addition and a good task for new contributors.

@xmudrii can you work on this as your first issue? You can base your test on top of @liggitt's PR above.

Feel free to ping me on Slack if you need any help with this! :)

xmudrii (Member) commented Feb 28, 2018

@nikhita This sounds good to me. Thank you so much!

I'll take this one and I'll try to put up a PR soon.

tpepper (Member) commented Feb 28, 2018

What's the priority on this? High enough to push for current milestone?

liggitt (Member, Author) commented Mar 1, 2018

Yes. The PR will be picked to existing release branches as well

@liggitt liggitt added the priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. label Mar 1, 2018
@liggitt liggitt added this to the v1.10 milestone Mar 1, 2018
k8s-github-robot pushed a commit that referenced this issue Mar 1, 2018
Automatic merge from submit-queue (batch tested with PRs 60542, 60237). If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).

Allow update/patch of CRs while CRD is terminating

Fixes #60538

Update/patch need to be allowed so finalizers can act on custom resources for terminating CRDs

```release-note
Fixes potential deadlock when deleting CustomResourceDefinition for custom resources with finalizers
```
tamalsaha added a commit to voyagermesh/voyager that referenced this issue Mar 8, 2018
Also uses removed finalizers before deleting crds. kubernetes/kubernetes#60538
msftcoderdjw commented

> Having the same issue while deleting the CRD of the Splunk operator. Patching the CRD with empty finalizers throws an error like "invalid json". Any pointers to troubleshoot?

Experienced the same issue (invalid json) on Windows. Switching to execute on Linux works.
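A plausible explanation (an assumption; the thread doesn't capture the exact command) is shell quoting: cmd.exe does not treat single quotes as quoting characters, so the JSON payload reaches kubectl mangled. Building the payload programmatically sidesteps quoting entirely; the CRD name below is a placeholder.

```python
import json
import shlex

# Build the merge-patch payload without hand-quoting JSON in the shell.
patch = json.dumps({"metadata": {"finalizers": []}})
print(patch)  # {"metadata": {"finalizers": []}}

# On POSIX shells, the single-quoted form used throughout this thread is fine:
print("kubectl patch crd/foos.example.com -p " + shlex.quote(patch) + " --type=merge")
```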

prprasad2020 commented

To get around this, I had to remove the customresourcecleanup.apiextensions.k8s.io finalizer from the CRD, then recreate it and go about removing the finalizers from the custom resources:

```shell
# remove the CRD finalizer blocking on custom resource cleanup
$ kubectl patch crd/crontabs.stable.example.com -p '{"metadata":{"finalizers":[]}}' --type=merge

# now the CRD can be deleted (orphaning custom resources in etcd)
$ kubectl delete -f crd.yaml
customresourcedefinition.apiextensions.k8s.io "crontabs.stable.example.com" deleted
Error from server (NotFound): error when deleting "crd.yaml": the server could not find the requested resource (delete crontabs.stable.example.com my-new-cron-object)

# when the CRD is recreated, it resurfaces the existing custom resources
$ kubectl create -f crd.yaml
customresourcedefinition.apiextensions.k8s.io "crontabs.stable.example.com" created
Error from server (AlreadyExists): error when creating "crd.yaml": object is being deleted: crontabs.stable.example.com "my-new-cron-object" already exists

# the custom resources can now be edited to remove finalizers
$ kubectl patch crontab/my-new-cron-object -p '{"metadata":{"finalizers":[]}}' --type=merge
crontab.stable.example.com "my-new-cron-object" patched

# and now both custom resource and CRD can be deleted
$ kubectl delete -f crd.yaml
customresourcedefinition.apiextensions.k8s.io "crontabs.stable.example.com" deleted
Error from server (NotFound): error when deleting "crd.yaml": crontabs.stable.example.com "my-new-cron-object" not found
```

I was having an issue with ArgoCD's "applications.argoproj.io" CRD. I accidentally deleted the namespace and then the CRD would not delete. This workaround worked like a charm. Thank you!

```shell
kubectl patch crd/applications.argoproj.io -p '{"metadata":{"finalizers":[]}}' --type=merge
```

liggitt (Member, Author) commented Nov 19, 2021

> I was having an issue with ArgoCD's "applications.argoproj.io" CRD. I accidentally deleted the namespace and then the CRD would not delete. This workaround worked like a charm. Thank you!
>
> kubectl patch crd/applications.argoproj.io -p '{"metadata":{"finalizers":[]}}' --type=merge

Manually removing the finalizer from the CRD is strongly discouraged, since it can leave orphaned custom resource data in etcd. Were you unable to remove the finalizers from the custom resources without short-circuiting the CRD cleanup? Do you have any logs or captured output of the attempts to do that?

prprasad2020 commented Nov 22, 2021

> Manually removing the finalizer from the CRD is strongly discouraged, since it can leave orphaned custom resource data in etcd. Were you unable to remove the finalizers from the custom resources without short-circuiting the CRD cleanup? Do you have any logs or captured output of the attempts to do that?

When I accidentally deleted the namespace, I tried to find finalizers in the CRD, but there was no such field, so I executed the patch command and it got deleted. Then I deployed this CRD again and could not delete it again. Then I tried to edit it, found the finalizer, and edited it. Now everything is working fine. I'm not sure which logs I can provide; if you can explain, I'll be able to provide them. I'm trying this in my test cluster.

dariuszbz commented

Hi. I'm not sure why this was closed. Could you point me to the solution, please?

In my dev cluster I had a problem with:

```json
{
  "apiVersion": "http.keda.sh/v1alpha1",
  "kind": "HTTPScaledObject",
  "metadata": {
    "annotations": {
      "kubectl.kubernetes.io/last-applied-configuration": "{\"apiVersion\":\"http.keda.sh/v1alpha1\",\"kind\":\"HTTPScaledObject\",\"metadata\":{\"annotations\":{},\"name\":\"k8sdemoingress\",\"namespace\":\"k8sdemoingress\"},\"spec\":{\"host\":\"myhost.com\",\"replicas\":{\"max\":10,\"min\":5},\"scaleTargetRef\":{\"deployment\":\"k8sdemoingress\",\"port\":80,\"service\":\"k8sdemoingress\"}}}\n"
    },
    "creationTimestamp": "2022-01-12T15:24:47Z",
    "deletionGracePeriodSeconds": 0,
    "deletionTimestamp": "2022-01-12T15:46:38Z",
    "finalizers": [
      "httpscaledobject.http.keda.sh"
    ],
```

As I don't have much time to dig, I just removed the cluster :) LOL, and it worked :D

Seriously, is there a better solution to remove a deadlocked custom resource (or, in general, any resource)?

Thank You.

frosk1 commented Feb 14, 2022

!!!! IMPORTANT !!!!

This issue is still happening in 1.20.15_1568.

The former fix from @liggitt still works:

```shell
# remove the CRD finalizer blocking on custom resource cleanup
kubectl patch crd/name-of-the-CRD -p '{"metadata":{"finalizers":[]}}' --type=merge
```

liggitt (Member, Author) commented Feb 14, 2022

> The former fix from @liggitt still works:

I'll echo the warning and question from above:

> Manually removing the finalizer from the CRD is strongly discouraged, since it can leave orphaned custom resource data in etcd. Were you unable to remove the finalizers from the custom resources without short-circuiting the CRD cleanup? Do you have any logs or captured output of the attempts to do that?

The reason the finalizer is not being removed from the CRD automatically is that the custom resource instances are still there.

frosk1 commented Feb 14, 2022

I do not see any information on how to apply this:

> Were you unable to remove the finalizers from the custom resources without short-circuiting the CRD cleanup?

liggitt (Member, Author) commented Feb 14, 2022

1. Locate remaining instances of the custom resource (not the CRD itself). They should all have metadata.deletionTimestamp set, indicating the API server is trying to remove them as part of cleaning up the CRD:

   ```shell
   kubectl get $mycustomtype -A -o yaml
   ```

2. Find any that have metadata.finalizers set. That means something added a finalizer to that instance, indicating it wanted to block deletion of that instance, but has not yet removed its own finalizer:

   ```yaml
   apiVersion: mygroup.example.com/v1
   kind: MyCustomType
   metadata:
     name: foo
     deletionTimestamp: 2022-01-12T15:46:38Z
     finalizers:
     - myfinalizer.example.com
   ...
   ```

3. If it is safe to override the finalizer on those instances, remove the finalizers from the custom resource instances (not the CRD):

   ```shell
   kubectl patch $mycustomtype/foo -p '{"metadata":{"finalizers":[]}}' --type=merge
   ```

That will unblock deletion of those instances, which will unblock deletion of the CRD
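The hunt in step 2 can be scripted; a hypothetical helper (names and sample data are illustrative, not from this issue) that filters parsed `kubectl get <type> -A -o json` output down to the blocking instances:

```python
import json

# Sample data standing in for real `kubectl get <type> -A -o json` output.
sample = json.loads("""
{"items": [
  {"metadata": {"namespace": "default", "name": "foo",
                "deletionTimestamp": "2022-01-12T15:46:38Z",
                "finalizers": ["myfinalizer.example.com"]}},
  {"metadata": {"namespace": "default", "name": "bar",
                "deletionTimestamp": "2022-01-12T15:46:38Z"}}
]}
""")

def blocking_instances(obj_list):
    blocked = []
    for item in obj_list["items"]:
        meta = item["metadata"]
        if meta.get("finalizers"):  # a non-empty finalizer list blocks deletion
            blocked.append((meta.get("namespace", ""), meta["name"], meta["finalizers"]))
    return blocked

print(blocking_instances(sample))
# [('default', 'foo', ['myfinalizer.example.com'])]
```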

frosk1 commented Feb 14, 2022

Thx for the commands.

Interestingly, it seems this does not unblock the deletion. Maybe I forgot some custom resources, but I don't think so.

Indeed, after some time, overwriting the finalizer on the CRD solved the problem.

Is there a way to see the remaining etcd entries via kubectl?

frosk1 commented Feb 14, 2022

Addition:

I definitely forgot some custom resources. After removing the finalizers from all custom resources of the CRD, the deletion was unblocked!

Thanks again @liggitt.

There should be a feature: a kubectl command that

1. automatically finds all custom resources of a CRD,
2. sets their finalizers to [],
3. and then maybe also afterwards sets the finalizer of the CRD to [].

Something like a force delete from the client side. Do you think this is necessary?

liggitt (Member, Author) commented Feb 14, 2022

I'm pretty opposed to building a footgun like that into kubectl... there's no telling what force-removing the finalizers will leak or break in the controllers that added them.

liggitt (Member, Author) commented Feb 14, 2022

it would probably be useful to improve the status reported on the CRD about which specific custom resource instances remain that are blocking the deletion (https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apiextensions-apiserver/pkg/controller/finalizer/crd_finalizer.go#L232-L253)

frosk1 commented Feb 14, 2022

Yeah, I got that. ATM this issue should not be closed, because the deadlock in theory could still exist.

I am also thinking of something like a more sophisticated garbage collection that involves backtracing Kubernetes objects from the point of creation, takes the deletion timestamps into account, etc.

Ah yes, this is what I meant:

> it would probably be useful to improve the status reported on the CRD about which specific custom resource instances remain that are blocking the deletion (https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apiextensions-apiserver/pkg/controller/finalizer/crd_finalizer.go#L232-L253)

liggitt (Member, Author) commented Feb 14, 2022

> yeah I got that, atm this issue should not be closed, because the deadlock in theory still could exist.

This issue was specifically about a bug in the API server that would prevent updates to the CR objects:

> As soon as the CRD enters InstanceDeletionInProgress, writes to custom resource objects are disallowed.

Bec-k commented Mar 15, 2022

Still hitting deadlock in cert-manager v1.7.1 in Kubernetes v1.23.3

frosk1 commented Mar 24, 2022

Yes, of course. I did not see any new issue for the deadlock that is still happening, or any updates in a pull request for a suitable solution.

@liggitt any news?

patelriki13 commented

> 1. Locate remaining instances of the custom resource (not the CRD itself). [...]
> 3. If it is safe to override the finalizer on those instances, remove the finalizers from the custom resource instances (not the CRD):
>
>     kubectl patch $mycustomtype/foo -p '{"metadata":{"finalizers":[]}}' --type=merge
>
> That will unblock deletion of those instances, which will unblock deletion of the CRD.

After executing the patch command, I am getting the following error. Can anyone help me force delete the CRD?

```
Error from server (BadRequest): error decoding patch: json: cannot unmarshal string into Go value of type map[string]interface {}
```

My CRD job status is RESTARTING.
