
[Ansible-Operator] - Unable to deleted namespace when CR of an Ansible operator has a Finalizer #1513

Closed
camilamacedo86 opened this issue Jun 3, 2019 · 11 comments
Labels
design kind/feature Categorizes issue or PR as related to a new feature. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

camilamacedo86 (Contributor) commented Jun 3, 2019

Bug Report

  • Unable to delete a namespace when a CR of an Ansible operator has a finalizer.
  • It does not occur with Go operators.
  • It always occurs when the CR has a finalizer and we try to delete the namespace.
  • The Ansible operator is able to remove the finalizer from the metadata when the deletion is made directly on the CR rather than on the namespace (oc delete -f deploy/crds/cache_v1alpha1_memcached_cr.yaml).
  • The issue occurs because, when we run oc delete project <namespace>, the operator is deleted before the CR (the problem lies in how the deletion of the generated resources cascades).

What did you do?

  • Configure the finalizer in watches.yaml:
---
- version: v1alpha1
  group: cache.example.com
  kind: Memcached
  role: /opt/ansible/roles/memcached
  finalizer:
    name: finalizer.cache.example.com
  • Create a namespace and apply the example:
$ oc new-project memcached

$ kubectl create -f deploy/crds/cache_v1alpha1_memcached_crd.yaml
$ kubectl create -f deploy/service_account.yaml
$ kubectl create -f deploy/role.yaml
$ kubectl create -f deploy/role_binding.yaml
$ kubectl create -f deploy/operator.yaml
$ kubectl create -f deploy/crds/cache_v1alpha1_memcached_cr.yaml
  • Try to delete the namespace:
$ oc delete project memcached

What did you expect to see?
The CR, the operator and the namespace are all deleted successfully.

What did you see instead? Under which circumstances?
The namespace is marked for deletion and the operator is deleted, but the CR is not. The remaining CR prevents the namespace from being deleted, so the namespace hangs.

  • Reason: the operator is deleted before the CR, so nothing is left to remove the finalizer from the CR's metadata.

  • Workaround: manually remove the finalizer from the CR's metadata, which the operator would have done had it not been deleted first. E.g. oc patch memcached example-memcached -p '{"metadata":{"finalizers": []}}' --type=merge

OR

  • Delete the CR before deleting the namespace, so the operator can still remove the finalizer: oc delete -f deploy/crds/cache_v1alpha1_memcached_cr.yaml
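The ordering problem described above can be reduced to a small model. The Go sketch below is purely hypothetical: Resource, removeFinalizer and deleteNamespace are illustrative names, not operator-sdk, client-go or Kubernetes APIs, and namespace deletion is collapsed into a single step. It only shows why a finalizer that no running controller removes keeps the CR (and therefore the namespace) from going away, and why clearing the finalizers by hand, as the oc patch workaround does, unblocks it.

```go
package main

import "fmt"

// Resource is a hypothetical, simplified stand-in for a Kubernetes object.
// It only tracks whether deletion was requested and which finalizers remain.
type Resource struct {
	Name              string
	DeletionRequested bool
	Finalizers        []string
}

// Gone reports whether the object would actually be removed: deletion must
// have been requested and every finalizer must have been cleared.
func (r *Resource) Gone() bool {
	return r.DeletionRequested && len(r.Finalizers) == 0
}

// removeFinalizer returns a copy of finalizers without any entry equal to
// name, mimicking what the operator does after its cleanup logic has run.
func removeFinalizer(finalizers []string, name string) []string {
	var out []string
	for _, f := range finalizers {
		if f != name {
			out = append(out, f)
		}
	}
	return out
}

// deleteNamespace is a hypothetical one-step model of namespace deletion:
// the CR is marked for deletion, and its finalizer is only cleared if the
// operator is still running when the CR is processed. It reports whether
// the namespace could finish terminating (i.e. whether the CR is gone).
func deleteNamespace(cr *Resource, operatorRunning bool, finalizer string) bool {
	cr.DeletionRequested = true
	if operatorRunning {
		cr.Finalizers = removeFinalizer(cr.Finalizers, finalizer)
	}
	return cr.Gone()
}

func main() {
	const fin = "finalizer.cache.example.com"

	// Operator pod removed before the CR: the finalizer is never cleared.
	cr := &Resource{Name: "example-memcached", Finalizers: []string{fin}}
	fmt.Println("operator deleted first, namespace finished:", deleteNamespace(cr, false, fin))
	// prints "operator deleted first, namespace finished: false"

	// Workaround: clear the finalizers manually, as the oc patch command does.
	cr.Finalizers = nil
	fmt.Println("after manual patch, CR gone:", cr.Gone())
	// prints "after manual patch, CR gone: true"

	// CR processed while the operator is still running.
	cr2 := &Resource{Name: "example-memcached", Finalizers: []string{fin}}
	fmt.Println("operator still running, namespace finished:", deleteNamespace(cr2, true, fin))
	// prints "operator still running, namespace finished: true"
}
```

The model makes the dependency explicit: deletion of the CR is gated only on its finalizer list being empty, so whichever actor is responsible for emptying it (the operator, or a human running the patch) must still be around when the CR is deleted.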

Environment

  • operator-sdk version: 0.8.1
  • go version: go version go1.12.5 darwin/amd64
  • Kubernetes version information:
Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.4", GitCommit:"c27b913fddd1a6c480c229191a087698aa92f0b1", GitTreeState:"clean", BuildDate:"2019-03-01T23:34:27Z", GoVersion:"go1.12", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"11+", GitVersion:"v1.11.0+d4cacc0", GitCommit:"d4cacc0", GitTreeState:"clean", BuildDate:"2019-05-02T11:52:09Z", GoVersion:"go1.10.8", Compiler:"gc", Platform:"linux/amd64"}
  • Kubernetes cluster kind: Minishift
  • Are you writing your operator in ansible, helm, or go? Ansible

Additional context
The following screenshots illustrate the bug.

  • Namespace marked for deletion and held up by the CR with the finalizer.

[Screenshot 2019-05-31 at 10 22 53]

  • Note that the operator was deleted before the CR.

[Screenshot 2019-05-31 at 10 23 27]

  • Note that the CR is still there, with the finalizer metadata that the operator should have removed.

[Screenshot 2019-05-31 at 10 23 20]

NOTE: This issue was opened to clarify the scenario/bug raised in #1493.

camilamacedo86 (Contributor, Author) commented Jun 3, 2019

Hi @estroz,

I closed #1503 because I think it had too many comments that would not help us address this scenario.

Unfortunately, it seems my comments/descriptions there were not clear enough for @rcernich and @jmazzitelli, and/or they are facing a different issue that should be reported separately.

Please, could you add the respective label to this issue? Thank you very much for your time and support.

@camilamacedo86 camilamacedo86 changed the title CR with finalizer hang when the namespace is deleted because of Ansible operator is allowing the deletion of the operator before the deletion of the CR to be accomplished. [Ansible-Operator] CR with finalizer hang when the namespace is deleted because of the deletion of the operator occurs before of the CR to be deleted. Jun 3, 2019
@camilamacedo86 camilamacedo86 changed the title [Ansible-Operator] CR with finalizer hang when the namespace is deleted because of the deletion of the operator occurs before of the CR to be deleted. [Ansible-Operator] - Unable to deleted namespace when CR of an Ansible operator has a Finalizer Jun 3, 2019
rcernich commented Jun 3, 2019

@camilamacedo86, sorry for the confusion. This issue is not the problem I was describing, but it sounds like the issue @jmazzitelli was having. If you want me to create an issue for the problem I was describing, I can do that. As I mentioned in the other issue, the root cause of the problem I was describing was very difficult to track down. Once again, sorry for the confusion.

camilamacedo86 (Contributor, Author) commented Jun 3, 2019

Hi @rcernich, no problem at all.

IMHO, it is better to have one issue per problem :-) The clearer the information we provide, the easier and faster the fix can be 👍 So please feel free to raise your scenario in a new issue.

Thank you in advance for your understanding and collaboration.

@estroz estroz added the language/ansible Issue is related to an Ansible operator project label Jun 3, 2019
@robbie-demuth

I believe this is also a problem for Helm-based operators. Custom resources created from Helm CRDs have an uninstall-helm-release finalizer. If the custom resource is namespace-scoped, deleting the namespace deletes the controller, at which point the finalizer on the custom resources in the namespace can no longer be handled, causing the namespace to get stuck in a terminating state.

@shawn-hurley
Member

I believe it is a problem with all operators. Once the delete on the namespace is called, there is nothing the operator can do.

@shawn-hurley shawn-hurley added design kind/feature Categorizes issue or PR as related to a new feature. and removed language/ansible Issue is related to an Ansible operator project labels Jun 19, 2019
camilamacedo86 (Contributor, Author) commented Jul 10, 2019

Hi @shawn-hurley, it worked 100% with the Go operators. Note that the difference between the two is that, in the Go case where it works, the operator pod is not removed before the CR/CRD is deleted. I believe that is the root cause: in the Ansible case the operator is deleted before the CR, so it cannot remove the finalizer metadata from it, which causes the bug.

@openshift-bot

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci-robot openshift-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 8, 2019
camilamacedo86 (Contributor, Author) commented Oct 8, 2019

/remove-lifecycle stale

@openshift-bot

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci-robot openshift-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Nov 7, 2019
@openshift-bot

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-ci-robot

@openshift-bot: Closing this issue.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.


7 participants