Unable to delete project/namespace due to waiting for SBO's finalizers #476

Closed
pmacik opened this issue May 19, 2020 · 15 comments
Assignees
Labels
kind/bug Something isn't working v0.3.0

Comments

@pmacik
Contributor

pmacik commented May 19, 2020

After successfully executing the nodejs_postgresql scenario I attempted to delete the example's service-binding-demo project by executing the oc delete project service-binding-demo command.

I expected the project to be deleted within tens of seconds, together with the dependent resources created during the example (such as ImageStream, BuildConfig, Deployment, Database, ServiceBindingRequest, ...). But after ~10 minutes the project is still in the Terminating phase.

While inspecting the project I found it waiting for SBO's finalizers:

Some content in the namespace has finalizers remaining: finalizer.servicebindingrequest.openshift.io in 1 resource instances

apiVersion: project.openshift.io/v1
kind: Project
metadata:
  annotations:
    openshift.io/description: ""
    openshift.io/display-name: ""
    openshift.io/requester: system:admin
    openshift.io/sa.scc.mcs: s0:c24,c14
    openshift.io/sa.scc.supplemental-groups: 1000580000/10000
    openshift.io/sa.scc.uid-range: 1000580000/10000
  creationTimestamp: "2020-05-18T13:08:32Z"
  deletionTimestamp: "2020-05-18T14:39:43Z"
  name: service-binding-demo
  resourceVersion: "60881"
  selfLink: /apis/project.openshift.io/v1/projects/service-binding-demo
  uid: 412a7b80-7971-41ec-be02-5406390afd75
spec:
  finalizers:
  - kubernetes
status:
  conditions:
  - lastTransitionTime: "2020-05-18T14:39:49Z"
    message: All resources successfully discovered
    reason: ResourcesDiscovered
    status: "False"
    type: NamespaceDeletionDiscoveryFailure
  - lastTransitionTime: "2020-05-18T14:39:49Z"
    message: All legacy kube types successfully parsed
    reason: ParsedGroupVersions
    status: "False"
    type: NamespaceDeletionGroupVersionParsingFailure
  - lastTransitionTime: "2020-05-18T14:39:49Z"
    message: All content successfully deleted, may be waiting on finalization
    reason: ContentDeleted
    status: "False"
    type: NamespaceDeletionContentFailure
  - lastTransitionTime: "2020-05-18T14:39:49Z"
    message: 'Some resources are remaining: servicebindingrequests.apps.openshift.io
      has 1 resource instances'
    reason: SomeResourcesRemain
    status: "True"
    type: NamespaceContentRemaining
  - lastTransitionTime: "2020-05-18T14:39:49Z"
    message: 'Some content in the namespace has finalizers remaining: finalizer.servicebindingrequest.openshift.io
      in 1 resource instances'
    reason: SomeFinalizersRemain
    status: "True"
    type: NamespaceFinalizersRemaining
  phase: Terminating
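
For anyone triaging a similar stuck project, the blocking conditions can be pulled out of a status like the one above mechanically. This is a minimal sketch, assuming `jq` is installed; `blocking_conditions` is a name made up here for illustration, and it reads the project (or namespace) JSON from stdin:

```shell
# Print "<type>: <message>" for every condition whose status is "True".
# In the status above, those are the two "Remaining" conditions that
# describe what is still holding the project in Terminating.
# blocking_conditions is a hypothetical helper, not part of oc or SBO.
blocking_conditions() {
  jq -r '.status.conditions[]? | select(.status == "True") | "\(.type): \(.message)"'
}

# Usage against a live cluster (project name taken from this report):
#   oc get project service-binding-demo -o json | blocking_conditions
```
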
@pmacik pmacik changed the title Unable to delete namespace while waiting for SBO's finalazers Unable to delete namespace while waiting for SBO's finalizers May 19, 2020
@pmacik pmacik changed the title Unable to delete namespace while waiting for SBO's finalizers Unable to delete project/namespace while waiting for SBO's finalizers May 19, 2020
@pmacik pmacik changed the title Unable to delete project/namespace while waiting for SBO's finalizers Unable to delete project/namespace due to waiting for SBO's finalizers May 19, 2020
@pmacik pmacik added the kind/bug Something isn't working label May 19, 2020
@Avni-Sharma
Contributor

I believe this is related to #384 as well

@matthewpwilson

I've hit this too with my project. The status of the SBR is potentially interesting:

status:
  applications:
  - group: apps
    kind: Deployment
    name: barista-kafka
    version: v1
  conditions:
  - lastHeartbeatTime: "2020-05-22T10:11:29Z"
    lastTransitionTime: "2020-05-22T10:11:23Z"
    message: 'secrets "barista-kafka" is forbidden: unable to create new content in
      namespace coffeeshop-staging because it is being terminated'
    reason: BindingFail
    status: "False"
    type: Ready
  secret: barista-kafka

@isutton
Contributor

isutton commented May 25, 2020

This happens, from what I can see, because the operator is shut down before it receives the resource deletion event it needs in order to remove the finalizer and unblock the garbage collector.

I wonder if enabling the --force option when deleting the resource might work around the issue at hand, even if only temporarily during test scenarios.

@sbose78 sbose78 added the v0.3.0 label Jun 23, 2020
@sbose78
Member

sbose78 commented Jun 23, 2020

cc @pmacik @isutton could you please validate

@pmacik
Contributor Author

pmacik commented Jun 24, 2020

cc @sbose78 I've just tried again with SBO v0.1.1-307 and with OpenShift v4.5.0-rc.2 and it's still there.

@pmacik
Contributor Author

pmacik commented Jun 24, 2020

@isutton I even tried with the --force

$ oc delete project service-binding-demo --force
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.

but the workaround didn't work... the project is still in Terminating state after a while.
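
As seen above, `--force` on the project did not clear the finalizer on the SBR. A commonly used manual escape hatch in this situation is to clear the finalizer list on the stuck resource itself. The sketch below assumes that skipping the operator's own cleanup is acceptable (for example in a throwaway test namespace); `clear_finalizers` and the `OC` override variable are names invented here for illustration:

```shell
# Clear metadata.finalizers on one resource so namespace deletion can finish.
# WARNING: this skips whatever cleanup the operator's finalizer would have done.
# OC lets the client binary be swapped (e.g. for kubectl); it defaults to oc.
clear_finalizers() {
  kind="$1"; name="$2"; namespace="$3"
  "${OC:-oc}" patch "$kind" "$name" -n "$namespace" \
    --type=merge -p '{"metadata":{"finalizers":null}}'
}

# Usage (the SBR name is a placeholder):
#   clear_finalizers sbr <sbr-name> service-binding-demo
```
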

@Avni-Sharma
Contributor

I also see that the random suffix appended to the test-namespace is not changing for me even when I switch clusters from 4.4 to 4.5.
I get the same test namespace, test-namespace-32cea2af-8e79-46f8-a48e-6ce3b074e60c, irrespective of the cluster. Ideally the namespace name should change.

@pmacik
Contributor Author

pmacik commented Jun 30, 2020

@Avni-Sharma the name (the suffix) of the namespace is generated and stored in the out/test-namespace file and is used until the file is deleted (for example by running make clean).
That is why it persists across a change of cluster.
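
The behaviour described above amounts to generate-once-then-reuse caching. The following is a rough sketch of the idea only, not the project's actual Makefile logic; the function name `test_namespace` and the way the suffix is generated are assumptions:

```shell
# Return the cached test namespace name, generating it on first use.
# The default path mirrors out/test-namespace from the comment above; the
# suffix generation (8 random hex characters) is an assumption for illustration.
test_namespace() {
  file="${1:-out/test-namespace}"
  if [ ! -s "$file" ]; then
    mkdir -p "$(dirname "$file")"
    suffix=$(od -An -N4 -tx1 /dev/urandom | tr -d ' \n')
    printf 'test-namespace-%s\n' "$suffix" > "$file"
  fi
  cat "$file"
}
```

Calling it twice returns the same name no matter which cluster is targeted; once the file is removed, the next call generates a fresh name.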

@Avni-Sharma
Contributor

Hi @pmacik If we add an ownerReference of the specific project in the sbr, something like

metadata:
  name: binding-request
  namespace: myproject
  ownerReferences:
  - apiVersion: project.openshift.io/v1
    kind: Project
    name: some-namespace
    uid: 9075567a-a948-419a-ac28-639f4ca8a2b0

Then IMO this issue won't happen as the finalisers of the SBO will be deleted.
Something like this was discussed in issue #542, where an app deployment was the owner of the SBR resource: on deletion of the app deployment, the SBO finalisers were left waiting and never got deleted, and a fix was introduced for that.
Can you please try to provide an ownerRef of the project in the SBR resource and then delete the project to check whether this issue occurs?

@pmacik
Contributor Author

pmacik commented Jul 31, 2020

@Avni-Sharma I've set the ownerReference of the project in the SBR but the issue still persists. I've checked with the version installed via operator hub (service-binding-operator.v0.1.1-352) and with CRC (v4.5.1):

$ oc get project test-namespace-6c1a8e6a -o jsonpath={.metadata.uid}
2c528c6f-df53-4aa4-86c2-7d288a9a6ade
$ oc get sbr binding-request-a-d-s -n test-namespace-6c1a8e6a -o json | jq -r '.metadata.ownerReferences'
[
  {
    "apiVersion": "project.openshift.io/v1",
    "kind": "Project",
    "name": "test-namespace-6c1a8e6a",
    "uid": "2c528c6f-df53-4aa4-86c2-7d288a9a6ade"
  }
]
$ oc get project test-namespace-6c1a8e6a 
NAME                      DISPLAY NAME   STATUS
test-namespace-6c1a8e6a                  Terminating
$ oc get sbr binding-request-a-d-s -n test-namespace-6c1a8e6a -o jsonpath='{.metadata.finalizers}'
[finalizer.servicebindingrequest.openshift.io]

@Avni-Sharma
Contributor

That is strange as it works if the owner is a deployment 🤔 Maybe I am missing something with regard to the Namespace. Opened an issue operator-framework/operator-sdk#3625 for more input related to finalisers

@Avni-Sharma
Contributor

I hope that we are not bitten by kubernetes/kubernetes#73098

@pedjak
Contributor

pedjak commented Sep 8, 2020

Fixed in #639

@pedjak pedjak closed this as completed Sep 8, 2020
@matthewpwilson

@pedjak did you really mean to close this? It looks like #639 is a fix for the tests, so I don't see how it could resolve this issue with the operator itself, unless I'm missing something?

@pedjak
Contributor

pedjak commented Sep 9, 2020

@matthewpwilson our observation was that the issue happens when running the operator outside of the cluster and terminating it before removing the SBRs in the namespace. This is typical for dev and testing workflows, and #639 fixed those. Are you able to reproduce the issue on a production setup, when the operator runs inside the cluster? Please share your steps.
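
The ordering implied here (remove the SBRs while the operator is still running, and only then stop it) could look roughly like the following in a dev or test teardown. This illustrates the ordering only and is not necessarily what #639 implements; `teardown_namespace`, the `OC` override and the operator PID argument are hypothetical:

```shell
# Delete all SBRs and wait for them to be gone *before* stopping a
# locally running operator, then delete the namespace.
teardown_namespace() {
  namespace="$1"; operator_pid="$2"
  # --wait blocks until the SBRs are actually removed, which requires the
  # operator to still be running so it can drop its finalizer.
  "${OC:-oc}" delete sbr --all -n "$namespace" --wait=true || return 1
  # Only now stop the out-of-cluster operator process, if a PID was given.
  if [ -n "$operator_pid" ]; then
    kill "$operator_pid" 2>/dev/null || true
  fi
  "${OC:-oc}" delete namespace "$namespace"
}

# Usage (namespace from an earlier comment; the PID variable is a placeholder):
#   teardown_namespace test-namespace-6c1a8e6a "$OPERATOR_PID"
```
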


6 participants