Force delete a service-catalog resource #666

Closed · rthallisey opened this issue Jan 19, 2018 · 10 comments
Labels: 3.11 | release-1.3 (Kubernetes 1.11 | Openshift 3.11 | Broker release-1.3), feature

Comments

@rthallisey
Contributor

Feature:

If you fail to deprovision a resource using the service-catalog and broker, the resource stays in the service-catalog and you can't delete it. This forces the user to rename the resource in the template, or redeploy the service-catalog and broker, in order to re-run an APB.

Solving this in every scenario isn't going to be in the broker's scope, but for a developer creating an APB, if there's a failure during deprovision then I think it's worth deleting the service instance object.

Here's a way we can handle the resource deletion in dev mode:

  1. Log that a deprovision failed
  2. If the broker sees a failed apb attached to the service-instance, clean it up.
  3. Log that the broker may not have cleaned up the application that the apb launched. Mention that it's up to the user to clean it up.
  4. Send a success status code to the service-catalog for the deprovision

This behaviour can be triggered by a broker config option: force_delete.
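
A rough sketch of how such a flag could gate the behaviour above (the Config fields, Broker type, and cleanupFailedAPB helper here are hypothetical illustrations, not the broker's actual API):

package broker

import "log"

// Hypothetical config shape; the real broker config may expose this differently.
type Config struct {
    ForceDelete bool
}

type ServiceInstance struct {
    ID string
}

type Broker struct {
    config Config
}

// handleDeprovisionFailure sketches the dev-mode behaviour described above:
// log the failure, clean up the failed APB and service instance record, warn
// that the launched application may still exist, and report success so the
// service-catalog can delete the ServiceInstance.
func (b *Broker) handleDeprovisionFailure(instance *ServiceInstance, deprovErr error) error {
    log.Printf("deprovision of instance %s failed: %v", instance.ID, deprovErr)
    if !b.config.ForceDelete {
        // Default behaviour: propagate the failure to the service-catalog.
        return deprovErr
    }
    log.Printf("force_delete enabled: cleaning up failed APB for instance %s", instance.ID)
    log.Printf("the application launched by the APB may not have been removed; cleanup is up to the user")
    b.cleanupFailedAPB(instance) // hypothetical helper
    // Returning nil lets the handler report the deprovision as successful.
    return nil
}

func (b *Broker) cleanupFailedAPB(instance *ServiceInstance) {
    // Placeholder: remove the failed APB pod and the stored service instance record.
}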

kubectl delete service-instance --force issue:
kubernetes-retired/service-catalog#1551

@rthallisey rthallisey added feature 3.10 | release-1.2 Kubernetes 1.10 | Openshift 3.10 | Broker release-1.2 labels Jan 19, 2018
@eriknelson
Contributor

eriknelson commented Jan 20, 2018

+1. If you think about the developer loop, developers will be trying to deploy a failing APB 90% of the time. We need to figure out a setup that accommodates this fact and provides useful feedback when it happens.

@philipgough
Contributor

@rthallisey Might take a go at implementing this if that's ok?

@jmrodri
Contributor

jmrodri commented Feb 1, 2018

I believe this falls under orphan mitigation.

https://github.com/openservicebrokerapi/servicebroker/blob/master/spec.md#orphans

So being able to send a success status to the catalog is important.

@rthallisey
Contributor Author

rthallisey commented Feb 1, 2018

@philipgough sure! Any issue that isn't assigned or doesn't have an open PR is fair game. Just assign yourself so folks are aware of who's working on it.

@philipgough
Contributor

philipgough commented Feb 1, 2018

@rthallisey, @jmrodri I've read the API spec with regard to orphan mitigation, so I understand we can clean up these failed objects within the broker. What I am unsure of is how this will be implemented from the client side using the current async flow.

From what I can see, we currently just return a 202 when the job is successfully kicked off, so according to the spec this should be seen as a failure, even if the provision results in success. So how is the client to handle this orphan mitigation? Any suggestions?
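
For reference, the async flow in the OSB spec works roughly like this: the broker answers the deprovision request with a 202 Accepted, and the catalog then polls the last_operation endpoint until the broker reports a terminal state ("succeeded" or "failed"). A minimal sketch of that polling endpoint (lookupJobState is a hypothetical stand-in for the broker's job-state storage, not its real API):

package handler

import (
    "encoding/json"
    "net/http"
)

// LastOperationResponse follows the OSB last_operation shape; state is one of
// "in progress", "succeeded", or "failed".
type LastOperationResponse struct {
    State       string `json:"state"`
    Description string `json:"description,omitempty"`
}

// lastOperation is the endpoint the catalog polls after receiving a 202 for an
// async operation. Whatever terminal state is returned here decides whether
// the catalog treats the deprovision as a success or a failure.
func lastOperation(w http.ResponseWriter, r *http.Request) {
    operationID := r.URL.Query().Get("operation")
    state := lookupJobState(operationID) // hypothetical helper

    w.Header().Set("Content-Type", "application/json")
    w.WriteHeader(http.StatusOK)
    json.NewEncoder(w).Encode(LastOperationResponse{
        State:       state,
        Description: "reported to the service-catalog on each poll",
    })
}

func lookupJobState(operationID string) string {
    // Placeholder: the broker would read the persisted job state here.
    return "in progress"
}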

@rthallisey
Contributor Author

@philipgough let's not worry about orphan mitigation just yet. I think we need to figure out exactly what the broker is doing when a deprovision fails. From there, we can figure out how we'll signal to the catalog to delete the serviceinstance and how we'll handle deleting apb resources.

@rthallisey
Contributor Author

rthallisey commented Feb 5, 2018

I did a little initial investigation. See if this helps @philipgough.

Here's my patch:

--- a/pkg/handler/handler.go
+++ b/pkg/handler/handler.go
@@ -434,15 +434,19 @@ func (h handler) deprovision(w http.ResponseWriter, r *http.Request, params map[
 
        resp, err := h.broker.Deprovision(serviceInstance, planID, nsDeleted, async)
 
+       log.Errorf("Logging Deprovision errors %s", err)
        if err != nil {
                switch err {
                case broker.ErrorNotFound:
+                       log.Error("ErrorNotFound")
                        writeResponse(w, http.StatusGone, broker.DeprovisionResponse{})
                        return
                case broker.ErrorBindingExists:
+                       log.Error("ErrorBindingExists")
                        writeResponse(w, http.StatusBadRequest, broker.DeprovisionResponse{})
                        return
                case broker.ErrorDeprovisionInProgress:
+                       log.Error("ErrorDeprovisionInProgress")
                        writeResponse(w, http.StatusAccepted, broker.DeprovisionResponse{})
                        return
                default:
@@ -450,8 +454,10 @@ func (h handler) deprovision(w http.ResponseWriter, r *http.Request, params map[
                        return
                }
        } else if async {
+               log.Error("Using async write response 202")
                writeDefaultResponse(w, http.StatusAccepted, resp, err)
        } else {
+               log.Error("ELSE write response 201")
                writeDefaultResponse(w, http.StatusCreated, resp, err)
        }
 }

When I ran the broker with this additional logging, I got the following output:
[2018-02-05T15:22:36.551Z] [ERROR] - Using async write response 202
After the deprovision runs:
[2018-02-05T15:22:36.988Z] [ERROR] - ErrorDeprovisionInProgress
This returns a 202 http.StatusAccepted to the catalog. Then, the deprovision_job errors but the handler never sees it.
[2018-02-05T15:22:47.722Z] [ERROR] - broker::Deprovision error occurred. Pod [ apb-475a0756-b980-427d-ab45-fc6d2eb49b7b ] failed with exit code [2]
There's no report from the handler, but the job state is logged as failed and nothing happens.

[2018-02-05T15:23:37.799Z] [DEBUG] - service_id: af83f445d6aae48fbb4f58cca8368593
[2018-02-05T15:23:37.799Z] [DEBUG] - plan_id: 3bec808b3f9187629c19eb04a535a72e
[2018-02-05T15:23:37.799Z] [DEBUG] - operation:  bda04429-0e47-4e25-b2d1-1a70894cdb00
[2018-02-05T15:23:37.8Z] [DEBUG] - state: failed
[2018-02-05T15:23:37.8Z] [DEBUG] - job state has an error. Assuming that any error here is human readable. err - Error occurred during deprovision. Please contact administrator if it persists.
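
Given that trace, one possible shape for force_delete is to map a failed deprovision job to a succeeded last_operation state when the option is enabled, so the catalog goes ahead and deletes the ServiceInstance. This is a sketch only; the state strings follow the OSB last_operation values, and the forceDelete plumbing is an assumption, not existing broker code:

package broker

import "log"

// reportedState decides what the last_operation endpoint tells the catalog.
// With force_delete enabled, a failed deprovision job is reported as succeeded
// so the ServiceInstance gets removed; the failure itself is only logged.
func reportedState(jobState string, forceDelete bool) string {
    if jobState == "failed" && forceDelete {
        log.Println("force_delete: reporting failed deprovision as succeeded; " +
            "resources launched by the APB may need manual cleanup")
        return "succeeded"
    }
    return jobState
}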

@philipgough
Contributor

philipgough commented Feb 5, 2018

Hey @rthallisey, yeah, I've already made a first pass at this. I'll do an initial test and push it up as soon as I get some free time, and we can go from there.

@rthallisey
Contributor Author

Awesome @philipgough! I look forward to reviewing it.

@rthallisey
Contributor Author

Temporary workaround to clean up a terminating namespace with a failed deprovisioned serviceinstance (it strips the kubernetes-incubator/service-catalog finalizer from the serviceinstance so the namespace can finish deleting):

for i in $(oc get projects  | grep Terminating| awk '{print $1}'); do echo $i; oc get serviceinstance -n $i -o yaml | sed "/kubernetes-incubator/d"| oc apply -f - ; done

@rthallisey rthallisey added 3.11 | release-1.3 Kubernetes 1.11 | Openshift 3.11 | Broker release-1.3 and removed 3.10 | release-1.2 Kubernetes 1.10 | Openshift 3.10 | Broker release-1.2 labels Apr 10, 2018