WIP: Force delete a service-catalog resource #751
Conversation
Nice work. Minor nit:
@@ -441,6 +442,11 @@ parameters:
     name: AUTO_ESCALATE
     value: "false"
+ - description: Allow the broker to prune services instances orphaned by failed deprovision
+   displayname: Force delete
+   name: FROCE_DELETE
misspelling
@rthallisey a couple of questions before we think about merging this:

Thanks
I think the idea of doing a forced cleanup is good. I'm not entirely sure if we need a force_delete to do this. I almost feel like this should be the normal behavior. We should try to deprovision more than once during an error. If those fail, we still delete from etcd and log all errors that happened during the deprovision.
Is there ever a time we do not want to delete from etcd?
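For illustration, here is a rough sketch of that retry-then-clean-up idea against the synchronous Deprovision path shown further down in this diff. The attempt count and the deleteInstanceRecord helper are hypothetical, not existing broker code; only apb.Deprovision, instance, log, and a.dao come from the diff itself.

```go
// Hypothetical sketch: retry the APB deprovision a few times, and if every
// attempt fails, still remove the instance record from etcd and log all errors.
const maxDeprovisionAttempts = 3 // illustrative value

var lastErr error
for attempt := 1; attempt <= maxDeprovisionAttempts; attempt++ {
	if _, lastErr = apb.Deprovision(&instance); lastErr == nil {
		break
	}
	log.Errorf("deprovision attempt %d failed: %v", attempt, lastErr)
}
if lastErr != nil {
	// Every attempt failed: still remove the record from etcd so the instance
	// is not orphaned, and make sure each failure is visible in the logs.
	// deleteInstanceRecord is a stand-in for whatever dao call does the removal.
	if delErr := deleteInstanceRecord(a.dao, &instance); delErr != nil {
		log.Errorf("failed to remove instance record from etcd: %v", delErr)
	}
}
```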
@@ -298,6 +298,7 @@ objects:
     ssl_cert_key: /etc/tls/private/tls.key
     ssl_cert: /etc/tls/private/tls.crt
     auto_escalate: ${AUTO_ESCALATE}
+    force_delete: ${FORCE_DELETE}
What if this value is not set here? Will it default to false? I would advocate not adding this to the simple template. If folks want this turned on they can edit it manually. I don't want this template to become unwieldy like the others.
It will be set to false by default from the params. See https://github.com/PhilipGough/ansible-service-broker/blob/b9f72c717d8ae3fe3321059daa2ee8c587c2e570/templates/simple-broker-template.yaml#L445-L448
I agree with @jmrodri on not adding this to the simple template if possible. The deploy-ansible-service-broker.template.yaml template is good for the more complicated stuff.
@@ -722,6 +724,10 @@ func (a AnsibleBroker) Deprovision(
 		log.Info("Synchronous deprovision in progress")
 		_, err = apb.Deprovision(&instance)
 		if err != nil {
+			if a.brokerConfig.ForceDelete {
+				log.Infof("Deprovision failed. Attempting to clean up related resources. User should ensure clean up is successful")
+				cleanupDeprovision(&instance, a.dao)
We're ignoring the err returned by cleanupDeprovision. At a minimum we need to log it so we can figure out what was really going on.
We do ignore the error from cleanupDeprovision but that function does take care of logging the error. Also we return the original error to the caller. Do you think this is sufficient?
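For what it's worth, a small sketch of how the hunk above could also surface the cleanup error at the call site while still propagating the original deprovision error. This assumes cleanupDeprovision returns an error as described in the comments; the surrounding return values of Deprovision are deliberately not shown.

```go
if err != nil {
	if a.brokerConfig.ForceDelete {
		log.Infof("Deprovision failed. Attempting to clean up related resources. User should ensure clean up is successful")
		// Capture the cleanup error here as well, even though cleanupDeprovision
		// logs internally, so this code path records both failures side by side.
		if cleanupErr := cleanupDeprovision(&instance, a.dao); cleanupErr != nil {
			log.Errorf("force delete cleanup failed: %v", cleanupErr)
		}
	}
	// The original deprovision error is still what gets returned to the caller.
}
```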
@jmrodri thanks for your input. So what I took from the original issue was that this should be a config change enabled by the developer to enhance developer workflow. If this option is
Re-reading the Open Service Broker API spec, I found:

I think you're right @jmrodri. I think the broker is not completely spec-compliant on deprovision, because we don't guarantee that all provisioned resources are cleaned up. I think this PR gets us most of the way there by allowing the service instance to be removed. I agree that this should be the default behaviour.
@rthallisey @jmrodri ok, thanks guys. Looks like I need to refactor then with the flag removed and just force this behaviour.
So having a think about this issue, and specifically this from the spec:

To me this means we should clean up all the k8s objects that get created as part of the provision. However, I see some issues with this. What happens if the deprovision playbook is broken? We have no way (currently) of tracking these resources. We won't have the context of what needs to be deleted. So going forward, my suggestion for this PR is as follows, if you guys agree on it.

Thoughts?
Retrying could have some additional challenges. The broker has no idea what's in the playbook and retrying could trigger some weird path in the playbook. I think retrying is something worth looking into in the future, but let's have a separate discussion for that.
👍. This will make it so we can at least re-provision the same service, and we'll error really loudly so that the user is aware of what happened.
Highlights from IRC discussion:
- Is there any benefit of avoiding the last operation endpoint? Might be worth continuing to process last_operation for logging.
- I think the provisioned service should get removed in the UI. I think that's the service instance resource.
- If the deprovision failed without cleaning up anything then it's up to the user to clean up.
- Ya let's test the Run function.
- You do need a reference to that object. Makes sense to keep it as is.
Sure, I was operating on the assumption that the playbook would be idempotent, but I think this assumption is not a correct one to make.

I came back to this today, and this is in fact only cleaned up when the last operation handler returns a success. So I'm wondering if it's sufficient to return 200 from the deprovision handler, as I suggested above, when the
+1 @philipgough APBs should be idempotent. My comment was not entirely accurate. But let's do retries separately so this PR is easier to review.
Returning a 200 from deprovision when force_delete is set makes sense to me.
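As a rough illustration of the behaviour being agreed on here (this is not the broker's actual handler code; the package, function name, and response bodies are assumptions for the sketch):

```go
package handler

import "net/http"

// writeDeprovisionResponse sketches the idea discussed above: when force_delete
// has already cleaned up the instance record synchronously, answer 200 so the
// service catalog removes the ServiceInstance immediately instead of polling
// last_operation; otherwise keep the asynchronous 202 + operation flow.
func writeDeprovisionResponse(w http.ResponseWriter, forceDeleted bool) {
	if forceDeleted {
		w.WriteHeader(http.StatusOK)
		w.Write([]byte(`{}`)) // synchronous deprovision success body
		return
	}
	w.WriteHeader(http.StatusAccepted)
	w.Write([]byte(`{"operation":"deprovisioning"}`)) // polled via last_operation
}
```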
@rthallisey I began writing unit tests for these changes today but realised I would need a further refactor to inject the

I spoke to @maleck13 about this, who pointed me at #619, which he is working on. See the DI for job here and the related test here. The majority of the work is already done, and after discussion @maleck13 felt it would be better to potentially merge this (if that's what you guys want to do) than me porting over his changes into this PR and causing lots of conflicts for his. Not sure how you want to progress, but at this point I can't do much more with this PR. Works as expected, I believe, with the service instance cleaned up immediately in the UI, and I was able to redeploy a failed APB.
@philipgough @eriknelson @shawn-hurley @rthallisey based on the discussion on this PR and the mailing list, I'd like to suggest we close this PR in favor of a proposal. Any objections?
That's fine with me. We can always reopen if we decide we want to use this solution.
+1. I would like to pull back and state the problem in a proposal.
Describe what this PR does and why we need it:
When a user deprovisions and the APB fails, they won't be able to redeploy without renaming the template or tearing down the environment. We want to allow the user to set some config which will help with the cleanup of resources left behind.
Changes proposed in this pull request:

- force_delete flag to broker config

Which issue this PR fixes (This will close that issue when PR gets merged):
fixes #666