-
Notifications
You must be signed in to change notification settings - Fork 387
Deleting a ServiceInstance or ServiceBinding while async operation in progress fails #1587
Comments
I've got several issues reported against this in OpenShift. Discussed with @staebler 2 possible approaches: A. At completion of async provisioning, if deletion timestamp is set, don't update the reconciled generation. B. Store the generation used when the async operation starts and make that the reconciled version when it completes successfully. Matt said deletion is the only pending operation that should cause this problem - api server prevents all others from getting in line while async is pending. The 2nd option may be the right long term, but for short term, I'm pursuing option A. @kibbles-n-bytes I'd like to get your review and thoughts on this for an immediate approach. |
I'm okay with (A) for the short term. I agree (B) would be ideal long-term, and would come in handy as well if we ever wanted to support concurrent updates. An additional alternative would be to change our logic to bail out of the poll and start attempting to deprovision. That could be done by adding more complex logic to (... though, deprovision requires a plan_id, and in the update case we're back to not knowing which one to send... argh! 😭 ) |
I prefer bailing out of the last-operation polling when the object is deleted. I wasn't sure whether we wanted to disobey the guidance in the OSB API spec that says the following (https://github.com/openservicebrokerapi/servicebroker/blob/master/spec.md#polling-last-operation).
|
I've got (A) implemented and tested and was just adding the integration test for it. I'm concerned about (C), my feeling is many brokers won't be able to abort immediately although as kibbles wrote, I'd expect them all to eventually finish the provisioning and then successfully execute the deprovision on the next pass. @staebler, @kibbles-n-bytes, do you feel strongly about (C) over (A)? |
@jboyd01 I am good with (A) being the short-term solution. As for (C), even if a broker did finish the provision before accepting the deprovision request, we would be no worse off then we would be if service-catalog waited until the provision completed. The only difference would be that service-catalog would be sending deprovision requests--to which the broker would respond with 422 Unprocessable Entity--instead of sending poll-last-operation requests. |
@jboyd01 Same; I'm good with (A), and we should definitely get your changes in for it. @staebler Interesting. I agree it seems like we're going against the guideline, though I think it's fine in this case. There's also precedent with the maximum retry duration for the Platform aborting a poll loop and starting to deprovision in order to orphan mitigate. I think we should clarify the spec though to make it clear that Platforms can decide to abort polling for a good reason, but that they shouldn't just abort at the first sign of a poll error. |
…nc update (#1587) (#1708) * handle instance deletion that occurs during async provisioning (#1587) * updated processUpdateServiceInstanceFailure(), processUpdateServiceInstanceSuccess (), and processProvisionFailure(), corrected and supplemented test cases. * added/refined tests for delete during async instance operations * reworked termination of async operation to avoid potential race condition * fix race condition with a channel * use atomic int for signaling end of async operation * add TODO and refactor common code
Reopening. pull #1708 only addressed instances, async bindings still has this same problem. Additionally, instances was addressed with a quick fix (don't bump the rec generation if there is a pending delete) vs the agreed upon correct fix (persist the async operation's generation and set that at the completion). |
How about (D): with Then we wouldn't need to store a generation separately as proposed in (B). |
@nilebox I'm not yet up to speed on all the changes required for 1747. On the surface I agree, this looks like a good solution that doesn't require persisting the pending generation. |
When an async operation is in progress for a ServiceInstance or ServiceBinding, the Controller prioritizes polling the last operation over deleting the resource. If the async operation completes after the user has deleted the resource, then the reconciled generation will get set to the incorrect generation--that being the generation after the delete instead of the generation before the delete. When the Controller attempts to update the status of the resource to set the current operation to deprovision or unbind, the update will be rejected by the API Server because a current operation cannot be set when the reconciled generation equals the current generation.
Here is an integration test that fails but should succeed.
The text was updated successfully, but these errors were encountered: