This repository has been archived by the owner on May 6, 2022. It is now read-only.

handle instance deletion that occurs during async provisioning or async update (#1587) #1708


jboyd01
Contributor

@jboyd01 commented Jan 31, 2018

Partially fixes #1587. This only covers deleting an Instance during async provisioning or update; it does not deal with async service bindings. I'll handle the bindings in a follow-up.

Don't bump the ReconciledGeneration after async provisioning or update of an instance if there is a pending deletion.
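As a rough illustration of that rule, here is a minimal Go sketch. The serviceInstance and instanceStatus types are simplified stand-ins for the ServiceInstance fields involved, and completeAsyncOperation and markForDeletion are hypothetical helpers, not functions from the service-catalog controller.

```go
package main

import "time"

// instanceStatus is a simplified stand-in for the ServiceInstance status;
// only the field relevant to this change is modeled.
type instanceStatus struct {
	ReconciledGeneration int64
}

// serviceInstance is a hypothetical, trimmed-down ServiceInstance.
type serviceInstance struct {
	Generation        int64
	DeletionTimestamp *time.Time // non-nil once a delete has been requested
	Status            instanceStatus
}

// markForDeletion stands in for the API server setting the deletion timestamp.
func markForDeletion(instance *serviceInstance) {
	now := time.Now()
	instance.DeletionTimestamp = &now
}

// completeAsyncOperation is a hypothetical hook run when an async provision
// or update finishes successfully. If a deletion is pending, the
// ReconciledGeneration is deliberately left as-is so the instance is not
// treated as fully reconciled and the delete still gets processed.
func completeAsyncOperation(instance *serviceInstance) {
	if instance.DeletionTimestamp != nil {
		return
	}
	instance.Status.ReconciledGeneration = instance.Generation
}
```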

@k8s-ci-robot added the cncf-cla: yes (indicates the PR's author has signed the CNCF CLA) and size/M (denotes a PR that changes 30-99 lines, ignoring generated files) labels Jan 31, 2018
@jboyd01 force-pushed the async-instance-prov-with-delete branch from 06ef7d2 to dcb55e1 January 31, 2018 14:23
@staebler
Contributor

@jboyd01 Don't we need to do the same thing for async updating?

if err := ct.client.ServiceInstances(ct.instance.Namespace).Delete(ct.instance.Name, &metav1.DeleteOptions{}); err != nil {
	t.Fatalf("failed to delete instance: %v", err)
}
ct.osbClient.PollLastOperationReaction = &fakeosb.PollLastOperationReaction{
Contributor


You cannot change the OSB client reactions while there are reconciliations that could be running. In general, don't change the reactions outside of setup. If the response needs to be dynamic, use a DynamicPollLastOperationReaction.
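One way to picture the suggestion, as a sketch only: install a single reaction during setup whose answer is computed on each poll, instead of replacing the reaction mid-test. The names below (fakeClient, dynamicPollReaction, PollLastOperation) are hypothetical stand-ins and do not reproduce the real fakeosb API.

```go
package main

// lastOperationState mirrors the OSB last-operation states used here.
type lastOperationState string

const (
	stateInProgress lastOperationState = "in progress"
	stateSucceeded  lastOperationState = "succeeded"
)

// dynamicPollReaction computes its response on every poll, so the test
// never has to swap reactions while reconciliations may be running.
type dynamicPollReaction func() lastOperationState

// fakeClient holds a reaction that is assigned once during setup and is
// only read afterwards.
type fakeClient struct {
	pollReaction dynamicPollReaction
}

// PollLastOperation asks the installed reaction for the current state.
func (c *fakeClient) PollLastOperation() lastOperationState {
	return c.pollReaction()
}
```

In a real test, whatever state the reaction reads would be shared with reconciliation goroutines, so it would need synchronization (for example a mutex or an atomic value).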

@k8s-ci-robot added the size/L label (denotes a PR that changes 100-499 lines, ignoring generated files) and removed the size/M label (denotes a PR that changes 30-99 lines, ignoring generated files) Feb 1, 2018
@jboyd01
Contributor Author

jboyd01 commented Feb 1, 2018

@staebler, thanks for the review and the Slack discussion. I agree with your points. I've pushed some updates here, but more is needed, so I'm marking this as a work in progress.

@jboyd01 changed the title from "handle instance deletion that occurs during async provisioning (#1587)" to "(WIP) handle instance deletion that occurs during async provisioning (#1587)" Feb 1, 2018
Contributor

@pmorie left a comment


Apparently there is no way to delete a review, so I'm editing this comment instead: I reviewed the PR in a sleep-deprived state, thinking it was supposed to do something else.

@jboyd01 force-pushed the async-instance-prov-with-delete branch from 7c92aa7 to c59d259 February 12, 2018 19:54
@jboyd01 force-pushed the async-instance-prov-with-delete branch from c59d259 to e30c78c February 13, 2018 20:26
@jboyd01 changed the title from "(WIP) handle instance deletion that occurs during async provisioning (#1587)" to "handle instance deletion that occurs during async provisioning or async update (#1587)" Feb 13, 2018
Contributor

@pmorie left a comment


LGTM, with the following reservations:

  • We should consider, in the future, storing in the status the generation that an operation is for
  • We also need this issue to be fixed for async binding
  • When we fix the issue for async binding, we probably ought to just store the in-progress generation, since async binding is alpha anyway
  • If we begin saving the in-progress generation for instances, we should determine whether a data migration is required

clearServiceInstanceCurrentOperation(instance)

if instance.DeletionTimestamp != nil {
Contributor


I think this is a fine workaround for now, but as a follow-up we should consider storing the generation that the in-progress operation is for, so that when an operation completes we can set the reconciled generation to the generation that started it.
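A minimal Go sketch of that follow-up, assuming a hypothetical InProgressGeneration status field (it does not exist in the current API; the other types and fields are simplified stand-ins):

```go
package main

// instanceStatus is a simplified stand-in. InProgressGeneration is the
// hypothetical field proposed in this review.
type instanceStatus struct {
	InProgressGeneration int64 // generation that started the current operation
	ReconciledGeneration int64
	AsyncOpInProgress    bool
}

type serviceInstance struct {
	Generation int64
	Status     instanceStatus
}

// startOperation records which generation the broker request is for.
func startOperation(instance *serviceInstance) {
	instance.Status.InProgressGeneration = instance.Generation
	instance.Status.AsyncOpInProgress = true
}

// finishOperation marks only the generation that started the operation as
// reconciled. Any change made while the operation was running leaves
// Generation ahead of ReconciledGeneration, so it is still processed.
func finishOperation(instance *serviceInstance) {
	instance.Status.ReconciledGeneration = instance.Status.InProgressGeneration
	instance.Status.AsyncOpInProgress = false
}
```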

Contributor


Agreed, we should store a correct reconciledGeneration 😀

Contributor


@jboyd01 leave a TODO then?

@pmorie added the LGTM1 label and removed the do-not-merge label Feb 13, 2018
	t.Fatalf("error waiting for instance to be updating asynchronously: %v", err)
}

if err := ct.client.ServiceInstances(ct.instance.Namespace).Delete(ct.instance.Name, &metav1.DeleteOptions{}); err != nil {
Contributor


I believe there's a race here between the update completing and the deletion timestamp getting set. Instead of basing the response of the last operation on the number of polls that occurred, I think we should use a boolean (shouldFinishPolling?) that we toggle from false to true after issuing the delete request.
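The suggested fix can be sketched as follows. pollState, finish, and lastOperationState are hypothetical test helpers, and the mutex is one assumed way to make the flag safe to read from reconciliation goroutines:

```go
package main

import "sync"

// pollState holds the shouldFinishPolling flag. The fake broker's
// last-operation reaction reads it; the test body flips it only after
// the delete request has been issued.
type pollState struct {
	mu                  sync.Mutex
	shouldFinishPolling bool
}

// finish is called by the test right after deleting the instance.
func (p *pollState) finish() {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.shouldFinishPolling = true
}

// lastOperationState is what the fake reaction would report. Until the
// delete has been issued it always says "in progress", regardless of how
// many polls have happened, so the update cannot complete before the
// deletion timestamp is set.
func (p *pollState) lastOperationState() string {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.shouldFinishPolling {
		return "succeeded"
	}
	return "in progress"
}
```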

Contributor


I concur with @kibbles-n-bytes on this. To be clear, the same race exists in the other test.

Contributor

@staebler left a comment


Generally looks good, with the same reservations as @pmorie and @kibbles-n-bytes from their reviews.


@jboyd01
Contributor Author

jboyd01 commented Feb 14, 2018

Thanks for the reviews, @kibbles-n-bytes and @staebler. Good point; that makes for a much better test. I've updated both tests.

state := osb.StateInProgress
select {
// nonblocking check for a message to finish async provisioning, return inProgress otherwise
case <-done:
Contributor


There might still be a race here. If a reconciliation starts and the deletion timestamp is then set, we'd pull true out of done but then fail to update the instance (because our copy of the resource is out of date). On the next reconciliation loop, the channel would be empty again.

I think a better gauge would be whether the channel is closed.
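The closed-channel check can be sketched in Go like this; lastOperationState and the string states are illustrative assumptions rather than code from the PR:

```go
package main

// lastOperationState reports the fake broker's state by checking whether
// done has been closed. A receive from a closed channel succeeds
// immediately every time, so once the test closes done, every later poll
// (including a retry after a failed status update) sees the operation as
// finished. A single value sent on the channel, by contrast, can be
// consumed by a reconciliation whose update then fails, leaving the
// channel empty again.
func lastOperationState(done <-chan struct{}) string {
	select {
	case <-done:
		return "succeeded"
	default: // nonblocking: done is still open
		return "in progress"
	}
}
```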

Contributor Author


@kibbles-n-bytes, would you be OK with instead moving the declaration and initialization of state := osb.StateInProgress up to the top of Run()? That way, reading from the channel toggles us to a done state, and we stay there for any additional polls.

Contributor

@staebler Feb 14, 2018


What about using an atomic int instead of a channel altogether? I think that makes it clearer, from a readability standpoint, what is going on.

The reaction function reads the atomic int. If the value is 0, the state is in progress; otherwise, the state is succeeded or failed. The Run function sets the atomic int to 1 after issuing the delete.
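A sketch of that atomic-int approach, using sync/atomic; opFinished, pollReaction, and markFinished are hypothetical names, and only the success state is modeled:

```go
package main

import "sync/atomic"

// opFinished plays the role of the atomic int described above:
// 0 means the broker operation is still running, 1 means it is done.
var opFinished int32

// pollReaction is what the fake broker's last-operation reaction would
// evaluate on each poll.
func pollReaction() string {
	if atomic.LoadInt32(&opFinished) == 0 {
		return "in progress"
	}
	return "succeeded"
}

// markFinished is what Run would call right after issuing the delete.
func markFinished() {
	atomic.StoreInt32(&opFinished, 1)
}
```

Compared with a channel, there is nothing to drain: every poll reads the same value until Run changes it.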


@@ -1474,6 +1483,13 @@ func (c *controller) processProvisionFailure(instance *v1beta1.ServiceInstance,
err = fmt.Errorf(failedCond.Message)
} else {
clearServiceInstanceCurrentOperation(instance)

if instance.DeletionTimestamp != nil {
Contributor


Could you extract this code into a separate method instead of copy-pasting it? That will make cleanup in a follow-up PR much easier.

@kibbles-n-bytes added this to the 0.1.8 milestone Feb 15, 2018
Labels
cncf-cla: yes (indicates the PR's author has signed the CNCF CLA), LGTM1, LGTM2, size/L (denotes a PR that changes 100-499 lines, ignoring generated files)
Development

Successfully merging this pull request may close these issues.

Deleting a ServiceInstance or ServiceBinding while async operation in progress fails
6 participants