handle instance deletion that occurs during async provisioning or async update (#1587) #1708
handle instance deletion that occurs during async provisioning or async update (#1587) #1708
Conversation
06ef7d2
to
dcb55e1
Compare
@jboyd01 Don't we need to do the same thing for async updating? |
test/integration/controller_test.go
Outdated
if err := ct.client.ServiceInstances(ct.instance.Namespace).Delete(ct.instance.Name, &metav1.DeleteOptions{}); err != nil { | ||
t.Fatalf("failed to delete instance: %v", err) | ||
} | ||
ct.osbClient.PollLastOperationReaction = &fakeosb.PollLastOperationReaction{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You cannot change the OSB client reactions while there are reconciliations that could be running. In general, don't change the reactions outside of setup. If the response needs to be dynamic, use a DynamicPollLastOperationReaction
.
@staebler thanks for the review and slack discussion. I agree with your points, I've got some updates here, but more is needed. Marking as a work in progress |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apparently there is no way to delete a review; I'm changing this comment because I reviewed the PR thinking it was supposed to do something else during a sleep-deprived state.
…stanceSuccess (), and processProvisionFailure(), corrected and supplemented test cases.
7c92aa7
to
c59d259
Compare
c59d259
to
e30c78c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, with the following reservations:
- We should consider in the future whether to store the generation an operation is for in the status
- We also need this issue to be fixed for async binding
- When we fix the issue for async binding, we probably ought to just store the in-progress generation, since async binding is alpha anyway
- We should determine if we begin saving the in-progress generation for instances, whether a data migration is required
clearServiceInstanceCurrentOperation(instance) | ||
|
||
if instance.DeletionTimestamp != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a fine workaround for now but I do think that we should consider, as a follow-up to this, storing the generation that the in-progress operation is for, so that when we complete an operation we can set the reconciled generation correctly to the generation that started the operation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, we should store a correct reconciledGeneration
😀
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jboyd01 leave a TODO then?
t.Fatalf("error waiting for instance to be updating asynchronously: %v", err) | ||
} | ||
|
||
if err := ct.client.ServiceInstances(ct.instance.Namespace).Delete(ct.instance.Name, &metav1.DeleteOptions{}); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe there's a race here between the update completing and the deletion timestamp getting set. Instead of basing the response of the last operation on the number of polls that occurred, I think we should use a boolean (shouldFinishPolling
?) that we toggle from false to true after issuing the delete request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I concur with @kibbles-n-bytes on this. So that it's clear, the same race exists in the other test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally looks good, with the same reservations as @pmorie and @kibbles-n-bytes from their reviews.
t.Fatalf("error waiting for instance to be updating asynchronously: %v", err) | ||
} | ||
|
||
if err := ct.client.ServiceInstances(ct.instance.Namespace).Delete(ct.instance.Name, &metav1.DeleteOptions{}); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I concur with @kibbles-n-bytes on this. So that it's clear, the same race exists in the other test.
Thanks for the reviews. Good point and that makes a much better test @kibbles-n-bytes, @staebler, thanks. I've updated both tests. |
test/integration/controller_test.go
Outdated
state := osb.StateInProgress | ||
select { | ||
// nonblocking check for a message to finish async provisioning, return inProgress otherwise | ||
case <-done: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There might still be a race here. If a reconciliation starts, and then the deletion timestamp is set, we'd pull true
out of done
but then fail to update the instance (as our resource is out of date). The next reconciliation loop, we'd then be back to the channel being empty.
I think a better gauge would be if the channel is closed or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kibbles-n-bytes, are you ok with instead moving the declaration and initialization of state := osb.StateInProgress
up to the top of the Run() ? This way reading from the channel toggles us to a done state and we stay there for any additional polls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about using an atomic int instead of a channel at all? I think that makes it clearer what is going on from a readability standpoint.
The reaction function reads the atomic int. If the value is 0, then the state is in-progress. Otherwise, the state is succeeded or failed. The run function sets the atomic int to 1 after making the delete.
clearServiceInstanceCurrentOperation(instance) | ||
|
||
if instance.DeletionTimestamp != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, we should store a correct reconciledGeneration
😀
clearServiceInstanceCurrentOperation(instance) | ||
|
||
if instance.DeletionTimestamp != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jboyd01 leave a TODO then?
@@ -1474,6 +1483,13 @@ func (c *controller) processProvisionFailure(instance *v1beta1.ServiceInstance, | |||
err = fmt.Errorf(failedCond.Message) | |||
} else { | |||
clearServiceInstanceCurrentOperation(instance) | |||
|
|||
if instance.DeletionTimestamp != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you extract this code to a separate method instead of copy-pasting? it will make a cleanup in a follow-up PR much easier.
partially fixes #1587 (this only covers deleting an Instance during provisioning or update, it does not deal with async service bindings). I'll do the bindings in a follow up.
Don't bump the ReconciledGeneration after async provisioning or update of an instance if there is a pending deletion.