Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wait for resource being deleted, retry reconciliation #1621

Merged
merged 1 commit into from
Jul 28, 2020

Conversation

alenkacz
Copy link
Contributor

What this PR does / why we need it:
This fixes a bug when a sequence of apply-delete-apply meant that the resource did not exist in the end because we did not wait for item to be deleted out of the cache of the client we're using. Details are in #1596

The solution introduces "health check" for deletion which queries client until it's deleted. Instead of poll & wait and blocking the worker thread we're relying on reconciliation loop being re-trigerred. We now reschedule reconciliation when plan is running to prevent stalling. A simple backoff mechanism implemented as well.

Fixes #1596

@alenkacz alenkacz force-pushed the av/no-change-lastupdate branch 2 times, most recently from f149fc6 to d224c78 Compare July 24, 2020 04:29
Copy link
Member

@nfnt nfnt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

It would be great to have an integration test for this. Furthermore, at some point we may want to be able to configure the maximum backoff and consider an exponential backoff. But, IMO, the current solution is great and we should look at these options only once we run into limitations with the current approach of a linear backoff.

return reconcile.Result{}
}
lastUpdatedTime := instance.Status.PlanStatus[instance.Spec.PlanExecution.PlanName].LastUpdatedTimestamp.Time
secondsBackoffCount := int(math.Min(60, timeNow().Sub(lastUpdatedTime).Minutes())) + 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Make the maximum backoff exactly one minute by changing this to:

Suggested change
secondsBackoffCount := int(math.Min(60, timeNow().Sub(lastUpdatedTime).Minutes())) + 1
secondsBackoffCount := int(math.Min(59, timeNow().Sub(lastUpdatedTime).Minutes())) + 1

@alenkacz
Copy link
Contributor Author

@nfnt oh yeah, integration test covering the regression is definitely a good idea. I'll work on that!

Copy link
Member

@kensipe kensipe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice Work!

left a comment and agree with Jan on time... nothing that would stop or slow this down! Really great addition!

// computeTheReconcileResult decides whether retry reconciliation or not
// if plan was finished, reconciliation is not retried
// for others it uses LastUpdatedTimestamp of a current plan
// for plan updated less than a minute ago, the backoff would be a second, 10 minutes would be 10 seconds
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems challenging to read

for plan updated less than a minute ago, the backoff would be a second, 10 minutes would be 10 seconds maximum backoff is one minute

I think what is meant is the initial backoff is 1 sec and that 1 min is the max backoff. I can't make sense of 10 minutes would be 10 seconds

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to clarify, let me know if it's better

Signed-off-by: Alena Varkockova <varkockova.a@gmail.com>
@alenkacz alenkacz merged commit f78cf0d into main Jul 28, 2020
@alenkacz alenkacz deleted the av/no-change-lastupdate branch July 28, 2020 10:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ConfigMap not being re-apply with simple plan
3 participants