-
Notifications
You must be signed in to change notification settings - Fork 104
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Wait for resource being deleted, retry reconciliation #1621
Conversation
f149fc6
to
d224c78
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
It would be great to have an integration test for this. Furthermore, at some point we may want to be able to configure the maximum backoff and consider an exponential backoff. But, IMO, the current solution is great and we should look at these options only once we run into limitations with the current approach of a linear backoff.
return reconcile.Result{} | ||
} | ||
lastUpdatedTime := instance.Status.PlanStatus[instance.Spec.PlanExecution.PlanName].LastUpdatedTimestamp.Time | ||
secondsBackoffCount := int(math.Min(60, timeNow().Sub(lastUpdatedTime).Minutes())) + 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Make the maximum backoff exactly one minute by changing this to:
secondsBackoffCount := int(math.Min(60, timeNow().Sub(lastUpdatedTime).Minutes())) + 1 | |
secondsBackoffCount := int(math.Min(59, timeNow().Sub(lastUpdatedTime).Minutes())) + 1 |
@nfnt oh yeah, integration test covering the regression is definitely a good idea. I'll work on that! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice Work!
left a comment and agree with Jan on time... nothing that would stop or slow this down! Really great addition!
// computeTheReconcileResult decides whether retry reconciliation or not | ||
// if plan was finished, reconciliation is not retried | ||
// for others it uses LastUpdatedTimestamp of a current plan | ||
// for plan updated less than a minute ago, the backoff would be a second, 10 minutes would be 10 seconds |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems challenging to read
for plan updated less than a minute ago, the backoff would be a second, 10 minutes would be 10 seconds maximum backoff is one minute
I think what is meant is the initial backoff is 1 sec and that 1 min is the max backoff. I can't make sense of 10 minutes would be 10 seconds
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to clarify, let me know if it's better
Signed-off-by: Alena Varkockova <varkockova.a@gmail.com>
d224c78
to
49790f5
Compare
What this PR does / why we need it:
This fixes a bug when a sequence of apply-delete-apply meant that the resource did not exist in the end because we did not wait for item to be deleted out of the cache of the client we're using. Details are in #1596
The solution introduces "health check" for deletion which queries client until it's deleted. Instead of poll & wait and blocking the worker thread we're relying on reconciliation loop being re-trigerred. We now reschedule reconciliation when plan is running to prevent stalling. A simple backoff mechanism implemented as well.
Fixes #1596