Do not return errors from reconciliation when waiting for update job #224
Comments
Can I work on this?
Sure @imskr, I will assign you.
Thanks @Szymongib! After reading the description, I have some thoughts:
Am I right?
Hey @imskr, yeah, that seems like a fairly reasonable implementation. For (2), we would want to take the returned bool and have the reconciler queue up another reconcile loop after a fixed delay. Something like 5-10 seconds seems about right to me. The idea is that we don't want the operator to think there were too many errors and fall into delayed reconciliation checks, which it does once errors reach a certain threshold.
Yes, the general idea sounds right. We might, however, consider returning some struct representing the status from
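If a richer return value is preferred over a bare bool, one option is a small status struct. The sketch below is purely illustrative: `updateJobStatus`, its fields, and `nextAction` are hypothetical names, and the reconciler's decision is reduced to a descriptive string.

```go
package main

import "fmt"

// updateJobStatus is a hypothetical struct a check function could return
// instead of a bare bool.
type updateJobStatus struct {
	Finished  bool   // the job is no longer running
	Succeeded bool   // only meaningful when Finished is true
	Reason    string // detail for logs or events
}

// nextAction shows how a reconciler could branch on the status.
func nextAction(s updateJobStatus) string {
	switch {
	case !s.Finished:
		return "requeue after fixed delay"
	case s.Succeeded:
		return "update deployment"
	default:
		return "report failure: " + s.Reason
	}
}

func main() {
	fmt.Println(nextAction(updateJobStatus{}))
	fmt.Println(nextAction(updateJobStatus{Finished: true, Succeeded: true}))
	fmt.Println(nextAction(updateJobStatus{Finished: true, Reason: "image could not be verified"}))
}
```

A struct leaves room for extra detail (such as a failure reason) without changing the function signature again later.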
Summary
Before changes are rolled out to the Mattermost deployment, the update job is deployed to verify that the image is correct. The Operator waits for the job to complete before starting the deployment update.
From the reconciler's perspective, while the job is still running an error is returned to requeue the reconciliation request. If this error occurs enough times, in rare cases the error back-off can significantly delay rolling out the deployment.
We should modify the logic that waits for the update job to finish so that it returns an indication that the job is not yet finished, rather than an error, and use a constant requeue delay instead. For now it should be enough to return a `bool` alongside the error and propagate it from the `checkUpdateJob` function to the reconciler.