-
Notifications
You must be signed in to change notification settings - Fork 4.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Exit rolling updates when encountering specific errors #14194
Exit rolling updates when encountering specific errors #14194
Conversation
|
Welcome @jandersen-plaid! |
Hi @jandersen-plaid. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
I'd rather not add a flag for this. I think it is enough to inspect the returned error and return directly if it does not make sense to continue. |
I think I'd prefer this be the default or only option. /hold |
I believe the history was that the update on the IG would previously wait forever, not fail. |
40caf71
to
cb34dec
Compare
If a control plane IG fails, we already directly return an error. That is by far the most important behavior. If an IG fails and kOps keeps going to the next, and keeps going to the next and continues to gracefully drain and terminate nodes makes sense. But in the case of a validation error, it doesn't make sense to keep going as kOps won't succeed with the next IG either. |
b8e5031
to
a97a94b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good.
I think it is worth adding a brief explanation to the notes about the change in behavior. Something like:
* As of 1.24, kOps no longer hung indefinitely on eviction errors, but timed out after 15 minutes by default. After the timeout kOps will carry on to the next InstanceGroup. As of kOps 1.26, kOps will no longer carry on if a cluster validation error is encountered.
Signed-off-by: Jack Andersen <jandersen@plaid.com>
Signed-off-by: Jack Andersen <jandersen@plaid.com>
Signed-off-by: Jack Andersen <jandersen@plaid.com>
…exitable Signed-off-by: Jack Andersen <jandersen@plaid.com>
Co-authored-by: Ole Markus With <olemarkus@gmail.com>
…rs of err check Signed-off-by: Jack Andersen <jandersen@plaid.com>
Signed-off-by: Jack Andersen <jandersen@plaid.com>
Signed-off-by: Jack Andersen <jandersen@plaid.com>
27a5abb
to
58e9d38
Compare
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: olemarkus The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@johngmyers you still want to hold this one? |
/hold cancel |
/retest |
…194-origin-release-1.26 Automated cherry pick of #14194: Add a flag to rolling update to fail immediately on IG
…ick-of-#14194-origin-release-1.26 Automated cherry pick of kubernetes#14194: Add a flag to rolling update to fail immediately on IG
Fixes #14176
Add a new error to
instancegroups
which is returned when cluster validation times out. Then, check this error during cluster node and apiserver instancegroup rolls. If the cluster fails to be validated within the timeout then it is unlikely to succeed on subsequent node rolls, so the preference is to exit the cluster roll (to allow operators to fix the cluster validation issue).