Bug 1705649 : Cluster with halted master did not reschedule operators after 5m of being down #454
@ravisantoshgudimetla kube-apiserver is a static pod... I heard from @sjenning that static pods can't be evicted (ever). So I wonder whether this PR makes sense for static pods (KAS, KCM and KSM).
/lgtm
Static pods can be evicted (by the kubelet, if they're not critical pods), but we have decided not to apply tolerations to the static pods. I will remove the tolerations for them soon.
/hold
a19cd8c to 1ccf428
1ccf428 to 9ad60c1
/hold cancel
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: deads2k, mfojtik, ravisantoshgudimetla
/retest
@mfojtik @deads2k, seeing the above error in e2e-operator failures ^
/retest Please review the full test history for this PR and help us cut down flakes.
/cc @RobertKrawitz
/retest
As we get closer to release, please ensure code changes have a bug and the bug is associated in the PR title - follow the conventions described in previous emails about how to associate bugs with PRs. The PR title must be This PR didn't get correct title because it was
As of now, because operators carry infinite tolerations against all possible taints, they are not getting evicted from nodes that have a NoExecute taint on them. This PR tightens the conditions under which pods can be scheduled/evicted. The downside is that there is a very good chance that pods will be evicted from nodes that have certain conditions such as disk-pressure or memory-pressure, or taints added by other controllers (operators), etc. So please make sure that this change is OK with your operator/operand before merging this PR.
/cc @sjenning @smarterclayton
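To illustrate the description above, here is a minimal sketch of why a catch-all toleration keeps a pod on a node with a NoExecute taint, while a narrower, time-bounded toleration lets it be evicted. The `Taint` and `Toleration` types, the helper names, and the 120-second value are assumptions made for this illustration; they are simplified local stand-ins, not the types from `k8s.io/api` and not the values this PR sets. Only the `Exists` operator is modelled, and the first matching toleration is used.

```go
package main

import "fmt"

// Taint is a simplified, hypothetical stand-in for a node taint.
type Taint struct {
	Key    string
	Effect string // "NoSchedule", "PreferNoSchedule" or "NoExecute"
}

// Toleration is a simplified, hypothetical stand-in for a pod toleration.
type Toleration struct {
	Key      string // empty key with Operator "Exists" matches every taint key
	Operator string // only "Exists" is modelled in this sketch
	Effect   string // empty effect matches every effect
	Seconds  *int64 // nil means "tolerate forever"
}

// tolerates reports whether a single toleration matches a taint.
func tolerates(tol Toleration, t Taint) bool {
	if tol.Operator != "Exists" {
		return false // "Equal" needs a value comparison, left out here
	}
	keyOK := tol.Key == "" || tol.Key == t.Key
	effectOK := tol.Effect == "" || tol.Effect == t.Effect
	return keyOK && effectOK
}

// evictionDelay models what happens to a pod on a node with a NoExecute taint.
// It returns (seconds, true) if the pod is evicted after that many seconds
// (0 means immediately), or (0, false) if the pod is never evicted.
func evictionDelay(tols []Toleration, t Taint) (int64, bool) {
	for _, tol := range tols {
		if tolerates(tol, t) {
			if tol.Seconds == nil {
				return 0, false // tolerated forever
			}
			return *tol.Seconds, true
		}
	}
	return 0, true // not tolerated at all
}

func main() {
	unreachable := Taint{Key: "node.kubernetes.io/unreachable", Effect: "NoExecute"}

	// Catch-all toleration: matches every taint, with no time limit.
	catchAll := []Toleration{{Operator: "Exists"}}

	// Narrow toleration; 120 is an assumed value for illustration only.
	secs := int64(120)
	bounded := []Toleration{{
		Key:      "node.kubernetes.io/unreachable",
		Operator: "Exists",
		Effect:   "NoExecute",
		Seconds:  &secs,
	}}

	d, evicted := evictionDelay(catchAll, unreachable)
	fmt.Println("catch-all:", d, evicted)

	d, evicted = evictionDelay(bounded, unreachable)
	fmt.Println("bounded:", d, evicted)
}
```

The same sketch also shows the downside mentioned in the description: a pod with only the narrow toleration does not match a taint with a different key (for example one added by another controller), so under this model it would be evicted from such a node immediately.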