-
Notifications
You must be signed in to change notification settings - Fork 38.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
In autoscaling tests, add PDBs for more kube-system pods #52796
In autoscaling tests, add PDBs for more kube-system pods #52796
Conversation
{label: "kubernetes-dashboard", min_available: 0}, | ||
{label: "l7-default-backend", min_available: 0}, | ||
{label: "heapster", min_available: 0}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure all of those can be safely restarted? I am rather worried about restarting heapster (or any other critical system pod) in e2e. We had enough pain with rescheduler tainting our nodes already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's hard to be sure, but:
- in tests with broken nodes, we don't select a node based on any of those pods, so we sometimes accidentally cause them to become unavailable and rescheduled anyway,
- we have timeouts and retry logic which should be enough to cover these scenarios if everything works as expected,
- if it doesn't work as expected, I think we should fail, even if it's not CA that is responsible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't necessarily agree with point 3 above. However, we discussed offline with @aleksandra-malinowska and it looks like restarting heapster should no longer break tests.
/lgtm |
/approve no-issue |
1 similar comment
/approve no-issue |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: MaciekPytel, aleksandra-malinowska, mwielgus Associated issue requirement bypassed by: mwielgus The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these OWNERS Files:
You can indicate your approval by writing |
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here.. |
…aling-test-fix-4 Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.. Improve cluster autoscaling tests logging and error checking during cleanup This adds extra logs and error checks to autoscaling tests during PodDisruptionBudgets cleanup. It should help with identifying flake causes. Follow up to kubernetes#52796
This adds PDBs for more kube-system pods in scale down tests. It should reduce flakes caused by evenly distributed system components blocking scale down of all nodes.