Auto-restart scheduler in shoot maintenance #2756

Merged
rfranzke merged 1 commit into gardener:master from feature/auto-restart-scheduler
Aug 20, 2020

Conversation

rfranzke
Member

How to categorize this PR?

/area ops-productivity robustness
/kind enhancement
/priority normal

What this PR does / why we need it:
The kube-scheduler is now auto-restarted in the shoot maintenance time window, similar to other controllers.

Which issue(s) this PR fixes:
Fixes #2722

Special notes for your reviewer:
As #2731 is still under discussion, I'm making this change separately now and will rebase the PR later if necessary.

Release note:

The `kube-scheduler` is now auto-restarted in the shoot maintenance time window, similar to other controllers.

@rfranzke rfranzke requested a review from a team as a code owner August 19, 2020 13:48
@gardener-robot gardener-robot added area/ops-productivity Operator productivity related (how to improve operations) area/robustness Robustness, reliability, resilience related kind/enhancement Enhancement, improvement, extension priority/normal labels Aug 19, 2020
@rfranzke
Member Author

/invite @timuthy

Member

@timuthy timuthy left a comment

/lgtm

@ialidzhikov
Member

We initially started with a restart for the cloud-controller-manager, then did the same for kcm and mcm, and now this PR does it for the kube-scheduler as well. Doesn't this sound like we are burying/working around issues? Are there occurrences of a kube-scheduler Pod that "hangs" for some reason? If so, shouldn't we rather try to understand the root cause?

@rfranzke
Member Author

@ialidzhikov I don't think there are clear steps to reproduce the problems that we see (luckily very rarely). If you can help here, tackling the root cause first would be highly appreciated, sure, but until then this one-time auto-restart in the maintenance time window is a simple way to improve the ops experience.

@rfranzke rfranzke merged commit 89610c1 into gardener:master Aug 20, 2020
@rfranzke rfranzke deleted the feature/auto-restart-scheduler branch August 20, 2020 04:11
@gardener-robot gardener-robot added priority/3 Priority (lower number equals higher priority) and removed priority/3 Priority (lower number equals higher priority) labels Mar 8, 2021
Labels
area/ops-productivity Operator productivity related (how to improve operations) area/robustness Robustness, reliability, resilience related kind/enhancement Enhancement, improvement, extension
Development

Successfully merging this pull request may close these issues.

Auto-restart kube-scheduler in maintenance time window
7 participants