Allow dynamic change of scheduler's policy configuration #41600

Closed
bsalamat opened this issue Feb 16, 2017 · 27 comments
Labels
kind/feature: Categorizes issue or PR as related to a new feature.
lifecycle/rotten: Denotes an issue or PR that has aged beyond stale and will be auto-closed.
milestone/removed
priority/important-longterm: Important over the long term, but may not be staffed and/or may need multiple releases to complete.
sig/scheduling: Categorizes an issue or PR as relevant to SIG Scheduling.

Comments

@bsalamat
Member

bsalamat commented Feb 16, 2017

Is this a BUG REPORT or FEATURE REQUEST? (choose one): FEATURE REQUEST

Kubernetes currently supports "multiple schedulers" (users can run their own custom scheduler(s), see https://kubernetes.io/docs/admin/multiple-schedulers/), but running an entire separate scheduler is heavyweight, especially for common desires like using best-fit instead of the default spreading policy. There is a scheduler configuration file which allows users to selectively enable/disable predicate and priority functions, and to choose the weights of the enabled priority functions. But this file is read from local disk, so it is not flexible enough for run-time changes and may not be easily accessible in hosted solutions.
It would be great if scheduler configuration could be changed dynamically (probably by setting/changing a ConfigMap in an API call).
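
For context, the existing policy file is a small JSON document. A rough, illustrative example is shown below (the particular predicates, priorities, and weights are only examples); the same payload could just as well be stored in a ConfigMap and read through the API server:

```json
{
  "kind": "Policy",
  "apiVersion": "v1",
  "predicates": [
    {"name": "PodFitsResources"},
    {"name": "MatchNodeSelector"}
  ],
  "priorities": [
    {"name": "LeastRequestedPriority", "weight": 1},
    {"name": "ServiceSpreadingPriority", "weight": 1}
  ]
}
```

Switching from the default spreading behavior to best-fit would then be a matter of adjusting the priorities in this object (e.g., favoring MostRequestedPriority over LeastRequestedPriority) rather than deploying a second scheduler.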

One possible use case for this feature is to allow automatic change of scheduling policies when certain characteristics of the cluster change. For example, we could automatically switch the scheduling policy from spreading to best-fit when the user enables cluster autoscaler (https://kubernetes.io/docs/admin/cluster-management/#cluster-autoscaling) on an existing cluster.

@bgrant0607
Member

@bsalamat This is a dupe of #1627

@bgrant0607 bgrant0607 added the sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. label Feb 17, 2017
@bsalamat
Member Author

@bgrant0607 Yes, thanks for pointing it out. This issue specifically targets the scheduler's config, but I guess we can mark it a dupe and continue the discussion in #1627.

@davidopp
Member

I think it's worth keeping both issues open; this one is specific to the scheduler, even if it uses an approach decided in #1627.

@bsalamat
Member Author

@kubernetes/sig-api-machinery-misc @kubernetes/sig-scheduling-feature-requests

@timothysc
Member

In general, I've been wanting a SIGHUP reconfig on all kube components. Our initialization paths are really heavyweight at the moment, but IMHO we could probably start by rejigging things so that either a signal trap or an fstat change triggers a reconfig. I'm old-school, so I'll always prefer an explicit SIGHUP.

@jayunit100
Member

jayunit100 commented Feb 22, 2017

One way to do this would be:

  • put the config (which IIRC is already an API object) in etcd, i.e. via the apiserver (I think it may already be there anyway)
  • run a watch on it and dynamically reconfigure by default, so it's transparent to users (see the sketch after this list)
  • this style would also be generalizable to other components
  • the biggest advantage here is that admins can tune the scheduler by updating the config in real time, without having to touch or signal the scheduler at all
  • for the mad scientists in the audience: doing this could eventually allow the scheduler to auto-tune itself and learn from its decisions over the long term
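
A minimal sketch of the watch-based idea using client-go informers is below; the kube-system/scheduler-policy ConfigMap, its policy.cfg key, and the reconfigure hook are hypothetical names used only for illustration, not an actual implementation:

```go
// Sketch: watch a ConfigMap that holds the scheduler policy and reapply it on change.
// The ConfigMap name/namespace, the "policy.cfg" key, and reconfigure() are hypothetical.
package main

import (
	"log"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/cache"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// Shared informer factory scoped to the namespace holding the policy ConfigMap.
	factory := informers.NewSharedInformerFactoryWithOptions(
		client, 0, informers.WithNamespace("kube-system"))
	informer := factory.Core().V1().ConfigMaps().Informer()

	informer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) {
			cm := obj.(*corev1.ConfigMap)
			if cm.Name == "scheduler-policy" {
				reconfigure(cm.Data["policy.cfg"])
			}
		},
		UpdateFunc: func(_, newObj interface{}) {
			cm := newObj.(*corev1.ConfigMap)
			if cm.Name == "scheduler-policy" {
				// Re-parse the policy and swap it into the running scheduler.
				reconfigure(cm.Data["policy.cfg"])
			}
		},
	})

	stop := make(chan struct{})
	factory.Start(stop)
	factory.WaitForCacheSync(stop)
	select {} // block forever; a real component would tie this to its own lifecycle
}

// reconfigure is a placeholder for applying a new policy to the running scheduler.
func reconfigure(policyJSON string) {
	log.Printf("applying new scheduler policy (%d bytes)", len(policyJSON))
}
```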

@liggitt
Member

liggitt commented Feb 22, 2017

most components don't have access to etcd

@jayunit100
Member

(By etcd I meant the apiserver; updated accordingly.)

@bgrant0607
Member

See also #12245

@bsalamat bsalamat changed the title from "Allow dynamic change of scheduler configuration" to "Allow dynamic change of scheduler's policy configuration" on Mar 2, 2017
@bsalamat
Member Author

bsalamat commented Mar 2, 2017

@bgrant0607 Thanks for the link. This issue targets one of the scheduler's command-line config items, namely the scheduler's policy config (https://github.com/kubernetes/kubernetes/blob/master/pkg/apis/componentconfig/v1alpha1/types.go#L111). I changed the title to clarify that this is only about the scheduler's policy configuration.

@davidopp
Member

ref/ #28842

@bsalamat
Member Author

If you haven't seen this document already, please take a look and leave your comments:
https://docs.google.com/document/d/19AKH6V6ejOeIvyGtIPNvRMR4Yi_X8U3Q1zz2fgTNhvM

You must be a member of one of the following two groups to view and comment:
kubernetes-dev@googlegroups.com
kubernetes-sig-scheduling@googlegroups.com

@timothysc
Member

Just so it's clear... the proposal today is kill-and-restart vs. soft reconfig. We should probably spell out the two options and explore whether it makes sense to evaluate soft reconfig, albeit a tougher problem.

@bsalamat
Member Author

@timothysc In the document I alluded to the fact that we can apply the config without restarting, but soft reconfig was not the focus of the document. When implementing the feature, I will evaluate further what it would take to reconfigure without restarting. We should definitely think about soft reconfig as the next step.

@davidopp
Member

I hate to keep expanding the scope of this issue, but one other thing that might be worth considering is to make some kind of decision about how we view the scheduler extender. What I mean is: it was implemented in the very early days of Kubernetes, before we had many processes in place, and the feeling at the time was that we would not encourage people to use it except as a last resort. This is why we never wrote documentation for it. Essentially it was an "alpha feature," even though we never called it that. IIRC the main concern was that it would interfere with scheduling optimizations and, in general, would hurt scheduler latency and throughput (of course only when it is being used). But if it is something we now consider a "permanent" and full-fledged part of the default scheduler, then we should at least write documentation for it.

The reason I am mentioning it here is that the scheduler policy config is how you define the extender endpoint(s) and other attributes of the extender. If we decide that it really is "alpha" then we should make sure the config somehow makes that clear.

Possibly this belongs in a separate issue, but for now maybe just mentioning it here is enough.
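
For reference, the extender is declared in that same policy config. A rough, illustrative stanza is shown below (the URL is made up, and the exact field set should be checked against the v1 Policy API rather than taken from here):

```json
{
  "kind": "Policy",
  "apiVersion": "v1",
  "extenders": [
    {
      "urlPrefix": "http://extender.example.com/scheduler",
      "filterVerb": "filter",
      "prioritizeVerb": "prioritize",
      "weight": 1,
      "enableHttps": false
    }
  ]
}
```

If the extender stays "alpha", the config or its docs could call that out explicitly, e.g. by marking the extenders section as alpha/experimental.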

@bsalamat
Member Author

As I mentioned in the doc above, the final decision is to implement approach #2, mainly because of the advantages explained there.
There are scalability concerns when components that run many instances, e.g., the kubelet, decide to watch a number of API server objects. Such concerns do not exist for the scheduler, since a cluster usually has one, or at most a few, schedulers. The scheduler already watches many objects, so adding one more will not have a noticeable impact.

Please let me know if you have any objections.

@davidopp
Member

@roberthbailey

@roberthbailey
Contributor

@mikedanese

@timothysc timothysc modified the milestones: v1.8, v1.7 Jun 5, 2017
@bsalamat
Member Author

bsalamat commented Sep 7, 2017

kind/feature
priority/important-longterm

@k8s-github-robot

[MILESTONENOTIFIER] Milestone Removed

@bsalamat

Important:
This issue was missing labels required for the v1.8 milestone for more than 7 days:

kind: Must specify exactly one of [kind/bug, kind/cleanup, kind/feature].
priority: Must specify exactly one of [priority/critical-urgent, priority/important-longterm, priority/important-soon].

Removing it from the milestone.

Additional instructions are available here. The commands for adding these labels are documented here.

@k8s-github-robot k8s-github-robot removed this from the v1.8 milestone Sep 9, 2017
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 5, 2018
@bsalamat
Member Author

bsalamat commented Jan 5, 2018

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 5, 2018
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 5, 2018
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels May 5, 2018
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@Adirio
Contributor

Adirio commented Feb 4, 2019

What is the state of this?

Was the document ever published for the public, or only to the two Google Groups mentioned?

I'm interested in dynamically modifying the weights of the priority functions.

@Adirio
Contributor

Adirio commented Feb 5, 2019

@bsalamat tried to assign kind and priority here:

kind/feature
priority/important-longterm

/kind feature
/priority important-longterm

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. labels Feb 5, 2019