Allow dynamic change of scheduler's policy configuration #41600

Closed · bsalamat opened this issue on Feb 16, 2017 · 25 comments

@bsalamat (Contributor) commented Feb 16, 2017

Is this a BUG REPORT or FEATURE REQUEST? (choose one): FEATURE REQUEST

Kubernetes currently supports "multiple schedulers" (users can run their own custom scheduler(s); see https://kubernetes.io/docs/admin/multiple-schedulers/), but running an entire separate scheduler is heavyweight, especially for common desires like using best-fit instead of the default spreading policy. There is a scheduler policy configuration file that allows users to selectively enable/disable predicate and priority functions and to choose the weights of the enabled priority functions, but this file is read from local disk, so it is not flexible enough for run-time changes and may not be easily accessible in hosted solutions.
It would be great if the scheduler configuration could be changed dynamically (probably by setting/changing a ConfigMap via an API call).

One possible use case for this feature is to allow automatic change of scheduling policies when certain characteristics of the cluster change. For example, we could automatically switch the scheduling policy from spreading to best-fit when the user enables cluster autoscaler (https://kubernetes.io/docs/admin/cluster-management/#cluster-autoscaling) on an existing cluster.
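
For context, the on-disk policy file mentioned above (passed to kube-scheduler via --policy-config-file) looks roughly like the following; the dynamic-config idea is essentially about delivering the same content through the API instead of the local filesystem. The specific predicate and priority names are only illustrative, e.g. using MostRequestedPriority instead of LeastRequestedPriority to approximate best-fit, assuming that priority function is available in the running version:

```json
{
  "kind": "Policy",
  "apiVersion": "v1",
  "predicates": [
    {"name": "PodFitsResources"},
    {"name": "PodFitsHostPorts"},
    {"name": "MatchNodeSelector"},
    {"name": "NoDiskConflict"}
  ],
  "priorities": [
    {"name": "MostRequestedPriority", "weight": 1},
    {"name": "BalancedResourceAllocation", "weight": 1}
  ]
}
```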

@bgrant0607 (Member) commented Feb 17, 2017

@bsalamat This is a dupe of #1627

@bsalamat (Contributor) commented Feb 17, 2017

@bgrant0607 Yes, thanks for pointing that out. This is specifically targeting the scheduler's config, but I guess we can mark it a dupe and continue the discussion in #1627.

@davidopp (Member) commented Feb 18, 2017

I think it's worth keeping both issues open; this one is specific to the scheduler, even if it uses an approach decided in #1627.

@timothysc (Member) commented Feb 20, 2017

In general, I've been wanting a SIGHUP reconfig on all kube components. Our initialization paths are really heavyweight atm, but IMHO we could probably start by rejigging for either a signal trap or an fstat change = reconfig. I'm old-school, so I'll always prefer an explicit SIGHUP.
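
A minimal sketch of the signal-trap option, assuming a hypothetical reloadPolicyFromDisk() hook that would re-read the policy file and rebuild the predicate/priority set; the scheduler has no such hook today, so this is only an illustration of the shape:

```go
package main

import (
	"log"
	"os"
	"os/signal"
	"syscall"
)

// reloadPolicyFromDisk is a hypothetical hook that would re-read the
// --policy-config-file and swap in the new predicates/priorities.
func reloadPolicyFromDisk() {
	log.Println("SIGHUP received: re-reading policy config from disk")
}

func main() {
	hup := make(chan os.Signal, 1)
	signal.Notify(hup, syscall.SIGHUP)

	// Reconfigure every time SIGHUP is received.
	go func() {
		for range hup {
			reloadPolicyFromDisk()
		}
	}()

	select {} // the real scheduler would run its scheduling loop here
}
```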

@jayunit100 (Member) commented Feb 22, 2017

One way to do this would be:

  • put the config (which IIRC is already an API object) in etcd, i.e. via the apiserver (I think it may already be there anyway);
  • run a watch on it and dynamically reconfigure by default, so it's transparent to users (a rough sketch follows this list);
  • this style would also be generalizable to other components;
  • the biggest advantage here is that admins can tune the scheduler by updating config in real time without having to touch/signal the scheduler at all;
  • for the mad scientists in the audience: doing this would possibly allow the scheduler to auto-tune itself and learn from its decisions in the long term.
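
A rough sketch of that watch-based approach, assuming the policy JSON lives in a ConfigMap named scheduler-policy in kube-system under a policy.cfg key (all three names hypothetical) and assuming a recent client-go; error handling and watch re-establishment are omitted:

```go
package main

import (
	"context"
	"log"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// applyPolicy is a hypothetical hook that would parse the policy JSON and
// rebuild the scheduler's predicates, priorities, and extenders.
func applyPolicy(policyJSON string) {
	log.Printf("reconfiguring scheduler with %d bytes of policy", len(policyJSON))
}

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}

	// Watch only the hypothetical "scheduler-policy" ConfigMap.
	w, err := client.CoreV1().ConfigMaps("kube-system").Watch(context.TODO(), metav1.ListOptions{
		FieldSelector: "metadata.name=scheduler-policy",
	})
	if err != nil {
		log.Fatal(err)
	}

	for ev := range w.ResultChan() {
		cm, ok := ev.Object.(*corev1.ConfigMap)
		if !ok {
			continue
		}
		if policy, found := cm.Data["policy.cfg"]; found {
			applyPolicy(policy)
		}
	}
}
```

In practice this would more likely be a shared informer with resync and restart handling, but the overall shape (watch an API object, reconfigure on change) is the same.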
@liggitt (Member) commented Feb 22, 2017

Most components don't have access to etcd.

@jayunit100 (Member) commented Feb 23, 2017

(By etcd I meant the apiserver; updated accordingly.)

@bgrant0607 (Member) commented Mar 2, 2017

See also #12245

@bsalamat changed the title from "Allow dynamic change of scheduler configuration" to "Allow dynamic change of scheduler's policy configuration" on Mar 2, 2017

@bsalamat (Contributor) commented Mar 2, 2017

@bgrant0607 Thanks for the link. This issue targets one of the scheduler's command-line config items, namely the scheduler's policy config (https://github.com/kubernetes/kubernetes/blob/master/pkg/apis/componentconfig/v1alpha1/types.go#L111). I changed the title to clarify that this is only about the scheduler's policy configuration.

@davidopp (Member) commented Mar 14, 2017

Ref #28842

@bsalamat (Contributor) commented Mar 14, 2017

If you haven't seen this document already, please take a look and leave your comments:
https://docs.google.com/document/d/19AKH6V6ejOeIvyGtIPNvRMR4Yi_X8U3Q1zz2fgTNhvM

You must be a member of one of the two following groups to be able to see and comment:
kubernetes-dev@googlegroups.com
kubernetes-sig-scheduling@googlegroups.com

@timothysc (Member) commented Mar 15, 2017

Just so it's clear... the proposal today is death-and-restart vs. soft re-config. We should probably document the two options and explore whether it makes sense to evaluate soft reconfig, albeit a tougher problem.

@bsalamat (Contributor) commented Mar 15, 2017

@timothysc In the document I alluded to the fact that we can apply the config without restarting, but soft reconfig was not the focus of the document. When implementing the feature, I will evaluate it more to see what it takes to reconfig without restarting. We should definitely think about reconfig as the next step.

@davidopp (Member) commented Mar 20, 2017

I hate to keep expanding the scope of this issue, but one other thing that might be worth considering is to make some kind of decision about how we view the scheduler extender. What I mean is: it was implemented in the very early days of Kubernetes, before we had much process in place, and the feeling at the time was that we would not encourage people to use it except as a last resort. This is why we never wrote documentation for it. Essentially it was an "alpha feature," even though we never called it that. IIRC the main concern was that it would interfere with scheduling optimizations and in general would hurt scheduler latency and throughput (of course only when it is being used). But if it is something we now consider a "permanent" and full-fledged part of the default scheduler, then we should at least write documentation for it.

The reason I am mentioning it here is that the scheduler policy config is how you define the extender endpoint(s) and other attributes of the extender. If we decide that it really is "alpha" then we should make sure the config somehow makes that clear.

Possibly this belongs in a separate issue, but for now maybe just mentioning it here is enough.
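
For reference, extenders are declared in the same policy file, so any dynamic-config mechanism would carry them along. A policy with an extender section looks roughly like the following; the endpoint, verbs, and weight are placeholder values, and field names may vary slightly by version:

```json
{
  "kind": "Policy",
  "apiVersion": "v1",
  "predicates": [
    {"name": "PodFitsResources"}
  ],
  "priorities": [
    {"name": "LeastRequestedPriority", "weight": 1}
  ],
  "extenders": [
    {
      "urlPrefix": "http://127.0.0.1:12345/scheduler",
      "filterVerb": "filter",
      "prioritizeVerb": "prioritize",
      "weight": 5,
      "enableHttps": false
    }
  ]
}
```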

@bsalamat (Contributor) commented Mar 23, 2017

As mentioned in the above doc, the final decision is to implement approach #2. The main reason for picking this approach is its advantages as explained in the doc.
There are scalability concerns for components of the system that have many instances, e.g., the kubelet, if they decide to watch a number of API server objects. Such concerns do not exist for the scheduler, as there is usually one or at most a few schedulers. Schedulers already have watches on many objects, so adding one more object will not cause a noticeable impact.

Please let me know if you have any objections.

@bsalamat (Contributor) commented Sep 7, 2017

kind/feature
priority/important-longterm

@k8s-merge-robot (Contributor) commented Sep 9, 2017

[MILESTONENOTIFIER] Milestone Removed

@bsalamat

Important:
This issue was missing labels required for the v1.8 milestone for more than 7 days:

kind: Must specify exactly one of [kind/bug, kind/cleanup, kind/feature].
priority: Must specify exactly one of [priority/critical-urgent, priority/important-longterm, priority/important-soon].

Removing it from the milestone.

Additional instructions are available here. The commands available for adding these labels are documented here.

@k8s-merge-robot removed this from the v1.8 milestone on Sep 9, 2017

@fejta-bot commented Jan 5, 2018

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

@bsalamat (Contributor) commented Jan 5, 2018

/remove-lifecycle stale

@fejta-bot commented Apr 5, 2018

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@fejta-bot commented May 5, 2018

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale

@fejta-bot commented Jun 4, 2018

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
