
Move scheduler Score plugin weight defaulting to API #106426

Open
damemi opened this issue Nov 15, 2021 · 27 comments
Labels

  • kind/feature: Categorizes issue or PR as related to a new feature.
  • needs-triage: Indicates an issue or PR lacks a `triage/foo` label and requires one.
  • sig/scheduling: Categorizes an issue or PR as relevant to SIG Scheduling.

Comments

@damemi
Contributor

damemi commented Nov 15, 2021

The scheduler framework instantiation currently handles defaulting the weights of Score plugins, as well as checking that the total weight is not more than the maximum:

https://github.com/kubernetes/kubernetes/blob/9dc0489/pkg/scheduler/framework/runtime/framework.go#L292-L306

This should be moved into the API defaulting itself so that misconfigurations are caught earlier and enforced more strictly.
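
For reference, here is roughly what that runtime code does today, as a paraphrased, self-contained sketch (the type and constant names below are illustrative stand-ins, not the real framework package):

```go
package main

import "fmt"

// Plugin is an illustrative stand-in for the scheduler config type:
// an enabled plugin name plus its score weight.
type Plugin struct {
	Name   string
	Weight int32
}

// Illustrative stand-ins for framework.MaxNodeScore and framework.MaxTotalScore.
const (
	maxNodeScore  int64 = 100
	maxTotalScore int64 = 100 * 1000
)

// defaultAndCheckScoreWeights defaults missing weights to 1 and rejects
// configurations whose aggregate maximum score would exceed the cap.
func defaultAndCheckScoreWeights(enabled []Plugin) (map[string]int, error) {
	weights := make(map[string]int, len(enabled))
	var totalPriority int64
	for _, e := range enabled {
		w := int(e.Weight)
		if w == 0 {
			w = 1 // a Score plugin with no explicit weight defaults to 1
		}
		weights[e.Name] = w
		totalPriority += int64(w) * maxNodeScore
		if totalPriority > maxTotalScore {
			return nil, fmt.Errorf("total score of Score plugins could overflow")
		}
	}
	return weights, nil
}

func main() {
	weights, err := defaultAndCheckScoreWeights([]Plugin{
		{Name: "NodeResourcesFit"},            // no weight set: defaults to 1
		{Name: "InterPodAffinity", Weight: 2}, // explicit weight is kept
	})
	fmt.Println(weights, err)
}
```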
/sig scheduling
/kind feature

@k8s-ci-robot added the sig/scheduling, kind/feature, and needs-triage labels Nov 15, 2021
@k8s-ci-robot
Contributor

@damemi: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@kerthcet
Member

I can help with this.
/assign

@NikhilSharmaWe
Member

@damemi I would like to collaborate. Could you please elaborate on which files would need to change to handle Score plugin weights in API defaulting?

@damemi
Contributor Author

damemi commented Dec 1, 2021

@NikhilSharmaWe it looks like @kerthcet has already offered to work on this, but feel free to collaborate with him. The code in question is located at https://github.com/kubernetes/kubernetes/blob/9dc0489/pkg/scheduler/framework/runtime/framework.go#L292-L306, and we are interested in moving it into the actual API defaulting, probably https://github.com/kubernetes/kubernetes/blob/9dc0489c1ae3540453d94348efcc536585c31805/pkg/scheduler/apis/config/v1beta2/defaults.go
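
For illustration, a minimal sketch of what the defaulting half might look like there (the function name is made up, and it assumes the versioned API's Plugin.Weight field is a *int32, as in v1beta2):

```go
// Hypothetical sketch, in the style of
// pkg/scheduler/apis/config/v1beta2/defaults.go.
// Assumed imports:
//   v1beta2 "k8s.io/kube-scheduler/config/v1beta2"
//   "k8s.io/utils/pointer"

// setDefaults_ScorePluginWeights is an illustrative name, not an existing
// function: any enabled Score plugin without an explicit weight gets 1,
// matching what the framework constructor applies at runtime today.
func setDefaults_ScorePluginWeights(profile *v1beta2.KubeSchedulerProfile) {
	if profile.Plugins == nil {
		return
	}
	for i := range profile.Plugins.Score.Enabled {
		e := &profile.Plugins.Score.Enabled[i]
		if e.Weight == nil {
			e.Weight = pointer.Int32(1)
		}
	}
}
```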

@kerthcet
Member

kerthcet commented Dec 2, 2021

Yes, I'm also working on this issue, but I haven't pushed any commits yet because I still have some questions; let's discuss them here. Correct me if I've misunderstood anything.

Since out-of-tree plugins are merged in here, I think moving the code to defaults.go would miss some plugins.

I'd prefer to move the code here instead, as we already validate the QueueSort plugins there. WDYT @damemi

@damemi
Contributor Author

damemi commented Dec 2, 2021

@kerthcet I don't think out-of-tree plugins will miss the validation if we move it to the API. This is going to be agnostic of the specific plugin types, validating only that a weight is set for each score plugin and that those weights don't add up to more than the max score. Another possible location could be here: https://github.com/kubernetes/kubernetes/blob/9dc0489c1ae3540453d94348efcc536585c31805/pkg/scheduler/apis/config/validation/validation.go

I also don't think moving the code to profile.go is what we want either. We're validating QueueSort plugins in that spot because we need to check across each individual profile ("framework") to make sure there are no conflicts; that check lives there because it runs after each framework/profile has been initialized. But we could check the score weights on individual profiles as they're created. Does that make sense?

Ultimately, the goal isn't just to move this code somewhere else. We specifically want it in API validation/defaulting to pick up that early, stronger validation. If we can't do that, we don't have much reason to move it elsewhere.
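
For illustration, a hypothetical sketch of the validation half in that style (the function name and the max-score constants are illustrative stand-ins, not existing code):

```go
// Hypothetical sketch, in the style of
// pkg/scheduler/apis/config/validation/validation.go.
// Assumed imports:
//   "k8s.io/apimachinery/pkg/util/validation/field"
//   config "k8s.io/kubernetes/pkg/scheduler/apis/config"

// Illustrative stand-ins for framework.MaxNodeScore and framework.MaxTotalScore.
const (
	maxNodeScore  int64 = 100
	maxTotalScore int64 = 100 * 1000
)

// validateScorePluginWeights is an illustrative name, not an existing
// function: it rejects negative weights and configurations whose weighted
// maximum score across all Score plugins would exceed the allowed total.
func validateScorePluginWeights(fldPath *field.Path, scoreSet config.PluginSet) field.ErrorList {
	var allErrs field.ErrorList
	var total int64
	for i, e := range scoreSet.Enabled {
		p := fldPath.Child("enabled").Index(i).Child("weight")
		if e.Weight < 0 {
			allErrs = append(allErrs, field.Invalid(p, e.Weight, "weight must be non-negative"))
			continue
		}
		w := int64(e.Weight)
		if w == 0 {
			w = 1 // mirror the defaulting step: an unset weight counts as 1
		}
		total += w * maxNodeScore
		if total > maxTotalScore {
			allErrs = append(allErrs, field.Invalid(p, e.Weight, "total weight of Score plugins could overflow the maximum total score"))
		}
	}
	return allErrs
}
```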

@kerthcet
Member

> @kerthcet I don't think out-of-tree plugins will miss the validation if we move it to the API. […] But we could check the score weights on individual profiles as they're created. Does that make sense?

Yes, I double-checked the code and found that I had misunderstood the intention of the merge() function. Thanks for the explanation; I'll push a commit tomorrow, along with some tests.

@kerthcet
Member

Hi @damemi, I found it's easy to handle the plugins in KubeSchedulerProfile.Plugins.Score.Enabled, since they all implement the Score extension point. However, for KubeSchedulerProfile.Plugins.MultiPoint.Enabled we can't tell which extension points a plugin implements, and only score plugins use these weights. Do you have any ideas?

@damemi
Contributor Author

damemi commented Dec 14, 2021

@kerthcet that is the challenge here, and I'm open to suggestions on how we could implement it. You might have to do a similar loop that tries to cast each of the MultiPoint plugins to a ScorePlugin and, if the cast succeeds, checks its weight.
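
Something along these lines, sketched with illustrative names; the interesting piece is the type assertion against framework.ScorePlugin, which is how the runtime can tell whether a MultiPoint plugin scores at all:

```go
// Hypothetical sketch: after the MultiPoint plugins have been instantiated,
// detect which of them also implement scoring via a type assertion, and
// apply weight defaulting only to those.
// Assumed imports:
//   config "k8s.io/kubernetes/pkg/scheduler/apis/config"
//   "k8s.io/kubernetes/pkg/scheduler/framework"

func collectMultiPointScoreWeights(
	enabled []config.Plugin, // KubeSchedulerProfile.Plugins.MultiPoint.Enabled
	pluginsMap map[string]framework.Plugin, // plugin name -> instantiated plugin (illustrative)
) map[string]int {
	weights := make(map[string]int)
	for _, e := range enabled {
		p, ok := pluginsMap[e.Name]
		if !ok {
			continue
		}
		// Only plugins that actually implement ScorePlugin use a weight.
		if sp, ok := p.(framework.ScorePlugin); ok {
			w := int(e.Weight)
			if w == 0 {
				w = 1 // same default as the dedicated Score extension point
			}
			weights[sp.Name()] = w
		}
	}
	return weights
}
```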

@kerthcet
Member

Thanks, I'll give it a try then.

@damemi
Contributor Author

damemi commented Jan 5, 2022

@kerthcet any updates or progress on this?

@kerthcet
Member

kerthcet commented Jan 6, 2022

Sorry, I just returned from the holidays. I talked with @NikhilSharmaWe; he is free to continue the work, and I can provide some help if needed.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Apr 6, 2022
@kerthcet
Member

kerthcet commented Apr 6, 2022

/remove-lifecycle stale

@k8s-ci-robot removed the lifecycle/stale label Apr 6, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Jul 5, 2022
@kerthcet
Member

kerthcet commented Jul 5, 2022

/remove-lifecycle stale

@k8s-ci-robot removed the lifecycle/stale label Jul 5, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Oct 3, 2022
@kerthcet
Member

kerthcet commented Oct 8, 2022

/remove-lifecycle stale

@k8s-ci-robot removed the lifecycle/stale label Oct 8, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Jan 6, 2023
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label Feb 5, 2023
@kerthcet
Member

kerthcet commented Feb 6, 2023

/remove-lifecycle rotten
Still helpful

@k8s-ci-robot removed the lifecycle/rotten label Feb 6, 2023
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label May 7, 2023
@kerthcet
Member

kerthcet commented May 8, 2023

/remove-lifecycle stale
/kind help

@k8s-ci-robot removed the lifecycle/stale label May 8, 2023
@k8s-ci-robot
Contributor

@kerthcet: The label(s) kind/help cannot be applied, because the repository doesn't have them.

In response to this:

/remove-lifecycle stale
/kind help

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Jan 19, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label Feb 18, 2024
@sanposhiho
Member

/remove-lifecycle rotten

I guess @kerthcet is working on it.

@k8s-ci-robot removed the lifecycle/rotten label Feb 20, 2024