
scheduler: provide an option to not shuffle internal node index when using percentageOfNodesToScore #95709

Closed
Huang-Wei opened this issue Oct 19, 2020 · 21 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling.

Comments

@Huang-Wei
Member

I've received a requirement from some users who run serverless workloads and care a lot about minimizing the number of running machines to reduce cost.

They've started using the NodeResourcesMostAllocated Score plugin (just as the Autoscaler does) in place of the default NodeResourcesLeastAllocated plugin, but it doesn't seem to be good enough. The problem is that when percentageOfNodesToScore is used, the scheduler internally maintains a node index that shifts the starting search position every time scoring begins. Suppose 500 nodes pass the Filtering phase and, due to percentageOfNodesToScore, we only score 100 of them. When scheduling Pod1, the search scope is [node1, node2, ..., node100]; when scheduling Pod2, the scope switches to [node101, node102, ..., node200]. This effectively balances workloads across the cluster, which contradicts the user's desire to pack workloads onto as few machines as possible.

So I'm proposing to provide an option to disable shuffling of the internal node index. If we reach consensus, the implementation may need to be coordinated with #93270, since we may want to make it a profile-level parameter.

/sig scheduling
/kind feature
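
To make the behavior above concrete, here is a minimal, self-contained sketch of the round-robin start index, with a hypothetical disableStartIndexShuffling knob standing in for the proposed option. All names here are illustrative, not the scheduler's actual code:

```go
// Minimal sketch of the behavior described above. The names (frameworkSketch,
// nextStartNodeIndex, numNodesToScore, disableStartIndexShuffling) are
// illustrative and do not exist in the real scheduler.
package main

import "fmt"

type frameworkSketch struct {
	nodes                      []string
	nextStartNodeIndex         int
	numNodesToScore            int  // derived from percentageOfNodesToScore
	disableStartIndexShuffling bool // the opt-out proposed in this issue
}

// searchWindow returns the slice of nodes evaluated in one scheduling cycle.
func (f *frameworkSketch) searchWindow() []string {
	start := f.nextStartNodeIndex
	if f.disableStartIndexShuffling {
		start = 0 // always scan from the front, favoring bin-packing
	}
	window := make([]string, 0, f.numNodesToScore)
	for i := 0; i < f.numNodesToScore; i++ {
		window = append(window, f.nodes[(start+i)%len(f.nodes)])
	}
	// Advance the index so the next pod starts where this one stopped,
	// which is what spreads pods across the cluster today.
	f.nextStartNodeIndex = (start + f.numNodesToScore) % len(f.nodes)
	return window
}

func main() {
	f := &frameworkSketch{
		nodes:           []string{"node1", "node2", "node3", "node4", "node5", "node6"},
		numNodesToScore: 2,
	}
	fmt.Println(f.searchWindow()) // [node1 node2]
	fmt.Println(f.searchWindow()) // [node3 node4]

	f.disableStartIndexShuffling = true
	f.nextStartNodeIndex = 0
	fmt.Println(f.searchWindow()) // [node1 node2]
	fmt.Println(f.searchWindow()) // [node1 node2] (same window every cycle)
}
```

With the knob off, consecutive pods get different windows and spread out; with it on, every pod sees the same front-of-list window, which is what a bin-packing score needs.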

@Huang-Wei Huang-Wei added the kind/feature Categorizes issue or PR as related to a new feature. label Oct 19, 2020
@k8s-ci-robot k8s-ci-robot added the sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. label Oct 19, 2020
@k8s-ci-robot
Contributor

@Huang-Wei: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Oct 19, 2020
@SataQiu
Member

SataQiu commented Oct 20, 2020

/cc
I'm willing to help with this if needed.

@ahg-g
Member

ahg-g commented Oct 20, 2020

Starting from index 0 will be expensive in the long run, because the first nodes will be full but will still be evaluated in every cycle.

I am wondering if node order should be a plugin; for bin-packing, the nodes would be ordered by how full they are. If we don't want to go that far, then instead of resetting the index every time, maybe start from the last selected node.
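
For what it's worth, a tiny sketch of the "start from the last selected node" variant, assuming a hypothetical nextStartIndex helper feeding the scheduler's start index (not an existing function):

```go
// Illustrative sketch only: nextStartIndex is a hypothetical helper, not an
// existing scheduler function.
package main

import "fmt"

// nextStartIndex resumes the next scan at the node that was just selected,
// instead of advancing by the number of nodes processed, so a bin-packing
// score keeps filling that node before the scan moves on.
func nextStartIndex(selectedNodeIndex, numNodes int) int {
	return selectedNodeIndex % numNodes
}

func main() {
	// Suppose the last pod landed on the node at index 42 out of 500.
	fmt.Println(nextStartIndex(42, 500)) // 42: the next cycle starts scanning there
}
```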

@adtac
Member

adtac commented Oct 20, 2020

At the risk of stating the obvious, would setting PercentageOfNodesToScore to 100% solve this for the user? Performance will take a hit, of course, but it might be worth benchmarking for their use-case.

So I'm proposing to provide an option to disable shuffling of the internal node index. If we reach consensus, the implementation may need to be coordinated with #93270, since we may want to make it a profile-level parameter.

I propose making this more generic. Instead of an option to just disable shuffling, we can create an enum option for the starting index:

  1. round robin, like we currently do
  2. always start at zero
  3. random index

The third option need not be added as a part of this issue, but making it an enum will make it more extensible in the future.
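
A rough sketch of how such an enum could look in the component config types; the type and constant names below are hypothetical, not an accepted API:

```go
// Hypothetical sketch of the proposed enum; none of these names exist in the
// kube-scheduler configuration API.
package config

// NodeScanStartPolicy controls where node evaluation begins each scheduling
// cycle when percentageOfNodesToScore is below 100.
type NodeScanStartPolicy string

const (
	// RoundRobinStart advances the start index every cycle (today's behavior).
	RoundRobinStart NodeScanStartPolicy = "RoundRobin"
	// ZeroStart always begins at index 0, favoring bin-packing.
	ZeroStart NodeScanStartPolicy = "Zero"
	// RandomStart picks a random start index each cycle (possible later addition).
	RandomStart NodeScanStartPolicy = "Random"
)
```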

Starting from index 0 will be expensive in the long run, because the first nodes will be full but will still be evaluated in every cycle.

Wouldn't those nodes be filtered out in the filter phase? If they cannot fit the pod, I don't see why they'd be a part of the scoring phase.

@ahg-g
Member

ahg-g commented Oct 20, 2020

Wouldn't those nodes be filtered out in the filter phase? If they cannot fit the pod, I don't see why they'd be a part of the scoring phase.

They would, but we are still paying the overhead of evaluating them in the filter phase.

@Huang-Wei
Member Author

Starting from index 0 will be expensive in the long run, because the first nodes will be full but will still be evaluated in every cycle.

It's a fair point. But even with the current logic, we cannot ensure that the search scope (starting at a shuffled nodeIndex) is optimal every time; it's random and utilization-agnostic. So overall I think the overhead is acceptable, and it can be mitigated by the automatically exposed {Memory|CPU|Disk}Pressure taints: nodes that are out of resources will be rejected early by the TaintToleration Filter plugin and hence won't cost much.

Another aspect, as I mentioned earlier, is to make this option profile-level, so it won't impact the global (or other profiles') behavior. This is similar to the discussion we had about profile-level percentageOfNodesToScore.

I am wondering if node order should be a plugin; for bin-packing, the nodes would be ordered by how full they are.

I'm not sure that's practical enough, as (1) the top-ordered nodes may fail the Filter phase due to capacity, and (2) we'd need to figure out how many nodes to order.

I propose making this more generic. Instead of an option to just disable shuffling, we can create an enum option for the starting index:

  1. round robin, like we currently do
  2. always start at zero
  3. random index

This makes sense to me. An enum is more extensible than a bool.

At the risk of stating the obvious, would setting PercentageOfNodesToScore to 100% solve this for the user? Performance will take a hit, of course, but it might be worth benchmarking for their use-case.

Using a non-100% percentageOfNodesToScore is the prerequisite for this issue. If the user goes with 100%, they always get consistent behavior and hence don't need this proposal at all.

@alculquicondor
Member

It sounds like moving percentageOfNodesToScore to the profile is the best first step. Then we can re-evaluate whether more is needed.
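
For illustration only, a per-profile override might look roughly like the sketch below; the struct and field are hypothetical, not the merged API shape:

```go
// Sketch only: this struct is illustrative and does not reflect the actual
// KubeSchedulerProfile type.
package config

// ProfileSketch shows the idea of moving percentageOfNodesToScore from the
// global configuration into each scheduling profile.
type ProfileSketch struct {
	SchedulerName string

	// PercentageOfNodesToScore overrides the global value for this profile.
	// A nil value means "inherit the global setting".
	PercentageOfNodesToScore *int32
}
```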

@Huang-Wei
Member Author

It sounds like moving percentageOfNodesToScore to the profile is the best first step. Then we can re-evaluate whether more is needed.

Agree.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 20, 2021
@Huang-Wei
Member Author

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 21, 2021
@alculquicondor
Member

@Huang-Wei do you have any updates from your users?

@Huang-Wei
Member Author

@alculquicondor I can touch base with them again, but I think their requirement is clear here. Do you plan to get it implemented along with v1beta2?

@alculquicondor
Member

I was not planning to, as per #95709 (comment)

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 22, 2021
@Huang-Wei
Member Author

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 22, 2021
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 21, 2021
@Huang-Wei
Member Author

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 21, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 19, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Nov 18, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
