
scheduler: provide an option to not shuffle internal node index when using percentageOfNodesToScore #95709

Closed
Huang-Wei opened this issue Oct 19, 2020 · 21 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling.

Comments

@Huang-Wei
Member

I've received a requirement from some users who run serverless workloads and care a lot about minimizing the number of running machines to reduce cost.

They've started using the NodeResourcesMostAllocated Score plugin (just as the Autoscaler does) in place of the default NodeResourcesLeastAllocated plugin, but it doesn't seem to be good enough. The problem is that when percentageOfNodesToScore is used, the scheduler internally maintains a node index that shifts the starting search position every time scoring begins. Suppose 500 nodes pass the Filtering phase and, due to percentageOfNodesToScore, we only score 100 of them. When scheduling Pod1, the search scope is [node1, node2, ..., node100]; when scheduling Pod2, the scope switches to [node101, node102, ..., node200]. This effectively balances workloads across the cluster, which contradicts the user's desire to pack workloads onto as few machines as possible.

So I'm proposing to provide an option to disable shuffling of the internal node index. If we reach consensus, the implementation may need to be coordinated with #93270, since we may want to make it a profile-level parameter.

/sig scheduling
/kind feature
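
To make the behavior above concrete, here is a minimal, self-contained sketch of the round-robin start index, with a hypothetical disableStartIndexShuffling knob standing in for the proposed option. All names here are illustrative, not the scheduler's actual code:

```go
// Minimal sketch of the behavior described above. The names (frameworkSketch,
// nextStartNodeIndex, numNodesToScore, disableStartIndexShuffling) are
// illustrative and do not exist in the real scheduler.
package main

import "fmt"

type frameworkSketch struct {
	nodes                      []string
	nextStartNodeIndex         int
	numNodesToScore            int  // derived from percentageOfNodesToScore
	disableStartIndexShuffling bool // the opt-out proposed in this issue
}

// searchWindow returns the slice of nodes evaluated in one scheduling cycle.
func (f *frameworkSketch) searchWindow() []string {
	start := f.nextStartNodeIndex
	if f.disableStartIndexShuffling {
		start = 0 // always scan from the front, favoring bin-packing
	}
	window := make([]string, 0, f.numNodesToScore)
	for i := 0; i < f.numNodesToScore; i++ {
		window = append(window, f.nodes[(start+i)%len(f.nodes)])
	}
	// Advance the index so the next pod starts where this one stopped,
	// which is what spreads pods across the cluster today.
	f.nextStartNodeIndex = (start + f.numNodesToScore) % len(f.nodes)
	return window
}

func main() {
	f := &frameworkSketch{
		nodes:           []string{"node1", "node2", "node3", "node4", "node5", "node6"},
		numNodesToScore: 2,
	}
	fmt.Println(f.searchWindow()) // [node1 node2]
	fmt.Println(f.searchWindow()) // [node3 node4]

	f.disableStartIndexShuffling = true
	f.nextStartNodeIndex = 0
	fmt.Println(f.searchWindow()) // [node1 node2]
	fmt.Println(f.searchWindow()) // [node1 node2] (same window every cycle)
}
```

With the knob off, consecutive pods get different windows and spread out; with it on, every pod sees the same front-of-list window, which is what a bin-packing score needs.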

@Huang-Wei Huang-Wei added the kind/feature Categorizes issue or PR as related to a new feature. label Oct 19, 2020
@k8s-ci-robot k8s-ci-robot added the sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. label Oct 19, 2020
@k8s-ci-robot
Contributor

@Huang-Wei: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Oct 19, 2020
@SataQiu
Member

SataQiu commented Oct 20, 2020

/cc
I'm willing to help with this if needed.

@ahg-g
Member

ahg-g commented Oct 20, 2020

Starting from index 0 will be expensive in the long run, because the first nodes will be full but will still be evaluated in every cycle.

I am wondering if node order should be a plugin; for bin-packing, the nodes would be ordered by how full they are. If we don't want to go that far, then instead of resetting the index every time, maybe start from the last selected node.
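
For what it's worth, a tiny sketch of the "start from the last selected node" variant, assuming a hypothetical nextStartIndex helper feeding the scheduler's start index (not an existing function):

```go
// Illustrative sketch only: nextStartIndex is a hypothetical helper, not an
// existing scheduler function.
package main

import "fmt"

// nextStartIndex resumes the next scan at the node that was just selected,
// instead of advancing by the number of nodes processed, so a bin-packing
// score keeps filling that node before the scan moves on.
func nextStartIndex(selectedNodeIndex, numNodes int) int {
	return selectedNodeIndex % numNodes
}

func main() {
	// Suppose the last pod landed on the node at index 42 out of 500.
	fmt.Println(nextStartIndex(42, 500)) // 42: the next cycle starts scanning there
}
```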

@adtac
Member

adtac commented Oct 20, 2020

At the risk of stating the obvious, would setting PercentageOfNodesToScore to 100% solve this for the user? Performance will take a hit, of course, but it might be worth benchmarking for their use-case.

So I'm proposing to provide an option to disable shuffling of the internal node index. If we reach consensus, the implementation may need to be coordinated with #93270, since we may want to make it a profile-level parameter.

I propose making this more generic. Instead of an option to just disable shuffling, we can create an enum option for the starting index:

  1. round robin, like we currently do
  2. always start at zero
  3. random index

The third option need not be added as a part of this issue, but making it an enum will make it more extensible in the future.
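
A rough sketch of how such an enum could look in the component config types; the type and constant names below are hypothetical, not an accepted API:

```go
// Hypothetical sketch of the proposed enum; none of these names exist in the
// kube-scheduler configuration API.
package config

// NodeScanStartPolicy controls where node evaluation begins each scheduling
// cycle when percentageOfNodesToScore is below 100.
type NodeScanStartPolicy string

const (
	// RoundRobinStart advances the start index every cycle (today's behavior).
	RoundRobinStart NodeScanStartPolicy = "RoundRobin"
	// ZeroStart always begins at index 0, favoring bin-packing.
	ZeroStart NodeScanStartPolicy = "Zero"
	// RandomStart picks a random start index each cycle (possible later addition).
	RandomStart NodeScanStartPolicy = "Random"
)
```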

Starting from index 0 will be expensive in the long run, because the first nodes will be full but will still be evaluated in every cycle.

Wouldn't those nodes be filtered out in the filter phase? If they cannot fit the pod, I don't see why they'd be a part of the scoring phase.

@ahg-g
Member

ahg-g commented Oct 20, 2020

Wouldn't those nodes be filtered out in the filter phase? If they cannot fit the pod, I don't see why they'd be a part of the scoring phase.

They would, but we are still paying the overhead of evaluating them in the filter phase.

@Huang-Wei
Member Author

Starting from index 0 will be expensive in the long run, because the first nodes will be full but will still be evaluated in every cycle.

It's a fair point. But even with the current logic, we cannot ensure that the search scope (starting at a shuffled nodeIndex) is optimal every time; it's random and utilization-agnostic. So overall I think the overhead is acceptable, and it can be mitigated by the automatically exposed {Memory|CPU|Disk}Pressure taints: nodes that are out of resources will be rejected early by the TaintToleration Filter plugin and hence won't cost much.

Another aspect, as I mentioned earlier, is to make this option profile-level, so it won't impact the global (or other profiles') behavior. This is similar to the discussion we had about profile-level percentageOfNodesToScore.

I am wondering if node order should be a plugin; for bin-packing, the nodes would be ordered by how full they are.

I'm not sure that's practical enough, as (1) the top-ordered nodes may fail the Filter phase due to capacity, and (2) we'd need to figure out how many nodes to order.

I propose making this more generic. Instead of an option to just disable shuffling, we can create an enum option for the starting index:

  1. round robin, like we currently do
  2. always start at zero
  3. random index

This makes sense to me. An enum is more extensible than a bool.

At the risk of stating the obvious, would setting PercentageOfNodesToScore to 100% solve this for the user? Performance will take a hit, of course, but it might be worth benchmarking for their use-case.

Using a non-100% percentageOfNodesToScore is the prerequisite for this issue. If the user goes with 100%, they always get consistent behavior and hence don't need this proposal at all.

@alculquicondor
Member

It sounds like moving percentageOfNodesToScore to the profile is the best first step. Then we can re-evaluate whether more is needed.
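
For illustration only, a per-profile override might look roughly like the sketch below; the struct and field are hypothetical, not the merged API shape:

```go
// Sketch only: this struct is illustrative and does not reflect the actual
// KubeSchedulerProfile type.
package config

// ProfileSketch shows the idea of moving percentageOfNodesToScore from the
// global configuration into each scheduling profile.
type ProfileSketch struct {
	SchedulerName string

	// PercentageOfNodesToScore overrides the global value for this profile.
	// A nil value means "inherit the global setting".
	PercentageOfNodesToScore *int32
}
```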

@Huang-Wei
Member Author

It sounds like moving percentageOfNodesToScore to the profile is the best first step. Then we can re-evaluate whether more is needed.

Agree.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 20, 2021
@Huang-Wei
Member Author

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 21, 2021
@alculquicondor
Member

@Huang-Wei do you have any updates from your users?

@Huang-Wei
Member Author

@alculquicondor I can touch base with them again, but I think their requirement is clear here. Do you plan to get it implemented along with v1beta2?

@alculquicondor
Member

I was not planning to, as per #95709 (comment)

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 22, 2021
@Huang-Wei
Member Author

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 22, 2021
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 21, 2021
@Huang-Wei
Member Author

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 21, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 19, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Nov 18, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
