feature: Make Unschedulable scheduler performance test parametrized with the number of initial nodes. #128466

dom4ha · 2024-10-31T10:16:43Z

What type of PR is this?

/kind feature

What this PR does / why we need it:

The performance of processing unschedulable pods depends on the overall number of scheduled pods, since all of them need to be considered by the preemption plugin in PostFilter.
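The scaling argument can be illustrated with a deliberately simplified model. This is a hypothetical sketch, not the DefaultPreemption plugin: it assumes that for each unschedulable high-priority pod, PostFilter inspects every pod on every node as a potential victim, so the work grows linearly with the number of scheduled pods. It ignores any parallelism, node sampling or early exit the real plugin may apply. The `pod`, `node`, `victimSearch` and `buildCluster` names are illustrative only.

```go
package main

import "fmt"

// pod is a hypothetical, minimal stand-in for a scheduled pod.
type pod struct {
	priority int32
}

// node groups the pods currently bound to it.
type node struct {
	pods []pod
}

// victimSearch models the PostFilter victim search for one unschedulable pod
// (simplified assumption): every pod on every node is inspected, and pods with
// a lower priority than the preemptor are counted as potential victims.
func victimSearch(nodes []node, preemptorPriority int32) (examined, victims int) {
	for _, n := range nodes {
		for _, p := range n.pods {
			examined++
			if p.priority < preemptorPriority {
				victims++
			}
		}
	}
	return examined, victims
}

// buildCluster spreads totalPods round-robin across numNodes nodes,
// all with priority 0.
func buildCluster(numNodes, totalPods int) []node {
	nodes := make([]node, numNodes)
	for i := 0; i < totalPods; i++ {
		nodes[i%numNodes].pods = append(nodes[i%numNodes].pods, pod{priority: 0})
	}
	return nodes
}

func main() {
	for _, initPods := range []int{100, 20000} {
		examined, _ := victimSearch(buildCluster(5000, initPods), 100)
		fmt.Printf("%d init pods -> %d pods examined per unschedulable pod\n", initPods, examined)
	}
}
```

In this model the per-pod cost is proportional to the number of initial pods, which is why the test is parametrized by that number rather than only by the node count.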

This change parametrizes the Unschedulable test with the number of initial pods, to show the difference in the throughput of regular pods when they are interleaved with these unschedulable ones.

The throughput numbers reported by this test case are not fully representative, as they only indirectly reflect the time the scheduler needs to spend processing unschedulable pods. In practice, we are more interested in how long this processing takes, since scheduling a large number of high-priority unschedulable pods will in fact block the scheduler for a certain time.

I took some measurements: with 20k pods running, scheduling one such pod takes about 20ms (50 pods/s), which means that scheduling 3k of them blocks the scheduler for about a minute.

Which issue(s) this PR fixes:

Part of #128221

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

k8s-ci-robot · 2024-10-31T10:16:52Z

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

dom4ha

/test pull-kubernetes-scheduler-perf

AxeZhan · 2024-10-31T11:28:31Z

test/integration/scheduler_perf/config/performance-config.yaml

      initNodes: 5000
+      initPods: 100
      measurePods: 1000
  - name: 5kNodes/10kPods

Suggested change:

-  - name: 5kNodes/10kPods
+  - name: 5kNodes/100Init/10kPods

Should we keep the test names consistent?
Either all xNodes/xInit/xPods, or all xNodes/xPods
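The xNodes/xInit/xPods convention can be made explicit with a small helper. The workload names in the YAML are written by hand; this hypothetical `workloadName` function only spells out the pattern, assuming counts that are multiples of 1000 are abbreviated with a `k` suffix, as in the names used in this thread.

```go
package main

import "fmt"

// compact renders a count the way the test names do: multiples of 1000
// become "<n>k", everything else is printed as-is.
func compact(n int) string {
	if n >= 1000 && n%1000 == 0 {
		return fmt.Sprintf("%dk", n/1000)
	}
	return fmt.Sprintf("%d", n)
}

// workloadName builds a name following the xNodes/xInit/xPods pattern.
func workloadName(initNodes, initPods, measurePods int) string {
	return fmt.Sprintf("%sNodes/%sInit/%sPods",
		compact(initNodes), compact(initPods), compact(measurePods))
}

func main() {
	fmt.Println(workloadName(5000, 100, 10000))   // 5kNodes/100Init/10kPods
	fmt.Println(workloadName(5000, 20000, 10000)) // 5kNodes/20kInit/10kPods
}
```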

dom4ha

/test pull-kubernetes-scheduler-perf

dom4ha

/hold
Thresholds need to be adjusted.

dom4ha · 2024-11-20T17:50:17Z

/test pull-kubernetes-scheduler-perf

dom4ha

As mentioned in #128968 (comment), I'm not able to set the thresholds accurately, so I'm putting in my predictions.

sanposhiho · 2024-11-26T01:33:42Z

For now, for newly added or changed tests, we can skip adding thresholds and add them later, once we have enough historical runs.

macsko · 2024-11-26T09:38:53Z

/lgtm

k8s-ci-robot · 2024-11-26T09:39:00Z

LGTM label has been added.

Git tree hash: a9b797f2bd647b27dfe9c633154a92ab2bd2a5ec

test/integration/scheduler_perf/misc/performance-config.yaml

macsko · 2024-11-26T14:40:33Z

To verify if kubernetes/test-infra#33850 is working:
/test pull-kubernetes-scheduler-perf

dom4ha · 2024-11-27T21:54:35Z

Thanks @macsko, I got the numbers from the produced artifacts:

Unschedulable_5kNodes_20kInit_10kPods - 121 (predicted 100)
Unschedulable_5kNodes_100Init_10kPods - 258 (limit 140)
Unschedulable_5kNodes_20kInit_10kPods_QueueingHintsEnabled - 208 (predicted 120)
Unschedulable_5kNodes_100Init_10kPods_QueueingHintsEnabled - 307 (limit 170)

I did guesstimate the limits, and looking at the numbers above, they might indeed be too high. Let's remove the limits for now and wait for some historical runs.
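To make the comparison concrete, here is a hypothetical sketch of a minimum-throughput check. It assumes scheduler_perf semantics in which a workload fails when its measured average throughput falls below the configured threshold, and a zero threshold disables the check. The `result` type and `passes` function are illustrative, not the scheduler_perf API; the numbers are the ones reported above.

```go
package main

import "fmt"

// result pairs a measured average throughput (pods/s) with a candidate
// minimum threshold. A zero threshold means "no check" (assumption).
type result struct {
	name      string
	measured  float64
	threshold float64
}

// passes reports whether the measured throughput satisfies the threshold.
func passes(r result) bool {
	return r.threshold == 0 || r.measured >= r.threshold
}

func main() {
	results := []result{
		{"Unschedulable_5kNodes_20kInit_10kPods", 121, 100},
		{"Unschedulable_5kNodes_100Init_10kPods", 258, 140},
		{"Unschedulable_5kNodes_20kInit_10kPods_QueueingHintsEnabled", 208, 120},
		{"Unschedulable_5kNodes_100Init_10kPods_QueueingHintsEnabled", 307, 170},
	}
	for _, r := range results {
		fmt.Printf("%s: measured %.0f, threshold %.0f, pass=%v, margin=%.2fx\n",
			r.name, r.measured, r.threshold, passes(r), r.measured/r.threshold)
	}
}
```

A single run says little about run-to-run variance, which is why the thread settles on waiting for historical runs before committing to threshold values.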

sanposhiho

/approve

Leaving /lgtm to @macsko.

k8s-ci-robot · 2024-11-28T02:53:02Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: anandfresh, dom4ha, sanposhiho

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~test/integration/scheduler_perf/OWNERS~~ [sanposhiho]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

macsko · 2024-11-28T08:27:23Z

/lgtm

k8s-ci-robot · 2024-11-28T08:27:30Z

LGTM label has been added.

Git tree hash: a44ab33442fcbbaba4d51952cc9d9811267f9d6d

dom4ha · 2025-01-23T00:49:38Z

/test pull-kubernetes-scheduler-perf

dom4ha · 2025-01-23T09:28:27Z

/unhold

macsko · 2025-01-23T09:30:22Z

/lgtm

k8s-ci-robot · 2025-01-23T09:30:28Z

LGTM label has been added.

Git tree hash: b6b1c7b5a83a1cd6b9f1ffe4f5e3b466142ea92e

k8s-ci-robot added needs-priority Indicates a PR lacks a `priority/foo` label and requires one. area/test sig/testing Categorizes an issue or PR as relevant to SIG Testing. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Oct 31, 2024

k8s-ci-robot requested review from AxeZhan and denkensk October 31, 2024 10:17

dom4ha commented Oct 31, 2024

anandfresh approved these changes Oct 31, 2024

AxeZhan reviewed Oct 31, 2024

k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 5, 2024

dom4ha force-pushed the scheduler-perf branch from 4dc6e21 to 9280429 November 5, 2024 18:03

k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Nov 5, 2024

dom4ha force-pushed the scheduler-perf branch from 106f55e to e5edc4b November 5, 2024 18:54

k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Nov 5, 2024

dom4ha commented Nov 5, 2024

dom4ha commented Nov 6, 2024

k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 6, 2024

dom4ha force-pushed the scheduler-perf branch from e5edc4b to 06dbf5e November 20, 2024 17:36

dom4ha force-pushed the scheduler-perf branch from 06dbf5e to 24e814c November 25, 2024 21:58

k8s-ci-robot requested a review from sanposhiho November 25, 2024 22:02

dom4ha commented Nov 25, 2024

k8s-ci-robot assigned macsko Nov 26, 2024

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 26, 2024

sanposhiho reviewed Nov 26, 2024

test/integration/scheduler_perf/misc/performance-config.yaml Outdated

dom4ha force-pushed the scheduler-perf branch from 24e814c to ad503c0 November 27, 2024 21:54

k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 27, 2024

sanposhiho approved these changes Nov 28, 2024

k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 28, 2024

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 28, 2024

feature: Make Unschedulable scheduler performance test parametrized with the number of initial nodes.

f150016

dom4ha force-pushed the scheduler-perf branch from ad503c0 to f150016 January 23, 2025 00:48

k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 23, 2025

k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 23, 2025

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 23, 2025

k8s-ci-robot merged commit 2334b84 into kubernetes:master Jan 23, 2025
14 checks passed

k8s-ci-robot added this to the v1.33 milestone Jan 23, 2025

dom4ha mentioned this pull request Jan 24, 2025

Pop from the backoff queue whenever the active queue is empty #129806

Closed
