dom4ha
Member

@dom4ha dom4ha commented Oct 31, 2024

What type of PR is this?

/kind feature

What this PR does / why we need it:

The performance of processing unschedulable pods depends on the total number of already scheduled pods, since the preemption plugin has to consider all of them in PostFilter.

This change parametrizes the Unschedulable test case by the number of initial pods, to show how the throughput of regular pods changes when they are interleaved with unschedulable ones.

The throughput numbers reported by this test case are not fully representative, as they only indirectly show the time the scheduler needs to spend processing unschedulable pods. In practice, we are more interested in how long this processing takes, since scheduling a large number of high-priority unschedulable pods will in fact block the scheduler for some time.

I took some measurements, and it turns out that with 20k pods running, scheduling one such pod takes about 20ms (50 pods/s), which means that scheduling 3k such pods will block scheduling for about a minute.
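
The arithmetic behind that estimate can be checked with a minimal sketch. The helper names `throughput_per_second` and `blocked_seconds` are made up for illustration and are not part of scheduler_perf; the only inputs are the 20ms and 3k figures quoted above.

```python
def throughput_per_second(ms_per_pod: float) -> float:
    """Pods processed per second if each pod takes `ms_per_pod` milliseconds."""
    return 1000 / ms_per_pod


def blocked_seconds(pending_pods: int, ms_per_pod: float) -> float:
    """Seconds the scheduler stays busy working through `pending_pods` pods."""
    return pending_pods * ms_per_pod / 1000


if __name__ == "__main__":
    ms_per_pod = 20  # measured with 20k pods already running (figure from this PR)
    print(f"throughput: {throughput_per_second(ms_per_pod):.0f} pods/s")
    print(f"3k pods block scheduling for {blocked_seconds(3000, ms_per_pod):.0f}s")
```

This assumes the per-pod cost stays constant while the backlog is processed, which is a simplification.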

Which issue(s) this PR fixes:

Part of #128221

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. kind/feature Categorizes issue or PR as related to a new feature. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Oct 31, 2024
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added needs-priority Indicates a PR lacks a `priority/foo` label and requires one. area/test sig/testing Categorizes an issue or PR as relevant to SIG Testing. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Oct 31, 2024
Member Author

@dom4ha dom4ha left a comment

/test pull-kubernetes-scheduler-perf

initNodes: 5000
initPods: 100
measurePods: 1000
- name: 5kNodes/10kPods
Member

@AxeZhan AxeZhan Oct 31, 2024

Suggested change
-  - name: 5kNodes/10kPods
+  - name: 5kNodes/100Init/10kPods

Should we keep the test names consistent?
Either all xNodes/xInit/xPods, or all xNodes/xPods

Member Author

Done.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 5, 2024
@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Nov 5, 2024
@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Nov 5, 2024
Member Author

@dom4ha dom4ha left a comment

/test pull-kubernetes-scheduler-perf

Member Author

@dom4ha dom4ha left a comment

/hold
The thresholds need to be adjusted.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 6, 2024
@dom4ha
Member Author

dom4ha commented Nov 20, 2024

/test pull-kubernetes-scheduler-perf

Member Author

@dom4ha dom4ha left a comment

As mentioned in #128968 (comment), I'm not able to set the thresholds accurately, so I'm putting in my predictions.

@sanposhiho
Member

sanposhiho commented Nov 26, 2024

For now, we can skip adding thresholds for newly added or changed tests and add them later, once we have enough historical runs.

@macsko
Member

macsko commented Nov 26, 2024

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 26, 2024
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: a9b797f2bd647b27dfe9c633154a92ab2bd2a5ec

@macsko
Member

macsko commented Nov 26, 2024

To verify if kubernetes/test-infra#33850 is working:
/test pull-kubernetes-scheduler-perf

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 27, 2024
@dom4ha
Member Author

dom4ha commented Nov 27, 2024

Thanks @macsko, I got the numbers from the produced artifacts:

Unschedulable_5kNodes_20kInit_10kPods - 121 (predicted 100)
Unschedulable_5kNodes_100Init_10kPods - 258 (limit 140)
Unschedulable_5kNodes_20kInit_10kPods_QueueingHintsEnabled - 208 (predicted 120)
Unschedulable_5kNodes_100Init_10kPods_QueueingHintsEnabled - 307 (limit 170)

I did indeed guesstimate the limits, and looking at the numbers above, they might indeed be too high. Let's remove the limits then and wait for some historical runs.
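
To make the gap between the measured values and the guessed limits easier to see, here is a minimal sketch that prints their ratio for each test case. The figures are copied from the list above; the `headroom` helper is hypothetical, and treating both "predicted" and "limit" as the same kind of guessed threshold is an assumption.

```python
# (test case, measured value, guessed threshold), copied from the comment above.
RUNS = [
    ("Unschedulable_5kNodes_20kInit_10kPods", 121, 100),
    ("Unschedulable_5kNodes_100Init_10kPods", 258, 140),
    ("Unschedulable_5kNodes_20kInit_10kPods_QueueingHintsEnabled", 208, 120),
    ("Unschedulable_5kNodes_100Init_10kPods_QueueingHintsEnabled", 307, 170),
]


def headroom(measured: float, guess: float) -> float:
    """Ratio of the measured value to the guessed threshold."""
    return measured / guess


if __name__ == "__main__":
    for name, measured, guess in RUNS:
        print(f"{name}: measured {measured}, guess {guess}, "
              f"ratio {headroom(measured, guess):.2f}x")
```

A single run per test case says little about variance, which is why waiting for historical runs before fixing thresholds seems reasonable.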

Member

@sanposhiho sanposhiho left a comment

/approve

leave /lgtm to @macsko

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: anandfresh, dom4ha, sanposhiho

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 28, 2024
@macsko
Member

macsko commented Nov 28, 2024

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 28, 2024
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: a44ab33442fcbbaba4d51952cc9d9811267f9d6d

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 23, 2025
@dom4ha
Member Author

dom4ha commented Jan 23, 2025

/test pull-kubernetes-scheduler-perf

@dom4ha
Member Author

dom4ha commented Jan 23, 2025

/unhold

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 23, 2025
@macsko
Member

macsko commented Jan 23, 2025

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 23, 2025
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: b6b1c7b5a83a1cd6b9f1ffe4f5e3b466142ea92e

@k8s-ci-robot k8s-ci-robot merged commit 2334b84 into kubernetes:master Jan 23, 2025
14 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.33 milestone Jan 23, 2025
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. release-note-none Denotes a PR that doesn't merit a release note. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.