[release-1.19] Balance nodes in scheduling e2e #98810
@damemi: This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the … label.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/cc @alculquicondor
This would be good to backport. It should be a clean change, but please take a look.
…ining pods
The test is not cleaning up all the pods it created. Memory-balancing pods are only deleted when the test namespace is deleted, which leaves them running or in a terminating state when the next test starts. If the next test is "[sig-scheduling] SchedulerPredicates [Serial] validates resource limits of pods that are allowed to run", that test can fail.
This adds a call to createBalancedPods during the ubernetes_lite scheduling e2es, which are prone to improper score balancing due to unbalanced utilization.
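To illustrate the idea behind balancing, here is a minimal Go sketch of how per-node padding could be computed so that every node reaches the same requested/allocatable fraction. It is not the code of `createBalancedPods`; the `nodeUsage` type, the `paddingToBalance` function, and the sample numbers are all hypothetical and only model CPU in millicores.

```go
package main

import "fmt"

// nodeUsage is a hypothetical, simplified view of one node: CPU already
// requested by running pods and CPU allocatable, both in millicores.
type nodeUsage struct {
	name        string
	requested   int64
	allocatable int64
}

// paddingToBalance returns, per node, the extra millicores a filler pod would
// have to request so that every node ends up at the same
// requested/allocatable fraction as the most utilized node.
func paddingToBalance(nodes []nodeUsage) map[string]int64 {
	var maxFraction float64
	for _, n := range nodes {
		if n.allocatable == 0 {
			continue // assumption: nodes without allocatable CPU are ignored
		}
		if f := float64(n.requested) / float64(n.allocatable); f > maxFraction {
			maxFraction = f
		}
	}

	padding := make(map[string]int64, len(nodes))
	for _, n := range nodes {
		target := int64(maxFraction * float64(n.allocatable))
		extra := target - n.requested
		if extra < 0 {
			extra = 0 // never ask for a negative request
		}
		padding[n.name] = extra
	}
	return padding
}

func main() {
	nodes := []nodeUsage{
		{"node-a", 500, 4000},
		{"node-b", 2000, 4000},
		{"node-c", 1000, 2000},
	}
	for name, extra := range paddingToBalance(nodes) {
		fmt.Printf("%s: +%dm\n", name, extra)
	}
}
```

Once each node carries its padding, a utilization-based score should be roughly equal across nodes, so it no longer dominates the decision the test is trying to observe.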
5b0a7d5 to 6e297f9
FYI @ingvagabund
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: alculquicondor, damemi
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Ref #94684
/retest
/retest
Review the full test history for this PR. Silence the bot with an …
What type of PR is this?
/kind flake
What this PR does / why we need it:
Backporting #98699 (and 2318992) to ensure nodes are balanced in scheduling e2e
In clusters that run more components than a vanilla k8s install, and run them unevenly across nodes, the variance in resource requests between nodes can be amplified, which influences scheduler scoring decisions. Occasionally the resource-balancing score outweighs the spreading score that the test is trying to exercise, leading to flakes where pods are spread unevenly. An example of this variance was observed here: openshift#547 (comment)
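The effect can be shown with a toy model in Go. This is only a sketch under loud assumptions: the two score functions, their equal weighting, and the sample nodes are hypothetical and do not reproduce the real scheduler plugins. It only shows how a large utilization gap can outvote a spreading preference.

```go
package main

import "fmt"

// node is a hypothetical, simplified view of a node at scoring time.
type node struct {
	name         string
	usedPercent  int64 // requested resources as a percentage of allocatable
	matchingPods int64 // pods of the workload under test already on this node
}

// resourceScore favors emptier nodes (toy formula, range 0..100).
func resourceScore(n node) int64 { return 100 - n.usedPercent }

// spreadScore favors nodes with fewer matching pods (toy formula, range 0..100).
func spreadScore(n node, maxPods int64) int64 {
	if maxPods == 0 {
		return 100
	}
	return 100 * (maxPods - n.matchingPods) / maxPods
}

// pick returns the name of the node with the highest combined score.
// Ties go to the node that appears first in the slice.
func pick(nodes []node) string {
	var maxPods int64
	for _, n := range nodes {
		if n.matchingPods > maxPods {
			maxPods = n.matchingPods
		}
	}
	best, bestScore := "", int64(-1)
	for _, n := range nodes {
		if s := resourceScore(n) + spreadScore(n, maxPods); s > bestScore {
			best, bestScore = n.name, s
		}
	}
	return best
}

func main() {
	// node-b already holds more matching pods, but node-a is much busier.
	unbalanced := []node{{"node-a", 90, 1}, {"node-b", 20, 2}}
	// Same pod counts after both nodes are padded to the same utilization.
	balanced := []node{{"node-a", 90, 1}, {"node-b", 90, 2}}

	fmt.Println("unbalanced cluster picks:", pick(unbalanced))
	fmt.Println("balanced cluster picks:", pick(balanced))
}
```

In the unbalanced case the toy scorer places the next pod on the node that already has more matching pods, which is the kind of uneven spread the flake reports describe. With equal utilization, only the spreading term differs between nodes.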
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing change?:
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:
/sig scheduling