Fix scheduler issue with nodetree additions #93387
Conversation
Hi @maelk. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/assign @ahg-g
/hold let's discuss on the original PR
/ok-to-test
Force-pushed from 567b12c to b88c10c
/hold cancel
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: alculquicondor, maelk. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
Force-pushed from b88c10c to a87071a
/retest
@kubernetes/release-managers
/retest
2 similar comments
/retest
/retest
/retest
@hasheddan the test pull-kubernetes-e2e-gce-device-plugin-gpu is failing constantly on this 1.18 patch. Will your #93207 fix make it to 1.18, and is it going to fix the failing test?
@ahg-g interesting, I was monitoring the test on the 1.18 blocking dashboard https://testgrid.k8s.io/sig-release-1.18-blocking#gce-device-plugin-gpu-1.18 and it has been passing... somehow? Anyway, I opened a backport to 1.17 today and am happy to do so for 1.18 as well. Any idea why this is passing on 1.18 blocking?
I honestly have no idea what this test is doing or why it is under sig-scheduling in the first place!
@ahg-g I believe it is testing that pods requesting GPUs are scheduled to the correct nodes, so I am guessing that is why it lives in SIG-scheduling? Anyways, I have opened a backport to 1.18 :)
Thanks, do you know if the failure in this PR will be fixed by your PR?
@ahg-g I cannot be 100% certain, but the failure is consistent with the one that was fixed when the PR was merged to
/retest
/lgtm
/retest
1 similar comment
/retest
ping @kubernetes/release-managers
/retest
The pull-kubernetes-integration job failed due to timeout. /retest
/retest Review the full test history for this PR. Silence the bot with an `/lgtm cancel` comment for consistent failures.
/retest |
/retest Review the full test history for this PR. Silence the bot with an `/lgtm cancel` comment for consistent failures.
What type of PR is this?
/kind bug
/sig scheduling
What this PR does / why we need it:
This is a backport of #93355.
When adding multiple nodes to the scheduler nodetree, the function that retrieves the next node does not return all the nodes one after another: it skips some and duplicates others. This commit works around the problem by always starting the enumeration with reset counters.
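The effect of the workaround can be illustrated with a simplified sketch. This is illustrative Go only, not the actual kube-scheduler nodetree code: the `nodeTree` type, its fields, and its methods are invented for this example. The key idea matches the description above: every full enumeration starts from zeroed zone/node counters, so nodes added between enumerations cannot cause skips or duplicates.

```go
package main

import "fmt"

// nodeTree is a hypothetical, simplified model of a zone-indexed node
// tree with a round-robin iterator over zones.
type nodeTree struct {
	zones     []string            // zone names in insertion order
	tree      map[string][]string // zone -> node names
	zoneIndex int                 // next zone to visit
	nodeIndex map[string]int      // per-zone index of the next node to visit
}

func (nt *nodeTree) addNode(zone, name string) {
	if _, ok := nt.tree[zone]; !ok {
		nt.zones = append(nt.zones, zone)
	}
	nt.tree[zone] = append(nt.tree[zone], name)
}

// resetIterators captures the essence of the workaround: zero the
// counters before every full enumeration instead of trusting whatever
// state a previous (possibly interrupted) enumeration left behind.
func (nt *nodeTree) resetIterators() {
	nt.zoneIndex = 0
	nt.nodeIndex = make(map[string]int)
}

// list enumerates every node exactly once, round-robin across zones,
// always starting from reset counters.
func (nt *nodeTree) list() []string {
	nt.resetIterators()
	total := 0
	for _, nodes := range nt.tree {
		total += len(nodes)
	}
	var out []string
	for len(out) < total {
		zone := nt.zones[nt.zoneIndex%len(nt.zones)]
		nt.zoneIndex++
		if i := nt.nodeIndex[zone]; i < len(nt.tree[zone]) {
			out = append(out, nt.tree[zone][i])
			nt.nodeIndex[zone] = i + 1
		}
	}
	return out
}

func main() {
	nt := &nodeTree{tree: map[string][]string{}}
	nt.addNode("zone-a", "node-1")
	nt.addNode("zone-b", "node-2")
	nt.addNode("zone-a", "node-3")
	fmt.Println(nt.list()) // prints: [node-1 node-2 node-3]
}
```

Because `list` resets its counters up front, calling it repeatedly, even after `addNode` calls in between, always yields each node exactly once, which is the invariant the buggy incremental counters violated.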
Which issue(s) this PR fixes:
Fixes #91601
Does this PR introduce a user-facing change?: