New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use reservoir sampling to select one host from priority list #78009
Conversation
Hi @hainesc. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is nice!
ping @resouer @misterikkit for review and ok-to-test label Thanks. |
ping @kubernetes/sig-scheduling-pr-reviews for review |
@hainesc: Reiterating the mentions to trigger a notification: In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/priority backlog
/assign @misterikkit
@bsalamat - while i can buy the logic behind the change, I'm afraid that this might over-complicate our testing logic, thoughts ? |
/test pull-kubernetes-bazel-test |
@hainesc: Cannot trigger testing until a trusted user reviews the PR and leaves an In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
ping @yastij for |
/ok-to-test |
re-applying lgtm per my first comment /lgtm |
ping @misterikkit for approve, please take a look. Thanks. |
ping @yastij Can you help me ping an approver? It is a long time since the |
we are in code freeze now, even if it gets approved, it is not going to be merged until the freeze is over (probably mid June). |
Hi @ahg-g, Can this pr get merged? |
ping @yastij for approve. |
thanks @hainesc , I love this creative solution, but it seems this will break the previous round-robin manner to select a node and change to random selecting? |
@gaorong Yes, as shown by the benchmark, random selecting wins a lot against the round-robin manner in performance. And also, as you see, the code is more cleaner than the previous. I'd like to know what's your concern when we break the round-robin? In my opinion, there should be no assumption which node(with the highest score) will be selected, the only expected thing is that the candidate should be equal in probability. |
@gaorong The current "round robin" behavior doesn't actually result in round robin scheduling. Rather, the @bsalamat can confirm, but the intent of this mechanic is to prevent one node from getting all the pending pods when there are other equivalent nodes. |
Thanks @misterikkit Yes, the input slice is likely to be different in scores and order as the cluster is dynamic. Looking forward this PR getting merged (yeah, a long time wait), as it is proved to be better than the current code. It is approved by @ahg-g and @yastij Thanks, all! |
thanks misterikkit@, after rethinking, I guess the |
@Huang-Wei @bsalamat I like this PR, it allows removing a member variable of genericScheduler without compromising semantics (the minimal perf improvement is a cherry on the top). |
@gaorong, I took a deeper look into the code, and the filter step makes no promises about stable ordering. In fact, it evaluates nodes in parallel. So in that sense, we are already doing an arbitrary, undefined distribution with |
Thanks @misterikkit for the above comment. Simply put, we are doing (1) nodes shuffling + (2) (to some extent) fair nodes pickup by |
thank you guys for clarification. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/approve
Thanks, @hainesc !
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: hainesc, Huang-Wei The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm.
this change might node improve perf but it simplify the code and randomized the node selection. so let's take it.
What type of PR is this?
/kind cleanup
What this PR does / why we need it:
The function
selectHost
was designed to select one host from priority list with a highest score, However, the current manner use round-robin and it has to keep a var namedlastNodeIndex
in the struct ofgenericScheduler
, and it introduces a helper functionfindMaxScores
, both are not needed if we use reservoir sampling, which is more suitable for selecting one candidate with same possibility from a set, and the code will be cleaner. For the detail of reservoir sampling, see here.Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing change?: