Faster scheduler #77509
Conversation
Hi @ahg-g. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/ok-to-test

Thank you very much for your contribution; it would be better if you could provide performance comparison data. (When CI is happy, I will take the time to review the code.)
@@ -28,7 +28,7 @@ import (
 	"k8s.io/klog"

-	"k8s.io/api/core/v1"
+	v1 "k8s.io/api/core/v1"
I believe VS Code makes this change automatically. Please revert this line; there is no point in having this alias here and in other files.
Done. It is Emacs, actually :)
Probably wrapping the same util. I think this is a behavior of goreturns.
@@ -400,14 +400,14 @@ func TestNodeTreeMultiOperations(t *testing.T) {
 	nodesToAdd:     append(allNodes[4:9], allNodes[3]),
 	nodesToRemove:  nil,
 	operations:     []string{"add", "add", "add", "add", "add", "next", "next", "next", "next", "add", "next", "next", "next"},
-	expectedOutput: []string{"node-4", "node-5", "node-6", "node-7", "node-3", "node-8", "node-4"},
+	expectedOutput: []string{"node-4", "node-5", "node-6", "node-7", "node-4", "node-5", "node-6"},
Hmm... this seems to be incorrect. We are not seeing node-8 and node-3 before seeing node-4 for the second time, and we haven't changed the logic of the node tree. Why was this test changed?
We did change the logic, in a way: every time a node is added or removed, recomputeAllNodes is invoked, which resets all indices because it calls resetExhausted; that effectively makes nt.next() start from the beginning.
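For readers following along, here is a minimal runnable sketch of why the expected output changed. It is a toy flat-list stand-in for the real per-zone NodeTree; names and structure are simplified for illustration only:

```go
package main

import "fmt"

// nodeTree is a toy stand-in for the scheduler's NodeTree, using a flat
// list instead of per-zone lists.
type nodeTree struct {
	nodes []string
	index int
}

// addNode appends a node and, like recomputeAllNodes/resetExhausted in the
// change under review, resets the iteration state so next() starts over.
func (nt *nodeTree) addNode(name string) {
	nt.nodes = append(nt.nodes, name)
	nt.index = 0 // the resetExhausted effect
}

// next returns the next node name in round-robin order.
func (nt *nodeTree) next() string {
	n := nt.nodes[nt.index]
	nt.index = (nt.index + 1) % len(nt.nodes)
	return n
}

func main() {
	nt := &nodeTree{}
	for _, n := range []string{"node-4", "node-5", "node-6", "node-7", "node-8"} {
		nt.addNode(n)
	}
	for i := 0; i < 4; i++ {
		fmt.Print(nt.next(), " ") // node-4 node-5 node-6 node-7
	}
	fmt.Println()
	nt.addNode("node-3") // the extra add resets iteration
	for i := 0; i < 3; i++ {
		fmt.Print(nt.next(), " ") // node-4 node-5 node-6
	}
	fmt.Println()
}
```

This reproduces the updated expectedOutput in the test above: after the first four next operations, the add of node-3 restarts iteration at node-4.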
You are right, I forgot about that. We should make sure we have test cases that ensure all nodes appear in the Next() output.
Done.
@@ -20,7 +20,7 @@ import (
 	"fmt"
 	"sync"

-	"k8s.io/api/core/v1"
+	v1 "k8s.io/api/core/v1"
This file and the next one still have the alias.
Done.
// AllNodes returns the list of nodes as they would be iterated by
// the Next() method.
func (nt *NodeTree) AllNodes() []string {
	nt.mu.Lock()
I prefer to use a read-write lock here: nt.mu.RLock().
Done.
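For context, a minimal sketch of the suggested read-write lock, with simplified fields (the real NodeTree carries more state than shown here):

```go
package nodetree

import "sync"

// NodeTree here is a pared-down illustration, not the upstream type.
type NodeTree struct {
	mu       sync.RWMutex
	allNodes []string
}

// AllNodes is read-only, so it takes only the read lock; concurrent
// readers no longer block each other.
func (nt *NodeTree) AllNodes() []string {
	nt.mu.RLock()
	defer nt.mu.RUnlock()
	return append([]string(nil), nt.allNodes...)
}

// AddNode mutates the tree and therefore still takes the exclusive lock.
func (nt *NodeTree) AddNode(name string) {
	nt.mu.Lock()
	defer nt.mu.Unlock()
	nt.allNodes = append(nt.allNodes, name)
}
```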
@@ -159,9 +162,7 @@ func (nt *NodeTree) resetExhausted() {

 // Next returns the name of the next node. NodeTree iterates over zones and in each zone iterates
 // over nodes in a round robin fashion.
 func (nt *NodeTree) Next() string {
 	nt.mu.Lock()
Can you confirm that the lock here should be deleted?
Yes; removing the contention here is actually the main goal of this PR.
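To illustrate that goal, here is a rough sketch of the general pattern, not the exact upstream code (the helper name findFeasibleNodes, its parameters, and the hard-coded worker count are illustrative): snapshot the node list once, then let parallel workers pick nodes with plain index arithmetic instead of a mutex-guarded Next().

```go
package scheduler

import (
	"context"
	"sync"
	"sync/atomic"

	"k8s.io/client-go/util/workqueue"
)

// findFeasibleNodes checks all nodes in parallel, starting from a shared
// offset so successive scheduling cycles don't always begin at node 0.
func findFeasibleNodes(allNodes []string, lastIndex int, fits func(string) bool) ([]string, int) {
	var (
		mu             sync.Mutex
		feasible       []string
		processedNodes int32
	)
	checkNode := func(i int) {
		atomic.AddInt32(&processedNodes, 1)
		// Picking a node is pure index arithmetic; no shared lock needed.
		node := allNodes[(lastIndex+i)%len(allNodes)]
		if fits(node) {
			mu.Lock()
			feasible = append(feasible, node)
			mu.Unlock()
		}
	}
	workqueue.ParallelizeUntil(context.Background(), 16, len(allNodes), checkNode)
	// Resume the next cycle where this one stopped. In this simplified
	// version the context is never cancelled, so every node is processed;
	// the real code stops once enough feasible nodes are found.
	return feasible, (lastIndex + int(processedNodes)) % len(allNodes)
}
```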
@@ -510,7 +515,8 @@ func (g *genericScheduler) findNodesThatFit(pod *v1.Pod, nodes []*v1.Node) ([]*v

 	// Stops searching for more nodes once the configured number of feasible nodes
 	// are found.
-	workqueue.ParallelizeUntil(ctx, 16, int(allNodes), checkNode)
+	workqueue.ParallelizeUntil(ctx, 16, len(allNodes), checkNode)
+	g.lastIndex = (g.lastIndex + int(processedNodes)) % len(allNodes)
I have a question: how do we deal with a reduction in the number of nodes when using the index? Is it possible to change position (skip a node) or go out of range?
The modulo should handle this; it ensures that g.lastIndex is always between 0 and number_of_nodes - 1. Please let me know if I didn't fully answer the question.
Suppose the current number of nodes is 100 and g.lastIndex is 90, but in the next scheduling loop the number of nodes is reduced to 80; at that point the g.lastIndex value no longer exists.
Right, but we are taking the modulo: (90 + whatever) % 80 yields a number between 0 and 79.
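A tiny runnable check of that arithmetic (the processed count of 25 is made up for the example):

```go
package main

import "fmt"

func main() {
	lastIndex := 90 // saved while the cluster had 100 nodes
	numNodes := 80  // cluster shrank before the next scheduling cycle
	processed := 25 // hypothetical number of nodes examined this cycle

	lastIndex = (lastIndex + processed) % numNodes
	fmt.Println(lastIndex) // 35, always within [0, numNodes-1]
}
```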
OK, then will this break the fairness of node selection? Because we can't guarantee which nodes were deleted, we can't use the index to find exactly where we left off last time.
Not really; this has pretty much the same semantics as the original code (if not better).

In the original code, we relied on Next() to pick the node. The nodeArray.next function resets to zero if the index is larger than the length of the node array, so it has exactly the issue you are describing: if the index was 90 and the number of nodes dropped to 80 in the next scheduling loop, the next node picked would always be the one at index 0, irrespective of which nodes were removed.

The only difference with the new logic is that we don't reset to zero, we loop back; so in the example above, the next nodes will be picked starting from index 10. In a way we improve fairness, because we don't always restart at the same place (zero).
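A small sketch contrasting the two behaviors described above (both functions are simplified illustrations, not the upstream implementations):

```go
package main

import "fmt"

// nextOld mimics the described nodeArray.next behavior: reset to zero when
// the saved index exceeds the (possibly shrunken) array length.
func nextOld(nodes []string, i int) (string, int) {
	if i >= len(nodes) {
		i = 0 // always restarts at the same place
	}
	return nodes[i], i + 1
}

// nextNew wraps with modulo instead, so iteration resumes mid-list.
func nextNew(nodes []string, i int) (string, int) {
	i %= len(nodes)
	return nodes[i], i + 1
}

func main() {
	nodes := make([]string, 80) // the cluster shrank from 100 to 80 nodes
	for j := range nodes {
		nodes[j] = fmt.Sprintf("node-%d", j)
	}
	oldNode, _ := nextOld(nodes, 90)
	newNode, _ := nextNew(nodes, 90)
	fmt.Println(oldNode, newNode) // node-0 node-10: old restarts, new resumes
}
```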
This is not necessarily better than the old code. In the old version, fairness after adding/removing nodes was maintained by NodeTree; now the client of NodeTree preserves it. So I think the final outcome is similar.
/retest
/lgtm
/approve
Thanks, @ahg-g, and congrats on your first PR!
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: ahg-g, bsalamat. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files. Approvers can indicate their approval by writing /approve in a comment.
/lgtm cancel

Oh, can you please squash the commits?

Commits squashed.

/retest

/lgtm
What type of PR is this?
/kind feature
What this PR does / why we need it:
Improves scheduler performance by reducing lock contention when iterating over the nodes.
Which issue(s) this PR fixes:
Fixes #72408
Special notes for your reviewer:
Does this PR introduce a user-facing change?: