
Faster scheduler #77509

Merged
merged 1 commit on May 9, 2019

Conversation

ahg-g
Member

@ahg-g ahg-g commented May 6, 2019

What type of PR is this?

/kind feature

What this PR does / why we need it:
Improves scheduler performance by reducing lock contention when iterating over the nodes.

Which issue(s) this PR fixes:

Fixes #72408

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

NONE

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels May 6, 2019
@k8s-ci-robot
Contributor

Hi @ahg-g. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels May 6, 2019
@wgliang
Contributor

wgliang commented May 6, 2019

/ok-to-test

Thank you very much for your contribution, it would be better if you could give performance comparison data.

(When CI is happy, I will take the time to review the code.)

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 6, 2019
@@ -28,7 +28,7 @@ import (

"k8s.io/klog"

"k8s.io/api/core/v1"
v1 "k8s.io/api/core/v1"
Member

I believe VS Code makes this change automatically. Please revert this line. There is no point in having this alias here and in other files.

Member Author

Done. It is Emacs, actually :)


Probably wrapping the same util. I think this is a behavior of goreturns.

@@ -400,14 +400,14 @@ func TestNodeTreeMultiOperations(t *testing.T) {
nodesToAdd: append(allNodes[4:9], allNodes[3]),
nodesToRemove: nil,
operations: []string{"add", "add", "add", "add", "add", "next", "next", "next", "next", "add", "next", "next", "next"},
expectedOutput: []string{"node-4", "node-5", "node-6", "node-7", "node-3", "node-8", "node-4"},
expectedOutput: []string{"node-4", "node-5", "node-6", "node-7", "node-4", "node-5", "node-6"},
Member

Hmm... This seems to be incorrect. We are not seeing node-8 and node-3 before seeing node-4 for the second time. We haven't changed the logic of the node tree. Why is this test changed?

Member Author

We kind of changed the logic: every time a node is added or removed, recomputeAllNodes is invoked, which resets all indices since it calls resetExhausted, which basically resets nt.next() to start from the beginning.
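The reset behavior described above can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the actual kubernetes NodeTree (the real one is zone-aware); the point is just that any add resets the round-robin index, so Next() starts over:

```go
package main

import "fmt"

// NodeTree is a simplified sketch with a single flat node list and a
// round-robin iteration index.
type NodeTree struct {
	nodes   []string
	lastIdx int
}

// AddNode appends a node and, mirroring the behavior discussed, resets
// the iteration index so the next call to Next starts from the beginning.
func (nt *NodeTree) AddNode(name string) {
	nt.nodes = append(nt.nodes, name)
	nt.resetExhausted()
}

func (nt *NodeTree) resetExhausted() {
	nt.lastIdx = 0
}

// Next returns node names in round-robin order.
func (nt *NodeTree) Next() string {
	name := nt.nodes[nt.lastIdx%len(nt.nodes)]
	nt.lastIdx++
	return name
}

func main() {
	nt := &NodeTree{}
	nt.AddNode("node-4")
	nt.AddNode("node-5")
	fmt.Println(nt.Next()) // node-4
	fmt.Println(nt.Next()) // node-5
	nt.AddNode("node-6")
	fmt.Println(nt.Next()) // node-4 again: the add reset the index
}
```

This is why the test's expectedOutput no longer shows node-8 and node-3 before node-4: the "add" operation in the middle of the sequence restarts iteration.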

Member

You are right. I forgot about that. We should make sure that we have test cases that ensure all nodes appear in the Next() output.

Member Author

Done.

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels May 6, 2019
@bsalamat bsalamat added the priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. label May 6, 2019
@k8s-ci-robot k8s-ci-robot removed the needs-priority Indicates a PR lacks a `priority/foo` label and requires one. label May 6, 2019
@bsalamat bsalamat self-assigned this May 6, 2019
@@ -20,7 +20,7 @@ import (
"fmt"
"sync"

"k8s.io/api/core/v1"
v1 "k8s.io/api/core/v1"
Member

Here and the next file still have the alias.

Member Author

Done.

// AllNodes returns the list of nodes as they would be iterated by
// Next() method.
func (nt *NodeTree) AllNodes() []string {
nt.mu.Lock()
Contributor

I prefer to use a read-write lock: nt.mu.RLock()

Member Author

Done.
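The suggestion above, sketched in isolation (illustrative struct, not the exact source): read-only accessors take the read lock, so concurrent readers don't serialize behind each other, while mutations still take the write lock.

```go
package main

import (
	"fmt"
	"sync"
)

// NodeTree sketch: protects the node list with an RWMutex.
type NodeTree struct {
	mu    sync.RWMutex
	nodes []string
}

// AllNodes returns a copy of the node names under the read lock; many
// goroutines can hold RLock at once.
func (nt *NodeTree) AllNodes() []string {
	nt.mu.RLock()
	defer nt.mu.RUnlock()
	out := make([]string, len(nt.nodes))
	copy(out, nt.nodes)
	return out
}

// AddNode mutates state, so it takes the exclusive write lock.
func (nt *NodeTree) AddNode(name string) {
	nt.mu.Lock()
	defer nt.mu.Unlock()
	nt.nodes = append(nt.nodes, name)
}

func main() {
	nt := &NodeTree{}
	nt.AddNode("node-1")
	fmt.Println(nt.AllNodes())
}
```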

@@ -159,9 +162,7 @@ func (nt *NodeTree) resetExhausted() {

// Next returns the name of the next node. NodeTree iterates over zones and in each zone iterates
// over nodes in a round robin fashion.
func (nt *NodeTree) Next() string {
nt.mu.Lock()
Contributor

Can you confirm that the lock here should be deleted?

Member Author

Yes, that is actually the main goal of this PR, removing the contention here.
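The idea, as described in this thread, is that the scheduler iterates over a pre-built snapshot of the node list starting from where the last cycle left off, rather than calling a mutex-protected NodeTree.Next() once per node. A sequential sketch with assumed names (the real code runs checkNode in parallel via workqueue.ParallelizeUntil):

```go
package main

import "fmt"

// findNodesThatFit is a simplified, sequential stand-in: it walks a
// snapshot slice starting at lastIndex, wrapping with a modulo, and
// returns the updated lastIndex for the next scheduling cycle.
func findNodesThatFit(allNodes []string, lastIndex int, fits func(string) bool) (feasible []string, newLastIndex int) {
	processed := 0
	for i := range allNodes {
		// Resume where the previous cycle left off; no lock in the hot loop.
		node := allNodes[(lastIndex+i)%len(allNodes)]
		processed++
		if fits(node) {
			feasible = append(feasible, node)
		}
	}
	newLastIndex = (lastIndex + processed) % len(allNodes)
	return feasible, newLastIndex
}

func main() {
	nodes := []string{"n0", "n1", "n2", "n3"}
	feasible, last := findNodesThatFit(nodes, 2, func(string) bool { return true })
	fmt.Println(feasible, last) // starts at n2 and wraps around
}
```

Because the snapshot is built once per scheduling cycle, the per-node mutex acquire/release that made Next() expensive disappears from the inner loop.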

@@ -510,7 +515,8 @@ func (g *genericScheduler) findNodesThatFit(pod *v1.Pod, nodes []*v1.Node) ([]*v

// Stops searching for more nodes once the configured number of feasible nodes
// are found.
workqueue.ParallelizeUntil(ctx, 16, int(allNodes), checkNode)
workqueue.ParallelizeUntil(ctx, 16, len(allNodes), checkNode)
g.lastIndex = (g.lastIndex + int(processedNodes)) % len(allNodes)
Contributor

I have a question: how do we deal with a reduction in the number of nodes when using the index? Is it possible to change position (skip a node) or go out of range?

Member Author

The modulo should handle this; it ensures that g.lastIndex is always between 0 and "number_of_nodes - 1". Please let me know if I didn't fully answer the question.

Contributor

Suppose the current number of nodes is 100 and g.lastIndex is 90, but in the next scheduling loop the number of nodes is reduced to 80; at this point the g.lastIndex value does not exist.

Member Author

right, but we are taking the modulo: (90 + whatever) % 80 = a number between 0 and 79

Contributor

OK, then will this break the fairness of each node being chosen? Because we can't guarantee which nodes are deleted, we can't find where we last left off by indexing.

Member Author

Not really, this is pretty much the same semantics of the original code (if not better).

In the original code, we were relying on next to pick the node. The nodeArray.next function resets to zero if the index is larger than the length of the node array, so it has the exact same issue you are describing: if the index was 90 and in the next scheduling loop the number of nodes was reduced to 80, then the next node picked is always the one at index 0, irrespective of which nodes were removed.

The only difference with the new logic is that we don't reset to zero, we loop back, so in the example above, the next nodes will be picked starting from index 10. So in a way we improve fairness in that we don't always restart at the same place (zero).

Member

This is not necessarily better than the old code. In the old version of the code, fairness after adding/removing nodes was being kept by NodeTree. Now, the client of NodeTree preserves fairness. So, I think the final outcome is similar.
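Working through the hypothetical numbers from this thread as a standalone arithmetic check (100 nodes with lastIndex 90 in one cycle, then only 80 nodes in the next):

```go
package main

import "fmt"

func main() {
	lastIndex := 90
	processed := 0 // suppose the cycle stops immediately, for illustration
	numNodes := 80 // the cluster shrank from 100 to 80 nodes
	// The modulo keeps the index inside [0, numNodes):
	next := (lastIndex + processed) % numNodes
	fmt.Println(next) // 10: iteration resumes at index 10, not at 0
}
```

So the index never goes out of range, and unlike a reset-to-zero scheme, iteration does not always restart at the same position after the node count shrinks.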

@ahg-g
Member Author

ahg-g commented May 7, 2019

/retest

Member

@bsalamat bsalamat left a comment

/lgtm
/approve

Thanks, @ahg-g and congrats on your first PR!

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 7, 2019
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahg-g, bsalamat

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 7, 2019
@bsalamat
Member

bsalamat commented May 7, 2019

/lgtm cancel

Oh, can you please squash commits?

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 7, 2019
@ahg-g
Member Author

ahg-g commented May 8, 2019

Commits squashed.

@ahg-g
Member Author

ahg-g commented May 8, 2019

/retest

@wgliang
Contributor

wgliang commented May 8, 2019

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 8, 2019
@k8s-ci-robot k8s-ci-robot merged commit baa2030 into kubernetes:master May 9, 2019
@ahg-g ahg-g mentioned this pull request Jun 7, 2019
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. release-note-none Denotes a PR that doesn't merit a release note. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

NodeTree.Next() in scheduler is really expensive
5 participants