Faster scheduler #77509

Merged 1 commit, May 9, 2019
14 changes: 10 additions & 4 deletions pkg/scheduler/core/generic_scheduler.go
@@ -167,6 +167,7 @@ type genericScheduler struct {
pvcLister corelisters.PersistentVolumeClaimLister
pdbLister algorithm.PDBLister
disablePreemption bool
lastIndex int
percentageOfNodesToScore int32
}

@@ -460,8 +461,8 @@ func (g *genericScheduler) findNodesThatFit(pod *v1.Pod, nodes []*v1.Node) ([]*v
if len(g.predicates) == 0 {
filtered = nodes
} else {
allNodes := int32(g.cache.NodeTree().NumNodes())
numNodesToFind := g.numFeasibleNodesToFind(allNodes)
allNodes := g.cache.NodeTree().AllNodes()
numNodesToFind := g.numFeasibleNodesToFind(int32(len(allNodes)))

// Create filtered list with enough space to avoid growing it
// and allow assigning.
@@ -477,8 +478,12 @@ func (g *genericScheduler) findNodesThatFit(pod *v1.Pod, nodes []*v1.Node) ([]*v
// We can use the same metadata producer for all nodes.
meta := g.predicateMetaProducer(pod, g.nodeInfoSnapshot.NodeInfoMap)

processedNodes := int32(0)
checkNode := func(i int) {
nodeName := g.cache.NodeTree().Next()
// We check the nodes starting from where we left off in the previous scheduling cycle,
// this is to make sure all nodes have the same chance of being examined across pods.
atomic.AddInt32(&processedNodes, 1)
nodeName := allNodes[(g.lastIndex+i)%len(allNodes)]
fits, failedPredicates, err := podFitsOnNode(
pod,
meta,
@@ -510,7 +515,8 @@ func (g *genericScheduler) findNodesThatFit(pod *v1.Pod, nodes []*v1.Node) ([]*v

// Stops searching for more nodes once the configured number of feasible nodes
// are found.
workqueue.ParallelizeUntil(ctx, 16, int(allNodes), checkNode)
workqueue.ParallelizeUntil(ctx, 16, len(allNodes), checkNode)
g.lastIndex = (g.lastIndex + int(processedNodes)) % len(allNodes)
Contributor:

I have a question: how do we handle a reduction in the number of nodes when using the index? Could it shift position (skipping a node) or go out of range?

Member Author:

The modulo should handle this; it ensures that g.lastIndex is always between 0 and number_of_nodes - 1. Please let me know if I didn't fully answer the question.

Contributor:

Suppose the current number of nodes is 100 and g.lastIndex is 90, but in the next scheduling loop the number of nodes is reduced to 80; at that point the position g.lastIndex points at no longer exists.

Member Author:

Right, but we are taking the modulo: (90 + whatever) % 80 yields a number between 0 and 79.
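The arithmetic in this exchange can be sketched as follows; `wrapIndex` is a hypothetical helper written for illustration, not part of the scheduler code:

```go
package main

import "fmt"

// wrapIndex maps a possibly stale lastIndex (plus the nodes processed
// this cycle) onto the current node list. Because of the modulo, the
// result is always a valid position in [0, numNodes-1], even after the
// cluster shrinks.
func wrapIndex(lastIndex, processed, numNodes int) int {
	return (lastIndex + processed) % numNodes
}

func main() {
	// lastIndex was 90 when there were 100 nodes; the cluster shrank to 80.
	fmt.Println(wrapIndex(90, 0, 80))  // prints 10: still in range
	fmt.Println(wrapIndex(90, 25, 80)) // prints 35
}
```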

Contributor:

OK, then will this break the fairness of node selection? Since we can't guarantee which nodes were deleted, we can't locate by index where we left off last time.

Member Author:

Not really; these are pretty much the same semantics as the original code (if not better).

In the original code, we were relying on Next() to pick the node. The nodeArray's index resets to zero if it grows larger than the length of the node array, so it has the exact same issue you are describing here: if the index was 90, and in the next scheduling loop the number of nodes was reduced to 80, then the next node picked is always the one at index 0, irrespective of which nodes were removed.

The only difference with the new logic is that we don't reset to zero; we wrap around, so in the example above the next nodes will be picked starting from index 10. In a way this improves fairness, since we don't always restart at the same place (zero).

Member:

This is not necessarily better than the old code. In the old version, fairness after adding/removing nodes was maintained by NodeTree; now the client of NodeTree preserves fairness. So I think the final outcome is similar.
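The iteration pattern this thread discusses can be sketched as below. This is a simplified, single-threaded illustration with invented names (`roundRobin`), not the actual scheduler types: each cycle starts where the previous one left off, so nodes late in the list still get examined across pods.

```go
package main

import "fmt"

// roundRobin examines up to `want` nodes starting where the previous
// cycle left off, and returns the candidates plus the updated index
// for the next cycle.
func roundRobin(nodes []string, lastIndex, want int) ([]string, int) {
	n := len(nodes)
	if n == 0 {
		return nil, 0
	}
	processed := 0
	var picked []string
	for i := 0; i < n && len(picked) < want; i++ {
		processed++
		// Same indexing scheme as the PR: offset from lastIndex, modulo n.
		picked = append(picked, nodes[(lastIndex+i)%n])
	}
	return picked, (lastIndex + processed) % n
}

func main() {
	nodes := []string{"a", "b", "c", "d"}
	first, idx := roundRobin(nodes, 0, 2)
	second, _ := roundRobin(nodes, idx, 2)
	fmt.Println(first, second) // [a b] [c d]: later nodes get their turn
}
```

A reset-to-zero scheme would instead examine "a" and "b" again in the second cycle whenever the index overflowed, which is the fairness concern raised above.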


filtered = filtered[:filteredLen]
if len(errs) > 0 {
6 changes: 3 additions & 3 deletions pkg/scheduler/internal/cache/cache_test.go
@@ -1066,7 +1066,7 @@ func TestNodeOperators(t *testing.T) {
if !found {
t.Errorf("Failed to find node %v in internalcache.", node.Name)
}
if cache.nodeTree.NumNodes() != 1 || cache.nodeTree.Next() != node.Name {
if cache.nodeTree.NumNodes() != 1 || cache.nodeTree.next() != node.Name {
t.Errorf("cache.nodeTree is not updated correctly after adding node: %v", node.Name)
}

@@ -1109,7 +1109,7 @@ func TestNodeOperators(t *testing.T) {
t.Errorf("Failed to update node in schedulernodeinfo:\n got: %+v \nexpected: %+v", got, expected)
}
// Check nodeTree after update
if cache.nodeTree.NumNodes() != 1 || cache.nodeTree.Next() != node.Name {
if cache.nodeTree.NumNodes() != 1 || cache.nodeTree.next() != node.Name {
t.Errorf("unexpected cache.nodeTree after updating node: %v", node.Name)
}

@@ -1120,7 +1120,7 @@ func TestNodeOperators(t *testing.T) {
}
// Check nodeTree after remove. The node should be removed from the nodeTree even if there are
// still pods on it.
if cache.nodeTree.NumNodes() != 0 || cache.nodeTree.Next() != "" {
if cache.nodeTree.NumNodes() != 0 || cache.nodeTree.next() != "" {
t.Errorf("unexpected cache.nodeTree after removing node: %v", node.Name)
}
}
23 changes: 20 additions & 3 deletions pkg/scheduler/internal/cache/node_tree.go
@@ -32,6 +32,7 @@ type NodeTree struct {
tree map[string]*nodeArray // a map from zone (region-zone) to an array of nodes in the zone.
zones []string // a list of all the zones in the tree (keys)
zoneIndex int
allNodes []string
numNodes int
mu sync.RWMutex
}
@@ -92,6 +93,7 @@ func (nt *NodeTree) addNode(n *v1.Node) {
}
klog.V(5).Infof("Added node %v in group %v to NodeTree", n.Name, zone)
nt.numNodes++
nt.recomputeAllNodes()
}

// RemoveNode removes a node from the NodeTree.
@@ -112,6 +114,7 @@ func (nt *NodeTree) removeNode(n *v1.Node) error {
}
klog.V(5).Infof("Removed node %v in group %v from NodeTree", n.Name, zone)
nt.numNodes--
nt.recomputeAllNodes()
return nil
}
}
@@ -159,9 +162,7 @@ func (nt *NodeTree) resetExhausted() {

// Next returns the name of the next node. NodeTree iterates over zones and in each zone iterates
// over nodes in a round robin fashion.
func (nt *NodeTree) Next() string {
nt.mu.Lock()
Contributor:

Can you confirm that the lock here should be deleted?

Member Author:

Yes; removing the contention here is actually the main goal of this PR.

defer nt.mu.Unlock()
func (nt *NodeTree) next() string {
if len(nt.zones) == 0 {
return ""
}
@@ -186,6 +187,22 @@ func (nt *NodeTree) Next() string {
}
}

func (nt *NodeTree) recomputeAllNodes() {
nt.allNodes = make([]string, 0, nt.numNodes)
nt.resetExhausted()
for i := 0; i < nt.numNodes; i++ {
nt.allNodes = append(nt.allNodes, nt.next())
}
}

// AllNodes returns the list of nodes as they would be iterated by
// Next() method.
func (nt *NodeTree) AllNodes() []string {
nt.mu.RLock()
defer nt.mu.RUnlock()
return nt.allNodes
}

// NumNodes returns the number of nodes.
func (nt *NodeTree) NumNodes() int {
nt.mu.RLock()
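The change above moves the cost of maintaining iteration order from the read path to the (rare) mutation path: adds and removes take the write lock and recompute a flattened snapshot, while readers only take a read lock and return it. A minimal sketch of this pattern, using invented names (`nodeList`) rather than the actual NodeTree implementation:

```go
package main

import (
	"fmt"
	"sync"
)

// nodeList keeps a precomputed snapshot of node names so the hot read
// path only takes a read lock. Illustrative sketch, not the real type.
type nodeList struct {
	mu       sync.RWMutex
	byZone   map[string][]string
	allNodes []string // recomputed on every mutation
}

func (l *nodeList) add(zone, name string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.byZone[zone] = append(l.byZone[zone], name)
	l.recompute()
}

// recompute flattens the zone map; callers must hold the write lock.
func (l *nodeList) recompute() {
	l.allNodes = l.allNodes[:0]
	for _, ns := range l.byZone {
		l.allNodes = append(l.allNodes, ns...)
	}
}

// all returns the snapshot; concurrent readers only contend on the
// read lock, unlike the old per-call Next() which took a full lock.
func (l *nodeList) all() []string {
	l.mu.RLock()
	defer l.mu.RUnlock()
	return l.allNodes
}

func main() {
	l := &nodeList{byZone: map[string][]string{}}
	l.add("zone-1", "node-0")
	l.add("zone-1", "node-1")
	fmt.Println(len(l.all())) // prints 2
}
```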
68 changes: 34 additions & 34 deletions pkg/scheduler/internal/cache/node_tree_test.go
@@ -140,28 +140,28 @@ func TestNodeTree_AddNode(t *testing.T) {
{
name: "single node no labels",
nodesToAdd: allNodes[:1],
expectedTree: map[string]*nodeArray{"": {[]string{"node-0"}, 0}},
expectedTree: map[string]*nodeArray{"": {[]string{"node-0"}, 1}},
},
{
name: "mix of nodes with and without proper labels",
nodesToAdd: allNodes[:4],
expectedTree: map[string]*nodeArray{
"": {[]string{"node-0"}, 0},
"region-1:\x00:": {[]string{"node-1"}, 0},
":\x00:zone-2": {[]string{"node-2"}, 0},
"region-1:\x00:zone-2": {[]string{"node-3"}, 0},
"": {[]string{"node-0"}, 1},
"region-1:\x00:": {[]string{"node-1"}, 1},
":\x00:zone-2": {[]string{"node-2"}, 1},
"region-1:\x00:zone-2": {[]string{"node-3"}, 1},
},
},
{
name: "mix of nodes with and without proper labels and some zones with multiple nodes",
nodesToAdd: allNodes[:7],
expectedTree: map[string]*nodeArray{
"": {[]string{"node-0"}, 0},
"region-1:\x00:": {[]string{"node-1"}, 0},
":\x00:zone-2": {[]string{"node-2"}, 0},
"region-1:\x00:zone-2": {[]string{"node-3", "node-4"}, 0},
"region-1:\x00:zone-3": {[]string{"node-5"}, 0},
"region-2:\x00:zone-2": {[]string{"node-6"}, 0},
"": {[]string{"node-0"}, 1},
"region-1:\x00:": {[]string{"node-1"}, 1},
":\x00:zone-2": {[]string{"node-2"}, 1},
"region-1:\x00:zone-2": {[]string{"node-3", "node-4"}, 2},
"region-1:\x00:zone-3": {[]string{"node-5"}, 1},
"region-2:\x00:zone-2": {[]string{"node-6"}, 1},
},
},
}
@@ -190,22 +190,22 @@ func TestNodeTree_RemoveNode(t *testing.T) {
existingNodes: allNodes[:7],
nodesToRemove: allNodes[:1],
expectedTree: map[string]*nodeArray{
"region-1:\x00:": {[]string{"node-1"}, 0},
":\x00:zone-2": {[]string{"node-2"}, 0},
"region-1:\x00:zone-2": {[]string{"node-3", "node-4"}, 0},
"region-1:\x00:zone-3": {[]string{"node-5"}, 0},
"region-2:\x00:zone-2": {[]string{"node-6"}, 0},
"region-1:\x00:": {[]string{"node-1"}, 1},
":\x00:zone-2": {[]string{"node-2"}, 1},
"region-1:\x00:zone-2": {[]string{"node-3", "node-4"}, 2},
"region-1:\x00:zone-3": {[]string{"node-5"}, 1},
"region-2:\x00:zone-2": {[]string{"node-6"}, 1},
},
},
{
name: "remove a few nodes including one from a zone with multiple nodes",
existingNodes: allNodes[:7],
nodesToRemove: allNodes[1:4],
expectedTree: map[string]*nodeArray{
"": {[]string{"node-0"}, 0},
"region-1:\x00:zone-2": {[]string{"node-4"}, 0},
"region-1:\x00:zone-3": {[]string{"node-5"}, 0},
"region-2:\x00:zone-2": {[]string{"node-6"}, 0},
"": {[]string{"node-0"}, 1},
"region-1:\x00:zone-2": {[]string{"node-4"}, 1},
"region-1:\x00:zone-3": {[]string{"node-5"}, 1},
"region-2:\x00:zone-2": {[]string{"node-6"}, 1},
},
},
{
@@ -257,11 +257,11 @@ func TestNodeTree_UpdateNode(t *testing.T) {
},
},
expectedTree: map[string]*nodeArray{
"region-1:\x00:": {[]string{"node-1"}, 0},
":\x00:zone-2": {[]string{"node-2"}, 0},
"region-1:\x00:zone-2": {[]string{"node-3", "node-4", "node-0"}, 0},
"region-1:\x00:zone-3": {[]string{"node-5"}, 0},
"region-2:\x00:zone-2": {[]string{"node-6"}, 0},
"region-1:\x00:": {[]string{"node-1"}, 1},
":\x00:zone-2": {[]string{"node-2"}, 1},
"region-1:\x00:zone-2": {[]string{"node-3", "node-4", "node-0"}, 3},
"region-1:\x00:zone-3": {[]string{"node-5"}, 1},
"region-2:\x00:zone-2": {[]string{"node-6"}, 1},
},
},
{
@@ -277,7 +277,7 @@ func TestNodeTree_UpdateNode(t *testing.T) {
},
},
expectedTree: map[string]*nodeArray{
"region-1:\x00:zone-2": {[]string{"node-0"}, 0},
"region-1:\x00:zone-2": {[]string{"node-0"}, 1},
},
},
{
@@ -293,8 +293,8 @@ func TestNodeTree_UpdateNode(t *testing.T) {
},
},
expectedTree: map[string]*nodeArray{
"": {[]string{"node-0"}, 0},
"region-1:\x00:zone-2": {[]string{"node-new"}, 0},
"": {[]string{"node-0"}, 1},
"region-1:\x00:zone-2": {[]string{"node-new"}, 1},
},
},
}
@@ -322,7 +322,7 @@ func TestNodeTree_Next(t *testing.T) {
tests := []struct {
name string
nodesToAdd []*v1.Node
numRuns int // number of times to run Next()
numRuns int // number of times to run next()
expectedOutput []string
}{
{
@@ -357,7 +357,7 @@ func TestNodeTree_Next(t *testing.T) {

var output []string
for i := 0; i < test.numRuns; i++ {
output = append(output, nt.Next())
output = append(output, nt.next())
}
if !reflect.DeepEqual(output, test.expectedOutput) {
t.Errorf("unexpected output. Expected: %v, Got: %v", test.expectedOutput, output)
@@ -399,15 +399,15 @@ func TestNodeTreeMultiOperations(t *testing.T) {
name: "add more nodes to an exhausted zone",
nodesToAdd: append(allNodes[4:9], allNodes[3]),
nodesToRemove: nil,
operations: []string{"add", "add", "add", "add", "add", "next", "next", "next", "next", "add", "next", "next", "next"},
expectedOutput: []string{"node-4", "node-5", "node-6", "node-7", "node-3", "node-8", "node-4"},
operations: []string{"add", "add", "add", "add", "add", "next", "next", "next", "next", "next", "next", "next"},
expectedOutput: []string{"node-4", "node-5", "node-6", "node-7", "node-8", "node-4", "node-5"},
},
{
name: "remove zone and add new to ensure exhausted is reset correctly",
nodesToAdd: append(allNodes[3:5], allNodes[6:8]...),
nodesToRemove: allNodes[3:5],
operations: []string{"add", "add", "next", "next", "remove", "add", "add", "next", "next", "remove", "next", "next"},
expectedOutput: []string{"node-3", "node-4", "node-6", "node-7", "node-6", "node-7"},
expectedOutput: []string{"node-3", "node-4", "node-4", "node-6", "node-6", "node-7"},
},
}

@@ -434,7 +434,7 @@ func TestNodeTreeMultiOperations(t *testing.T) {
removeIndex++
}
case "next":
output = append(output, nt.Next())
output = append(output, nt.next())
default:
t.Errorf("unknow operation: %v", op)
}