activate unschedulable pods only if the node became more schedulable #71551

Merged
merged 1 commit into kubernetes:master from mlmhl:scheduler_optimization on Dec 10, 2018

Conversation

@mlmhl
Contributor

mlmhl commented Nov 29, 2018

What type of PR is this?
/kind feature

What this PR does / why we need it:

This is a new scheduler performance optimization PR that tries to fix issue #70316. Compared with the earlier PR, this PR checks node condition updates in a more fine-grained way: it ignores heartbeat timestamp updates in the node's conditions.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #70316

Special notes for your reviewer:

I'm not sure how many side effects this change has: it always copies the node's conditions before comparison, which may increase the scheduler's resource consumption (memory and CPU) in a large cluster. cc @bsalamat @Huang-Wei

Does this PR introduce a user-facing change?:

The scheduler only activates unschedulable pods if a node's scheduling-related properties change.
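
For context, the gate added by this PR is a single predicate over the old and new node objects, consulted before moving pods out of the unschedulable queue. A minimal sketch of its shape follows; nodeAllocatableChanged and nodeConditionsChanged appear in the diff hunks quoted later in this thread, while the remaining helper names are placeholders for illustration.

// nodeSchedulingPropertiesChanged reports whether the update touched anything the
// scheduler cares about: spec.unschedulable, allocatable resources, labels, taints,
// or condition statuses (ignoring heartbeat timestamps).
// Note: nodeSpecUnschedulableChanged, nodeLabelsChanged, and nodeTaintsChanged are
// illustrative names, not necessarily the ones used in the merged code.
func nodeSchedulingPropertiesChanged(newNode *v1.Node, oldNode *v1.Node) bool {
	return nodeSpecUnschedulableChanged(newNode, oldNode) ||
		nodeAllocatableChanged(newNode, oldNode) ||
		nodeLabelsChanged(newNode, oldNode) ||
		nodeTaintsChanged(newNode, oldNode) ||
		nodeConditionsChanged(newNode, oldNode)
}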
@mlmhl

Contributor

mlmhl commented Nov 29, 2018

/assign @bsalamat

@wgliang

Member

wgliang commented Nov 29, 2018

/ok-to-test

@@ -1064,6 +1067,72 @@ func (c *configFactory) invalidateCachedPredicatesOnNodeUpdate(newNode *v1.Node,
}
}

func nodeSchedulingPropertiesChanged(newNode *v1.Node, oldNode *v1.Node) bool {
if nodeAllocatableChanged(newNode, oldNode) {
klog.V(4).Infof("Allocatable resource of node %s changed", newNode.Name)


@wgliang

wgliang Nov 29, 2018

Member

Maybe just printing that something changed is not enough? In my opinion, it is necessary to print out the changed values as well. Otherwise, this log will not have much value to the reader.
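
For illustration, such a log line might carry the old and new values; this is a hedged sketch, not code from this PR:

// Hypothetical log line including the old and new allocatable values.
klog.V(4).Infof("Allocatable resources of node %s changed from %v to %v",
	newNode.Name, oldNode.Status.Allocatable, newNode.Status.Allocatable)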


@resouer

resouer Nov 29, 2018

Member

+1, actually I don't think you need to log anything in this function :-)

@resouer

Member

resouer commented Nov 29, 2018

/assign

The idea is quite valid; I will take a round of review.

oldTaints, oldErr := helper.GetTaintsFromNodeAnnotations(oldNode.GetAnnotations())
if oldErr != nil {
// If parse old node's taint annotation failed, we assume node's taint changed.
klog.Errorf("Failed to get taints from annotation of old node %s: %v", oldNode.Name, oldErr)


@resouer

resouer Nov 29, 2018

Member

This sounds aggressive; I think just logging the error is enough.

}

func nodeConditionsChanged(newNode *v1.Node, oldNode *v1.Node) bool {
strip := func(conditions []v1.NodeCondition) []v1.NodeCondition {


@resouer

resouer Nov 29, 2018

Member

Instead of creating a fake NodeCondition slice, it would be cleaner if you just do:

Suggested change:
strip := func(conditions []v1.NodeCondition) []v1.NodeCondition {
oldConditions := make(map[v1.NodeConditionType]v1.ConditionStatus)
newConditions := make(map[v1.NodeConditionType]v1.ConditionStatus)
for _, cond := range oldNode.Status.Conditions {
oldConditions[cond.Type] = cond.Status
}
for _, cond := range newNode.Status.Conditions {
newConditions[cond.Type] = cond.Status
}

WDYT?
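
To make the suggestion concrete, the whole comparison could look roughly like the sketch below (v1 is k8s.io/api/core/v1; the merged code may differ in details):

// nodeConditionsChanged compares only condition types and statuses, so updates
// that touch nothing but heartbeat timestamps do not count as a change.
func nodeConditionsChanged(newNode *v1.Node, oldNode *v1.Node) bool {
	if len(newNode.Status.Conditions) != len(oldNode.Status.Conditions) {
		return true
	}
	oldConditions := make(map[v1.NodeConditionType]v1.ConditionStatus, len(oldNode.Status.Conditions))
	for _, cond := range oldNode.Status.Conditions {
		oldConditions[cond.Type] = cond.Status
	}
	for _, cond := range newNode.Status.Conditions {
		// A type missing from the old map or a different status means a real change.
		if oldConditions[cond.Type] != cond.Status {
			return true
		}
	}
	return false
}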


@mlmhl

mlmhl Dec 4, 2018

Contributor

This optimization LGTM; it makes the condition comparison simpler and faster.

klog.V(4).Infof("Conditions of node %s changed", newNode.Name)
return true
}
if newNode.Spec.Unschedulable != oldNode.Spec.Unschedulable && newNode.Spec.Unschedulable == false {


@resouer

resouer Nov 29, 2018

Member

If you wrap this line in a function and remove the klog lines, you can get super clean code for this part. :-)

@resouer
Member

resouer left a comment

Just took a first round of review :-)

@@ -992,7 +992,10 @@ func (c *configFactory) updateNodeInCache(oldObj, newObj interface{}) {
}

c.invalidateCachedPredicatesOnNodeUpdate(newNode, oldNode)
c.podQueue.MoveAllToActiveQueue()
// Only activate unschedulable pods if the node became more schedulable.


@bsalamat

bsalamat Nov 29, 2018

Contributor

An optimization we can perform here is to look at the unschedulableQueue, and if there is no pod in it, skip the check for changes in the node object and just call MoveAllToActiveQueue. This optimization will be useful in large clusters where many nodes send updates and there are no unschedulable pods in the cluster.


@mlmhl

mlmhl Dec 4, 2018

Contributor

Sorry, I missed this optimization; I will add it later.

@@ -1064,6 +1067,72 @@ func (c *configFactory) invalidateCachedPredicatesOnNodeUpdate(newNode *v1.Node,
}
}

func nodeSchedulingPropertiesChanged(newNode *v1.Node, oldNode *v1.Node) bool {


@k82cn

k82cn Nov 30, 2018

Member

Can we move this to helper/util and make it public, so others can reuse it with predicates and priorities? :)


@k82cn

k82cn Dec 7, 2018

Member

xref kubernetes-sigs/kube-batch#491

@jiaxuanzhou, something like this PR seems better :)

klog.Errorf("Failed to get taints from annotation of old node %s: %v", oldNode.Name, oldErr)
return true
}
newTaints, newErr := helper.GetTaintsFromNodeAnnotations(newNode.GetAnnotations())


@k82cn

k82cn Nov 30, 2018

Member

GetTaintsFromNodeAnnotations is only used for validation right now; if we are not cherry-picking this PR, it doesn't seem necessary to me.

BTW, Toleration/Taint has been GA for a while; we should remove the annotation :)


@mlmhl

mlmhl Dec 4, 2018

Contributor

Yeah, it's not appropriate to use GetTaintsFromNodeAnnotations here. I'm not sure whether we need to stay compatible with old nodes that have taint annotations; if not, removing the annotation comparison LGTM.


@bsalamat

bsalamat Dec 5, 2018

Contributor

+1 to @k82cn's point. I don't think we need to process taints from annotations.


@mlmhl

mlmhl Dec 6, 2018

Contributor

Done
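
With the annotation path dropped, the taint check reduces to comparing Spec.Taints directly. A minimal sketch of that shape, using reflect.DeepEqual for brevity (the merged code may use a different equality helper):

// nodeTaintsChanged reports whether the node's taints differ, looking only at
// Spec.Taints and ignoring the legacy taint annotations. reflect.DeepEqual is
// used here purely for illustration.
func nodeTaintsChanged(newNode *v1.Node, oldNode *v1.Node) bool {
	return !reflect.DeepEqual(newNode.Spec.Taints, oldNode.Spec.Taints)
}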

strippedConditions := make([]v1.NodeCondition, len(conditions))
copy(strippedConditions, conditions)
for i := range strippedConditions {
strippedConditions[i].LastHeartbeatTime = metav1.Time{}


@k82cn

k82cn Nov 30, 2018

Member

It seems we only enhanced nodeConditionsChanged; why include the other parts compared to #70366? :)


@mlmhl

mlmhl Dec 4, 2018

Contributor

#70366 was reverted, so I include the other comparisons in this PR too.

@mlmhl

Contributor

mlmhl commented Dec 4, 2018

@resouer @bsalamat @k82cn All comments are addressed, PTAL :)

@@ -530,6 +537,13 @@ func (p *PriorityQueue) DeleteNominatedPodIfExists(pod *v1.Pod) {
p.lock.Unlock()
}

// HasUnschedulablePods returns true if any unschedulable pods exist in the SchedulingQueue.
func (p *PriorityQueue) HasUnschedulablePods() bool {
p.lock.Lock()


@wgliang

wgliang Dec 4, 2018

Member

There is no write operation here, and p.lock is a sync.RWMutex, so I recommend using p.lock.RLock().
(FYI: https://golang.org/pkg/sync/#RWMutex)


@bsalamat

bsalamat Dec 4, 2018

Contributor

+1. This should be a read lock.
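
Applying that, the accessor would look roughly like this (a sketch; the unschedulableQ.pods field access follows the PriorityQueue layout of that era but is an assumption here):

// HasUnschedulablePods returns true if any unschedulable pods exist in the queue.
// Only a read lock is taken because the queue is not mutated.
func (p *PriorityQueue) HasUnschedulablePods() bool {
	p.lock.RLock()
	defer p.lock.RUnlock()
	// p.unschedulableQ.pods is assumed here for illustration.
	return len(p.unschedulableQ.pods) > 0
}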

@@ -992,7 +992,10 @@ func (c *configFactory) updateNodeInCache(oldObj, newObj interface{}) {
}

c.invalidateCachedPredicatesOnNodeUpdate(newNode, oldNode)
c.podQueue.MoveAllToActiveQueue()
// Only activate unschedulable pods if the node became more schedulable.
if c.podQueue.HasUnschedulablePods() && nodeSchedulingPropertiesChanged(newNode, oldNode) {


@bsalamat

bsalamat Dec 4, 2018

Contributor

You should instead do the following:

Suggested change:
if c.podQueue.HasUnschedulablePods() && nodeSchedulingPropertiesChanged(newNode, oldNode) {
if !c.podQueue.HasUnschedulablePods() || nodeSchedulingPropertiesChanged(newNode, oldNode) {
c.podQueue.MoveAllToActiveQueue()
}

It may seem odd to move pods when there is nothing in the unschedulable queue, but we should do so, because there may be a pod that the scheduler is currently processing, and it may be determined "unschedulable". The scheduling queue has a mechanism to retry such a pod when a "move to active queue" event happens. So what the above optimization does is: when there is no unschedulable pod in the unschedulable queue, it skips the node comparison and sends a move-to-active-queue request.

@@ -530,6 +537,13 @@ func (p *PriorityQueue) DeleteNominatedPodIfExists(pod *v1.Pod) {
p.lock.Unlock()
}

// HasUnschedulablePods returns true if any unschedulable pods exist in the SchedulingQueue.
func (p *PriorityQueue) HasUnschedulablePods() bool {
p.lock.Lock()


@bsalamat

bsalamat Dec 4, 2018

Contributor

+1. This should be a read lock.

@mlmhl mlmhl force-pushed the mlmhl:scheduler_optimization branch from 7d25f32 to 2fe9b14 Dec 10, 2018

@mlmhl

Contributor

mlmhl commented Dec 10, 2018

@bsalamat PR rebased :)

@mlmhl

Contributor

mlmhl commented Dec 10, 2018

/test pull-kubernetes-e2e-gce-100-performance

@Huang-Wei

Member

Huang-Wei commented Dec 10, 2018

/lgtm

@mlmhl

Contributor

mlmhl commented Dec 10, 2018

/retest

@k8s-ci-robot k8s-ci-robot merged commit 698db70 into kubernetes:master Dec 10, 2018

18 checks passed

cla/linuxfoundation mlmhl authorized
Details
pull-kubernetes-bazel-build Job succeeded.
Details
pull-kubernetes-bazel-test Job succeeded.
Details
pull-kubernetes-cross Skipped
pull-kubernetes-e2e-gce Job succeeded.
Details
pull-kubernetes-e2e-gce-100-performance Job succeeded.
Details
pull-kubernetes-e2e-gce-device-plugin-gpu Job succeeded.
Details
pull-kubernetes-e2e-gke Skipped
pull-kubernetes-e2e-kops-aws Job succeeded.
Details
pull-kubernetes-e2e-kubeadm-gce Skipped
pull-kubernetes-integration Job succeeded.
Details
pull-kubernetes-kubemark-e2e-gce-big Job succeeded.
Details
pull-kubernetes-local-e2e Skipped
pull-kubernetes-local-e2e-containerized Skipped
pull-kubernetes-node-e2e Job succeeded.
Details
pull-kubernetes-typecheck Job succeeded.
Details
pull-kubernetes-verify Job succeeded.
Details
tide In merge pool.
Details
@bsalamat

Contributor

bsalamat commented Dec 10, 2018

@mlmhl Please go ahead and send the backport PRs. If you don't have the time, please let us know.

@mlmhl

Contributor

mlmhl commented Dec 11, 2018

@bsalamat The backport PRs have been sent for 1.13, 1.12, and 1.11, PTAL :)

k8s-ci-robot added a commit that referenced this pull request Dec 11, 2018

Merge pull request #71931 from mlmhl/automated-cherry-pick-of-#71551-…
…upstream-release-1.13

Automated cherry pick of #71551: activate unschedulable pods only if the node became more

k8s-ci-robot added a commit that referenced this pull request Dec 12, 2018

Merge pull request #71932 from mlmhl/automated-cherry-pick-of-#71551-…
…upstream-release-1.12

Automated cherry pick of #71551: activate unschedulable pods only if the node became more

k8s-ci-robot added a commit that referenced this pull request Dec 12, 2018

Merge pull request #71933 from mlmhl/automated-cherry-pick-of-#71551-…
…upstream-release-1.11

Automated cherry pick of #71551: activate unschedulable pods only if the node became more

foxish added a commit that referenced this pull request Dec 14, 2018

k8s-ci-robot added a commit that referenced this pull request Dec 14, 2018

Merge pull request #72040 from kubernetes/revert-71933-automated-cher…
…ry-pick-of-#71551-upstream-release-1.11

Revert "Automated cherry pick of #71551: activate unschedulable pods only if the node became more"

bsalamat added a commit that referenced this pull request Jan 4, 2019

Revert "Revert "Automated cherry pick of #71551: activate unschedulab…
…le pods only if the node became more""

k8s-ci-robot added a commit that referenced this pull request Jan 8, 2019

Merge pull request #72600 from bsalamat/automated-cherry-pick-of-#715…
…51-upstream-release-1.11

Automated cherry pick of #71551: activate unschedulable pods only if the node became more