
Low node utilization to respect priority while evicting pods #105

Conversation

ravisantoshgudimetla
Contributor

This PR introduces the notion of priority while evicting pods and updates the existing test cases to reflect the change. Some of the e2e tests may fail; I will work on those.

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Aug 2, 2018
@ravisantoshgudimetla
Contributor Author

/cc @aveshagarwal

@aveshagarwal
Contributor

@ravisantoshgudimetla I do not see the pods with the same priority being evicted based on their QoS order (best-effort, burstable, and guaranteed). Did you remove the existing logic?

@aveshagarwal
Contributor

@ravisantoshgudimetla nevermind, I see them now.

@ravisantoshgudimetla
Contributor Author

@aveshagarwal - Any other major concerns? The unit tests seem to be passing for me locally; I will look more into the e2e and unit tests unless you have any other comments.

@aveshagarwal
Contributor

@ravisantoshgudimetla is it backward compatible?

@ravisantoshgudimetla
Contributor Author

@aveshagarwal - It depends. I think as long as we mention that the next release of the descheduler is compatible with kube 1.11+, where priority is enabled, we should be good. If someone has enabled priority in an older version of a kube cluster and uses this release of the descheduler, they will see surprises.

totalReqs := map[v1.ResourceName]resource.Quantity{}
for _, pod := range pods {
glog.Infof("Ravig %v", pod.Name)
Contributor Author

@ravisantoshgudimetla ravisantoshgudimetla Aug 2, 2018

Note to myself: Need to remove this.

@aveshagarwal
Contributor

If someone has enabled priority in an older version of a kube cluster and uses this release of the descheduler, they will see surprises.

I did not mean that; in fact, I don't think anyone is going to enable priority in older releases. I meant: does the code after this PR keep working with older releases?

@ravisantoshgudimetla
Contributor Author

ravisantoshgudimetla commented Aug 2, 2018

I meant: does the code after this PR keep working with older releases?

Got it, thanks. I think this line could cause a problem (https://github.com/kubernetes-incubator/descheduler/pull/105/files#diff-36906f0d7bfe55a6cab3b3b86aa63bacR194), but I will fix it. I need to add a nil check for pod.Spec.Priority.
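
As a rough illustration of the nil check being discussed (a sketch only, not the exact change in the diff; it assumes the package's existing v1 import, k8s.io/api/core/v1, and podPriority is a hypothetical helper name):

// podPriority treats a nil Spec.Priority (for example, pods created before
// priority was enabled in the cluster) as priority 0.
func podPriority(pod *v1.Pod) int32 {
	if pod.Spec.Priority == nil {
		return 0
	}
	return *pod.Spec.Priority
}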

@ravisantoshgudimetla
Contributor Author

ravisantoshgudimetla commented Aug 3, 2018

@aveshagarwal - I have ensured this strategy is backward compatible with some unit tests to catch regressions. PTAL.

@ravisantoshgudimetla
Contributor Author

/test e2e

@ravisantoshgudimetla
Contributor Author

/test e2e

@ravisantoshgudimetla
Contributor Author

@aveshagarwal - I have updated the unit tests to catch regressions and the e2e tests are passing as well. Can you PTAL?

}
return lowNodes, targetNodes
}

// evictPodsFromTargetNodes evicts pods based on priority, if all the pods on the node have priority, if not
// evicts them based on QoS as fallback option.
Contributor

The above is not correct, it should be "pods are evicted in the order of their priorities from low to high. Pods with same priority are evicted based on their QoS." QoS is not a fallback option, both priority and QoS work together.

Contributor Author

So, this is for when any of the pods on the node have no priority associated with them: if we have at least one pod with no priority, we will evict all the pods based on QoS.

Contributor

So, this is for when any of the pods on the node have no priority associated with them: if we have at least one pod with no priority, we will evict all the pods based on QoS.

That approach seems incorrect; we should not do that. Anyway, I think either all pods have priority or no pods have priority (mainly in older releases). If the priority admission plugin is enabled, there will not be any mixed scenario.

Even if there is a chance of a mixed case, we should consider all pods without priority as having the same priority and evict them based on their QoS.

Contributor Author

without priority as having the same priority

Do you mean updating the priority to 0, or something else? If so, I think we shouldn't touch the priority field.

Contributor

Do you mean updating the priority to 0, or something else? If so, I think we shouldn't touch the priority field.

I did not mention updating the priority field at all. I just meant treating/grouping them as having the same priority.

Contributor Author

Can you please elaborate? We are sorting based on the priority field.

@@ -308,28 +382,37 @@ func IsNodeWithLowUtilization(nodeThresholds api.ResourceThresholds, thresholds
return true
}

func NodeUtilization(node *v1.Node, pods []*v1.Pod) (api.ResourceThresholds, []*v1.Pod, []*v1.Pod, []*v1.Pod, []*v1.Pod, []*v1.Pod) {
Contributor

I am still trying to understand why you have split the NodeUtilization function into two functions (classifyPods and NodeUtilization). The reason I had combined them was performance: I wanted to traverse all nodes and their pods just once. So please make sure you are not traversing the lists of pods more than once; try to build everything from a single traversal of all pods on all nodes.

Contributor Author

I thought about that when I was making the changes. If you think performance is a bottleneck, I can revert the change; I was trying to make the code a bit more readable.

Contributor

If you think performance is a bottleneck, I can revert the change; I was trying to make the code a bit more readable.

This code is not so large that we cannot maintain readability. You could still split the function into multiple helper functions and call them from the bigger function to improve readability; you do not have to separate them.

Contributor Author

It's more along the lines of one function doing one thing, but I see your point. I can make the changes after writing the benchmark.

@aveshagarwal
Contributor

@ravisantoshgudimetla I need more time to review this; mainly I am trying to confirm that the changes in the PR do not impact performance. Traversing the pods on a node multiple times (or even more than once) must be avoided, even if that means consuming more memory (though I assume it would not lead to unacceptable memory consumption).

@ravisantoshgudimetla
Contributor Author

@aveshagarwal - Sure, I can wait. I am just curious: if you have some performance numbers in mind, I can try tweaking the code to fit in that range. If not, I can try writing benchmarks. Let me know either way.

@aveshagarwal
Contributor

Sure, I can wait. I am just curious: if you have some performance numbers in mind, I can try tweaking the code to fit in that range.

I do not have any performance numbers in mind, but doing something that would obviously impact performance on large clusters should be avoided in the first place.

If not, I can try writing benchmarks. Let me know either way.

It's up to you.

@ravisantoshgudimetla
Contributor Author

ravisantoshgudimetla commented Aug 8, 2018

@aveshagarwal - I reverted the change I made to the NodeUtilization function. PTAL. I will add a benchmark in a subsequent PR.
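
For reference, the benchmark mentioned here could be a minimal sketch along these lines (it assumes the standard testing package and a hypothetical makeTestPods fixture builder; NodeUtilization is the function shown in the diff above):

func BenchmarkNodeUtilization(b *testing.B) {
	node := &v1.Node{}
	pods := makeTestPods(1000) // hypothetical: fabricates 1000 pods with mixed QoS and priorities
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		NodeUtilization(node, pods) // measures the cost of one pass over the pod list
	}
}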

@ravisantoshgudimetla
Contributor Author

/test e2e


// checkIfAllPodsHavePriority checks if all the pods in the given pod list have a priority associated with them.
func checkIfAllPodsHavePriority(podList []*v1.Pod) bool {
for _, pod := range podList {
Contributor

Why are you checking all the pods? If one pod has a priority, that means all of them do; if one does not, then none do. So checking just one pod should be enough?

Contributor Author

Well, I am not sure. If the priority admission plugin is enabled after some pods are created, there could be pods that don't have a priority (because the admission controller runs only on create/update). While I did not test this scenario, I think it could be a problem, which is why I check the entire set of pods on the node. This also ensures that the subsequent sorting of pods based on priority won't fail.
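
For context, the helper under discussion presumably has roughly this shape (a sketch, not the verbatim diff):

// checkIfAllPodsHavePriority returns true only if every pod in the list has
// Spec.Priority set; a single pod without a priority sends eviction down the
// QoS-only path instead.
func checkIfAllPodsHavePriority(podList []*v1.Pod) bool {
	for _, pod := range podList {
		if pod.Spec.Priority == nil {
			return false
		}
	}
	return true
}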

if checkIfAllPodsHavePriority(node.allpods) {
glog.V(1).Infof("All pods have priority associated with them. Evicting pods based on priority")
// sort the evictable Pods based on priority.
sortPodsBasedOnPriority(node.allpods)
Contributor

I am wondering if you could figure out how to sort based on 2 parameters, priority and QoS; that way you would have just one sorted pod list, and you would not have to create any priorityToPodMap or classify pods again. For example, let's say there are 5 unsorted pods, written as pod-name(priority, QoS): p1(5,G), p2(3,G), p3(5,BE), p4(2,G), p5(3,Burstable). After sorting on priority and QoS, the sorted list would look like: p4(2,G), p5(3,Burstable), p2(3,G), p3(5,BE), p1(5,G). Once we have this sorted list, we could just start evicting from the first one until the point in the list where our target is met. I think it would simplify the logic a lot.
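
A minimal sketch of that suggestion, assuming the package's existing podutil helpers (IsBestEffortPod, IsBurstablePod, IsGuaranteedPod) and non-nil priorities; this illustrates the idea rather than the code the PR ends up with:

// qosRank orders QoS classes for eviction: best-effort first, guaranteed last.
func qosRank(pod *v1.Pod) int {
	if podutil.IsBestEffortPod(pod) {
		return 0
	}
	if podutil.IsBurstablePod(pod) {
		return 1
	}
	return 2 // guaranteed
}

func sortByPriorityAndQoS(pods []*v1.Pod) {
	sort.Slice(pods, func(i, j int) bool {
		// Lower priority is evicted first; assumes Spec.Priority is non-nil here.
		if *pods[i].Spec.Priority != *pods[j].Spec.Priority {
			return *pods[i].Spec.Priority < *pods[j].Spec.Priority
		}
		// Equal priority: fall back to the QoS ordering.
		return qosRank(pods[i]) < qosRank(pods[j])
	})
}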

Contributor

@ravisantoshgudimetla I have another idea too, which could be way better from a performance point of view. Fetch the list of priority classes from the apiserver; this way we already know the set of priorities. Create the same number of buckets (slices) as there are priorities. Traverse the list of pods once and put each pod into the respective bucket (counting sort). This is the same algorithm (counting sort, which is O(n)) I am using in NodeUtilization for the guaranteed, best-effort, and burstable pods.
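
A rough sketch of that alternative (illustration only; the PR ultimately sorts the pod list instead):

// Group pods by priority value in a single pass (counting-sort style). The set
// of expected keys could be seeded from the PriorityClasses listed via the
// scheduling API, as suggested above.
buckets := map[int32][]*v1.Pod{}
for _, pod := range pods {
	p := int32(0) // treat a missing priority as the lowest, per the discussion above
	if pod.Spec.Priority != nil {
		p = *pod.Spec.Priority
	}
	buckets[p] = append(buckets[p], pod)
}
// Eviction would then walk the buckets from the lowest priority upward until
// the target utilization is reached.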

Contributor Author

Even though the 2nd option LGTM in terms of performance, I want to do something along the lines of the first option.

type podPriorityMap struct {
priority int32
podList []*v1.Pod
}

and then we will sort the podPriorityMap entries based on priority. After this, we can evict pods from each podList based on QoS tiers. This ensures there is one sort and then one linear pass for classification.
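
In code form, the proposal amounts to something like this (a sketch only; buildPodPriorityGroups and evictPodsInQoSOrder are hypothetical stand-ins for the single-pass grouping and the existing QoS-based eviction):

groups := buildPodPriorityGroups(pods) // hypothetical: one pass over the node's pods
sort.Slice(groups, func(i, j int) bool {
	return groups[i].priority < groups[j].priority
})
for _, group := range groups {
	// Lowest-priority group first; within a group, evict best-effort, then
	// burstable, then guaranteed pods until the target utilization is met.
	evictPodsInQoSOrder(group.podList)
}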

Contributor

@ravisantoshgudimetla IMO we don't need podPriorityMap; just sorting the pod list based on priority and QoS is enough.

Contributor Author

I think we can do it if we allow multiple traversals of the list, but you wanted this to be a single traversal, unless I am missing something.

Contributor

I think we can do it if we allow multiple traversals of the list, but you wanted this to be a single traversal, unless I am missing something.

Just use Go's sort and specify a Less function that takes care of priority and QoS at the same time, and that's it, I think.

Contributor Author

So, I was trying to do the same thing, but I see that Less works on both of the keys we want to sort by. I will try to simplify it and see: https://gist.github.com/ravisantoshgudimetla/b3ed1da28c125abf9d43b3ca1b5b8ed3

@aveshagarwal
Contributor

@ravisantoshgudimetla are you working on this? Or, if you want, I can work on it.

@ravisantoshgudimetla
Contributor Author

@aveshagarwal - I have updated the PR as per your suggestions. PTAL and see if this is what you had in mind.

return true
} else if podutil.IsBurstablePod(evictablePods[i]) && podutil.IsGuaranteedPod(evictablePods[j]) {
return true
}
Contributor

Both the if and the else branches return true, so why use if-else at all?

Contributor Author

Yup, another if would be good enough.

@aveshagarwal
Contributor

@aveshagarwal - I have updated the PR as per your suggestions. PTAL and see if this is what you had in mind.

Yes.

@@ -35,12 +35,13 @@ import (
type NodeUsageMap struct {
node *v1.Node
usage api.ResourceThresholds
allPods []*v1.Pod
allpods []*v1.Pod
Contributor

is there any reason to change this?

Contributor Author

Nothing, just a typo from when I added it back.

nodepodCount[node.node] = currentPodsEvicted
podsEvicted = podsEvicted + nodepodCount[node.node]
glog.V(1).Infof("%v pods evicted from node %#v with usage %v", nodepodCount[node.node], node.node.Name, node.usage)
}
return podsEvicted
}

// checkIfAllPodsHavePriority checks if all the pods in the given pod list have a priority associated with them.
func checkIfAllPodsHavePriority(podList []*v1.Pod) bool {
for _, pod := range podList {
Contributor

I don't think we need to check all the pods: the priority admission plugin is enabled by default upstream, so either all pods will have a priority or none will.

Contributor Author

OK, I will remove this and just check one pod's priority; the downside is that the program will crash without the full check.

evictablePods := make([]*v1.Pod, 0)
evictablePods = append(append(node.bPods, node.bePods...), node.gPods...)

// sort the evictable Pods based on priority. This also sorts them based on QoS. If there multiple pods with same priority, they are sorted based on QoS tiers.
Contributor

typo in this comment: "If there multiple pods with same priority"

Contributor Author

Sure.

@ravisantoshgudimetla ravisantoshgudimetla force-pushed the priority-low-node branch 4 times, most recently from 601f462 to 9b72edd Compare August 20, 2018 17:51
@ravisantoshgudimetla
Contributor Author

/test e2e

func sortPodsBasedOnPriority(evictablePods []*v1.Pod) {
sort.Slice(evictablePods, func(i, j int) bool {
if evictablePods[i].Spec.Priority == nil {
*evictablePods[i].Spec.Priority = 0 // This should have been handled by priority admission controller but to be on safe-side, we are setting its priority to 0.
Contributor

I don't think the assignment here is necessary; you could just return true here.

*evictablePods[i].Spec.Priority = 0 // This should have been handled by priority admission controller but to be on safe-side, we are setting its priority to 0.
}
if evictablePods[j].Spec.Priority == nil {
*evictablePods[j].Spec.Priority = 0 // This should have been handled by priority admission controller but to be on safe-side, we are setting its priority to 0.
Contributor

similarly, you could just return false here.

Contributor Author

I think the problem will be if both pods don't have a priority and one is best-effort while the other is guaranteed: we will end up ordering them the other way around if we return immediately.
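
To make that concern concrete, the comparator needs the both-nil case to fall through to a QoS comparison rather than returning early; roughly (a sketch that mirrors the shape of the final comparator further below, with isLowerQoS as a hypothetical stand-in for the QoS tie-break):

iNil := evictablePods[i].Spec.Priority == nil
jNil := evictablePods[j].Spec.Priority == nil
if iNil && jNil {
	// Both priorities are missing: a bare "return true" here would ignore QoS,
	// so compare the QoS classes instead.
	return isLowerQoS(evictablePods[i], evictablePods[j]) // hypothetical helper
}
if iNil {
	return true // a pod without a priority is treated as lowest and evicted first
}
if jNil {
	return false
}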

}
if *evictablePods[i].Spec.Priority == *evictablePods[j].Spec.Priority {
if podutil.IsBestEffortPod(evictablePods[i]) && (podutil.IsGuaranteedPod(evictablePods[j]) || podutil.IsBurstablePod(evictablePods[j])) {
return true
Contributor

just "if podutil.IsBestEffortPod(evictablePods[i])" is enough, and no need to check for "podutil.IsGuaranteedPod(evictablePods[j]) || podutil.IsBurstablePod(evictablePods[j])"

Contributor Author

Ya, I was just being over-cautious

Contributor

Yeah, why have an unnecessary check? Removing it would be good.

return true
}
if podutil.IsBurstablePod(evictablePods[i]) && podutil.IsGuaranteedPod(evictablePods[j]) {
return true
Contributor

You have not checked the scenario where j is best-effort and i is burstable.

Contributor

Never mind, it is covered by the return false below.

Contributor Author

Yes.

if evictablePods[j].Spec.Priority == nil && evictablePods[i].Spec.Priority != nil {
return false
}
if (evictablePods[j].Spec.Priority == nil && evictablePods[i].Spec.Priority == nil) || (*evictablePods[i].Spec.Priority == *evictablePods[j].Spec.Priority) {
Contributor Author

@aveshagarwal - The above covers all three scenarios: when either of the pods doesn't have a priority and when both don't. Please let me know if you have anything else in mind.

Contributor

This looks better for sure. The only thing I am not able to understand is when we can hit (evictablePods[j].Spec.Priority == nil); can you explain the use cases?

Contributor Author

@ravisantoshgudimetla ravisantoshgudimetla Aug 21, 2018

So, say we are comparing 2 pods and the first pod has a priority whereas the 2nd one doesn't; we need this case for that.

return false
}
if (evictablePods[j].Spec.Priority == nil && evictablePods[i].Spec.Priority == nil) || (*evictablePods[i].Spec.Priority == *evictablePods[j].Spec.Priority) {
if podutil.IsBestEffortPod(evictablePods[i]) && (podutil.IsGuaranteedPod(evictablePods[j]) || podutil.IsBurstablePod(evictablePods[j])) {
Contributor

can you just update this and then we can merge it?

Contributor Author

Done.

@aveshagarwal
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 21, 2018
@ravisantoshgudimetla
Contributor Author

e2es passing. Merging the PR.

@ravisantoshgudimetla ravisantoshgudimetla merged commit 0115748 into kubernetes-sigs:master Aug 21, 2018
@ravisantoshgudimetla ravisantoshgudimetla deleted the priority-low-node branch August 24, 2018 14:19
@ravisantoshgudimetla ravisantoshgudimetla restored the priority-low-node branch September 10, 2018 14:16
@ravisantoshgudimetla ravisantoshgudimetla deleted the priority-low-node branch December 18, 2018 23:23
briend pushed a commit to briend/descheduler that referenced this pull request Feb 11, 2022
…rity-low-node

Low node utilization to respect priority while evicting pods
ingvagabund pushed a commit to ingvagabund/descheduler that referenced this pull request Jan 5, 2024
…ency-openshift-4.16-atomic-openshift-descheduler

OCPBUGS-24749: Updating atomic-openshift-descheduler-container image to be consistent with ART