Low node utilization to respect priority while evicting pods #105
Conversation
/cc @aveshagarwal
@ravisantoshgudimetla I do not see that the pods with the same priority are being evicted in their QoS order (be, burstable, and guaranteed)? Did you remove the existing part?
@ravisantoshgudimetla nevermind, I see them now.
@aveshagarwal - Any other major concerns? The unit tests seem to be passing for me locally; I will look more into the e2es and unit tests, unless you have any other comments.
@ravisantoshgudimetla is it backward compatible?
@aveshagarwal - It depends. As long as we mention that the next release of the descheduler is compatible with kube 1.11+, where priority is enabled, we should be good. If someone has enabled priority in an older version of a kube cluster and uses this release of the descheduler, they will see surprises.
totalReqs := map[v1.ResourceName]resource.Quantity{}
for _, pod := range pods {
    glog.Infof("Ravig %v", pod.Name)
Note to myself: Need to remove this.
I did not mean that. In fact, I don't think anyone is going to enable priority in older releases. I meant: does the code after this PR keep working with older releases?
Got it. Thanks. I think this line could cause a problem https://github.com/kubernetes-incubator/descheduler/pull/105/files#diff-36906f0d7bfe55a6cab3b3b86aa63bacR194 but I will fix it. I need to add a nil check for pod.Spec.Priority.
Force-pushed 6b503c7 to 42e2461
@aveshagarwal - I have ensured this strategy is backward compatible, with some unit tests to catch regressions. PTAL.
/test e2e
/test e2e
@aveshagarwal - I have updated the unit tests to catch regressions, and the e2es are passing as well. Can you PTAL?
}
return lowNodes, targetNodes
}

// evictPodsFromTargetNodes evicts pods based on priority, if all the pods on the node have priority, if not
// evicts them based on QoS as fallback option.
The above is not correct; it should be "pods are evicted in the order of their priorities from low to high. Pods with the same priority are evicted based on their QoS." QoS is not a fallback option; both priority and QoS work together.
So, this is for when any of the pods on the node have no priority associated with them; if we have at least one pod with no priority, then we will evict all the pods based on QoS.
> So, this is for when any of the pods on the node have no priority associated with them; if we have at least one pod with no priority, then we will evict all the pods based on QoS.
That approach seems incorrect. We should not do that. Anyway, I think either all pods have priority or no pods have priority (mainly in older releases). If the priority admission plugin is enabled, there will not be any mixed scenario.
Even if there is a chance of a mixed case, we should consider all pods without priority as having the same priority and evict them based on their QoS.
> consider all pods without priority as having the same priority
You mean update the priority to 0 or something else? If yes, I think we shouldn't touch the priority field.
> You mean update the priority to 0 or something else? If yes, I think we shouldn't touch the priority field.
I did not mention updating the priority field at all. I just mentioned "treating/grouping them as having the same priority".
Can you please elaborate? We are sorting based on the priority field.
@@ -308,28 +382,37 @@ func IsNodeWithLowUtilization(nodeThresholds api.ResourceThresholds, thresholds
return true
}

func NodeUtilization(node *v1.Node, pods []*v1.Pod) (api.ResourceThresholds, []*v1.Pod, []*v1.Pod, []*v1.Pod, []*v1.Pod, []*v1.Pod) {
I am still trying to understand why you have split the function "NodeUtilization" into two functions (classifyPods and NodeUtilization). The reason I had combined them was performance: I wanted to traverse all nodes and their pods just once. So please make sure you are not traversing the lists of pods more than once; try to build everything from a single traversal of all pods on all nodes.
I thought about that when I was making the changes. If you think performance is a bottleneck, I can revert the change, but I was trying to make the code a bit more readable.
> If you think performance is a bottleneck, I can revert the change, but I was trying to make the code a bit more readable.
This code is not so large that we cannot maintain readability. You could still split that function into multiple functions and call the new functions inside the bigger function to improve readability; you do not have to separate them.
It's more along the lines of one function doing one thing, but I see your point. I can make the changes after writing the benchmark.
@ravisantoshgudimetla I need more time to review this; mainly I am trying to make sure the changes in the PR do not impact performance. Traversing the pods on a node multiple times (or even more than once) must be avoided, even if that means consuming more memory (though I assume that would not lead to unacceptable memory consumption).
@aveshagarwal - Sure, I can wait. I am just curious: if you have some performance numbers in mind, I can try tweaking the code to fit in that range. If not, I can try writing benchmarks. Let me know either way.
I do not have any performance numbers in my mind. But doing something obvious that would impact performance on large clusters should be avoided in the first place.
It's up to you.
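As a reference point, such a benchmark could be quite small. The sketch below is illustrative only: classifyByPriority and the pod fixture are hypothetical stand-ins, not code from this PR, and the import paths assume the k8s.io/api packages.

```go
package strategies

import (
	"fmt"
	"testing"

	"k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// classifyByPriority partitions pods into those with and without a priority
// set, in a single pass over the list; a stand-in for the classification
// work being discussed in this PR.
func classifyByPriority(pods []*v1.Pod) (withPriority, withoutPriority []*v1.Pod) {
	for _, pod := range pods {
		if pod.Spec.Priority != nil {
			withPriority = append(withPriority, pod)
		} else {
			withoutPriority = append(withoutPriority, pod)
		}
	}
	return withPriority, withoutPriority
}

// BenchmarkClassifyByPriority measures the classification cost for a heavily
// packed node; run with: go test -bench=ClassifyByPriority
func BenchmarkClassifyByPriority(b *testing.B) {
	pods := make([]*v1.Pod, 1000)
	for i := range pods {
		pods[i] = &v1.Pod{ObjectMeta: metav1.ObjectMeta{Name: fmt.Sprintf("pod-%d", i)}}
	}
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		classifyByPriority(pods)
	}
}
```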
Force-pushed 42e2461 to 9915f7e
@aveshagarwal - I reverted the change I made for …
/test e2e
// checkIfAllPodsHavePriority checks if all the pods in the given pod list have a priority associated with them.
func checkIfAllPodsHavePriority(podList []*v1.Pod) bool {
for _, pod := range podList {
Why are you checking it for all pods? If one pod has priority, that means all have priority; if one does not, that means all don't have priority. So just checking one pod should be enough?
Well, I am not sure. If the priority admission plugin is enabled after some pods are created, there could be some pods that don't have priority (because the admission controller runs on create/update). While I did not test this scenario, I think it could be a problem; that's why I check the entire set of pods on the node. This also ensures that the subsequent sorting of pods based on priority won't fail.
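For context, the whole helper under discussion is small; completing the excerpt above, it plausibly reads as follows (v1 being k8s.io/api/core/v1, as elsewhere in the diff):

```go
// checkIfAllPodsHavePriority checks if all the pods in the given pod list
// have a priority associated with them. Returning false as soon as one pod
// has a nil priority lets the caller fall back to QoS-based eviction instead
// of dereferencing a nil *int32 during the priority sort.
func checkIfAllPodsHavePriority(podList []*v1.Pod) bool {
	for _, pod := range podList {
		if pod.Spec.Priority == nil {
			return false
		}
	}
	return true
}
```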
if checkIfAllPodsHavePriority(node.allpods) {
    glog.V(1).Infof("All pods have priority associated with them. Evicting pods based on priority")
    // sort the evictable Pods based on priority.
    sortPodsBasedOnPriority(node.allpods)
I am thinking you could find out how to sort based on 2 parameters, priority and the QoS field; that way you will have just one sorted pod list, and you would not have to create any priorityToPodMap or classify pods again. For example, let's say there are 5 unsorted pods as follows (the convention is pod-name(priority, qos), just for example): p1(5,G), p2(3,G), p3(5,be), p4(2,G), p5(3,burstable). After sorting based on priority and the QoS field, the sorted list would look like: p4(2,G), p5(3,burstable), p2(3,G), p3(5,be), p1(5,G). Once we have this sorted list, we could just start evicting from the first one to the point in the sorted list where our target is met. I think it would simplify the logic a lot.
@ravisantoshgudimetla I have another idea too, which could be way better from a performance point of view. Fetch the list of priority classes from the apiserver; this way we already know the list of priorities. Create the same number of buckets (slices) as there are priorities. Traverse the list of pods once and put each pod in its respective bucket (counting sort). This is the same algorithm (counting sort, which is O(n)) I am using in NodeUtilization for guaranteed, best effort, and burstable pods.
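A rough sketch of that bucket idea, in a single pass over the pods; the function name and the choice to group nil-priority pods under 0 are illustrative, not from the PR:

```go
// bucketPodsByPriority distributes pods into per-priority buckets in one
// pass (counting-sort style), so no comparison sort over all pods is needed.
func bucketPodsByPriority(pods []*v1.Pod) map[int32][]*v1.Pod {
	buckets := map[int32][]*v1.Pod{}
	for _, pod := range pods {
		// Group pods with an unset priority together, per the
		// backward-compatibility discussion above.
		p := int32(0)
		if pod.Spec.Priority != nil {
			p = *pod.Spec.Priority
		}
		buckets[p] = append(buckets[p], pod)
	}
	return buckets
}
```

Eviction would then walk the buckets in ascending priority order, applying the existing QoS ordering within each bucket.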
Even though the 2nd option LGTM in terms of performance, I want to do something along the lines of the first option:
type podPriorityMap struct {
    priority int32
    podList  []*v1.Pod
}
and then we will sort the podPriorityMap structs based on priority. After this, we can evict pods from podPriorityMap.podList based on QoS tiers. This will ensure there is one sort and then one linear pass for classification.
@ravisantoshgudimetla IMO, we don't need podPriorityMap. Just sorting the pod list based on priority and QoS is enough.
I think we can do it if we allow multiple traversals of the list, but you wanted this to be a single traversal, unless I am missing something.
> I think we can do it if we allow multiple traversals of the list, but you wanted this to be a single traversal, unless I am missing something.
Just use golang's sort and specify a Less function that takes care of priority and QoS at the same time, and that's it, I think.
So, I was trying to do the same thing, but I see that Less can work on both the keys that we want to sort on. I will try to simplify it and see. https://gist.github.com/ravisantoshgudimetla/b3ed1da28c125abf9d43b3ca1b5b8ed3
@ravisantoshgudimetla are you working on this, or if you want, I can work on it?
Force-pushed 9915f7e to 16e8f2e
@aveshagarwal - I have updated the PR as per your suggestions. PTAL if this is the same thing you had in mind.
    return true
} else if podutil.IsBurstablePod(evictablePods[i]) && podutil.IsGuaranteedPod(evictablePods[j]) {
    return true
}
Both the if and the else-if are returning true; why use if-else then?
Yup, another if would be good enough.
Yes.
@@ -35,12 +35,13 @@ import (
 type NodeUsageMap struct {
 	node  *v1.Node
 	usage api.ResourceThresholds
-	allPods []*v1.Pod
+	allpods []*v1.Pod
is there any reason to change this?
Nothing, typo when I added it back.
    nodepodCount[node.node] = currentPodsEvicted
    podsEvicted = podsEvicted + nodepodCount[node.node]
    glog.V(1).Infof("%v pods evicted from node %#v with usage %v", nodepodCount[node.node], node.node.Name, node.usage)
}
return podsEvicted
}

// checkIfAllPodsHavePriority checks if all the pods in the given pod list have a priority associated with them.
func checkIfAllPodsHavePriority(podList []*v1.Pod) bool {
for _, pod := range podList {
I don't think we need to check all pods; the priority admission plugin is enabled by default upstream, so either all pods will have priority or none will?
Ok, I will remove this and check only one pod's priority; the downside is that the program will crash without it.
evictablePods := make([]*v1.Pod, 0)
evictablePods = append(append(node.bPods, node.bePods...), node.gPods...)

// sort the evictable Pods based on priority. This also sorts them based on QoS. If there multiple pods with same priority, they are sorted based on QoS tiers.
typo in this comment: "If there multiple pods with same priority"
Sure.
Force-pushed 601f462 to 9b72edd
/test e2e
func sortPodsBasedOnPriority(evictablePods []*v1.Pod) {
    sort.Slice(evictablePods, func(i, j int) bool {
        if evictablePods[i].Spec.Priority == nil {
            *evictablePods[i].Spec.Priority = 0 // This should have been handled by priority admission controller but to be on safe-side, we are setting its priority to 0.
I don't think the assignment here is necessary; you could just return true here.
            *evictablePods[i].Spec.Priority = 0 // This should have been handled by priority admission controller but to be on safe-side, we are setting its priority to 0.
        }
        if evictablePods[j].Spec.Priority == nil {
            *evictablePods[j].Spec.Priority = 0 // This should have been handled by priority admission controller but to be on safe-side, we are setting its priority to 0.
Similarly, you could just return false here.
I think the problem is when both pods don't have priority and one is best effort while the other is guaranteed; we would order them the wrong way around if we returned immediately.
        }
        if *evictablePods[i].Spec.Priority == *evictablePods[j].Spec.Priority {
            if podutil.IsBestEffortPod(evictablePods[i]) && (podutil.IsGuaranteedPod(evictablePods[j]) || podutil.IsBurstablePod(evictablePods[j])) {
                return true
Just "if podutil.IsBestEffortPod(evictablePods[i])" is enough; there is no need to check "podutil.IsGuaranteedPod(evictablePods[j]) || podutil.IsBurstablePod(evictablePods[j])".
Ya, I was just being over-cautious.
Yeah, why have an unnecessary check; removing that would be good.
                return true
            }
            if podutil.IsBurstablePod(evictablePods[i]) && podutil.IsGuaranteedPod(evictablePods[j]) {
                return true
You have not checked for the scenario when `j` is best effort and `i` is burstable.
Nevermind, it is covered by the return false below.
Yes.
Force-pushed 9b72edd to 7f93a06
        if evictablePods[j].Spec.Priority == nil && evictablePods[i].Spec.Priority != nil {
            return false
        }
        if (evictablePods[j].Spec.Priority == nil && evictablePods[i].Spec.Priority == nil) || (*evictablePods[i].Spec.Priority == *evictablePods[j].Spec.Priority) {
@aveshagarwal - The above covers all three scenarios: when either one of the pods doesn't have a priority, and when both don't. Please let me know if you have anything else in mind.
This looks better for sure. The only thing I am not able to understand is when we can hit (evictablePods[j].Spec.Priority == nil); can you explain the use cases?
So, say we have 2 pods to compare, and the first pod has a priority whereas the 2nd one doesn't; we need this case for that.
            return false
        }
        if (evictablePods[j].Spec.Priority == nil && evictablePods[i].Spec.Priority == nil) || (*evictablePods[i].Spec.Priority == *evictablePods[j].Spec.Priority) {
            if podutil.IsBestEffortPod(evictablePods[i]) && (podutil.IsGuaranteedPod(evictablePods[j]) || podutil.IsBurstablePod(evictablePods[j])) {
Can you just update this, and then we can merge it?
Done.
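Pulling the review threads together, the comparator plausibly ends up along these lines. This is a sketch rather than the literal merged code: the qosRank helper is introduced here to keep the comparator a strict weak ordering (which sort.Slice requires), and the podutil import path is a guess at the repo layout of the time.

```go
import (
	"sort"

	"k8s.io/api/core/v1"

	podutil "github.com/kubernetes-incubator/descheduler/pkg/descheduler/pod"
)

// qosRank maps a pod's QoS class to its eviction order: best-effort pods are
// evicted first, guaranteed pods last.
func qosRank(pod *v1.Pod) int {
	switch {
	case podutil.IsBestEffortPod(pod):
		return 0
	case podutil.IsBurstablePod(pod):
		return 1
	default: // guaranteed
		return 2
	}
}

// sortPodsBasedOnPriority sorts pods for eviction: by priority from low to
// high, with pods lacking a priority first, and ties broken by QoS class.
func sortPodsBasedOnPriority(evictablePods []*v1.Pod) {
	sort.Slice(evictablePods, func(i, j int) bool {
		pi, pj := evictablePods[i].Spec.Priority, evictablePods[j].Spec.Priority
		if pi == nil && pj != nil {
			return true // i has no priority, so it sorts first
		}
		if pi != nil && pj == nil {
			return false
		}
		if (pi == nil && pj == nil) || *pi == *pj {
			// Same (or equally absent) priority: order by QoS class.
			return qosRank(evictablePods[i]) < qosRank(evictablePods[j])
		}
		return *pi < *pj
	})
}
```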
Force-pushed 7f93a06 to d0305da
/lgtm
e2es passing. Merging the PR.
This PR introduces the notion of priority while evicting pods and updates existing test cases to reflect the same. Some of the e2es may fail, which I will work on.