scheduler: performance improvement on PodAffinity #76243
Conversation
- replace unnecessary Lock/Unlock with atomic AddInt64
Cool, LGTM.
Will leave /lgtm to @bsalamat.
@@ -230,7 +227,7 @@ func (ipa *InterPodAffinity) CalculateInterPodAffinityPriority(pod *v1.Pod, node
 	for _, node := range nodes {
 		fScore := float64(0)
 		if (maxCount - minCount) > 0 {
-			fScore = float64(schedulerapi.MaxPriority) * ((pm.counts[node.Name] - minCount) / (maxCount - minCount))
+			fScore = float64(schedulerapi.MaxPriority) * (float64(*pm.counts[node.Name]-minCount) / float64(maxCount-minCount))
I guess we might be losing the float scores now. IIRC we tried something similar in the past and it affected correctness. The one thing I'm curious about: is there any difference between the scores computed with and without this patch? (IIRC, last time the issue was related to too many nodes having the same score.)
@ravisantoshgudimetla could you point me to the issue #?
This PR introduced the following changes related to the fraction:

- `weight` is always an integer, so the new change that loads `weight` into `p.counts[node.Name]` (from float64 to int64) in an atomic way should be fine.
- The change from `float64(term.Weight*int32(multiplier))` to `int64(term.Weight*int32(multiplier))` should also be fine, as `multiplier` is of type `int`.
- `maxCount`/`minCount` are assigned from `p.counts[node.Name]`, so they should be good as well.
- The last one: the change from `(pm.counts[node.Name] - minCount) / (maxCount - minCount)` to `float64(*pm.counts[node.Name]-minCount) / float64(maxCount-minCount)` doesn't lose correctness either.
Got it, thanks for the explanation. I had some questions on the `term.Weight*int32(multiplier)` part, but I see that both of them are of type int.
/hold
@@ -63,15 +64,15 @@ type podAffinityPriorityMap struct {
 	nodes []*v1.Node
 	// counts store the mapping from node name to so-far computed score of
 	// the node.
-	counts map[string]float64
+	counts map[string]*int64
why does the value need to be an `int64` pointer, instead of just an `int64`?
It's because `atomic.AddInt64` takes a pointer as its parameter.
If we go with `map[string]int64`, the map value is not addressable (`&map["key"]` is illegal); and you also can't do `val := map["key"]; ptr := &val` b/c that's another address.
That's right. In Go you cannot get a pointer to map entries as they may change during execution.
Overall looks good and makes sense. Just a small comment.
Performance improvement is over 2X for preferred affinity. This is worth noting in the release notes.
@bsalamat Will update the release note. One thing to note is that the 2X improvement is not for soft pod affinity, it's for hard pod affinity. BTW: performance for soft (preferred) pod (anti-)affinity hasn't been measured due to lack of benchmark tests, but from the code's perspective it's expected to gain a performance improvement as well. The reason the priority changes can also impact hard pod affinity is this code: kubernetes/pkg/scheduler/algorithm/priorities/interpod_affinity.go Lines 164 to 177 in 552d1eb
Thanks, @Huang-Wei for clarifying. You are right. This improvement impacts both soft and hard affinity.
/lgtm
/approve
Please change the release note and remove "hard" from it. This PR improves performance of both hard and soft affinity.
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: bsalamat, Huang-Wei The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
Yeap. And hard pod anti-affinity isn't involved in Priorities, so this PR doesn't help for that case. BTW: I'm also trying to improve hard pod anti-affinity, such as:
But I don't see either of them gaining a performance improvement yet. Will continue digging.
One thing that can improve performance is to remove something like: NodesWithAffinityPods map[string]podsWithAffinity // map from node name to affinity pods
/hold cancel
@Huang-Wei Scheduling benchmarks show faster execution, but our Kubemark and real cluster benchmarks show a regression: |
What type of PR is this?
/kind design
/sig scheduling
/assign @bsalamat
What this PR does / why we need it:
This PR tries to eliminate unnecessary Lock/Unlock in the logic of the InterPodAffinity priority. Replacing them with atomic AddInt64 can significantly improve the performance of:
Hard PodAffinity (2.2X performance improvement, see below)
Before
After (with this PR)
Soft PodAffinity/PodAntiAffinity (Can be inferred from the code and benchmark result of Hard PodAffinity. We can add the benchmark tests if necessary)
Which issue(s) this PR fixes:
Special notes for your reviewer:
The above test results were obtained on a bare-metal machine with 8 CPU cores and 32GB memory.
Does this PR introduce a user-facing change?: