scheduler: fix perf downgrade of cases without presence of (anti-)affinity pods #76973
Conversation
4bc5476 to 492b970
Thanks, @Huang-Wei! It looks good. I hope this one behaves as expected.
I also see a couple of other potential optimizations in finding min/max, but let's first see if this PR improves performance. We can add those in follow-up PRs.
```go
// calculate final priority score for each node
result := make(schedulerapi.HostPriorityList, 0, len(nodes))
maxMinDiff := maxCount - minCount
```
It looks to me that if `maxMinDiff` is zero, we can skip the for loop below altogether.
Thanks for the review!
Yes, that could be possible. I can investigate later to see whether the current Priority reduce interface is fine with an empty `schedulerapi.HostPriorityList` vs. mandatory entries of `schedulerapi.HostPriority{Host: node.Name, Score: 0}` in the result set.
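For illustration, a minimal sketch of the skip being discussed, reusing the identifiers from the diff snippet above (`nodes`, `maxCount`, `minCount`, `result`). It assumes the reduce interface wants a zero-score entry per node, which is exactly the open question; this is not the PR's actual code.

```go
maxMinDiff := maxCount - minCount
if maxMinDiff == 0 {
	// All nodes would end up with the same score, so skip the
	// normalization loop entirely. If an empty result set turns out
	// to be acceptable to the reduce interface, the per-node zero
	// entries below could be dropped as well.
	for _, node := range nodes {
		result = append(result, schedulerapi.HostPriority{Host: node.Name, Score: 0})
	}
	return result, nil
}
```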
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: bsalamat, Huang-Wei. The full list of commands accepted by this bot can be found here. The pull request process is described here.
/retest
What type of PR is this?
/kind bug
/sig scheduling
/assign @bsalamat @ravisantoshgudimetla
What this PR does / why we need it:
Some background: scheduler: performance improvement on PodAffinity #76243 was introduced to improve pod-affinity performance, but it somewhat regressed other use cases, so we reverted that PR.
Root cause of the old PR's regression: A typical kubemark test just runs regular workloads; no affinity pods are involved. That said, the mutex in the inter-pod affinity priority is effectively a no-op there. So what exact overhead did the old PR bring in? I ran a test, and it turns out the allocation of a `*int64` pointer for each node is the culprit.
How this PR fixes the issue: This PR introduces a lazy (on-demand) initialization mechanism for those `*int64` pointers. Then, in the final score calculation phase, a pointer that was never initialized simply gets a score of 0.
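A minimal sketch of the lazy-init idea, assuming hypothetical type and method names for illustration (the real PR's structure differs):

```go
package priorities

import "sync"

// podAffinityPriorityMap is a sketch, not the PR's exact code.
type podAffinityPriorityMap struct {
	mu     sync.Mutex
	counts map[string]*int64 // per-node counters, allocated lazily
}

// addScore allocates a node's counter only when an (anti-)affinity
// term actually matches, so workloads without affinity pods never pay
// for a per-node *int64 allocation.
func (pm *podAffinityPriorityMap) addScore(nodeName string, weight int64) {
	pm.mu.Lock()
	defer pm.mu.Unlock()
	if pm.counts[nodeName] == nil {
		pm.counts[nodeName] = new(int64)
	}
	*pm.counts[nodeName] += weight
}

// finalScore treats a never-initialized counter as a score of 0.
func (pm *podAffinityPriorityMap) finalScore(nodeName string) int64 {
	if c := pm.counts[nodeName]; c != nil {
		return *c
	}
	return 0
}
```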
Benchmark tests for the Inter-PodAffinity Priority: They show the old PR has 60%+ overhead for cases without affinity pods, while this PR has zero overhead.
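For anyone wanting to reproduce the comparison, something along these lines should work; the benchmark name and package path here are assumptions, so check the test files in the PR for the real ones.

```sh
go test -run=NONE -bench=BenchmarkInterPodAffinityPriority -benchmem \
    ./pkg/scheduler/algorithm/priorities/
```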
Can this PR still help PodAffinity cases: Yes.
Which issue(s) this PR fixes:
A robust version of #76243.
Special notes for your reviewer:
If you have already reviewed the old PR, just take a look at commit 4bc5476.
Does this PR introduce a user-facing change?:
(I guess we shouldn't mention it in the release note again, yes or no?)