dra test: enhance performance of test driver controller #119819
Conversation
Please note that we're already in Test Freeze. Fast forwards are scheduled to happen every 6 hours, whereas the most recent run was: Mon Aug 7 22:33:07 UTC 2023.
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: pohly. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
/triage accepted
/retest
/lgtm
LGTM label has been added. Git tree hash: 1d3b19e3bfe01f295f5c1ebe289a6f1d1b8a15fc
What type of PR is this?
/kind cleanup
What this PR does / why we need it:
Analyzing the CPU profile of

    go test -timeout=0 -count=5 -cpuprofile profile.out -bench=BenchmarkPerfScheduling/.*Claim.* -benchtime=1ns -run=xxx ./test/integration/scheduler_perf

showed that a significant amount of time was spent iterating over allocated claims to determine how many were allocated per node. That "naive" approach had been taken to avoid maintaining a redundant data structure, but now that performance measurements show that this comes at a cost, introducing such a second field is no longer "premature optimization".
The average scheduling throughput in SchedulingWithResourceClaimTemplate/2000pods_100nodes increases from 16.4 pods/s to 19.2 pods/s.
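The trade-off described above can be sketched in Go. This is a minimal, self-contained illustration, not the actual test driver controller code: the `claim` and `controller` types and their method names are hypothetical. It contrasts the "naive" O(number of claims) per-node count with an O(1) lookup backed by a redundant counter that is kept in sync whenever a claim is added or removed.

```go
package main

import "fmt"

// claim models an allocated resource claim (hypothetical type
// for illustration only).
type claim struct {
	name string
	node string
}

// controller keeps claims keyed by name. perNode is the redundant
// per-node allocation count that makes lookups cheap; it must be
// updated on every add and remove.
type controller struct {
	claims  map[string]claim
	perNode map[string]int
}

func newController() *controller {
	return &controller{
		claims:  map[string]claim{},
		perNode: map[string]int{},
	}
}

// addClaim records an allocation and bumps the per-node counter.
func (c *controller) addClaim(cl claim) {
	c.claims[cl.name] = cl
	c.perNode[cl.node]++
}

// removeClaim drops an allocation and decrements the per-node counter.
func (c *controller) removeClaim(name string) {
	if cl, ok := c.claims[name]; ok {
		delete(c.claims, name)
		if c.perNode[cl.node]--; c.perNode[cl.node] == 0 {
			delete(c.perNode, cl.node)
		}
	}
}

// countNaive iterates over all claims on every lookup; this is the
// kind of scan that showed up prominently in the CPU profile.
func (c *controller) countNaive(node string) int {
	n := 0
	for _, cl := range c.claims {
		if cl.node == node {
			n++
		}
	}
	return n
}

// count is the O(1) lookup backed by the redundant field.
func (c *controller) count(node string) int {
	return c.perNode[node]
}

func main() {
	c := newController()
	c.addClaim(claim{name: "claim-1", node: "node-a"})
	c.addClaim(claim{name: "claim-2", node: "node-a"})
	c.addClaim(claim{name: "claim-3", node: "node-b"})
	c.removeClaim("claim-3")
	fmt.Println(c.countNaive("node-a"), c.count("node-a")) // both report 2
}
```

Both lookups must always agree; the cost of the optimization is the bookkeeping in `addClaim`/`removeClaim`, which is why the simpler scan was preferred until profiling showed it dominating.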
Does this PR introduce a user-facing change?