Optimize scheduler cache snapshot to improve scheduling throughput #74041
What type of PR is this?
What this PR does / why we need it:
Our benchmarks show that the latency of scheduling a pod improves by over 20%, from 8.5ms to 6.7ms, in a 5000-node cluster when scheduling 10000 pods.
Special notes for your reviewer:
Does this PR introduce a user-facing change?:
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: bsalamat
Feb 21, 2019
16 checks passed
Our scalability results after this PR are in and available in our perf dashboard.
Choose "gce-5000Nodes > Scheduler > SchedulingThroughput" in the perf dashboard to see the results. The latest run at the time of this comment is run 314.
Thanks for noticing that. Average binding time has not changed, but the 90th percentile has increased a lot. I think I know the reason. We have a rate limit of 100 qps to the API server. The scheduler's throughput now exceeds that limit, so it can schedule more than 100 pods per second. As a result, pods wait longer in the binding stage and we see an increase in binding latency.
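To make the queueing effect concrete, here is a minimal sketch of a deterministic model, not the scheduler's actual code. It assumes pods finish scheduling at a fixed rate and that bind requests are admitted at a fixed 100 qps with no burst allowance. The names `simulateBindWaits` and `percentile` and the example rates of 80 and 150 pods/s are hypothetical and chosen only for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// simulateBindWaits returns, for each of n pods, how long (in seconds) the pod
// waits between finishing scheduling and being allowed to send its bind request.
// schedRate is an assumed scheduling throughput in pods/s; bindQPS is the
// client-side rate limit in requests/s. Burst capacity is ignored (assumption).
func simulateBindWaits(n int, schedRate, bindQPS float64) []float64 {
	waits := make([]float64, n)
	nextSlot := 0.0 // earliest time the limiter admits the next request
	for i := 0; i < n; i++ {
		ready := float64(i) / schedRate // time pod i finishes scheduling
		start := ready
		if nextSlot > start {
			start = nextSlot // the pod must queue behind earlier bind requests
		}
		waits[i] = start - ready
		nextSlot = start + 1.0/bindQPS
	}
	return waits
}

// percentile returns an index-based approximation of quantile q (0..1).
func percentile(values []float64, q float64) float64 {
	if len(values) == 0 {
		return 0
	}
	sorted := append([]float64(nil), values...)
	sort.Float64s(sorted)
	idx := int(q * float64(len(sorted)))
	if idx >= len(sorted) {
		idx = len(sorted) - 1
	}
	return sorted[idx]
}

func main() {
	const bindQPS = 100.0 // rate limit mentioned in the comment above
	for _, schedRate := range []float64{80, 150} {
		waits := simulateBindWaits(1000, schedRate, bindQPS)
		fmt.Printf("schedRate=%.0f pods/s: p90 bind wait = %.3fs\n",
			schedRate, percentile(waits, 0.9))
	}
}
```

In this model the wait stays at zero while the scheduling rate is below the limit and grows with every additional pod once the rate exceeds it. The model only illustrates why a backlog forms at the binding stage; it does not attempt to reproduce the dashboard numbers.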