Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid copying PriorityConfig and SchedulerExtender structs for every node while running priority functions #71722

Merged
merged 3 commits into from
Dec 6, 2018

Conversation

bsalamat
Copy link
Member

@bsalamat bsalamat commented Dec 4, 2018

What type of PR is this?
/kind cleanup

For lack of a better "kind", I should mark this as a clean-up, but this has caused performance regression in the scheduler.

What this PR does / why we need it:
There has been a performance degradation in the scheduler. We suspect it is caused after #70274. This PR fixes the issue and applies the same fix to several similar existing code with the hope that it fixes the issue.

Which issue(s) this PR fixes:

Fixes #

fixes (hopefully) #71659

Does this PR introduce a user-facing change?:

NONE

/sig scheduling

cc/ @wojtek-t @krzysied

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Dec 4, 2018
@bsalamat bsalamat added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Dec 4, 2018
@k8s-ci-robot k8s-ci-robot removed the needs-priority Indicates a PR lacks a `priority/foo` label and requires one. label Dec 4, 2018
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bsalamat

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 4, 2018
@bsalamat bsalamat removed the request for review from k82cn December 4, 2018 19:43
@wojtek-t
Copy link
Member

wojtek-t commented Dec 4, 2018

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 4, 2018
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 4, 2018
@wojtek-t
Copy link
Member

wojtek-t commented Dec 4, 2018

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 4, 2018
@bsalamat
Copy link
Member Author

bsalamat commented Dec 4, 2018

While avoiding those copies seems to be important, I cannot verify that the PR improves performance.
I ran our scheduler benchmarks (integration tests) with and without this PR and I don't see any meaningful difference:

With this PR:

goos: linux
goarch: amd64
pkg: k8s.io/kubernetes/test/integration/scheduler_perf
BenchmarkScheduling/5000Nodes/1000Pods-12         	   10000	  10530934 ns/op
PASS
ok  	k8s.io/kubernetes/test/integration/scheduler_perf	129.670s
+++ [1204 15:14:35] Cleaning up etcd
+++ [1204 15:14:35] Integration test cleanup complete

Without the PR:

goos: linux
goarch: amd64
pkg: k8s.io/kubernetes/test/integration/scheduler_perf
BenchmarkScheduling/5000Nodes/1000Pods-12         	   10000	  10317707 ns/op
PASS
ok  	k8s.io/kubernetes/test/integration/scheduler_perf	127.931s
+++ [1204 15:21:42] Cleaning up etcd
+++ [1204 15:21:42] Integration test cleanup complete

@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel comment for consistent failures.

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed lgtm "Looks good to me", indicates that a PR is ready to be merged. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Dec 5, 2018
@bsalamat
Copy link
Member Author

bsalamat commented Dec 5, 2018

I figured what caused the performance degradation after #70274 and #70892. It was not extra copying, it was the fact that our original code used to run old-style priority functions in parallel to the map-reduce style ones.
#70892 fixed a race condition, but it was running the old-style priority functions and when they finished, it started running the new style ones. The third commit of this PR, restores the original parallelism.

Benchmarks results with the latest version of this PR:

goos: linux
goarch: amd64
pkg: k8s.io/kubernetes/test/integration/scheduler_perf
BenchmarkScheduling/5000Nodes/1000Pods-12         	   10000	   8932769 ns/op
PASS
ok  	k8s.io/kubernetes/test/integration/scheduler_perf	112.951s

for i, priorityConfig := range priorityConfigs {
if priorityConfig.Reduce == nil {
for i := range priorityConfigs {
if priorityConfigs[i].Reduce == nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just out of curiosity. There will never be a situation when Function != nil and Reduce != nil, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right. The Map/Reduce functions are the new style for priority functions. The old-style was using "Function". No priority function is supposed to mix the two and none of our priority functions do that.

@bsalamat
Copy link
Member Author

bsalamat commented Dec 5, 2018

/retest

@wojtek-t
Copy link
Member

wojtek-t commented Dec 5, 2018

/lgtm

great catch!

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 5, 2018
Copy link

@misterikkit misterikkit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel comment for consistent failures.

@k8s-ci-robot k8s-ci-robot merged commit 720c102 into kubernetes:master Dec 6, 2018
k8s-ci-robot added a commit that referenced this pull request Dec 7, 2018
…22-upstream-release-1.13

Automated cherry pick of #71722 upstream release 1.13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note-none Denotes a PR that doesn't merit a release note. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants