Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Short circuit volume checker if the pod is not requesting any volumes #73652

Merged
merged 2 commits into from Feb 3, 2019

Conversation

bsalamat
Copy link
Member

@bsalamat bsalamat commented Feb 2, 2019

What type of PR is this?
/kind cleanup

This is an optimization.

What this PR does / why we need it:
Volume checker predicate function turns out to take about 5% of the scheduler execution time in our benchmarks while pods in those tests do not request any volumes. This PR short circuits the logic for pods that do not need any volumes.

Before:

BenchmarkScheduling/5000Nodes/1000Pods-12         	   10000	   9140564 ns/op
PASS
ok  	k8s.io/kubernetes/test/integration/scheduler_perf	114.688s

After:

pkg: k8s.io/kubernetes/test/integration/scheduler_perf
BenchmarkScheduling/5000Nodes/1000Pods-12         	   10000	   8541605 ns/op
PASS
ok  	k8s.io/kubernetes/test/integration/scheduler_perf	108.582s

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

NONE

/sig scheduling
/assign @msau42 @cofyc

@k8s-ci-robot k8s-ci-robot added the release-note-none Denotes a PR that doesn't merit a release note. label Feb 2, 2019
@k8s-ci-robot k8s-ci-robot added kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Feb 2, 2019
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bsalamat

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 2, 2019
@bsalamat bsalamat added priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. and removed approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Feb 2, 2019
@k8s-ci-robot k8s-ci-robot removed the needs-priority Indicates a PR lacks a `priority/foo` label and requires one. label Feb 2, 2019
@@ -1643,6 +1643,11 @@ func NewVolumeBindingPredicate(binder *volumebinder.VolumeBinder) FitPredicate {
}

func (c *VolumeBindingChecker) predicate(pod *v1.Pod, meta PredicateMetadata, nodeInfo *schedulernodeinfo.NodeInfo) (bool, []PredicateFailureReason, error) {
// If pod does not request any volumes, we don't need to do anything.
if len(pod.Spec.Volumes) == 0 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might help the benchmark but I'm not sure if it will help in reality because there is an admission controller that injects a secret volume into every pod.

This predicate should be a no-op if the pod does not use any PVCs. @cofyc can you help check if there's anything we can fix in FindPodVolumes to return early?

Copy link
Member

@cofyc cofyc Feb 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FindPodVolumes will return early if pod has no PVCs, see

// FindPodVolumes caches the matching PVs and PVCs to provision per node in podBindingCache.
// This method intentionally takes in a *v1.Node object instead of using volumebinder.nodeInformer.
// That's necessary because some operations will need to pass in to the predicate fake node objects.
func (b *volumeBinder) FindPodVolumes(pod *v1.Pod, node *v1.Node) (unboundVolumesSatisfied, boundVolumesSatisfied bool, err error) {
podName := getPodName(pod)
// Warning: Below log needs high verbosity as it can be printed several times (#60933).
klog.V(5).Infof("FindPodVolumes for pod %q, node %q", podName, node.Name)
// Initialize to true for pods that don't have volumes
unboundVolumesSatisfied = true
boundVolumesSatisfied = true
start := time.Now()
defer func() {
VolumeSchedulingStageLatency.WithLabelValues("predicate").Observe(time.Since(start).Seconds())
if err != nil {
VolumeSchedulingStageFailed.WithLabelValues("predicate").Inc()
}
}()
if !podHasClaims(pod) {
// Fast path
return unboundVolumesSatisfied, boundVolumesSatisfied, nil
}

How about skip VolumeSchedulingStageLatency logic and klog.V in FindPodVolumes too? This will make predicate a real no-op if the pod does not use any PVCs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @msau42 and @cofyc!
We definitely need to fix this issue. The current predicate is unacceptably slow for pods that do not request volumes.
FindPodVolumes gets pod name, reads time which often needs a system call, and increments Prometheus metrics even when a pod does not need any volume.
The predicate itself (which calls FindPodVolumes) also allocates an array for failure reasons and processes the output of FindPodVolumes for all pods whether they request a volume or not. The predicate itself must have a check to bypass all this and return immediately when a pod does not need a volume. I changed the PR to check PVCs and return immediately if there is no PVCs.
PTAL.

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 2, 2019
@cofyc
Copy link
Member

cofyc commented Feb 2, 2019

LGTM
It makes sense to skip FindPodVolumes in predicate function when it knows there is no need to find pod volumes. Prometheus metrics in volume binder will record latency only for pods which needs volume scheduling, I think this is acceptable and good for performance.
btw, TestSchedulerWithVolumeBinding fails, I guess we need to pass a pod with PVCs to test VolumeBinder now

@bsalamat
Copy link
Member Author

bsalamat commented Feb 2, 2019

Thanks, @cofyc!
I added a PVC to the pod and the client to make the test happy.

Copy link
Contributor

@ravisantoshgudimetla ravisantoshgudimetla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @bsalamat. LGTM. I will wait for @cofyc to give a final lgtm.

func (c *VolumeBindingChecker) predicate(pod *v1.Pod, meta PredicateMetadata, nodeInfo *schedulernodeinfo.NodeInfo) (bool, []PredicateFailureReason, error) {
// If pod does not request any PVC, we don't need to do anything.
if !podHasPVCs(pod) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think adding a log statement with level 5 or above would here would help in debugging.

@cofyc
Copy link
Member

cofyc commented Feb 3, 2019

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 3, 2019
@k8s-ci-robot k8s-ci-robot merged commit 1b8435d into kubernetes:master Feb 3, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. release-note-none Denotes a PR that doesn't merit a release note. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants