Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

scheduler: add metrics to record number of pending pods in different queues #75501

Merged
merged 4 commits into from Apr 9, 2019

Conversation

@Huang-Wei
Copy link
Member

commented Mar 20, 2019

What type of PR is this?

/kind feature
/sig scheduling
/assign @bsalamat

What this PR does / why we need it:

Expose metrics of pending pods so as to know overall health of the whole cluster.

BTW: metrics of scheduled pods has been represented by scheduler_schedule_attempts_total{result="scheduled"}.

Which issue(s) this PR fixes:

Fixes #75267.

Special notes for your reviewer:

  1. commit 4e849b2 was refactored for reusability. @denkensk PTAL to ensure it doesn't break your previous PR.
  2. in addition to unit test 0b0d3ac, this PR is also verified in a single node cluster:
    • case a) taint the node, then run a 100-replica deployment. this is aimed to ensure metrics works well with flushUnschedulableQLeftover which runs every 30 seconds. In this case, unschedulable is always 0 as the cluster doesn't become more "schedulable".
      scheduler_pending_pods_num{queue="active"} 57
      scheduler_pending_pods_num{queue="backoff"} 0
      scheduler_pending_pods_num{queue="unschedulable"} 42
      scheduler_pending_pods_num{queue="active"} 41
      scheduler_pending_pods_num{queue="backoff"} 0
      scheduler_pending_pods_num{queue="unschedulable"} 59
      scheduler_pending_pods_num{queue="active"} 16
      scheduler_pending_pods_num{queue="backoff"} 0
      scheduler_pending_pods_num{queue="unschedulable"} 83
      scheduler_pending_pods_num{queue="active"} 0
      scheduler_pending_pods_num{queue="backoff"} 0
      scheduler_pending_pods_num{queue="unschedulable"} 100
    • case b) taint the node, run a 100-replica deployment, then simulate a high-churn cluster by repeat tainting the node with different key/values. this is aimed to ensure metrics works well with backoff related code.
      scheduler_pending_pods_num{queue="active"} 0
      scheduler_pending_pods_num{queue="backoff"} 0
      scheduler_pending_pods_num{queue="unschedulable"} 100
      scheduler_pending_pods_num{queue="active"} 87
      scheduler_pending_pods_num{queue="backoff"} 0
      scheduler_pending_pods_num{queue="unschedulable"} 12
      scheduler_pending_pods_num{queue="active"} 63
      scheduler_pending_pods_num{queue="backoff"} 33
      scheduler_pending_pods_num{queue="unschedulable"} 3
      scheduler_pending_pods_num{queue="active"} 33
      scheduler_pending_pods_num{queue="backoff"} 33
      scheduler_pending_pods_num{queue="unschedulable"} 33
      scheduler_pending_pods_num{queue="active"} 9
      scheduler_pending_pods_num{queue="backoff"} 68
      scheduler_pending_pods_num{queue="unschedulable"} 22
      scheduler_pending_pods_num{queue="active"} 0
      scheduler_pending_pods_num{queue="backoff"} 98
      scheduler_pending_pods_num{queue="unschedulable"} 2
      scheduler_pending_pods_num{queue="active"} 1
      scheduler_pending_pods_num{queue="backoff"} 87
      scheduler_pending_pods_num{queue="unschedulable"} 11
    • in both cases, total number of pending pods doesn't exceed 100 (some pods can be in scheduling cycle, or goroutines, so total number can be less than 100)

Does this PR introduce a user-facing change?:

scheduler: add metrics to record number of pending pods in different queues
@Huang-Wei

This comment has been minimized.

Copy link
Member Author

commented Mar 20, 2019

/hold
to avoid accidental merge.

@Huang-Wei Huang-Wei force-pushed the Huang-Wei:scheduler-metrics branch 2 times, most recently from 5a054bf to 91d8ac7 Mar 20, 2019

@Huang-Wei

This comment has been minimized.

Copy link
Member Author

commented Mar 20, 2019

/test pull-kubernetes-e2e-gce-100-performance

@Huang-Wei Huang-Wei force-pushed the Huang-Wei:scheduler-metrics branch 2 times, most recently from 43b3649 to 4a98126 Mar 20, 2019

@Huang-Wei

This comment has been minimized.

Copy link
Member Author

commented Mar 20, 2019

It's quite weird that the import-boss is only happy after applying:

diff --git a/pkg/controller/.import-restrictions b/pkg/controller/.import-restrictions
index 79dbe901..ae0f45d 100644
--- a/pkg/controller/.import-restrictions
+++ b/pkg/controller/.import-restrictions
@@ -257,7 +257,8 @@
         "k8s.io/kubernetes/pkg/fieldpath",
         "k8s.io/kubernetes/pkg/scheduler/volumebinder",
         "k8s.io/kubernetes/pkg/util/resizefs",
-        "k8s.io/kubernetes/pkg/apis/apps"
+        "k8s.io/kubernetes/pkg/apis/apps",
+        "k8s.io/kubernetes/pkg/scheduler/metrics"
       ]
     },
     {

But this PR doesn't even touch code in pkg/controller.

PS: without above change, import-boss reported:

errors in package "k8s.io/kubernetes/pkg/controller/daemon":
import k8s.io/kubernetes/pkg/scheduler/metrics did not match any allowed prefix

@sttts could you kindly shed some light on this?

@Huang-Wei

This comment has been minimized.

Copy link
Member Author

commented Mar 21, 2019

It's just pkg/scheduler/util imports k8s.io/kubernetes/pkg/scheduler/metrics, and I didn't find pkg/controller is related with pkg/scheduler/util.

Updated: pkg/controller does import pkg/scheduler/util:

"k8s.io/kubernetes/pkg/scheduler/util",

@Huang-Wei Huang-Wei force-pushed the Huang-Wei:scheduler-metrics branch from 4a98126 to ed1ec9f Mar 22, 2019

@k8s-ci-robot k8s-ci-robot added sig/apps and removed approved labels Mar 22, 2019

@bsalamat
Copy link
Member

left a comment

I just noticed that I hadn't submitted these comments before. I wrote these last week.

Show resolved Hide resolved pkg/scheduler/metrics/metrics.go Outdated
podInfoMap: make(map[string]*podInfo),
keyFunc: util.GetPodFullName,
}
if metricRecorder != nil {

This comment has been minimized.

Copy link
@bsalamat

bsalamat Mar 25, 2019

Member

No need to do a nil check.

This comment has been minimized.

Copy link
@Huang-Wei

Huang-Wei Mar 26, 2019

Author Member

I just put nil check here to tolerate any queue implementation which doesn't want to inject a metric logger.

Prefer to take it out?

This comment has been minimized.

Copy link
@bsalamat

bsalamat Mar 29, 2019

Member

You are only assigning it in the body of the if statement. You are not dereferencing it. It should be fine even if metricRecorder is nil.

This comment has been minimized.

Copy link
@Huang-Wei

Huang-Wei Mar 29, 2019

Author Member

oh, I mis-read... I thought you were suggesting removing it in all occurrences. Ok, just the assignment.

data: &heapData{
items: map[string]*heapItem{},
queue: []string{},
keyFunc: keyFn,
lessFunc: lessFn,
},
}
if metricRecorder != nil {

This comment has been minimized.

Copy link
@bsalamat

bsalamat Mar 25, 2019

Member

same here, no need for nil check.

This comment has been minimized.

Copy link
@bsalamat

bsalamat Mar 29, 2019

Member

Similar to above, you are no dereferencing the pointer. You can remove the nil check.

tests := []struct {
name string
operations []operation
operands []*podInfo

This comment has been minimized.

Copy link
@bsalamat

bsalamat Mar 25, 2019

Member

I think the previous version of the code was more readable. Also, it had the option of passing a variable number of arguments to operations, but I see why you had to make this change. It is fine.

@@ -210,6 +220,7 @@ var (
DeprecatedSchedulingAlgorithmPremptionEvaluationDuration,
PreemptionVictims,
PreemptionAttempts,
pendingPods,

This comment has been minimized.

Copy link
@bsalamat

bsalamat Mar 25, 2019

Member

In the issue, I also asked for adding a metric for the number of pods scheduled. We can add that in a separate PR. In that case, this PR shouldn't close #75267.

This comment has been minimized.

Copy link
@Huang-Wei

Huang-Wei Mar 26, 2019

Author Member

oh, I mentioned the reason in the PR description: "BTW: metrics of scheduled pods has been represented by scheduler_schedule_attempts_total{result="scheduled"}."

This comment has been minimized.

Copy link
@bsalamat

bsalamat Mar 29, 2019

Member

sorry, I missed it. Thanks!

},
operands: [][]*podInfo{
pInfos[:total],
{nil},

This comment has been minimized.

Copy link
@bsalamat

bsalamat Mar 25, 2019

Member

You can use {} instead.

This comment has been minimized.

Copy link
@Huang-Wei

Huang-Wei Mar 26, 2019

Author Member

It has to be kept; otherwise moveAllToActiveQ/flushBackoffQ will be skipped due to no operand.

@Huang-Wei Huang-Wei force-pushed the Huang-Wei:scheduler-metrics branch 2 times, most recently from d011501 to d082b1b Mar 26, 2019

@bsalamat
Copy link
Member

left a comment

/lgtm
/approve

Thanks, @Huang-Wei!

@k8s-ci-robot k8s-ci-robot added the lgtm label Mar 29, 2019

@Huang-Wei

This comment has been minimized.

Copy link
Member Author

commented Mar 29, 2019

/cc @mikedanese

Could you please help approve the changes in .import-restrictions - which is to make import-boss happy.

@k8s-ci-robot k8s-ci-robot requested a review from mikedanese Mar 29, 2019

@Huang-Wei

This comment has been minimized.

Copy link
Member Author

commented Apr 3, 2019

Friendly ping @mikedanese @derekmahar @janetkuo for /approval on pkg/controller/.import-restrictions.

@bsalamat

This comment has been minimized.

Copy link
Member

commented Apr 9, 2019

@Huang-Wei Could you please rebase this? I will then ping one of the controller owners for approval. Thanks!

Huang-Wei added some commits Mar 19, 2019

@Huang-Wei Huang-Wei force-pushed the Huang-Wei:scheduler-metrics branch from d082b1b to 63c3a61 Apr 9, 2019

@Huang-Wei

This comment has been minimized.

Copy link
Member Author

commented Apr 9, 2019

Thanks a lot @bsalamat !! Code has been rebased.

@bsalamat

This comment has been minimized.

Copy link
Member

commented Apr 9, 2019

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm label Apr 9, 2019

@janetkuo

This comment has been minimized.

Copy link
Member

commented Apr 9, 2019

/approve

@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

commented Apr 9, 2019

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bsalamat, Huang-Wei, janetkuo

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@Huang-Wei

This comment has been minimized.

Copy link
Member Author

commented Apr 9, 2019

Thanks @janetkuo !

/hold cancel

@Huang-Wei

This comment has been minimized.

Copy link
Member Author

commented Apr 9, 2019

/retest

@k8s-ci-robot k8s-ci-robot merged commit a93f803 into kubernetes:master Apr 9, 2019

17 checks passed

cla/linuxfoundation Huang-Wei authorized
Details
pull-kubernetes-bazel-build Job succeeded.
Details
pull-kubernetes-bazel-test Job succeeded.
Details
pull-kubernetes-conformance-image-test Skipped.
pull-kubernetes-cross Skipped.
pull-kubernetes-e2e-gce Job succeeded.
Details
pull-kubernetes-e2e-gce-100-performance Job succeeded.
Details
pull-kubernetes-e2e-gce-device-plugin-gpu Job succeeded.
Details
pull-kubernetes-godeps Skipped.
pull-kubernetes-integration Job succeeded.
Details
pull-kubernetes-kubemark-e2e-gce-big Job succeeded.
Details
pull-kubernetes-local-e2e Skipped.
pull-kubernetes-node-e2e Job succeeded.
Details
pull-kubernetes-typecheck Job succeeded.
Details
pull-kubernetes-verify Job succeeded.
Details
pull-publishing-bot-validate Skipped.
tide In merge pool.
Details

@Huang-Wei Huang-Wei deleted the Huang-Wei:scheduler-metrics branch Apr 9, 2019

@brancz
Copy link
Member

left a comment

I realize this was already merged, but could we get a follow up fixing the naming guideline violation? Thanks! :)

(Also as a side note it would be nice if next time sig instrumentation was tagged :) )

pendingPods = prometheus.NewGaugeVec(
prometheus.GaugeOpts{
Subsystem: SchedulerSubsystem,
Name: "pending_pods_total",

This comment has been minimized.

Copy link
@brancz

brancz Apr 10, 2019

Member

The total suffix is meant for counter according to the metrics instrumentation guidelines. This should just be “pending_pods”

This comment has been minimized.

Copy link
@Huang-Wei

Huang-Wei Apr 10, 2019

Author Member

Sure. I will raise a follow up PR shortly. We do need sig-instrumentation 's advice on best practices, conventions on this kind of PR :-)

Thanks @brancz .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.