Fix some scheduler metrics(pending_pods and schedule_attempts_total) are not recorded. #87692
05ed996
to
26b7452
Compare
/retest
/cc @liu-cong
We should back port this to 1.17
@@ -291,6 +291,8 @@ func New(client clientset.Interface,
nodeInfoSnapshot: snapshot,
}

metrics.Register()
Why does moving this here make a difference?
oh, because it has to be done before initializing the scheduling_queue
Yes, that is the reason. I wanted to find a more fundamental and safer way to register
metric instances, though.
@@ -262,6 +265,9 @@ func Register() {
legacyregistry.MustRegister(metric)
}
volumeschedulingmetrics.RegisterVolumeSchedulingMetrics()
PodScheduleSuccesses = scheduleAttempts.With(metrics.Labels{"result": "scheduled"})
instead of doing this, perhaps we should follow the pendingPods metric approach, define functions that return the metric (see lines 275-292)
ok, I guess we access those handlers directly in the scheduler, ok I guess this is fine.
Looking at v1.16 and v1.15, registering the metrics has always been done after creating the queue, so I wonder why this is an issue now.
Hasn't this been a bug since the beginning, so we need to backport to all applicable versions?
Yes we do, I realized that this is a bug in 1.16 and 1.15 after I posted this.
I think commit 8da448d is the main culprit: it switched the scheduler from Prometheus to the k8s wrapper in 1.16, but didn't fix the registration. Ok, so this fix must be backported to 1.17 and 1.16 (just backport the call to Register to happen earlier, since the attempts metric doesn't exist in 1.16).
/priority critical-urgent
/retest
/retest
Thanks for the quick review!! /retest
Hmm, although I'm not familiar with the e2e tests, I believe that my change would not affect e2e. I'm not sure what I could do other than /retest.
/retest
26b7452
to
9d1c6d2
Compare
/lgtm
/retest
Thanks, please squash the commit.
/hold
because metric initialization happens too early, which causes the actual metric instances to become no-ops. Modifications made in this commit to make sure the actual metric instances won't be no-op metrics:
- re-initialize scheduler/metrics.PodSchedule{Successes, Failures, Errors} after metric creation
- scheduler/metrics.Register() should be called before initializing SchedulingQueue
9d1c6d2
to
c9c4be6
Compare
squashed.
/hold cancel
/lgtm
/retest
Review the full test history for this PR.
…-upstream-release-1.16 Automated cherry pick of #87692: Fix pending_pods, schedule_attempts_total was not recorded
…-upstream-release-1.17 Automated cherry pick of #87692: Fix pending_pods, schedule_attempts_total was not recorded
What type of PR is this?
/kind bug
/sig scheduling
What this PR does / why we need it:
Fix two scheduler metrics (pending_pods and schedule_attempts_total) that are not recorded.
Which issue(s) this PR fixes:
Fixes #87690
Special notes for your reviewer:
scheduler/metrics.{PodScheduleSuccesses, PodScheduleFailures, PodScheduleErrors} and scheduler/metrics.PendingPodsRecorders (ActivePods, UnschedulablePods, BackoffPods) seem to be initialized too early.
All of the metric primitives in k8s.io/component-base/metrics are lazy metrics, so calling some_metric.With(labels) before registration returns no-op metrics.
So I made two main modifications:
- call scheduler/metrics.Register() before creating the scheduler queue
- re-initialize the PodScheduleSuccesses, PodScheduleFailures, PodScheduleErrors metrics after metric creation
Does this PR introduce a user-facing change?:
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:
none.