New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add scheduler feature usage metrics #838
Add scheduler feature usage metrics #838
Conversation
/cc @deads2k |
@@ -209,6 +209,12 @@ data: | |||
# Possible values are bootstrap|cloud-resources|monitoring|authentication|products|solution-explorer|deletion|complete. | |||
# This metric is used by OCM to detect when an RHMI installation is complete & ready to use i.e. rhmi_status{stage='complete'} | |||
- '{__name__="rhmi_status"}' | |||
# (workloads-team, @openshift/openshift-group-b) cluster_legacy_scheduler_policy reports whether the scheduler operator is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the metric itself have a description that explains the labels and values? If so, copy it here (along with why). Not knowing possible values (and really should just be a copy of the core metric) might cause a customer / user to ask "but what is it reporting" and we want that answer to be easy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both are "boolean" gauges that just report 0 or 1 (https://github.com/openshift/cluster-kube-scheduler-operator/blob/master/pkg/operator/configmetrics/configmetrics.go#L15-L21). Updated the description with that
In description add how many series you expect. Also, in general, your commit message and title should roughly contain some of this info (so someone doesn't have to look at the PR later) |
@smarterclayton both of these metrics just report 0 or 1 for the cluster. Is that 1 or 2 series? |
That would be one series with a value of |
Just out of curiosity: Do you want to later on count the number of cluster that have this enabled vs the ones that have it disabled? Trying to understand the use case 😉 |
@metalmatze that's exactly what we want to gather, a percent of clusters that have these features enabled to gauge against our effort required to continue supporting them |
The metrics 'cluster_legacy_scheduler_policy' and 'cluster_master_schedulable' will be used to measure the usage of the custom Policy config and mastersSchedulable settings in kube-scheduler-operator. These are important to determine if it is necessary to continue supporting Policy config as upstream migrates to Plugins.
|
@openshift/openshift-team-monitoring (cc @brancz ) from my perspective lgtm modulo the naming, asking for another set of eyes. |
/approve |
As these are gauges, I'm fine with the naming. Leaving final lgtm to @smarterclayton though to give a chance to verify if his comment was addressed in a way he expected. |
bump @smarterclayton does this look good to you? |
/retest |
This LGTM, feel free to tag. |
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: brancz, damemi, s-urbaniak The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/retest Please review the full test history for this PR and help us cut down flakes. |
3 similar comments
/retest Please review the full test history for this PR and help us cut down flakes. |
/retest Please review the full test history for this PR and help us cut down flakes. |
/retest Please review the full test history for this PR and help us cut down flakes. |
/retest |
/retest Please review the full test history for this PR and help us cut down flakes. |
13 similar comments
/retest Please review the full test history for this PR and help us cut down flakes. |
/retest Please review the full test history for this PR and help us cut down flakes. |
/retest Please review the full test history for this PR and help us cut down flakes. |
/retest Please review the full test history for this PR and help us cut down flakes. |
/retest Please review the full test history for this PR and help us cut down flakes. |
/retest Please review the full test history for this PR and help us cut down flakes. |
/retest Please review the full test history for this PR and help us cut down flakes. |
/retest Please review the full test history for this PR and help us cut down flakes. |
/retest Please review the full test history for this PR and help us cut down flakes. |
/retest Please review the full test history for this PR and help us cut down flakes. |
/retest Please review the full test history for this PR and help us cut down flakes. |
/retest Please review the full test history for this PR and help us cut down flakes. |
/retest Please review the full test history for this PR and help us cut down flakes. |
Adds
cluster_legacy_scheduler_policy
(openshift/cluster-kube-scheduler-operator#250) andcluster_master_schedulable
(openshift/cluster-kube-scheduler-operator#168) metricsThis is important for us because the upstream scheduler is migrating from Policy configs (setting predicates/priorities) to a new Profile config, and is planning to remove Policy support entirely in the coming releases.
This requires us to react (see openshift/origin#25203, openshift/cluster-kube-scheduler-operator#255, openshift/api#677) if we wish to continue supporting the ability to configure the scheduler algorithm through KSO.
The current usage % of Policy configs from
cluster_legacy_scheduler_policy
will help us determine whether it is worth continuing to support.cluster_master_schedulable
, assumed being a reasonably used setting, will help set the benchmark threshold against which Policy config is compared (as well as provide usage information for that feature).