Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add scheduler feature usage metrics #838

Merged
merged 1 commit into from Jul 10, 2020

Conversation

damemi
Copy link

@damemi damemi commented Jul 2, 2020

Adds cluster_legacy_scheduler_policy (openshift/cluster-kube-scheduler-operator#250) and cluster_master_schedulable (openshift/cluster-kube-scheduler-operator#168) metrics

This is important for us because the upstream scheduler is migrating from Policy configs (setting predicates/priorities) to a new Profile config, and is planning to remove Policy support entirely in the coming releases.

This requires us to react (see openshift/origin#25203, openshift/cluster-kube-scheduler-operator#255, openshift/api#677) if we wish to continue supporting the ability to configure the scheduler algorithm through KSO.

The current usage % of Policy configs from cluster_legacy_scheduler_policy will help us determine whether it is worth continuing to support. cluster_master_schedulable, assumed being a reasonably used setting, will help set the benchmark threshold against which Policy config is compared (as well as provide usage information for that feature).

  • I added CHANGELOG entry for this change.
  • No user facing changes, so no entry in CHANGELOG was needed.

@damemi
Copy link
Author

damemi commented Jul 2, 2020

/cc @deads2k

@@ -209,6 +209,12 @@ data:
# Possible values are bootstrap|cloud-resources|monitoring|authentication|products|solution-explorer|deletion|complete.
# This metric is used by OCM to detect when an RHMI installation is complete & ready to use i.e. rhmi_status{stage='complete'}
- '{__name__="rhmi_status"}'
# (workloads-team, @openshift/openshift-group-b) cluster_legacy_scheduler_policy reports whether the scheduler operator is
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the metric itself have a description that explains the labels and values? If so, copy it here (along with why). Not knowing possible values (and really should just be a copy of the core metric) might cause a customer / user to ask "but what is it reporting" and we want that answer to be easy.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both are "boolean" gauges that just report 0 or 1 (https://github.com/openshift/cluster-kube-scheduler-operator/blob/master/pkg/operator/configmetrics/configmetrics.go#L15-L21). Updated the description with that

@smarterclayton
Copy link
Contributor

In description add how many series you expect.

Also, in general, your commit message and title should roughly contain some of this info (so someone doesn't have to look at the PR later)

@damemi
Copy link
Author

damemi commented Jul 2, 2020

In description add how many series you expect.

@smarterclayton both of these metrics just report 0 or 1 for the cluster. Is that 1 or 2 series?

@metalmatze
Copy link
Contributor

That would be one series with a value of 0 or 1, I suppose. 😊

@metalmatze
Copy link
Contributor

Just out of curiosity: Do you want to later on count the number of cluster that have this enabled vs the ones that have it disabled? Trying to understand the use case 😉

@damemi
Copy link
Author

damemi commented Jul 2, 2020

Just out of curiosity: Do you want to later on count the number of cluster that have this enabled vs the ones that have it disabled? Trying to understand the use case 😉

@metalmatze that's exactly what we want to gather, a percent of clusters that have these features enabled to gauge against our effort required to continue supporting them

The metrics 'cluster_legacy_scheduler_policy' and 'cluster_master_schedulable' will be used
to measure the usage of the custom Policy config and mastersSchedulable settings in kube-scheduler-operator.
These are important to determine if it is necessary to continue supporting Policy config as upstream
migrates to Plugins.
@s-urbaniak
Copy link
Contributor

@s-urbaniak
Copy link
Contributor

@openshift/openshift-team-monitoring (cc @brancz ) from my perspective lgtm modulo the naming, asking for another set of eyes.

@s-urbaniak
Copy link
Contributor

/approve

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 3, 2020
@brancz
Copy link
Contributor

brancz commented Jul 3, 2020

As these are gauges, I'm fine with the naming. Leaving final lgtm to @smarterclayton though to give a chance to verify if his comment was addressed in a way he expected.

@damemi
Copy link
Author

damemi commented Jul 7, 2020

bump @smarterclayton does this look good to you?

@damemi
Copy link
Author

damemi commented Jul 7, 2020

/retest

@smarterclayton
Copy link
Contributor

This LGTM, feel free to tag.

@brancz
Copy link
Contributor

brancz commented Jul 9, 2020

/lgtm
/retest

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jul 9, 2020
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: brancz, damemi, s-urbaniak

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

3 similar comments
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@damemi
Copy link
Author

damemi commented Jul 9, 2020

/retest

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

13 similar comments
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot openshift-merge-robot merged commit 83f478c into openshift:master Jul 10, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants