Bug 1997596: install/0000_90_cluster-version-operator_02_servicemonitor: Trim labels for UpdateAvailable #643

Merged

Conversation

wking
Member

@wking wking commented Aug 25, 2021

channel and upstream are the only two labels we set on the metric, but the Prometheus scraper adds more, such as job, namespace, and pod, to describe who was scraping what. Reducing to the labels we care about avoids annoying re-triggers, e.g. when the CVO pod changes. With this change, folks will be able to use a single silence per channel/upstream tuple.

I could even see stripping all the labels, but folks who care enough to bump their channel or upstream are presumably interested in hearing about available updates, at least for a while, so having them re-silence if they go back to not caring doesn't sound that tedious.

@openshift-ci openshift-ci bot added bugzilla/severity-low Referenced Bugzilla bug's severity is low for the branch this PR is targeting. bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. labels Aug 25, 2021
@openshift-ci
Contributor

openshift-ci bot commented Aug 25, 2021

@wking: This pull request references Bugzilla bug 1997596, which is invalid:

  • expected the bug to target the "4.9.0" release, but it targets "---" instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

Bug 1997596: install/0000_90_cluster-version-operator_02_servicemonitor: Trim labels for UpdateAvailable

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

…ls for UpdateAvailable

channel and upstream are the only two labels we set on the metric, but
the Prometheus scraper adds more, such as job, namespace, and pod, to
describe who was scraping what.  Reducing to the labels we care about
avoids annoying re-triggers, e.g. when the CVO pod changes [1].  I'm
using [2]:

  sum by (channel,upstream) (cluster_version_available_updates)

to aggregate over all cluster_version_available_updates series,
collapsing the other labels and keeping only the two listed.  For
example, an input series like:

  cluster_version_available_updates{channel="stable-4.8", endpoint="metrics", instance="192.168.1.164:9099", job="cluster-version-operator", namespace="openshift-cluster-version", pod="cluster-version-operator-7dd68fd686-p7ckm", prometheus="openshift-monitoring/k8s", receive="true", service="cluster-version-operator", upstream="https://api.openshift.com/api/upgrades_info/v1/graph"} 3

will be collapsed to:

  {channel="stable-4.8",upstream="https://api.openshift.com/api/upgrades_info/v1/graph"} 3

If there happened to be a second cluster_version_available_updates
series with the same channel and upstream in the cluster (which is
unlikely, because the CVO only serves metrics after acquiring the
leader lock), its value would be added to the sum.  With the label
collapse, folks will be able to use a single silence per
channel/upstream tuple.
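
As a rough illustration of the collapse described above, here is a minimal Python sketch that mimics what `sum by (channel,upstream)` does to a set of series. It is not how Prometheus implements aggregation; `sum_by` and `sample` are hypothetical helpers, and the second pod name is made up for the example.

```python
from collections import defaultdict

UPSTREAM = "https://api.openshift.com/api/upgrades_info/v1/graph"


def sum_by(series, keep):
    """Mimic PromQL `sum by (<keep>) (...)` over (labels, value) pairs."""
    totals = defaultdict(float)
    for labels, value in series:
        # Group on the kept labels only; a missing label counts as empty.
        key = tuple((name, labels.get(name, "")) for name in keep)
        totals[key] += value
    return dict(totals)


def sample(pod):
    """Build one scraped series; only the pod label varies (hypothetical helper)."""
    labels = {
        "channel": "stable-4.8",
        "upstream": UPSTREAM,
        "job": "cluster-version-operator",
        "namespace": "openshift-cluster-version",
        "pod": pod,
    }
    return (labels, 3.0)


keep = ("channel", "upstream")
before = sum_by([sample("cluster-version-operator-7dd68fd686-p7ckm")], keep)
# Made-up pod name standing in for a rescheduled CVO pod.
after = sum_by([sample("cluster-version-operator-hypothetical-new")], keep)
```

In this sketch `before` and `after` have identical keys, which is the property that keeps the alert's label set stable across a pod change. Passing two samples at once would add their values together, matching the note about a second series being summed in.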

I could even see stripping all the labels, but folks who care enough
to bump their channel or upstream are presumably interested in hearing
about available updates, at least for a while, so having them
re-silence if they go back to not caring doesn't sound that tedious.
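
The trade-off above can be sketched with a toy model of silence matching, assuming silences use plain equality matchers and that a silence applies only when every matcher agrees with the alert's labels. `silence_matches` and `alert_for` are hypothetical helpers, and `stable-4.9` is just an example of a bumped channel.

```python
UPSTREAM = "https://api.openshift.com/api/upgrades_info/v1/graph"


def silence_matches(matchers, alert_labels):
    """True when every equality matcher agrees with the alert's labels."""
    return all(
        alert_labels.get(name, "") == value for name, value in matchers.items()
    )


def alert_for(channel):
    """Labels of an UpdateAvailable alert after the by (channel,upstream) collapse."""
    return {"alertname": "UpdateAvailable", "channel": channel, "upstream": UPSTREAM}


# Silence created while the cluster was on stable-4.8.
per_tuple_silence = alert_for("stable-4.8")
# What a silence would look like if all metric labels were stripped.
blanket_silence = {"alertname": "UpdateAvailable"}

still_silenced = silence_matches(per_tuple_silence, alert_for("stable-4.8"))
silenced_after_bump = silence_matches(per_tuple_silence, alert_for("stable-4.9"))
blanket_after_bump = silence_matches(blanket_silence, alert_for("stable-4.9"))
```

Under these assumptions the per-tuple silence stops applying once the channel changes, so the alert is heard again, while a label-free silence would keep suppressing it.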

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1997596
[2]: https://prometheus.io/docs/prometheus/latest/querying/operators/#aggregation-operators
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 25, 2021
Member

@LalatenduMohanty LalatenduMohanty left a comment

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Aug 25, 2021
@openshift-ci
Contributor

openshift-ci bot commented Aug 25, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: LalatenduMohanty, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [LalatenduMohanty,wking]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-bot
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@wking
Member Author

wking commented Aug 25, 2021

/bugzilla refresh

@openshift-ci openshift-ci bot added the bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. label Aug 25, 2021
@openshift-ci
Contributor

openshift-ci bot commented Aug 25, 2021

@wking: This pull request references Bugzilla bug 1997596, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.9.0) matches configured target release for branch (4.9.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

Requesting review from QA contact:
/cc @jianlinliu

In response to this:

/bugzilla refresh

@openshift-ci openshift-ci bot removed the bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. label Aug 25, 2021
@openshift-ci openshift-ci bot requested a review from jianlinliu August 25, 2021 19:52
@wking
Member Author

wking commented Aug 25, 2021

Neither failure was relevant to this alert touch.

/override ci/prow/e2e-agnostic
/override ci/prow/e2e-agnostic-upgrade

@openshift-ci
Contributor

openshift-ci bot commented Aug 25, 2021

@wking: Overrode contexts on behalf of wking: ci/prow/e2e-agnostic, ci/prow/e2e-agnostic-upgrade

In response to this:

Neither failure was relevant to this alert touch.

/override ci/prow/e2e-agnostic
/override ci/prow/e2e-agnostic-upgrade

@openshift-merge-robot openshift-merge-robot merged commit 17d9690 into openshift:master Aug 25, 2021
@openshift-ci
Contributor

openshift-ci bot commented Aug 25, 2021

@wking: All pull requests linked via external trackers have merged:

Bugzilla bug 1997596 has been moved to the MODIFIED state.

In response to this:

Bug 1997596: install/0000_90_cluster-version-operator_02_servicemonitor: Trim labels for UpdateAvailable

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/severity-low Referenced Bugzilla bug's severity is low for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. lgtm Indicates that a PR is ready to be merged.
4 participants