Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug 1916660: UPSTREAM: 97916: kube-aggregator: fix apiservice availability gauge #25800

Merged

Conversation

dgrisonnet
Copy link
Member

When an apiservice is deleted, its relative
aggregator_unavailable_apiservice metric remains with the value of the
last availability observed. Hence, if an apiservice is deleted while
being unavailable, the metric remains marked as unavailable.
This presents some problems when alerting on unavailable apiservices
as deleted apiservices might trigger the alert indefinitely.

To solve this issue, the aggregator_unavailable_apiservice metric should
only reflect the availability of existing apiservices.

This is achievable by using a custom Collector instead of a GaugeVec and
create throw-away metrics based on an apiservice lister output. With
this approach, on deletion, the apiservice will not be listed anymore,
resulting in its availability metric not being exposed.

@openshift-ci-robot openshift-ci-robot added bugzilla/severity-high Referenced Bugzilla bug's severity is high for the branch this PR is targeting. bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. labels Jan 15, 2021
@openshift-ci-robot
Copy link

@dgrisonnet: This pull request references Bugzilla bug 1916660, which is invalid:

  • expected dependent Bugzilla bug 1888866 to be in one of the following states: VERIFIED, RELEASE_PENDING, CLOSED (ERRATA), but it is ASSIGNED instead
  • expected dependent Bugzilla bug 1888866 to target a release in 4.6.0, 4.6.z, but it targets "4.7.0" instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

Bug 1916660: UPSTREAM: 97916: kube-aggregator: fix apiservice availability gauge

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@dgrisonnet
Copy link
Member Author

/assign @sttts

@openshift-ci-robot openshift-ci-robot added the vendor-update Touching vendor dir or related files label Jan 15, 2021
@dgrisonnet
Copy link
Member Author

/bugzilla refresh

@openshift-ci-robot openshift-ci-robot added the bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. label Jan 26, 2021
@openshift-ci-robot
Copy link

@dgrisonnet: This pull request references Bugzilla bug 1916660, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

6 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.5.z) matches configured target release for branch (4.5.z)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)
  • dependent bug Bugzilla bug 1915247 is in the state VERIFIED, which is one of the valid states (VERIFIED, RELEASE_PENDING, CLOSED (ERRATA))
  • dependent Bugzilla bug 1915247 targets the "4.6.z" release, which is one of the valid target releases: 4.6.0, 4.6.z
  • bug has dependents

In response to this:

/bugzilla refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot removed the bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. label Jan 26, 2021
When an apiservice is deleted, its relative
aggregator_unavailable_apiservice metric remains with the value of the
last availability observed. Hence, if an apiservice is deleted while
being unavailable, the metric remains marked as unavailable.
This presents some problems when alerting on unavailable apiservices
as deleted apiservices might trigger the alert indefinitely.

To solve this issue, the aggregator_unavailable_apiservice metric should
only reflect the availability of existing apiservices.

This is achievable by using a custom Collector instead of a GaugeVec and
create throw-away metrics based on an apiservice lister output. With
this approach, on deletion, the apiservice will not be listed anymore,
resulting in its availability metric not being exposed.

Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>
@dgrisonnet
Copy link
Member Author

/retest

1 similar comment
@dgrisonnet
Copy link
Member Author

/retest

@wangke19
Copy link

/bugzilla cc-qa

@openshift-ci-robot
Copy link

@wangke19: This pull request references Bugzilla bug 1916660, which is valid.

6 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.5.z) matches configured target release for branch (4.5.z)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)
  • dependent bug Bugzilla bug 1915247 is in the state VERIFIED, which is one of the valid states (VERIFIED, RELEASE_PENDING, CLOSED (ERRATA))
  • dependent Bugzilla bug 1915247 targets the "4.6.z" release, which is one of the valid target releases: 4.6.0, 4.6.z
  • bug has dependents

Requesting review from QA contact:
/cc @wangke19

In response to this:

/bugzilla cc-qa

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@wangke19
Copy link

Launched one 4.5 cluster with this PR successfully, verification steps see https://bugzilla.redhat.com/show_bug.cgi?id=1916660#c3

@wangke19
Copy link

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jan 27, 2021
@openshift-ci-robot
Copy link

@wangke19: The label(s) qe-approved cannot be applied, because the repository doesn't have them

In response to this:

/label qe-approved

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@dgrisonnet
Copy link
Member Author

/retest

@sttts
Copy link
Contributor

sttts commented Jan 27, 2021

/approve

@sttts
Copy link
Contributor

sttts commented Jan 27, 2021

/bugzilla refresh

@openshift-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dgrisonnet, sttts, wangke19

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot
Copy link

@sttts: This pull request references Bugzilla bug 1916660, which is valid.

6 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.5.z) matches configured target release for branch (4.5.z)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)
  • dependent bug Bugzilla bug 1915247 is in the state VERIFIED, which is one of the valid states (VERIFIED, RELEASE_PENDING, CLOSED (ERRATA))
  • dependent Bugzilla bug 1915247 targets the "4.6.z" release, which is one of the valid target releases: 4.6.0, 4.6.z
  • bug has dependents

In response to this:

/bugzilla refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 27, 2021
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

1 similar comment
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@dgrisonnet
Copy link
Member Author

@sttts the ci/prow/e2e-agnostic-cmd job seems to be failing constantly for unrelated reasons. Is there anything for me to do in order to have this PR merged?

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

5 similar comments
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@knobunc knobunc added the cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. label Feb 3, 2021
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

14 similar comments
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Feb 4, 2021

@dgrisonnet: The following test failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
ci/prow/e2e-agnostic-cmd 5bd6e7c link /test e2e-agnostic-cmd

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot openshift-merge-robot merged commit 433c650 into openshift:release-4.5 Feb 4, 2021
@openshift-ci-robot
Copy link

@dgrisonnet: All pull requests linked via external trackers have merged:

Bugzilla bug 1916660 has been moved to the MODIFIED state.

In response to this:

Bug 1916660: UPSTREAM: 97916: kube-aggregator: fix apiservice availability gauge

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@dgrisonnet dgrisonnet deleted the pick-97916-4.5 branch February 4, 2021 08:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/severity-high Referenced Bugzilla bug's severity is high for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. lgtm Indicates that a PR is ready to be merged. vendor-update Touching vendor dir or related files
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants