AggregatedAPIDown alert threshold set back to 85% #1237

raptorsun · 2021-06-22T13:02:59Z

I added CHANGELOG entry for this change.
No user facing changes, so no entry in CHANGELOG was needed.

This pull request reverts a patch applied to version 4.8 that mitigated a problem when upgrading monitoring components from 4.7 to 4.8: the AggregatedAPIDown alert fired because Prometheus Adapter was unavailable for a long period. Both deployments were offline at the same time during the upgrade.

With these two PRs (PR1, PR2) integrated into the 4.8 release, Prometheus Adapter stays available throughout the upgrade process, so we can set the alert threshold back to its default value of 85%.
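To illustrate what the 85% threshold means in practice, here is a minimal sketch. It assumes the alert compares the average availability of an aggregated API over a time window against a percentage threshold. The real alert is a Prometheus rule, not Python; the function names, the 0/1 sample encoding, and the 20-sample window below are hypothetical and chosen only for illustration.

```python
from typing import Sequence

# Default threshold named in this PR; everything else here is illustrative.
DEFAULT_THRESHOLD_PERCENT = 85.0


def availability_percent(unavailable_samples: Sequence[int]) -> float:
    """Return the percentage of samples in which the API was available.

    Each sample is 1 if the aggregated API was unavailable at that point
    in time and 0 otherwise (hypothetical encoding for this sketch).
    """
    if not unavailable_samples:
        raise ValueError("need at least one sample")
    unavailable_ratio = sum(unavailable_samples) / len(unavailable_samples)
    return (1.0 - unavailable_ratio) * 100.0


def alert_fires(
    unavailable_samples: Sequence[int],
    threshold_percent: float = DEFAULT_THRESHOLD_PERCENT,
) -> bool:
    """Return True when availability over the window drops below the threshold."""
    return availability_percent(unavailable_samples) < threshold_percent


# Hypothetical windows of 20 samples each.
brief_blip = [0] * 18 + [1] * 2   # 2 of 20 samples unavailable
long_outage = [0] * 14 + [1] * 6  # 6 of 20 samples unavailable

print(alert_fires(brief_blip))   # False
print(alert_fires(long_outage))  # True
```

Under this model, a lower threshold tolerates a longer outage within the window before the alert fires, which is why the default only becomes safe again once Prometheus Adapter stays available during upgrades.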


raptorsun · 2021-06-23T15:59:42Z

/retest

openshift-ci · 2021-06-23T18:50:44Z

@raptorsun: The following test failed, say /retest to rerun all failed tests:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| ci/prow/e2e-aws-single-node | `66086e9` | link | `/test e2e-aws-single-node` |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

simonpasquier · 2021-06-29T08:32:53Z

/lgtm
/hold
@dgrisonnet can you have a second look? Once it merges, we need to carefully monitor the CI logs to verify that the AggregatedAPIDown alert doesn't start firing again for 4.9 jobs.

openshift-ci · 2021-06-29T08:33:05Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: raptorsun, simonpasquier

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [raptorsun,simonpasquier]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

dgrisonnet · 2021-06-29T10:01:20Z

Looks good from my side.
I will monitor the CI, but the alert should no longer fire because of aggregated APIs that are not highly available: they should all have been made HA in 4.8, so we shouldn't see any disruption during a 4.8 to 4.9 upgrade.

/unhold

openshift-bot · 2021-06-29T10:29:39Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2021-06-29T10:53:51Z

/retest

Please review the full test history for this PR and help us cut down flakes.

revert PR openshift#1211: AggregatedAPIDown alert threshold set back …

66086e9

…to 85%

openshift-ci bot requested review from fpetkovski and simonpasquier June 22, 2021 13:03

openshift-ci bot added the approved label Jun 22, 2021

raptorsun changed the title ~~[WIP / Do not merge] Test: AggregatedAPIDown alert threshold set back to 85%~~ [WIP] Test: AggregatedAPIDown alert threshold set back to 85% Jun 22, 2021

openshift-ci bot added the do-not-merge/work-in-progress label Jun 22, 2021

raptorsun changed the title ~~[WIP] Test: AggregatedAPIDown alert threshold set back to 85%~~ AggregatedAPIDown alert threshold set back to 85% Jun 23, 2021

openshift-ci bot removed the do-not-merge/work-in-progress label Jun 23, 2021

openshift-ci bot added the do-not-merge/hold label Jun 29, 2021

openshift-ci bot assigned simonpasquier Jun 29, 2021

openshift-ci bot added the lgtm label Jun 29, 2021

openshift-ci bot removed the do-not-merge/hold label Jun 29, 2021

openshift-merge-robot merged commit f4b1311 into openshift:master Jun 29, 2021
