install/0000_90_cluster-version-operator_02_servicemonitor: Info-level alert for available updates #415

Merged


wking
Member

@wking wking commented Jul 23, 2020

The purpose of this alert is to give cluster-admins something they can subscribe to (via Alertmanager notifications) to let them know when they can initiate an update for a particular cluster.

It is info severity because we want the quick push notification, but letting your cluster sit on the old version for a few days or a week or so is unlikely to cause much trouble. For things like CVE fixes, a more timely update might be advisable, but the cluster-version operator currently has no way to make the "critical CVE" vs. "no fixes, but adds a feature" distinction. So I'm punting on the warn-level or so "CVO is getting grumpy about your failure to take the long-recommended update" alert for now, and just addressing the info-level "you could update, no pressure".
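A minimal sketch of how a cluster-admin might subscribe to such an alert through Alertmanager routing. Everything specific here is an assumption for illustration: the alert name `UpdateAvailable` (inferred from the follow-up commit title), the receiver names, and the webhook URL. The `match` field is used because it works on older Alertmanager releases; newer releases prefer `matchers`.

```yaml
# Hypothetical Alertmanager configuration fragment, not taken from this PR.
route:
  receiver: default
  routes:
    # Send only the info-level update alert to a dedicated receiver.
    - receiver: update-notifications   # hypothetical receiver name
      match:
        alertname: UpdateAvailable     # assumed alert name
        severity: info
receivers:
  - name: default
  - name: update-notifications
    webhook_configs:
      - url: https://example.com/update-hook   # placeholder URL
```

Routing on `severity: info` keeps this notification separate from paging-level alerts, which matches the "you could update, no pressure" intent described above.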

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 23, 2020
@vrutkovs
Member

I like the idea of adding this to alertmanager, but wouldn't it be too annoying for admin console users?

We show "upgrade available" on the top navbar, on the Overview page, and on the Cluster Settings page. This alert would show on the Notifications page - isn't that enough?

Let's have this approved by the UI/UX team?

@jottofar
Contributor

jottofar commented Jul 23, 2020

they can initiate an alert for a particular cluster.

In description, shouldn't this be:
"they can initiate an update for a particular cluster."

@sdodson
Member

sdodson commented Jul 23, 2020

@ncameronbritt please lend us your eyes. I know you were looking at how to improve notification of updates. This seems like a good approach, and it can be completely orthogonal to anything that we're doing elsewhere. The main thing is that alerts are meant to feed into other systems, and those systems could allow customers to choose how to handle this without being in the console.

@ncameronbritt

I think this is a good approach. And I believe we discussed that if we had alerts for available upgrades (as proposed here), then we probably don't need the indication we have in the console today, because customers could still see the alert in the console. This design shows that: https://marvelapp.com/prototype/64j205c/screen/70755417
@cshinn @megan-hall @beanh66, what do you all think?

@wking wking force-pushed the alert-on-available-updates branch from c53d639 to b397d8f Compare July 23, 2020 15:59
@wking
Member Author

wking commented Jul 23, 2020

"they can initiate an update for a particular cluster."

Fixed with c53d639 -> b397d8f, and edited the initial PR comment too.

…l alert for available updates

The purpose of this alert is to give cluster-admins something they can
subscribe to (via Alertmanager notifications) to let them know when
they can initiate an update for a particular cluster.

It is info severity because we want the quick push notification, but
letting your cluster sit on the old version for a few days or a week
or so is unlikely to cause much trouble.  For things like CVE fixes, a
more timely update might be advisable, but the cluster-version
operator currently has no way to make the "critical CVE" vs. "no
fixes, but adds a feature" distinction.  So I'm punting on the
warn-level or so "CVO is getting grumpy about your failure to take the
long-recommended update" alert for now, and just addressing the
info-level "you could update, no pressure".
@wking wking force-pushed the alert-on-available-updates branch from b397d8f to ccf0882 Compare July 31, 2020 03:23
@wking
Member Author

wking commented Jul 31, 2020

e2e failed on "an end user can use OLM can subscribe to the operator", which is the unrelated rhbz#1862322.

/override ci/prow/e2e

@openshift-ci-robot
Contributor

@wking: Overrode contexts on behalf of wking: ci/prow/e2e

In response to this:

e2e failed on "an end user can use OLM can subscribe to the operator", which is the unrelated rhbz#1862322.

/override ci/prow/e2e


Member

@vrutkovs vrutkovs left a comment


/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jul 31, 2020
@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: vrutkovs, wking


@openshift-merge-robot openshift-merge-robot merged commit 5974984 into openshift:master Jul 31, 2020
@wking wking deleted the alert-on-available-updates branch July 31, 2020 13:27
annotations:
  message: Your upstream update recommendation service recommends you update your cluster. For more information refer to 'oc adm upgrade'{{ "{{ with $console_url := \"console_url\" | query }}{{ if ne (len (label \"url\" (first $console_url ) ) ) 0}} or {{ label \"url\" (first $console_url ) }}/settings/cluster/{{ end }}{{ end }}" }}.
expr: |
  cluster_version_available_updates > 1
Member Author


Sigh, this should be >0, right?

Member

@vrutkovs vrutkovs Jul 31, 2020


I think so - I'll try it out and file a bug to fix it

Member Author


Confirmed; filed #432 to fix.

wking added a commit to wking/cluster-version-operator that referenced this pull request Aug 11, 2020
… UpdateAvailable

Fixing an off-by-one from ccf0882
(install/0000_90_cluster-version-operator_02_servicemonitor:
Info-level alert for available updates, 2020-07-22, openshift#415).  One
available update is all we need for this to go off.
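For illustration, a sketch of what the rule might look like with the threshold this fix describes. The alert name `UpdateAvailable` is inferred from the truncated commit title above, the group name is a placeholder, and the message is abbreviated; none of this is copied from the actual follow-up change.

```yaml
# Hypothetical Prometheus rule group; names marked below are assumptions.
groups:
  - name: cluster-version            # placeholder group name
    rules:
      - alert: UpdateAvailable       # assumed alert name
        annotations:
          message: Your upstream update recommendation service recommends you update your cluster.
        # "> 1" only fires once two or more updates are available;
        # "> 0" fires as soon as a single update is available.
        expr: |
          cluster_version_available_updates > 0
        labels:
          severity: info
```

With `> 1`, a cluster that has exactly one available update never triggers the alert, which is the off-by-one the review comments identify.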
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.

7 participants