Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Insights Operator Prometheus Alerts for Insights Recommendations #1036

Merged

Conversation

natiiix
Copy link
Contributor

@natiiix natiiix commented Feb 17, 2022

No description provided.

@bparees
Copy link
Contributor

bparees commented Feb 22, 2022

This looks reasonable and the impact is largely contained to the insights team, so i don't know that additional approval outside the insight team itself is needed, but interested parties would probably include:

@simonpasquier (metrics/monitoring)
@sdodson @wking (telemetry/fleet health observation)

@natiiix natiiix force-pushed the insights-recommendation-alerts branch from 0e8c5c6 to 647478e Compare February 23, 2022 10:55
Copy link
Contributor

@simonpasquier simonpasquier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks reasonable to me. I've made a few comments to ask for more details.

@natiiix natiiix force-pushed the insights-recommendation-alerts branch from 3f07479 to 5ddeff5 Compare February 24, 2022 17:23
@natiiix natiiix force-pushed the insights-recommendation-alerts branch from 5ddeff5 to c1926c7 Compare March 4, 2022 17:55
@natiiix natiiix force-pushed the insights-recommendation-alerts branch from c1926c7 to 6e6063f Compare March 21, 2022 12:34
@natiiix natiiix force-pushed the insights-recommendation-alerts branch from 6e6063f to 6f9e8fb Compare March 22, 2022 06:46
@tremes
Copy link
Contributor

tremes commented Mar 23, 2022

One minor typo. Otherwise it looks good to me. Please squash the commits.
/approve

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 23, 2022

An example of what the Insights recommendation metric may look like:
```text
insights_recommendation_active{rule_id="ccx_rules_ocp.external.rules.empty_prometheus_db_volume.report",error_key="PROMETHEUS_DB_VOLUME_IS_EMPTY",severity="moderate",description="Prometheus metrics data will be lost when the Prometheus pod is restarted or recreated",info_url="http://example.com/"} 1646416494
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

description here is a longer string than I am used to seeing in Prom metrics. And it doesn't seem to be called out in your earlier:

...will have labels containing the ID, severity (based on the total risk property), a human-readable name, and a link to a detailed description of the recommendation and its remediation steps (Insights Advisor URL...)

You might also want to consider rule_name or just name or some such instead of error_key. Insights rules might not even be errors, right?

Copy link
Contributor Author

@natiiix natiiix Mar 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rule_name and error_key are field names that we use throughout the Insights stack, so those are not just some arbitrarily chosen label names.

Description is the name of the field that contains the human-readable name mentioned in the quoted paragraph, and its length was one of the concerns I had initially, but after some discussions, we came to the conclusion that it should not be an issue.

Edit: Here's a link to the definition of the type returned by the Smart Proxy, for instance.
https://github.com/RedHatInsights/insights-results-smart-proxy/blob/c9bb2933a570ea79d9fabe244b7170cfdb5d5281/types/types.go#L52

@natiiix natiiix force-pushed the insights-recommendation-alerts branch from 6f9e8fb to 12fc668 Compare March 30, 2022 00:46
@bparees
Copy link
Contributor

bparees commented Mar 30, 2022

/approve

@bparees
Copy link
Contributor

bparees commented Mar 30, 2022

team doing the implementation should apply the final /lgtm

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Mar 30, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bparees, tremes

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Update Insights Alerts enhance. based on feedback

Remove template text and add relevant information

Explicitly mention alerts disabled by default

Don't mention Grafana in Insights alerts enhancement

Co-authored-by: Simon Pasquier <spasquie@redhat.com>

Slight clarification of Insights alerts

Insights alerts: add metric sample & minor tweaks

Update Insights alerts based on PR feedback

Answer Insights alerts open questions section

Minor tweaks to Insights recommendation alerts

Update Insights alerts: approvers, test plan
@natiiix natiiix force-pushed the insights-recommendation-alerts branch from 12fc668 to 520168a Compare March 31, 2022 12:50
@tremes
Copy link
Contributor

tremes commented Mar 31, 2022

One of my last suggestions was to slightly extend the test plan scenarios to include more error scenarios. This was updated. I reviewed it several times and I think we (CCX/Insights) should be able to implement it. Thanks @natiiix.
/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Mar 31, 2022
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Mar 31, 2022

@natiiix: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-merge-robot openshift-merge-robot merged commit cf7f2a0 into openshift:master Mar 31, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

10 participants