[RAC] Add indicator about enabled / disabled rules on the alerts page #116476

Closed
hbharding opened this issue Oct 27, 2021 · 19 comments · Fixed by #119750, #119852 or #120335
Labels
enhancement, Team: Actionable Observability, Theme: rac, v8.0.0, v8.1.0

Comments

@hbharding
Contributor

hbharding commented Oct 27, 2021

Summary

When users are on the alerts page, they have no way of knowing if any of their rules are disabled, muted, or have errors. Knowing this information could be helpful in certain situations.

Solution

Add a text indicator like we do on other RAC pages in the right area of the page header. See screenshots below for details.

[Screenshot: mockup of the indicator in the page header]

We can use EuiStat with titleSize="xs", which comes with a convenient isLoading prop.

Acceptance criteria

  • Rule summary component is created and added to the observability alerts page, which should include:
    • the total number of "observability" rules
    • the number of "observability" rules that are disabled (rule.enabled = false)
    • the number of "observability" rules that are muted (rule.mute_all = true)
    • the number of "observability" rules that have errors (rule.execution_status.status = "error" || "unknown")
    • A "Manage Rules" link to the Rules and Connectors management page (can this be filtered to our rules? I don't think so at the moment, but let's double-check)
  • Do not show this component at all if the user doesn't have permission to read rules (i.e., if the API call to fetch the rules fails)
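The counts in the acceptance criteria can all be derived from the rule objects returned by the Alerting API. A minimal sketch, assuming the snake_case field names from the rule API response (enabled, mute_all, execution_status.status) and the five status values listed later in this thread; summarizeRules, RuleLike and the countUnknownAsError flag are hypothetical names, with the flag standing in for the open question about whether "unknown" should count as an error.

```typescript
// Status values as listed by the Alerting team later in this thread.
type ExecutionStatus = 'ok' | 'active' | 'error' | 'pending' | 'unknown';

// Hypothetical: only the rule fields the summary needs.
interface RuleLike {
  enabled: boolean;
  mute_all: boolean;
  execution_status: { status: ExecutionStatus };
}

interface RuleSummary {
  total: number;
  disabled: number;
  muted: number;
  errors: number;
}

// Counts total, disabled (enabled = false), muted (mute_all = true) and
// erroring rules. Whether "unknown" counts as an error is configurable
// because it is still an open question.
function summarizeRules(
  rules: readonly RuleLike[],
  countUnknownAsError = true,
): RuleSummary {
  const summary: RuleSummary = { total: rules.length, disabled: 0, muted: 0, errors: 0 };
  for (const rule of rules) {
    if (!rule.enabled) summary.disabled += 1;
    if (rule.mute_all) summary.muted += 1;
    const status = rule.execution_status.status;
    if (status === 'error' || (countUnknownAsError && status === 'unknown')) {
      summary.errors += 1;
    }
  }
  return summary;
}
```

Because the component is hidden entirely when the rules request fails, a helper like this only ever needs to handle a successful response; an empty list naturally yields zeros for every count.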

Open Questions

  • Should "errors" include the "unknown" status, or just the "error" status?
  • How do we define "observability" rules? Is there a tagging mechanism when a rule is created, or do we have to maintain a hard-coded list of rule type IDs to query for? If the latter, can we at least centralize that list of IDs for ourselves? What is the current list of these IDs?

Related

  • The same information should be shown on the Security Alerts page, but focused on Security Rules ... we should sync with the security solution team once this is in for us, so they can potentially use the same component (if it's possible to share it)
hbharding added the enhancement, Theme: rac, and Team: Actionable Observability labels Oct 27, 2021
@paulb-elastic
Contributor

Some work is needed to determine whether the required data can be queried (i.e., do the APIs exist?).

@hbharding
Contributor Author

I'll work on creating a separate issue so that these stats can link to the Rules page with a relevant filter applied.

@jasonrhodes
Member

jasonrhodes commented Nov 15, 2021

Refining Questions:

  • How do we query for existing rules? @elastic/kibana-alerting-services
    • Once queried, can we tell which are disabled/muted?
    • Is there a way to query for "only a subset of rule types", and what subset should we use? A hard-coded list of rule type IDs? Is there any way to make this opt-in and dynamic? Options include:
      • Some "observability" tag
      • Using a hard-coded list of feature ID "producer" values
      • Something else?
  • Do "disabled" and "muted" in this view mean "in that state right now"?
  • What does "errors" mean here?

@mgiota
Contributor

mgiota commented Nov 16, 2021

@jasonrhodes Observability rule types are registered like this: https://github.com/elastic/kibana/blob/main/x-pack/plugins/infra/public/plugin.ts#L37. We could keep track of the observability rule types as we register them and expose a list method that returns them, if that's what we want (see mgiota@6dd0d25#diff-ed7415e46fe4dd47e01009081b6d385fd629bb6dee329826a3c8ae215e56f2eeR35). Would this help?
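The "track while registering" idea could look roughly like this: wrap the existing registration call and record each rule type ID as it goes through. This is a sketch only; ObservabilityRuleTypeRegistry, RuleTypeModel and registerRuleType are hypothetical names, not the code in the linked commit, and the real registration function is passed in rather than assumed.

```typescript
// Hypothetical: the registration model, reduced to the field we track.
interface RuleTypeModel {
  id: string;
}

// Wraps an existing register function and remembers which rule type IDs
// the Observability plugins registered, so they can be queried later.
class ObservabilityRuleTypeRegistry {
  private readonly register: (model: RuleTypeModel) => void;
  private readonly ids = new Set<string>();

  constructor(register: (model: RuleTypeModel) => void) {
    this.register = register;
  }

  // Forward to the underlying registration, then record the ID.
  registerRuleType(model: RuleTypeModel): void {
    this.register(model);
    this.ids.add(model.id);
  }

  // Rule type IDs registered so far, deduplicated, in registration order.
  list(): string[] {
    return Array.from(this.ids);
  }
}
```

The resulting list is what a find query would filter on, which also gives the single, centralized source of observability rule type IDs asked for in the open questions.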

@ymao1
Contributor

ymao1 commented Nov 16, 2021

Are we able to use the Alerting APIs here? We can GET rule by ID or FIND rules.

Once queried, can we tell which are disabled/muted?

You should be able to tell in the response body if the rule is disabled or muted or if certain alerts for the rule are muted.

Is there a way to query for "only a subset of rule types",

The find API takes filters, so you can filter by rule type ID (still named alertTypeId in the rule saved object). Here is an example of using the find API to filter by rule type: https://localhost:5601/api/alerting/rules/_find?page=1&per_page=10&filter=alert.attributes.alertTypeId%3A(apm.error_rate)&default_search_operator=AND&sort_field=name&sort_order=asc.
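Extending the example URL from one rule type to a list of them is mostly a matter of building the KQL filter. A sketch, reusing the query parameters from the example URL above and assuming KQL's field:(a or b) form to match any of several rule type IDs; buildFindRulesPath is a hypothetical helper name.

```typescript
// Hypothetical helper: builds a _find request path limited to the given
// rule type IDs, using the same query parameters as the example URL.
function buildFindRulesPath(
  ruleTypeIds: readonly string[],
  page = 1,
  perPage = 10,
): string {
  if (ruleTypeIds.length === 0) {
    // An empty list would produce an invalid filter like alertTypeId:().
    throw new Error('At least one rule type ID is required');
  }
  // The saved-object attribute is still named alertTypeId.
  const filter = `alert.attributes.alertTypeId:(${ruleTypeIds.join(' or ')})`;
  // URLSearchParams percent-encodes the colon, parentheses and spaces.
  const params = new URLSearchParams({
    page: String(page),
    per_page: String(perPage),
    filter,
    default_search_operator: 'AND',
    sort_field: 'name',
    sort_order: 'asc',
  });
  return `/api/alerting/rules/_find?${params.toString()}`;
}
```

Since the find API is paginated, counting every matching rule this way means paging through the results (or requesting a large enough per_page), which is part of why a dedicated counts endpoint is floated later in the thread.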

@mgiota
Contributor

mgiota commented Nov 18, 2021

@ymao1 In the get-rule-by-ID API link you posted, there are two URLs:

GET <kibana host>:<port>/api/alerting/rule/<id>

GET <kibana host>:<port>/s/<space_id>/api/alerting/rule/<id>

Is either of these a legacy URL that we shouldn't use?

@ymao1
Contributor

ymao1 commented Nov 18, 2021

No, both of those are current routes. If you're calling them from the client using the Kibana http library, I believe it will inject the space ID when you're in a custom space.

Here is how we use it for the Rule Management UI: https://github.com/elastic/kibana/blob/main/x-pack/plugins/triggers_actions_ui/public/application/lib/alert_api/rules.ts.

We don't add any special handling to determine if we're in the default or custom space.
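For calls made outside the Kibana http service (for example, scripted requests), the two route forms above differ only by the space prefix. A sketch, assuming the default space (ID "default") is addressed without the /s/ prefix; getRulePath is a hypothetical helper name.

```typescript
// Hypothetical helper: returns the get-rule route for a rule ID, adding the
// /s/<space_id> prefix only for non-default spaces.
function getRulePath(ruleId: string, spaceId?: string): string {
  const base = `/api/alerting/rule/${encodeURIComponent(ruleId)}`;
  // Assumption: the default space uses the unprefixed route.
  if (!spaceId || spaceId === 'default') {
    return base;
  }
  return `/s/${encodeURIComponent(spaceId)}${base}`;
}
```

Inside Kibana client code this shouldn't be necessary, since, per the comment above, the http library is expected to add the space prefix itself.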

@jasonrhodes
Member

Thank you so much, @ymao1!

@jasonrhodes
Member

@hbharding / @vinaychandrasekhar, refining questions left for you:

  • Does "disabled" and "muted" in this view mean "in that state right now"? I assume we mean that for this kind of "rule summary", right?
  • What does "errors" mean here? Do we have a specific idea or do you just want me to inquire with the Alerting team to see if we expose any kind of "rules that are currently erroring" status?

@vinaychandrasekhar

@jasonrhodes

Does "disabled" and "muted" in this view mean "in that state right now"? I assume we mean that for this kind of "rule summary", right?

Yes, in that state currently, and yes on it being "rule summary"

What does "errors" mean here? Do we have a specific idea or do you just want me to inquire with the Alerting team to see if we expose any kind of "rules that are currently erroring" status?

Rules that are currently erroring.

@hbharding if you have a different opinion, please chime in.

@jasonrhodes
Member

@ymao1, a few follow-up questions about the API, if you don't mind:

Here is the response for a single rule in either of the API calls you mention above:

{
  "id": "0a037d60-6b62-11eb-9e0d-85d233e3ee35",
  "notify_when": "onActionGroupChange",
  "params": {
    "aggType": "avg"
  },
  "consumer": "alerts",
  "rule_type_id": "test.rule.type",
  "schedule": {
    "interval": "1m"
  },
  "actions": [],
  "tags": [],
  "name": "test rule",
  "enabled": true,
  "throttle": null,
  "api_key_owner": "elastic",
  "created_by": "elastic",
  "updated_by": "elastic",
  "mute_all": false,
  "muted_alert_ids": [],
  "updated_at": "2021-02-10T05:37:19.086Z",
  "created_at": "2021-02-10T05:37:19.086Z",
  "scheduled_task_id": "0b092d90-6b62-11eb-9e0d-85d233e3ee35",
  "execution_status": {
    "last_execution_date": "2021-02-10T17:55:14.262Z",
    "status": "ok"
  }
}

Here are my assumptions, please confirm/deny :)

  • enabled: true|false represents whether the rule is enabled
  • mute_all: true|false represents whether we think of this rule as "muted" (vs. individual alert instances being muted)
  • execution_status -- does this always have this same shape? so we could rely on something like execution_status.status being some value that isn't "ok" (we should understand the possible values here and what they represent) to determine how many rules are in some kind of "error" state, right?

Last, do you think the Alerting Framework would be ok with us poking around and possibly suggesting some kind of /rules/summary or /rules/counts API endpoint that brings back just the numbers for these kinds of overall states, but still allows the same query/filter params as the find rules API? We can put together a small RFC for the intended API, but this is just a first quick gut check on whether that sounds ok. I imagine that ES query will be much faster than needing to bring back all of the rule data, and it might be useful for others.

@ymao1
Contributor

ymao1 commented Nov 19, 2021

enabled: true|false represents whether the rule is enabled
mute_all: true|false represents whether we think of this rule as "muted" (vs. individual alert instances being muted)

This is correct!

execution_status -- does this always have this same shape? so we could rely on something like

Yes, this should have the same shape. Here is the definition:

export interface AlertExecutionStatus {
  status: AlertExecutionStatuses;
  lastExecutionDate: Date;
  lastDuration?: number;
  error?: {
    reason: AlertExecutionStatusErrorReasons;
    message: string;
  };
}

where status can be one of ok | active | error | pending | unknown
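Because status is a closed set of five strings, an exhaustive switch lets the compiler flag any summary code that doesn't handle a status added later. A sketch, assuming ok and active both mean the last run succeeded (active meaning it produced alerts) and pending means the rule hasn't run yet; those interpretations, and the names bucketForStatus and StatusBucket, are assumptions rather than the Alerting team's definitions. "unknown" is kept in its own bucket because whether it counts as an error is still open.

```typescript
// The five execution statuses listed above.
type RuleExecutionStatus = 'ok' | 'active' | 'error' | 'pending' | 'unknown';

// Hypothetical buckets for the rule summary.
type StatusBucket = 'healthy' | 'pending' | 'error' | 'unknown';

function bucketForStatus(status: RuleExecutionStatus): StatusBucket {
  switch (status) {
    case 'ok':
    case 'active':
      return 'healthy';
    case 'pending':
      return 'pending';
    case 'error':
      return 'error';
    case 'unknown':
      return 'unknown';
    default: {
      // Type-checks only while every status above is handled.
      const unreachable: never = status;
      throw new Error(`Unexpected status: ${String(unreachable)}`);
    }
  }
}
```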

do you think the Alerting Framework would be ok with us poking around and possibly suggesting some kind of /rules/summary or /rules/counts API endpoint

Absolutely! It doesn't look like we have an issue for it yet, but the saved objects client has added support for aggregations, which the rules client could take advantage of by allowing an additional field in the find API to specify what to aggregate on. Feel free to make an issue and we will triage (or if you have the bandwidth to tackle it, even better :))

@jasonrhodes
Member

Yeah, I think we could tackle it. We'll create an issue to get firmer feedback and go from there. Thanks!

@jasonrhodes
Member

Ticket for a new endpoint created! We'll move forward using the existing find API for now.

@claudiopro
Contributor

Thanks for the valuable pointers! I've made some progress on the implementation by reusing loadAlertAggregations from your API, @ymao1:

export async function loadAlertAggregations({
  http,
  searchText,
  typesFilter,
  actionTypesFilter,
  alertStatusesFilter,
}: {
  http: HttpSetup;
  searchText?: string;
  typesFilter?: string[];
  actionTypesFilter?: string[];
  alertStatusesFilter?: string[];
}): Promise<AlertAggregations> {
  // ...
}

[Screenshot 2021-11-25: in-progress rule summary counters on the alerts page]

Now I have a question for @hbharding and @vinaychandrasekhar: should the counters reflect the current query? I'd imagine that, at a minimum, you'd want them to respond to kibana.alert.rule.* filters.

@claudiopro claudiopro linked a pull request Nov 29, 2021 that will close this issue
@vinaychandrasekhar

@claudiopro Per discussion in the weekly AO call with Katrin and Henry, we'd like the numbers to reflect the total number of rules and not take the current filters into account. Thanks for the question!

@vinaychandrasekhar

Separately, @claudiopro, a question for you: in the screenshot above, the disabled and muted counts show negative numbers. Is that intentional?

cc @hbharding

@hbharding
Contributor Author

hbharding commented Nov 29, 2021

I suspect it's a bug or an in-progress screenshot :). Let's make sure that if no disabled or muted rules are found, the number shows 0.

@claudiopro
Contributor

@vinaychandrasekhar @hbharding Correct, it's a placeholder to make sure I don't forget to wire the UI up to the API request for disabled and muted rules, which is currently missing (to be implemented in #119852).

8 participants