[Metric threshold] Persist group by information and apply it in the alert details page #181689

Merged

Conversation

maryam-saeidi
Member

@maryam-saeidi maryam-saeidi commented Apr 25, 2024

Resolves #178998

Summary

This PR

  • Persists group-by information and applies it in the alert details page
  • Adds source and tags to the alert summary field
  • Fixes annotation issue on the chart by adding a margin-top

Note
I temporarily showed the chart title in the screenshots below for verification (you can do the same by removing hideTitle).

[Before and After screenshots]

How to test

  • Create a metric threshold rule

    • make sure to enable the related feature flag
    xpack.observability.unsafe.alertDetails.observability.enabled: true
    
  • Go to the alert details page and verify the charts show data related to the selected group

    • either remove hideTitle

    • or make sure the data in the chart matches expectations for that specific group

    • or check the metrics_explorer

  • Create an APM Latency threshold rule and check that the active alert annotation has the right color.

…lert details page; Add source and tags to the alert summary field
@maryam-saeidi maryam-saeidi added release_note:skip Skip the PR/issue when compiling release notes Feature:Alert Details Page Observability ux management team labels Apr 25, 2024
@maryam-saeidi maryam-saeidi self-assigned this Apr 25, 2024
@apmmachine
Contributor

🤖 GitHub comments


Just comment with:

  • /oblt-deploy : Deploy a Kibana instance using the Observability test environments.
  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@maryam-saeidi
Member Author

/ci


import React from 'react';

export function Groups({ groups }: { groups: Array<{ field: string; value: string }> }) {
Member Author

I've added this Groups component temporarily until we share the Source component between different rules: #181692
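
For illustration only, here is a minimal sketch of what such a temporary component has to do with its groups prop, written as a plain function so it runs without React. The Group type mirrors the prop type in the snippet above; formatGroups and the sample values are hypothetical and not taken from this PR.

```typescript
// Shape copied from the `groups` prop type in the snippet above.
interface Group {
  field: string;
  value: string;
}

// Hypothetical helper: turns each group into a "field: value" label,
// roughly what a display component would render for each entry.
function formatGroups(groups: Group[]): string[] {
  return groups.map(({ field, value }) => `${field}: ${value}`);
}

// Sample values are made up for the example.
const exampleLabels: string[] = formatGroups([
  { field: 'host.name', value: 'host-0' },
  { field: 'container.id', value: 'container-1' },
]);
```

Keeping the formatting in a small pure function like this would also make it easy to move once a shared Source component exists.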

@maryam-saeidi
Member Author

/ci

@maryam-saeidi maryam-saeidi marked this pull request as ready for review April 25, 2024 18:20
@maryam-saeidi maryam-saeidi requested review from a team as code owners April 25, 2024 18:20
@fkanout fkanout self-assigned this Apr 26, 2024
@botelastic botelastic bot added the ci:project-deploy-observability Create an Observability project label Apr 29, 2024
Co-authored-by: Cauê Marcondes <55978943+cauemarcondes@users.noreply.github.com>
@benakansara
Contributor

I see that we are introducing kibana.alert.group array in metric threshold rule. I think before we introduce this field, we need to discuss this ticket and make a decision on how we want to streamline saving of group by fields across rules.

We already have context.group and context.groupByKeys as context variables in Metric threshold rule which have string and object formats respectively and customers are familiar with those. With kibana.alert.group, we will have third field for the same purpose with different format.

I think we should also rethink the point Brandon raised here and which Soren mentioned here. I understand that this might be an edge case, but I wonder why we would want to do it anyway if we know there is a limitation. Instead groupByKeys which is an object gives better UX where users can directly query kibana.alert.groupByKeys.host.name and it will always give accurate results.

@maryam-saeidi
Member Author

We already have context.group and context.groupByKeys as context variables in Metric threshold rule which have string and object formats respectively and customers are familiar with those. With kibana.alert.group, we will have third field for the same purpose with different format.

Those fields are in the action variables and when I was checking AAD on the alert details page, we didn't have those fields available.

I think we should also rethink the point Brandon raised here and which Soren mentioned here. I understand that this might be an edge case, but I wonder why we would want to do it anyway if we know there is a limitation.

We had this discussion in this RFC. As you mentioned, this is an edge case, and we don't know whether it will even happen for our users. The document also includes a proposal for how to solve it if it turns into a challenge for them.

Instead groupByKeys which is an object gives better UX where users can directly query kibana.alert.groupByKeys.host.name and it will always give accurate results.

Adding kibana.alert.groupByKeys.host.name is similar to the second approach mentioned in the RFC. Without dynamic mapping, it does not provide auto-suggestion, which was a requirement, and ResponseOps wasn't on board with adding dynamic mapping because of its impact on the limit on the number of fields in the mapping.
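
To make the two formats under discussion concrete, the sketch below converts an object in the groupByKeys style into an array of field/value pairs, the shape the Groups component in this PR accepts. The exact shape of groupByKeys, the flattenGroupByKeys helper, and the sample data are assumptions for illustration, not code from Kibana.

```typescript
// Array format: one entry per group-by field (shape from the Groups props).
interface Group {
  field: string;
  value: string;
}

// Assumed object format: nested by field path segments,
// e.g. { host: { name: 'host-0' } } for the field "host.name".
interface GroupByKeys {
  [key: string]: string | GroupByKeys;
}

// Hypothetical helper: walks the nested object and emits one
// { field, value } pair per leaf, joining path segments with dots.
function flattenGroupByKeys(keys: GroupByKeys, prefix: string = ''): Group[] {
  const result: Group[] = [];
  for (const key of Object.keys(keys)) {
    const value = keys[key];
    const field = prefix ? `${prefix}.${key}` : key;
    if (typeof value === 'string') {
      result.push({ field, value });
    } else {
      result.push(...flattenGroupByKeys(value, field));
    }
  }
  return result;
}

// Made-up sample data.
const flattened: Group[] = flattenGroupByKeys({ host: { name: 'host-0' } });
```

With the array form, the set of mapped fields stays fixed (field and value) no matter which group-by fields a rule uses; with the object form, each group-by field becomes its own path, which is where the mapping concern above comes from.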

@botelastic botelastic bot added the Team:obs-ux-infra_services Observability Infrastructure & Services User Experience Team label May 10, 2024
@elasticmachine
Contributor

Pinging @elastic/obs-ux-infra_services-team (Team:obs-ux-infra_services)

@@ -142,7 +142,7 @@ function LatencyChart({
   <AlertActiveTimeRangeAnnotation
     alertStart={alert.start}
     alertEnd={alertEnd}
-    color={chroma(transparentize('#F04E981A', 0.2)).hex().toUpperCase()}
+    color={euiTheme.colors.danger}
Member Author

Opacity of this color is now handled in x-pack/packages/observability/alert_details/src/components/alert_active_time_range_annotation.tsx.

Before this change, annotation for Metric threshold had the following issue:
image

Contributor

When I was working on fixing the recovered alert annotation in the APM Latency chart, euiTheme.colors.danger with opacity 0.1 was not showing the color I expected, so I used chroma(transparentize('#F04E981A', 0.2)).hex().toUpperCase() with opacity 1. But now it looks fine with euiTheme.colors.danger and opacity 0.1. Not sure what the issue was before. 🤔
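
As an aside on the two color expressions in this thread: an 8-digit hex color carries its alpha in the last two digits, so a 6-digit color plus a separate opacity can express the same thing. The sketch below shows only that arithmetic. withOpacity is a hypothetical helper, not the implementation in alert_active_time_range_annotation.tsx, and the base color is simply the first six digits of the literal quoted above.

```typescript
// Hypothetical helper: appends an alpha channel to a 6-digit hex color.
// `opacity` is expected in the range 0..1 and is clamped to it.
function withOpacity(hex: string, opacity: number): string {
  if (!/^#[0-9a-fA-F]{6}$/.test(hex)) {
    throw new Error(`Expected a 6-digit hex color, got "${hex}"`);
  }
  const clamped = Math.min(1, Math.max(0, opacity));
  // Scale to 0..255; for example 0.1 * 255 rounds to 26, which is 1A in hex.
  const alpha = Math.round(clamped * 255);
  const alphaHex = (alpha < 16 ? '0' : '') + alpha.toString(16);
  return (hex + alphaHex).toUpperCase();
}

// First six digits of the literal in the diff, with opacity 0.1.
const annotationColor: string = withOpacity('#F04E98', 0.1);
```

Handling opacity in one place, as the comment above describes, means callers only pass a base color and cannot double-apply transparency by accident.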

Contributor

@cauemarcondes cauemarcondes left a comment

LGTM

@benakansara benakansara self-requested a review May 14, 2024 08:30
Contributor

@benakansara benakansara left a comment

I see that we have a line chart as the main chart. In the custom threshold rule, which is similar to metric threshold, we show a bar chart. The preview chart in the rule form also uses a bar chart. Ideally, I would have expected it to work like the preview chart or the custom threshold alert page. Was it a design decision to use a line chart in the metric threshold alert page? If so, could you please add the design link to the PR.

Should we remove "Technical preview"? Once we have alerts history PR merged, I think we are good with first version of the alert page. Wdyt?

@benakansara
Contributor

Can we adjust the interval of buckets to match with rule interval? For some alerts, it shows no data in chart because the chart isn't able to show the bucket with data.
Screenshot 2024-05-14 at 11 17 22

Screenshot 2024-05-14 at 11 18 03

Custom threshold rule for reference
Screenshot 2024-05-14 at 11 21 08
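
A rough way to picture the bucket question: when points are grouped into fixed-size time buckets, the interval decides which bucket each point lands in and how many buckets stay empty. The sketch below is a generic illustration under assumed names (Point, bucketByInterval); it is not how the Kibana charts or the rule executor aggregate data.

```typescript
interface Point {
  timestamp: number; // epoch milliseconds
  value: number;
}

interface Bucket {
  start: number; // bucket start, epoch milliseconds
  avg: number; // average of the values that fell into the bucket
}

// Hypothetical helper: groups points into buckets of `intervalMs`
// and averages each bucket. Buckets without points are not emitted.
function bucketByInterval(points: Point[], intervalMs: number): Bucket[] {
  if (intervalMs <= 0) {
    throw new Error('intervalMs must be positive');
  }
  const sums: { [start: number]: { total: number; count: number } } = {};
  for (const point of points) {
    const start = Math.floor(point.timestamp / intervalMs) * intervalMs;
    const bucket = sums[start] || (sums[start] = { total: 0, count: 0 });
    bucket.total += point.value;
    bucket.count += 1;
  }
  return Object.keys(sums)
    .map(Number)
    .sort((a, b) => a - b)
    .map((start) => ({ start, avg: sums[start].total / sums[start].count }));
}
```

If the chart and the rule use different intervals, the same points are split differently, so a bucket that breached the threshold in the rule evaluation may not appear as a distinct bucket in the chart.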

@benakansara
Contributor

In case of only "warning", we might need to adjust the "Threshold breached" component and "Alert started" annotation.
Is it possible to -

  1. change "Threshold breached" to have "Warning when > 1"
  2. change color of alert annotation to match with "warning" color and change text to "Warning started"?
Screenshot 2024-05-14 at 11 28 19 Screenshot 2024-05-14 at 11 25 22
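
The "warning only" case could be modeled by checking the critical threshold first and falling back to the warning one. This is a hypothetical sketch, assuming a simple "greater than" comparator as in the "Warning when > 1" example; Thresholds, breachedSeverity and breachLabel are illustrative names, not part of the metric threshold rule code.

```typescript
type Severity = 'critical' | 'warning' | 'none';

// Either threshold may be absent, e.g. a rule with only a warning condition.
interface Thresholds {
  critical?: number;
  warning?: number;
}

// Hypothetical helper: reports the most severe threshold that `value` exceeds.
function breachedSeverity(value: number, thresholds: Thresholds): Severity {
  if (thresholds.critical !== undefined && value > thresholds.critical) {
    return 'critical';
  }
  if (thresholds.warning !== undefined && value > thresholds.warning) {
    return 'warning';
  }
  return 'none';
}

// Hypothetical label builder for a "Threshold breached" style summary.
function breachLabel(value: number, thresholds: Thresholds): string {
  const severity = breachedSeverity(value, thresholds);
  if (severity === 'critical') return `Alert when > ${thresholds.critical}`;
  if (severity === 'warning') return `Warning when > ${thresholds.warning}`;
  return 'No threshold breached';
}
```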

@maryam-saeidi
Member Author

@benakansara Thanks for reviewing this PR thoroughly.

I will try to address the comments as much as possible, but if I see there is a challenge in implementing those, I will create a separate ticket to move forward with this PR. The main goal of this PR was adding group information and the rest was developed a long time ago, so I need to check about the reasons for some decisions (in which the direction might have been changed over time).

About your comments: do any of them need to be addressed in this PR, or are you fine with separate tickets for all of them if fixing them turns out not to be straightforward?

@benakansara
Contributor

@maryam-saeidi Let's create separate issues for them and handle outside of this PR. I can create tickets if you agree with the findings.

I will do one more round of testing.

@maryam-saeidi
Member Author

@maryam-saeidi Let's create separate issues for them and handle outside of this PR. I can create tickets if you agree with the findings.

I will do one more round of testing.

Sure, let me give them a try, and I will let you know which one will not be covered.

Contributor

@benakansara benakansara left a comment

LGTM. Tested locally, and worked as expected. 🎉
I have left some comments/questions.


@maryam-saeidi
Member Author

maryam-saeidi commented May 15, 2024

@benakansara Regarding your comments:

  1. Changing to a line chart is easy, but then we will have the challenge of aligning annotations, so let's have a separate ticket for it.
    image

  2. No data chart

Can we adjust the interval of buckets to match with rule interval? For some alerts, it shows no data in chart because the chart isn't able to show the bucket with data.

The data is correctly bucketed based on the look-back window but I think the issue lies somewhere else related to how charts aggregate data. Can you please create a ticket for it and assign it to me? I will investigate it separately.

  3. Warning state
    Good catch, can you please create a ticket for it?
    I think we need to discuss more about how we want to show warning information on this page, keeping in mind that we didn't want to spend much time on this alert details page, but wanted to have a simple base ready.
  1. change "Threshold breached" to have "Warning when > 1"

We can either do this or always show both thresholds and indicate which one is breached in the title.

  2. change color of alert annotation to match with "warning" color and change text to "Warning started"?

I think we can keep it as is since it is an alert even if it is because of breaching the warning threshold. (Just to avoid extra work for a rule that will be deprecated in the future)

I created a ticket to deprecate the feature flag and I included the removal of the technical preview badge there, can you please list the above tickets in the ticket that I shared?

@kibana-ci
Collaborator

kibana-ci commented May 16, 2024

💚 Build Succeeded

Metrics [docs]

Module Count

Fewer modules lead to a faster build time

id before after diff
infra 1555 1557 +2

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id before after diff
apm 3.3MB 3.3MB -222.0B
infra 1.5MB 1.5MB +1.7KB
total +1.5KB

Canvas Sharable Runtime

The Canvas "shareable runtime" is a bundle produced to enable running Canvas workpads outside of Kibana. This bundle is included in third-party webpages that embed canvas and therefore should be as slim as possible.

id before after diff
module count - 5405 +5405
total size - 8.8MB +8.8MB

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

id before after diff
infra 105.4KB 105.5KB +90.0B

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @fkanout @maryam-saeidi

Labels
apm:review backport:skip This commit does not require backporting ci:project-deploy-observability Create an Observability project Feature:Alert Details Page Observability ux management team release_note:skip Skip the PR/issue when compiling release notes Team:obs-ux-infra_services Observability Infrastructure & Services User Experience Team v8.15.0
Development

Successfully merging this pull request may close these issues.

[Metric threshold][Alert details page] Apply group information on the condition charts
10 participants