Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Logs UI] Log alerts chart previews #75296

Merged
merged 16 commits into from
Aug 25, 2020

Conversation

Kerry350
Copy link
Contributor

@Kerry350 Kerry350 commented Aug 18, 2020

Summary

This PR implements #69530, which adds chart previews to log threshold alerts.

Screenshot 2020-08-19 at 17 24 04

Screenshot 2020-08-19 at 17 25 32

Notes

ℹ️ Prior to implementation I thought there would be a higher crossover with the metrics implementation on the UI side, and the ability to reuse a lot of chart implementation code. However, the metrics chart preview implementation is very heavily built around Metrics Explorer functionality. It doesn't make sense for us to use this, nor would it make sense for metrics to step away from it. As such I've extracted some pieces that seemed truly reusable, and placed them in the shared folder (I haven't converted the metrics code to use these here), but for the most part I've used a logs centric approach. There is potentially, on the logs side, some chart code that could possibly be even more reusable and shared with the anomalies page etc, but I didn't want to go too far down that rabbit hole here 😅

ℹ️ The Colour Palette code has been shifted from Metrics Explorer only, to something that can be shared.

ℹ️ The API is, for the most part, a wrapper around the executor code. It takes the same queries the executor would use, but expands the time range, and adds an inner date histogram.

ℹ️ The lookback period is the same as Metrics, which is 20 * whatever the timeSize is. So if 1 minute is set in the alert, chart previews show the last 20 minutes.

Discussion points

⚠️ I'd be interested in hearing opinions on whether the bars should be stacked when displaying data which has a groupBy set. I went back and forth on which one is better. (@katrin-freihofner would love your opinion here).

I'll show the two cases here (please ignore the yellow colour for the threshold annotation, they're just older screenshots):

Stacked:

Screenshot 2020-08-19 at 13 16 04

This looks nice, and works well with the limited screen real estate. However, it might be a bit misleading with the threshold set. Grouped alerts fire per group, so looking at this chart when a bar is above (or below) the threshold that might imply that, just because the bar is over / under the threshold, the alert would fire, but that would only be the case if the actual portion of the bar for that group was over / under the threshold.

Not stacked:

Screenshot 2020-08-19 at 13 13 55

Here it's much clearer which individual groups would fire the alert, as 1 bar = 1 grouping, it is a more accurate threshold representation imo. However, the limited space makes this a bit (a lot...) of an issue.

Another option could be to use line series, rather than bar, for the grouped scenario.

I did look at the metrics implementation for "metric threshold" alerts, but the charts there don't support displaying group by information.

⚠️ I'd be interested in opinions on what the tooltip should show when there's no group by applied, I've used "Everything", but that almost feels at odds with "Nothing (ungrouped" which we show in the actual group by selector.

Screenshot 2020-08-19 at 17 24 04

@Kerry350 Kerry350 force-pushed the 69530-logs-alerting-chart-previews branch from fd0a675 to 3a1ef97 Compare August 19, 2020 12:52
@Kerry350 Kerry350 force-pushed the 69530-logs-alerting-chart-previews branch from 3a1ef97 to 43c8aa4 Compare August 19, 2020 14:10
@Kerry350 Kerry350 requested a review from a team August 19, 2020 16:26
@Kerry350 Kerry350 self-assigned this Aug 19, 2020
@Kerry350 Kerry350 added Feature:Logs UI Logs UI feature release_note:enhancement Team:Infra Monitoring UI - DEPRECATED DEPRECATED - Label for the Infra Monitoring UI team. Use Team:obs-ux-infra_services v7.10.0 v8.0.0 labels Aug 19, 2020
@Kerry350 Kerry350 added this to the Logs UI 7.10 milestone Aug 19, 2020
@Kerry350 Kerry350 marked this pull request as ready for review August 19, 2020 16:30
@elasticmachine
Copy link
Contributor

Pinging @elastic/logs-metrics-ui (Team:logs-metrics-ui)

@afgomez afgomez self-requested a review August 20, 2020 09:06
@katrin-freihofner
Copy link
Contributor

Hi Kerry, this looks great! I agree with your comment

This looks nice, and works well with the limited screen real estate. However, it might be a bit misleading with the threshold set.

I think we should go with to option 2. I see that the limited space is an issue. Is there a way we could decrease the time range shown in this case? Also, I'm wondering if we should set a maximum? To only show the "first" 5 in a group and add a message that there are more which are not shown in the chart.
Another option would be to not show the individual datasets but only the maximum number of log lines. The main purpose of the chart is to find out where to set the threshold.

I'd be interested in opinions on what the tooltip should show when there's no group by applied, I've used "Everything", but that almost feels at odds with "Nothing (ungrouped" which we show in the actual group by selector.

My best proposal is to say "Log entries...2"

One tiny thing, can we adjust the timestamp in the tooltip so it does not show milliseconds? I don't think we need that level of detail in this chart.

@Kerry350
Copy link
Contributor Author

@katrin-freihofner Thanks for the feedback 🙏

I think we should go with to option 2.

👌

Is there a way we could decrease the time range shown in this case?

Yep, this is easy to do. I think for the grouped scenario halving the buckets to 10 (maybe even all the way to 5) would work a lot better.

Also, I'm wondering if we should set a maximum? To only show the "first" 5 in a group and add a message that there are more which are not shown in the chart.

Yeah, this sounds like a good idea. With regards to which groups to show, I think it should be the case that with more than thresholds the groups with the 5 highest values are shown, and for less than thresholds the groups with the 5 lowest values are shown (and a message, like you said, to indicate some aren't shown).

Another option would be to not show the individual datasets but only the maximum number of log lines. The main purpose of the chart is to find out where to set the threshold.

Yep, this could certainly work. I agree the charts are here to find out where to set the threshold, but I still think a bit of group information is useful. Even with limiting them to 5 it roughly shows "is this going to fire more than once" (Which I guess does start treading in the "how many times will this alert fire" functionality that metrics has)

My best proposal is to say "Log entries...2"

👍

One tiny thing, can we adjust the timestamp in the tooltip so it does not show milliseconds? I don't think we need that level of detail in this chart.

👍

I'll switch this over to:

  • Unstacked bars
  • Limited groups
  • Smaller time range (for grouped)
  • Altered tooltip text
  • Altered tooltip date formatting

And we can reassess.

@Kerry350
Copy link
Contributor Author

@katrin-freihofner Ooo, actually one last idea (always think of them after hitting submit 😅). I could do 4 group bars, and 1 extra bar which always shows the maximum regardless of threshold comparator?

Copy link
Contributor

@afgomez afgomez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a first pass in the code and I clicked around. I found a small think regarding the "group by" filter, but it might be the expected behaviour.

I set up an alert with log.level == error. With no "group by" it shows some entries in the preview.
Screenshot 2020-08-20 at 15 38 14

However, when I group by event.dataset, I get nothing
Screenshot 2020-08-20 at 15 38 28

My guess is that in the non-grouped version, the logs belong to a document with an empty event.dataset. Should these be treated as an "unknown" group as we do in the anomaly detection?

@katrin-freihofner
Copy link
Contributor

@katrin-freihofner Ooo, actually one last idea (always think of them after hitting submit 😅). I could do 4 group bars, and 1 extra bar which always shows the maximum regardless of threshold comparator?

And the tooltip would say something like:

Max........111
dataset1...110
dataset2....32
dataset3....23
dataset4....11

That could work too!

@Kerry350
Copy link
Contributor Author

Kerry350 commented Aug 20, 2020

@afgomez

My guess is that in the non-grouped version, the logs belong to a document with an empty event.dataset. Should these be treated as an "unknown" group as we do in the anomaly detection?

Yep, your guess is correct, those will be documents with no event.dataset value. This is correct behaviour, insofar as how I think this should work. I wanted the chart previews to correlate with what will actually match criteria on the executor side. In the case of grouped alerts only concrete groups will fire alerts (there's no firing for "unknown", the expectation would be to use no grouping for that behaviour). In my opinion it would be misleading to show chart data against a threshold for "unknown" when it would never fire.

Edit: Also wanted to add that from a technical standpoint the data is a bit different. In the anomalies page case we're dealing with ML documents in ML indices, these documents are stored with a event.dataset: '' value. We then convert '' to unknown. In the case of the alerting data we're querying data from our Filebeat indices with a composite aggregation against the event.dataset field (when that's used as the group by), so if it doesn't exist on the document, it won't land in the aggregation at all. Two very different use cases.

@Kerry350
Copy link
Contributor Author

@afgomez @katrin-freihofner Just pushed up some changes to improve the grouped rendering based on the above conversations. I did backtrack on the idea of adding the "maximum" information no matter what, there's a fair bit of overhead adding it, and I don't think the reward is high enough 🤔

API side I also amended the date histogram to utilise extended bounds to ensure we always have a full set of buckets regardless of empty sets of data at the start / end of the range.

Ungrouped:

Screenshot 2020-08-20 at 18 03 55

Grouped:

Screenshot 2020-08-20 at 17 40 39

@Kerry350
Copy link
Contributor Author

@elasticmachine merge upstream

Copy link
Contributor

@afgomez afgomez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some minor tweaks here and there, but overall it's good work and I don't think anything is really blocking (maybe this). The code works as intended an it's easy to follow.

Nice job 👍

[getChartPreviewData]
);

const isStacked = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this line can be removed

Kerry350 and others added 7 commits August 24, 2020 15:32
…n_editor/criteria.tsx

Co-authored-by: Alejandro Fernández Gómez <antarticonorte@gmail.com>
…350/kibana into 69530-logs-alerting-chart-previews
…n_editor/criterion_preview_chart.tsx

Co-authored-by: Alejandro Fernández Gómez <antarticonorte@gmail.com>
…350/kibana into 69530-logs-alerting-chart-previews
@kibanamachine
Copy link
Contributor

💚 Build Succeeded

Build metrics

@kbn/optimizer bundle module count

id value diff baseline
infra 1093 +5 1088

async chunks size

id value diff baseline
infra 3.6MB +70.2KB 3.5MB

page load bundle size

id value diff baseline
infra 276.6KB +962.0B 275.6KB

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Feature:Logs UI Logs UI feature release_note:enhancement Team:Infra Monitoring UI - DEPRECATED DEPRECATED - Label for the Infra Monitoring UI team. Use Team:obs-ux-infra_services v7.10.0 v8.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants