
[Spike] [Log threshold rule] Explore options to fix "too many buckets" error in alert details page #179640

Closed
benakansara opened this issue Mar 28, 2024 · 7 comments
Labels
Feature:Alerting Team:obs-ux-management Observability Management User Experience Team


@benakansara
Contributor

benakansara commented Mar 28, 2024

Explore options to fix the "too many buckets" error in the alerts history chart, described in #173020, and propose a suitable solution.

@benakansara benakansara added the Team:obs-ux-management Observability Management User Experience Team label Mar 28, 2024
@elasticmachine
Contributor

Pinging @elastic/obs-ux-management-team (Team:obs-ux-management)

@benakansara
Contributor Author

I think the issue will be resolved once the alerts history chart is filtered by group fields in #176718. I ran the same query that produces the too_many_buckets_exception with those filters applied, and it did not throw the exception.

We still need to stress test the case where a single group value has many alerts (for example, host-0 with 1000+ alerts) to see whether the chart still works. Based on that, we can evaluate whether #176718 is enough to address the underlying issue or whether we need to change the implementation of the alerts history chart.
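The observation that group filters avoid too_many_buckets_exception can be illustrated with rough bucket arithmetic. This is a minimal sketch under stated assumptions: it assumes the chart query is a `date_histogram` with a nested `terms` aggregation on the group field (not confirmed in this thread), and the time range, interval, group count, and limit are made-up example values. `maxBuckets` stands in for Elasticsearch's `search.max_buckets` cluster setting; the helper names are hypothetical.

```typescript
// Hypothetical sketch: the query shape (a date_histogram with a nested
// terms aggregation on the group field) is an assumption, not the actual
// alerts history chart query.

interface ChartQueryShape {
  rangeMs: number; // length of the chart's time range, in milliseconds
  intervalMs: number; // fixed interval of the date_histogram, in milliseconds
  distinctGroups: number; // distinct group values seen by the terms sub-aggregation
}

// Number of date_histogram buckets needed to cover the range.
function histogramBuckets(q: ChartQueryShape): number {
  return Math.ceil(q.rangeMs / q.intervalMs);
}

// Rough total: each histogram bucket plus one child bucket per group.
function estimateBuckets(q: ChartQueryShape): number {
  const outer = histogramBuckets(q);
  return outer + outer * q.distinctGroups;
}

// maxBuckets stands in for the cluster's search.max_buckets setting.
function exceedsLimit(q: ChartQueryShape, maxBuckets: number): boolean {
  return estimateBuckets(q) > maxBuckets;
}

const DAY_MS = 24 * 60 * 60 * 1000;
const FIVE_MIN_MS = 5 * 60 * 1000;

// Made-up example: 30 days at 5-minute intervals across 50 hosts.
const unfiltered: ChartQueryShape = {
  rangeMs: 30 * DAY_MS,
  intervalMs: FIVE_MIN_MS,
  distinctGroups: 50,
};
// Same range, filtered to a single group value such as host-0.
const filtered: ChartQueryShape = { ...unfiltered, distinctGroups: 1 };

const EXAMPLE_LIMIT = 65536; // example value only; check the cluster's actual setting

console.log(estimateBuckets(unfiltered), exceedsLimit(unfiltered, EXAMPLE_LIMIT)); // prints: 440640 true
console.log(estimateBuckets(filtered), exceedsLimit(filtered, EXAMPLE_LIMIT)); // prints: 17280 false
```

Under this model, filtering helps only by shrinking the group dimension; the histogram dimension still depends on the time range and interval.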

@jasonrhodes jasonrhodes changed the title [Spike] [Log threshold rule] Explore options to fix alerts history chart in alert details page [Spike] [Log threshold rule] Explore options to fix "too many buckets" error in alert details page Apr 3, 2024
@jasonrhodes
Member

jasonrhodes commented Apr 3, 2024

I think you're right that #176718 will fix the majority of cases where this error could occur. Can we use this issue to figure out how to avoid that "Oops" error even if a rule creates too many buckets?

At the very least, let's log a telemetry event any time a user encounters this error.

@jasonrhodes
Member

@elastic/obs-bi-team is there currently an "approved" way for us to log some kind of telemetry event in a specific error scenario, so we can track how often that scenario occurs for customers? I think that would be a sufficient outcome for this ticket at this point (do you agree, @benakansara?)

@almudenasanz
Contributor

@benakansara
Contributor Author

@jasonrhodes Since we are replacing the current history chart with an AlertSummaryWidget-style chart, this issue will become obsolete. The new chart will be shared by all alert details pages, so it will automatically appear on the Log threshold alert details page too, and we can remove the current history chart. Once that is implemented, we can close this ticket. Wdyt?

@jasonrhodes
Member

@benakansara yes, I agree, this was another reason I was excited by that other direction.

Closed in favor of #181475

Projects
None yet
Development

No branches or pull requests

4 participants