
Acknowledging an alert while it's being resolved unresolves it #3161

Closed
deepy opened this issue Oct 18, 2023 · 5 comments
Labels
awaiting input, bug, part:alert flow & configuration

Comments

@deepy

deepy commented Oct 18, 2023

What went wrong?

What happened:
In Slack I saw an alert, hit acknowledge, and intended to start working on it.
But the alert resolved before my acknowledgement completed and before the Slack UI updated to show it as resolved.

The alert group log clearly shows the issue:

4m20s: resolved
4m20s: unresolved by anordlund (@Alex Nordlund)
4m20s: acknowledged by anordlund (@Alex Nordlund)

What did you expect to happen:
A resolve happening within seconds of an acknowledgement should take priority, especially as the ChatOps flow adds some latency.
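A minimal sketch of the policy described above, assuming a hypothetical `RESOLVE_GRACE_SECONDS` window and a simplified `AlertGroup` model (neither is part of Grafana OnCall): an acknowledge that arrives shortly after a resolve is rejected instead of reopening the group.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical grace window in seconds; not a real OnCall setting.
RESOLVE_GRACE_SECONDS = 10.0


@dataclass
class AlertGroup:
    """Simplified, hypothetical model of an alert group."""
    state: str = "firing"
    resolved_at: Optional[float] = None
    acknowledged_by: Optional[str] = None

    def resolve(self, now: float) -> None:
        self.state = "resolved"
        self.resolved_at = now

    def acknowledge(self, user: str, now: float) -> bool:
        """Apply the ack unless a resolve landed within the grace window.

        Returns True if the ack was applied, False if the recent resolve won.
        """
        recently_resolved = (
            self.state == "resolved"
            and self.resolved_at is not None
            and now - self.resolved_at <= RESOLVE_GRACE_SECONDS
        )
        if recently_resolved:
            return False  # keep the resolve; the caller should tell the user
        self.state = "acknowledged"
        self.resolved_at = None
        self.acknowledged_by = user
        return True


group = AlertGroup()
group.resolve(now=100.0)
print(group.acknowledge("anordlund", now=102.0), group.state)  # False resolved
```

Returning `False` rather than silently dropping the click leaves room for the ChatOps handler to tell the user that the resolve won.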

How do we reproduce it?

Acknowledge an alert just as it is being resolved.

Grafana OnCall Version

Cloud

Product Area

Chatops, Other

Grafana OnCall Platform?

I use Grafana Cloud

User's Browser?

No response

Anything else to add?

No response

deepy added the bug label on Oct 18, 2023
@joeyorlando
Contributor

@deepy what does the escalation chain related to this alert group look like?

joeyorlando added the part:alert flow & configuration and awaiting input labels and removed the part:chatops, more info needed, and needs triage labels on Jun 14, 2024
@deepy
Author

deepy commented Jun 17, 2024

@joeyorlando The escalation chain only has wait and notification steps.
The alert group was resolved by the firing alert returning to healthy.

@joeyorlando
Contributor

@deepy I assume you have an Autoresolution template set for the integration (docs)?

The behavior of acknowledge changing a resolved alert group from resolved -> unresolved -> acknowledged is expected: acknowledging always puts the alert group into the acknowledged state, and a prerequisite for that is that the group is unresolved. So why does this happen? When a user receives a notification, the task notifying them does not know the alert group has been resolved, because these events happen concurrently. When the user acknowledges, we still perform the action: if that is what the user actually wants (i.e. "assign this alert group to me"), we cannot ignore the input and leave the group in the resolved state.
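The sequence in the alert group log from the original report can be reproduced with a simplified, hypothetical model of the behavior described above (this is not OnCall's actual code): acknowledging a resolved group first unresolves it and then acknowledges it.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AlertGroup:
    """Simplified, hypothetical model of an alert group."""
    state: str = "firing"
    log: List[str] = field(default_factory=list)

    def resolve(self) -> None:
        self.state = "resolved"
        self.log.append("resolved")

    def unresolve(self, user: str) -> None:
        self.state = "firing"
        self.log.append(f"unresolved by {user}")

    def acknowledge(self, user: str) -> None:
        # Acknowledging requires an unresolved group, so a resolved group
        # is unresolved first rather than the user's input being dropped.
        if self.state == "resolved":
            self.unresolve(user)
        self.state = "acknowledged"
        self.log.append(f"acknowledged by {user}")


group = AlertGroup()
group.resolve()                 # the autoresolve lands first...
group.acknowledge("anordlund")  # ...then the slower ack from Slack arrives
print(group.log)
# ['resolved', 'unresolved by anordlund', 'acknowledged by anordlund']
```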

If this is a common occurrence (alert groups resolving quickly while the initial notifications are going out), my recommendation is to add a delay of 1 minute or longer to the escalation chain before anyone is contacted. This gives the alert group time to auto-resolve, so users are not contacted unnecessarily.

@deepy
Author

deepy commented Jun 17, 2024

@joeyorlando I understand why it's happening, but the problem here is that the Slack integration can be pretty slow, so the actual timing of when the alert fires, when the notification goes out, and other variables doesn't really factor in.

This is a race condition, and as a user you get no feedback that it happened. In fact, from what I recall, everything looked normal, because the alert group log is inside the Slack thread while the acknowledge button is outside the thread.
I get that fixing this might cost more than it's worth, but at the time it was somewhat frustrating, as I ended up on a (short) wild-goose chase.

@deepy
Author

deepy commented Jun 17, 2024

Actually, now that I think of it, this is purely a Slack UX issue. It's not a problem that the alert group gets unresolved; the problem is that it happens without my knowledge.
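One way to address the UX side, sketched under stated assumptions: the Slack button would carry the state the user saw when the message was rendered (the hypothetical `seen_state` parameter), and a mismatch is reported back instead of being applied silently. A second, explicit confirmation (the hypothetical `force=True`) keeps the current behavior of unresolving and acknowledging. None of these names come from OnCall.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AckResult:
    applied: bool
    message: str


@dataclass
class AlertGroup:
    """Simplified, hypothetical model of an alert group."""
    state: str = "firing"
    log: List[str] = field(default_factory=list)

    def acknowledge(self, user: str, seen_state: str, force: bool = False) -> AckResult:
        # The group changed after the user looked at it: report, don't apply.
        if self.state != seen_state and not force:
            return AckResult(
                applied=False,
                message=f"Alert group is now {self.state}; confirm to reopen and acknowledge",
            )
        if self.state == "resolved":
            self.log.append(f"unresolved by {user}")
        self.state = "acknowledged"
        self.log.append(f"acknowledged by {user}")
        return AckResult(applied=True, message=f"Acknowledged by {user}")


group = AlertGroup(state="resolved")  # resolved before the click arrived
first = group.acknowledge("anordlund", seen_state="firing")
print(first.message)  # Alert group is now resolved; confirm to reopen and acknowledge
second = group.acknowledge("anordlund", seen_state="firing", force=True)
print(group.log)  # ['unresolved by anordlund', 'acknowledged by anordlund']
```

This keeps the maintainers' point that explicit user input should not be ignored, while making the unresolve visible to the user who triggered it.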
