Introduce repeatable alert #314

Closed
samanhappy opened this issue Mar 17, 2023 · 2 comments · Fixed by #315
Labels
enhancement New feature or request

Comments

@samanhappy
Collaborator

Currently, notifications in EaseProbe are edge-triggered, which means a notification is sent only once, when the status changes. But people may miss notifications in some situations, for example:

  • someone is busy with something else when an important failure notification arrives, and by the time they come back the alert has been forgotten; as you know, people always forget
  • in an emergency, one may receive many notifications in a very short period; if those notifications cover two different problems, one may get resolved while the other is simply neglected

Among other notification systems that use level-triggered mode, as far as I know, aliyun uses a silence duration to control repeated alerts.
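
To make the difference concrete, here is a minimal sketch of the two modes. It is not EaseProbe's code or aliyun's behavior: the function names, the "up"/"down" status strings, and the rule that repeats are only sent while a probe is down are all assumptions made for illustration.

```python
from typing import List, Tuple

Sample = Tuple[int, str]  # (time in seconds, status), status is "up" or "down" (assumed values)


def edge_triggered(samples: List[Sample]) -> List[int]:
    """Times at which a notification fires: only when the status changes."""
    fired = []
    previous = None
    for t, status in samples:
        if previous is not None and status != previous:
            fired.append(t)
        previous = status
    return fired


def level_triggered(samples: List[Sample], silence: int) -> List[int]:
    """Fire on every status change, and fire again while the probe stays
    down once `silence` seconds have passed since the last notification."""
    fired = []
    previous = None
    last_sent = None
    for t, status in samples:
        changed = previous is not None and status != previous
        repeat = (
            status == "down"
            and last_sent is not None
            and t - last_sent >= silence
        )
        if changed or repeat:
            fired.append(t)
            last_sent = t
        previous = status
    return fired


# A probe checked every 60 seconds that is down from t=60 to t=240.
samples = [(0, "up"), (60, "down"), (120, "down"),
           (180, "down"), (240, "down"), (300, "up")]

print(edge_triggered(samples))        # [60, 300]
print(level_triggered(samples, 120))  # [60, 180, 300]
```

With edge triggering, the failure at t=60 is reported once and nothing else arrives until recovery; with a 120-second silence duration, the still-failing probe is reported again at t=180.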

So is it necessary for us to introduce a similar mechanism? What is your opinion? @haoel @proditis

samanhappy added the enhancement label Mar 17, 2023
@haoel
Contributor

haoel commented Mar 17, 2023

This is a good issue.

I thought about this before I wrote the first line of code of EaseProbe; edge trigger or level trigger really put me in a dilemma, and it has never left my mind. Thanks for bringing this up here.

For the level trigger, I have the following concerns:

  • Most of the time, an incident makes many services unavailable at once, and they all keep sending alerts, so your phone or other notifiers would be totally flooded.

  • The silence duration would be a magic number; nobody can set it right.

Maybe we can use something like "exponential backoff". For example, send only 5 alerts, with the time between them doubling each time: deltaTime = 2*deltaTime (deltaTime can be initialized to the probe interval).
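
As a rough sketch of what that schedule could look like (not the implementation that was eventually merged): the function name `backoff_schedule` is hypothetical, and it assumes the first alert is sent immediately and counts toward the limit, with the first gap equal to the probe interval.

```python
from typing import List


def backoff_schedule(probe_interval: int, max_alerts: int = 5) -> List[int]:
    """Offsets, in seconds since the failure was first detected, at which
    alerts are sent. The gap between alerts starts at `probe_interval`
    and doubles after every alert (deltaTime = 2*deltaTime)."""
    offsets = []
    elapsed = 0
    delta_time = probe_interval  # deltaTime initialized to the probe interval
    for _ in range(max_alerts):
        offsets.append(elapsed)
        elapsed += delta_time
        delta_time = 2 * delta_time
    return offsets


# With a 60-second probe interval and 5 alerts, the gaps are 60, 120, 240, 480.
print(backoff_schedule(60))  # [0, 60, 180, 420, 900]
```

Under these assumptions a service that stays down is reported five times over 15 minutes and then goes quiet, which limits flooding without requiring anyone to pick a fixed silence duration.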

@samanhappy
Collaborator Author

I like the backoff strategy; it's commonly used for handling failed operations in Kubernetes. All we need is to choose a reasonable initial interval and a maximum number of retries.

2 participants