alert not firing loss pattern = >0%,*10*,>0% #388

Closed
FliesLikeABrick opened this issue Dec 5, 2023 · 2 comments

Comments

@FliesLikeABrick
Contributor

I am not great with Perl, but I have been trying to understand the alerting behavior for patterns such as
pattern = >0%,*10*,>0%
as documented at
https://oss.oetiker.ch/smokeping/doc/smokeping_config.en.html

Here is an SSH probe target we configured, with some recent data from our testing:
[image: graph of recent probe data]
Here is the probe config:
+SSH
binary = /usr/bin/ssh-keyscan
forks = 5
offset = 50%
step = 60
timeout = 5

# The following variables can be overridden in each target section

keytype = rsa
pings = 5

and the alert:
+NETENG-L-BACKBONE-TCP-LOSS
type = loss
pattern = >0%,*10*,>0%
comment = Two failed connections in 10 polls
to = |/etc/smokeping/etc/smokedetector-no-merge
edgetrigger = yes

The target config is just the host name; none of the probe attributes are overridden.

Based on the graph above, we expected this alert to fire: we simulated failure on two polls within 10 minutes (possibly three, counting the 1/5 loss shown before one of the complete outages).

We were monitoring the system logs, alert script, and e-mail that should have shown this alert firing, but no such alert event was triggered. The alert does trigger if we have two 100% loss events in a row.

Can someone shed some further light on this alert pattern and why it is not firing for 2-3 loss events within 10 minutes, but does trigger for two complete loss events in a row?
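
For anyone reasoning about when `>0%,*10*,>0%` can match, the following Python sketch models the pattern as the Smokeping documentation describes it: a sample with loss, then up to 10 samples of anything, then another sample with loss. It is a simplified model, not Smokeping's own Perl code. The function name `matches_loss_pattern`, the `max_gap` parameter, and the assumption that the pattern is checked against the most recent samples (so the newest sample must itself show loss) are illustrative assumptions, not taken from the Smokeping source.

```python
from typing import List


def matches_loss_pattern(loss_pct: List[float], max_gap: int = 10) -> bool:
    """Model of the pattern '>0%,*N*,>0%' applied to the tail of a loss history.

    loss_pct holds one loss percentage per poll, oldest first.
    ASSUMPTION: the pattern is anchored at the newest sample, so the
    newest sample must show loss for a match to be possible.
    """
    if len(loss_pct) < 2 or loss_pct[-1] <= 0:
        return False

    # Try every allowed gap (0..max_gap samples of "anything") between
    # the newest lossy sample and an earlier lossy sample.
    for gap in range(max_gap + 1):
        idx = len(loss_pct) - 2 - gap
        if idx < 0:
            break  # ran out of history
        if loss_pct[idx] > 0:
            return True
    return False


if __name__ == "__main__":
    # With pings = 5, one lost ping in a poll is 20% loss.
    history = [0, 0, 100, 0, 0, 0, 20]
    print(matches_loss_pattern(history))
```

Under this model the pattern can only match at a poll that itself shows loss, so a match would be expected on the second lossy poll rather than at some point between the two.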

@FliesLikeABrick
Contributor Author

It looks like this did end up firing, but a couple of minutes later. I will do more testing and either provide clarifications or close this issue today.

@FliesLikeABrick
Contributor Author

Non-issue, confirmed after validating our test procedure and testing again.
