CheckCommand 'icinga' seems to ignore retry interval via command_endpoint #6603
Comments
Can you try to connect to the API event streams and verify that the check result is received multiple times a second? It looks strange from your Icinga Web 2 screenshot. https://www.icinga.com/docs/icinga2/latest/doc/15-troubleshooting/#checks-are-not-executed (at the bottom of the section).
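One way to check this from captured event stream output is to count CheckResult events per second. The following is a minimal sketch, not part of the original thread: it assumes each line of the stream is one JSON object carrying `type`, `timestamp`, `host` and `service` fields, and the host name `agent1` and the sample timestamps are made up for illustration.

```python
import json
from collections import Counter


def count_results_per_second(lines):
    """Count CheckResult events per (host, service, whole second)."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the stream
        event = json.loads(line)
        if event.get("type") != "CheckResult":
            continue  # ignore other event types
        # Truncate the timestamp to a whole second to bucket the events.
        key = (event.get("host"), event.get("service"), int(event["timestamp"]))
        counts[key] += 1
    return counts


# Hypothetical sample lines, shaped like event stream output.
sample = [
    '{"type": "CheckResult", "timestamp": 1536000000.1, "host": "agent1", "service": "icinga"}',
    '{"type": "CheckResult", "timestamp": 1536000000.6, "host": "agent1", "service": "icinga"}',
    '{"type": "CheckResult", "timestamp": 1536000001.2, "host": "agent1", "service": "icinga"}',
    '{"type": "StateChange", "timestamp": 1536000001.3, "host": "agent1", "service": "icinga"}',
]

counts = count_results_per_second(sample)
for (host, service, second), n in sorted(counts.items()):
    print(f"{host}!{service} @ {second}: {n} check result(s)")
```

Any bucket with a count above 1 for a service whose retry_interval is 30s would confirm that results arrive far more often than scheduled. The input lines could come from a subscription to the `/v1/events` API endpoint restricted to the CheckResult type, saved to a file or piped in.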
Check result is indeed received multiple times a second. This is output from event stream:
|
Hmmm, so the checks are actually executed on the satellites themselves. Which zone from your zones.conf is affected here?
This would be the zone 'zone3' from the zones.conf.
Ok, thanks. Still strange .. One last question - does this happen to only the |
Yes, it happens only to |
I've just found out that this issue occurs only when the icinga2 agent on the host is not running -> unknown state. If icinga2 is running and has another problem (in my case a problem with reload: |
I have a similar problem but with the cluster-zone command: |
Are you using dependencies by chance?
Yes, I do. Here it is
|
So, when the check initially fails, the parents defined by the dependency run an immediate re-check, so that reachability is known sooner the next time the service is checked. That's what you see within Icinga Web 2 for anything you define as a parent, e.g. icinga or cluster-zone checks. I guess it is the same as with #5022 and #5375.
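To illustrate why this looks like retry_interval being ignored, here is a deliberately simplified model, not Icinga 2's actual implementation. It assumes that every non-OK check result counts as one attempt regardless of whether it came from the regular schedule or from a triggered re-check. The value `max_check_attempts=3` is hypothetical; the 30s retry interval is the one from the report.

```python
def simulate(check_times, max_check_attempts):
    """Return (time, state_type, attempt) for a series of failing results.

    Simplified model: each failing result increments the attempt counter
    until max_check_attempts is reached, at which point the state is HARD.
    """
    history = []
    attempt = 0
    for t in check_times:
        if attempt < max_check_attempts:
            attempt += 1
        state_type = "HARD" if attempt >= max_check_attempts else "SOFT"
        history.append((t, state_type, attempt))
    return history


def time_to_hard(history):
    """Time of the first result that is in a HARD state."""
    return next(t for t, state_type, _ in history if state_type == "HARD")


# Failing checks spaced by retry_interval = 30s.
scheduled = simulate([0, 30, 60], max_check_attempts=3)
# Failing checks caused by immediate, triggered re-checks.
triggered = simulate([0.0, 0.5, 1.0], max_check_attempts=3)

print("scheduled retries reach HARD after", time_to_hard(scheduled), "s")
print("triggered re-checks reach HARD after", time_to_hard(triggered), "s")
```

Under this assumption the attempt counter is exhausted as fast as the re-checks arrive, which matches the symptom of all attempts appearing within about two seconds and the notification being sent right away.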
Yes, you are right. Everything works as expected after removing the dependencies.
2.10 contains a PR which should fix this behaviour.
CheckCommand 'icinga' ignores soft states when a problem occurs. It jumps right into the hard state and sends a notification. Perhaps the 'icinga' command ignores the retry_interval parameter (in my case 30s) of the service object, as I can see all attempts in the Web UI within just two seconds, see image below:
Expected Behavior
CheckCommand 'icinga' should respect the retry_interval and max_check_attempts parameters of the service object.
Context
I'm using a 3-level architecture with an HA master zone and 3 child zones. The issue happens on all 3 levels (masters, satellites, clients).
Runtime service object:
Your Environment
(icinga2 --version): r2.9.1-1
(icinga2 feature list): api checker graphite ido-pgsql mainlog notification
(icinga2 feature list): api checker mainlog
(icinga2 feature list): api mainlog
(icinga2 daemon -C):
zones.conf file (or icinga2 object list --type Endpoint and icinga2 object list --type Zone) from all affected nodes.