Parents who are non-active should not be rescheduled #5375
if (!parent->GetEnableActiveChecks())
    continue;

if (parent->GetNextCheck() >= now + parent->GetRetryInterval()) {
This doesn't feel right. Any explanation why one would use retry_interval here?
Right now there is no limit on the number of reschedules that failed child checks can force on a parent. In some cases, with about 20 children for one NRPE parent, the parent will go through its max_check_attempts within a few seconds.
Conceptually, the minimum waiting period between checks during normal scheduling (that is, excluding a manually forced reschedule) is the retry_interval. Theoretically it is possible to have a check_interval lower than the retry_interval, but I doubt anyone does that, as it makes little sense.
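To illustrate the effect of the guard, here is a minimal standalone sketch, not the actual Icinga 2 code. The `Parent` struct and the functions `ShouldRescheduleParent` and `CountReschedules` are hypothetical names made up for this example, and the sketch assumes that rescheduling a parent simply sets its next check to the current time.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a parent checkable; not the real Icinga 2 class.
struct Parent {
    bool enableActiveChecks;
    double nextCheck;      // absolute time of the next scheduled check, in seconds
    double retryInterval;  // retry_interval, in seconds
};

// Mirrors the two conditions from the diff: skip parents without active
// checks, and only reschedule when the next check is at least one
// retry interval away from `now`.
bool ShouldRescheduleParent(const Parent& parent, double now)
{
    assert(parent.retryInterval >= 0);

    if (!parent.enableActiveChecks)
        return false;

    return parent.nextCheck >= now + parent.retryInterval;
}

// Feeds a series of child failure timestamps to one parent and counts how
// often the parent gets rescheduled. Assumption: a reschedule sets the
// parent's next check to `now`.
int CountReschedules(Parent parent, const std::vector<double>& failureTimes)
{
    int count = 0;

    for (double now : failureTimes) {
        if (ShouldRescheduleParent(parent, now)) {
            parent.nextCheck = now;
            ++count;
        }
    }

    return count;
}
```

With 20 child failures arriving one second apart, a parent whose next check is 300 seconds away and whose retry interval is 60 seconds is rescheduled once in this model, because after the first reschedule its next check is no longer a full retry interval in the future.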
Alright. I'd like to see this PR tested in various scenarios then, including test cases and excerpts on the next_check values from the API.
Coming back here, I wasn't sure about the logic with the retry_interval. In particular, some changes made in 2.9 should help prevent false-positive check scheduling inserts.
Sorry about the late reply; sometimes things are a little clearer when life turns good. Thanks for your contribution :)
Note to self: Don't use
Parents who are non-active should not be rescheduled.
This fix also prevents reschedule storms when a parent has lots of children. The parent is now rescheduled only when its next check is too far in the future, that is, at least one retry interval of the parent away from now.
fixes #5022