Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parents who are non-active should not be rescheduled #5375

Merged
merged 1 commit into from Sep 27, 2018
Merged

Parents who are non-active should not be rescheduled #5375

merged 1 commit into from Sep 27, 2018

Conversation

vanscheijen
Copy link

@vanscheijen vanscheijen commented Jun 22, 2017

Parents who are non-active should not be rescheduled.

This fix also prevents reschedule storms when there are lots of children for a parent. The parent will now only reschedule when its next check is too far in the future, this is determined by the retry interval of the parent.

fixes #5022

@dnsmichi dnsmichi changed the title Fix issue 5022 Parents who are non-active should not be rescheduled Sep 28, 2017
@dnsmichi dnsmichi added the area/checks Check execution and results label Sep 28, 2017
if (!parent->GetEnableActiveChecks())
continue;

if (parent->GetNextCheck() >= now + parent->GetRetryInterval()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't feel right. Any explanation why one would use retry_interval here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now there is no limitation to the amount of reschedules that can be forced to a parent by a failed child check. In some cases with about 20 children for one NRPE parent, it will go through its max_check_attempts within a few seconds.

Conceptually the minimum waiting period between checks is the retry_interval during normal scheduling(thus excluding a manual forced reschedule). Theoretically it is possible to have a lower check_interval than retry_interval, however I doubt anyone does that as it makes little sense.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright. I'd like to see this PR tested in various scenarios then, including test cases and excerpts on the next_check values from the API.

@dnsmichi dnsmichi added the needs feedback We'll only proceed once we hear from you again label Nov 2, 2017
@dnsmichi dnsmichi added the TBD To be defined - We aren't certain about this yet label Mar 20, 2018
@dnsmichi dnsmichi removed the TBD To be defined - We aren't certain about this yet label Sep 27, 2018
@dnsmichi
Copy link
Contributor

Coming back here, I wasn't sure about the logic with the retry_interval. Especially some changes made with 2.9 should help prevent false positive check scheduling inserts.

@dnsmichi
Copy link
Contributor

Discussed this with @dgoetz this week, and it likely affects the seen behaviour with #6629 (which talks about soft states only, thus retry_interval).

@dnsmichi dnsmichi added bug Something isn't working and removed needs feedback We'll only proceed once we hear from you again labels Sep 27, 2018
@dnsmichi dnsmichi added this to the 2.10.0 milestone Sep 27, 2018
@dnsmichi
Copy link
Contributor

Sorry about the late reply, sometimes things are a little more clear when life turns good. Thanks for your contribution :)

@dnsmichi dnsmichi merged commit bc1dc9c into Icinga:master Sep 27, 2018
@dnsmichi
Copy link
Contributor

Note to self: Don't use rebase and merge via button, this doesn't merge with --no-ff.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/checks Check execution and results bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Dependencies may reschedule passive checks, triggering freshness checks
2 participants