Retain next_check schedule on restart (#224, #156) #259

Conversation

@jacobbaungard (Contributor) commented Sep 10, 2018

This commit ensures that the next_check schedule for hosts and services
is retained across Naemon restarts, provided that
use_retained_scheduling_info is enabled.

The logic is as follows:

- If use_retained_scheduling_info is disabled, set a random time (as
  before).
- If use_retained_scheduling_info is enabled:
  - If we didn't miss a check during the restart, retain the old
    next_check time.
  - If we missed one check, schedule the service/host within the next
    interval_length (usually 60 seconds).
  - If we missed more than one check, schedule the next check randomly.

We schedule missed checks within 60 seconds rather than immediately, in
order to do some load balancing. This is also the rationale for
scheduling the check randomly when we missed more than one check, since
that indicates Naemon has been down for a longer period of time; see
the sketch below.
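A minimal C sketch of that decision logic follows. The function and
variable names are assumptions for illustration, not Naemon's actual
internals, and the randomization is simplified to rand():

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper, not Naemon's real code: pick next_check for an
 * object after a restart, per the logic described above.
 *   retained_next_check: next_check loaded from retention data
 *   check_interval_s:    check interval, in seconds
 *   interval_length_s:   interval_length, in seconds (usually 60)
 */
static time_t schedule_next_check(time_t now, time_t retained_next_check,
                                  long check_interval_s, long interval_length_s,
                                  int use_retained_scheduling_info)
{
    if (check_interval_s <= 0)
        check_interval_s = interval_length_s; /* defensive fallback */

    if (!use_retained_scheduling_info)
        return now + (rand() % check_interval_s); /* random, as before */

    if (retained_next_check > now)
        return retained_next_check; /* no check missed: keep old schedule */

    if (now - retained_next_check < check_interval_s) {
        /* exactly one check missed: run it within the next interval_length
         * (usually 60s) to spread the load instead of firing at once */
        return now + (rand() % interval_length_s);
    }

    /* more than one check missed (longer downtime): reschedule randomly
     * over a full check interval to avoid a thundering herd */
    return now + (rand() % check_interval_s);
}

int main(void)
{
    srand((unsigned)time(NULL));
    time_t now = time(NULL);
    /* retained next_check is 30s in the future, so it is kept as-is */
    time_t next = schedule_next_check(now, now + 30, 300, 60, 1);
    printf("next check in %ld seconds\n", (long)(next - now));
    return 0;
}
```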

This fixes:
- naemon#224
- naemon#156
- MON-10720 (https://jira.op5.com/browse/MON-10720)

Signed-off-by: Jacob Hansen <jhansen@op5.com>

@sni (Contributor) commented Sep 10, 2018

That's great news, thanks.

This commit adds tests to ensure that next_check is set correctly
after Naemon restarts, verifying that the logic from the previous
commit is correctly followed.

This fixes:
- naemon#224
- naemon#156
- MON-10720 (https://jira.op5.com/browse/MON-10720)

Signed-off-by: Jacob Hansen <jhansen@op5.com>
@jacobbaungard force-pushed the bugfix/MON-10720_retain-next-schedule branch from 98f96b0 to 402706b on September 11, 2018 09:59
@jacobbaungard (Contributor, Author)

I think this is ready now.

- Added tests.
- Fixed an issue from the previous commit (check_interval was in
  minutes where it should have been in seconds; see the sketch below).
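For context, Naemon (like Nagios) stores check_interval in units of
interval_length, 60 seconds by default, so it has to be multiplied out
before any second-based time arithmetic. A minimal sketch of the
conversion implied by that fix, with hypothetical variable names:

```c
#include <stdio.h>

int main(void)
{
    long check_interval  = 5;  /* in units of interval_length, not seconds */
    long interval_length = 60; /* seconds per unit (usually 60)            */

    /* buggy: using check_interval directly treats 5 "minutes" as 5 seconds */
    long window_wrong = check_interval;

    /* fixed: convert interval units to seconds before time comparisons */
    long window_seconds = check_interval * interval_length; /* 300 seconds */

    printf("wrong: %lds, correct: %lds\n", window_wrong, window_seconds);
    return 0;
}
```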

@roengstrom left a comment

Nice fix, looks good to me.

@jacobbaungard (Contributor, Author)

@nook24, @jvigna, how does the logic in this PR sound to you? Do you think it would solve the problems you have reported?

@jvigna commented Sep 26, 2018

Looks very good to me!

@jacobbaungard (Contributor, Author) commented Sep 27, 2018

I will go ahead and merge this this afternoon, unless there are any objections. (Edit: okay, I didn't manage that; I will do so next week. Any comments are still appreciated.)

@sni (Contributor) commented Sep 28, 2018

I was on vacation... :-) No objections.

@jacobbaungard merged commit 183178c into naemon:master on Oct 1, 2018
@jacobbaungard deleted the bugfix/MON-10720_retain-next-schedule branch on October 1, 2018 08:17
jacobbaungard added a commit to jacobbaungard/naemon-core that referenced this pull request Oct 10, 2018
After naemon#259 we now keep the
next_check schedule over restarts if use_retained_scheduling_info is
enabled. However, after this patch, if one lowered the check_interval,
it was possible that after the restart the next check of an object
would be more than one check_interval away.

This commit ensures that if the next_check is more than one
check_interval away, then we randomly schedule the next check, instead
of using the retention data.

This fixes MON-11295 (https://jira.op5.com/browse/MON-11295)

Signed-off-by: Jacob Hansen <jhansen@op5.com>
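A sketch of the guard this follow-up describes, reusing the
illustrative names from the earlier sketch (an assumption, not the
actual patch):

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper, not the actual patch: if the retained next_check
 * lies more than one check_interval in the future (possible when
 * check_interval was lowered before the restart), discard the retained
 * value and reschedule randomly within one interval. */
static time_t clamp_retained_next_check(time_t now, time_t retained,
                                        long check_interval_s)
{
    if (retained - now > check_interval_s)
        return now + (rand() % check_interval_s);
    return retained; /* still within one interval: keep the retained time */
}

int main(void)
{
    srand((unsigned)time(NULL));
    time_t now = time(NULL);
    /* interval lowered from 10 to 5 minutes; retained check is 8 min away */
    time_t next = clamp_retained_next_check(now, now + 480, 300);
    printf("next check in %ld seconds\n", (long)(next - now));
    return 0;
}
```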