New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Retain next_check schedule on restart (#224, #156) #259
Retain next_check schedule on restart (#224, #156) #259
Conversation
thats great news, thanks |
This commit ensures that the next_check schedule for hosts and services are retained on Naemon restart, given that use_retained_scheduling_info is enabled. The logic is as follows: - If use_retained_scheduling_info is disabled, set a random time (as before) - If use_retained_schedule_info is enabled: - If we didn't miss the check during the restart, retain the old next_check time - If we missed one check, schedule the service/host within the next interval_length (usually 60 seconds) - If we missed more than one check, schedule the next check randomly. We schedule missed checks within 60 seconds, rather than immediately in order to do some load balacing. This is also the rationale for scheduling the check randomly, in case we missed more than one check (this indicates Naemon has been down for a longer period of time). This fixes: - naemon#224 - naemon#156 - MON-10720 (https://jira.op5.com/browse/MON-10720) Signed-off-by: Jacob Hansen <jhansen@op5.com>
This commit adds tests to ensure that the next_check is set correctly after Naemon restarts. This ensures the logic is from the previous commit is correctly followed. This fixes: - naemon#224 - naemon#156 - MON-10720 (https://jira.op5.com/browse/MON-10720) Signed-off-by: Jacob Hansen <jhansen@op5.com>
98f96b0
to
402706b
Compare
I think this is ready now.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice fix, looks good to me.
Look very good to me! |
I will go ahead and merge this, this afternoon unless there are any objections. (Edit: Okay didn't manage that, will do so next week. Any comments are still appreciated). |
i was on vacation... :-) no objections. |
After naemon#259 we now keep the next_check schedule over restarts if use_retained_schedule_info is enabled. However after this patch, if one would lower the check_interval it was possible that after the restart, the next check of an object would be more than one check_interval away. This commit ensures that if the next_check is more than one check_interval away, then we randomly schedule the next check, instead of using the retention data. This fixed MON-11295 (https://jira.op5.com/browse/MON-11295) Signed-off-by: Jacob Hansen <jhansen@op5.com>
After naemon#259 we now keep the next_check schedule over restarts if use_retained_schedule_info is enabled. However after this patch, if one would lower the check_interval it was possible that after the restart, the next check of an object would be more than one check_interval away. This commit ensures that if the next_check is more than one check_interval away, then we randomly schedule the next check, instead of using the retention data. This fixed MON-11295 (https://jira.op5.com/browse/MON-11295) Signed-off-by: Jacob Hansen <jhansen@op5.com>
After naemon#259 we now keep the next_check schedule over restarts if use_retained_schedule_info is enabled. However after this patch, if one would lower the check_interval it was possible that after the restart, the next check of an object would be more than one check_interval away. This commit ensures that if the next_check is more than one check_interval away, then we randomly schedule the next check, instead of using the retention data. This fixed MON-11295 (https://jira.op5.com/browse/MON-11295) Signed-off-by: Jacob Hansen <jhansen@op5.com>
This commit ensures that the next_check schedule for hosts and services
are retained on Naemon restart, given that use_retained_scheduling_info
is enabled.
The logic is as follows:
before)
next_check time
interval_length (usually 60 seconds)
We schedule missed checks within 60 seconds, rather than immediately in
order to do some load balacing. This is also the rationale for
scheduling the check randomly, in case we missed more than one check
(this indicates Naemon has been down for a longer period of time).
This fixes:
Signed-off-by: Jacob Hansen jhansen@op5.com