Support Systemd OnCalendar timers #919

Closed
ravench opened this issue Nov 29, 2023 · 8 comments

Comments

@ravench

ravench commented Nov 29, 2023

The ability to time checks using cron syntax is great. However, we use systemd OnCalendar timers for all our timed jobs. It would be great if Healthchecks could parse those as well.

See this post for an explanation of the OnCalendar syntax.

The parser for this in systemd seems to be here. (in C)

There is also the systemd-analyze tool, which can parse calendar settings.

@cuu508
Member

cuu508 commented Dec 1, 2023

Thanks for the suggestion. I've been thinking about this for a long time as well! The main barrier is the absence of a Python library for parsing and evaluating OnCalendar schedules. Forking out to systemd-analyze calendar ... would work, but would not be ideal in terms of security, performance, and portability.

No promises yet, but I've started work on a Python library for this: https://github.com/cuu508/oncalendar/

cuu508 added a commit that referenced this issue Dec 6, 2023
@cuu508
Member

cuu508 commented Dec 11, 2023

The initial implementation is ready, and deployed to https://healthchecks.io. All testing welcome :-)

Here's how the "Update Schedule" dialog looks:

[image: screenshot of the "Update Schedule" dialog]

@ravench
Author

ravench commented Dec 12, 2023

Awesome, thanks. I've deployed 3.1-dev to our server and adjusted our scripts. It seems to work well so far with a few dozen checks and various timer formats.

@ravench
Author

ravench commented Dec 12, 2023

A little issue I ran into involves grace time and RandomizedDelaySec:

We use fairly large random delays (up to 3600s) in our jobs, since many of them trigger at the same time and use the same resources. This means that, when using schedules instead of timeouts in Healthchecks, we constantly have jobs that are in their grace period, waiting for the random delay of the systemd timer to pass. We trigger the /start API call with ExecStartPre= in the service, so there is not really any way of triggering that call earlier.
It's sub-optimal, since this causes our project to become a bit of a Christmas tree, but it doesn't cause any issues with false alerts; we just set hc_gracetime = randomizedelaysec + timeoutsec.
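As a rough illustration of that rule of thumb, here is a minimal Python sketch. The function name and the 900 s timeout are made up for the example; only the 3600 s delay comes from the comment above.

```python
from datetime import timedelta


def grace_time(randomized_delay_sec: int, timeout_sec: int) -> timedelta:
    """Grace time covering the worst-case start delay plus the worst-case run time."""
    return timedelta(seconds=randomized_delay_sec + timeout_sec)


# Assumed values: up to 3600 s of random delay, a hypothetical 900 s job timeout.
print(grace_time(3600, 900))  # 1:15:00
```

The result would then be entered as the check's grace time, so a job that starts as late as the random delay allows and runs until its timeout still finishes inside the grace window.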

I wonder whether it would make some sense to add a 'green grace time', but that raises even more questions for me:
What is grace time actually intended for in a complete setup, using /start, /<exit-code> and possibly log? I see three relevant durations in this context:

  • Schedule offset: The expected time between the scheduled and the effective start time of the job. (RandomizedDelaySec and AccuracySec in systemd)
  • Duration: The maximum time between start and end of the job. (TimeoutSec in systemd)
  • Grace time: Time to delay warnings to catch unexpected delays.

I imagine handling these delays separately would be fairly complicated, and it isn't a priority, since it's only really relevant for systemd and the grace time can just be set accordingly.

@cuu508
Member

cuu508 commented Dec 13, 2023

I wonder whether it would make some sense to add a 'green grace time'

I've thought about making icons gradually shift from green to orange as they progress through the grace window. But I'm not sure this would be an improvement: you would still see non-pure-green statuses, and it may in the end look even busier with many different shades of green/orange.

Grace time was originally (A) the time to delay alerts when a success ping does not arrive on time.

When I added support for the /start signal, I made the grace time serve double duty: it also (B) constrains the maximum time gap between the start and success signals. This means users cannot tune A and B separately, but this does not seem to be a big issue in practice. And it avoids having another slider in the UI.

Grace time can also be used to account for random startup delay, and for the client system's clock being slightly off.
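To make the two roles concrete, here is a simplified Python sketch. It is a toy model of the description above, not the Healthchecks code: the function names are invented, and how the two rules are combined inside Healthchecks is not covered here.

```python
from datetime import datetime, timedelta


def alert_after(expected: datetime, grace: timedelta) -> datetime:
    """(A) Moment an alert would fire if no success ping follows `expected`."""
    return expected + grace


def run_too_long(started: datetime, now: datetime, grace: timedelta) -> bool:
    """(B) True if a job that sent /start has run longer than `grace` without success."""
    return now - started > grace


# Illustrative values only.
grace = timedelta(hours=1)
expected = datetime(2023, 12, 14, 0, 0)
started = datetime(2023, 12, 14, 0, 20)

print(alert_after(expected, grace))                                 # 2023-12-14 01:00:00
print(run_too_long(started, datetime(2023, 12, 14, 1, 30), grace))  # True
```

Both functions take the same `grace` value, which is the point: one setting drives both deadlines.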

@ravench
Author

ravench commented Dec 14, 2023

How about adding a configurable percentage threshold for the icon changing to yellow?
I agree that a gradual color shift would probably be more confusing than helpful. But just being able to configure "only turn yellow if 40% of the grace time has elapsed" would probably cover most use cases and edge cases.
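A minimal Python sketch of what such a threshold could look like. This is purely hypothetical: the function, its parameters, and the color names are made up for illustration and are not an existing Healthchecks setting.

```python
from datetime import datetime, timedelta


def icon_color(now: datetime, expected: datetime, grace: timedelta,
               threshold: float = 0.4) -> str:
    """Stay green until `threshold` of the grace time has elapsed."""
    elapsed = now - expected
    if elapsed < grace * threshold:
        return "green"   # before the schedule, or early in the grace window
    if elapsed < grace:
        return "yellow"  # late in the grace window
    return "red"         # grace time used up


# Illustrative values: job expected at midnight, 1 h grace time.
expected = datetime(2023, 12, 14, 0, 0)
grace = timedelta(hours=1)
print(icon_color(datetime(2023, 12, 14, 0, 10), expected, grace))  # green
print(icon_color(datetime(2023, 12, 14, 0, 30), expected, grace))  # yellow
```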

Question regarding grace time: If I have a Job scheduled for 00:00, a grace time of 1h and pings at 00:30 and 01:29, what would the behavior be?

@cuu508
Member

cuu508 commented Dec 14, 2023

But just being able to configure "only turn yellow if 40% of the grace time has elapsed" would probably cover most use cases and edge cases.

But it would add a configuration setting that would need to be tucked in the UI somewhere and explained in the docs. My suggestion would be to think of the orange status icons not as an error condition, but as a sign that a particular check will run soon. Same as with traffic lights, where orange means "the light will change soon".

Question regarding grace time: If I have a Job scheduled for 00:00, a grace time of 1h and pings at 00:30 and 01:29, what would the behavior be?

Assuming the check is initially up,

  • at 00:00 the check's grace period will start, and the icon will change to orange
  • at 00:30, after receiving a ping, the icon will change back to green
  • after that, the next expected ping is at the next midnight. Any early pings (e.g. at 01:29) do not affect the status; it will stay green.
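The timeline above can be replayed with a small Python model. This is a sketch under simplifying assumptions (a fixed daily-at-midnight schedule, no /start signals, invented function names), not the actual Healthchecks status logic.

```python
from datetime import datetime, timedelta

GRACE = timedelta(hours=1)


def next_midnight(after: datetime) -> datetime:
    """First midnight strictly after the given moment."""
    midnight = after.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight + timedelta(days=1)


def status(now: datetime, last_ping: datetime, grace: timedelta = GRACE) -> str:
    """Status of a daily-at-midnight check, given the most recent ping."""
    expected = next_midnight(last_ping)
    if now < expected:
        return "up"      # green
    if now < expected + grace:
        return "grace"   # orange
    return "down"


# The check is initially up: assume the last ping arrived the previous day.
last_ping = datetime(2023, 12, 13, 0, 10)
print(status(datetime(2023, 12, 14, 0, 0), last_ping))   # grace

last_ping = datetime(2023, 12, 14, 0, 30)                # ping at 00:30
print(status(datetime(2023, 12, 14, 0, 30), last_ping))  # up

last_ping = datetime(2023, 12, 14, 1, 29)                # early ping at 01:29
print(status(datetime(2023, 12, 14, 2, 0), last_ping))   # up
print(status(datetime(2023, 12, 15, 0, 0), last_ping))   # grace
```

In this model the 01:29 ping only moves the "last ping" timestamp within the same day, so the next expected ping is still the following midnight.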

@ravench
Author

ravench commented Dec 14, 2023

Ok thanks

ravench closed this as completed Dec 14, 2023