
Get notified if a job is run too often #17

Open
heyman opened this issue Dec 16, 2015 · 7 comments

@heyman

heyman commented Dec 16, 2015

Hey, thanks for a really neat project & service!

It would be nice to have the possibility to get notified if a job is run too often.

Unfortunately, I don't have the time to dig in and implement it myself, and I'm sorry if you don't want feature requests as GitHub issues.

@cuu508
Member

cuu508 commented Dec 16, 2015

Thanks for the suggestion!
I'm thinking of two scenarios where this might be useful:

  1. Your cron job runs as it should (say, once an hour), but you've set up the "Period" and "Grace" parameters incorrectly (say, once a day). It's not a huge issue, except that when the check actually goes down, it will take longer (~23 hours) until you find out.
    In this situation a warning in the dashboard, and possibly in the monthly report, would be sufficient.
  2. Your "Period" and "Grace" parameters are set up correctly, but something changes on the cron side (e.g. the job is looping infinitely and pinging on each iteration). In this case we would want to be proactive and send a notification.

Which one did you have in mind? Any ideas on how to distinguish between the two?

@heyman
Author

heyman commented Dec 16, 2015

I was thinking about the second scenario. It probably has to be something you can turn on/off per check, since you still want to support checks for jobs that run at irregular times (while still being notified if a job hasn't run within X time).

Perhaps a good solution would be to have the ability to specify "minimum time between runs"?
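
A minimal sketch of how such a "minimum time between runs" setting might be enforced when a ping arrives; the min_period field and function names here are hypothetical, not Healthchecks' actual code:

```python
from datetime import datetime, timedelta

class Check:
    """Illustrative check model with a hypothetical min_period setting."""

    def __init__(self, min_period: timedelta = timedelta(0)):
        # A default of 0s disables the feature entirely.
        self.min_period = min_period
        self.last_ping: datetime | None = None

    def handle_ping(self, now: datetime) -> None:
        # Alert if this ping arrives sooner than min_period after the last one.
        if self.last_ping is not None and now - self.last_ping < self.min_period:
            alert_too_frequent(self)
        self.last_ping = now

def alert_too_frequent(check: Check) -> None:
    # Placeholder for the actual notification logic.
    print("check pinged too often")
```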

@heyman
Author

heyman commented Dec 16, 2015

The minimum time between runs could either default to 0s, in which case it wouldn't solve scenario 1, or it could default to something like 75% of Period; in scenario 1 it would then notify the user that the check had been pinged too frequently, and the user could change the setting.

I think I'm leaning towards defaulting it to 0s, since this doesn't feel like core functionality, and users trying out the service for the first time would quite likely trigger the warning notification when accessing the check URLs with a browser or curl.

@carlosfunk

Just jumping in here to say I'd love to see this as a feature. I've had a couple of "runaway" crons recently that, for some reason, hit an error while running and start looping forever.

@heyman heyman changed the title Get notified if a job is run to often Get notified if a job is run too often Aug 29, 2023
@moraj-turing
Contributor

So I am looking into developing this feature. I think if the notification uses the proper start/active endpoints plus rid, it's fairly simple to calculate whether something took too little time. For now I'm thinking of just adding an additional parameter for the minimum time, but this could later become a statistical parameter that alerts, e.g., if the check took under 50% of the usual time.
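
A rough sketch of that idea, assuming each run sends a "start" ping and a "success" ping carrying the same rid (run ID) so the two can be matched up; the names and threshold are illustrative, not the actual Healthchecks code:

```python
from datetime import datetime, timedelta

MIN_DURATION = timedelta(seconds=30)   # hypothetical per-check minimum
starts: dict[str, datetime] = {}       # rid -> time of the "start" ping

def on_start(rid: str, now: datetime) -> None:
    starts[rid] = now

def on_success(rid: str, now: datetime) -> None:
    started = starts.pop(rid, None)
    if started is not None and now - started < MIN_DURATION:
        # The run finished suspiciously quickly -- alert the user.
        print(f"run {rid} took {now - started}, expected >= {MIN_DURATION}")
```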

@cuu508
Member

cuu508 commented Apr 15, 2024

This feature request is from 2015. At the time we did not have "fail" signals (added in May 2018) or "start" signals (added in Dec 2018). I think these two help handle some variants of the infinitely looping job scenario:

  • when the job starts, it can send a /start signal
  • if the job hits an error, it can send a /fail signal, and Healthchecks declares the job as down right away
  • if the job crashes and gets restarted, Healthchecks waits for the "success" signal, which never comes, and eventually declares the job as down
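
For illustration, a job wrapper that emits these signals could look like the sketch below (the ping URL's UUID is a placeholder, and do_work stands in for the real cron job):

```python
import requests

PING_URL = "https://hc-ping.com/your-uuid-here"  # placeholder UUID

def do_work() -> None:
    pass  # stand-in for the actual job body

def run_job() -> None:
    requests.get(PING_URL + "/start", timeout=10)     # job is starting
    try:
        do_work()
    except Exception:
        requests.get(PING_URL + "/fail", timeout=10)  # declared down right away
        raise
    requests.get(PING_URL, timeout=10)                # success signal

if __name__ == "__main__":
    run_job()
```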

There's a third case: the job completes successfully, but gets restarted right away. On the Healthchecks side this would look like a flood of either "start"/"success" signal pairs, or just a stream of "success" signals. We could add a configurable parameter for this: "if there are more than X success signals in time period Y, let the user know somehow". As usual, there is a tradeoff between having more flexibility and having a harder-to-use product. Right now, I do not want to go in this direction, seeing as there's been interest in this functionality from 3 persons in 8 years.
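
A sketch of that counter idea, with X and Y as hypothetical settings rather than actual Healthchecks parameters:

```python
from collections import deque
from datetime import datetime, timedelta

MAX_PINGS = 5                      # X: allowed success signals per window
WINDOW = timedelta(minutes=10)     # Y: the time period

recent: deque[datetime] = deque()  # timestamps of recent success signals

def on_success(now: datetime) -> None:
    recent.append(now)
    # Drop pings that have fallen out of the sliding window.
    while recent and now - recent[0] > WINDOW:
        recent.popleft()
    if len(recent) > MAX_PINGS:
        print(f"{len(recent)} pings within {WINDOW} -- job may be restarting in a loop")
```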

On the hosted service, healthchecks.io, I sometimes deal with a slightly related problem: someone sticks a hc-ping.com call in a function that runs frequently on many servers, and the service starts seeing 10, 50, 100 or more requests per second for a single UUID. To protect the database against this, the hosted service applies rate limiting at the nginx level.
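
For reference, this kind of per-URL throttling can be expressed with nginx's limit_req module. A sketch only, not healthchecks.io's actual configuration (zone size, rate, and backend name are made up):

```nginx
# Key the limit on $uri, i.e. per check UUID; excess requests get HTTP 429.
limit_req_zone $uri zone=ping:10m rate=5r/s;
limit_req_status 429;

server {
    listen 443 ssl;
    server_name hc-ping.com;

    location / {
        limit_req zone=ping burst=10 nodelay;
        proxy_pass http://healthchecks_backend;  # made-up upstream name
    }
}
```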

@heyman
Author

heyman commented Apr 15, 2024

IIRC, what prompted me to create the feature request back in 2015 was that I found out that I had a duplicate instance of a backup job that was dumping a database to the same file, causing the data to be corrupted. Both jobs were pinging Healthchecks, so it could have been noticed earlier if this feature had been available.

> Right now, I do not want to go in this direction, seeing as there's been interest in this functionality from 3 persons in 8 years.

Completely understandable :).
