
Get notified if a job is run too often #17

Open
heyman opened this issue Dec 16, 2015 · 7 comments

@heyman

heyman commented Dec 16, 2015

Hey, thanks for a really neat project & service!

It would be nice to have the possibility to get notified if a job is run too often.

Unfortunately, I don't have the time to dig in and implement it myself, and I'm sorry if you don't want feature requests as GitHub issues.

@cuu508
Member

cuu508 commented Dec 16, 2015

Thanks for the suggestion!
I'm thinking of two scenarios where this might be useful:

  1. Your cron job runs as it should (say, once an hour), but you've set up the "Period" and "Grace" parameters incorrectly (say, once a day). It's not a huge issue, except that when the check actually goes down, it will take longer (~23 hours) until you find out.
    In this situation a warning in the dashboard, and possibly in the monthly report, would be sufficient.
  2. Your "Period" and "Grace" parameters are set up correctly, but something changes on the cron side (e.g. the job is looping infinitely and pinging on each iteration). In this case we would want to be proactive and send a notification.

Which one did you have in mind? Any ideas on how to distinguish between the two?

@heyman
Author

heyman commented Dec 16, 2015

I was thinking about the second scenario. It probably has to be something you can turn on/off per check, since you still want to support checks for jobs that run at irregular times (while still being notified if a job hasn't run within X time).

Perhaps a good solution would be to have the ability to specify "minimum time between runs"?
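
A minimal sketch of how such a "minimum time between runs" setting might be enforced when a ping arrives; the min_period field and function names here are hypothetical, not Healthchecks' actual code:

```python
from datetime import datetime, timedelta

class Check:
    """Illustrative check model with a hypothetical min_period setting."""

    def __init__(self, min_period: timedelta = timedelta(0)):
        # A default of 0s disables the feature entirely.
        self.min_period = min_period
        self.last_ping: datetime | None = None

    def handle_ping(self, now: datetime) -> None:
        # Alert if this ping arrives sooner than min_period after the last one.
        if self.last_ping is not None and now - self.last_ping < self.min_period:
            alert_too_frequent(self)
        self.last_ping = now

def alert_too_frequent(check: Check) -> None:
    # Placeholder for the actual notification logic.
    print("check pinged too often")
```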

@heyman
Author

heyman commented Dec 16, 2015

The minimum time between runs could either default to 0s, in which case it wouldn't solve scenario 1, or it could default to something like 75% of Period; in scenario 1 it would then notify the user that the check had been pinged too frequently, and the user could change the setting.

I think I'm leaning towards defaulting it to 0s, since this doesn't feel like core functionality, and users trying out the service for the first time would quite likely trigger the warning notification when accessing the check URLs with a browser or curl.

@carlosfunk

Just jumping in here to say I'd love to see this as a feature. I've had a couple of "runaway" crons recently that, for some reason, hit an error while running and start looping forever.

@heyman heyman changed the title Get notified if a job is run to often Get notified if a job is run too often Aug 29, 2023
@moraj-turing
Contributor

So I am looking into developing this feature. I think if the notification uses the proper start/active endpoints plus rid, it's fairly simple to calculate whether something took too little time. For now I'm thinking of just adding an additional parameter for the minimum time, but this could later become a statistical parameter that alerts, e.g., if the check took under 50% of the usual time.
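
A rough sketch of that idea, assuming each run sends a "start" ping and a "success" ping carrying the same rid (run ID) so the two can be matched up; the names and threshold are illustrative, not the actual Healthchecks code:

```python
from datetime import datetime, timedelta

MIN_DURATION = timedelta(seconds=30)   # hypothetical per-check minimum
starts: dict[str, datetime] = {}       # rid -> time of the "start" ping

def on_start(rid: str, now: datetime) -> None:
    starts[rid] = now

def on_success(rid: str, now: datetime) -> None:
    started = starts.pop(rid, None)
    if started is not None and now - started < MIN_DURATION:
        # The run finished suspiciously quickly -- alert the user.
        print(f"run {rid} took {now - started}, expected >= {MIN_DURATION}")
```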

@cuu508
Member

cuu508 commented Apr 15, 2024

This feature request is from 2015. At the time we did not have "fail" signals (added in May 2018) or "start" signals (added in Dec 2018). I think these two help handle some variants of the infinitely looping job scenario:

  • when the job starts, it can send a /start signal
  • if the job hits an error, it can send a /fail signal, and Healthchecks declares the job as down right away
  • if the job crashes and gets restarted, Healthchecks waits for the "success" signal, which never comes, and eventually declares the job as down
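
For illustration, a job wrapper that emits these signals could look like the sketch below (the ping URL's UUID is a placeholder, and do_work stands in for the real cron job):

```python
import requests

PING_URL = "https://hc-ping.com/your-uuid-here"  # placeholder UUID

def do_work() -> None:
    pass  # stand-in for the actual job body

def run_job() -> None:
    requests.get(PING_URL + "/start", timeout=10)     # job is starting
    try:
        do_work()
    except Exception:
        requests.get(PING_URL + "/fail", timeout=10)  # declared down right away
        raise
    requests.get(PING_URL, timeout=10)                # success signal

if __name__ == "__main__":
    run_job()
```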

There's a third case: the job completes successfully, but gets restarted right away. On the Healthchecks side this would look like a flood of either "start"/"success" signal pairs, or just a stream of "success" signals. We could add a configurable parameter for this: "if there are more than X success signals in time period Y, let the user know somehow". As usual, there is a tradeoff between having more flexibility and having a harder-to-use product. Right now, I do not want to go in this direction, seeing as there's been interest in this functionality from 3 persons in 8 years.
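
A sketch of that counter idea, with X and Y as hypothetical settings rather than actual Healthchecks parameters:

```python
from collections import deque
from datetime import datetime, timedelta

MAX_PINGS = 5                      # X: allowed success signals per window
WINDOW = timedelta(minutes=10)     # Y: the time period

recent: deque[datetime] = deque()  # timestamps of recent success signals

def on_success(now: datetime) -> None:
    recent.append(now)
    # Drop pings that have fallen out of the sliding window.
    while recent and now - recent[0] > WINDOW:
        recent.popleft()
    if len(recent) > MAX_PINGS:
        print(f"{len(recent)} pings within {WINDOW} -- job may be restarting in a loop")
```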

On the hosted service, healthchecks.io, I sometimes deal with a slightly related problem: someone sticks a hc-ping.com call in a function that runs frequently on many servers, and the service starts seeing 10, 50, 100 or more requests per second for a single UUID. To protect the database against this, the hosted service applies rate limiting at the nginx level.
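
For reference, this kind of per-URL throttling can be expressed with nginx's limit_req module. A sketch only, not healthchecks.io's actual configuration (zone size, rate, and backend name are made up):

```nginx
# Key the limit on $uri, i.e. per check UUID; excess requests get HTTP 429.
limit_req_zone $uri zone=ping:10m rate=5r/s;
limit_req_status 429;

server {
    listen 443 ssl;
    server_name hc-ping.com;

    location / {
        limit_req zone=ping burst=10 nodelay;
        proxy_pass http://healthchecks_backend;  # made-up upstream name
    }
}
```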

@heyman
Author

heyman commented Apr 15, 2024

IIRC, what prompted me to create the feature request back in 2015 was that I found out that I had a duplicate instance of a backup job that was dumping a database to the same file, causing the data to be corrupted. Both jobs were pinging Healthchecks, so it could have been noticed earlier if this feature had been available.

> Right now, I do not want to go in this direction, seeing as there's been interest in this functionality from 3 persons in 8 years.

Completely understandable :).
