Unwanted Clock Drift #384

Closed
InteXX opened this issue Apr 5, 2024 · 2 comments

Comments

@InteXX

InteXX commented Apr 5, 2024

Describe the bug
I'm seeing significant clock drift in an EveryMinute job—as much as a full second of delay in as little as seventy-two hours. I frequently encounter this entry in my website's application logs:

Coravel's scheduler is behind 1 ticks and is catching-up to the current tick

This often appears in groups of five to twelve occurrences, logged with millisecond-resolution timestamps.

Affected Coravel Feature
Scheduling

Expected behaviour
I'd hoped to see the job fire at the zero second mark reliably.

Is there something that can be done to mitigate this problem?

@jamesmh
Owner

jamesmh commented Apr 10, 2024

Coravel uses a Timer under the covers, which doesn't necessarily fire at exactly the right moment. This is generally affected by the resources given to the process (e.g. CPU, or memory pressure that affects CPU), how much load the process/system is under, etc.
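
To illustrate how small per-tick delays can add up, here is a toy model in Python (not .NET, and not the actual implementation of Coravel or the Timer). The function names and the 5 ms latency are hypothetical, chosen only for illustration. It compares scheduling each tick relative to the previous actual firing against scheduling each tick at a fixed wall-clock target.

```python
def relative_schedule(start_ms, interval_ms, latencies_ms):
    """Each tick is scheduled interval_ms after the previous *actual* firing,
    so every late firing pushes all later ticks back."""
    fired = []
    t = start_ms
    for latency in latencies_ms:
        t = t + interval_ms + latency
        fired.append(t)
    return fired


def absolute_schedule(start_ms, interval_ms, latencies_ms):
    """Each tick targets start_ms + n * interval_ms, so a late firing
    does not affect the targets of later ticks."""
    fired = []
    for n, latency in enumerate(latencies_ms, start=1):
        fired.append(start_ms + n * interval_ms + latency)
    return fired


# Hypothetical numbers: a 1-second interval where every firing is 5 ms late.
latencies = [5, 5, 5]
print(relative_schedule(0, 1000, latencies))  # [1005, 2010, 3015]
print(absolute_schedule(0, 1000, latencies))  # [1005, 2005, 3005]
```

In the first case the lateness accumulates from tick to tick; in the second it stays bounded by the latency of a single firing. Either way, the latency itself depends on how quickly the process gets CPU time.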

This issue has existed ever since the ability to schedule seconds was introduced (something I originally didn't want to do due to complexities it introduces - like this). A few weeks ago a final fix for this issue was introduced.

So yes, there will be times, notably when scheduling to the second, when there will be drift. That's not something Coravel can control. There are actions you can take, such as running schedule processing in a dedicated process, container, etc. so that scheduling has its own resources. Or make sure that the given container/machine isn't resource-limited.

For example, this commonly occurs in Kubernetes pods that have only a tiny amount of resources allocated to them. The process just doesn't have enough resources to do all the work it needs to do in a timely manner 🤷.

The difference now (with the fix) is that Coravel will "catch up": if the Timer is triggered after missing intervals (usually one or a few seconds), Coravel will play back all the missed times and run the schedules that were due.
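
The catch-up idea can be sketched roughly as follows. This is a minimal Python illustration, not Coravel's actual code: `missed_ticks` and `is_due_every_minute` are hypothetical names, and the one-second tick interval is an assumption made for the example.

```python
from datetime import datetime, timedelta


def missed_ticks(last_processed, now, interval=timedelta(seconds=1)):
    """Return every tick after last_processed, up to and including now."""
    ticks = []
    tick = last_processed + interval
    while tick <= now:
        ticks.append(tick)
        tick += interval
    return ticks


def is_due_every_minute(tick):
    """Hypothetical 'every minute' rule: due at the zero-second mark."""
    return tick.second == 0


# The timer last fired at 12:00:58 and next fires at 12:01:01,
# so it skipped over the 12:01:00 tick.
last = datetime(2024, 4, 5, 12, 0, 58)
now = datetime(2024, 4, 5, 12, 1, 1)

for tick in missed_ticks(last, now):
    if is_due_every_minute(tick):
        print("run job for tick", tick.isoformat())
```

Replaying the skipped ticks means the every-minute job still runs for 12:01:00, just slightly late, rather than being missed entirely.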

So at this point, my advice is to take a look at the resources on the process.

You could also try schedule workers for some of your heavier tasks to see if that helps.

@InteXX
Author

InteXX commented Apr 10, 2024

That makes sense, thanks for the detailed explanation.

In my case, at least, I was able to mitigate the problem by backing off of per-second resolution in my app's logic. So it's no longer an issue here, and I'll keep an eye out for it in the future.

And yes—this is an Azure App Service running on the Basic plan, so resource availability likely comes into play.

@InteXX InteXX closed this as completed Apr 10, 2024