Unwanted Clock Drift #384

InteXX · 2024-04-05T19:07:25Z

Describe the bug
I'm seeing significant clock drift in an EveryMinute job—as much as a full second of delay in as little as seventy-two hours. I frequently encounter this entry in my website's application logs:

Coravel's scheduler is behind 1 ticks and is catching-up to the current tick

This often appears in groups of five to twelve entries, timestamped at millisecond resolution.

Affected Coravel Feature
Scheduling

Expected behaviour
I'd hoped to see the job fire at the zero second mark reliably.

Is there something that can be done to mitigate this problem?

jamesmh · 2024-04-10T16:30:12Z

Coravel uses a Timer under the covers, which doesn't necessarily fire at exactly the right moment. How precisely it fires generally depends on the resources given to the process (e.g. CPU, or memory pressure that affects CPU), how much load the process/system is under, etc.

This issue has existed ever since the ability to schedule seconds was introduced (something I originally didn't want to do due to complexities it introduces - like this). A few weeks ago a final fix for this issue was introduced.

So yes, there will be times, notably when scheduling to the second, when there will be drift. That's not something Coravel can control. There are actions you can take, such as keeping schedule processing in a dedicated process, container, etc. so that scheduling has dedicated resources. Or make sure that the given container/machine isn't resource-limited.

For example, this commonly occurs in Kubernetes pods that have a tiny amount of resources allocated to them. The process just doesn't have enough resources to do all the work it needs to do in a timely manner 🤷.

The difference now (with the fix) is that Coravel will "catch up": when the Timer is triggered and there were missed intervals (usually one or a few seconds), Coravel will play back all the missed times and run the schedules that were due.
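To illustrate the catch-up idea described above, here is a minimal Python sketch. It is not Coravel's actual implementation (Coravel is a .NET library); the function names `ticks_to_process` and `is_due_every_minute` and the sample timestamps are hypothetical, chosen only to show how missed one-second ticks could be replayed so that a job due at second zero still runs when the timer fires late.

```python
from datetime import datetime, timedelta


def ticks_to_process(last_tick: datetime, now: datetime) -> list:
    """Return every whole-second tick after last_tick, up to and including now.

    Hypothetical helper: assumes last_tick is already aligned to a whole second.
    """
    current = now.replace(microsecond=0)  # truncate to the current whole second
    ticks = []
    tick = last_tick + timedelta(seconds=1)
    while tick <= current:
        ticks.append(tick)
        tick += timedelta(seconds=1)
    return ticks


def is_due_every_minute(tick: datetime) -> bool:
    """An every-minute job is due on the tick that lands on second zero."""
    return tick.second == 0


# Assumed scenario: the last processed tick was 19:06:58, but the timer
# did not fire again until 19:07:01.250 because the process was starved.
last_tick = datetime(2024, 4, 5, 19, 6, 58)
now = datetime(2024, 4, 5, 19, 7, 1, 250000)

pending = ticks_to_process(last_tick, now)
behind = len(pending) - 1  # ticks missed before the current one
due = [t for t in pending if is_due_every_minute(t)]

print(f"behind {behind} ticks; replaying {len(pending)} ticks")
print("every-minute job runs for:", due)
```

In this sketch the 19:07:00 tick was never observed directly, yet replaying the missed ticks still finds it and runs the every-minute job, only later than its nominal time. That matches the symptom in the report: the job is not skipped, but it can start after the zero-second mark.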

So at this point, my advice is to take a look at the resources on the process.

You can also try schedule workers for some of your heavier tasks to see if that helps.

InteXX · 2024-04-10T20:50:56Z

That makes sense, thanks for the detailed explanation.

In my case, at least, I was able to mitigate the problem by backing off of per-second resolution in my app's logic. So it's no longer an issue here, and I'll keep an eye out for it in the future.

And yes—this is an Azure App Service running on the Basic plan, so resource availability likely comes into play.

InteXX closed this as completed Apr 10, 2024
