Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use the jobs actual scheduled time #276

Open
wants to merge 15 commits into
base: master
Choose a base branch
from

Conversation

afrase
Copy link

@afrase afrase commented Apr 13, 2019

Rufus calls the block with the current time instead of the time the job is scheduled to run. If you have multiple hosts, it's possible that Rufus will run the block at slightly different times. This results in a job getting ran multiple times.

This figures out when the job was scheduled to run instead of the time rufus sends. Doing this will fix #181

Rufus calls back into sidekiq-scheduler with time being now instead of when the
 job is supposed to run.
The Rufus and Fugit libraries use EtOrbi::EoTime which isn't compatible with
ActiveSupports patches to Time.
@coveralls
Copy link

coveralls commented Apr 13, 2019

Coverage Status

Coverage decreased (-0.1%) to 97.869% when pulling 30e658f on rvshare:feature/use-scheduled-cron-time into 41990aa on moove-it:master.

@FabienChaynes
Copy link

Hi,

We faced the same issue with a job using the "cron" type. The job was executed on several hosts because the SidekiqScheduler::RedisManager.register_job_instance method called from SidekiqScheduler::Scheduler#idempotent_job_enqueue uses a time coming from rufus-scheduler which is actually "the time when the job got cleared for triggering". If there's a one second delay before clearing the job for triggering on one host but not the other, Redis#zadd will be called for the same key with a different value, which will update the set and return true, thus executing the job.

From what I understand, the fix from this PR would solve this issue because instead of using the time when the job got cleared for triggering (which can be different), we'd use the time where the job was scheduled to be run at (which would be the same).
We can even see in Rufus::Scheduler::Job that previous_time is defined for all kind of jobs and is set to the previous "next trigger time" when trigger is called, which would allow this PR to use this value for all kind of jobs and not just for crons. There might be an issue in Rufus::Scheduler::RepeatJob#trigger because in some cases previous_time wouldn't be set there, but it can be improved.

Do you see any flaw in this logic? Is something preventing the merge, and if so, would you be willing to merge it if we fix the points you deem necessary?

Thanks for considering this request 🙂

@marcelolx
Copy link
Member

@afrase Are you still interested in working on this?

@marcelolx
Copy link
Member

hey @afrase @FabienChaynes I plan to add this fix to the V4 release, I think we won't be able to merge these changes due to the conflicts.

I will open a new PR with @afrase as Co-Author or @afrase if you are willing to fix the conflicts of this PR, go ahead.

@FabienChaynes Let me know if you're still waiting for this issue to be fixed, it would be good to have someone to test it in production once merged before we make the official release of v4

@FabienChaynes
Copy link

Hi @marcelolx and thanks for the follow up.

We made some changes to reduce the number of jobs scheduled at the same time (thus decreasing the load) and never encountered the problem again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use rufus job's schedule time to determine if a specific job instance has ran before
4 participants