duplicate scheduling when running on multiple hosts #332

Open · rvyas opened this issue Mar 8, 2021 · 23 comments
@rvyas

rvyas commented Mar 8, 2021

This gem works fine for the general use case. However, we have some jobs that take a long time (20-25 min), which increases queue latency.

In that scenario, jobs get scheduled twice, which causes other issues in the app. There is a section in the readme that warns about this, but is there a way I can minimize duplication, like tweaking certain parameters or limiting certain things?

Or can this issue be fixed in the gem itself?

Current config:

  • 4 hosts
  • 4 processes (1 per host)
  • 12 threads each process
  • 5 queues
  • 6 recurring jobs using cron at different times.
  • There can be >100 jobs scheduled in a queue at the same time.
@bpo

bpo commented Mar 8, 2021

@rvyas you might want to take a look at sidekiq-uniq

@rvyas
Author

rvyas commented Mar 8, 2021

Thanks @bpo. I will try this gem.

I was trying to do the same in the worker by checking other jobs in the queue, etc., but there is some delay before those jobs show up in the Sidekiq APIs.

@bpo

bpo commented Mar 8, 2021

@rvyas I don't use that one myself, but I know others have had similar issues. Sidekiq Enterprise has built-in support for unique jobs, which would probably be your best choice if that's an option for you.
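For reference, Sidekiq Enterprise's unique jobs are declared per worker roughly like this (a sketch based on its public docs, not anything this gem provides; verify the option and initializer against the Enterprise wiki for your version):

# In an initializer, per the Sidekiq Enterprise wiki:
#   Sidekiq::Enterprise.unique!

class LongRunningJob
  include Sidekiq::Worker
  # Drop duplicate pushes of this job (same class + args) for up to
  # 30 minutes, or until the first copy finishes executing.
  sidekiq_options unique_for: 30.minutes
end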

@marcelolx
Member

@rvyas Did you solve your problem?

@rvyas
Author

rvyas commented Jun 14, 2021

No, it did not solve the problem. Even with sidekiq-uniq I see the same duplicate scheduling, although the rate has dropped.
Maybe I need to leverage the built-in support. I will also give sidekiq-cron a shot, which seems to have a similar feature to this gem.

@mgoggin

mgoggin commented Jul 13, 2021

@rvyas Were you able to solve your issue with sidekiq-cron?

@marcelolx
Member

@rvyas @mgoggin I haven't taken a look at this gem, https://github.com/mhenrixon/sidekiq-unique-jobs, but maybe it can help.
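For reference, sidekiq-unique-jobs is also configured per worker; a minimal sketch (the lock and conflict option names come from that gem's README and are worth double-checking against the version you install):

class LongRunningJob
  include Sidekiq::Worker
  # :until_executed holds the lock from push until the job finishes, so a
  # second identical push is ignored while the first is queued or running.
  sidekiq_options lock: :until_executed, on_conflict: :reject
end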

@stale

stale bot commented Oct 2, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

@stale stale bot added the stale The issue or PR has been inactive label Oct 2, 2021
@marcelolx marcelolx removed the stale The issue or PR has been inactive label Oct 2, 2021
@stale

stale bot commented Jan 9, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

@stale stale bot added the stale The issue or PR has been inactive label Jan 9, 2022
@marcelolx marcelolx removed the stale The issue or PR has been inactive label Jan 9, 2022
@stale

stale bot commented Apr 16, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

@stale stale bot added the stale The issue or PR has been inactive label Apr 16, 2022
@marcelolx marcelolx removed the stale The issue or PR has been inactive label Apr 16, 2022
@marcelolx
Member

OK, I think one solution to these problems would be to use a Redis-based lock instead of rufus-scheduler's default filelock:

https://github.com/jmettraux/rufus-scheduler#advanced-lock-schemes

This would allow us to use something like https://github.com/kenn/redis-mutex to create a single lock shared by all schedulers running on different hosts, which would ensure that only one scheduler is able to schedule a job at a given time.

I'll investigate more and run some tests; this may land as an experimental feature in v4.1.
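A minimal sketch of that idea (not code from this gem; the RedisMutex option names are taken from redis-mutex's README, and CrossHostSchedulerLock is an illustrative name):

require "redis-mutex"

# redis-mutex needs its Redis connection configured first, e.g.:
#   RedisClassy.redis = Redis.new

class CrossHostSchedulerLock
  def initialize
    # block: 0 makes #lock non-blocking; expire releases the lock if the
    # winning host dies without calling #unlock.
    @mutex = RedisMutex.new(:sidekiq_scheduler, block: 0, expire: 300)
  end

  # rufus-scheduler calls #lock on startup; returning false makes this
  # scheduler instance skip scheduling entirely.
  def lock
    @mutex.lock
  end

  def unlock
    @mutex.unlock
  end
end

# Rufus::Scheduler.new(scheduler_lock: CrossHostSchedulerLock.new)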

@marcelolx marcelolx added this to the v4.1 milestone Apr 28, 2022
@bpo

bpo commented May 1, 2022

@marcelolx this seems like a tricky thing for sidekiq-scheduler to handle well, as it's still always possible for the app to enqueue jobs outside of the scheduler.

@marcelolx
Member

marcelolx commented May 2, 2022

@bpo Could you give an example?

PS: The only thing we want is to avoid sidekiq-scheduler enqueueing duplicate jobs when running on multiple hosts.

@bpo

bpo commented May 2, 2022

I think there may be 3 issues here:

A) Sidekiq-scheduler can enqueue a job when it's already running
B) Sidekiq can enqueue a job when it's already running
C) Sidekiq can start running a job when it's already started

The initial description of the issue here is all 3 of those: The overall system has performance problems because two copies of a job are running at once (C), caused by Sidekiq enqueuing two copies of a job (B), and sidekiq-scheduler handled the actual enqueueing (A).

A is clearly a subset of B, which makes me think that a more general plugin to Sidekiq might be the right way to solve the whole problem rather than adding the complexity within sidekiq-scheduler.

The only thing we want is to avoid sidekiq-scheduler enqueueing duplicate jobs when running on multiple hosts.

I think this means you are saying "we only want to solve A". I think you're right that A is solvable within sidekiq-scheduler. I wanted to raise a caution about added complexity vs value (since B/C will remain unaddressed) but it sounds like you've already thought that through.

@marcelolx
Member

Got it @bpo, that makes sense! I agree that adding too much complexity is not a path we want to follow, and yeah, what I am looking to solve is A: just avoiding sidekiq-scheduler enqueueing the same job when running on multiple hosts.

@eldemcan
Contributor

I would like to share one of our solutions to this problem (we ended up with a different solution because we have a lot of dynamically scheduled tasks):

require "singleton"
require "socket"
require "securerandom"

class RedisLock
  include Singleton

  LOCK_NAME = "rufus:trigger"
  INSTANCE_IDENTIFIER = Socket.gethostname || SecureRandom.hex(10)

  # Called by rufus-scheduler at startup. Returns true when this host owns
  # the lock, either because it just acquired it or already held it.
  def lock
    Sidekiq.redis do |r|
      # SET NX only succeeds when the key is absent; EX expires the lock so
      # a crashed host cannot hold on to it forever.
      res = r.set(LOCK_NAME, INSTANCE_IDENTIFIER, nx: true, ex: 5.minutes.to_i)

      res == false ? r.get(LOCK_NAME) == INSTANCE_IDENTIFIER : res
    end
  end

  def unlock
    true
  end

  def remove_instance_from_queue
    removed = Sidekiq.redis { |r| r.del(LOCK_NAME) }
    Honeycomb.add_field("scheduler_removed", removed == 1)
  end
end


# Monkey patch SidekiqScheduler::Scheduler since we don't have a way to
# inject scheduler_lock
module SidekiqScheduler
  class Scheduler
    def initialize(options = {})
      self.enabled = options[:enabled]
      self.dynamic = options[:dynamic]
      self.dynamic_every = options[:dynamic_every]
      self.listened_queues_only = options[:listened_queues_only]
      self.rufus_scheduler_options = options[:rufus_scheduler_options] || {}

      # Force our Redis-backed lock into the options handed to rufus-scheduler
      self.rufus_scheduler_options =
        rufus_scheduler_options.merge({ scheduler_lock: RedisLock.instance })
    end
  end
end

This will make sure only one host can schedule. Each host also needs to run a thread that checks whether rufus:trigger is still set in Redis; if not, it can declare itself the scheduler host if another host hasn't already done so (a sketch of that thread is below). Hope this helps.
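A minimal sketch of such a watchdog thread (the 60-second interval and the TTL refresh are assumptions layered on top of the snippet above, not part of it):

Thread.new do
  loop do
    if RedisLock.instance.lock
      # We hold the lock (or just took it over after the old key expired);
      # push the expiry forward so it only lapses if this host dies.
      Sidekiq.redis { |r| r.expire(RedisLock::LOCK_NAME, 5.minutes.to_i) }
    end
    sleep 60
  end
end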

@marcelolx
Member

@eldemcan You don't have to patch sidekiq-scheduler, it is just a matter of passing the scheduler_lock to the rufus_scheduler_options, no?

One question: what your team wants is a single scheduler running, right?

BTW, what happens if your server crashes and remove_instance_from_queue is not called to remove the lock? In the next deployment no server will be able to set up the scheduler, right?

Also, I suggest looking at how Square's https://github.com/square/sqeduler implemented its locking strategy using Redis, which may be helpful. (My plan is to introduce opt-in support for that at some point, but first we want to solve #396.)

@eldemcan
Contributor

@eldemcan You don't have to patch sidekiq-scheduler, it is just a matter of passing the scheduler_lock to the rufus_scheduler_options, no?

How can I pass options to the rufus_scheduler instance inside SidekiqScheduler::Scheduler? SidekiqScheduler::Scheduler seems to get its options from the YAML file. Let me know if I am overlooking something.

One question, what your team wants is a single scheduler running, right?

We want only one instance to be able to schedule at a time, but if that instance goes away, another instance should be able to take over that role.

BTW, what happens if your server crashes and remove_instance_from_queue is not called to remove the lock? In the next deployment no server will be able to set up the scheduler, right?

The key will expire in 5 min (this could be shorter), so if remove_instance_from_queue is not called, Redis will remove the key. We could also create a thread that keeps checking whether the key is set; if not, the instance will set the key, declaring itself the scheduler among the other hosts.

Also, I suggest looking at how Square's https://github.com/square/sqeduler implemented its locking strategy using Redis, which may be helpful. (My plan is to introduce opt-in support for that at some point, but first we want to solve #396.)

Thank you I will take a look into this

@marcelolx
Member

How can I pass options to the rufus_scheduler instance inside SidekiqScheduler::Scheduler? SidekiqScheduler::Scheduler seems to get its options from the YAML file. Let me know if I am overlooking something.

Doesn't this work? ⬇️

rufus_scheduler_options:
  scheduler_lock: <%= RedisLock.instance %>

We want only one instance to be able to schedule at a time, but if that instance goes away, another instance should be able to take over that role.

Then I think that you want to pass along a trigger_lock strategy, no? https://github.com/jmettraux/rufus-scheduler#trigger_lock
From the description of the scheduler_lock

The scheduler lock is an object that responds to #lock and #unlock. The scheduler calls #lock when starting up. If the answer is false, the scheduler stops its initialization work and won't schedule anything.

while the trigger_lock

The trigger lock is an object that responds to #lock. The scheduler calls that method on the job lock right before triggering any job. If the answer is false, the trigger doesn't happen, the job is not done (at least not in this scheduler).

So I think that what you're looking for is a trigger lock and not a scheduler lock....
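A minimal sketch of a trigger lock with failover (illustrative only; RedisTriggerLock, the key name, and the 60-second TTL are assumptions, not anything this gem ships):

require "socket"

class RedisTriggerLock
  LOCK_NAME = "sidekiq_scheduler:trigger_lock"

  # Called by rufus-scheduler before every trigger. Take the lock if it is
  # free, or confirm we still own it; a host that loses the lock simply
  # stops triggering, and another host wins once the key expires.
  def lock
    Sidekiq.redis do |r|
      r.set(LOCK_NAME, identity, nx: true, ex: 60) ||
        r.get(LOCK_NAME) == identity
    end
  end

  private

  def identity
    @identity ||= Socket.gethostname
  end
end

# Rufus::Scheduler.new(trigger_lock: RedisTriggerLock.new)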

The key will expire in 5 min (this could be shorter), so if remove_instance_from_queue is not called, Redis will remove the key. We could also create a thread that keeps checking whether the key is set; if not, the instance will set the key, declaring itself the scheduler among the other hosts.

🤦‍♂️ Didn't see the expire!

@eldemcan
Contributor

@marcelolx thanks for engagement,

Doesn't this work? ⬇️

It doesn't work because the YAML is loaded before the class itself, so it throws an undefined-constant error; maybe there is a trick to overcome this problem.

Then I think that you want to pass along a trigger_lock strategy, no?

We tried trigger_lock as well, but because of jmettraux/rufus-scheduler#130 jobs were duplicated (we believe).

@marcelolx
Member

Thanks for the feedback @eldemcan, I'll keep an eye on that issue.

And once I work on supporting a redis lock strategy I'll see if we can allow users to set a trigger_lock/scheduler_lock without having to patch sidekiq-scheduler!

@mrjonesbot

I recently experienced this same issue, although things had been working correctly despite having multiple hosts.

This occurred after I adjusted my sidekiq.yml file, where I removed a scheduled job, then changed the scheduled times of 3 others.

Overall, I deployed a wide array of changes to my scheduled jobs, so I'm wondering if this has anything to do with the scheduler queuing duplicates?

Perhaps each change + deploy to the scheduler populates additional scheduled jobs without clearing out the old ones?

@bibendi

bibendi commented Oct 9, 2022

And once I work on supporting a redis lock strategy

I've made one for myself. You may be interested :)
bibendi/schked#31
