until_and_while_executing and lock_ttl: jobs silently dropped #788

Closed
fernandomm opened this issue Jun 6, 2023 · 3 comments
Comments

@fernandomm

Describe the bug
When using lock_ttl with the until_and_while_executing lock, jobs are silently dropped if the lock expires before the job can be processed.

Expected behavior
It should allow jobs to be processed, even if their uniqueness can no longer be ensured.

Current behavior
Jobs are dropped here.

Worker class

SidekiqUniqueJobs.reflect do |on|
  on.unlock_failed do |job|
    puts "unlock failed #{job.inspect}"
  end
end

class SampleJob
  include Sidekiq::Worker
  
  # Low ttl to make it faster to reproduce the problem
  sidekiq_options lock_ttl: 5.seconds, lock: :until_and_while_executing, on_conflict: { client: :log, server: :reschedule }

  def perform
    puts "Performing"
    sleep 10.seconds
    puts "Done"
  end
end

Run:

SampleJob.perform_async
=> "8f8f9d6c06907277113bbfc0"

Wait 5 seconds before starting a Sidekiq worker. When it starts processing, you will get this:

2023-06-06T09:46:38.605Z pid=14404 tid=x9ave8dnk INFO: Starting processing, hit Ctrl-C to stop
2023-06-06T09:46:38.607Z pid=14404 tid=x9awjx00s class=SampleJob jid=8f8f9d6c06907277113bbfc0 INFO: start
unlock failed {"retry"=>3, "queue"=>"default", "lock_ttl"=>5, "lock"=>"until_and_while_executing", "on_conflict"=>{"client"=>"log", "server"=>"reschedule"}, "args"=>[], "class"=>"SampleJob", "jid"=>"8f8f9d6c06907277113bbfc0", "created_at"=>1686044788.863775, "lock_timeout"=>0, "lock_prefix"=>"uniquejobs", "lock_args"=>[], "lock_digest"=>"uniquejobs:36ac16731f732f365a70a02cbb3e7b70", "enqueued_at"=>1686044788.882391}
2023-06-06T09:46:38.695Z pid=14404 tid=x9awjx00s class=SampleJob jid=8f8f9d6c06907277113bbfc0 elapsed=0.088 INFO: done

If the job is processed before the lock expires, it works as expected:

SampleJob.perform_async
=> "f7f7b5477b5cb9a8457e8d2c"
2023-06-06T09:46:43.703Z pid=14404 tid=x9awjx00s class=SampleJob jid=f7f7b5477b5cb9a8457e8d2c INFO: start
Performing
Done
2023-06-06T09:46:53.893Z pid=14404 tid=x9awjx00s class=SampleJob jid=f7f7b5477b5cb9a8457e8d2c elapsed=10.191 INFO: done

At the moment, the only workaround I found was:

SidekiqUniqueJobs.reflect do |on|
  on.unlock_failed do |job|
    # Re-enqueue the job so it isn't silently lost when its lock has already expired
    job['class'].constantize.set(queue: job['queue']).perform_async(*job['args'])
  end
end
@mhenrixon
Owner

I get your problem, but if I allowed that, there would be no point in having the gem. I mean, this gem is about preventing duplicate jobs.

Given a TTL, there is simply no way to sort this out: when the TTL expires, the lock does too.

If you want to make sure no job is lost, I would recommend removing the TTL, delegating to the reaper for cleanup, and using a conflict resolution strategy to ensure the job isn't ignored.
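
For concreteness, a minimal sketch of what that could look like with the worker from above (only an assumption of the intended setup; the on_conflict strategies are the ones already used in this issue):

class SampleJob
  include Sidekiq::Worker

  # No lock_ttl: the lock is released when the job finishes,
  # and orphaned locks are left to the reaper to clean up
  sidekiq_options lock: :until_and_while_executing,
                  on_conflict: { client: :log, server: :reschedule }

  def perform
    puts "Performing"
  end
end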

@fernandomm
Author

But for the until_and_while_executing lock, isn't it better to allow the job to run and only drop it (or apply the defined conflict strategy) if there is another job running?

I mean, the lock for this job expired during the "until_executing" phase. I guess that trying to run it won't cause problems because the "while_executing" part will ensure that it's unique.

If you want to make sure no job is lost, I would recommend removing the TTL, delegating to the reaper for cleanup, and using a conflict resolution strategy to ensure the job isn't ignored.

We previously tried that, but at least for our use case, with thousands of jobs enqueued/scheduled, the reaper is too slow to be effective. When we force it to run by adjusting timeouts/count, it takes 12+ hours to check all keys.
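
(The "timeouts/count" mentioned above refers to the reaper settings. A sketch of how they can be tuned, assuming the configuration keys documented in the gem's README; the values are only illustrative:)

SidekiqUniqueJobs.configure do |config|
  config.reaper          = :ruby  # orphan reaper implementation (:ruby or :lua)
  config.reaper_count    = 1000   # stop reaping after this many keys per run
  config.reaper_interval = 600    # seconds between reaper runs
  config.reaper_timeout  = 10     # seconds before a reaper run is aborted
end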

@mhenrixon
Owner

Reaper performance will be improved by only checking digests/locks known to have a chance of expiring.

It was brought to my attention that Sidekiq processes only check in every 10 seconds; during that window, jobs might appear to be missing, and I was cleaning up locks that were still active.

This should be fixed in the main branch, but I need help testing my assumptions. #830, in particular, was affecting you @fernandomm.
