Unique based on queue contents only and unique arg specification #3471

buddhistpirate · 2017-05-10T13:30:54Z

We are users of Sidekiq Enterprise, however for dealing with job uniqueness we use sidekiq-unique-jobs. That gem has a bit of extra functionality than we need. Also its documentation does not seem to match up with the behavior that I have tested. Overall it is an extra dependency I'd rather not have.

I did a spike moving our codebase over to use sidekiq-ent/unique. Out of the 100+ jobs we have defined, 7 of them have uniqueness constraints. Of those 7, 3 of them have additional requirements that sidekiq-ent/unique does not offer.

Unlock the job when the job begins executing, not when it is finished
Specify a block to determine the uniqueness value

One of the attractions of using sidekiq-ent/unique is the simplicity/size of the code. It looks straightforward for how to add this functionality. However since there are no extension points, we would have to copy the file and maintain it ourselves.

Would anyone else find this useful?

mperham · 2017-05-10T23:01:02Z

In my experience, the unique feature is quite often abused to wallpaper over poorly implemented application features, e.g. users double click a form button, leading to duplicate job enqueues, because the app developer has broken Rails's UJS and its form duplication protection. This is one reason why I took so long to implement this feature in Sidekiq and why it isn't as feature-rich as some of the OSS alternatives. I want to make sure these features aren't just wallpaper and have valid use cases.

Can you give me an example of Worker(s) that need those two features and why? I want to understand the business use case and the exact semantic needed.

buddhistpirate · 2017-05-11T17:49:46Z

In regards to unlocking at the beginning of execution, instead of at the end:

2 of our workers are for syncing data from our main app/database to A) Solr and B) Read only frontend system

In both cases, Rails hooks queue up a job that contains the ID of the record to be synchronized. It is plausible for a user to make 2 different valid requests that would therefore queue up two jobs or for 2 users to update the same record in a small amount of time. If the queue is backed up enough (let's say a few seconds), then we will have 2 entries for the same document in the queue. Since the queue message is just the id of the record, not the change itself, it is unnecessary to have both entries in the queue and to have the record synchronized twice.

With the current sidekiq-ent/uniq implementation, there is a race condition for this use-case. It is possible that in the time between the record being read from the database, being sent to the downstream system and the job being unlocked, that another change to the source record has happened. In this case, we would not queue up another job, and the downstream system would be out of date.

For this use-case, we'd much rather have the same version of the record being sent downstream multiple times, then miss an update. If the job was unlocked when it started, instead of when it finished, we would avoid this.

mperham · 2017-05-11T17:53:42Z

Yep, there is a race there. Great point. 👍

buddhistpirate · 2017-05-11T19:30:15Z

In regards to calculating the uniqueness value:

1 of our workers is for sending a follow up email, however the argument is the id of a job application which belong to a person.

(Definition for job application: The form a person fills out to apply for a position or role at a company. The overlap in domain terms between software background processing and hiring human beings to fill a role is unfortunate)

The same person can submit multiple job applications for different jobs. However we only want to send this follow up email to the person once. Daily we enqueue the list of job applications that need follow up reminders. What we would like to do is not enqueue a job if one exists for the same person already.

We already have code to check to see if the email has already been sent, when the job is executing However there can be a race condition if 2 workers get applications belonging to the same user. We could implement an external lock, or we could prevent multiple jobs for the same person from being enqueued.

We could also enqueue a job per person, instead of per job application, however we need to know which job application we've decided to send the email for. We could recalculate that on the worker as well, but again it seemed easier to just not enqueue multiple jobs for the same person.

mperham · 2017-05-12T21:40:31Z

I think the later use case is better solved by only creating one job per user. With SQL, something like a GROUP BY clause, can be used in a daily cron job:

select users.id, max(job_apps.id) from users join job_apps where ... group by users.id

mperham · 2017-05-17T20:09:43Z

I'm mulling over the implementation of your first request, optionally unlock before execution to avoid data loss due to race condition, and there are a few things to know.

Unlock before job execution means two of the exact same job can be executing concurrently.
If a job raises an exception, what should Sidekiq do? Retry without taking the lock again? Relock and then retry? What happens if the lock attempt fails due to another copy being enqueued?

Right now I'm assuming this impl:

begin
  unlock
  execute_job
rescue
  lock # blindly lock, ignore failure
  raise # will create a retry
end

WDYT?

buddhistpirate · 2017-05-18T19:02:16Z

Two of the same job executing at the same time isn't a problem really. If we reindex the record multiple times in some edge cases that is fine, the alternative of not indexing at all is the problem.
If I enable queue uniqueness only then I would assume if the job failed it will not relock. I think the simplicity of just thinking about the job in the original queue makes the feature more useful Personally I think even in the current implementation of uniqueness I would have the unlock happen regardless of success.

mperham · 2017-05-18T20:42:57Z

The new support for unique_until: :start will be in 1.6.0, no release date as of yet though.

https://github.com/mperham/sidekiq/wiki/Ent-Unique-Jobs#unlock-policy

buddhistpirate · 2017-05-18T20:57:05Z

Thanks @mperham !

mperham closed this as completed May 18, 2017

colszowka mentioned this issue Jun 8, 2017

Fixes a typo in release notes #3508

Merged

tjohnell-handy mentioned this issue Feb 13, 2018

Sidekiq Unique Redis Error #3753

Closed

97jaz mentioned this issue May 8, 2019

Feature request for enterprise unique jobs #4164

Closed

asross mentioned this issue Dec 9, 2022

Allow one running job but at most one job in the queue mhenrixon/sidekiq-unique-jobs#748

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Unique based on queue contents only and unique arg specification #3471

Unique based on queue contents only and unique arg specification #3471

buddhistpirate commented May 10, 2017

mperham commented May 10, 2017

buddhistpirate commented May 11, 2017

mperham commented May 11, 2017

buddhistpirate commented May 11, 2017

mperham commented May 12, 2017 •

edited

Loading

mperham commented May 17, 2017

buddhistpirate commented May 18, 2017

mperham commented May 18, 2017

buddhistpirate commented May 18, 2017

Unique based on queue contents only and unique arg specification #3471

Unique based on queue contents only and unique arg specification #3471

Comments

buddhistpirate commented May 10, 2017

mperham commented May 10, 2017

buddhistpirate commented May 11, 2017

mperham commented May 11, 2017

buddhistpirate commented May 11, 2017

mperham commented May 12, 2017 • edited Loading

mperham commented May 17, 2017

buddhistpirate commented May 18, 2017

mperham commented May 18, 2017

buddhistpirate commented May 18, 2017

mperham commented May 12, 2017 •

edited

Loading