-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unique based on queue contents only and unique arg specification #3471
Comments
In my experience, the unique feature is quite often abused to wallpaper over poorly implemented application features, e.g. users double click a form button, leading to duplicate job enqueues, because the app developer has broken Rails's UJS and its form duplication protection. This is one reason why I took so long to implement this feature in Sidekiq and why it isn't as feature-rich as some of the OSS alternatives. I want to make sure these features aren't just wallpaper and have valid use cases. Can you give me an example of Worker(s) that need those two features and why? I want to understand the business use case and the exact semantic needed. |
In regards to unlocking at the beginning of execution, instead of at the end: 2 of our workers are for syncing data from our main app/database to A) Solr and B) Read only frontend system In both cases, Rails hooks queue up a job that contains the ID of the record to be synchronized. It is plausible for a user to make 2 different valid requests that would therefore queue up two jobs or for 2 users to update the same record in a small amount of time. If the queue is backed up enough (let's say a few seconds), then we will have 2 entries for the same document in the queue. Since the queue message is just the id of the record, not the change itself, it is unnecessary to have both entries in the queue and to have the record synchronized twice. With the current sidekiq-ent/uniq implementation, there is a race condition for this use-case. It is possible that in the time between the record being read from the database, being sent to the downstream system and the job being unlocked, that another change to the source record has happened. In this case, we would not queue up another job, and the downstream system would be out of date. For this use-case, we'd much rather have the same version of the record being sent downstream multiple times, then miss an update. If the job was unlocked when it started, instead of when it finished, we would avoid this. |
Yep, there is a race there. Great point. 👍 |
In regards to calculating the uniqueness value: 1 of our workers is for sending a follow up email, however the argument is the id of a job application which belong to a person. (Definition for job application: The form a person fills out to apply for a position or role at a company. The overlap in domain terms between software background processing and hiring human beings to fill a role is unfortunate) The same person can submit multiple job applications for different jobs. However we only want to send this follow up email to the person once. Daily we enqueue the list of job applications that need follow up reminders. What we would like to do is not enqueue a job if one exists for the same person already. We already have code to check to see if the email has already been sent, when the job is executing However there can be a race condition if 2 workers get applications belonging to the same user. We could implement an external lock, or we could prevent multiple jobs for the same person from being enqueued. We could also enqueue a job per person, instead of per job application, however we need to know which job application we've decided to send the email for. We could recalculate that on the worker as well, but again it seemed easier to just not enqueue multiple jobs for the same person. |
I think the later use case is better solved by only creating one job per user. With SQL, something like a GROUP BY clause, can be used in a daily cron job:
|
I'm mulling over the implementation of your first request, optionally unlock before execution to avoid data loss due to race condition, and there are a few things to know.
Right now I'm assuming this impl: begin
unlock
execute_job
rescue
lock # blindly lock, ignore failure
raise # will create a retry
end WDYT? |
|
The new support for https://github.com/mperham/sidekiq/wiki/Ent-Unique-Jobs#unlock-policy |
Thanks @mperham ! |
We are users of Sidekiq Enterprise, however for dealing with job uniqueness we use sidekiq-unique-jobs. That gem has a bit of extra functionality than we need. Also its documentation does not seem to match up with the behavior that I have tested. Overall it is an extra dependency I'd rather not have.
I did a spike moving our codebase over to use sidekiq-ent/unique. Out of the 100+ jobs we have defined, 7 of them have uniqueness constraints. Of those 7, 3 of them have additional requirements that sidekiq-ent/unique does not offer.
One of the attractions of using sidekiq-ent/unique is the simplicity/size of the code. It looks straightforward for how to add this functionality. However since there are no extension points, we would have to copy the file and maintain it ourselves.
Would anyone else find this useful?
The text was updated successfully, but these errors were encountered: