
Failed jobs waiting to be retried are not considered when fetching uniqueness #394

Closed
axos88 opened this issue May 15, 2019 · 16 comments · Fixed by #709

@axos88

axos88 commented May 15, 2019

Describe the bug
If a job fails, and is waiting to be retried, adding a new job with similar arguments should trigger the conflict resolution mechanism.

I have a job which, based on its arguments, performs an expensive calculation and saves the result to the database. I would not like to enqueue it multiple times (since its output depends only on its input, and I want to run the calculation as few times as possible), but if it is already being executed, I want to allow it to be enqueued again.
Currently I am using this config:

    sidekiq_options lock: :until_executing, on_conflict: :replace

    # only use the id for uniqueness checking, and ignore the actual data to be aggregated
    def self.unique_args(args)
      [args[0]]
    end

until_executed is not okay, because the job needs to be re-executed if it is enqueued again while it is already running.

Now the issue here is that the lock is removed when the job is scheduled to run, but it is not re-added if the job fails.

Expected behavior
The system should re-add the lock if the job fails, and trigger the deduplication mechanism if a similar job has been enqueued in the meantime. If you think about it, this is not very different from adding a job to the end of the enqueued jobs list, except that it carries some meta-information about the failure and how many times it was tried.
As it stands, not only does the job get duplicated, but the execution order is nondeterministic: sometimes the job that was scheduled later is executed first, and when the job from the retry queue is rescheduled, it is executed as well, overwriting the fresh data with stale data.
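The expected behaviour can be illustrated with a plain-Ruby simulation: a queue and a retry set standing in for Redis, where the uniqueness check considers both. All class and method names here are hypothetical stand-ins, not the gem's actual API:

```ruby
# Simulation of the expected behaviour: uniqueness must consider jobs
# sitting in the retry set, not just the queue.
class UniqueQueue
  attr_reader :queue, :retry_set

  def initialize
    @queue = []      # jobs waiting to run
    @retry_set = []  # failed jobs waiting to be retried
  end

  def push(job, on_conflict: :replace)
    digest = job[:digest]
    if (@queue + @retry_set).any? { |j| j[:digest] == digest }
      case on_conflict
      when :replace
        # drop the stale copy, even if it lives in the retry set,
        # and keep only the freshly pushed payload
        @queue.reject! { |j| j[:digest] == digest }
        @retry_set.reject! { |j| j[:digest] == digest }
        @queue << job
      when :reject
        return false
      end
    else
      @queue << job
    end
    true
  end

  def fail(job)
    @retry_set << job  # a failed job should keep its lock while awaiting retry
  end
end
```

With this behaviour, pushing a duplicate while the original sits in the retry set drops the stale copy and keeps only the fresh payload, so at most one job per digest exists across both structures.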

Current behavior
Jobs get duplicated

@mhenrixon
Owner

That is a really good point! I missed this; of course we need to catch any error and re-lock the job when any type of error is raised. There is a race condition, though: it is possible that another job has been scheduled in the meantime, so it would be up to you to handle that with a conflict resolution.

@mhenrixon mhenrixon self-assigned this May 15, 2019
@mhenrixon mhenrixon added the bug label May 15, 2019
@axos88
Author

axos88 commented May 15, 2019

Yeah. I see two possible ways to solve that:
a) handle the job being added to the retry queue the same way as a new job being added (with the drawback outlined above)
b) reschedule the failed job (by which I mean give the lock to the failed job), and simulate what would happen if the new job were added afterwards (basically the same as removing and re-adding the new job). This could cause issues if somebody holds onto the job id of the new job, which, depending on the implementation, could change or simply go away. The same applies to holding onto the old job: if the new job replaces it, the job with that id goes away while somebody might still be referencing it.
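The job-id caveat in option (b) can be made concrete with a small pure function over the two competing jobs; the JIDs and field names here are made up for illustration:

```ruby
# Option (b) sketch: the failed job reclaims the lock, then the newer job is
# re-applied as if it had been pushed afterwards. Whichever side loses the
# conflict resolution, that JID simply disappears, which is the caveat above.
def reapply(failed, newer, on_conflict: :replace)
  case on_conflict
  when :replace
    # the newer payload wins, so the failed job's JID is gone
    { jid: newer[:jid], args: newer[:args] }
  when :reject
    # the failed job keeps the lock, so the newer JID is gone
    { jid: failed[:jid], args: failed[:args] }
  end
end
```

Either way, any code that stored the losing JID (for status polling, for example) would find that the job no longer exists.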

@mhenrixon
Owner

There is already functionality to both replace and reschedule jobs, but whether a job sits in the retry queue or not doesn't matter except in this particular case. All other lock types keep the job locked until done.

@axos88
Author

axos88 commented Jan 9, 2020

Hi. Is there any progress on this?

@mhenrixon
Owner

Not really @axos88, I've been out most of December due to back pain. Then before Christmas my wife had surgery and I've been taking care of the kids since then. This year I started a couple of new gigs so I haven't had a chance to dig into sidekiq unique jobs for a while.

I'll get there shortly though.

@axos88
Author

axos88 commented Jan 9, 2020

Well I guess it's only upwards from there... Hang tight :)

@axos88
Author

axos88 commented Aug 29, 2020

Hi there. Has there been any progress on this?

@mhenrixon
Owner

Hi @axos88, I did so many bugfixes in 2020 for v7. Which version are you on at the moment?

@axos88-da

Not sure; I'll have to try to upgrade and check if the issue is still present.

@mhenrixon
Owner

Closing due to inactivity! 😁

See #571 for an upgrade to v7.0.1 and report back if you are still experiencing the same problem. I'll get to them one at a time but I need some participation 🤪.

@axos88
Author

axos88 commented Apr 30, 2022

Reviving this old one. This is still an issue on 7.0.4.

Any jobs that are in the retry queue are not considered by the uniqueness check.

Steps to reproduce:

class FailingJob
  include Sidekiq::Worker

  sidekiq_options lock: :until_executing, on_conflict: :replace

  def perform(id)
    raise NotImplementedError
  end
end

Execute the following, for example in a rails console, while the sidekiq worker is running (note that perform takes an id, so one must be passed):

  FailingJob.perform_async(1)
  sleep 1
  FailingJob.perform_async(1)

Expected: the job should not be duplicated.
Actual: it is.

@mhenrixon
Owner

I guess I should at the very least try to put the lock back. I'll look into it.

This type of lock will never be a hundred percent perfect, though, since the lock is already released when the server starts processing.

The only thing I can think of is to maintain my own sidekiq fetcher that handles the queues differently, so that a job like yours would be preferred over a job that came in later when both hold the same lock.

Maybe some ordered queues or something. Interesting problem to solve.

Thank you for the additional information, it was very helpful.

@axos88-da

Yes, putting the lock back in would already help a lot in my case.

@axos88
Author

axos88 commented May 20, 2022

@mhenrixon, I think the exceptions are now swallowed. lib/sidekiq_unique_jobs/lock/until_executing.rb:35 should rethrow the exception, should it not?
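For context, the pattern being asked for is the usual rescue-then-re-raise: re-acquire the lock in the rescue branch, but let the exception propagate so Sidekiq's retry machinery still sees the failure. A minimal self-contained sketch, where the class and in-memory lock storage are simplified stand-ins for the gem's internals:

```ruby
# Sketch of an :until_executing-style lock that re-locks on failure and
# rethrows. The `locks` hash stands in for the Redis-backed lock set.
class UntilExecutingLock
  attr_reader :locks

  def initialize
    @locks = {}
  end

  def lock(digest)
    @locks[digest] = true
  end

  def unlock(digest)
    @locks.delete(digest)
  end

  def execute(digest)
    unlock(digest)  # :until_executing releases the lock as processing starts
    yield
  rescue StandardError
    lock(digest)    # re-acquire so the retry is still covered by uniqueness
    raise           # rethrow so Sidekiq still counts the failure and retries
  end
end
```

Swallowing the exception instead of re-raising would make the job look successful to Sidekiq, so it would never enter the retry set at all.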

@mhenrixon
Owner

@axos88 indeed it should, PR coming up!
