# incorrect `:while_executing` behavior #432
First of all, the `while_executing` lock was completely broken in 5.x. Also, some things have changed between v5 and v7.

```ruby
sidekiq_options(
  queue: 'test_queue',
  lock: :while_executing,
  lock_timeout: 5,
  unique_args: ->(args) do
    [args[0]]
  end
)
```

The default … It looks like everything is working fine to me, but there is a lot of information missing for me to be truly helpful:
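For context, `:while_executing` is a runtime lock: any number of jobs can sit in the queue, but only one job per set of unique arguments executes at a time. A minimal pure-Ruby sketch of that idea (an illustration only, not the gem's implementation; `RuntimeLock` is a made-up name):

```ruby
# Illustration of runtime-lock semantics: jobs enqueue freely, but
# execution is serialized per lock key (here, the first argument).
class RuntimeLock
  def initialize
    @locks = Hash.new { |h, k| h[k] = Mutex.new }
    @guard = Mutex.new
  end

  # Run the block while holding the lock for +key+; concurrent calls
  # with the same key wait for each other, different keys run in parallel.
  def while_executing(key)
    mutex = @guard.synchronize { @locks[key] }
    mutex.synchronize { yield }
  end
end

lock = RuntimeLock.new
order = []
threads = 3.times.map do |i|
  Thread.new do
    lock.while_executing('a') do # same key => serialized execution
      order << [:start, i]
      sleep 0.05
      order << [:finish, i]
    end
  end
end
threads.each(&:join)
# Each job's :start/:finish pair is uninterrupted by the others.
```

With different keys the same three jobs would overlap; with the same key they run strictly one after another, which is the behavior the reporter expects from the real lock.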
---
We've been using it for a long time in production and never had any issues. I've checked one more time; it works as expected.

---
Yep, that is what I missed, thanks! With this option, for simplicity, I've removed everything that could break things, so now it looks like:

```yaml
:concurrency: 5
:max_retries: 10
:queues:
  - test_queue
```

```ruby
redis_options = {
  url: 'redis://localhost:6379'
}

Sidekiq.configure_client do |config|
  config.redis = redis_options
end

Sidekiq.configure_server do |config|
  config.redis = redis_options
end
```

---
Thanks a bunch @roman-melnyk. It is quite possible. I'm really happy `while_executing` works for you on the older version. I think the locking was done thread-locally back then, so multiple threads would create problems, but I might be wrong on that one; it has been a long time. What I do know for sure is that you should avoid v6 and aim for v7 instead. Might I ask what result you get instead on …

---
So, what I've found so far:

```ruby
class AJob
  include Sidekiq::Worker

  sidekiq_retry_in { |count| 3 }

  @@counter = 0

  sidekiq_options(
    queue: 'test_queue',
    lock: :while_executing,
    lock_timeout: 60 * 5,
    retry: 2,
    unique_args: ->(args) do
      [args[0]]
    end
  )

  def perform(*arguments)
    puts "!!! start #{arguments.inspect}"
    sleep 3
    if @@counter < 2
      puts "!!! raise #{arguments.inspect}"
      raise
    end
    puts "!!! finish #{arguments.inspect}"
  ensure
    @@counter += 1
  end
end
```

For `5.times { |i| puts AJob.perform_async('a', i); sleep 1 }` only 2 or 3 of them finish.

```ruby
class AJob
  include Sidekiq::Worker

  sidekiq_options(
    queue: 'test_queue',
    lock: :until_and_while_executing,
    lock_timeout: 60 * 5,
    unique_args: ->(args) do
      [args[0]]
    end
  )

  def perform(*arguments)
    puts "!!! start #{arguments.inspect}"
    sleep 10
    puts "!!! finish #{arguments.inspect}"
  end
end
```

When I'm trying to push jobs as …

---
Really nice of you to help debug this. What happens if you set the concurrency either higher or lower? Do you get different results? What I discovered on v6, which led me to refactor towards v7, was that with too high a concurrency my jobs wouldn't complete. I would, in fact, run into complete lockdowns where everything stopped working: all the workers were busy but not a single one was processing jobs. With a low concurrency everything worked perfectly, and this is how I developed and sort of end-to-end tested v6. I would be keen to learn whether changing the concurrency changes the result.

---
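The failure mode described here can be imitated without Sidekiq at all. The sketch below (plain Ruby threads; all names are hypothetical, not the gem's API) shows why a bounded wait for a shared lock plus high concurrency silently drops work, while concurrency 1 does not:

```ruby
# Simulation only: N workers compete for one runtime lock with a bounded
# wait (mimicking a lock_timeout). With concurrency 1 every job runs;
# with high concurrency, workers give up waiting and their jobs are lost.
def run_jobs(concurrency:, jobs:, lock_wait:, job_time:)
  lock  = Mutex.new
  queue = Queue.new
  jobs.each { |j| queue << j }
  done = Queue.new

  workers = concurrency.times.map do
    Thread.new do
      until queue.empty?
        job = (queue.pop(true) rescue break) # non-blocking pop
        deadline = Time.now + lock_wait
        acquired = false
        until acquired || Time.now > deadline
          acquired = lock.try_lock
          sleep 0.01 unless acquired
        end
        if acquired
          begin
            sleep job_time # simulate work
            done << job
          ensure
            lock.unlock
          end
        end # else: the job is silently dropped, like a "lost" job
      end
    end
  end
  workers.each(&:join)
  done.size
end
```

With `concurrency: 1` all five jobs complete because the lock is always free when the single worker asks for it; with `concurrency: 5` and a lock wait shorter than the job duration, the waiting workers time out and their jobs never run.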
for …

---
@roman-melnyk the delay on push will be because you have a lock timeout of 5 minutes. I strongly recommend against that, but it shows the need for separating the lock timeout for the client and server middlewares.

---
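A sketch of what a shorter timeout might look like, shown as a plain options hash for illustration (the keys mirror the `sidekiq_options` used earlier in this thread; this hash is not run through Sidekiq here):

```ruby
# Illustrative options only: a short lock_timeout bounds how long a
# push or worker waits for the runtime lock, instead of the 5 minutes
# (60 * 5) used in the reproduction above.
AJOB_OPTIONS = {
  queue: 'test_queue',
  lock: :while_executing,
  lock_timeout: 5,                     # wait at most 5 seconds for the lock
  unique_args: ->(args) { [args[0]] }  # uniqueness by first argument only
}

AJOB_OPTIONS[:unique_args].call(['a', 3])
# => ["a"]
```

The trade-off: a short timeout fails fast instead of stalling pushes, but any job that cannot take the lock within the window needs a conflict strategy so it is not simply dropped.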
@roman-melnyk can you try to upgrade …?

---
@mhenrixon hi! I've tested the latest version with:

```ruby
class AJob
  include Sidekiq::Worker

  sidekiq_retry_in { 3 }

  sidekiq_options(
    queue: 'test_queue',
    lock: :while_executing,
    lock_timeout: 60,
    retry: 2,
    on_conflict: :log,
    unique_args: ->(args) do
      [args[0]]
    end
  )

  def perform(*arguments)
    log(arguments, :start)
    sleep 3
    if $redis.get('counter').to_i < 2
      log(arguments, :raise)
      raise 'Need retry!'
    end
    log(arguments, :finish)
  ensure
    $redis.incr 'counter'
  end

  private

  def log(arguments, action)
    puts " !!! #{action} #{arguments.inspect} at #{Time.now.to_i - $redis.get('start').to_i} sec, counter is #{$redis.get('counter')}"
  end
end
```

So, for concurrency 1 it seems to work correctly: jobs a0 and a1 raised exceptions and were retried after jobs a2, a3 and a4.

With higher concurrency, jobs a0 and a1 raised exceptions and were retried, but jobs a2, a3 and a4 were not started at all.

---
@roman-melnyk is it a Rails app? If so, for some reason Sidekiq doesn't do what it should... It is a Sidekiq bug related to the Rails reloading feature: the block that is supposed to execute your worker code isn't executed. The problem is here: https://github.com/mperham/sidekiq/blob/master/lib/sidekiq/processor.rb#L131 — the …

---
@mhenrixon I've tried with …

---
@mhenrixon so what are our options here? Any chance this will be fixed?

---
@mhenrixon bumping this again. Anything we can do to help?

---
@alex-hall @nedkoh @roman-melnyk I will take another stab at this tonight when my daughters are in bed.

---
I can't replicate this problem in v7; let me know if I missed something that would make it fail. I basically just copied the worker.

---
I'm not sure this is possible to solve for v6. @alex-hall, @nedkoh, @roman-melnyk, which version (if any) are you on at the moment? The whole reason for the v7 rewrite/refactoring was to be able to fix these types of problems and make the gem more reliable. Some parts of the v6 code are nearly impossible to get right, so if possible I suggest you give the latest v7 beta a try.

---
@mhenrixon, these are our current Sidekiq versions: …

Last time we tried to upgrade these gems we ran into these problems. @roman-melnyk, can you try re-creating?

---
@mhenrixon 👍🏼 confirming correct behavior on version 7.0.4.

---
@mhenrixon tested again on sidekiq 6.2 with the latest 7.0.7 version, with concurrency 20:

it seems jobs 2, 3 and 4 are lost

again, jobs 3, 4 and 5 are lost

---
@roman-melnyk it depends on your configuration. If you have a `lock_timeout` set, it only waits that long from when the job is picked off the queue. I think there should be some log entry for that. Perhaps add … If you want to make sure no job is lost, you can use …

---
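One approach along these lines is v7's `on_conflict` setting, which can apply different strategies on the client and server sides. A hedged sketch follows (plain hash for illustration, not run through Sidekiq; verify the strategy names against the gem's README for your version):

```ruby
# Illustration only: with :while_executing, a server-side :reschedule
# strategy re-enqueues a job that could not take the runtime lock in
# time, rather than silently dropping it; the client side merely logs.
SAFE_OPTIONS = {
  queue: 'test_queue',
  lock: :while_executing,
  lock_timeout: 60,
  on_conflict: { client: :log, server: :reschedule },
  unique_args: ->(args) { [args[0]] }
}
```

Under these assumptions, a job like `a3` that times out waiting for `a0`'s lock would go back on the queue instead of disappearing, trading lost work for delayed work.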
**Describe the bug**

Enqueue a job with the `while_executing` lock policy and uniqueness by the first argument, 3 times with a 1-second interval.

**Expected behavior**

All jobs should be placed in the queue; the second job should wait until the first one is finished, and then the third job should wait until the second one is finished.

**Current behavior**

All jobs are placed in the queue, but only the first and second jobs are performed. Moreover, if we enqueue 3 jobs again, now only the first job will be performed.

**Worker class**

…

**Additional context**

Here is a log from `sidekiq (5.2.5)` and `sidekiq-unique-jobs (5.0.10)`, where it works correctly: … And this is from `sidekiq (6.0.1)` and `sidekiq-unique-jobs (7.0.0.beta2)`: …