Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

incorrect :while_executing behavior #432

Closed
roman-melnyk opened this issue Oct 10, 2019 · 20 comments · Fixed by #552
Closed

incorrect :while_executing behavior #432

roman-melnyk opened this issue Oct 10, 2019 · 20 comments · Fixed by #552
Assignees
Milestone

Comments

@roman-melnyk
Copy link

Describe the bug
enqueue the job with while_executing lock policy and with uniqueness by the first argument 3 times with interval 1 second

Expected behavior
all jobs should be placed in the queue, second job should wait until first one is finished, and then third job should wait until second one is finished

Current behavior
all jobs are placed in the queue, but only first and second job are performed. moreover, if we enqeue 3 jobs again, now only first job will be performed

Worker class

class TestJob
  include Sidekiq::Worker

  sidekiq_options(
    queue: 'test_queue',
    unique: :while_executing,
    unique_args: -> (args) do
      [
        args[0]
      ]
    end
  )

  def perform(*arguments)
    Rails.logger.info "!!! start with #{arguments.inspect}"
    (1..120000).inject(:*)
    Rails.logger.info '!!! finish'
  end
end

Additional context
here is a log from sidekiq (5.2.5) and sidekiq-unique-jobs (5.0.10), where it works correctly:

>> TestJob.perform_async 'foo', 1
=> "aa45c34edcfaa56b114e78db"
[Newrelic::CustomMetricService] record Custom/Sidekiq::Queue::TestQueue/size, 0
[Newrelic::CustomMetricService] record Custom/Sidekiq::Queue::TestQueue/latency, 0
>> TestJob.perform_async 'foo', 2
[Newrelic::CustomMetricService] record Custom/Sidekiq::Queue::TestQueue/size, 0
[Newrelic::CustomMetricService] record Custom/Sidekiq::Queue::TestQueue/latency, 0
=> "72397550278883430400c2b7"
>> TestJob.perform_async 'foo', 3
[Newrelic::CustomMetricService] record Custom/Sidekiq::Queue::TestQueue/size, 1
[Newrelic::CustomMetricService] record Custom/Sidekiq::Queue::TestQueue/latency, 0.0016357898712158203
=> "aaecc71644ab1aa2b0ae2094"



2019-10-10T13:28:50.100Z 675 TID-grow3h2rb TestJob JID-aa45c34edcfaa56b114e78db INFO: start
2019-10-10T13:28:51.547Z 675 TID-grow3h2i7 TestJob JID-72397550278883430400c2b7 INFO: start
2019-10-10T13:28:51.853Z 675 TID-grow3h2rb TestJob JID-aa45c34edcfaa56b114e78db INFO: !!! start with ["foo", 1]
2019-10-10T13:29:01.265Z 675 TID-grow3h2rb TestJob JID-aa45c34edcfaa56b114e78db INFO: !!! finish
2019-10-10T13:29:01.266Z 675 TID-grow3h2rb TestJob JID-aa45c34edcfaa56b114e78db INFO: done: 11.166 sec
2019-10-10T13:29:01.267Z 675 TID-grow3h2rb TestJob JID-aaecc71644ab1aa2b0ae2094 INFO: start
2019-10-10T13:29:02.062Z 675 TID-grow3h2i7 TestJob JID-72397550278883430400c2b7 INFO: !!! start with ["foo", 2]
2019-10-10T13:29:09.887Z 675 TID-grow3h2i7 TestJob JID-72397550278883430400c2b7 INFO: !!! finish
2019-10-10T13:29:09.887Z 675 TID-grow3h2i7 TestJob JID-72397550278883430400c2b7 INFO: done: 18.34 sec
2019-10-10T13:29:09.925Z 675 TID-grow3h2rb TestJob JID-aaecc71644ab1aa2b0ae2094 INFO: !!! start with ["foo", 3]
2019-10-10T13:29:18.445Z 675 TID-grow3h2rb TestJob JID-aaecc71644ab1aa2b0ae2094 INFO: !!! finish
2019-10-10T13:29:18.445Z 675 TID-grow3h2rb TestJob JID-aaecc71644ab1aa2b0ae2094 INFO: done: 17.179 sec

and this is from sidekiq (6.0.1), sidekiq-unique-jobs (7.0.0.beta2)

>> TestJob.perform_async 'foo', 1
[Newrelic::CustomMetricService] record Custom/Sidekiq::Queue::TestQueue/size, 0
[Newrelic::CustomMetricService] record Custom/Sidekiq::Queue::TestQueue/latency, 0
=> "35c69d21c26b4385ac0f1e14"
>> TestJob.perform_async 'foo', 2
[Newrelic::CustomMetricService] record Custom/Sidekiq::Queue::TestQueue/size, 0
[Newrelic::CustomMetricService] record Custom/Sidekiq::Queue::TestQueue/latency, 0
=> "9b8a444211a6287952cb03d6"
>> TestJob.perform_async 'foo', 3
[Newrelic::CustomMetricService] record Custom/Sidekiq::Queue::TestQueue/size, 1
[Newrelic::CustomMetricService] record Custom/Sidekiq::Queue::TestQueue/latency, 0.0014061927795410156
=> "2b145d5d10fb3dd9523c6f4c"


2019-10-10T13:39:42.642Z pid=11435 tid=gq64bagkv class=TestJob jid=35c69d21c26b4385ac0f1e14 INFO: start
2019-10-10T13:39:44.146Z pid=11435 tid=gq64bah33 class=TestJob jid=9b8a444211a6287952cb03d6 INFO: start
2019-10-10T13:39:44.666Z pid=11435 tid=gq64bagkv class=TestJob jid=35c69d21c26b4385ac0f1e14 uniquejobs=server =uniquejobs:c3bd986d71c5f4ff21ed14cbafae3b28:RUN DEBUG: Executed queue.lua in 1ms
2019-10-10T13:39:44.668Z pid=11435 tid=gq64bagkv class=TestJob jid=35c69d21c26b4385ac0f1e14 uniquejobs=server =uniquejobs:c3bd986d71c5f4ff21ed14cbafae3b28:RUN DEBUG: Executed lock.lua in 2ms
2019-10-10T13:39:44.668Z pid=11435 tid=gq64bagkv class=TestJob jid=35c69d21c26b4385ac0f1e14 uniquejobs=server =uniquejobs:c3bd986d71c5f4ff21ed14cbafae3b28:RUN INFO: !!! start with ["foo", 1]
2019-10-10T13:39:54.411Z pid=11435 tid=gq64bagkv class=TestJob jid=35c69d21c26b4385ac0f1e14 uniquejobs=server =uniquejobs:c3bd986d71c5f4ff21ed14cbafae3b28:RUN INFO: !!! finish
2019-10-10T13:39:54.412Z pid=11435 tid=gq64bagkv class=TestJob jid=35c69d21c26b4385ac0f1e14 uniquejobs=server =uniquejobs:c3bd986d71c5f4ff21ed14cbafae3b28:RUN DEBUG: Executed unlock.lua in 2ms
2019-10-10T13:39:54.413Z pid=11435 tid=gq64bagkv class=TestJob jid=35c69d21c26b4385ac0f1e14 elapsed=11.771 INFO: done
2019-10-10T13:39:54.414Z pid=11435 tid=gq64bagkv class=TestJob jid=2b145d5d10fb3dd9523c6f4c INFO: start
2019-10-10T13:39:55.272Z pid=11435 tid=gq64bah33 class=TestJob jid=9b8a444211a6287952cb03d6 uniquejobs=server =uniquejobs:c3bd986d71c5f4ff21ed14cbafae3b28:RUN DEBUG: Executed queue.lua in 0ms
2019-10-10T13:39:55.274Z pid=11435 tid=gq64bagkv class=TestJob jid=2b145d5d10fb3dd9523c6f4c uniquejobs=server =uniquejobs:c3bd986d71c5f4ff21ed14cbafae3b28:RUN DEBUG: Executed queue.lua in 1ms
2019-10-10T13:39:55.275Z pid=11435 tid=gq64bah33 class=TestJob jid=9b8a444211a6287952cb03d6 uniquejobs=server =uniquejobs:c3bd986d71c5f4ff21ed14cbafae3b28:RUN DEBUG: Executed lock.lua in 0ms
2019-10-10T13:39:55.275Z pid=11435 tid=gq64bah33 class=TestJob jid=9b8a444211a6287952cb03d6 uniquejobs=server =uniquejobs:c3bd986d71c5f4ff21ed14cbafae3b28:RUN INFO: !!! start with ["foo", 2]
2019-10-10T13:39:55.275Z pid=11435 tid=gq64bagkv class=TestJob jid=2b145d5d10fb3dd9523c6f4c uniquejobs=server =uniquejobs:c3bd986d71c5f4ff21ed14cbafae3b28:RUN WARN: Timed out after 0s while waiting for primed token (digest: uniquejobs:c3bd986d71c5f4ff21ed14cbafae3b28:RUN, job_id: 2b145d5d10fb3dd9523c6f4c)
2019-10-10T13:39:55.470Z pid=11435 tid=gq64bagkv class=TestJob jid=2b145d5d10fb3dd9523c6f4c uniquejobs=server =uniquejobs:c3bd986d71c5f4ff21ed14cbafae3b28:RUN DEBUG: Executed unlock.lua in 98ms
2019-10-10T13:39:55.830Z pid=11435 tid=gq64bagkv class=TestJob jid=2b145d5d10fb3dd9523c6f4c elapsed=1.416 INFO: done
2019-10-10T13:40:04.356Z pid=11435 tid=gq64bah33 class=TestJob jid=9b8a444211a6287952cb03d6 uniquejobs=server =uniquejobs:c3bd986d71c5f4ff21ed14cbafae3b28:RUN INFO: !!! finish
2019-10-10T13:40:04.357Z pid=11435 tid=gq64bah33 class=TestJob jid=9b8a444211a6287952cb03d6 uniquejobs=server =uniquejobs:c3bd986d71c5f4ff21ed14cbafae3b28:RUN DEBUG: Executed unlock.lua in 1ms
2019-10-10T13:40:04.358Z pid=11435 tid=gq64bah33 class=TestJob jid=9b8a444211a6287952cb03d6 elapsed=20.212 INFO: done
@mhenrixon
Copy link
Owner

First of all, the while executing job was completely broken in 5.x.

Also, some things have changed between v5 and v7.

sidekiq_options(
    queue: 'test_queue',
    lock: :while_executing,
    lock_timeout: 5,
    unique_args: -> (args) do
      [
        args[0]
      ]
    end
  )

The default lock_timeout (seconds to wait for a lock) is zero and you are lucky any of your jobs are processed except the first one. :)

It looks like everything is working fine to me but there is a lot of information missing for me to truly be helpful:

  1. What does your sidekiq.yml look like?
  2. What does your sidekiq.rb look like?
  3. How do you start sidekiq?

@roman-melnyk
Copy link
Author

First of all, the while executing job was completely broken in 5.x.

we've been using it for a long time in production, and never had any issues, I've checked one more time, works as expected

lock_timeout: 5

yep, that is what I missed, thanks! with this option, while_executing does work as expected (with and without unique_args) but, I've checked also until_and_while_executing - it still has a bug, with and without unique_args: when I put 5 jobs in a queue in 1 sec interval, and job executes 10 secs, only 2 jobs should be executed - first and second, which is allowed to be put in queue as soon as first job started. could you please confirm that this case works for you in 7.0.0.beta2 ?

What does your sidekiq.yml look like?

for simplicity, I've removed all that could break things, so now it looks as

:concurrency: 5
:max_retries: 10
:queues:
  - test_queue

What does your sidekiq.rb look like?

redis_options = {
  url: 'redis://localhost:6379'
}

Sidekiq.configure_client do |config|
  config.redis = redis_options
end

Sidekiq.configure_server do |config|
  config.redis = redis_options
end

How do you start sidekiq?

bundle exec sidekiq

@mhenrixon
Copy link
Owner

Thanks a bunch @roman-melnyk. It is quite possible until_and_while_executing isn't completely working yet.

I'm really happy while executing works for you guys on the older version. I think the locking was done thread locally back then so multiple threads would create problems but I might be wrong on that one. It has been a long time.

What I do know for sure is that you should avoid v6 and aim for v7 instead.

Might I ask what result you get instead on until_and_while_executing on v7.beta2? I have it tested but there might be some condition I didn't get right yet.

@roman-melnyk
Copy link
Author

so, what I've found so far:
I'm using this file for testing

class AJob
  include Sidekiq::Worker
  sidekiq_retry_in { |count| 3 }
  @@counter = 0

  sidekiq_options(
    queue: 'test_queue',
    lock: :while_executing,
    lock_timeout: 60 * 5,
    retry: 2,
    unique_args: -> (args) do
      [
        args[0]
      ]
    end
  )

  def perform(*arguments)
    puts "!!! start #{arguments.inspect}"
    sleep 3
    if @@counter < 2
      puts "!!! raise #{arguments.inspect}"
      raise
    end
    puts "!!! finish #{arguments.inspect}"
  ensure
    @@counter += 1
  end
end

for while_executing, when I run 5 jobs as

5.times { |i| puts AJob.perform_async('a', i); sleep 1 }

only 2 or 3 of them finishing,
and for until_and_while_executing:

class AJob
  include Sidekiq::Worker

  sidekiq_options(
    queue: 'test_queue',
    lock: :until_and_while_executing,
    lock_timeout: 60 * 5,
    unique_args: -> (args) do
      [
        args[0]
      ]
    end
  )

  def perform(*arguments)
    puts "!!! start #{arguments.inspect}"
    sleep 10
    puts "!!! finish #{arguments.inspect}"
  end
end

when I'm trying to push jobs as
5.times { |i| puts AJob.perform_async('a', i); sleep 1 }
i'm getting 3 jobs finished instead of 2, and, what is more interesting, there is a delay after pushing 3th and 4th job

@mhenrixon
Copy link
Owner

Really nice of you to help debug this.

What happens if you set the concurrency either higher or lower? Do you get different results? What I discovered on v6 which led me to refactor away towards v7 was that given too high concurrency my jobs wouldn't complete. I would, in fact, run into complete lockdowns where everything stopped working.

All the workers were busy but not a single one processing jobs.

With a too low concurrency, everything worked perfectly and this is how I developed and sort of end to end tested v6.

I would be keen on learning if changing the concurrency changes the result.

@roman-melnyk
Copy link
Author

for until_and_while_executing - tried concurrency 1 and 10 - still the same delay on job pushing,
for while_executing - yep, seems on concurrency 1 all jobs start, retry and finish properly, for 2 or more - some jobs disappear

@mhenrixon
Copy link
Owner

@roman-melnyk the delay on push will be because you are having a lock timeout of 5 minutes. I strongly recommend against that but it shows the need for separating the lock timeout for client and server middlewares.

@mhenrixon
Copy link
Owner

@roman-melnyk can you try to upgrade sidekiq-unique-jobs and let me know if you are still having trouble? I've released quite a few bug fixes since you posted this.

@roman-melnyk
Copy link
Author

@mhenrixon hi! I've tested a latest version of sidekiq-unique-jobs 7.0.0.beta9 with sidekiq 6.0.3 with this slightly modified script

class AJob
  include Sidekiq::Worker
  sidekiq_retry_in { 3 }

  sidekiq_options(
    queue: 'test_queue',
    lock: :while_executing,
    lock_timeout: 60,
    retry: 2,
    on_conflict: :log,
    unique_args: -> (args) do
      [
        args[0]
      ]
    end
  )

  def perform(*arguments)
    log(arguments, :start)
    sleep 3
    if $redis.get('counter').to_i < 2
      log(arguments, :raise)
      raise 'Need retry!'
    end
    log(arguments, :finish)
  ensure
    $redis.incr 'counter'
  end

  private

  def log(arguments, action)
    puts "      !!! #{action} #{arguments.inspect} at #{Time.now.to_i - $redis.get("start").to_i} sec, counter is #{$redis.get("counter")}"
  end
end

so, for the concurrency 1 it seems to work correctly:

$redis.set("start", Time.now.to_i); $redis.set("counter", 0); 5.times { |i| puts AJob.perform_async('a', i); sleep 1 }
526bba6b0a21febf20b4557a
41933d0529b39f2dba651396
10131f08909d3942d238f36a
2bdd9905f4938f6bbc132566
7928e1a60b3e7762fe5cce95
2019-12-09T11:28:01.831Z pid=22249 tid=gnt3gmbgt INFO: Booting Sidekiq 6.0.3 with redis options {:url=>"redis://localhost:6379", :namespace=>"sidekiq", :network_timeout=>5, :id=>"Sidekiq-server-PID-22249"}
2019-12-09T11:28:06.344Z pid=22249 tid=gnt3gmbgt INFO: Running in ruby 2.6.2p47 (2019-03-13 revision 67232) [x86_64-linux]
2019-12-09T11:28:06.344Z pid=22249 tid=gnt3gmbgt INFO: See LICENSE and the LGPL-3.0 for licensing details.
2019-12-09T11:28:06.344Z pid=22249 tid=gnt3gmbgt INFO: Upgrade to Sidekiq Pro for more features and support: http://sidekiq.org
2019-12-09T11:28:06.348Z pid=22249 tid=gnt3gmbgt DEBUG: Executed update_version.lua in 1ms
2019-12-09T11:28:06.348Z pid=22249 tid=gnt3gmbgt uniquejobs=upgrade_locks INFO: Already upgraded to 7.0.0.beta9
2019-12-09T11:28:06.349Z pid=22249 tid=gnt3gmbgt uniquejobs=reaper INFO: Starting Reaper
2019-12-09T11:28:06.349Z pid=22249 tid=gnt3gmbgt DEBUG: Client Middleware: SidekiqUniqueJobs::Middleware::Client
2019-12-09T11:28:06.350Z pid=22249 tid=gnt3gmbgt DEBUG: Server Middleware: SidekiqUniqueJobs::Middleware::Server, SidekiqMarginaliaIntegration
2019-12-09T11:28:06.350Z pid=22249 tid=gnt3gmbgt INFO: Starting processing, hit Ctrl-C to stop
2019-12-09T11:28:06.350Z pid=22249 tid=gnt3gmbgt DEBUG: {:queues=>["test_queue"], :labels=>[], :concurrency=>1, :require=>".", :environment=>nil, :timeout=>25, :poll_interval_average=>nil, :average_scheduled_poll_interval=>5, :error_handlers=>[#<Sidekiq::ExceptionHandler::Logger:0x0000557fb32d77c8>, #<Method: Airbrake::Sidekiq::ErrorHandler#notify_airbrake>], :death_handlers=>[], :lifecycle_events=>{:startup=>[], :quiet=>[], :shutdown=>[#<Proc:0x0000557fb42735d8@/home/freeman/.rvm/gems/ruby-2.6.2/gems/sidekiq-unique-jobs-7.0.0.beta9/lib/sidekiq_unique_jobs/middleware.rb:42>], :heartbeat=>[]}, :dead_max_jobs=>10000, :dead_timeout_in_seconds=>15552000, :reloader=>#<Sidekiq::Rails::Reloader @app=Calendly::Application>, :max_retries=>10, :config_file=>"./config/sidekiq.yml", :strict=>true, :tag=>"calendly", :identity=>"laptop:22249:c6ff28850cb2"}
2019-12-09T11:28:06.350Z pid=22249 tid=gnt59hzl1 uniquejobs=reaper INFO: Nothing to delete; exiting.
2019-12-09T11:28:13.440Z pid=22249 tid=gnt59kg2l class=AJob jid=526bba6b0a21febf20b4557a INFO: start
2019-12-09T11:28:15.267Z pid=22249 tid=gnt59kg2l class=AJob jid=526bba6b0a21febf20b4557a uniquejobs=server while_executing=uniquejobs:cc8d44d29605ffd70a159069ab3d8fcc:RUN:RUN DEBUG: Executed queue.lua in 1ms
2019-12-09T11:28:15.270Z pid=22249 tid=gnt59kg2l class=AJob jid=526bba6b0a21febf20b4557a uniquejobs=server while_executing=uniquejobs:cc8d44d29605ffd70a159069ab3d8fcc:RUN:RUN DEBUG: Executed lock.lua in 1ms
      !!! start ["a", 0] at 2 sec, counter is 0
      !!! raise ["a", 0] at 5 sec, counter is 0
2019-12-09T11:28:18.277Z pid=22249 tid=gnt59kg2l class=AJob jid=526bba6b0a21febf20b4557a uniquejobs=server while_executing=uniquejobs:cc8d44d29605ffd70a159069ab3d8fcc:RUN:RUN DEBUG: Executed unlock.lua in 3ms
2019-12-09T11:28:18.279Z pid=22249 tid=gnt59kg2l class=AJob jid=526bba6b0a21febf20b4557a elapsed=4.84 INFO: fail
2019-12-09T11:28:18.280Z pid=22249 tid=gnt59kg2l WARN: {"context":"Job raised exception","job":{"class":"AJob","args":["a",0],"retry":2,"queue":"test_queue","lock":"while_executing","lock_timeout":60,"on_conflict":"log","unique_args":["a"],"jid":"526bba6b0a21febf20b4557a","created_at":1575890893.4380698,"lock_ttl":null,"unique_prefix":"uniquejobs","unique_digest":"uniquejobs:cc8d44d29605ffd70a159069ab3d8fcc:RUN:RUN","enqueued_at":1575890893.4385471},"jobstr":"{\"class\":\"AJob\",\"args\":[\"a\",0],\"retry\":2,\"queue\":\"test_queue\",\"lock\":\"while_executing\",\"lock_timeout\":60,\"on_conflict\":\"log\",\"unique_args\":[\"a\"],\"jid\":\"526bba6b0a21febf20b4557a\",\"created_at\":1575890893.4380698,\"lock_ttl\":null,\"unique_prefix\":\"uniquejobs\",\"unique_digest\":\"uniquejobs:cc8d44d29605ffd70a159069ab3d8fcc:RUN\",\"enqueued_at\":1575890893.4385471}"}
2019-12-09T11:28:18.280Z pid=22249 tid=gnt59kg2l WARN: RuntimeError: Need retry!
2019-12-09T11:28:18.282Z pid=22249 tid=gnt53udp9 class=AJob jid=41933d0529b39f2dba651396 INFO: start
2019-12-09T11:28:18.284Z pid=22249 tid=gnt53udp9 class=AJob jid=41933d0529b39f2dba651396 uniquejobs=server while_executing=uniquejobs:cc8d44d29605ffd70a159069ab3d8fcc:RUN:RUN DEBUG: Executed queue.lua in 1ms
2019-12-09T11:28:18.286Z pid=22249 tid=gnt53udp9 class=AJob jid=41933d0529b39f2dba651396 uniquejobs=server while_executing=uniquejobs:cc8d44d29605ffd70a159069ab3d8fcc:RUN:RUN DEBUG: Executed lock.lua in 1ms
      !!! start ["a", 1] at 5 sec, counter is 1
Passing 'script' command to redis as is; administrative commands cannot be effectively namespaced and should be called on the redis connection directly; passthrough has been deprecated and will be removed in redis-namespace 2.0 (at /home/freeman/.rvm/gems/ruby-2.6.2/gems/brpoplpush-redis_script-0.1.0/lib/brpoplpush/redis_script/script.rb:69:in `load')
      !!! raise ["a", 1] at 8 sec, counter is 1
2019-12-09T11:28:21.301Z pid=22249 tid=gnt53udp9 class=AJob jid=41933d0529b39f2dba651396 uniquejobs=server while_executing=uniquejobs:cc8d44d29605ffd70a159069ab3d8fcc:RUN:RUN DEBUG: Executed unlock.lua in 1ms
2019-12-09T11:28:21.302Z pid=22249 tid=gnt53udp9 class=AJob jid=41933d0529b39f2dba651396 elapsed=3.021 INFO: fail
2019-12-09T11:28:21.302Z pid=22249 tid=gnt53udp9 WARN: {"context":"Job raised exception","job":{"class":"AJob","args":["a",1],"retry":2,"queue":"test_queue","lock":"while_executing","lock_timeout":60,"on_conflict":"log","unique_args":["a"],"jid":"41933d0529b39f2dba651396","created_at":1575890894.441212,"lock_ttl":null,"unique_prefix":"uniquejobs","unique_digest":"uniquejobs:cc8d44d29605ffd70a159069ab3d8fcc:RUN:RUN","enqueued_at":1575890894.4425294},"jobstr":"{\"class\":\"AJob\",\"args\":[\"a\",1],\"retry\":2,\"queue\":\"test_queue\",\"lock\":\"while_executing\",\"lock_timeout\":60,\"on_conflict\":\"log\",\"unique_args\":[\"a\"],\"jid\":\"41933d0529b39f2dba651396\",\"created_at\":1575890894.441212,\"lock_ttl\":null,\"unique_prefix\":\"uniquejobs\",\"unique_digest\":\"uniquejobs:cc8d44d29605ffd70a159069ab3d8fcc:RUN\",\"enqueued_at\":1575890894.4425294}"}
2019-12-09T11:28:21.302Z pid=22249 tid=gnt53udp9 WARN: RuntimeError: Need retry!
2019-12-09T11:28:21.304Z pid=22249 tid=gnt55ox7x class=AJob jid=10131f08909d3942d238f36a INFO: start
2019-12-09T11:28:21.306Z pid=22249 tid=gnt55ox7x class=AJob jid=10131f08909d3942d238f36a uniquejobs=server while_executing=uniquejobs:cc8d44d29605ffd70a159069ab3d8fcc:RUN:RUN DEBUG: Executed queue.lua in 1ms
2019-12-09T11:28:21.308Z pid=22249 tid=gnt55ox7x class=AJob jid=10131f08909d3942d238f36a uniquejobs=server while_executing=uniquejobs:cc8d44d29605ffd70a159069ab3d8fcc:RUN:RUN DEBUG: Executed lock.lua in 1ms
      !!! start ["a", 2] at 8 sec, counter is 2
2019-12-09T11:28:21.463Z pid=22249 tid=gnt59kfm5 DEBUG: enqueued retry: {"class":"AJob","args":["a",0],"retry":2,"queue":"test_queue","lock":"while_executing","lock_timeout":60,"on_conflict":"log","unique_args":["a"],"jid":"526bba6b0a21febf20b4557a","created_at":1575890893.4380698,"lock_ttl":null,"unique_prefix":"uniquejobs","unique_digest":"uniquejobs:cc8d44d29605ffd70a159069ab3d8fcc:RUN","enqueued_at":1575890893.4385471,"error_message":"Need retry!","error_class":"RuntimeError","failed_at":1575890898.278468,"retry_count":0}
      !!! finish ["a", 2] at 11 sec, counter is 2
2019-12-09T11:28:24.314Z pid=22249 tid=gnt55ox7x class=AJob jid=10131f08909d3942d238f36a uniquejobs=server while_executing=uniquejobs:cc8d44d29605ffd70a159069ab3d8fcc:RUN:RUN DEBUG: Executed unlock.lua in 2ms
2019-12-09T11:28:24.315Z pid=22249 tid=gnt55ox7x class=AJob jid=10131f08909d3942d238f36a elapsed=3.011 INFO: done
2019-12-09T11:28:24.316Z pid=22249 tid=gnt55ox7x class=AJob jid=2bdd9905f4938f6bbc132566 INFO: start
2019-12-09T11:28:24.318Z pid=22249 tid=gnt55ox7x class=AJob jid=2bdd9905f4938f6bbc132566 uniquejobs=server while_executing=uniquejobs:cc8d44d29605ffd70a159069ab3d8fcc:RUN:RUN DEBUG: Executed queue.lua in 1ms
2019-12-09T11:28:24.320Z pid=22249 tid=gnt55ox7x class=AJob jid=2bdd9905f4938f6bbc132566 uniquejobs=server while_executing=uniquejobs:cc8d44d29605ffd70a159069ab3d8fcc:RUN:RUN DEBUG: Executed lock.lua in 1ms
      !!! start ["a", 3] at 11 sec, counter is 3
2019-12-09T11:28:26.682Z pid=22249 tid=gnt59kfm5 DEBUG: enqueued retry: {"class":"AJob","args":["a",1],"retry":2,"queue":"test_queue","lock":"while_executing","lock_timeout":60,"on_conflict":"log","unique_args":["a"],"jid":"41933d0529b39f2dba651396","created_at":1575890894.441212,"lock_ttl":null,"unique_prefix":"uniquejobs","unique_digest":"uniquejobs:cc8d44d29605ffd70a159069ab3d8fcc:RUN","enqueued_at":1575890894.4425294,"error_message":"Need retry!","error_class":"RuntimeError","failed_at":1575890901.302108,"retry_count":0}
      !!! finish ["a", 3] at 14 sec, counter is 3
2019-12-09T11:28:27.325Z pid=22249 tid=gnt55ox7x class=AJob jid=2bdd9905f4938f6bbc132566 uniquejobs=server while_executing=uniquejobs:cc8d44d29605ffd70a159069ab3d8fcc:RUN:RUN DEBUG: Executed unlock.lua in 1ms
2019-12-09T11:28:27.326Z pid=22249 tid=gnt55ox7x class=AJob jid=2bdd9905f4938f6bbc132566 elapsed=3.01 INFO: done
2019-12-09T11:28:27.327Z pid=22249 tid=gnt55ox7x class=AJob jid=7928e1a60b3e7762fe5cce95 INFO: start
2019-12-09T11:28:27.329Z pid=22249 tid=gnt55ox7x class=AJob jid=7928e1a60b3e7762fe5cce95 uniquejobs=server while_executing=uniquejobs:cc8d44d29605ffd70a159069ab3d8fcc:RUN:RUN DEBUG: Executed queue.lua in 1ms
2019-12-09T11:28:27.331Z pid=22249 tid=gnt55ox7x class=AJob jid=7928e1a60b3e7762fe5cce95 uniquejobs=server while_executing=uniquejobs:cc8d44d29605ffd70a159069ab3d8fcc:RUN:RUN DEBUG: Executed lock.lua in 1ms
      !!! start ["a", 4] at 14 sec, counter is 4
      !!! finish ["a", 4] at 17 sec, counter is 4
2019-12-09T11:28:30.337Z pid=22249 tid=gnt55ox7x class=AJob jid=7928e1a60b3e7762fe5cce95 uniquejobs=server while_executing=uniquejobs:cc8d44d29605ffd70a159069ab3d8fcc:RUN:RUN DEBUG: Executed unlock.lua in 1ms
2019-12-09T11:28:30.339Z pid=22249 tid=gnt55ox7x class=AJob jid=7928e1a60b3e7762fe5cce95 elapsed=3.011 INFO: done
2019-12-09T11:28:30.339Z pid=22249 tid=gnt55ox7x class=AJob jid=526bba6b0a21febf20b4557a INFO: start
2019-12-09T11:28:30.342Z pid=22249 tid=gnt55ox7x class=AJob jid=526bba6b0a21febf20b4557a uniquejobs=server while_executing=uniquejobs:cc8d44d29605ffd70a159069ab3d8fcc:RUN:RUN:RUN DEBUG: Executed queue.lua in 1ms
2019-12-09T11:28:30.344Z pid=22249 tid=gnt55ox7x class=AJob jid=526bba6b0a21febf20b4557a uniquejobs=server while_executing=uniquejobs:cc8d44d29605ffd70a159069ab3d8fcc:RUN:RUN:RUN DEBUG: Executed lock.lua in 1ms
      !!! start ["a", 0] at 17 sec, counter is 5
      !!! finish ["a", 0] at 20 sec, counter is 5
2019-12-09T11:28:33.349Z pid=22249 tid=gnt55ox7x class=AJob jid=526bba6b0a21febf20b4557a uniquejobs=server while_executing=uniquejobs:cc8d44d29605ffd70a159069ab3d8fcc:RUN:RUN:RUN DEBUG: Executed unlock.lua in 1ms
2019-12-09T11:28:33.350Z pid=22249 tid=gnt55ox7x class=AJob jid=526bba6b0a21febf20b4557a elapsed=3.011 INFO: done
2019-12-09T11:28:33.351Z pid=22249 tid=gnt55ox7x class=AJob jid=41933d0529b39f2dba651396 INFO: start
2019-12-09T11:28:33.353Z pid=22249 tid=gnt55ox7x class=AJob jid=41933d0529b39f2dba651396 uniquejobs=server while_executing=uniquejobs:cc8d44d29605ffd70a159069ab3d8fcc:RUN:RUN:RUN DEBUG: Executed queue.lua in 1ms
2019-12-09T11:28:33.355Z pid=22249 tid=gnt55ox7x class=AJob jid=41933d0529b39f2dba651396 uniquejobs=server while_executing=uniquejobs:cc8d44d29605ffd70a159069ab3d8fcc:RUN:RUN:RUN DEBUG: Executed lock.lua in 0ms
      !!! start ["a", 1] at 20 sec, counter is 6
      !!! finish ["a", 1] at 23 sec, counter is 6
2019-12-09T11:28:36.359Z pid=22249 tid=gnt55ox7x class=AJob jid=41933d0529b39f2dba651396 uniquejobs=server while_executing=uniquejobs:cc8d44d29605ffd70a159069ab3d8fcc:RUN:RUN:RUN DEBUG: Executed unlock.lua in 1ms
2019-12-09T11:28:36.360Z pid=22249 tid=gnt55ox7x class=AJob jid=41933d0529b39f2dba651396 elapsed=3.009 INFO: done

jobs a0 and a1 have raised an exceptions, and have been retried after jobs a2, a3 and a4
but for cuncurrency 10 I get this

$redis.set("start", Time.now.to_i); $redis.set("counter", 0); 5.times { |i| puts AJob.perform_async('a', i); sleep 1 }
a231762cf661b2e2bccab528
97c4ea12ce7b6962fd820741
5977518163161cd4b03bf2bc
b7d6657308b82f8c81697121
31ba2ab728fd7685ed549cf4
2019-12-09T11:32:25.655Z pid=23127 tid=gpsidtkp7 INFO: Booting Sidekiq 6.0.3 with redis options {:url=>"redis://localhost:6379", :namespace=>"sidekiq", :network_timeout=>5, :id=>"Sidekiq-server-PID-23127"}
2019-12-09T11:32:30.289Z pid=23127 tid=gpsidtkp7 INFO: Running in ruby 2.6.2p47 (2019-03-13 revision 67232) [x86_64-linux]
2019-12-09T11:32:30.289Z pid=23127 tid=gpsidtkp7 INFO: See LICENSE and the LGPL-3.0 for licensing details.
2019-12-09T11:32:30.289Z pid=23127 tid=gpsidtkp7 INFO: Upgrade to Sidekiq Pro for more features and support: http://sidekiq.org
2019-12-09T11:32:30.292Z pid=23127 tid=gpsidtkp7 DEBUG: Executed update_version.lua in 1ms
2019-12-09T11:32:30.293Z pid=23127 tid=gpsidtkp7 uniquejobs=upgrade_locks INFO: Already upgraded to 7.0.0.beta9
2019-12-09T11:32:30.293Z pid=23127 tid=gpsidtkp7 uniquejobs=reaper INFO: Starting Reaper
2019-12-09T11:32:30.294Z pid=23127 tid=gpsidtkp7 DEBUG: Client Middleware: SidekiqUniqueJobs::Middleware::Client
2019-12-09T11:32:30.294Z pid=23127 tid=gpsidtkp7 DEBUG: Server Middleware: SidekiqUniqueJobs::Middleware::Server, SidekiqMarginaliaIntegration
2019-12-09T11:32:30.295Z pid=23127 tid=gpsidtkp7 INFO: Starting processing, hit Ctrl-C to stop
2019-12-09T11:32:30.295Z pid=23127 tid=gpsidtkp7 DEBUG: {:queues=>["test_queue"], :labels=>[], :concurrency=>10, :require=>".", :environment=>nil, :timeout=>25, :poll_interval_average=>nil, :average_scheduled_poll_interval=>5, :error_handlers=>[#<Sidekiq::ExceptionHandler::Logger:0x000055c816b875e8>, #<Method: Airbrake::Sidekiq::ErrorHandler#notify_airbrake>], :death_handlers=>[], :lifecycle_events=>{:startup=>[], :quiet=>[], :shutdown=>[#<Proc:0x000055c817b35738@/home/freeman/.rvm/gems/ruby-2.6.2/gems/sidekiq-unique-jobs-7.0.0.beta9/lib/sidekiq_unique_jobs/middleware.rb:42>], :heartbeat=>[]}, :dead_max_jobs=>10000, :dead_timeout_in_seconds=>15552000, :reloader=>#<Sidekiq::Rails::Reloader @app=Calendly::Application>, :max_retries=>10, :config_file=>"./config/sidekiq.yml", :strict=>true, :tag=>"calendly", :identity=>"laptop:23127:89f1bb441bc1"}
2019-12-09T11:32:30.295Z pid=23127 tid=gpsk6s2kb uniquejobs=reaper INFO: Nothing to delete; exiting.
2019-12-09T11:32:51.666Z pid=23127 tid=gpsk6vr2j class=AJob jid=a231762cf661b2e2bccab528 INFO: start
2019-12-09T11:32:52.768Z pid=23127 tid=gpsk6vqhj class=AJob jid=97c4ea12ce7b6962fd820741 INFO: start
2019-12-09T11:32:53.484Z pid=23127 tid=gpsk6vr2j class=AJob jid=a231762cf661b2e2bccab528 uniquejobs=server while_executing=uniquejobs:cc8d44d29605ffd70a159069ab3d8fcc:RUN:RUN DEBUG: Executed queue.lua in 1ms
2019-12-09T11:32:53.486Z pid=23127 tid=gpsk6vr2j class=AJob jid=a231762cf661b2e2bccab528 uniquejobs=server while_executing=uniquejobs:cc8d44d29605ffd70a159069ab3d8fcc:RUN:RUN DEBUG: Executed lock.lua in 1ms
      !!! start ["a", 0] at 2 sec, counter is 0
2019-12-09T11:32:53.671Z pid=23127 tid=gpsk6vplv class=AJob jid=5977518163161cd4b03bf2bc INFO: start
2019-12-09T11:32:54.674Z pid=23127 tid=gpsk6voi3 class=AJob jid=b7d6657308b82f8c81697121 INFO: start
2019-12-09T11:32:55.677Z pid=23127 tid=gpsk6vsh7 class=AJob jid=31ba2ab728fd7685ed549cf4 INFO: start
      !!! raise ["a", 0] at 5 sec, counter is 0
2019-12-09T11:32:56.491Z pid=23127 tid=gpsk6vr2j class=AJob jid=a231762cf661b2e2bccab528 uniquejobs=server while_executing=uniquejobs:cc8d44d29605ffd70a159069ab3d8fcc:RUN:RUN DEBUG: Executed unlock.lua in 3ms
2019-12-09T11:32:56.492Z pid=23127 tid=gpsk6vr2j class=AJob jid=a231762cf661b2e2bccab528 elapsed=4.826 INFO: fail
2019-12-09T11:32:56.492Z pid=23127 tid=gpsk6vr2j WARN: {"context":"Job raised exception","job":{"class":"AJob","args":["a",0],"retry":2,"queue":"test_queue","lock":"while_executing","lock_timeout":60,"on_conflict":"log","unique_args":["a"],"jid":"a231762cf661b2e2bccab528","created_at":1575891171.6642435,"lock_ttl":null,"unique_prefix":"uniquejobs","unique_digest":"uniquejobs:cc8d44d29605ffd70a159069ab3d8fcc:RUN:RUN","enqueued_at":1575891171.6648395},"jobstr":"{\"class\":\"AJob\",\"args\":[\"a\",0],\"retry\":2,\"queue\":\"test_queue\",\"lock\":\"while_executing\",\"lock_timeout\":60,\"on_conflict\":\"log\",\"unique_args\":[\"a\"],\"jid\":\"a231762cf661b2e2bccab528\",\"created_at\":1575891171.6642435,\"lock_ttl\":null,\"unique_prefix\":\"uniquejobs\",\"unique_digest\":\"uniquejobs:cc8d44d29605ffd70a159069ab3d8fcc:RUN\",\"enqueued_at\":1575891171.6648395}"}
2019-12-09T11:32:56.493Z pid=23127 tid=gpsk6vr2j WARN: RuntimeError: Need retry!
2019-12-09T11:32:57.981Z pid=23127 tid=gpsk6vqhj class=AJob jid=97c4ea12ce7b6962fd820741 uniquejobs=server while_executing=uniquejobs:cc8d44d29605ffd70a159069ab3d8fcc:RUN:RUN DEBUG: Executed queue.lua in 1ms
2019-12-09T11:32:57.982Z pid=23127 tid=gpsk6voi3 class=AJob jid=b7d6657308b82f8c81697121 uniquejobs=server while_executing=uniquejobs:cc8d44d29605ffd70a159069ab3d8fcc:RUN:RUN DEBUG: Executed queue.lua in 0ms
2019-12-09T11:32:57.985Z pid=23127 tid=gpsk6vplv class=AJob jid=5977518163161cd4b03bf2bc uniquejobs=server while_executing=uniquejobs:cc8d44d29605ffd70a159069ab3d8fcc:RUN:RUN DEBUG: Executed queue.lua in 1ms
2019-12-09T11:32:57.988Z pid=23127 tid=gpsk6vqhj class=AJob jid=97c4ea12ce7b6962fd820741 uniquejobs=server while_executing=uniquejobs:cc8d44d29605ffd70a159069ab3d8fcc:RUN:RUN DEBUG: Executed lock.lua in 0ms
2019-12-09T11:32:57.991Z pid=23127 tid=gpsk6voi3 class=AJob jid=b7d6657308b82f8c81697121 uniquejobs=server while_executing=uniquejobs:cc8d44d29605ffd70a159069ab3d8fcc:RUN:RUN DEBUG: Executed lock.lua in 0ms
2019-12-09T11:32:57.992Z pid=23127 tid=gpsk6voi3 class=AJob jid=b7d6657308b82f8c81697121 uniquejobs=server while_executing=uniquejobs:cc8d44d29605ffd70a159069ab3d8fcc:RUN:RUN DEBUG: Executed unlock.lua in 1ms
2019-12-09T11:32:57.992Z pid=23127 tid=gpsk6voi3 class=AJob jid=b7d6657308b82f8c81697121 uniquejobs=server while_executing=uniquejobs:cc8d44d29605ffd70a159069ab3d8fcc:RUN:RUN INFO: Skipping job with id (b7d6657308b82f8c81697121) because unique_digest: (uniquejobs:cc8d44d29605ffd70a159069ab3d8fcc:RUN:RUN) already exists
2019-12-09T11:32:57.993Z pid=23127 tid=gpsk6voi3 class=AJob jid=b7d6657308b82f8c81697121 elapsed=3.319 INFO: done
2019-12-09T11:32:57.990Z pid=23127 tid=gpsk6vplv class=AJob jid=5977518163161cd4b03bf2bc uniquejobs=server while_executing=uniquejobs:cc8d44d29605ffd70a159069ab3d8fcc:RUN:RUN DEBUG: Executed lock.lua in 1ms
      !!! start ["a", 1] at 6 sec, counter is 1
2019-12-09T11:32:57.991Z pid=23127 tid=gpsk6vsh7 class=AJob jid=31ba2ab728fd7685ed549cf4 uniquejobs=server while_executing=uniquejobs:cc8d44d29605ffd70a159069ab3d8fcc:RUN:RUN DEBUG: Executed queue.lua in 3ms
2019-12-09T11:32:57.995Z pid=23127 tid=gpsk6vplv class=AJob jid=5977518163161cd4b03bf2bc uniquejobs=server while_executing=uniquejobs:cc8d44d29605ffd70a159069ab3d8fcc:RUN:RUN DEBUG: Executed unlock.lua in 1ms
2019-12-09T11:32:57.996Z pid=23127 tid=gpsk6vplv class=AJob jid=5977518163161cd4b03bf2bc uniquejobs=server while_executing=uniquejobs:cc8d44d29605ffd70a159069ab3d8fcc:RUN:RUN INFO: Skipping job with id (5977518163161cd4b03bf2bc) because unique_digest: (uniquejobs:cc8d44d29605ffd70a159069ab3d8fcc:RUN:RUN) already exists
2019-12-09T11:32:57.997Z pid=23127 tid=gpsk6vplv class=AJob jid=5977518163161cd4b03bf2bc elapsed=4.326 INFO: done
2019-12-09T11:32:59.995Z pid=23127 tid=gpsk6vsh7 class=AJob jid=31ba2ab728fd7685ed549cf4 uniquejobs=server while_executing=uniquejobs:cc8d44d29605ffd70a159069ab3d8fcc:RUN:RUN WARN: Timed out after 60s while waiting for primed token (digest: uniquejobs:cc8d44d29605ffd70a159069ab3d8fcc:RUN:RUN, job_id: 31ba2ab728fd7685ed549cf4)
      !!! raise ["a", 1] at 9 sec, counter is 1
2019-12-09T11:33:00.999Z pid=23127 tid=gpsk6vqhj class=AJob jid=97c4ea12ce7b6962fd820741 uniquejobs=server while_executing=uniquejobs:cc8d44d29605ffd70a159069ab3d8fcc:RUN:RUN DEBUG: Executed unlock.lua in 1ms
2019-12-09T11:33:01.001Z pid=23127 tid=gpsk6vsh7 class=AJob jid=31ba2ab728fd7685ed549cf4 uniquejobs=server while_executing=uniquejobs:cc8d44d29605ffd70a159069ab3d8fcc:RUN:RUN DEBUG: Executed unlock.lua in 1006ms
2019-12-09T11:33:01.002Z pid=23127 tid=gpsk6vsh7 class=AJob jid=31ba2ab728fd7685ed549cf4 uniquejobs=server while_executing=uniquejobs:cc8d44d29605ffd70a159069ab3d8fcc:RUN:RUN INFO: Skipping job with id (31ba2ab728fd7685ed549cf4) because unique_digest: (uniquejobs:cc8d44d29605ffd70a159069ab3d8fcc:RUN:RUN) already exists
2019-12-09T11:33:01.003Z pid=23127 tid=gpsk6vqhj class=AJob jid=97c4ea12ce7b6962fd820741 elapsed=8.235 INFO: fail
2019-12-09T11:33:01.004Z pid=23127 tid=gpsk6vqhj WARN: {"context":"Job raised exception","job":{"class":"AJob","args":["a",1],"retry":2,"queue":"test_queue","lock":"while_executing","lock_timeout":60,"on_conflict":"log","unique_args":["a"],"jid":"97c4ea12ce7b6962fd820741","created_at":1575891172.667428,"lock_ttl":null,"unique_prefix":"uniquejobs","unique_digest":"uniquejobs:cc8d44d29605ffd70a159069ab3d8fcc:RUN:RUN","enqueued_at":1575891172.6677046},"jobstr":"{\"class\":\"AJob\",\"args\":[\"a\",1],\"retry\":2,\"queue\":\"test_queue\",\"lock\":\"while_executing\",\"lock_timeout\":60,\"on_conflict\":\"log\",\"unique_args\":[\"a\"],\"jid\":\"97c4ea12ce7b6962fd820741\",\"created_at\":1575891172.667428,\"lock_ttl\":null,\"unique_prefix\":\"uniquejobs\",\"unique_digest\":\"uniquejobs:cc8d44d29605ffd70a159069ab3d8fcc:RUN\",\"enqueued_at\":1575891172.6677046}"}
2019-12-09T11:33:01.004Z pid=23127 tid=gpsk6vqhj WARN: RuntimeError: Need retry!
2019-12-09T11:33:01.004Z pid=23127 tid=gpsk6vsh7 class=AJob jid=31ba2ab728fd7685ed549cf4 elapsed=5.327 INFO: done
2019-12-09T11:33:01.005Z pid=23127 tid=gpsk6vqhj WARN: /mnt/sdd/encrypted/calendly/app/jobs/a_job.rb:23:in `perform'
2019-12-09T11:33:02.147Z pid=23127 tid=gpsk6voi3 class=AJob jid=a231762cf661b2e2bccab528 INFO: start
2019-12-09T11:33:02.147Z pid=23127 tid=gpsk6vp4v DEBUG: enqueued retry: {"class":"AJob","args":["a",0],"retry":2,"queue":"test_queue","lock":"while_executing","lock_timeout":60,"on_conflict":"log","unique_args":["a"],"jid":"a231762cf661b2e2bccab528","created_at":1575891171.6642435,"lock_ttl":null,"unique_prefix":"uniquejobs","unique_digest":"uniquejobs:cc8d44d29605ffd70a159069ab3d8fcc:RUN","enqueued_at":1575891171.6648395,"error_message":"Need retry!","error_class":"RuntimeError","failed_at":1575891176.4917407,"retry_count":0}
2019-12-09T11:33:02.153Z pid=23127 tid=gpsk6voi3 class=AJob jid=a231762cf661b2e2bccab528 uniquejobs=server while_executing=uniquejobs:cc8d44d29605ffd70a159069ab3d8fcc:RUN:RUN:RUN DEBUG: Executed queue.lua in 1ms
2019-12-09T11:33:02.155Z pid=23127 tid=gpsk6voi3 class=AJob jid=a231762cf661b2e2bccab528 uniquejobs=server while_executing=uniquejobs:cc8d44d29605ffd70a159069ab3d8fcc:RUN:RUN:RUN DEBUG: Executed lock.lua in 1ms
      !!! start ["a", 0] at 11 sec, counter is 2
      !!! finish ["a", 0] at 14 sec, counter is 2
2019-12-09T11:33:05.161Z pid=23127 tid=gpsk6voi3 class=AJob jid=a231762cf661b2e2bccab528 uniquejobs=server while_executing=uniquejobs:cc8d44d29605ffd70a159069ab3d8fcc:RUN:RUN:RUN DEBUG: Executed unlock.lua in 1ms
2019-12-09T11:33:05.162Z pid=23127 tid=gpsk6voi3 class=AJob jid=a231762cf661b2e2bccab528 elapsed=3.015 INFO: done
2019-12-09T11:33:06.230Z pid=23127 tid=gpsk6vs5r class=AJob jid=97c4ea12ce7b6962fd820741 INFO: start
2019-12-09T11:33:06.231Z pid=23127 tid=gpsk6vp4v DEBUG: enqueued retry: {"class":"AJob","args":["a",1],"retry":2,"queue":"test_queue","lock":"while_executing","lock_timeout":60,"on_conflict":"log","unique_args":["a"],"jid":"97c4ea12ce7b6962fd820741","created_at":1575891172.667428,"lock_ttl":null,"unique_prefix":"uniquejobs","unique_digest":"uniquejobs:cc8d44d29605ffd70a159069ab3d8fcc:RUN","enqueued_at":1575891172.6677046,"error_message":"Need retry!","error_class":"RuntimeError","failed_at":1575891181.0016289,"retry_count":0}
2019-12-09T11:33:06.236Z pid=23127 tid=gpsk6vs5r class=AJob jid=97c4ea12ce7b6962fd820741 uniquejobs=server while_executing=uniquejobs:cc8d44d29605ffd70a159069ab3d8fcc:RUN:RUN:RUN DEBUG: Executed queue.lua in 1ms
2019-12-09T11:33:06.239Z pid=23127 tid=gpsk6vs5r class=AJob jid=97c4ea12ce7b6962fd820741 uniquejobs=server while_executing=uniquejobs:cc8d44d29605ffd70a159069ab3d8fcc:RUN:RUN:RUN DEBUG: Executed lock.lua in 1ms
      !!! start ["a", 1] at 15 sec, counter is 3
      !!! finish ["a", 1] at 18 sec, counter is 3
2019-12-09T11:33:09.244Z pid=23127 tid=gpsk6vs5r class=AJob jid=97c4ea12ce7b6962fd820741 uniquejobs=server while_executing=uniquejobs:cc8d44d29605ffd70a159069ab3d8fcc:RUN:RUN:RUN DEBUG: Executed unlock.lua in 1ms
2019-12-09T11:33:09.245Z pid=23127 tid=gpsk6vs5r class=AJob jid=97c4ea12ce7b6962fd820741 elapsed=3.014 INFO: done

jobs a0 and a1 have raised an exceptions, and have been retried, but jobs a2, a3 and a4 have not been started at all

@mhenrixon
Copy link
Owner

mhenrixon commented Dec 9, 2019

@roman-melnyk is it a rails app? If so, for some reason Sidekiq doesn't do what it should... It is a Sidekiq bug related to rails reloading feature. The block that is supposed to execute your worker code isn't executed.

The problem is here: https://github.com/mperham/sidekiq/blob/master/lib/sidekiq/processor.rb#L131 the @reloader.call is never called. That whole block is skipped UNLESS... you add config.cache_classes = true and config.eager_load = true.

@roman-melnyk
Copy link
Author

@mhenrixon I've tried with config.cache_classes = true and config.eager_load = true - and got the same result

@nedkoh
Copy link

nedkoh commented Sep 17, 2020

@mhenrixon so what are our options here? Any change this will be fixed?

@alex-hall
Copy link

@mhenrixon bumping this again. Anything we can do to help?

@mhenrixon
Copy link
Owner

@alex-hall @nedkoh @roman-melnyk i will take another stab at this tonight when my daughters are in bed.

@mhenrixon
Copy link
Owner

I can't replicate this problem in V7, let me know if I missed something that would make it fail. I basically just copied the worker AJob and renamed it to something more specific and tested with the same thing and I can't make the jobs not finish.

@mhenrixon
Copy link
Owner

mhenrixon commented Nov 18, 2020

I'm not sure this is possible to solve for v6. @alex-hall, @nedkoh, @roman-melnyk which version (if any) are you on at the moment?

The whole reason for the v7 rewrite/refactoring was to be able to fix these types of problems and make the gem more reliable. Some parts of the v6 code are nearly impossible to get right so if possible, I suggest you give the latest v7 beta a try.

@alex-hall
Copy link

@mhenrixon , these are our current sidekiq versions:

gem 'sidekiq', '5.2.5'
gem 'sidekiq-unique-jobs', '5.0.10'

Last time we tried to upgrade these gems and ran into these problem.

@roman-melnyk , can you try re-creating?

@roman-melnyk
Copy link
Author

@mhenrixon 👍🏼 confirming correct work in 7.0.4 version

@roman-melnyk
Copy link
Author

@mhenrixon tested again on sidekiq 6.2 and latest 7.0.7 version, with concurrency 20
please correct me I I'm wrong: with while_executing policy, all jobs can be placed in the queue, but only one will be executed at a time, right? but all placed jobs should be executed one by one, no one should be lost?
I see this in my test logs:

       put 1 at 0.00 sec c0a0fbd95166ca857e708dea   
       put 2 at 2.01 sec d73b362d2b8105e9d19b6e75   
       put 3 at 4.01 sec 2cc606df8830787593fcdf76   
     start 1 at 5.30 sec  (I started sidekiq server intentionally later so that it starts processing on ~6 second after 1 job was put in queue)                           
       put 4 at 8.02 sec 8a743745aea7ec6e40bbd184   
     raise 1 at 11.30 sec (for test purpose job 1 raises an error after 6 second after it was started)                           
       put 5 at 13.03 sec f798577af08b481e240537b3  
     start 5 at 13.04 sec                           
    finish 5 at 19.04 sec                           
     start 1 at 19.76 sec                           
    finish 1 at 25.76 sec                           

seems 2, 3 and 4 jobs are lost
If I put jobs when the server is running, I see this:

       put 1 at 0.00 sec deaa33469dc42a22d180e922
     start 1 at 1.31 sec
       put 2 at 2.01 sec fb53af3ab6067a34fd0c8439
       put 3 at 4.01 sec d0d78ca930eda4734433e30c
     raise 1 at 7.31 sec
       put 4 at 8.02 sec b27c183eae2035669e9d528e
     start 2 at 8.12 sec
       put 5 at 13.03 sec 27d263d3f5c8e408fdbeb3c1
    finish 2 at 14.12 sec
     start 1 at 16.60 sec
    finish 1 at 22.60 sec

again, job 3, 4 and 5 are lost

@mhenrixon
Copy link
Owner

mhenrixon commented Mar 19, 2021

@roman-melnyk it depends on your configuration. If you have a lock_timeout set it only waits that long from when the job is picked of the queue. I think there should be some log entry for that. Perhaps add on_conflict: :log to debug.

If you want to make sure no job is lost you can use on_conflict: :raise which will allow the job to be retried. You could also reschedule it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants