
Jobs executing and immediately returning #418

Closed
cpiazza opened this issue Sep 10, 2019 · 23 comments

cpiazza commented Sep 10, 2019

Describe the bug
We've had two instances in which we were using sidekiq_options unique: :until_and_while_executing on a worker and, after some time, calling .perform_async on the worker would result in the job being enqueued, assigned a job ID, picked up by a worker, and then exiting immediately.

For example, we would see in the sidekiq.log

2019-09-09T14:52:16.545Z 14236 TID-ovo800gss MyWorker JID-3287686b0cad75fee25a57c8 INFO: start
2019-09-09T14:52:16.548Z 14236 TID-ovo800gss MyWorker JID-3287686b0cad75fee25a57c8 INFO: done: 0.003 sec

In all cases, there were no other instances of MyWorker enqueued or running.

Upon changing to sidekiq_options unique: :until_executing the problem went away in both cases.

Expected behavior
The worker to execute as normal.

Current behavior
See above. Note that in the class below we log the start, and then success or failure; in these cases we see nothing logged. These are also typically workers that run for several minutes.

Worker class

class MyWorker
  include Sidekiq::Worker

  sidekiq_options queue: :data_sync, retry: true, backtrace: true
  sidekiq_options unique: :until_and_while_executing

  def perform(some_id)
    some_model = SomeModel.find(some_id)

    log('Started', some_model)
    result = some_model.sync.call

    if result.success?
      log('Finished', some_model)
    else
      log("Failed: #{result.errors.join(', ')}", some_model)
    end
  end

  private

  def log(message, some_model = nil)
    EventLogger.log(
      :sync,
      note: message,
      loggable: some_model
    )
  end
end

Additional context
Anything we can look for here? Ideally we'd want to use :until_and_while_executing if possible

@mhenrixon (Owner)

Pretty sure the problem is the following:

  1. When the server process/worker (bundle exec sidekiq) picks the job off the queue, it runs through the unique middleware, which unlocks the queue and locks the worker from executing any more jobs with the same arguments.
  2. The worker waits for some time to acquire the lock, decides it cannot get a mutex for that worker with the specific arguments, and drops the job.

It should not be silent though. If anything, we need to make sure there is something helpful logged from the middleware at that point.

I will have a quick look before cutting a new release.
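
For anyone following along, here is a rough, illustrative sketch of that handoff using plain Redis locks. This is not the gem's actual code; the key names are made up purely to show the two-phase idea behind :until_and_while_executing.

# Illustrative only, not the gem's implementation: the two-phase idea behind
# :until_and_while_executing using plain Redis SET NX locks.
require "redis"

redis  = Redis.new
digest = "uniquejobs:example-digest" # stand-in for the real lock digest

# Client side: an "until executing" lock is taken when the job is pushed.
redis.set("#{digest}:queued", "1", nx: true)

# Server side: on pickup the queue lock is released and a runtime lock is taken.
redis.del("#{digest}:queued")
if redis.set("#{digest}:run", "1", nx: true)
  begin
    puts "job runs here"
  ensure
    redis.del("#{digest}:run")
  end
else
  # This is the silent path described above: a duplicate could not get the
  # runtime lock and is dropped without anything being logged.
  puts "dropped: runtime lock held by another job"
end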


glennfu commented Sep 20, 2019

I'm having this problem as well using :until_executed, which is what I want to use. Switching to :until_executing fixed it for me, so thank you @cpiazza. For extra context, I have 2 apps running different Rails versions, but both on identical versions of sidekiq and sidekiq-unique-jobs, and both sharing the same Redis. The Rails 5.0 app runs the backend and sidekiq; the Rails 5.2 app is my web frontend. If I enqueue a job from the Rails 5.0 app, everything runs fine. If I enqueue the exact same job with the same args from my Rails 5.2 frontend, I get this error.

Also, I only recently upgraded this gem from ~5.0 to ~6.0. Before the upgrade and the same setup, I didn't have this error.

I hope any of that is helpful!


glennfu commented Sep 20, 2019

Also as an extra note, I'm not 100% confident in the order of my upgrading and when this started. It might have been before I upgraded the front end to 5.2.

mhenrixon self-assigned this Oct 8, 2019

mhenrixon (Owner) commented Oct 10, 2019

@cpiazza @glennfu version 6.0.15 fixed a minor but persistent bug with locks. You might want to try it out.

I also need some people to help me test version 7. Currently v7.0.0.beta2 is pre-released and I'm committed to fixing any outstanding bugs immediately for v7. The next version attempts to fix almost all of the known issues. Unfortunately, v6 was not good enough and I'm not sure I can ever make it good enough.

When someone can point me to something concrete, I can fix it. Otherwise, it took a really huge refactoring of both the tests and the implementation to get to a reliable locking mechanism.
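
If anyone wants to try the pre-release, pinning it in the Gemfile should be enough (the version string below is just the beta current at the time of writing):

# Gemfile: opt in to the v7 pre-release for testing
gem "sidekiq-unique-jobs", "7.0.0.beta2"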


glennfu commented Oct 24, 2019

I'm running v7.0.0.beta2 right now, and it seems to have actually solved this issue for me, and also another issue where I'd run perform_later, get no provider_job_id, and the job wouldn't get enqueued. So far I'm really happy about all that; jobs are running and not being skipped. I'm back to using :until_executed, which is what I want, but in the last 15 minutes I've gotten this error reported twice for different jobs:

sidekiq-69c4bbdffb-ltwzp — 2019-10-24T20:37:45.091Z 1 TID-goccpnoyk ERROR: heartbeat: BUSY Redis is busy running a script. You can only call SCRIPT KILL or SHUTDOWN NOSAVE.
sidekiq-69c4bbdffb-xcghm — 2019-10-24T20:37:45.239Z 1 TID-gqeqzlw9g TriggersJob JID-c2a6fa8860c6711ba7e0cf6b uniquejobs-server DIG-uniquejobs:4b5ef13705e61c8b81f05d43ebaab1c8 WARN: NOTBUSY No scripts in execution right now.
sidekiq-69c4bbdffb-xcghm — 2019-10-24T20:37:45.251Z 1 TID-gqeqzlw9g TriggersJob JID-c2a6fa8860c6711ba7e0cf6b INFO: fail: 39.898 sec
sidekiq-69c4bbdffb-xcghm — 2019-10-24T20:37:45.251Z 1 TID-gqeqzlw9g WARN: {"context":"Job raised exception","job":{"class":"ActiveJob::QueueAdapters::SidekiqAdapter::JobWrapper","wrapped":"TriggersJob","queue":"default","args":[{"job_class":"TriggersJob","job_id":"d70bb988-8f06-4dcb-b6e0-97cd4feedd5d","provider_job_id":null,"queue_name":"default","priority":null,"arguments":[12],"locale":"en"}],"retry":true,"unique":"until_executed","unique_args":{"job_class":"TriggersJob","provider_job_id":null,"queue_name":"default","priority":null,"arguments":[12],"locale":"en"},"jid":"c2a6fa8860c6711ba7e0cf6b","created_at":1571949128.0184765,"lock_timeout":0,"lock_ttl":null,"unique_prefix":"uniquejobs","unique_digest":"uniquejobs:4b5ef13705e61c8b81f05d43ebaab1c8","enqueued_at":1571949128.021183,"error_message":"BUSY Redis is busy running a script. You can only call SCRIPT KILL or SHUTDOWN NOSAVE.","error_class":"Redis::CommandError","failed_at":1571949465.2405186,"retry_count":0},"jobstr":"{\"class\":\"ActiveJob::QueueAdapters::SidekiqAdapter::JobWrapper\",\"wrapped\":\"TriggersJob\",\"queue\":\"default\",\"args\":[{\"job_class\":\"TriggersJob\",\"job_id\":\"d70bb988-8f06-4dcb-b6e0-97cd4feedd5d\",\"provider_job_id\":null,\"queue_name\":\"default\",\"priority\":null,\"arguments\":[12],\"locale\":\"en\"}],\"retry\":true,\"unique\":\"until_executed\",\"unique_args\":{\"job_class\":\"TriggersJob\",\"provider_job_id\":null,\"queue_name\":\"default\",\"priority\":null,\"arguments\":[12],\"locale\":\"en\"},\"jid\":\"c2a6fa8860c6711ba7e0cf6b\",\"created_at\":1571949128.0184765,\"lock_timeout\":0,\"lock_ttl\":null,\"unique_prefix\":\"uniquejobs\",\"unique_digest\":\"uniquejobs:4b5ef13705e61c8b81f05d43ebaab1c8\",\"enqueued_at\":1571949128.021183}"}
sidekiq-69c4bbdffb-xcghm — 2019-10-24T20:37:45.251Z 1 TID-gqeqzlw9g WARN: Redis::CommandError: BUSY Redis is busy running a script. You can only call SCRIPT KILL or SHUTDOWN NOSAVE.
sidekiq-69c4bbdffb-xcghm — 2019-10-24T20:37:45.251Z 1 TID-gqeqzlw9g WARN: /api/vendor/bundle/ruby/2.5.0/gems/redis-3.3.5/lib/redis/client.rb:121:in `call'
sidekiq-69c4bbdffb-xcghm — /api/vendor/bundle/ruby/2.5.0/gems/redis-3.3.5/lib/redis/client.rb:104:in `block in connect'
sidekiq-69c4bbdffb-xcghm — /api/vendor/bundle/ruby/2.5.0/gems/redis-3.3.5/lib/redis/client.rb:293:in `with_reconnect'
sidekiq-69c4bbdffb-xcghm — /api/vendor/bundle/ruby/2.5.0/gems/redis-3.3.5/lib/redis/client.rb:100:in `connect'
sidekiq-69c4bbdffb-xcghm — /api/vendor/bundle/ruby/2.5.0/gems/redis-3.3.5/lib/redis/client.rb:364:in `ensure_connected'
sidekiq-69c4bbdffb-xcghm — /api/vendor/bundle/ruby/2.5.0/gems/redis-3.3.5/lib/redis/client.rb:221:in `block in process'
sidekiq-69c4bbdffb-xcghm — /api/vendor/bundle/ruby/2.5.0/gems/redis-3.3.5/lib/redis/client.rb:306:in `logging'
sidekiq-69c4bbdffb-xcghm — /api/vendor/bundle/ruby/2.5.0/gems/redis-3.3.5/lib/redis/client.rb:220:in `process'
sidekiq-69c4bbdffb-xcghm — /api/vendor/bundle/ruby/2.5.0/gems/redis-3.3.5/lib/redis/client.rb:120:in `call'
sidekiq-69c4bbdffb-xcghm — /api/vendor/bundle/ruby/2.5.0/gems/redis-3.3.5/lib/redis.rb:2069:in `block in hexists'
sidekiq-69c4bbdffb-xcghm — /api/vendor/bundle/ruby/2.5.0/gems/redis-3.3.5/lib/redis.rb:58:in `block in synchronize'
sidekiq-69c4bbdffb-xcghm — /usr/local/lib/ruby/2.5.0/monitor.rb:226:in `mon_synchronize'
sidekiq-69c4bbdffb-xcghm — /api/vendor/bundle/ruby/2.5.0/gems/redis-3.3.5/lib/redis.rb:58:in `synchronize'
sidekiq-69c4bbdffb-xcghm — /api/vendor/bundle/ruby/2.5.0/gems/redis-3.3.5/lib/redis.rb:2068:in `hexists'
sidekiq-69c4bbdffb-xcghm — /api/vendor/bundle/ruby/2.5.0/gems/sidekiq-unique-jobs-7.0.0.beta2/lib/sidekiq_unique_jobs/locksmith.rb:325:in `taken?'
sidekiq-69c4bbdffb-xcghm — /api/vendor/bundle/ruby/2.5.0/gems/sidekiq-unique-jobs-7.0.0.beta2/lib/sidekiq_unique_jobs/locksmith.rb:125:in `block in locked?'
sidekiq-69c4bbdffb-xcghm — /api/vendor/bundle/ruby/2.5.0/gems/sidekiq-5.0.4/lib/sidekiq.rb:95:in `block in redis'
sidekiq-69c4bbdffb-xcghm — /api/vendor/bundle/ruby/2.5.0/gems/connection_pool-2.2.2/lib/connection_pool.rb:65:in `block (2 levels) in with'
sidekiq-69c4bbdffb-xcghm — /api/vendor/bundle/ruby/2.5.0/gems/connection_pool-2.2.2/lib/connection_pool.rb:64:in `handle_interrupt'
sidekiq-69c4bbdffb-xcghm — /api/vendor/bundle/ruby/2.5.0/gems/connection_pool-2.2.2/lib/connection_pool.rb:64:in `block in with'
sidekiq-69c4bbdffb-xcghm — /api/vendor/bundle/ruby/2.5.0/gems/connection_pool-2.2.2/lib/connection_pool.rb:61:in `handle_interrupt'
sidekiq-69c4bbdffb-xcghm — /api/vendor/bundle/ruby/2.5.0/gems/connection_pool-2.2.2/lib/connection_pool.rb:61:in `with'
sidekiq-69c4bbdffb-xcghm — /api/vendor/bundle/ruby/2.5.0/gems/sidekiq-5.0.4/lib/sidekiq.rb:92:in `redis'
sidekiq-69c4bbdffb-xcghm — /api/vendor/bundle/ruby/2.5.0/gems/sidekiq-unique-jobs-7.0.0.beta2/lib/sidekiq_unique_jobs/connection.rb:19:in `redis'
sidekiq-69c4bbdffb-xcghm — /api/vendor/bundle/ruby/2.5.0/gems/sidekiq-unique-jobs-7.0.0.beta2/lib/sidekiq_unique_jobs/locksmith.rb:125:in `locked?'
sidekiq-69c4bbdffb-xcghm — /api/vendor/bundle/ruby/2.5.0/gems/sidekiq-unique-jobs-7.0.0.beta2/lib/sidekiq_unique_jobs/locksmith.rb:103:in `unlock'
sidekiq-69c4bbdffb-xcghm — /api/vendor/bundle/ruby/2.5.0/gems/sidekiq-unique-jobs-7.0.0.beta2/lib/sidekiq_unique_jobs/lock/base_lock.rb:60:in `unlock'
sidekiq-69c4bbdffb-xcghm — /api/vendor/bundle/ruby/2.5.0/gems/sidekiq-unique-jobs-7.0.0.beta2/lib/sidekiq_unique_jobs/lock/base_lock.rb:124:in `unlock_with_callback'
sidekiq-69c4bbdffb-xcghm — /api/vendor/bundle/ruby/2.5.0/gems/sidekiq-unique-jobs-7.0.0.beta2/lib/sidekiq_unique_jobs/lock/until_executed.rb:18:in `block in execute'
sidekiq-69c4bbdffb-xcghm — /api/vendor/bundle/ruby/2.5.0/gems/sidekiq-unique-jobs-7.0.0.beta2/lib/sidekiq_unique_jobs/locksmith.rb:175:in `lock_async'
sidekiq-69c4bbdffb-xcghm — /api/vendor/bundle/ruby/2.5.0/gems/sidekiq-unique-jobs-7.0.0.beta2/lib/sidekiq_unique_jobs/locksmith.rb:88:in `block in lock'
sidekiq-69c4bbdffb-xcghm — /api/vendor/bundle/ruby/2.5.0/gems/sidekiq-5.0.4/lib/sidekiq.rb:95:in `block in redis'
sidekiq-69c4bbdffb-xcghm — /api/vendor/bundle/ruby/2.5.0/gems/connection_pool-2.2.2/lib/connection_pool.rb:65:in `block (2 levels) in with'
sidekiq-69c4bbdffb-xcghm — /api/vendor/bundle/ruby/2.5.0/gems/connection_pool-2.2.2/lib/connection_pool.rb:64:in `handle_interrupt'
sidekiq-69c4bbdffb-xcghm — /api/vendor/bundle/ruby/2.5.0/gems/connection_pool-2.2.2/lib/connection_pool.rb:64:in `block in with'
sidekiq-69c4bbdffb-xcghm — /api/vendor/bundle/ruby/2.5.0/gems/connection_pool-2.2.2/lib/connection_pool.rb:61:in `handle_interrupt'
sidekiq-69c4bbdffb-xcghm — /api/vendor/bundle/ruby/2.5.0/gems/connection_pool-2.2.2/lib/connection_pool.rb:61:in `with'
sidekiq-69c4bbdffb-xcghm — /api/vendor/bundle/ruby/2.5.0/gems/sidekiq-5.0.4/lib/sidekiq.rb:92:in `redis'
sidekiq-69c4bbdffb-xcghm — /api/vendor/bundle/ruby/2.5.0/gems/sidekiq-unique-jobs-7.0.0.beta2/lib/sidekiq_unique_jobs/connection.rb:19:in `redis'
sidekiq-69c4bbdffb-xcghm — /api/vendor/bundle/ruby/2.5.0/gems/sidekiq-unique-jobs-7.0.0.beta2/lib/sidekiq_unique_jobs/locksmith.rb:87:in `lock'
sidekiq-69c4bbdffb-xcghm — /api/vendor/bundle/ruby/2.5.0/gems/sidekiq-unique-jobs-7.0.0.beta2/lib/sidekiq_unique_jobs/lock/base_lock.rb:45:in `lock'
sidekiq-69c4bbdffb-xcghm — /api/vendor/bundle/ruby/2.5.0/gems/sidekiq-unique-jobs-7.0.0.beta2/lib/sidekiq_unique_jobs/lock/until_executed.rb:16:in `execute'
sidekiq-69c4bbdffb-xcghm — /api/vendor/bundle/ruby/2.5.0/gems/sidekiq-unique-jobs-7.0.0.beta2/lib/sidekiq_unique_jobs/middleware/server.rb:23:in `call'
sidekiq-69c4bbdffb-xcghm — /api/vendor/bundle/ruby/2.5.0/gems/sidekiq-unique-jobs-7.0.0.beta2/lib/sidekiq_unique_jobs/middleware.rb:86:in `block in call'
sidekiq-69c4bbdffb-xcghm — /api/vendor/bundle/ruby/2.5.0/gems/sidekiq-unique-jobs-7.0.0.beta2/lib/sidekiq_unique_jobs/logging.rb:103:in `block in with_logging_context'
sidekiq-69c4bbdffb-xcghm — /api/vendor/bundle/ruby/2.5.0/gems/sidekiq-unique-jobs-7.0.0.beta2/lib/sidekiq_unique_jobs/logging.rb:118:in `block in with_configured_loggers_context'
sidekiq-69c4bbdffb-xcghm — /api/vendor/bundle/ruby/2.5.0/gems/sidekiq-5.0.4/lib/sidekiq/logging.rb:44:in `with_context'
sidekiq-69c4bbdffb-xcghm — /api/vendor/bundle/ruby/2.5.0/gems/sidekiq-unique-jobs-7.0.0.beta2/lib/sidekiq_unique_jobs/logging.rb:118:in `call'
sidekiq-69c4bbdffb-xcghm — /api/vendor/bundle/ruby/2.5.0/gems/sidekiq-unique-jobs-7.0.0.beta2/lib/sidekiq_unique_jobs/logging.rb:118:in `with_configured_loggers_context'
sidekiq-69c4bbdffb-xcghm — /api/vendor/bundle/ruby/2.5.0/gems/sidekiq-unique-jobs-7.0.0.beta2/lib/sidekiq_unique_jobs/logging.rb:102:in `with_logging_context'
sidekiq-69c4bbdffb-xcghm — /api/vendor/bundle/ruby/2.5.0/gems/sidekiq-unique-jobs-7.0.0.beta2/lib/sidekiq_unique_jobs/middleware.rb:85:in `call'
sidekiq-69c4bbdffb-xcghm — /api/vendor/bundle/ruby/2.5.0/gems/sidekiq-5.0.4/lib/sidekiq/middleware/chain.rb:130:in `block in invoke'
sidekiq-69c4bbdffb-xcghm — /api/vendor/bundle/ruby/2.5.0/gems/sidekiq-5.0.4/lib/sidekiq/middleware/chain.rb:133:in `invoke'
sidekiq-69c4bbdffb-xcghm — /api/vendor/bundle/ruby/2.5.0/gems/sidekiq-5.0.4/lib/sidekiq/processor.rb:169:in `block in process'
sidekiq-69c4bbdffb-xcghm — /api/vendor/bundle/ruby/2.5.0/gems/sidekiq-5.0.4/lib/sidekiq/processor.rb:141:in `block (6 levels) in dispatch'
sidekiq-69c4bbdffb-xcghm — /api/vendor/bundle/ruby/2.5.0/gems/sidekiq-5.0.4/lib/sidekiq/job_retry.rb:97:in `local'
sidekiq-69c4bbdffb-xcghm — /api/vendor/bundle/ruby/2.5.0/gems/sidekiq-5.0.4/lib/sidekiq/processor.rb:140:in `block (5 levels) in dispatch'
sidekiq-69c4bbdffb-xcghm — /api/vendor/bundle/ruby/2.5.0/gems/sidekiq-5.0.4/lib/sidekiq/rails.rb:42:in `block in call'
sidekiq-69c4bbdffb-xcghm — /api/vendor/bundle/ruby/2.5.0/gems/activesupport-5.0.7.2/lib/active_support/execution_wrapper.rb:85:in `wrap'
sidekiq-69c4bbdffb-xcghm — /api/vendor/bundle/ruby/2.5.0/gems/activesupport-5.0.7.2/lib/active_support/reloader.rb:68:in `block in wrap'
sidekiq-69c4bbdffb-xcghm — /api/vendor/bundle/ruby/2.5.0/gems/activesupport-5.0.7.2/lib/active_support/execution_wrapper.rb:85:in `wrap'
sidekiq-69c4bbdffb-xcghm — /api/vendor/bundle/ruby/2.5.0/gems/activesupport-5.0.7.2/lib/active_support/reloader.rb:67:in `wrap'
sidekiq-69c4bbdffb-xcghm — /api/vendor/bundle/ruby/2.5.0/gems/sidekiq-5.0.4/lib/sidekiq/rails.rb:41:in `call'
sidekiq-69c4bbdffb-xcghm — /api/vendor/bundle/ruby/2.5.0/gems/sidekiq-5.0.4/lib/sidekiq/processor.rb:136:in `block (4 levels) in dispatch'
sidekiq-69c4bbdffb-xcghm — /api/vendor/bundle/ruby/2.5.0/gems/sidekiq-5.0.4/lib/sidekiq/processor.rb:215:in `stats'
sidekiq-69c4bbdffb-xcghm — /api/vendor/bundle/ruby/2.5.0/gems/sidekiq-5.0.4/lib/sidekiq/processor.rb:131:in `block (3 levels) in dispatch'
sidekiq-69c4bbdffb-xcghm — /api/vendor/bundle/ruby/2.5.0/gems/sidekiq-5.0.4/lib/sidekiq/job_logger.rb:7:in `call'
sidekiq-69c4bbdffb-ltwzp — 2019-10-24T20:37:45.350Z 1 TID-ornzrbdjg INFO: Redis is online, 5.311044551 sec downtime
sidekiq-69c4bbdffb-xcghm — /api/vendor/bundle/ruby/2.5.0/gems/sidekiq-5.0.4/lib/sidekiq/processor.rb:130:in `block (2 levels) in dispatch'
sidekiq-69c4bbdffb-xcghm — /api/vendor/bundle/ruby/2.5.0/gems/sidekiq-5.0.4/lib/sidekiq/job_retry.rb:72:in `global'
sidekiq-69c4bbdffb-xcghm — /api/vendor/bundle/ruby/2.5.0/gems/sidekiq-5.0.4/lib/sidekiq/processor.rb:129:in `block in dispatch'
sidekiq-69c4bbdffb-xcghm — /api/vendor/bundle/ruby/2.5.0/gems/sidekiq-5.0.4/lib/sidekiq/logging.rb:44:in `with_context'
sidekiq-69c4bbdffb-xcghm — /api/vendor/bundle/ruby/2.5.0/gems/sidekiq-5.0.4/lib/sidekiq/logging.rb:38:in `with_job_hash_context'
sidekiq-69c4bbdffb-xcghm — /api/vendor/bundle/ruby/2.5.0/gems/sidekiq-5.0.4/lib/sidekiq/processor.rb:128:in `dispatch'
sidekiq-69c4bbdffb-xcghm — /api/vendor/bundle/ruby/2.5.0/gems/sidekiq-5.0.4/lib/sidekiq/processor.rb:168:in `process'
sidekiq-69c4bbdffb-xcghm — /api/vendor/bundle/ruby/2.5.0/gems/sidekiq-5.0.4/lib/sidekiq/processor.rb:85:in `process_one'
sidekiq-69c4bbdffb-xcghm — /api/vendor/bundle/ruby/2.5.0/gems/sidekiq-5.0.4/lib/sidekiq/processor.rb:73:in `run'
sidekiq-69c4bbdffb-xcghm — /api/vendor/bundle/ruby/2.5.0/gems/sidekiq-5.0.4/lib/sidekiq/util.rb:16:in `watchdog'
sidekiq-69c4bbdffb-xcghm — /api/vendor/bundle/ruby/2.5.0/gems/sidekiq-5.0.4/lib/sidekiq/util.rb:25:in `block in safe_thread'

EDIT: If it's helpful to know, my queue pretty much never empties out. I also have 3 sidekiq instances running. Could Sidekiq instance 1 be running the job successfully while instances 2 and 3 also try to run it at the same time and fail? I'm getting 2 errors every time it happens, so that's my naive guess at what could be happening.

Also, here is my configuration info:

> SidekiqUniqueJobs.config
=> #<concurrent/mutable_struct/thread_safe_config SidekiqUniqueJobs::Config {:default_lock_timeout=>0, :default_lock_ttl=>nil, :enabled=>true, :unique_prefix=>"uniquejobs", :logger=>#<Logger:0x000055b3ca5923a0 @level=1, @progname=nil, @default_formatter=#<Logger::Formatter:0x000055b3ca592350 @datetime_format=nil>, @formatter=#<Sidekiq::Logging::Pretty:0x000055b3ca592238 @datetime_format=nil>, @logdev=#<Logger::LogDevice:0x000055b3ca5922b0 @shift_period_suffix=nil, @shift_size=nil, @shift_age=nil, @filename=nil, @dev=#<IO:<STDOUT>>, @mon_owner=nil, @mon_count=0, @mon_mutex=#<Thread::Mutex:0x000055b3ca592260>>>, :locks=>{:around_perform=>SidekiqUniqueJobs::Lock::WhileExecuting, :while_busy=>SidekiqUniqueJobs::Lock::WhileExecuting, :while_executing=>SidekiqUniqueJobs::Lock::WhileExecuting, :while_working=>SidekiqUniqueJobs::Lock::WhileExecuting, :while_executing_reject=>SidekiqUniqueJobs::Lock::WhileExecutingReject, :until_executing=>SidekiqUniqueJobs::Lock::UntilExecuting, :while_enqueued=>SidekiqUniqueJobs::Lock::UntilExecuting, :until_expired=>SidekiqUniqueJobs::Lock::UntilExpired, :until_completed=>SidekiqUniqueJobs::Lock::UntilExecuted, :until_executed=>SidekiqUniqueJobs::Lock::UntilExecuted, :until_performed=>SidekiqUniqueJobs::Lock::UntilExecuted, :until_processed=>SidekiqUniqueJobs::Lock::UntilExecuted, :until_and_while_executing=>SidekiqUniqueJobs::Lock::UntilAndWhileExecuting, :until_successfully_completed=>SidekiqUniqueJobs::Lock::UntilExecuted}, :strategies=>{:log=>SidekiqUniqueJobs::OnConflict::Log, :raise=>SidekiqUniqueJobs::OnConflict::Raise, :reject=>SidekiqUniqueJobs::OnConflict::Reject, :replace=>SidekiqUniqueJobs::OnConflict::Replace, :reschedule=>SidekiqUniqueJobs::OnConflict::Reschedule}, :debug_lua=>false, :max_history=>1000, :reaper=>:ruby, :reaper_count=>1000, :reaper_interval=>600, :reaper_timeout=>10, :lock_info=>false, :raise_on_config_error=>false, :current_redis_version=>"5.0.4"}>

EDIT 2: This error also gets thrown a lot when visiting /sidekiq/queues/default in the sidekiq web ui


glennfu commented Oct 25, 2019

More importantly, it seems that this problem gets worse over time until it kills all sidekiq processes permanently. Even my app instances are getting and throwing this error when attempting to enqueue jobs.

@mhenrixon (Owner)

@glennfu I know what the problem is. I honestly didn't consider multiple sidekiq instances, so I will work on fixing that before going live.

@mhenrixon (Owner)

@glennfu basically what happens is that the orphan reaper runs on all processes, so the Lua script can possibly be run from all three at the same time. I can imagine how that would cause problems.
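
Until that fix lands, one possible workaround sketch is to only enable the reaper on one of the processes. The env flag below is invented, and it assumes setting config.reaper to nil disables reaping:

# Hypothetical workaround: let only one Sidekiq process run the orphan reaper.
# UNIQUE_JOBS_REAPER is an invented env flag; config.reaper = nil is assumed
# to disable reaping on the other processes.
SidekiqUniqueJobs.configure do |config|
  config.reaper = ENV["UNIQUE_JOBS_REAPER"] == "true" ? :ruby : nil
end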


glennfu commented Oct 25, 2019

Good to know. Also, maybe give some consideration to the shutdown procedure. I scaled down to a single process and thought I was OK, but as I re-deployed there were technically 2 running at the same time for a short period. It would be nice if this also didn't cause the kaboom.


glennfu commented Oct 25, 2019

Also just FYI I've rolled back to 5.0.11 and everything seems good there.

@mhenrixon (Owner)

@glennfu I will fix the issue with multiple reapers now.


glennfu commented Jan 8, 2020

I just tested 7.0.0.beta8 on my staging server, and this problem quickly resurfaced. I even went into the UI to clear all the locks and test from there. Here's some log output:

> HealthCheckJob.perform_later
2020-01-08T20:26:46.503Z pid=7 tid=grjs31lez uniquejobs=client =uniquejobs:d5d2f1e1b811692bee3b6de7841d1bda WARN: Timed out after 0s while waiting for primed token (digest: uniquejobs:d5d2f1e1b811692bee3b6de7841d1bda, job_id: 4ae8589125a7d995709c9d20)
Enqueued HealthCheckJob (Job ID: 34a87e34-f635-44ed-9ac2-471dcd480715) to Sidekiq(critical)
=> #<HealthCheckJob:0x000056083bf9d4b0 @arguments=[], @job_id="7aa2b187-5711-429a-b0b1-2c1c4cccf8a4", @queue_name="critical", @priority=nil, @provider_job_id="90ff63095a321d7d855443ef">
> Sidekiq::Queue.new.map{|job| job.args[0]}
=> []

As you can see, uniquejobs spits out a warning, and then nothing enters the queue. If I run HealthCheckJob.perform_now it works fine. Sidekiq and app logs confirm no new job was entered or processed.

@mhenrixon any new ideas here?

@mhenrixon (Owner)

No new ideas yet. I'm struggling a bit with how to debug this. I found some issues while running things using pry, but it is cumbersome to say the least. Locally I found that I have to use Rails config.cache_classes = true and eager load everything, or sidekiq won't execute any middleware at all.

Then I was out with back pain most of December, my wife had surgery and I've not had a chance to do anything except daddying until this week.

I'll get back to it soon but I'm open to suggestions. Could caching of classes and eager loading be an issue at all on your staging environment?


glennfu commented Jan 9, 2020

My Staging is a mirror of Production, and I can reproduce the issue on both. cache_classes=true on both. I can reproduce it more easily on Production, where there's lots more jobs and activity.

Here's an interesting bit happening right now on Production (sidekiq_unique_jobs v5, although I can get similar behavior on staging with v7beta8): I have a job, let's call it ThingSyncJob. It takes 1 argument, the account_id. So for 30+ accounts I do ThingSyncJob.perform_later(1) or ..._later(29), etc. Recently, one of them got stuck. So ThingSyncJob.perform_later(10) will NOT enqueue a job. Absolutely doesn't get in and doesn't run. There's not enough debug output for me to get any clues as to why. However, ThingSyncJob.perform_later(9) and ThingSyncJob.perform_later(11) work just fine. Every other id works. ThingSyncJob.perform_now(10) DOES work, but ThingSyncJob.perform_later(10) never will. Even days later, I'll have to go and run ThingSyncJob.perform_now(10) periodically by hand to get it to happen while I try and investigate this bug. ThingSyncJob.perform_later(10) still does not work right now.
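
For what it's worth, one thing I can do to poke at it is list whatever unique-job keys are sitting in Redis. The key pattern below is just a guess based on the uniquejobs prefix in the logs above:

# Rough check for leftover unique-job keys; the pattern is assumed from the
# "uniquejobs" prefix seen in the log output earlier in this thread.
Sidekiq.redis do |conn|
  conn.keys("uniquejobs*").sort.each do |key|
    puts format("%-70s ttl=%s", key, conn.ttl(key))
  end
end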

Do you have any ideas of things I can do to dive more deeply into this? Debug flags to flip maybe? Also, best of luck to you with the home life! I hope it gets easier for you soon.

I'm going to deploy v7beta8 to Production shortly (even though I see similar problems on Staging) and see if the new stuff gives me any more insight.

mhenrixon (Owner) commented Jan 9, 2020

@glennfu hot tip is to set debug_lua: true, max_history: 10_000 and lock_info: true in the v7 config. Those should give a little more input. If it's possible for you to follow the Redis log, you can track all commands that are being sent. The logs contain the name of the script that is called, e.g. lock.lua, delete.lua etc.

Lock info contains information about each lock, and the history is like an audit trail for each lock digest.


glennfu commented Jan 9, 2020

I just tried with these settings:

  SidekiqUniqueJobs.configure do |config|
    config.enabled         = true
    config.debug_lua       = true
    config.lock_info       = true
    config.default_lock_ttl        = nil
    config.default_lock_timeout    = 10.minutes
    config.logger          = Sidekiq.logger
    config.max_history     = 10_000
    config.reaper          = :lua
    config.reaper_count    = 100
    config.reaper_interval = 10
    config.reaper_timeout  = 5
  end

Unfortunately, I instantly had a massive surge of duplicated jobs. I usually float around 20-40 jobs in the queue at any given time, and after a few minutes I had 470 jobs. My clock process kept crashing (no error in the logs) and I didn't see any output from sidekiq-unique-jobs in any of my log files, even though I'm watching everything, including the Sidekiq logs, where I expected to see it. I had to revert rather quickly after that. Sorry! I don't think I can try that again, it was too messed up.


glennfu commented Jan 9, 2020

On the upside, the process of upgrading/downgrading cleared out my locks, which means ThingSyncJob.perform_later(10) works again for the time being. In v6+, the web ui has a way to list and delete locks. Is there a way in v5 I can say "delete all locks"? That would be super handy when this bug pops up on me again.

EDIT: I found SidekiqUniqueJobs::Util.del('*', 1000, false) in there, which seems to do it as far as I can tell.


glennfu commented Jan 9, 2020

OK I felt bold and tried a bit slower. I deployed it again without my clock, and manually enqueued some jobs for testing. With config.default_lock_timeout = 10.minutes it would literally wait until the considered job was finished and cleared the queue, and then enqueue the job again. With 3.seconds it would wait 3 seconds, and then enqueue it again, despite the fact that the job was already in the queue. So with 3.seconds and enqueuing a batch of jobs multiple times, I ended up with duplicates in my queue.

I'm using this configuration:

  Sidekiq.default_worker_options = {
    unique: :until_executed, # :until_executing,
    unique_args: ->(args) { args.first.except('job_id') }
  }

I haven't changed that config setup in pretty much ever, and it works correctly on v5.

EDIT: I also tried with this, still get duplicated jobs:

  Sidekiq.default_worker_options = {
    unique: :until_executed,
    unique_args: ->(args) { args }
  }

@mhenrixon (Owner)

@cpiazza I accidentally introduced a bug in 6.0.15 that prevented unlock from working when using unique: instead of lock:. This has since been fixed in 6.0.19, and 6.0.20 fixed another similar issue that people had with accidentally using lock_ttl: instead of lock_expiration:.

I've added a warning when installing 6.0.15-18 by re-releasing those versions of the gem with a post_install message.

Out of curiosity, @glennfu, what is the reason you limit yourself to ActiveJob like that? Legacy system? I don't mean to be a dick, I just feel like it is like owning a sports car (Sidekiq) and driving it like a Ford Fiat (ActiveJob).

There is no support for ActiveJob anymore; due to all the weird issues people posted, I decided to put that focus towards pure sidekiq usage instead. Mostly because I am not using (and most likely never will use) ActiveJob, it becomes too hard to make it work properly.

If someone wants to help me get the gem working OK for you ActiveJob users, I'm more than happy to reconsider, but I need someone to commit some time to that.

As for unique_args: and unique:, in v7 they have been renamed to lock_args: and lock:, which could explain why that doesn't work.

If you're using ActiveJob, it handles sidekiq arguments a little differently. I think the first argument is the ActiveJob job_id or something like that, so you might want to skip that one as well.
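
Put differently, here is a sketch of the earlier default_worker_options with the v7 option names (the lambda is copied from the config posted above):

# v7 spelling of the options above: unique: becomes lock: and unique_args:
# becomes lock_args:.
Sidekiq.default_worker_options = {
  lock: :until_executed,
  lock_args: ->(args) { args.first.except('job_id') }
}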


radilr1 commented Oct 7, 2020

I am running on 7.0.0beta22 and got this error:

until_and_while_executing=uniquejobs:6579078eee7b12eb46e1a08399196e83 WARN: Timed out after 0s while waiting for primed token (digest: uniquejobs:6579078eee7b12eb46e1a08399196e83, job_id: 20b5aa8952067bb50e007265)

Then nothing happened; no job entered the queue.

UPDATE:
I could reproduce it locally: the dead locks are left behind on server restarts and are never released. Adding lock_ttl to the config ensures that the dead locks will expire after a period of time.
Also, the readme section https://github.com/mhenrixon/sidekiq-unique-jobs#cleanup-dead-locks doesn't exactly work: delete_by_digest is an instance method, so it needs to be:
SidekiqUniqueJobs::Digests.new.delete_by_digest(digest) if digest
Also, I could never get digest to be present. I don't know exactly why.
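
For reference, a sketch of the readme's cleanup snippet with that correction applied. The 'lock_digest' key is an assumption; the job payload earlier in this thread shows 'unique_digest', so check which one your version sets:

# Dead-lock cleanup sketch with the instance-method call fixed.
Sidekiq.configure_server do |config|
  config.death_handlers << ->(job, _ex) do
    digest = job['lock_digest'] || job['unique_digest'] # key name varies by version (assumption)
    SidekiqUniqueJobs::Digests.new.delete_by_digest(digest) if digest
  end
end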

@mhenrixon (Owner)

@radilr1 if you configure according to the readme for v7.0.1, it will now add a death handler automatically for you.

I strongly recommend having the reaper running as well to reduce the impact of orphaned locks.

@mhenrixon (Owner)

@cpiazza @glennfu I'd love to help you guys with an upgrade to v7. See #571 for information; I could make myself available for some pair programming as well if you are interested.

v7.0.1 is the place to be!


glennfu commented May 23, 2022

@mhenrixon I was finally able to upgrade this and simultaneously replace ActiveJob with Sidekiq::Worker (in most places) about a month ago. So far it's been smooth sailing. Thanks for your hard work on this!
