Solid Queue is not retrying job #110

zainonrails · 2024-01-02T00:05:33Z

I am trying to retry a failed job by using retry_on in the job class but it is not working consistently.

class RequestTestimonialJob < ApplicationJob
  queue_as :default
  retry_on StandardError, attempts: 3, priority: 0
  
  def perform(user_id)
    raise StandardError
  end
end

I have also overridden

config.solid_queue.on_thread_error = ->(exception) { Bugsnag.notify(exception) }

but nothing happens. I have tried with and without it. The behaviour seems unpredictable or it is just me.

It also throws this error sometimes:

No live threads left. Deadlock? (fatal)
04:53:43 solid.1 | 6 threads, 6 sleeps current:0x0000000113428570 main thread:0x0000000124004080
04:53:43 solid.1 | * #<Thread:0x0000000122834140 sleep_forever>
04:53:43 solid.1 |    rb_thread_t:0x0000000124004080 native:0x00000001f4bd1e00 int:0
04:53:43 solid.1 |    
04:53:43 solid.1 | * #<Thread:0x00000001234f93d0@DEBUGGER__::SESSION@server /Users/zain/.rvm/gems/ruby-3.0.0@devtree/gems/debug-1.7.2/lib/debug/session.rb:179 sleep_forever>
04:53:43 solid.1 |    rb_thread_t:0x00000001118b04c0 native:0x000000016e587000 int:0
04:53:43 solid.1 |    
04:53:43 solid.1 | * #<Thread:0x00000001234e8968@worker-1 /Users/zain/.rvm/gems/ruby-3.0.0@devtree/gems/concurrent-ruby-1.2.2/lib/concurrent-ruby/concurrent/executor/ruby_thread_pool_executor.rb:332 sleep_forever>
04:53:43 solid.1 |    rb_thread_t:0x00000001118b0db0 native:0x000000016e793000 int:0
04:53:43 solid.1 |    
04:53:43 solid.1 | * #<Thread:0x0000000123543d68 /Users/zain/.rvm/gems/ruby-3.0.0@devtree/gems/activerecord-7.0.4.3/lib/active_record/connection_adapters/abstract/connection_pool/reaper.rb:40 sleep_forever>
04:53:43 solid.1 |    rb_thread_t:0x0000000113428570 native:0x000000016e99f000 int:0
04:53:43 solid.1 |    
04:53:43 solid.1 | * #<Thread:0x0000000123b86dd8@worker-1 /Users/zain/.rvm/gems/ruby-3.0.0@devtree/gems/concurrent-ruby-1.2.2/lib/concurrent-ruby/concurrent/executor/ruby_thread_pool_executor.rb:332 sleep_forever>
04:53:43 solid.1 |    rb_thread_t:0x0000000113580970 native:0x000000016ebab000 int:0 mutex:0x0000000124004500 cond:1
04:53:43 solid.1 |    
04:53:43 solid.1 | * #<Thread:0x0000000123b61b28@io-worker-1 /Users/zain/.rvm/gems/ruby-3.0.0@devtree/gems/concurrent-ruby-1.2.2/lib/concurrent-ruby/concurrent/executor/ruby_thread_pool_executor.rb:332 sleep_forever>
04:53:43 solid.1 |    rb_thread_t:0x0000000114c25ae0 native:0x000000016edb7000 int:0 mutex:0x0000000113580bf0 cond:1

How can I make sure the job is retried after any exception? Any help would be appreciated.

Thanks


rosa · 2024-01-02T09:21:50Z

Hey @zainonrails, what's your workers and dispatchers configuration? How are you enqueuing the job?

zainonrails · 2024-01-02T12:25:23Z

@rosa I am enqueuing the job simply with MyJob.perform_later(id).

After it adds records to the solid_queue_jobs and solid_queue_ready_executions tables, it shows the logs below.

[SolidQueue] Enqueued job {:queue_name=>"development_default", :active_job_id=>"b720f935-3277-403f-85a3-782794d79937", :priority=>nil, :scheduled_at=>Tue, 02 Jan 2024 00:59:10.827070000 UTC +00:00, :class_name=>"SubmitTestimonialJob", :arguments=>{"job_class"=>"SubmitTestimonialJob", "job_id"=>"b720f935-3277-403f-85a3-782794d79937", "provider_job_id"=>nil, "queue_name"=>"development_default", "priority"=>nil, "arguments"=>[49], "executions"=>0, "exception_executions"=>{}, "locale"=>"en", "timezone"=>"UTC", "enqueued_at"=>"2024-01-02T00:59:10Z"}, :concurrency_key=>nil}
05:59:10 web.1   | [ActiveJob] Enqueued SubmitTestimonialJob (Job ID: b720f935-3277-403f-85a3-782794d79937) to SolidQueue(development_default) with arguments: 49

production:
  dispatchers:
    - polling_interval: 5
      batch_size: 100
  workers:
    - queues: '*'
      threads: 2
      processes: 1
      polling_interval: 5

development:
  dispatchers:
    - polling_interval: 1
      batch_size: 10
  workers:
    - queues: "*"
      threads: 3
      processes: 1
      polling_interval: 1

rosa · 2024-01-02T14:22:41Z

Cool, and when you say "it is not working consistently" and "The behaviour seems unpredictable", what do you mean? Could you be more specific?

zainonrails · 2024-01-02T14:29:51Z

So, I am throwing an exception in my job to see whether it picks up the retry settings.

After the job throws the error, I quickly update the job code to remove the exception to see if it gets retried, but unfortunately that doesn't happen. Upon restarting the server, it picks up all the jobs.

Maybe I am testing this behaviour the wrong way. Could you tell me whether the retry settings are picked up when a worker hits an error in the job?

Should I override on_thread_error with something that makes sure the job is retried?

rosa · 2024-01-02T14:56:14Z

"After the job throws the error, I quickly update the job code to remove the exception to see if it gets retried, but unfortunately that doesn't happen. Upon restarting the server, it picks up all the jobs."

Hmm... my guess is that the 3 retries happen before you get a chance to update the job:

retry_on StandardError, attempts: 3, priority: 0

This would be using the default wait time between retries, which is 3 seconds, so after roughly 9 seconds, the 3 attempts would have been done and your job would fail permanently. You should see it in the solid_queue_failed_executions table, and you can check its arguments by doing something like:

$ bin/rails c
>> SolidQueue::FailedExecution.last.job
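
To make the timing concrete, here is a rough sketch in plain Ruby (no Rails needed) of when each attempt would start. It assumes a fixed wait between retries and ignores jitter and the extra delay that dispatcher and worker polling add on top, so treat the numbers as a lower bound. The helper name attempt_offsets is made up for illustration and is not part of Active Job or Solid Queue.

```ruby
# Hypothetical helper, not part of Active Job or Solid Queue.
# Returns the offset in seconds (counted from the first run) at which each
# attempt would start, assuming a fixed wait and no polling delay or jitter.
def attempt_offsets(attempts:, wait:)
  (0...attempts).map { |n| n * wait }
end

default_wait = attempt_offsets(attempts: 3, wait: 3)   # 3-second default wait
one_minute   = attempt_offsets(attempts: 3, wait: 60)  # wait: 1.minute

p default_wait # => [0, 3, 6]
p one_minute   # => [0, 60, 120]
```

With the default wait, all three attempts are over within a few seconds, which leaves very little time to edit the job code in between. A longer wait makes each retry much easier to observe.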

zainonrails · 2024-01-02T14:59:51Z

I did try it with longer wait times, like 5 and 10 seconds, but let me observe it with even longer waits, confirm, and report back.

Thanks @rosa

virolea · 2024-01-06T10:31:31Z

I too wanted to check if jobs were properly retried in my setup. It turns out the first retry occurrences happen quite quickly, as Rosa suggested.

You can also check the failed job's arguments to see if any retry was performed. Active Job increments the executions value.

job_id = YOUR_JOB_ID
@job = SolidQueue::Job.find(job_id)
@job.arguments["executions"] # This returns the number of times this job was executed
@job.arguments["exception_executions"] # This returns the breakdown of executions per exceptions

Here's an example for one of my jobs:

{
  "executions"=>12, 
  "exception_executions"=>{"[Exception]"=>8, "[ZeroDivisionError]"=>4}
}
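
As a small sketch of how those counters could be read together, the plain-Ruby helper below summarises a hash shaped like the one above. The name retry_summary is hypothetical, and the sample hash is copied from the example rather than loaded from SolidQueue::Job, so this only illustrates the shape of the data.

```ruby
# Hypothetical helper: summarise the retry counters found in a serialized
# job's arguments hash ("executions" and "exception_executions").
def retry_summary(arguments)
  per_exception = arguments.fetch("exception_executions", {})
  {
    executions: arguments.fetch("executions", 0),
    by_exception: per_exception,
    exception_total: per_exception.values.sum
  }
end

# Sample data copied from the example above.
sample = {
  "executions" => 12,
  "exception_executions" => { "[Exception]" => 8, "[ZeroDivisionError]" => 4 }
}

summary = retry_summary(sample)
p summary[:executions]      # => 12
p summary[:exception_total] # => 12
```

In a console, the same helper could be given @job.arguments from the snippet above.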

zainonrails · 2024-01-09T16:21:55Z

@rosa @virolea

This time I experimented without changing the code in between, with a 1-minute wait before the retry.

The job was retried as expected and the number of executions was also updated. Just one question though: does attempts: 3 mean the job runs a total of 3 times, or that it is retried 3 times?

What I observed is that it runs for a total of the number we set, rather than being retried that many times, which is why executions is attempts - 1.

rosa · 2024-01-09T21:50:00Z

Thanks @virolea, @zainonrails!

"Just one question though: does attempts: 3 mean the job runs a total of 3 times, or that it is retried 3 times?"

A total of 3 times according to Active Job's attempts parameter.
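
The counting can be illustrated with a simplified model in plain Ruby. This is not Active Job's actual implementation, and run_with_attempts is a made-up name. It only mimics the rule that a job is retried while the number of executions so far is below attempts.

```ruby
# Simplified model of how attempts: is counted. NOT Active Job's real code.
# The block stands in for the job's perform method and receives the
# current execution number.
def run_with_attempts(attempts:)
  executions = 0
  begin
    executions += 1
    yield executions
    { status: :succeeded, executions: executions }
  rescue StandardError => error
    # Retry only while the total number of runs is still below attempts.
    retry if executions < attempts
    { status: :failed, executions: executions, error: error.class.name }
  end
end

# A job that always raises: 1 initial run + 2 retries = 3 runs in total.
always_failing = run_with_attempts(attempts: 3) { raise StandardError, "boom" }
p always_failing[:executions] # => 3

# A job that only fails on its first run succeeds on the second.
fixed_on_second = run_with_attempts(attempts: 3) { |n| raise "boom" if n < 2 }
p fixed_on_second[:executions] # => 2
```

So attempts: 3 corresponds to at most two retries after the initial run.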

zainonrails · 2024-01-10T13:48:48Z

Thanks for the help everyone, closing this issue.

zainonrails closed this as completed Jan 10, 2024
