New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add retry_on/discard_on for better exception handling #25991
Changes from 13 commits
8b5c04e
5ce59f4
4139b14
504a7d0
b00214d
8457e5e
f931290
779148d
a4fc7dc
08a92d4
0be5d5d
9d8d4ee
3118911
111227c
2762ebd
7efd77f
b53da9e
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,117 @@ | ||
require 'active_support/core_ext/numeric/time' | ||
|
||
module ActiveJob | ||
# Provides behavior for retrying and discarding jobs on exceptions. | ||
module Exceptions | ||
extend ActiveSupport::Concern | ||
|
||
module ClassMethods | ||
# Catch the exception and reschedule job for re-execution after so many seconds, for a specific number of attempts. | ||
# If the exception keeps getting raised beyond the specified number of attempts, the exception is allowed to | ||
# bubble up to the underlying queuing system, which may have its own retry mechanism or place it in a | ||
# holding queue for inspection. | ||
# | ||
# You can also pass a block that'll be invoked if the retry attempts fail for custom logic rather than letting | ||
# the exception bubble up. | ||
# | ||
# ==== Options | ||
# * <tt>:wait</tt> - Re-enqueues the job with a delay specified either in seconds (default: 3 seconds), | ||
# as a computing proc that the number of executions so far as an argument, or as a symbol reference of | ||
# <tt>:exponentially_longer<>, which applies the wait algorithm of <tt>(executions ** 4) + 2</tt> | ||
# (first wait 3s, then 18s, then 83s, etc) | ||
# * <tt>:attempts</tt> - Re-enqueues the job the specified number of times (default: 5 attempts) | ||
# * <tt>:queue</tt> - Re-enqueues the job on a different queue | ||
# * <tt>:priority</tt> - Re-enqueues the job with a different priority | ||
# | ||
# ==== Examples | ||
# | ||
# class RemoteServiceJob < ActiveJob::Base | ||
# retry_on CustomAppException # defaults to 3s wait, 5 attempts | ||
# retry_on AnotherCustomAppException, wait: ->(executions) { executions * 2 } | ||
# retry_on(YetAnotherCustomAppException) do |exception| | ||
# ExceptionNotifier.caught(exception) | ||
# end | ||
# retry_on ActiveRecord::StatementInvalid, wait: 5.seconds, attempts: 3 | ||
# retry_on Net::OpenTimeout, wait: :exponentially_longer, attempts: 10 | ||
# | ||
# def perform(*args) | ||
# # Might raise CustomAppException, AnotherCustomAppException, or YetAnotherCustomAppException for something domain specific | ||
# # Might raise ActiveRecord::StatementInvalid when a local db deadlock is detected | ||
# # Might raise Net::OpenTimeout when the remote service is down | ||
# end | ||
# end | ||
def retry_on(exception, wait: 3.seconds, attempts: 5, queue: nil, priority: nil) | ||
rescue_from exception do |error| | ||
if executions < attempts | ||
logger.error "Retrying #{self.class} in #{wait} seconds, due to a #{exception}. The original exception was #{error.cause.inspect}." | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Consider showing This brushes with my "error reporting in general" project anyway, so possibly ignore for now. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Not sure I follow? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Because of inheritance, |
||
retry_job wait: determine_delay(wait), queue: queue, priority: priority | ||
else | ||
if block_given? | ||
yield exception | ||
else | ||
logger.error "Stopped retrying #{self.class} due to a #{exception}, which reoccurred on #{executions} attempts. The original exception was #{error.cause.inspect}." | ||
raise error | ||
end | ||
end | ||
end | ||
end | ||
|
||
# Discard the job with no attempts to retry, if the exception is raised. This is useful when the subject of the job, | ||
# like an Active Record, is no longer available, and the job is thus no longer relevant. | ||
# | ||
# ==== Example | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Nitpick: This says "Example" while the other docs say "Examples" (plural). There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If there's only 1 example, it doesn't make sense to me to pluralize. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The other cases only have one example as well. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'd rather fix those, then. Use plural when there are multiple examples and singular when just one. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That's what I meant to say, just seems I forgot to actually put the words down 👍 |
||
# | ||
# class SearchIndexingJob < ActiveJob::Base | ||
# discard_on ActiveJob::DeserializationError | ||
# | ||
# def perform(record) | ||
# # Will raise ActiveJob::DeserializationError if the record can't be deserialized | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think our practice is to add a period to comments. Same goes for the other examples. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I haven't been using that form consistently. Usually only use periods if there are multiple sentences. |
||
# end | ||
# end | ||
def discard_on(exception) | ||
rescue_from exception do |error| | ||
logger.error "Discarded #{self.class} due to a #{exception}. The original exception was #{error.cause.inspect}." | ||
end | ||
end | ||
end | ||
|
||
# Reschedules the job to be re-executed. This is useful in combination | ||
# with the +rescue_from+ option. When you rescue an exception from your job | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think you meant "the +rescue_from+ method" here, not option, no ? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It's not meant as a programmatic option, but rather as "an option for dealing with exceptions". Other options include discard_on, retry_on. |
||
# you can ask Active Job to retry performing your job. | ||
# | ||
# ==== Options | ||
# * <tt>:wait</tt> - Enqueues the job with the specified delay in seconds | ||
# * <tt>:wait_until</tt> - Enqueues the job at the time specified | ||
# * <tt>:queue</tt> - Enqueues the job on the specified queue | ||
# * <tt>:priority</tt> - Enqueues the job with the specified priority | ||
# | ||
# ==== Examples | ||
# | ||
# class SiteScraperJob < ActiveJob::Base | ||
# rescue_from(ErrorLoadingSite) do | ||
# retry_job queue: :low_priority | ||
# end | ||
# | ||
# def perform(*args) | ||
# # raise ErrorLoadingSite if cannot scrape | ||
# end | ||
# end | ||
def retry_job(options = {}) | ||
enqueue options | ||
end | ||
|
||
private | ||
def determine_delay(seconds_or_algorithm) | ||
case seconds_or_algorithm | ||
when :exponentially_longer | ||
(executions ** 4) + 2 | ||
when Integer | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. How does this fare with fixnums, bignums and rationals to name a few on Ruby 2.3? Does it catch durations as well, if I passed in 1.hour? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Integer is the parent class of those. So should be good. But no, doesn't catch duration. Could add that as a separate clause and .to_i it. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 👍, since our documentation uses 3.seconds, I'd expect full duration support as a doc reader. |
||
seconds = seconds_or_algorithm | ||
seconds | ||
when Proc | ||
algorithm = seconds_or_algorithm | ||
algorithm.call(executions) | ||
end | ||
end | ||
end | ||
end |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,90 @@ | ||
require 'helper' | ||
require 'jobs/retry_job' | ||
|
||
class ExceptionsTest < ActiveSupport::TestCase | ||
setup do | ||
JobBuffer.clear | ||
end | ||
|
||
test "successfully retry job throwing exception against defaults" do | ||
RetryJob.perform_later 'DefaultsError', 5 | ||
|
||
assert_equal [ | ||
"Raised DefaultsError for the 1st time", | ||
"Raised DefaultsError for the 2nd time", | ||
"Raised DefaultsError for the 3rd time", | ||
"Raised DefaultsError for the 4th time", | ||
"Successfully completed job" ], JobBuffer.values | ||
end | ||
|
||
test "successfully retry job throwing exception against higher limit" do | ||
RetryJob.perform_later 'ShortWaitTenAttemptsError', 9 | ||
assert_equal 9, JobBuffer.values.count | ||
end | ||
|
||
test "failed retry job when exception kept occurring against defaults" do | ||
begin | ||
RetryJob.perform_later 'DefaultsError', 6 | ||
assert_equal "Raised DefaultsError for the 5th time", JobBuffer.last_value | ||
rescue DefaultsError | ||
pass | ||
end | ||
end | ||
|
||
test "failed retry job when exception kept occurring against higher limit" do | ||
begin | ||
RetryJob.perform_later 'ShortWaitTenAttemptsError', 11 | ||
assert_equal "Raised ShortWaitTenAttemptsError for the 10th time", JobBuffer.last_value | ||
rescue ShortWaitTenAttemptsError | ||
pass | ||
end | ||
end | ||
|
||
test "discard job" do | ||
RetryJob.perform_later 'DiscardableError', 2 | ||
assert_equal "Raised DiscardableError for the 1st time", JobBuffer.last_value | ||
end | ||
|
||
test "custom handling of job that exceeds retry attempts" do | ||
RetryJob.perform_later 'CustomCatchError', 6 | ||
assert_equal "Dealt with a job that failed to retry in a custom way", JobBuffer.last_value | ||
end | ||
end | ||
|
||
class ExponentiallyBackoffExceptionsTest < ActiveJob::TestCase | ||
setup do | ||
JobBuffer.clear | ||
end | ||
|
||
test "exponentially retrying job" do | ||
travel_to Time.now | ||
|
||
perform_enqueued_jobs do | ||
assert_performed_with at: (Time.now + 3.seconds).to_i do | ||
assert_performed_with at: (Time.now + 18.seconds).to_i do | ||
assert_performed_with at: (Time.now + 83.seconds).to_i do | ||
assert_performed_with at: (Time.now + 258.seconds).to_i do | ||
RetryJob.perform_later 'ExponentialWaitTenAttemptsError', 5 | ||
end | ||
end | ||
end | ||
end | ||
end | ||
end | ||
|
||
test "custom wait retrying job" do | ||
travel_to Time.now | ||
|
||
perform_enqueued_jobs do | ||
assert_performed_with at: (Time.now + 2.seconds).to_i do | ||
assert_performed_with at: (Time.now + 4.seconds).to_i do | ||
assert_performed_with at: (Time.now + 6.seconds).to_i do | ||
assert_performed_with at: (Time.now + 8.seconds).to_i do | ||
RetryJob.perform_later 'CustomWaitTenAttemptsError', 5 | ||
end | ||
end | ||
end | ||
end | ||
end | ||
end | ||
end |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,27 @@ | ||
require_relative '../support/job_buffer' | ||
require 'active_support/core_ext/integer/inflections' | ||
|
||
class DefaultsError < StandardError; end | ||
class ShortWaitTenAttemptsError < StandardError; end | ||
class ExponentialWaitTenAttemptsError < StandardError; end | ||
class CustomWaitTenAttemptsError < StandardError; end | ||
class CustomCatchError < StandardError; end | ||
class DiscardableError < StandardError; end | ||
|
||
class RetryJob < ActiveJob::Base | ||
retry_on DefaultsError | ||
retry_on ShortWaitTenAttemptsError, wait: 1.second, attempts: 10 | ||
retry_on ExponentialWaitTenAttemptsError, wait: :exponentially_longer, attempts: 10 | ||
retry_on CustomWaitTenAttemptsError, wait: ->(executions) { executions * 2 }, attempts: 10 | ||
retry_on(CustomCatchError) { |exception| JobBuffer.add("Dealt with a job that failed to retry in a custom way") } | ||
discard_on DiscardableError | ||
|
||
def perform(raising, attempts) | ||
if executions < attempts | ||
JobBuffer.add("Raised #{raising} for the #{executions.ordinalize} time") | ||
raise raising.constantize | ||
else | ||
JobBuffer.add("Successfully completed job") | ||
end | ||
end | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally, I'd rearrange these arguments such that
wait
,queue
andpriority
are together withattempts
being last. That gives me the nice pleasure of seeing the same grouping as they have in the method body and brings them closer to theirretry_job
origin.I'd do the same in the docs just above. 😁
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see that general point, but I think it's trumped by grouping default-value parameters together and option-value parameters together. wait/attempts are the overwhelmingly most likely to be used. Queue/priority much less so.