Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add retry_on/discard_on for better exception handling #25991

Merged
merged 17 commits into from Aug 2, 2016
Merged
Show file tree
Hide file tree
Changes from 13 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
2 changes: 2 additions & 0 deletions activejob/lib/active_job/base.rb
Expand Up @@ -5,6 +5,7 @@
require 'active_job/enqueuing'
require 'active_job/execution'
require 'active_job/callbacks'
require 'active_job/exceptions'
require 'active_job/logging'
require 'active_job/translation'

Expand Down Expand Up @@ -62,6 +63,7 @@ class Base
include Enqueuing
include Execution
include Callbacks
include Exceptions
include Logging
include Translation

Expand Down
6 changes: 6 additions & 0 deletions activejob/lib/active_job/core.rb
Expand Up @@ -24,6 +24,9 @@ module Core
# ID optionally provided by adapter
attr_accessor :provider_job_id

# Number of times this job has been executed (which increments on every retry, like after an exception).
attr_accessor :executions

# I18n.locale to be used during the job.
attr_accessor :locale
end
Expand Down Expand Up @@ -68,6 +71,7 @@ def initialize(*arguments)
@job_id = SecureRandom.uuid
@queue_name = self.class.queue_name
@priority = self.class.priority
@executions = 0
end

# Returns a hash with the job data that can safely be passed to the
Expand All @@ -79,6 +83,7 @@ def serialize
'queue_name' => queue_name,
'priority' => priority,
'arguments' => serialize_arguments(arguments),
'executions' => executions,
'locale' => I18n.locale.to_s
}
end
Expand Down Expand Up @@ -109,6 +114,7 @@ def deserialize(job_data)
self.queue_name = job_data['queue_name']
self.priority = job_data['priority']
self.serialized_arguments = job_data['arguments']
self.executions = job_data['executions']
self.locale = job_data['locale'] || I18n.locale.to_s
end

Expand Down
27 changes: 1 addition & 26 deletions activejob/lib/active_job/enqueuing.rb
@@ -1,7 +1,7 @@
require 'active_job/arguments'

module ActiveJob
# Provides behavior for enqueuing and retrying jobs.
# Provides behavior for enqueuing jobs.
module Enqueuing
extend ActiveSupport::Concern

Expand All @@ -24,31 +24,6 @@ def job_or_instantiate(*args)
end
end

# Reschedules the job to be re-executed. This is useful in combination
# with the +rescue_from+ option. When you rescue an exception from your job
# you can ask Active Job to retry performing your job.
#
# ==== Options
# * <tt>:wait</tt> - Enqueues the job with the specified delay
# * <tt>:wait_until</tt> - Enqueues the job at the time specified
# * <tt>:queue</tt> - Enqueues the job on the specified queue
# * <tt>:priority</tt> - Enqueues the job with the specified priority
#
# ==== Examples
#
# class SiteScraperJob < ActiveJob::Base
# rescue_from(ErrorLoadingSite) do
# retry_job queue: :low_priority
# end
#
# def perform(*args)
# # raise ErrorLoadingSite if cannot scrape
# end
# end
def retry_job(options={})
enqueue options
end

# Enqueues the job to be performed by the queue adapter.
#
# ==== Options
Expand Down
117 changes: 117 additions & 0 deletions activejob/lib/active_job/exceptions.rb
@@ -0,0 +1,117 @@
require 'active_support/core_ext/numeric/time'

module ActiveJob
# Provides behavior for retrying and discarding jobs on exceptions.
module Exceptions
extend ActiveSupport::Concern

module ClassMethods
# Catch the exception and reschedule job for re-execution after so many seconds, for a specific number of attempts.
# If the exception keeps getting raised beyond the specified number of attempts, the exception is allowed to
# bubble up to the underlying queuing system, which may have its own retry mechanism or place it in a
# holding queue for inspection.
#
# You can also pass a block that'll be invoked if the retry attempts fail for custom logic rather than letting
# the exception bubble up.
#
# ==== Options
# * <tt>:wait</tt> - Re-enqueues the job with a delay specified either in seconds (default: 3 seconds),
# as a computing proc that the number of executions so far as an argument, or as a symbol reference of
# <tt>:exponentially_longer<>, which applies the wait algorithm of <tt>(executions ** 4) + 2</tt>
# (first wait 3s, then 18s, then 83s, etc)
# * <tt>:attempts</tt> - Re-enqueues the job the specified number of times (default: 5 attempts)
# * <tt>:queue</tt> - Re-enqueues the job on a different queue
# * <tt>:priority</tt> - Re-enqueues the job with a different priority
#
# ==== Examples
#
# class RemoteServiceJob < ActiveJob::Base
# retry_on CustomAppException # defaults to 3s wait, 5 attempts
# retry_on AnotherCustomAppException, wait: ->(executions) { executions * 2 }
# retry_on(YetAnotherCustomAppException) do |exception|
# ExceptionNotifier.caught(exception)
# end
# retry_on ActiveRecord::StatementInvalid, wait: 5.seconds, attempts: 3
# retry_on Net::OpenTimeout, wait: :exponentially_longer, attempts: 10
#
# def perform(*args)
# # Might raise CustomAppException, AnotherCustomAppException, or YetAnotherCustomAppException for something domain specific
# # Might raise ActiveRecord::StatementInvalid when a local db deadlock is detected
# # Might raise Net::OpenTimeout when the remote service is down
# end
# end
def retry_on(exception, wait: 3.seconds, attempts: 5, queue: nil, priority: nil)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I'd rearrange these arguments such that wait, queue and priority are together with attempts being last. That gives me the nice pleasure of seeing the same grouping as they have in the method body and brings them closer to their retry_job origin.

I'd do the same in the docs just above. 😁

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see that general point, but I think it's trumped by grouping default-value parameters together and option-value parameters together. wait/attempts are the overwhelmingly most likely to be used. Queue/priority much less so.

rescue_from exception do |error|
if executions < attempts
logger.error "Retrying #{self.class} in #{wait} seconds, due to a #{exception}. The original exception was #{error.cause.inspect}."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider showing error instead of exception; the latter is more likely to be a vague parent class. "The original exception was nil." seems likely to be confusing, too.

This brushes with my "error reporting in general" project anyway, so possibly ignore for now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I follow? exception is the class, error is the object.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because of inheritance, error could be an instance of a very specific exception class, while exception could be something so vague as RuntimeError.

retry_job wait: determine_delay(wait), queue: queue, priority: priority
else
if block_given?
yield exception
else
logger.error "Stopped retrying #{self.class} due to a #{exception}, which reoccurred on #{executions} attempts. The original exception was #{error.cause.inspect}."
raise error
end
end
end
end

# Discard the job with no attempts to retry, if the exception is raised. This is useful when the subject of the job,
# like an Active Record, is no longer available, and the job is thus no longer relevant.
#
# ==== Example
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: This says "Example" while the other docs say "Examples" (plural).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there's only 1 example, it doesn't make sense to me to pluralize.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other cases only have one example as well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather fix those, then. Use plural when there are multiple examples and singular when just one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's what I meant to say, just seems I forgot to actually put the words down 👍

#
# class SearchIndexingJob < ActiveJob::Base
# discard_on ActiveJob::DeserializationError
#
# def perform(record)
# # Will raise ActiveJob::DeserializationError if the record can't be deserialized
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think our practice is to add a period to comments. Same goes for the other examples.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't been using that form consistently. Usually only use periods if there are multiple sentences.

# end
# end
def discard_on(exception)
rescue_from exception do |error|
logger.error "Discarded #{self.class} due to a #{exception}. The original exception was #{error.cause.inspect}."
end
end
end

# Reschedules the job to be re-executed. This is useful in combination
# with the +rescue_from+ option. When you rescue an exception from your job
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you meant "the +rescue_from+ method" here, not option, no ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not meant as a programmatic option, but rather as "an option for dealing with exceptions". Other options include discard_on, retry_on.

# you can ask Active Job to retry performing your job.
#
# ==== Options
# * <tt>:wait</tt> - Enqueues the job with the specified delay in seconds
# * <tt>:wait_until</tt> - Enqueues the job at the time specified
# * <tt>:queue</tt> - Enqueues the job on the specified queue
# * <tt>:priority</tt> - Enqueues the job with the specified priority
#
# ==== Examples
#
# class SiteScraperJob < ActiveJob::Base
# rescue_from(ErrorLoadingSite) do
# retry_job queue: :low_priority
# end
#
# def perform(*args)
# # raise ErrorLoadingSite if cannot scrape
# end
# end
def retry_job(options = {})
enqueue options
end

private
def determine_delay(seconds_or_algorithm)
case seconds_or_algorithm
when :exponentially_longer
(executions ** 4) + 2
when Integer
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this fare with fixnums, bignums and rationals to name a few on Ruby 2.3?

Does it catch durations as well, if I passed in 1.hour?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Integer is the parent class of those. So should be good. But no, doesn't catch duration. Could add that as a separate clause and .to_i it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍, since our documentation uses 3.seconds, I'd expect full duration support as a doc reader.

seconds = seconds_or_algorithm
seconds
when Proc
algorithm = seconds_or_algorithm
algorithm.call(executions)
end
end
end
end
1 change: 1 addition & 0 deletions activejob/lib/active_job/execution.rb
Expand Up @@ -31,6 +31,7 @@ def execute(job_data) #:nodoc:
def perform_now
deserialize_arguments_if_needed
run_callbacks :perform do
self.executions = executions + 1
perform(*arguments)
end
rescue => exception
Expand Down
90 changes: 90 additions & 0 deletions activejob/test/cases/exceptions_test.rb
@@ -0,0 +1,90 @@
require 'helper'
require 'jobs/retry_job'

class ExceptionsTest < ActiveSupport::TestCase
setup do
JobBuffer.clear
end

test "successfully retry job throwing exception against defaults" do
RetryJob.perform_later 'DefaultsError', 5

assert_equal [
"Raised DefaultsError for the 1st time",
"Raised DefaultsError for the 2nd time",
"Raised DefaultsError for the 3rd time",
"Raised DefaultsError for the 4th time",
"Successfully completed job" ], JobBuffer.values
end

test "successfully retry job throwing exception against higher limit" do
RetryJob.perform_later 'ShortWaitTenAttemptsError', 9
assert_equal 9, JobBuffer.values.count
end

test "failed retry job when exception kept occurring against defaults" do
begin
RetryJob.perform_later 'DefaultsError', 6
assert_equal "Raised DefaultsError for the 5th time", JobBuffer.last_value
rescue DefaultsError
pass
end
end

test "failed retry job when exception kept occurring against higher limit" do
begin
RetryJob.perform_later 'ShortWaitTenAttemptsError', 11
assert_equal "Raised ShortWaitTenAttemptsError for the 10th time", JobBuffer.last_value
rescue ShortWaitTenAttemptsError
pass
end
end

test "discard job" do
RetryJob.perform_later 'DiscardableError', 2
assert_equal "Raised DiscardableError for the 1st time", JobBuffer.last_value
end

test "custom handling of job that exceeds retry attempts" do
RetryJob.perform_later 'CustomCatchError', 6
assert_equal "Dealt with a job that failed to retry in a custom way", JobBuffer.last_value
end
end

class ExponentiallyBackoffExceptionsTest < ActiveJob::TestCase
setup do
JobBuffer.clear
end

test "exponentially retrying job" do
travel_to Time.now

perform_enqueued_jobs do
assert_performed_with at: (Time.now + 3.seconds).to_i do
assert_performed_with at: (Time.now + 18.seconds).to_i do
assert_performed_with at: (Time.now + 83.seconds).to_i do
assert_performed_with at: (Time.now + 258.seconds).to_i do
RetryJob.perform_later 'ExponentialWaitTenAttemptsError', 5
end
end
end
end
end
end

test "custom wait retrying job" do
travel_to Time.now

perform_enqueued_jobs do
assert_performed_with at: (Time.now + 2.seconds).to_i do
assert_performed_with at: (Time.now + 4.seconds).to_i do
assert_performed_with at: (Time.now + 6.seconds).to_i do
assert_performed_with at: (Time.now + 8.seconds).to_i do
RetryJob.perform_later 'CustomWaitTenAttemptsError', 5
end
end
end
end
end
end
end
27 changes: 27 additions & 0 deletions activejob/test/jobs/retry_job.rb
@@ -0,0 +1,27 @@
require_relative '../support/job_buffer'
require 'active_support/core_ext/integer/inflections'

class DefaultsError < StandardError; end
class ShortWaitTenAttemptsError < StandardError; end
class ExponentialWaitTenAttemptsError < StandardError; end
class CustomWaitTenAttemptsError < StandardError; end
class CustomCatchError < StandardError; end
class DiscardableError < StandardError; end

class RetryJob < ActiveJob::Base
retry_on DefaultsError
retry_on ShortWaitTenAttemptsError, wait: 1.second, attempts: 10
retry_on ExponentialWaitTenAttemptsError, wait: :exponentially_longer, attempts: 10
retry_on CustomWaitTenAttemptsError, wait: ->(executions) { executions * 2 }, attempts: 10
retry_on(CustomCatchError) { |exception| JobBuffer.add("Dealt with a job that failed to retry in a custom way") }
discard_on DiscardableError

def perform(raising, attempts)
if executions < attempts
JobBuffer.add("Raised #{raising} for the #{executions.ordinalize} time")
raise raising.constantize
else
JobBuffer.add("Successfully completed job")
end
end
end