
Add retry_on/discard_on for better exception handling #25991

Merged: 17 commits merged into master from retry-and-discard-jobs on Aug 2, 2016

Conversation

@dhh (Member) commented Jul 29, 2016

Declarative exception handling of the most common kind: retrying and discarding.

class RemoteServiceJob < ActiveJob::Base
  retry_on Net::OpenTimeout, wait: 30.seconds, attempts: 10

  def perform(*args)
    # Might raise Net::OpenTimeout when the remote service is down
  end
end

class SearchIndexingJob < ActiveJob::Base
  discard_on ActiveJob::DeserializationError

  def perform(record)
    # Will raise ActiveJob::DeserializationError if the record can't be deserialized
  end
end

@kaspth added this to the 5.1.0 milestone Jul 29, 2016
#
# ==== Options
# * <tt>:wait</tt> - Re-enqueues the job with the specified delay in seconds
# * <tt>:attempts</tt> - Re-enqueues the job the specified number of times
Contributor:
Might want to state the defaults here.
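
For reference, the defaults are visible in the method signature further down in the diff (wait: 3.seconds, attempts: 5); a sketch of how the option docs could state them (illustrative wording, not the merged text):

# ==== Options
# * <tt>:wait</tt> - Re-enqueues the job with the specified delay in seconds (defaults to 3 seconds)
# * <tt>:attempts</tt> - Re-enqueues the job the specified number of times (defaults to 5 attempts)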

# Discard the job with no attempts to retry, if the exception is raised. This is useful when the subject of the job,
# like an Active Record, is no longer available, and the job is thus no longer relevant.
#
# ==== Example
Contributor:
Nitpick: This says "Example" while the other docs say "Examples" (plural).

Member Author (dhh):
If there's only 1 example, it doesn't make sense to me to pluralize.

Contributor:
The other cases only have one example as well.

Member Author (dhh):
I'd rather fix those, then. Use plural when there are multiple examples and singular when just one.

Contributor:
That's what I meant to say, just seems I forgot to actually put the words down 👍

@rafaelfranca (Member):
:shipit:

# # Might raise Net::OpenTimeout when the remote service is down
# end
# end
def retry_on(exception, wait: 3.seconds, attempts: 5, queue: nil, priority: nil)
Contributor:
Personally, I'd rearrange these arguments such that wait, queue and priority are together with attempts being last. That gives me the nice pleasure of seeing the same grouping as they have in the method body and brings them closer to their retry_job origin.

I'd do the same in the docs just above. 😁

Member Author (dhh):
I can see that general point, but I think it's trumped by grouping default-value parameters together and option-value parameters together. wait/attempts are overwhelmingly the most likely to be used; queue/priority much less so.

@kaspth (Contributor) commented Jul 29, 2016

Luuuv the API, :shipit:

# end
def retry_on(exception, wait: 3.seconds, attempts: 5, queue: nil, priority: nil)
rescue_from exception do |error|
logger.error "Retrying #{self.class} in #{wait} seconds, due to a #{exception}. The original exception was #{error.cause.inspect}."
Member:
This claims to be retrying when it should actually be saying it's giving up.

@matthewd (Member):
Alternative API:

discard_on Net::OpenTimeout, retries: 10, wait: 30.seconds
rescue_from Net::OpenTimeout, retries: 10, wait: 30.seconds do |error|
  # We failed to connect ten times; give up and send someone an email or something
end

Mostly, I like the fact this clearly states what happens when we run out of retries... retry_on feels a bit subtle for ".. and then almost-silently drop it on the floor".

@dhh (Member, Author) commented Jul 29, 2016

Ah, I see what you mean @matthewd. Actually, what should happen after we try to retry a bunch of times and fail is that we should reraise and let the queue deal with it. Will fix that!

(Though I guess there's still an argument for ALSO allowing custom logic at that point, but).

@matthewd (Member):
Yeah, I guess I'm positing that (unlike "discard", say,) "retry" isn't a distinct error handling strategy, but an intrinsic.. step? attribute? of any overall error-handling plan. It just happens that the default behaviour is to make zero retries.

@dhh (Member, Author) commented Jul 29, 2016

Definitely. Failing to retry should absolutely not result in dropping the job. Just fixed that in the latest commit 👍
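
A minimal sketch of that fix, assuming the exhausted-retries branch ends by re-raising the rescued error so the queue backend's own failure handling takes over (logging omitted for brevity):

rescue_from exception do |error|
  if executions < attempts
    # Attempts remain: push the job back onto the queue with the configured delay.
    retry_job wait: wait, queue: queue, priority: priority
  else
    # Out of attempts: re-raise rather than dropping the job on the floor,
    # so the queue backend's failed-job handling can deal with it.
    raise error
  end
end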

logger.error "Retrying #{self.class} in #{wait} seconds, due to a #{exception}. The original exception was #{error.cause.inspect}."
retry_job wait: wait, queue: queue, priority: priority
else
logger.error "Stopped retrying #{self.class} due to a #{exception}, which reoccurred on #{executions} attempts. The original exception was #{error.cause.inspect}."
Member:
Super nitpick technicality: this particular exception may not have occurred during the previous executions. Maybe worth rephrasing to something like "Stopping #{self.class} after #{executions} retries due to .."... or maybe not worth bothering with.

Member Author (dhh):
You mean because of inheritance? Or how would that particular exception not have been the cause?

Member:
If you have more than one retry_on (i.e., handling more than one possible exception), executions keeps counting up. The exception we're naming definitely caused this failure, but if we've done five attempts, they could've hit: Resolv::ResolvTimeout, Resolv::ResolvTimeout, Resolv::ResolvTimeout, Resolv::ResolvTimeout, Net::OpenTimeout. It's true we're giving up after 5 tries, but it's not strictly true to blame Net::OpenTimeout for reoccurring 5 times.
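
A sketch of the scenario described here, with a hypothetical job declaring two retry_on handlers; executions is a single per-job counter, so the attempts limit is reached across both exception types:

class DnsDependentJob < ActiveJob::Base   # hypothetical job name
  retry_on Resolv::ResolvTimeout, wait: 5.seconds, attempts: 5
  retry_on Net::OpenTimeout, wait: 5.seconds, attempts: 5

  def perform(*args)
    # Four runs may fail with Resolv::ResolvTimeout and the fifth with
    # Net::OpenTimeout; executions reaches 5 and retrying stops, even though
    # Net::OpenTimeout itself occurred only once.
  end
end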

@matthewd (Member):
(Though I guess there's still an argument for ALSO allowing custom logic at that point, but).

FWIW, I think that's the thing I was arguing: if I write a custom rescue_from, I don't want to then have to implement my own retry mechanism -- that feels orthogonal to my decision on how I want to handle the "give up" step.

.. including on the new discard_on -- it seems right that the default after-retrying behaviour is to re-raise, but "try a few times then just forget it" seems just as likely as "try once then forget it".

end

# Reschedules the job to be re-executed. This is useful in combination
# with the +rescue_from+ option. When you rescue an exception from your job
Member:
I think you meant "the +rescue_from+ method" here, not option, no?

Member Author (dhh):
It's not meant as a programmatic option, but rather as "an option for dealing with exceptions". Other options include discard_on, retry_on.
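
For context, a hedged sketch of that combination: a hand-rolled rescue_from handler that calls retry_job directly (the job and exception names are hypothetical):

class SiteScraperJob < ActiveJob::Base    # hypothetical
  rescue_from(ErrorLoadingSite) do        # ErrorLoadingSite is a made-up app exception
    retry_job queue: :low_priority        # reschedule the same job on another queue
  end

  def perform(url)
    # scrape the site; raise ErrorLoadingSite if it can't be loaded
  end
end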

Let’s do it when we actually execute instead. Then the tests dealing
with comparable serializations won’t fail either!
@dhh (Member, Author) commented Aug 1, 2016

@matthewd Added the power to provide a custom handler if the retry attempts are unsuccessful.
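
A hedged usage sketch of that custom handler; the block arguments (job, error) and the ExceptionNotifier call are assumptions for illustration, not necessarily the merged signature:

class RemoteServiceJob < ActiveJob::Base
  # Assumed block arguments: run custom logic once the retry attempts run out,
  # instead of the default log-and-reraise.
  retry_on Net::OpenTimeout, wait: 30.seconds, attempts: 10 do |job, error|
    ExceptionNotifier.caught(error, job: job)   # hypothetical notifier
  end

  def perform(*args)
    # Might raise Net::OpenTimeout when the remote service is down
  end
end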

@dhh (Member, Author) commented Aug 1, 2016

This is now ready to merge from my perspective, unless anyone has any last objections.

@dhh (Member, Author) commented Aug 2, 2016

Did not. Removed. Thanks.

case seconds_or_algorithm
when :exponentially_longer
(executions ** 4) + 2
when Integer
Contributor:
How does this fare with fixnums, bignums and rationals to name a few on Ruby 2.3?

Does it catch durations as well, if I passed in 1.hour?

Member Author (dhh):
Integer is the parent class of those. So should be good. But no, doesn't catch duration. Could add that as a separate clause and .to_i it.

Contributor:
👍, since our documentation uses 3.seconds, I'd expect full duration support as a doc reader.
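
A sketch of the separate clause dhh mentions above; the :exponentially_longer branch follows the hunk quoted earlier, while the ActiveSupport::Duration clause and the Integer branch body are assumed for illustration:

def determine_delay(seconds_or_duration_or_algorithm)
  case seconds_or_duration_or_algorithm
  when :exponentially_longer
    (executions ** 4) + 2
  when ActiveSupport::Duration
    # Assumed addition: accept durations like 1.hour or 3.seconds and
    # convert them to a plain number of seconds.
    seconds_or_duration_or_algorithm.to_i
  when Integer
    seconds_or_duration_or_algorithm
  end
end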

@kaspth (Contributor) commented Aug 2, 2016

Left some questions, but otherwise good to me 😁

CodeClimate, on the other hand, seems to wish for some climate change 😋

@@ -45,7 +45,7 @@ def retry_on(exception, wait: 3.seconds, attempts: 5, queue: nil, priority: nil)
if executions < attempts
logger.error "Retrying #{self.class} in #{wait} seconds, due to a #{exception}. The original exception was #{error.cause.inspect}."
retry_job wait: determine_delay(wait), queue: queue, priority: priority
else
Contributor:
I've never seen this much whitespace sacrificed for the glory of RuboCop! We must be saved now! Praise be our automaton savior 🙏🤖

@dhh merged commit d46d61e into master on Aug 2, 2016
@dhh deleted the retry-and-discard-jobs branch on August 2, 2016 22:05