Add retry_on/discard_on for better exception handling #25991
Conversation
  #
  # ==== Options
  # * <tt>:wait</tt> - Re-enqueues the job with the specified delay in seconds
  # * <tt>:attempts</tt> - Re-enqueues the job the specified number of times
Might want to state the defaults here.
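The defaults in question come from the method signature quoted later in this thread (`wait: 3.seconds, attempts: 5`). A doc revision stating them might read (suggested wording only, not the PR's actual text):

```ruby
# ==== Options
# * <tt>:wait</tt> - Re-enqueues the job with the specified delay in seconds (defaults to 3 seconds)
# * <tt>:attempts</tt> - Re-enqueues the job the specified number of times (defaults to 5 attempts)
```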
  # Discard the job with no attempts to retry, if the exception is raised. This is useful when the subject of the job,
  # like an Active Record, is no longer available, and the job is thus no longer relevant.
  #
  # ==== Example
Nitpick: This says "Example" while the other docs say "Examples" (plural).
If there's only 1 example, it doesn't make sense to me to pluralize.
The other cases only have one example as well.
I'd rather fix those, then. Use plural when there are multiple examples and singular when just one.
That's what I meant to say, just seems I forgot to actually put the words down 👍
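The discard_on semantics described in the hunk above can be sketched in plain Ruby, with no Active Job dependency. `MiniJob` and `SearchIndexingJob` are illustrative names; `KeyError` stands in for something like a deserialization error raised when the job's record is gone:

```ruby
# Plain-Ruby sketch: if the registered exception is raised, drop the job
# instead of retrying; any other exception still surfaces.
class MiniJob
  def self.discard_on(exception_class)
    @discard_on = exception_class
  end

  def self.discarded_exception
    @discard_on
  end

  def run
    perform
  rescue StandardError => error
    # Swallow only the registered exception; everything else re-raises.
    raise unless self.class.discarded_exception && error.is_a?(self.class.discarded_exception)
    :discarded
  end
end

class SearchIndexingJob < MiniJob
  discard_on KeyError # stand-in for, e.g., a deserialization error

  def perform
    raise KeyError, "record no longer exists"
  end
end
```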
  #     # Might raise Net::OpenTimeout when the remote service is down
  #     end
  #   end
  def retry_on(exception, wait: 3.seconds, attempts: 5, queue: nil, priority: nil)
Personally, I'd rearrange these arguments such that wait, queue and priority are together with attempts being last. That gives me the nice pleasure of seeing the same grouping as they have in the method body and brings them closer to their retry_job origin.
I'd do the same in the docs just above. 😁
I can see that general point, but I think it's trumped by grouping default-value parameters together and option-value parameters together. wait/attempts are the overwhelmingly most likely to be used. Queue/priority much less so.
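A tiny demonstration that the ordering debate above is about documentation and signature readability only: keyword arguments can be passed in any order at the call site. `retry_on_sketch` is an illustrative stand-in, not the PR's method:

```ruby
# Keyword arguments are order-independent at the call site, so the
# signature ordering only affects how the docs and method body read.
def retry_on_sketch(exception, wait: 3, attempts: 5, queue: nil, priority: nil)
  { exception: exception, wait: wait, attempts: attempts, queue: queue, priority: priority }
end

grouped_by_defaults  = retry_on_sketch("Net::OpenTimeout", wait: 30, attempts: 10)
grouped_by_retry_job = retry_on_sketch("Net::OpenTimeout", attempts: 10, wait: 30)
# Both calls produce the same options hash.
```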
Luuuv the API,
  #   end
  def retry_on(exception, wait: 3.seconds, attempts: 5, queue: nil, priority: nil)
    rescue_from exception do |error|
      logger.error "Retrying #{self.class} in #{wait} seconds, due to a #{exception}. The original exception was #{error.cause.inspect}."
This claims to be retrying when it should actually be saying it's giving up.
Alternative API:

```ruby
discard_on Net::OpenTimeout, retries: 10, wait: 30.seconds

rescue_from Net::OpenTimeout, retries: 10, wait: 30.seconds do |error|
  # We failed to connect ten times; give up and send someone an email or something
end
```

Mostly, I like the fact this clearly states what happens when we run out of retries...
Ah, I see what you mean @matthewd. Actually, what should happen after we retry a bunch of times and fail is that we should reraise and let the queue deal with it. Will fix that! (Though I guess there's still an argument for ALSO allowing custom logic at that point, but).
Yeah, I guess I'm positing that (unlike "discard", say,) "retry" isn't a distinct error handling strategy, but an intrinsic.. step? attribute? of any overall error-handling plan. It just happens that the default behaviour is to make zero retries.
Definitely. Failing to retry should absolutely not result in dropping the job. Just fixed that in the latest commit 👍
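The behavior settled on above can be sketched in plain Ruby: retry up to `attempts` times, then re-raise so the queue backend sees the failure, rather than silently dropping the job. `run_with_retries` is an illustrative helper, not the PR's code:

```ruby
# Retry the block up to `attempts` times; once the attempts are used up,
# re-raise so the failure surfaces instead of the job being dropped.
def run_with_retries(attempts: 5)
  executions = 0
  begin
    executions += 1
    yield executions
  rescue StandardError
    retry if executions < attempts
    raise # attempts exhausted: let the queue deal with the failure
  end
end
```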
        logger.error "Retrying #{self.class} in #{wait} seconds, due to a #{exception}. The original exception was #{error.cause.inspect}."
        retry_job wait: wait, queue: queue, priority: priority
      else
        logger.error "Stopped retrying #{self.class} due to a #{exception}, which reoccurred on #{executions} attempts. The original exception was #{error.cause.inspect}."
Super nitpick technicality: this particular exception may not have occurred during the previous executions. Maybe worth rephrasing to something like "Stopping #{self.class} after #{executions} retries due to .."... or maybe not worth bothering with.
You mean because of inheritance? Or how would that particular exception not have been the cause?
If you have more than one retry_on (i.e., handling more than one possible exception), executions keeps counting up. The exception we're naming definitely caused this failure, but if we've done five attempts, they could've hit: Resolv::ResolvTimeout, Resolv::ResolvTimeout, Resolv::ResolvTimeout, Resolv::ResolvTimeout, Net::OpenTimeout. It's true we're giving up after 5 tries, but it's not strictly true to blame Net::OpenTimeout for reoccurring 5 times.
FWIW, I think that's the thing I was arguing: if I write a custom .. including on the new
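The scenario described above can be simulated in a few lines: two exception types share one `executions` counter, so blaming the final exception for every attempt misattributes the earlier failures. The failure sequence here is hypothetical:

```ruby
# Five attempts, four failing on DNS resolution and the last on connect.
failures = [:resolv_timeout, :resolv_timeout, :resolv_timeout,
            :resolv_timeout, :open_timeout]
executions = failures.length
final = failures.last
# The PR's current wording would blame `final` for all attempts...
misleading = "#{final} reoccurred on #{executions} attempts"
# ...even though it accounts for only some of the failures:
final_count = failures.count(final)
```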
    end
  # Reschedules the job to be re-executed. This is useful in combination
  # with the +rescue_from+ option. When you rescue an exception from your job
I think you meant "the +rescue_from+ method" here, not option, no?
It's not meant as a programmatic option, but rather as "an option for dealing with exceptions". Other options include discard_on, retry_on.
Let’s do it when we actually execute instead. Then the tests dealing with comparable serializations won’t fail either!
@matthewd Added the power to provide a custom handler if the retry attempts are unsuccessful.
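The "custom handler when retries are unsuccessful" idea can be sketched in plain Ruby: if a handler block is supplied it runs on exhaustion, otherwise the error re-raises. `with_retries` is an illustrative helper, not the PR's actual API:

```ruby
# Run `work`, retrying up to `attempts` times; on exhaustion, call the
# supplied handler if there is one, otherwise re-raise.
def with_retries(work, attempts: 5, &exhausted)
  executions = 0
  begin
    executions += 1
    work.call
  rescue StandardError => error
    retry if executions < attempts
    exhausted ? exhausted.call(error) : raise
  end
end

outcome = with_retries(-> { raise "service down" }, attempts: 2) do |error|
  "gave up: #{error.message}" # e.g. notify someone instead of re-raising
end
```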
This is now ready to merge from my perspective, unless anyone has any last objections.
Did not. Removed. Thanks.
    case seconds_or_algorithm
    when :exponentially_longer
      (executions ** 4) + 2
    when Integer
How does this fare with fixnums, bignums and rationals to name a few on Ruby 2.3?
Does it catch durations as well, if I passed in 1.hour?
Integer is the parent class of those. So should be good. But no, doesn't catch duration. Could add that as a separate clause and .to_i it.
👍, since our documentation uses 3.seconds, I'd expect full duration support as a doc reader.
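The duration support discussed above could look something like this: keep the Integer clause, and add a fallback that calls `#to_i` on duration-like values (ActiveSupport::Duration responds to `#to_i`). `determine_delay` mirrors the case statement quoted above; the `FakeDuration` stub stands in for something like `1.hour` so the sketch needs no Rails dependency:

```ruby
# Turn a delay spec into a number of seconds: a named algorithm, a plain
# Integer, or anything duration-like that responds to #to_i.
def determine_delay(seconds_or_algorithm, executions:)
  case seconds_or_algorithm
  when :exponentially_longer
    (executions ** 4) + 2
  when Integer
    seconds_or_algorithm
  else
    unless seconds_or_algorithm.respond_to?(:to_i)
      raise ArgumentError, "Couldn't determine a delay from #{seconds_or_algorithm.inspect}"
    end
    seconds_or_algorithm.to_i
  end
end

# Stand-in for ActiveSupport::Duration in this sketch:
FakeDuration = Struct.new(:seconds) do
  def to_i
    seconds
  end
end
```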
Left some questions, but otherwise good to me 😁 CodeClimate, on the other hand, seems to wish for some climate change 😋
      if executions < attempts
        logger.error "Retrying #{self.class} in #{wait} seconds, due to a #{exception}. The original exception was #{error.cause.inspect}."
        retry_job wait: determine_delay(wait), queue: queue, priority: priority
      else
I've never seen this much whitespace sacrificed for the glory of RuboCop! We must be saved now! Praise be our automaton savior 🙏🤖
Declarative exception handling of the most common kind: retrying and discarding.