Retry strategy #32
Congratulations nicholas, you have found a feature that doesn't actually exist yet :). I meant to write this but never quite got around to it. It should be a fairly easy pull request if you fancy contributing?

Yes, I would definitely be interested in doing this (or someone on my team). Is there a specific direction you would like to take with this feature?

I have a lot of ideas. It might help to know what your specific requirement is, and base it around that. Can you elaborate at all on that?

Our requirements are relatively simple. We just need to retry failed jobs in the queue. Right now we don't even need exponential back-off or max attempts, but I guess those would be considered the basics. We're just after something that you would get out of the box with a comparable library.

How about Rihanna.retry/1? EDIT: Ah I see, you already looked at that. This is not a question with an obvious answer: you want to retry failed jobs... but under what rules? Automatic retry of everything forever? Sounds dangerous. I'm gonna need more info about your requirements :)
Sure, that's fair. How about this:

Example:

Rules:

Based on the example above, I think this is a standard approach you would get out of the box with other packages. How does this sound?
Actually, it sounds like you would be better off using something like ElixirRetry. Wrap your function in that macro, and put that in your perform function. I think this third-party library might serve you better than building it into Rihanna.
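For reference, a minimal sketch of what that wrapping might look like, assuming ElixirRetry's `retry` macro and its delay-stream helpers (the job module and `do_request/1` function are invented for illustration, and the package would need to be added as a dependency):

```elixir
defmodule MyApp.FetchJob do
  # `use Retry` comes from the ElixirRetry package and imports the
  # `retry` macro plus the Retry.DelayStreams helpers.
  use Retry

  def perform(url) do
    # Retry on :error / {:error, _} returns or raised exceptions,
    # backing off exponentially with jitter for up to 30 seconds total.
    retry with: exponential_backoff() |> randomize() |> expiry(30_000) do
      do_request(url)
    after
      result -> result
    else
      error -> error
    end
  end

  # Hypothetical worker function standing in for the real job body.
  defp do_request(_url), do: :ok
end
```

Note that, as discussed below, this state lives only in memory, so it does not survive a node restart or deployment.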
Yes, ElixirRetry could be part of a solution, but as it's ephemeral, a mechanism to persist retry information between deployments would be required. I guess I forgot to add that to the requirements listed above. So I do see that there is still an integration component required with Rihanna. Another example of a retry mechanism provided "out of the box" is in Que: https://github.com/chanks/que/blob/master/docs/error_handling.md Thoughts?
I'd be reluctant to enable retries by default, since jobs can have side effects. You are right that it should be optionally enabled though, and handled in the database. A simple exponential back-off approach with a configurable number of retries would work. We could save the retry number on the job row, and use the existing
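To make that suggestion concrete, here is a small sketch of such a policy as a pure function. The module name, signature, and defaults are invented for illustration; none of this is Rihanna API:

```elixir
defmodule BackoffSketch do
  # Given the 1-based attempt number and a configurable maximum, return
  # the delay (in ms) before the next retry, or :give_up once the
  # attempt budget is exhausted. Delays double each attempt: 1s, 2s, 4s...
  def next_delay(attempt, max_attempts, base_ms \\ 1_000)

  def next_delay(attempt, max_attempts, _base_ms) when attempt >= max_attempts,
    do: :give_up

  def next_delay(attempt, _max_attempts, base_ms),
    do: base_ms * round(:math.pow(2, attempt - 1))
end
```

The attempt counter itself would live on the job row, so the schedule survives restarts.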
That sounds great, @samphilipd. We need this for our current project so my team will definitely be up for implementing this.

How did you get on, @nicholasjhenry?

@samphilipd We added Rihanna to our project last week, so we will need to address this within the next two weeks.

@nicholasjhenry That's great! Keep me posted, I'd love to get this into Rihanna soon. Happy to help with anything you need.

@nicholasjhenry did you get anywhere with this?

@samphilipd This is on the back burner at the moment. Our project is in a pre-release phase so it hasn't been a priority. No current ETA.
@samphilipd @nicholasjhenry Any word on this yet? First, expand `Rihanna.Job`:

```elixir
defmodule Rihanna.Job do
  require Logger

  @type result :: any
  @type reason :: any
  @type arg :: any
  @type t :: %__MODULE__{}

  @callback perform(arg :: any) :: :ok | {:ok, result} | :error | {:error, reason}
  @callback after_error({:error, reason} | :error | Exception.t(), arg) :: any()
  # <-- Add retry(error, job_args, attempt_count)
  @callback retry({:error, reason} | :error | Exception.t(), arg, pos_integer()) ::
              {:ok, DateTime.t()} | any()

  @optional_callbacks after_error: 2, retry: 3

  # ...
end
```

Then add a dispatcher that invokes the optional callback when a job fails:

```elixir
def retry(job_module, reason, arg, attempt_count) do
  if :erlang.function_exported(job_module, :retry, 3) do
    try do
      case job_module.retry(reason, arg, attempt_count) do
        {:ok, %DateTime{} = due_at} -> update_job_due(due_at)
        _ -> :noop
      end
    rescue
      exception ->
        Logger.warn("""
        [Rihanna] retry callback failed
        ...
        """)

        :noop
    end
  end
end
```

Would like to get your thoughts on this. This structure allows clients to implement their own custom strategy, such as max retries, linear back-off, exponential back-off, etc. The job status in the Rihanna UI for failed jobs can indicate that a retry has been dispatched by
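For illustration, a client module implementing the proposed `retry/3` callback with exponential back-off might look like the sketch below. The module name, attempt limit, and base delay are invented, and it assumes the expanded `Rihanna.Job` behaviour proposed above:

```elixir
defmodule MyApp.FlakyJob do
  # Would declare `@behaviour Rihanna.Job` once the behaviour gains retry/3.

  @max_attempts 5
  @base_delay_ms 1_000

  def perform(_arg) do
    # ... the actual work; returns :ok | {:ok, result} | :error | {:error, reason}
    :ok
  end

  # Exponential back-off: retry after 1s, 2s, 4s, 8s, then give up.
  # Returning a non-{:ok, DateTime.t()} value means "no retry" to the
  # dispatcher sketched above.
  def retry(_error, _arg, attempt_count) when attempt_count < @max_attempts do
    delay_ms = @base_delay_ms * round(:math.pow(2, attempt_count - 1))
    {:ok, DateTime.add(DateTime.utc_now(), delay_ms, :millisecond)}
  end

  def retry(_error, _arg, _attempt_count), do: :noop
end
```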
Looks sensible to me. What is the

With an explicit failure value of

I think that the retry function should never fail; if it does, that is exceptional circumstances which should result in a crash, and it should not be part of the typespec. If the error case is part of the success type, then what should Rihanna do in the case the implementation returns

Ah yes, should remove

@lpil Sounds like you're ok with keeping track of the retry counts in the internal meta jsonb column? Also, what are your thoughts on holding the lock while assessing retries?

I think the lock should be held until the retry has been either committed to the database or determined to be a
Curious to know what the status of this is? I'd like to use it to build an exponential back-off behaviour into my jobs using "retry". Ideally the metadata required to understand back-off would be stored as part of the job metadata, to handle any worker node failures.

I think we're happy with the design, but implementation of this feature has not started. I will not be implementing the feature but will be reviewing any pull requests.

Fine with retry metadata being put in "rihanna_internal_meta". Can we re-use the

I didn't realise it was so precise!

I've started work on this @lpil @KushalP @nicholasjhenry, based on the work and suggestions of @onomated. Please take a look at #66 and let me know what you think. Thanks!
@samphilipd I think this is resolved now.

Yup I think so.
Could you please describe how to handle retries using the Rihanna.Job behaviour? Do you schedule another job with the same params within the current job? If I wanted to have a global retry, is there any reason why I couldn't use a process to monitor the jobs table for failed jobs and retry them using Rihanna.Job.retry/1? Thank you for your help and this library.