Auto retry failed jobs #60

Open
xpd opened this issue Jun 26, 2018 · 4 comments

Comments

@xpd

xpd commented Jun 26, 2018

Since out-of-the-box monitoring is missing, it would be nice to be able to retry failed jobs automatically. Combined with a timeout, this would have prevented almost all the issues we've had.

@remkade

remkade commented Jan 10, 2019

I would also love this. Cavalcade is great, but failed jobs are causing a massive headache.

I'd be happy to work on a PR if there's any guidance on how people would prefer to handle edge cases and retries.

@svandragt

Hey folks. We run this and have the same problem. Cavalcade assumes every failed job needs to be investigated and manually resumed; otherwise it never runs again. However, a job doesn't always fail because of a problem in the code itself: external factors such as connectivity problems can also cause failures. I found that failed jobs with an interval are best marked as completed, because this results in them being rescheduled. Jobs without an interval are best marked as waiting so they run again.
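A minimal sketch of the status rule described above, run against an in-memory SQLite table so it can be tried without a WordPress install. The table `jobs` is a hypothetical, simplified stand-in for `wp_cavalcade_jobs` with only the columns the rule needs; that marking a recurring job `completed` gets it rescheduled is svandragt's observation, not something this sketch verifies.

```python
import sqlite3

# Hypothetical, simplified stand-in for wp_cavalcade_jobs.
# "interval" is NULL for one-off jobs and a number of seconds for recurring ones.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE jobs (id INTEGER PRIMARY KEY, status TEXT, "interval" INTEGER)')
conn.executemany(
    'INSERT INTO jobs (id, status, "interval") VALUES (?, ?, ?)',
    [
        (1, "failed", 3600),   # recurring job that failed
        (2, "failed", None),   # one-off job that failed
        (3, "waiting", 3600),  # healthy job, left alone
    ],
)

# Recurring jobs -> 'completed' (rescheduled, per svandragt's observation);
# one-off jobs -> 'waiting' (so they run again).
conn.execute(
    """
    UPDATE jobs
    SET status = CASE WHEN "interval" IS NOT NULL THEN 'completed' ELSE 'waiting' END
    WHERE status = 'failed'
    """
)
conn.commit()

statuses = dict(conn.execute("SELECT id, status FROM jobs ORDER BY id"))
print(statuses)  # {1: 'completed', 2: 'waiting', 3: 'waiting'}
```

Doing the choice in a single `CASE` keeps both corrections in one statement, so a job cannot be picked up by a runner between two separate updates.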

I built a self-health job that corrects failed jobs older than x minutes, which leaves some time for investigation. Ideally, a job that has failed a number of times in a row should not be corrected, because a repeated failure is likely caused by an application-level issue. Perhaps that should trigger a notification of some kind?
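The selection logic of such a self-health job could look roughly like the sketch below. It assumes each failed job carries the time it last failed and a count of consecutive failures; the `FailedJob` fields, the 30-minute grace period standing in for "x minutes", and the limit of 3 failures are all hypothetical choices, not part of Cavalcade.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FailedJob:
    id: int
    failed_at: datetime  # hypothetical: when the job last failed
    failure_count: int   # hypothetical: consecutive failures so far


def plan_corrections(jobs, now, grace=timedelta(minutes=30), max_failures=3):
    """Split failed jobs into ones to reset automatically and ones to escalate.

    Jobs that failed less than `grace` ago are skipped to leave time for
    investigation. Jobs that have failed `max_failures` times or more are not
    reset, since repeated failures likely point to an application-level bug;
    their ids are returned for notification instead.
    """
    to_reset, to_notify = [], []
    for job in jobs:
        if now - job.failed_at < grace:
            continue  # still inside the investigation window
        if job.failure_count >= max_failures:
            to_notify.append(job.id)
        else:
            to_reset.append(job.id)
    return to_reset, to_notify


now = datetime(2019, 1, 14, 12, 0)
jobs = [
    FailedJob(1, now - timedelta(minutes=5), 1),  # too recent, skipped
    FailedJob(2, now - timedelta(hours=1), 1),    # old, first failure
    FailedJob(3, now - timedelta(hours=1), 5),    # old, keeps failing
]
print(plan_corrections(jobs, now))  # ([2], [3])
```

The ids in the first list would then be corrected with the interval-based status rule, and the second list handed to whatever alerting is available.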

If there's interest, I could spend a bit of time polishing this solution and see whether we can share it.

@archon810

Yes, very much so. I've been watching Cavalcade in hopes of resolving our constant issues with failed scheduled posts.

It's possible that switching alone would resolve this problem. But in case it doesn't, a retry mechanism would make sense. WP cron doesn't have one at all.

@remkade

remkade commented Jan 14, 2019

I made a MySQL event (basically a cron job inside MySQL) for a specific WooCommerce Subscriptions job that must keep processing even if it fails.

Here's what it looks like:

```sql
-- Runs hourly and puts job 18 on site 86 back into the queue
-- if it has failed or its next run is more than two hours overdue.
CREATE EVENT IF NOT EXISTS `fix_cavalcade`
ON SCHEDULE EVERY 1 HOUR
DISABLE ON SLAVE
COMMENT 'Sets cavalcade job status back to waiting if it failed or is 2+ hours overdue'
DO
  UPDATE wordpress.wp_cavalcade_jobs
  SET status = 'waiting'
  WHERE id = 18
    AND site = 86
    AND (
      status = 'failed'
      OR nextrun < NOW() - INTERVAL 2 HOUR
    );
```

If you're running this on RDS, make sure you update your parameter group to enable the event scheduler (event_scheduler = ON)! It took me weeks to figure out why it wasn't running.
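To see which rows the event's `UPDATE` actually touches, here is a rough reproduction against an in-memory SQLite table. The `jobs` table is a hypothetical, simplified copy of `wp_cavalcade_jobs`; the job id and site are passed as parameters instead of the hard-coded 18/86, and the two-hour cutoff is computed in Python because SQLite has no `NOW() - INTERVAL 2 HOUR`.

```python
import sqlite3
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"  # sorts chronologically as plain text


def reset_job(conn, job_id, site, now):
    """Mirror of the event body: set status to 'waiting' if the job failed
    or its nextrun is more than two hours in the past."""
    cutoff = (now - timedelta(hours=2)).strftime(FMT)
    cur = conn.execute(
        """
        UPDATE jobs SET status = 'waiting'
        WHERE id = ? AND site = ?
          AND (status = 'failed' OR nextrun < ?)
        """,
        (job_id, site, cutoff),
    )
    conn.commit()
    return cur.rowcount  # 1 if the row matched, 0 otherwise


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, site INTEGER, status TEXT, nextrun TEXT)")
now = datetime(2019, 1, 14, 12, 0, 0)
conn.executemany(
    "INSERT INTO jobs VALUES (?, ?, ?, ?)",
    [
        (18, 86, "failed", (now - timedelta(minutes=30)).strftime(FMT)),   # failed
        (19, 86, "running", (now - timedelta(hours=3)).strftime(FMT)),     # stuck, overdue
        (20, 86, "waiting", (now - timedelta(minutes=10)).strftime(FMT)),  # just due
    ],
)
results = {job_id: reset_job(conn, job_id, 86, now) for job_id in (18, 19, 20)}
print(results)  # {18: 1, 19: 1, 20: 0}
```

Note that the `OR` branch also resets a job that never reached `failed`, such as one left in `running` by a crashed worker, as long as its `nextrun` is more than two hours in the past.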
