Auto retry failed jobs #60

xpd · 2018-06-26T16:01:57Z

Since out-of-the-box monitoring is missing, it would be nice to be able to retry failed jobs automatically. Together with a timeout, this would have prevented almost all of the issues we've had.

remkade · 2019-01-10T20:50:47Z

I also would love this. Cavalcade is great, but failed jobs are causing a massive headache.

I'd be happy to work on a PR if there is any guidance on how people would prefer to handle edge cases and retries.

svandragt · 2019-01-10T21:43:27Z

Hey folks. We run this and have the same problem. Cavalcade assumes all failed jobs need to be investigated and manually resumed; otherwise they don't run again. However, a job doesn't always fail because of a problem with the code itself: external factors such as connectivity problems can also cause failures. I found that failed jobs with an interval are best marked as completed, as this results in them being rescheduled. Jobs without an interval are best marked as waiting so that they are run again.

I built a self-health job that corrects failed jobs older than x minutes, which allows some time for an investigation. Ideally, any job that fails repeatedly should not be corrected, as a repeatable failure is likely caused by an issue at the application level. This should result in a notification of some kind?

If it is of interest I could spend a bit of time polishing this solution up and see if we can share it?
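
To make the policy above concrete, here is a rough sketch of the decision logic only. It is not the actual health job and it does not touch Cavalcade's database or API. The `Job` fields, `GRACE_PERIOD`, `MAX_FAILURES` and `corrected_status` are all hypothetical names picked for illustration, and the 30-minute and 3-failure values are placeholders for "x minutes" and "a number of times".

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# All names below are hypothetical; they do not mirror Cavalcade's schema or API.
GRACE_PERIOD = timedelta(minutes=30)  # placeholder for the "x minutes" left for investigation
MAX_FAILURES = 3                      # placeholder: at this many failures, stop correcting


@dataclass
class Job:
    status: str              # "waiting", "running", "completed" or "failed"
    interval: Optional[int]  # seconds between runs; None for one-off jobs
    failed_at: datetime      # when the job last failed
    failure_count: int       # how many times the job has failed so far


def corrected_status(job: Job, now: datetime) -> Optional[str]:
    """Return the status a failed job should be moved to, or None to leave it alone."""
    if job.status != "failed":
        return None
    if now - job.failed_at < GRACE_PERIOD:
        return None  # too recent: leave time for someone to investigate
    if job.failure_count >= MAX_FAILURES:
        return None  # repeated failure: probably an application issue, notify instead
    # Recurring jobs are marked completed so they get rescheduled;
    # one-off jobs go back to waiting so they run again.
    return "completed" if job.interval else "waiting"
```

A caller would apply the returned status to the job record and send a notification for jobs that hit the failure limit. Where the failure count comes from (a counter column, the job logs, or something else) is left open here.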

archon810 · 2019-01-10T21:45:36Z

Yes, very much so. I've been watching Cavalcade in hopes of resolving our constant issues with failed scheduled posts.

It's possible that just by switching, this problem would be resolved. But in case it isn't, a retry mechanism would make sense. WP cron doesn't have one at all.

remkade · 2019-01-14T16:56:05Z

I have a MySQL Event (basically a cron job inside MySQL) that I made for a specific WooCommerce Subscriptions job that can't be allowed to stop processing if it fails.

Here's what it looks like:

CREATE EVENT IF NOT EXISTS `fix_cavalcade`
ON SCHEDULE EVERY 1 HOUR
DISABLE ON SLAVE
COMMENT 'Sets cavalcade job status back to waiting + 2 minutes if it fails'
DO
  UPDATE wordpress.wp_cavalcade_jobs
  SET status = 'waiting'
  WHERE id = 18
    AND site = 86
    AND (
      status = 'failed'
      OR nextrun < NOW() - INTERVAL 2 HOUR
    );

If you're running this inside RDS, make sure you update your parameter group to enable the event scheduler (event_scheduler = ON)! It took me weeks to figure out why it wasn't running.
