This repository has been archived by the owner on Feb 1, 2024. It is now read-only.

Improve the visibility and resiliency of AWS Batch processing of CSVs #457

Closed
jwalgran opened this issue Apr 3, 2019 · 1 comment

@jwalgran
Contributor

jwalgran commented Apr 3, 2019

Overview

AWS Batch jobs can fail. We designed the processing to be idempotent, so it is safe to rerun failed jobs, but we don't have an automated system for doing so.
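Because the processing is idempotent, an automated rerun can be as simple as resubmitting each failed job with its original inputs. Below is a minimal sketch in Python. It assumes that a boto3 Batch client (`boto3.client("batch")`) is passed in and that each job's input (for example, the CSV location) travels in its job `parameters`. `resubmit_failed_jobs` is a hypothetical helper, not existing project code.

```python
from typing import Any, Dict, List


def resubmit_failed_jobs(batch_client: Any, job_ids: List[str]) -> Dict[str, str]:
    """Resubmit every FAILED job among `job_ids`; return {old_job_id: new_job_id}.

    This is only safe because processing is idempotent: a rerun that repeats
    work the failed attempt already did produces the same result.
    """
    resubmitted: Dict[str, str] = {}
    # describe_jobs accepts at most 100 job IDs per call, so query in chunks.
    for start in range(0, len(job_ids), 100):
        response = batch_client.describe_jobs(jobs=job_ids[start:start + 100])
        for job in response["jobs"]:
            if job["status"] != "FAILED":
                continue
            # Reuse the original name, queue, definition, and parameters
            # (assumption: the parameters carry everything the job needs).
            new_job = batch_client.submit_job(
                jobName=job["jobName"],
                jobQueue=job["jobQueue"],
                jobDefinition=job["jobDefinition"],
                parameters=job.get("parameters", {}),
            )
            resubmitted[job["jobId"]] = new_job["jobId"]
    return resubmitted
```

The original job stays `FAILED`, so the caller has to record the returned IDs and stop tracking the old ones. Otherwise a second pass resubmits the same failure again. Jobs submitted with `containerOverrides` would also need those overrides copied.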

Is your feature request related to a problem? Please describe.

A failed job is visible only in the AWS Batch console, not within the application.
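One way to bring failures into the application is to poll the status of the jobs it has submitted and store any failure reason alongside the upload. Below is a minimal sketch in Python. It assumes that the application already records the Batch job ID returned by `submit_job` and that a boto3 Batch client is passed in. `find_failed_jobs` is a hypothetical name.

```python
from typing import Any, Dict, List

# describe_jobs accepts at most 100 job IDs per request.
MAX_IDS_PER_CALL = 100


def find_failed_jobs(batch_client: Any, job_ids: List[str]) -> Dict[str, str]:
    """Return {job_id: status_reason} for every FAILED job among `job_ids`.

    IDs that Batch does not return are simply absent from the result.
    """
    failed: Dict[str, str] = {}
    for start in range(0, len(job_ids), MAX_IDS_PER_CALL):
        chunk = job_ids[start:start + MAX_IDS_PER_CALL]
        for job in batch_client.describe_jobs(jobs=chunk)["jobs"]:
            if job["status"] == "FAILED":
                # statusReason is not always set, so fall back to "".
                failed[job["jobId"]] = job.get("statusReason", "")
    return failed
```

The application could run this on a schedule, or when a user views an upload, and show the stored reason next to the affected CSV.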

Describe the solution you'd like

jwalgran changed the title from "Improve the visibility resiliency of AWS Batch processing of CSVs" to "Improve the visibility and resiliency of AWS Batch processing of CSVs" on Apr 3, 2019
@hectcastro
Contributor

I would be interested in discussing this a bit with whoever picks it up. In RF we've made use of:

  • Job retries (which require some additional application logic to track the retry count)
  • Job timeouts (to prevent hung jobs from hogging resources)
  • Rollbar notifications when jobs fail (a Rube Goldberg device that uses part of the strategy in the SNS article above, but to wake up a Lambda that invokes Rollbar)
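The first two items correspond to options that AWS Batch accepts at submission time: `retryStrategy` and `timeout` on `submit_job`, both of which can also be set on the job definition. The third can be a Lambda subscribed to the SNS topic that receives Batch "Job State Change" events. Below is a minimal sketch in Python that assumes a boto3 Batch client is passed in. The attempt count and timeout values are illustrative, and `notify` stands in for a hypothetical Rollbar call. The sketch does not reproduce the extra retry-count tracking described for RF.

```python
import json
from typing import Any, Callable, Dict, List


def submit_csv_job(batch_client: Any, name: str, queue: str, definition: str,
                   parameters: Dict[str, str], attempts: int = 3,
                   timeout_seconds: int = 3600) -> str:
    """Submit a job that Batch retries on failure and terminates if it hangs."""
    response = batch_client.submit_job(
        jobName=name,
        jobQueue=queue,
        jobDefinition=definition,
        parameters=parameters,
        # Batch retries a failed job until it has made `attempts` attempts (1-10).
        retryStrategy={"attempts": attempts},
        # Batch terminates an attempt that runs longer than this (minimum 60 s).
        timeout={"attemptDurationSeconds": timeout_seconds},
    )
    return response["jobId"]


def handle_sns_event(event: Dict[str, Any], notify: Callable[[str], None]) -> List[str]:
    """Body of a Lambda handler: report failed Batch jobs delivered via SNS."""
    messages = []
    for record in event["Records"]:
        # SNS delivers the original Batch state-change event as a JSON string.
        batch_event = json.loads(record["Sns"]["Message"])
        detail = batch_event.get("detail", {})
        if detail.get("status") == "FAILED":
            message = "Batch job {} ({}) failed: {}".format(
                detail.get("jobName"), detail.get("jobId"),
                detail.get("statusReason", "no reason given"))
            notify(message)  # hypothetical hook, e.g. a Rollbar report
            messages.append(message)
    return messages
```

Passing `notify` in keeps the handler testable without Rollbar credentials. The real Lambda entry point would call `handle_sns_event(event, ...)` from its `(event, context)` handler.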
