@metaskills
Member

The idea here is to support the newly released (https://aws.amazon.com/about-aws/whats-new/2021/11/aws-lambda-partial-batch-response-sqs-event-source/) SQS feature for reporting per-item batch failures. The docs can be found here (https://docs.aws.amazon.com/lambda/latest/dg/with-sqs.html#services-sqs-batchfailurereporting), though they contain some typos: the casing of the response interface is wrong. See awsdocs/aws-lambda-developer-guide#320

Benefits

  • Never return errors from the handler. Avoid concurrency blocks!!!
  • Allow folks to use whatever batch size makes sense for them.
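
For readers unfamiliar with the feature, here is a minimal Ruby sketch of a handler that reports per-item failures instead of raising. It is not the code in this PR: `process_record` and the `fail` flag in the message body are hypothetical stand-ins for real job execution. It assumes the event source mapping has `ReportBatchItemFailures` enabled, and it uses the response casing Lambda expects (`batchItemFailures` / `itemIdentifier`).

```ruby
require 'json'

# Hypothetical per-message work. Raises to signal a failed job.
def process_record(record)
  body = JSON.parse(record.fetch('body'))
  raise ArgumentError, 'job failed' if body['fail']
  body
end

# Handler sketch: it never raises. Each failed message is reported by its
# messageId so that only those messages become visible again on the queue.
def handler(event:, context: nil)
  failures = event.fetch('Records', []).each_with_object([]) do |record, failed|
    begin
      process_record(record)
    rescue StandardError
      failed << { 'itemIdentifier' => record['messageId'] }
    end
  end
  { 'batchItemFailures' => failures }
end
```

An empty `batchItemFailures` list tells Lambda the whole batch succeeded, so the handler itself always returns normally.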

@metaskills
Member Author

OK, I ran this through its paces and everything seems good. I ran ~100K jobs, each with a small chance of failing and needing to be reported as an item failure. The visibility timeout seems to be working well too for our mimic of the Sidekiq backoff. Here are some CloudWatch metrics for each of the 3 batches of ~100K jobs. The only difference between them is the BatchSize: the first was 1, the second 2, and the third 5. Here is what the data show me:

  1. By removing handler job failures with ReportBatchItemFailures we can SCALE as expected!
  2. SQS & Lambda always find the concurrency sweet spot after scaling up quickly, within a few minutes.
  3. Batching works as expected, but it did not (for this test) reduce concurrency needs. YMMV.
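
To illustrate the Sidekiq-style backoff mentioned above, here is a rough sketch of how a retry delay could be mapped onto an SQS visibility timeout. The method names are hypothetical and this is not necessarily how this PR computes it. The formula follows the general shape of Sidekiq's default (retry count to the fourth power, plus 15, plus jitter); the exact jitter range is an assumption. The only hard constraint used is that SQS caps a visibility timeout at 12 hours (43,200 seconds).

```ruby
# SQS does not allow a visibility timeout longer than 12 hours.
MAX_VISIBILITY_TIMEOUT = 43_200

# Sidekiq-style delay in seconds for a given retry count.
# The jitter is injectable so the calculation can be tested deterministically.
def backoff_seconds(retry_count, jitter: rand(10))
  (retry_count**4) + 15 + (jitter * (retry_count + 1))
end

# Delay clamped to the SQS maximum, suitable as a visibility timeout.
def visibility_timeout_for(retry_count, jitter: rand(10))
  [backoff_seconds(retry_count, jitter: jitter), MAX_VISIBILITY_TIMEOUT].min
end
```

With this shape, early retries come back within seconds, while later ones are pushed out until they hit the 12 hour ceiling.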

Jobs Lambda Metrics

[Screenshot, 2021-11-26 12:59:42 PM]

Jobs SQS Queue Metrics

[Screenshot, 2021-11-26 12:59:48 PM]

CloudWatch Insights for ActiveJob Retry

[Screenshot, 2021-11-26 12:55:40 PM]
