A comprehensive guide to implementing scalable batch processing workloads using ECS and EKS with event-driven architectures.
- What is batch processing and when to use it
- Traditional vs Event-driven batch processing
- AWS container services comparison (ECS vs EKS for batch workloads)
Three implementation patterns covered:
- Scheduled Batch Jobs - Time-based execution using EventBridge
- Queue-Based Processing - Event-driven with SQS + Auto-scaling
- Kubernetes Jobs - EKS CronJobs and one-time Jobs
- Image processing pipeline triggered daily
- EventBridge rule (formerly CloudWatch Events) → ECS task
- Use case: Daily report generation
- Container tasks consume messages from an SQS queue
- Auto-scaling based on queue depth
- Use case: Video transcoding, data processing
- Kubernetes Jobs for one-time processing
- CronJobs for scheduled workloads
- Use case: ETL pipelines, ML training jobs
- CloudWatch Container Insights
- Custom metrics for batch job tracking
- Dead letter queues for failed jobs
- Cost tracking and optimization
- Error handling and retries
- Idempotency patterns
- Resource optimization
- Security considerations
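To make the "error handling and retries" item concrete, here is a minimal sketch of retrying a batch step with capped exponential backoff and full jitter. `TransientError`, `backoff_delays`, and `run_with_retries` are hypothetical names chosen for illustration; they are not taken from the example code in this repository.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (throttling, timeouts)."""


def backoff_delays(max_attempts, base=1.0, cap=30.0):
    """Upper bound of the wait before each retry: base * 2**n, capped at `cap`."""
    return [min(cap, base * 2 ** n) for n in range(max_attempts - 1)]


def run_with_retries(job, max_attempts=4, base=1.0, cap=30.0, sleep=time.sleep):
    """Call `job` until it succeeds or `max_attempts` is exhausted.

    Only TransientError is retried; any other exception propagates at once.
    `sleep` is injectable so the function can be tested without waiting.
    """
    delays = backoff_delays(max_attempts, base, cap)
    for attempt in range(max_attempts):
        try:
            return job()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller (or a DLQ) take over
            # Full jitter: wait a random time between 0 and the capped bound.
            sleep(random.uniform(0, delays[attempt]))
```

Retrying only a dedicated exception type keeps permanent failures (bad input, missing permissions) from burning attempts; those are usually better sent straight to a dead letter queue.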
EventBridge Rule (cron) → ECS Task Definition → Fargate Task
                                                     ↓
                                              CloudWatch Logs
                                                     ↓
                                           SNS (Success/Failure)
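The scheduled flow above can be sketched in Python as two plain dictionaries shaped like the request parameters of the EventBridge `PutRule` and `PutTargets` API calls. The helper names, the target `Id`, and every ARN and subnet ID below are placeholders for illustration; the repository's Terraform may wire this up differently.

```python
def daily_cron(hour_utc, minute=0):
    """Build an EventBridge cron expression that fires once a day (UTC).

    Field order: minute hour day-of-month month day-of-week year.
    """
    if not (0 <= hour_utc <= 23 and 0 <= minute <= 59):
        raise ValueError("hour_utc must be 0-23 and minute 0-59")
    return f"cron({minute} {hour_utc} * * ? *)"


def ecs_target(cluster_arn, task_definition_arn, role_arn, subnets):
    """Build one target entry that runs a single Fargate task on the cluster."""
    return {
        "Id": "daily-batch-task",      # placeholder target ID
        "Arn": cluster_arn,            # for ECS targets this is the cluster ARN
        "RoleArn": role_arn,           # role EventBridge assumes to run the task
        "EcsParameters": {
            "TaskDefinitionArn": task_definition_arn,
            "TaskCount": 1,
            "LaunchType": "FARGATE",
            "NetworkConfiguration": {
                "awsvpcConfiguration": {
                    "Subnets": subnets,
                    "AssignPublicIp": "DISABLED",
                }
            },
        },
    }


# Placeholder values only; substitute real ARNs and subnet IDs.
rule = {"Name": "daily-batch", "ScheduleExpression": daily_cron(2), "State": "ENABLED"}
target = ecs_target(
    "arn:aws:ecs:us-east-1:111122223333:cluster/batch",
    "arn:aws:ecs:us-east-1:111122223333:task-definition/report:1",
    "arn:aws:iam::111122223333:role/events-run-task",
    ["subnet-00000000"],
)
```

With boto3 installed and credentials configured, these dictionaries could be passed as `events.put_rule(**rule)` and `events.put_targets(Rule=rule["Name"], Targets=[target])`.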
Event Source → SQS Queue → ECS Service (Auto-scaling)
                   ↓                   ↓
              CloudWatch         Fargate Tasks
             (Queue Depth)             ↓
                                  S3/Database
                                       ↓
                                 DLQ (Failed)
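One common way to turn queue depth into a task count is a "backlog per task" calculation: divide the number of visible messages by how many messages one task is expected to handle, then clamp the result. The sketch below is an assumption about how such a policy could be expressed, not the scaling policy the example actually ships; `desired_task_count` and its parameters are hypothetical names.

```python
import math


def desired_task_count(visible_messages, messages_per_task, min_tasks=0, max_tasks=10):
    """Return how many tasks to run for the current queue backlog.

    visible_messages:  current queue depth (for SQS, the
                       ApproximateNumberOfMessagesVisible metric)
    messages_per_task: backlog one task is expected to work through
    """
    if messages_per_task <= 0:
        raise ValueError("messages_per_task must be positive")
    needed = math.ceil(visible_messages / messages_per_task)
    # Clamp so the service never scales below the floor or past the ceiling.
    return max(min_tasks, min(max_tasks, needed))
```

A `min_tasks` of 0 lets the service scale to zero when the queue is empty, which is where most of the cost savings of this pattern come from.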
EventBridge/Manual → Kubernetes Job/CronJob
                               ↓
                        EKS Worker Nodes
                               ↓
                    CloudWatch Logs/Metrics
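As a rough illustration of the Job and CronJob objects in this flow, the sketch below builds both manifests as Python dictionaries using the `batch/v1` API. The helper names, the container name `worker`, and the image and command values are placeholders, not the contents of `k8s-manifests/`.

```python
import json


def job_spec(image, command, backoff_limit=3, ttl_seconds=3600):
    """Shared Job spec: retry failed pods up to backoff_limit, then clean up."""
    return {
        "backoffLimit": backoff_limit,
        "ttlSecondsAfterFinished": ttl_seconds,  # garbage-collect finished Jobs
        "template": {
            "spec": {
                "restartPolicy": "Never",  # let the Job controller handle retries
                "containers": [
                    {"name": "worker", "image": image, "command": command}
                ],
            }
        },
    }


def job_manifest(name, image, command):
    """One-time Job."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": job_spec(image, command),
    }


def cronjob_manifest(name, schedule, image, command):
    """Scheduled CronJob wrapping the same Job spec."""
    return {
        "apiVersion": "batch/v1",
        "kind": "CronJob",
        "metadata": {"name": name},
        "spec": {
            "schedule": schedule,           # standard five-field cron syntax
            "concurrencyPolicy": "Forbid",  # skip a run if the last is still going
            "jobTemplate": {"spec": job_spec(image, command)},
        },
    }


# Placeholder image and command; kubectl accepts JSON as well as YAML.
nightly = cronjob_manifest("nightly-etl", "0 2 * * *", "example/etl:latest", ["python", "etl.py"])
manifest_json = json.dumps(nightly, indent=2)
```

Writing `manifest_json` to a file and running `kubectl apply -f` on it would create the CronJob, assuming the image exists in a registry the cluster can pull from.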
.
├── README.md
├── ecs-scheduled/
│   ├── terraform/
│   ├── docker/
│   └── README.md
├── ecs-sqs-autoscaling/
│   ├── terraform/
│   ├── docker/
│   ├── lambda/ (SQS producer)
│   └── README.md
├── eks-jobs/
│   ├── terraform/
│   ├── k8s-manifests/
│   ├── docker/
│   └── README.md
└── monitoring/
    ├── cloudwatch-dashboards/
    └── alarms/
- AWS Services: ECS, EKS, EventBridge, SQS, ECR, CloudWatch, SNS
- IaC: Terraform
- Container Runtime: Docker
- Languages: Python (sample applications), HCL (Terraform)
- AWS CLI configured
- Docker installed
- Terraform >= 1.0
- kubectl (for EKS examples)
- eksctl (for EKS cluster setup)
Each subdirectory contains a complete working example with:
- Docker application code
- Infrastructure as Code (Terraform)
- Deployment instructions
- Testing and validation steps
Approximate monthly costs for running these examples:
- ECS Scheduled (1 task/day, 5 min): ~$1-2
- ECS SQS Auto-scaling (100 tasks/day): ~$10-20
- EKS Jobs (t3.medium nodes): ~$30-50 for worker nodes, plus the EKS control plane fee (about $73/month at $0.10 per cluster-hour)
When to use ECS vs EKS for batch:
- ECS: Simpler, serverless with Fargate, great for straightforward batch jobs
- EKS: Better for more complex workloads that need advanced scheduling, or for teams with an existing Kubernetes investment
Event-driven benefits:
- Cost efficiency (pay only when processing)
- Automatic scaling based on demand
- Loose coupling between services
Production considerations:
- Implement idempotency for retries
- Use DLQ for failed messages
- Monitor queue age and task duration
- Set appropriate timeouts and resource limits
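The idempotency and DLQ points above can be sketched with a small in-memory simulation. `Message`, `Worker`, and the returned status strings are hypothetical names for illustration. In a real deployment the set of processed IDs would live in a durable store, and SQS itself moves a message to the dead letter queue once its receive count exceeds the redrive policy's `maxReceiveCount`.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Message:
    message_id: str
    body: str
    receive_count: int = 0


@dataclass
class Worker:
    handler: Callable[[str], None]
    max_receives: int = 3
    processed_ids: set = field(default_factory=set)   # stand-in for a durable store
    dead_letters: list = field(default_factory=list)  # stand-in for the DLQ

    def handle(self, msg: Message) -> str:
        """Process one delivery of a message and report what happened."""
        msg.receive_count += 1
        if msg.message_id in self.processed_ids:
            return "skipped"  # duplicate delivery: work was already done
        try:
            self.handler(msg.body)
        except Exception:
            if msg.receive_count >= self.max_receives:
                self.dead_letters.append(msg)  # give up, park for inspection
                return "dead-lettered"
            return "retry"  # leave it on the queue for redelivery
        # Record success only after the handler finished without error.
        self.processed_ids.add(msg.message_id)
        return "processed"
```

Because the ID is recorded only after the handler succeeds, a crash in between can still cause one repeat, so the handler's own side effects should be safe to apply twice (for example, writing to a deterministic S3 key).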
- Hook: Start with a real-world problem (e.g., "Processing millions of images uploaded by users")
- Context: Explain why containers are perfect for batch processing
- Deep Dive: Walk through each implementation with code
- Comparison: Side-by-side comparison of the three approaches
- Production Tips: Share lessons learned and best practices
- Call to Action: Encourage readers to try the examples and share feedback
MIT License - Feel free to use this code for your projects and learning