Event-driven video thumbnail generator running on EKS Fargate.

    Browser ──HTTPS──────→ Route53 (thumbnail.yourdomain.com) → ALB ──/ws──→ FastAPI api service
            ──WebSocket──→                                          ──/──→   Next.js frontend
                                                                             ↓ (internal)
    FastAPI api service → SQS (queue depth + in-memory history)
                        → S3 (thumbnails + presigned URLs)
                        → K8s API (pod status)

    S3 (uploads/) ──event notification──→ SQS ──→ Worker (Fargate) → S3 (thumbnails/)

Supported video formats: .mp4, .mov, .mkv, .avi, .webm, .wmv, .flv, .m4v, .ts, .3gp

| Path | Description |
|---|---|
| `frontend/` | Next.js dashboard: real-time via WebSocket, 3D scene, metrics, pod grid, queue chart, thumbnail gallery |
| `api/` | FastAPI service: HTTP endpoints + WebSocket `/ws` that broadcasts all pipeline data every 5s |
| `worker/` | Python worker: polls SQS, generates 3 thumbnails per video via ffmpeg, uploads to S3 |
| `k8s/` | Kubernetes manifests (namespace, deployments, services, ingress) |
| `bootstrap-terraform/` | One-time infra: GH Actions IAM role, ECR repos, tfstate bucket |
| `terraform/` | Main infrastructure (EKS, VPC, IAM, SQS, S3, S3→SQS notification, Container Insights) |
| `terraform-monitoring/` | CloudWatch alarms + SNS email alerts for DLQ, ALB 5xx, response time, connections |
| `route53/` | DNS + TLS: Route53 hosted zone, ACM certificate, A record (applied last, after K8s) |
| `scripts/` | Deploy scripts (KEDA, ALB controller, K8s manifests) |
| `.github/workflows/` | CI/CD: build & push to ECR, deploy/destroy infrastructure |

The worker polls SQS for S3 event notification messages and parses each JSON body to extract the S3 key.
- Downloads the video from S3
- Extracts 3 thumbnails at 10%, 50%, and 95% of the video duration via ffmpeg
- Uploads the thumbnails to S3 under `thumbnails/<stem>_1.jpg`, `_2.jpg`, `_3.jpg`
- Deletes the SQS message only after all uploads succeed (on failure the message becomes visible again after the visibility timeout and is retried)
- Skips and deletes S3 test events (`s3:TestEvent`), which are sent when the notification is first created

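The message handling described above can be sketched as follows. This is a minimal illustration, not the worker's actual code: `extract_s3_keys` is a hypothetical helper, and the payload shapes assumed are the standard S3 event notification (`Records[].s3.object.key`) and the `s3:TestEvent` message. Object keys in S3 event notifications are URL-encoded, so the sketch decodes them before use.

```python
import json
from urllib.parse import unquote_plus


def extract_s3_keys(body: str) -> list:
    """Return the object keys named in one SQS message body.

    An empty list means there is nothing to process (for example an
    s3:TestEvent), so the caller can delete the message immediately.
    """
    payload = json.loads(body)

    # Test events carry a top-level "Event" field and no "Records".
    if payload.get("Event") == "s3:TestEvent":
        return []

    keys = []
    for record in payload.get("Records", []):
        # "my video.mp4" arrives as "my+video.mp4" in the notification.
        keys.append(unquote_plus(record["s3"]["object"]["key"]))
    return keys


# Example payloads (hand-written, trimmed to the fields the sketch reads).
upload_event = json.dumps(
    {"Records": [{"s3": {"object": {"key": "uploads/my+video.mp4"}}}]}
)
test_event = json.dumps({"Service": "Amazon S3", "Event": "s3:TestEvent"})

print(extract_s3_keys(upload_event))
print(extract_s3_keys(test_event))
```

Returning an empty list for test events lets one code path cover both "skip" and "process" cases: the caller deletes the message once every returned key has been handled, which is trivially true when there are none.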
FastAPI service — data source for the frontend.
HTTP endpoints (also used as fallback):
- `GET /health`: liveness check
- `GET /api/pods`: pod list from K8s API
- `GET /api/queue`: SQS queue depth + in-memory history (5 min at 5s resolution)
- `POST /api/test`: copies videos from test source bucket into `uploads/` to trigger the pipeline
- `DELETE /api/purge`: deletes all objects under `uploads/` and `thumbnails/` in the main bucket
- `GET /api/thumbnails`: presigned S3 URLs for generated thumbnails
- `GET /api/metrics`: aggregated pipeline metrics

WebSocket:
- `WS /ws`: sends full pipeline state on connect, then broadcasts updates every 5s to all connected clients

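The in-memory queue history (5 minutes at 5s resolution) amounts to a fixed window of 60 samples. One way to hold such a window is a bounded `deque`, sketched below. The class name `QueueHistory` and the sample fields are assumptions made for illustration, not names taken from the API service.

```python
import time
from collections import deque

SAMPLE_INTERVAL_S = 5
HISTORY_WINDOW_S = 5 * 60
MAX_SAMPLES = HISTORY_WINDOW_S // SAMPLE_INTERVAL_S  # 60 samples


class QueueHistory:
    """Fixed-size rolling window of queue depth samples."""

    def __init__(self, max_samples=MAX_SAMPLES):
        # A deque with maxlen drops the oldest sample when a new one arrives.
        self._samples = deque(maxlen=max_samples)

    def record(self, depth, timestamp=None):
        ts = time.time() if timestamp is None else timestamp
        self._samples.append({"timestamp": ts, "depth": depth})

    def snapshot(self):
        """Return the samples oldest-first as a plain list."""
        return list(self._samples)


history = QueueHistory()
for i in range(70):  # 70 samples at 5s spacing; only the last 60 are kept
    history.record(depth=i, timestamp=float(i * SAMPLE_INTERVAL_S))
print(len(history.snapshot()))
```

Because a structure like this lives in process memory, the history would reset when the pod restarts, and each API replica would hold its own copy.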
Next.js dashboard. Connects to the API via WebSocket for real-time updates (reconnects automatically with exponential backoff, 1s → 30s max).
- Live badge: green `LIVE` / red `DOWN` based on WebSocket connection state
- Overview: processed thumbnails, running pods, queue depth, in-flight, total pods, storage used
- Fargate Pods: live pod grid with status, ready state, restart count, uptime
- Queue Activity: SQS depth chart (5-min in-memory history at 5s resolution)
- Thumbnail Gallery: filterable by frame position (10%, 50%, 95%)
- Test Pipeline button: copies videos from the test source bucket into `uploads/` to trigger the pipeline
- Purge button: deletes all uploads and thumbnails from S3 to reset pipeline state

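The reconnect schedule mentioned for the dashboard (exponential backoff, 1s up to a 30s cap) can be illustrated with a small delay function. The frontend itself is Next.js; the sketch is in Python only to keep the examples in one language, and the doubling factor is an assumption, since the text specifies only the start and the cap.

```python
def reconnect_delay(attempt, base=1.0, cap=30.0):
    """Seconds to wait before reconnect attempt `attempt` (0-based).

    Doubles the delay on each failed attempt and clamps it at `cap`.
    """
    # Clamp the exponent so very large attempt counts cannot overflow.
    exponent = min(attempt, 16)
    return min(cap, base * (2 ** exponent))


print([reconnect_delay(n) for n in range(7)])
```

A successful connection would normally reset the attempt counter to 0 so that the next outage starts again from the 1s delay.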

| Path | Backend | Notes |
|---|---|---|
| `/ws` | `thumbnail-api:8000` | WebSocket, routed directly to API |
| `/` | `thumbnail-frontend:3000` | Everything else through Next.js |

ALB idle timeout set to 3600s to keep WebSocket connections alive.
Videos uploaded to uploads/ in the S3 bucket automatically trigger an SQS message via S3 event notification. Supported extensions: .mp4 .mov .mkv .avi .webm .wmv .flv .m4v .ts .3gp
The worker extracts the S3 key from the JSON body of the notification message and processes the video with ffmpeg, so any ffmpeg-compatible format can be handled once it reaches the queue.
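
As a rough sketch of the ffmpeg step, the snippet below computes the three seek offsets (10%, 50%, 95% of the duration), pairs them with the `thumbnails/<stem>_N.jpg` output keys, and builds a single-frame ffmpeg command. The function names are hypothetical, the duration is assumed to be known already (for example from `ffprobe`), and the specific flags are one common way to grab a frame, not necessarily what the worker uses.

```python
from pathlib import PurePosixPath

FRAME_POSITIONS = (0.10, 0.50, 0.95)  # fractions of the video duration


def thumbnail_plan(video_key, duration_s, prefix="thumbnails/"):
    """Pair each seek offset (seconds) with its destination S3 key."""
    stem = PurePosixPath(video_key).stem  # "uploads/clip.mp4" -> "clip"
    return [
        (round(duration_s * fraction, 3), f"{prefix}{stem}_{index}.jpg")
        for index, fraction in enumerate(FRAME_POSITIONS, start=1)
    ]


def ffmpeg_args(input_path, seek_s, output_path):
    """Build an ffmpeg command that writes one JPEG frame at seek_s."""
    return [
        "ffmpeg", "-y",        # overwrite the output file if it exists
        "-ss", str(seek_s),    # seek on the input, before decoding
        "-i", input_path,
        "-frames:v", "1",      # stop after a single video frame
        "-q:v", "2",           # JPEG quality scale (lower is better)
        output_path,
    ]


for seek_s, key in thumbnail_plan("uploads/clip.mp4", duration_s=120.0):
    print(key, ffmpeg_args("/tmp/clip.mp4", seek_s, "/tmp/" + key.split("/")[-1]))
```

The command list can be passed to `subprocess.run(..., check=True)`; building it as a list rather than a shell string avoids quoting problems with unusual file names.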

Frontend:

| Variable | Description |
|---|---|
| `API_BASE_URL` | FastAPI internal URL (default: `http://thumbnail-api:8000`) |


Worker:

| Variable | Description |
|---|---|
| `AWS_REGION` | e.g. `ap-southeast-2` |
| `SQS_QUEUE_URL` | Full SQS queue URL |
| `S3_BUCKET` | Bucket for videos and thumbnails |
| `S3_THUMBNAIL_PREFIX` | Prefix for thumbnails (default: `thumbnails/`) |


API:

| Variable | Description |
|---|---|
| `AWS_REGION` | e.g. `ap-southeast-2` |
| `SQS_QUEUE_URL` | Full SQS queue URL |
| `S3_BUCKET` | Bucket name |
| `S3_THUMBNAIL_PREFIX` | Prefix for thumbnails (default: `thumbnails/`) |
| `WORKER_NAMESPACE` | K8s namespace to list pods from (default: `default`) |
| `TEST_SOURCE_BUCKET` | S3 bucket containing test videos for the Test Pipeline button |


Worker permissions:
- `sqs:ReceiveMessage`, `sqs:DeleteMessage`, `sqs:GetQueueAttributes`
- `s3:GetObject` on `uploads/` prefix
- `s3:PutObject` on `thumbnails/` prefix

API permissions:
- `sqs:GetQueueAttributes`
- `s3:ListBucket`, `s3:GetObject` on the thumbnails prefix
- `s3:PutObject` on `uploads/` prefix (for Test Pipeline button)
- `s3:DeleteObject` on the main bucket (for Purge button)
- `s3:ListBucket`, `s3:GetObject` on the test source bucket
- K8s RBAC: `get`, `list` on `pods` in the worker namespace

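For illustration, the worker's permissions listed above could be expressed as an IAM policy document like the one built below. The project manages IAM in `terraform/`, so this Python helper is only a sketch of the policy's shape: `worker_policy` is a hypothetical function, and the queue ARN and bucket name in the usage line are placeholders.

```python
import json


def worker_policy(queue_arn, bucket):
    """Least-privilege policy document for the thumbnail worker (sketch)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "sqs:ReceiveMessage",
                    "sqs:DeleteMessage",
                    "sqs:GetQueueAttributes",
                ],
                "Resource": queue_arn,
            },
            {
                # Read source videos only from the uploads/ prefix.
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/uploads/*",
            },
            {
                # Write results only under the thumbnails/ prefix.
                "Effect": "Allow",
                "Action": "s3:PutObject",
                "Resource": f"arn:aws:s3:::{bucket}/thumbnails/*",
            },
        ],
    }


# Placeholder values, not taken from the project.
policy = worker_policy(
    "arn:aws:sqs:ap-southeast-2:123456789012:example-queue", "example-bucket"
)
print(json.dumps(policy, indent=2))
```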

1. `cd bootstrap-terraform && terraform init && terraform apply`, or trigger the `bootstrap.yml` workflow from GitHub Actions
   - Creates: GH Actions IAM role, ECR repos, tfstate bucket settings
   - State: `s3://tfstate-pala3105/thumbnail/bootstrap.tfstate`
2. `terraform-apply.yml`: VPC, EKS, IAM, SQS, S3, S3→SQS notification
3. `build-push-a.yml`: build & push images to ECR (set `image_tag`, e.g. `1.0.5`)
4. `deploy-k8s.yml`: install KEDA + ALB controller + apply K8s manifests
5. `terraform-global.yml` → action: apply, step: 1
   - Creates Route53 hosted zone, ACM cert, DNS validation records
   - Outputs the 4 NS records; paste them into Namecheap Custom DNS
6. Wait for NS propagation (usually a few minutes)
7. `terraform-global.yml` → action: apply, step: 2
   - Waits for ACM cert validation, creates A record alias to ALB
   - Annotates ingress with cert ARN → ALB enables HTTPS + HTTP→HTTPS redirect
8. `terraform-destroy.yml` (teardown): deletes K8s resources, waits for ALB, then runs `terraform destroy`
   - ECR repos are NOT destroyed (managed by bootstrap)
   - Destroy `route53/` separately if needed (`terraform destroy` in `route53/`)


| Workflow | Trigger | Description |
|---|---|---|
| `bootstrap.yml` | manual | One-time: GH Actions role, ECR repos |
| `terraform-apply.yml` | manual | Provision VPC, EKS, IAM, SQS, S3 |
| `build-push-a.yml` | manual | Build & push Docker images to ECR |
| `deploy-k8s.yml` | manual | Deploy KEDA, ALB controller, K8s manifests |
| `terraform-global.yml` | manual | Route53 hosted zone + ACM cert + HTTPS ingress (step 1 → 2) |
| `terraform-monitoring.yml` | manual | Deploy CloudWatch alarms + SNS alerts |
| `terraform-destroy.yml` | manual | Tear down all infrastructure |


| Secret | Description |
|---|---|
| `AWS_ROLE_ARN` | OIDC role ARN for GitHub Actions |
| `ECR_BASE_FRONTEND` | ECR repo URI for frontend |
| `ECR_BASE_WORKER` | ECR repo URI for worker |
| `ECR_BASE_API` | ECR repo URI for api |
| `S3_BUCKET_NAME` | S3 bucket name for videos/thumbnails |
| `DOMAIN_NAME` | Root domain (e.g. `yourdomain.com`), used by `terraform-global.yml` |
| `ALERT_EMAIL` | Email address for CloudWatch alarm notifications |

- Ingress: AWS Load Balancer Controller (ALB), `target-type: ip` required for Fargate
- Worker autoscaling: KEDA `ScaledObject` scales 0→10 based on SQS queue depth (1 pod per 3 messages, polling every 15s)
- Frontend/API autoscaling: HPA scales on CPU (70% threshold, min 1, max 3)
- Auth: IRSA (IAM Roles for Service Accounts), no static credentials in env vars
- CoreDNS: Fargate-compatible; the `eks.amazonaws.com/compute-type: ec2` annotation is patched out on deploy
- NAT Gateway: single NAT gateway for outbound internet access (ECR, SQS, S3 via gateway endpoint)

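The worker scaling rule above (0 to 10 pods, 1 pod per 3 queued messages) reduces to simple arithmetic, sketched below. This approximates the target replica count only; the real KEDA and HPA control loop also applies polling intervals, cooldowns, and stabilization, and `desired_workers` is a hypothetical name.

```python
import math


def desired_workers(queue_depth, messages_per_pod=3, min_pods=0, max_pods=10):
    """Approximate target replica count for a given queue depth."""
    if queue_depth <= 0:
        return min_pods  # empty queue: scale to the floor (zero by default)
    wanted = math.ceil(queue_depth / messages_per_pod)
    return max(min_pods, min(max_pods, wanted))


for depth in (0, 1, 3, 4, 30, 100):
    print(depth, desired_workers(depth))
```

Under this rule, a burst of 30 or more messages already saturates the 10-pod ceiling, so deeper backlogs drain at a fixed rate.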
Enabled via the amazon-cloudwatch-observability EKS add-on. On Fargate, the agent is injected as a sidecar into pods automatically. Metrics available in CloudWatch under the ContainerInsights namespace:
- Pod CPU and memory usage
- Pod network I/O
- Container restarts
- Fargate vCPU and memory
All alarms notify via SNS email (configured via ALERT_EMAIL secret).

| Alarm | Metric | Threshold | Description |
|---|---|---|---|
| `thumbnail-dlq-not-empty` | `ApproximateNumberOfMessagesVisible` | > 0 | Any message in DLQ: worker is failing |
| `thumbnail-dlq-message-age` | `ApproximateAgeOfOldestMessage` | > 300s | Message stuck in DLQ for over 5 minutes |
| `thumbnail-elb-5xx-errors` | `HTTPCode_ELB_5XX_Count` | > 0 | ALB-level 5xx errors |
| `thumbnail-target-5xx-errors` | `HTTPCode_Target_5XX_Count` | > 0 | Pod-level 5xx errors |
| `thumbnail-target-response-time` | `TargetResponseTime` | > 2s avg | Slow response time (2 consecutive minutes) |
| `thumbnail-active-connection-count` | `ActiveConnectionCount` | > 500 | Unexpected traffic spike |

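
To make the threshold semantics concrete, the sketch below evaluates a "greater than threshold for N consecutive periods" rule over a list of per-minute datapoints, as in the response-time alarm (above 2s average for 2 consecutive minutes). It is a simplified model for reasoning about the table, not CloudWatch's evaluation logic (it ignores missing-data handling, for instance), and `in_alarm` is a hypothetical name.

```python
def in_alarm(datapoints, threshold, periods=1):
    """True when the most recent `periods` datapoints all exceed threshold."""
    if periods < 1:
        raise ValueError("periods must be at least 1")
    if len(datapoints) < periods:
        return False  # not enough data to evaluate yet
    return all(value > threshold for value in datapoints[-periods:])


# Response time: two consecutive minutes above 2s trip the alarm.
print(in_alarm([0.4, 2.5, 2.1], threshold=2.0, periods=2))
# A single slow minute followed by recovery does not.
print(in_alarm([2.5, 1.9], threshold=2.0, periods=2))
# DLQ depth: any visible message (> 0) trips the alarm.
print(in_alarm([0, 0, 1], threshold=0))
```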
Apply monitoring after the ALB exists (after deploy-k8s.yml):
terraform-monitoring.yml → action: apply