- Summary
- Getting Started
- Project Structure
- Tech Stack
- AWS Architecture
- Status
- Roadmap
- Problems Encountered
- Google Takeout Strategy
- Key Decisions
Personal Silo. A personal cloud storage built with AWS, NextJS, Typescript. Designed as a self-hosted alternative to commercial storage solutions. Optimized for cost using S3 Glacier Flexible Retrieval for cold storage.
Built as a learning project to explore AWS architecture, CDK infrastructure-as-code, and full-stack TypeScript. Integrated with Claude Code for AI-assisted development.
- Node.js v22+
- AWS CLI configured with appropriate credentials
- AWS CDK v2
- An AWS account
- Provisioned automatically via AWS CDK. See
infrastructure/for the full stack definition.- Core services include:
- Cognito - authentication
- API Gateway + Lambda - request handling and business logic
- S3 - object storage with lifecycle rules (originals transition to Glacier)
- CloudFront - CDN for thumbnail/preview delivery with signed URLs (24h TTL)
- SQS + DLQ - for async metadata processing and thumbnail generation
- EventBridge - listens for S3 storage class transitions and Glacier restore completions
- Aurora Serverless v2 - stores users, photo metadata, storage class state, retrieval batches
- AWS Batch (Fargate Spot) + ECR - video thumbnail and preview generation via FFmpeg
- ECS Fargate + ECR - batch Glacier zip download pipeline
- SES - email notifications when Glacier restores complete (single-file flow)
- Core services include:
├── frontend/ # Next.js app
├── infrastructure/ # AWS CDK stacks
│ └── lib/constructs/ # CDK constructs (storage, database, auth, upload-pipeline,
│ # video-pipeline, cdn, zip-pipeline, api)
└── services/ # Lambda functions + shared code
├── generate-presigned-url/ # pHash duplicate check + presigned PUT URL
├── manage-photos/ # List, delete, trash, profile/plan endpoints (CloudFront signed URLs)
├── manage-albums/ # CRUD albums + album-photo associations (CloudFront signed URLs)
├── manage-retrieval/ # List retrieval batches and per-file restore status
├── request-restore/ # POST /files/restore — presigned URL or Glacier restore
├── handle-restore-completed/ # EventBridge — SES email when Glacier restore finishes (email flow)
├── handle-glacier-job-complete/ # SNS — coordinates zip pipeline when all files are restored
├── user-provisioning/ # Post-Cognito confirmation setup
├── process-photo-metadata/ # EXIF + thumbnail + preview + pHash; submits Batch jobs (videos)
├── lifecycle-transition/ # Tracks S3 Glacier transitions (EventBridge)
├── handle-upload-dlq/ # Dead-letter queue handler
├── purge-deleted-photos/ # Daily cron — hard-deletes soft-deleted photos past retention
├── batch/
│ ├── video-thumbnail-processor/ # Fargate job: FFmpeg thumbnail + 5s preview generation
│ └── zip-processor/ # Fargate job: stream restored files → zip → S3
├── shared/ # Schema + DB client + CloudFront signer + pHash (bundled by esbuild)
└── migrations/ # Drizzle SQL migrations (0000–0019)
The user-facing application built with Next.js and Typescript. Handles all UI routing and client-side logic. Communicates with backend services via API Gateway through the BFF pattern.
AWS CDK project that provisions and manages all cloud resources. Running the CDK deploy will automatically set up all required AWS services. See infrastructure/ for stack definitions.
Lambda functions written in TypeScript, each handling a specific domain. Deployed automatically as part of the infrastructure stack. Shared code lives in services/shared/ and is bundled by esbuild at deploy time.
| Layer | Technology |
|---|---|
| Frontend | Next.js, TypeScript |
| Backend | AWS Lambda, Node.js v22+ |
| Database | Aurora Serverless v2 (Drizzle ORM) |
| Infrastructure | AWS CDK (construct-per-domain) |
| Storage | S3 Glacier Flexible Retrieval |
| CDN | CloudFront (signed URLs, edge cache) |
| Auth | Cognito |
| Queue | SQS + DLQ |
| Video | AWS Batch (Fargate Spot) + FFmpeg |
| Zip Download | ECS Fargate + archiver |
| SES | |
| Registry | ECR |
graph TD
User["User (Browser)"]
FE["Frontend<br>Next.js"]
APIGW["API Gateway"]
Cognito["Cognito<br>Auth"]
CF["CloudFront<br>CDN (signed URLs)"]
APILambda["API Lambdas<br>(manage-photos, manage-albums,<br>manage-retrieval)"]
PresignLambda["generate-presigned-url<br>(quota + duplicate check)"]
RestoreLambda["request-restore"]
HandleRestoreLambda["handle-restore-completed<br>(email flow)"]
GlacierJobLambda["handle-glacier-job-complete<br>(zip flow coordinator)"]
ProcessLambda["process-photo-metadata<br>(EXIF + thumbnail + preview + pHash)"]
LifecycleLambda["lifecycle-transition"]
DLQLambda["handle-upload-dlq"]
SQS["SQS Upload Queue"]
DLQ["Dead-Letter Queue"]
S3["S3<br>(Standard + Glacier)"]
ZipBucket["S3 Zip Bucket"]
Aurora["Aurora Serverless<br>Metadata + Retrieval Batches"]
EventBridge["EventBridge<br>S3 Events"]
SNS["SNS<br>Restore Completed"]
Batch["AWS Batch<br>(Fargate Spot + FFmpeg)"]
ZipTask["ECS Fargate<br>zip-processor"]
ECR["ECR<br>(video-processor + zip-processor)"]
SES["SES<br>Email"]
User --> FE
FE --> Cognito
FE --> APIGW
APIGW --> APILambda
APIGW --> PresignLambda
APIGW --> RestoreLambda
PresignLambda --> Aurora
PresignLambda --> S3
APILambda --> CF
APILambda --> Aurora
CF --> S3
RestoreLambda --> S3
RestoreLambda --> Aurora
S3 -->|ObjectCreated| SQS
SQS --> ProcessLambda
ProcessLambda --> S3
ProcessLambda --> Aurora
ProcessLambda -->|videos| Batch
ECR --> Batch
ECR --> ZipTask
Batch --> S3
Batch --> Aurora
SQS -->|after 3 retries| DLQ
DLQ --> DLQLambda
S3 -->|StorageClassChanged| EventBridge
S3 -->|RestoreCompleted email flow| EventBridge
S3 -->|RestoreCompleted zip flow| SNS
EventBridge --> LifecycleLambda
EventBridge --> HandleRestoreLambda
SNS --> GlacierJobLambda
LifecycleLambda --> Aurora
HandleRestoreLambda --> Aurora
HandleRestoreLambda --> SES
GlacierJobLambda --> Aurora
GlacierJobLambda -->|all files ready| ZipTask
ZipTask --> S3
ZipTask --> ZipBucket
ZipTask --> Aurora
SES --> User
Currently in active development
- Infrastructure Setup
- Authentication (Cognito)
- File Upload
- File Retrieval
- Album Management (CRUD, rename)
- Thumbnail generation (JPEG/GIF/WebP format-preserving, 800×800, served from Standard)
- S3 Glacier lifecycle for originals (cost optimization)
- Storage usage dashboard with per-class cost breakdown + retrieval cost estimates
- Infinite scroll on dashboard and album detail
- Bulk photo delete
- Trash bin + photo restore
- Video support (upload + thumbnail cover + hover preview via AWS Batch + FFmpeg)
- Full-resolution photo viewer (STANDARD: full-res; GLACIER: preview or thumbnail fallback)
- Full-resolution photo download (Standard: immediate presigned URL; Glacier: restore + SES email)
- Batch Glacier album download (zip pipeline via ECS Fargate)
- Glacier restore tier selection (Expedited / Standard / Bulk)
- Retrieval batch tracking + restore requests page with Download Zip button
- CloudFront CDN for thumbnail/preview delivery (24h edge caching)
- pHash perceptual duplicate detection at upload time
- Tier-aware storage limits, nudges, and settings page
- CDK stack refactored into per-domain constructs
- Add Redis or another caching layer for hot reads and duplicate-check-adjacent lookups
- Add a notifications feature for upload completion / processing completion
- Simplify restore requests to an Expedited-only path for now and remove Standard/Bulk from the user flow
- Audit storage and billing calculations against actual write paths, transitions, and retrieval flows
- Add photo sorting and filtering
- Document operational edge cases and recovery steps as they are discovered
- Google Photos / Google Takeout exports are often split across multiple zip files, and albums or years can be mixed between archives. Treating the whole export as one giant import is error-prone.
- Aurora Data API payload limits make large DB-backed hash scans fragile. For duplicate checking, broad result sets are unsafe; treat roughly
1 MBresponses as a practical ceiling and keep queries narrow. - Metadata processing can fail on individual files, so the retry path matters. The app already exposes
POST /api/photos/retry-failedfor re-queueing failed items. - Batch duplicate handling is intentionally narrower than single-file upload checks. Batch preflight is currently for path-based existing duplicates plus same-batch local duplicate heuristics, not a full DB-wide pHash pass for every file in the batch.
- Import Google Takeout in smaller slices, ideally per year or per explicit request, not as one full-account migration.
- Always import from an extracted folder so media files and JSON sidecars stay together.
- Keep each import bounded enough that duplicate review, retry handling, and sidecar matching remain manageable.
- Use the existing
google-takeout/{importId}/...pathing so each import run is isolated, whilenormalized_import_pathstill lets the backend detect re-imports across different export runs. - Expect some media files to arrive without matching sidecars and some JSON files to remain unmatched; review those counts after every import batch before continuing to the next year/request.
- NextJS - frontend tech stack. ADR-001
- Monorepo - repository architecture. ADR-002
- AWS - cloud service provider. ADR-003
- AWS S3 Glacier Flexible - cost optimization for cold storage. ADR-004
- AWS Aurora Serverless v2 - database. ADR-005
- Drizzle - database ORM. ADR-006
- Backend for Frontends (BFF) Pattern - design pattern for the App. ADR-007
- SQS for async photo metadata processing - decoupled background processing with DLQ. ADR-008
- Aurora Data API (no VPC) - Lambda-to-database connectivity without NAT gateways. ADR-009
- Thumbnail generation pipeline - fast grid loading while keeping originals in Glacier. ADR-010
- EventBridge for storage class tracking - sync Glacier transition state to DB without polling. ADR-011
- AWS Batch (Fargate Spot) for video thumbnails - FFmpeg video processing outside Lambda constraints. ADR-012
- CloudFront signed URLs - edge-cached thumbnail/preview delivery with access control. ADR-013
- pHash duplicate detection - perceptual hashing to catch near-duplicate uploads before storage. ADR-014
- ECS Fargate zip pipeline for batch Glacier downloads - single zip download for album restores. ADR-015