
PSILO

Summary

Personal Silo: a personal cloud storage service built with AWS, Next.js, and TypeScript. Designed as a self-hosted alternative to commercial storage solutions, and optimized for cost by using S3 Glacier Flexible Retrieval for cold storage.

Built as a learning project to explore AWS architecture, CDK infrastructure-as-code, and full-stack TypeScript. Integrated with Claude Code for AI-assisted development.

Getting Started

Prerequisites

  • Node.js v22+
  • AWS CLI configured with appropriate credentials
  • AWS CDK v2
  • An AWS account

AWS Services (auto-provisioned via CDK)

  • All services are provisioned automatically via AWS CDK. See infrastructure/ for the full stack definition.
  • Core services:
      • Cognito - authentication
      • API Gateway + Lambda - request handling and business logic
      • S3 - object storage with lifecycle rules (originals transition to Glacier)
      • CloudFront - CDN for thumbnail/preview delivery with signed URLs (24h TTL)
      • SQS + DLQ - for async metadata processing and thumbnail generation
      • EventBridge - listens for S3 storage class transitions and Glacier restore completions
      • Aurora Serverless v2 - stores users, photo metadata, storage class state, retrieval batches
      • AWS Batch (Fargate Spot) + ECR - video thumbnail and preview generation via FFmpeg
      • ECS Fargate + ECR - batch Glacier zip download pipeline
      • SES - email notifications when Glacier restores complete (single-file flow)
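
The Glacier lifecycle rule on originals can be sketched in CDK roughly as follows. This is a sketch only: the construct name, the originals/ key prefix, and the 30-day transition window are illustrative assumptions, not values taken from infrastructure/.

```typescript
// Sketch of an S3 bucket whose originals transition to Glacier Flexible
// Retrieval. Construct ids, the key prefix, and the 30-day window are
// illustrative assumptions.
import { Duration } from "aws-cdk-lib";
import * as s3 from "aws-cdk-lib/aws-s3";
import { Construct } from "constructs";

export class StorageConstruct extends Construct {
  readonly bucket: s3.Bucket;

  constructor(scope: Construct, id: string) {
    super(scope, id);

    this.bucket = new s3.Bucket(this, "PhotoBucket", {
      eventBridgeEnabled: true, // emit S3 events (storage class changes, restores) to EventBridge
      lifecycleRules: [
        {
          // Only originals transition; thumbnails/previews stay in Standard
          // so the grid and CloudFront cache remain fast.
          prefix: "originals/",
          transitions: [
            {
              storageClass: s3.StorageClass.GLACIER, // Glacier Flexible Retrieval
              transitionAfter: Duration.days(30),
            },
          ],
        },
      ],
    });
  }
}
```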

Project Structure

├── frontend/                        # Next.js app
├── infrastructure/                  # AWS CDK stacks
│     └── lib/constructs/            # CDK constructs (storage, database, auth, upload-pipeline,
│                                    #   video-pipeline, cdn, zip-pipeline, api)
└── services/                        # Lambda functions + shared code
      ├── generate-presigned-url/    # pHash duplicate check + presigned PUT URL
      ├── manage-photos/             # List, delete, trash, profile/plan endpoints (CloudFront signed URLs)
      ├── manage-albums/             # CRUD albums + album-photo associations (CloudFront signed URLs)
      ├── manage-retrieval/          # List retrieval batches and per-file restore status
      ├── request-restore/           # POST /files/restore — presigned URL or Glacier restore
      ├── handle-restore-completed/  # EventBridge — SES email when Glacier restore finishes (email flow)
      ├── handle-glacier-job-complete/ # SNS — coordinates zip pipeline when all files are restored
      ├── user-provisioning/         # Post-Cognito confirmation setup
      ├── process-photo-metadata/    # EXIF + thumbnail + preview + pHash; submits Batch jobs (videos)
      ├── lifecycle-transition/      # Tracks S3 Glacier transitions (EventBridge)
      ├── handle-upload-dlq/         # Dead-letter queue handler
      ├── purge-deleted-photos/      # Daily cron — hard-deletes soft-deleted photos past retention
      ├── batch/
      │     ├── video-thumbnail-processor/  # Fargate job: FFmpeg thumbnail + 5s preview generation
      │     └── zip-processor/             # Fargate job: stream restored files → zip → S3
      ├── shared/                    # Schema + DB client + CloudFront signer + pHash (bundled by esbuild)
      └── migrations/                # Drizzle SQL migrations (0000–0019)
```

Frontend

The user-facing application, built with Next.js and TypeScript. It handles all UI routing and client-side logic, and communicates with backend services via API Gateway using the BFF pattern.
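
The BFF idea can be sketched as a Next.js App Router route handler that forwards browser requests to API Gateway with the caller's Cognito token. The API base URL, the /api prefix, and the route path are illustrative assumptions, not the app's actual layout.

```typescript
// Sketch of the BFF proxy: the browser only talks to the frontend, and the
// frontend forwards to API Gateway. API_BASE and the /api prefix are
// illustrative assumptions.
const API_BASE = "https://example.execute-api.eu-west-1.amazonaws.com/prod";

// Pure helper: map an incoming BFF path + query onto the upstream API URL.
export function upstreamUrl(pathname: string, search: string): string {
  return `${API_BASE}${pathname}${search}`;
}

// e.g. app/api/photos/route.ts in the Next.js app
export async function GET(req: Request): Promise<Response> {
  const { pathname, search } = new URL(req.url);
  const upstream = await fetch(
    upstreamUrl(pathname.replace(/^\/api/, ""), search),
    {
      headers: {
        // Pass the Cognito JWT through for the API Gateway authorizer.
        Authorization: req.headers.get("Authorization") ?? "",
      },
    },
  );
  return new Response(upstream.body, { status: upstream.status });
}
```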

Infrastructure

AWS CDK project that provisions and manages all cloud resources. Running cdk deploy automatically sets up all required AWS services. See infrastructure/ for stack definitions.

Services

Lambda functions written in TypeScript, each handling a specific domain. Deployed automatically as part of the infrastructure stack. Shared code lives in services/shared/ and is bundled by esbuild at deploy time.
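
The deploy-time bundling step can be sketched as an esbuild configuration like the one below. The entry point, output path, and external list are illustrative assumptions, not the repository's actual build settings.

```typescript
// Sketch of esbuild settings for bundling one Lambda function together with
// services/shared/. Entry point and outfile are illustrative assumptions.
export const lambdaBundleOptions = {
  entryPoints: ["services/manage-photos/index.ts"], // hypothetical entry
  bundle: true,              // inline services/shared/ into the function bundle
  platform: "node" as const,
  target: "node22",          // match the Lambda Node.js runtime
  format: "cjs" as const,
  outfile: "dist/manage-photos/index.js",
  external: ["@aws-sdk/*"],  // the Node 22 Lambda runtime already ships SDK v3
};
```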

Tech Stack

| Layer          | Technology                            |
|----------------|---------------------------------------|
| Frontend       | Next.js, TypeScript                   |
| Backend        | AWS Lambda, Node.js v22+              |
| Database       | Aurora Serverless v2 (Drizzle ORM)    |
| Infrastructure | AWS CDK (construct-per-domain)        |
| Storage        | S3 Glacier Flexible Retrieval         |
| CDN            | CloudFront (signed URLs, edge cache)  |
| Auth           | Cognito                               |
| Queue          | SQS + DLQ                             |
| Video          | AWS Batch (Fargate Spot) + FFmpeg     |
| Zip Download   | ECS Fargate + archiver                |
| Email          | SES                                   |
| Registry       | ECR                                   |

AWS Architecture

```mermaid
graph TD
User["User (Browser)"]
FE["Frontend<br>Next.js"]
APIGW["API Gateway"]
Cognito["Cognito<br>Auth"]
CF["CloudFront<br>CDN (signed URLs)"]
APILambda["API Lambdas<br>(manage-photos, manage-albums,<br>manage-retrieval)"]
PresignLambda["generate-presigned-url<br>(quota + duplicate check)"]
RestoreLambda["request-restore"]
HandleRestoreLambda["handle-restore-completed<br>(email flow)"]
GlacierJobLambda["handle-glacier-job-complete<br>(zip flow coordinator)"]
ProcessLambda["process-photo-metadata<br>(EXIF + thumbnail + preview + pHash)"]
LifecycleLambda["lifecycle-transition"]
DLQLambda["handle-upload-dlq"]
SQS["SQS Upload Queue"]
DLQ["Dead-Letter Queue"]
S3["S3<br>(Standard + Glacier)"]
ZipBucket["S3 Zip Bucket"]
Aurora["Aurora Serverless<br>Metadata + Retrieval Batches"]
EventBridge["EventBridge<br>S3 Events"]
SNS["SNS<br>Restore Completed"]
Batch["AWS Batch<br>(Fargate Spot + FFmpeg)"]
ZipTask["ECS Fargate<br>zip-processor"]
ECR["ECR<br>(video-processor + zip-processor)"]
SES["SES<br>Email"]

User --> FE
FE --> Cognito
FE --> APIGW
APIGW --> APILambda
APIGW --> PresignLambda
APIGW --> RestoreLambda
PresignLambda --> Aurora
PresignLambda --> S3
APILambda --> CF
APILambda --> Aurora
CF --> S3
RestoreLambda --> S3
RestoreLambda --> Aurora
S3 -->|ObjectCreated| SQS
SQS --> ProcessLambda
ProcessLambda --> S3
ProcessLambda --> Aurora
ProcessLambda -->|videos| Batch
ECR --> Batch
ECR --> ZipTask
Batch --> S3
Batch --> Aurora
SQS -->|after 3 retries| DLQ
DLQ --> DLQLambda
S3 -->|StorageClassChanged| EventBridge
S3 -->|RestoreCompleted email flow| EventBridge
S3 -->|RestoreCompleted zip flow| SNS
EventBridge --> LifecycleLambda
EventBridge --> HandleRestoreLambda
SNS --> GlacierJobLambda
LifecycleLambda --> Aurora
HandleRestoreLambda --> Aurora
HandleRestoreLambda --> SES
GlacierJobLambda --> Aurora
GlacierJobLambda -->|all files ready| ZipTask
ZipTask --> S3
ZipTask --> ZipBucket
ZipTask --> Aurora
SES --> User
```
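
The EventBridge edges in the diagram correspond to S3's native EventBridge event types. A sketch of the two rule patterns (the bucket name is an illustrative assumption; S3 must have EventBridge notifications enabled):

```typescript
// Sketch of EventBridge rule patterns for the two S3 events the diagram
// shows. The bucket name is an illustrative assumption.
export const storageClassChangedPattern = {
  source: ["aws.s3"],
  "detail-type": ["Object Storage Class Changed"], // routes to lifecycle-transition
  detail: { bucket: { name: ["my-photo-bucket"] } },
};

export const restoreCompletedPattern = {
  source: ["aws.s3"],
  "detail-type": ["Object Restore Completed"], // routes to handle-restore-completed (email flow)
  detail: { bucket: { name: ["my-photo-bucket"] } },
};
```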

Status

Currently in active development.

  • Infrastructure Setup
  • Authentication (Cognito)
  • File Upload
  • File Retrieval
  • Album Management (CRUD, rename)
  • Thumbnail generation (JPEG/GIF/WebP format-preserving, 800×800, served from Standard)
  • S3 Glacier lifecycle for originals (cost optimization)
  • Storage usage dashboard with per-class cost breakdown + retrieval cost estimates
  • Infinite scroll on dashboard and album detail
  • Bulk photo delete
  • Trash bin + photo restore
  • Video support (upload + thumbnail cover + hover preview via AWS Batch + FFmpeg)
  • Full-resolution photo viewer (STANDARD: full-res; GLACIER: preview or thumbnail fallback)
  • Full-resolution photo download (Standard: immediate presigned URL; Glacier: restore + SES email)
  • Batch Glacier album download (zip pipeline via ECS Fargate)
  • Glacier restore tier selection (Expedited / Standard / Bulk)
  • Retrieval batch tracking + restore requests page with Download Zip button
  • CloudFront CDN for thumbnail/preview delivery (24h edge caching)
  • pHash perceptual duplicate detection at upload time
  • Tier-aware storage limits, nudges, and settings page
  • CDK stack refactored into per-domain constructs

Roadmap

  • Add Redis or another caching layer for hot reads and duplicate-check-adjacent lookups
  • Add a notifications feature for upload completion / processing completion
  • Simplify restore requests to an Expedited-only path for now and remove Standard/Bulk from the user flow
  • Audit storage and billing calculations against actual write paths, transitions, and retrieval flows
  • Add photo sorting and filtering
  • Document operational edge cases and recovery steps as they are discovered

Problems Encountered

  • Google Photos / Google Takeout exports are often split across multiple zip files, and albums or years can be mixed between archives. Treating the whole export as one giant import is error-prone.
  • Aurora Data API payload limits make large DB-backed hash scans fragile. For duplicate checking, broad result sets are unsafe; treat roughly 1 MB responses as a practical ceiling and keep queries narrow.
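    One way to keep responses under that ceiling is to bound every hash lookup to a fixed batch size instead of issuing one broad scan. A generic sketch (the batch size of 200 is an illustrative assumption, not a tuned value):

```typescript
// Split a candidate-hash list into bounded batches so each Data API query
// (and its response payload) stays well under the ~1 MB practical ceiling.
// The default batch size of 200 is an illustrative assumption.
export function chunk<T>(items: T[], size = 200): T[][] {
  if (size <= 0) throw new Error("size must be positive");
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Usage sketch: one narrow query per batch instead of one broad scan, e.g.
// for (const batch of chunk(candidateHashes)) {
//   await db.select().from(photos).where(inArray(photos.phash, batch));
// }
```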
  • Metadata processing can fail on individual files, so the retry path matters. The app already exposes POST /api/photos/retry-failed for re-queueing failed items.
  • Batch duplicate handling is intentionally narrower than single-file upload checks. Batch preflight currently covers path-based existing-file duplicates plus same-batch local duplicate heuristics, not a full DB-wide pHash pass for every file in the batch.
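
For context on the pHash checks mentioned above: perceptual near-duplicate detection typically reduces to a Hamming distance between 64-bit hashes. A self-contained sketch (the distance threshold of 10 is an illustrative assumption, not the app's value):

```typescript
// Hamming distance between two 64-bit pHashes given as hex strings:
// XOR the values, then count the set bits.
export function hammingDistance(a: string, b: string): number {
  let x = BigInt("0x" + a) ^ BigInt("0x" + b);
  let bits = 0;
  while (x > 0n) {
    bits += Number(x & 1n);
    x >>= 1n;
  }
  return bits;
}

// Hashes within a small distance are treated as near-duplicates.
// The threshold of 10 is an illustrative assumption.
export function isNearDuplicate(a: string, b: string, threshold = 10): boolean {
  return hammingDistance(a, b) <= threshold;
}
```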

Google Takeout Strategy

  • Import Google Takeout in smaller slices, ideally per year or per explicit request, not as one full-account migration.
  • Always import from an extracted folder so media files and JSON sidecars stay together.
  • Keep each import bounded enough that duplicate review, retry handling, and sidecar matching remain manageable.
  • Use the existing google-takeout/{importId}/... pathing so each import run is isolated, while normalized_import_path still lets the backend detect re-imports across different export runs.
  • Expect some media files to arrive without matching sidecars and some JSON files to remain unmatched; review those counts after every import batch before continuing to the next year/request.
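
The sidecar matching described above can be sketched as a simple pairing by Takeout's usual "<media filename>.json" convention. This is a sketch only: truncated or "(1)"-suffixed sidecar names, which real exports do produce, are deliberately left unmatched here and surface in the review buckets.

```typescript
// Pair Takeout media files with their JSON sidecars by the common
// "<media filename>.json" convention. Anything that doesn't pair cleanly
// lands in the unmatched buckets for manual review after each import batch.
export function matchSidecars(files: string[]): {
  pairs: Array<{ media: string; sidecar: string }>;
  unmatchedMedia: string[];
  unmatchedSidecars: string[];
} {
  const sidecars = new Set(files.filter((f) => f.endsWith(".json")));
  const media = files.filter((f) => !f.endsWith(".json"));

  const pairs: Array<{ media: string; sidecar: string }> = [];
  const unmatchedMedia: string[] = [];
  for (const m of media) {
    const candidate = `${m}.json`;
    if (sidecars.has(candidate)) {
      pairs.push({ media: m, sidecar: candidate });
      sidecars.delete(candidate);
    } else {
      unmatchedMedia.push(m);
    }
  }
  return { pairs, unmatchedMedia, unmatchedSidecars: [...sidecars] };
}
```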

Key Decisions

  • Next.js - frontend tech stack. ADR-001
  • Monorepo - repository architecture. ADR-002
  • AWS - cloud service provider. ADR-003
  • AWS S3 Glacier Flexible Retrieval - cost optimization for cold storage. ADR-004
  • AWS Aurora Serverless v2 - database. ADR-005
  • Drizzle - database ORM. ADR-006
  • Backend for Frontends (BFF) Pattern - design pattern for the app. ADR-007
  • SQS for async photo metadata processing - decoupled background processing with DLQ. ADR-008
  • Aurora Data API (no VPC) - Lambda-to-database connectivity without NAT gateways. ADR-009
  • Thumbnail generation pipeline - fast grid loading while keeping originals in Glacier. ADR-010
  • EventBridge for storage class tracking - sync Glacier transition state to DB without polling. ADR-011
  • AWS Batch (Fargate Spot) for video thumbnails - FFmpeg video processing outside Lambda constraints. ADR-012
  • CloudFront signed URLs - edge-cached thumbnail/preview delivery with access control. ADR-013
  • pHash duplicate detection - perceptual hashing to catch near-duplicate uploads before storage. ADR-014
  • ECS Fargate zip pipeline for batch Glacier downloads - single zip download for album restores. ADR-015
