A personal "Spotify Wrapped" for YouTube watch history. The project turns a Google Takeout export into a polished year-in-review dashboard with Databricks, Neon Postgres, FastAPI, and Next.js.
> [!NOTE]
> Join the Community! Have questions, want to showcase your own YouTube Wrapped build, or need help configuring Databricks? Join our GitHub Discussions!
YouTube watch history is full of patterns, but the raw Takeout export is not built for exploration. This project cleans and enriches that data, then presents it as a shareable analytics experience: top artists, music share, genre split, binge sessions, listening rhythm, and loyalty insights.
Screenshots (images in `docs/screenshots/`): Personal insights, Listening rhythm, Genres and loyalty, Databricks pipeline.
- End-to-end data product from Google Takeout JSON to deployed web dashboard.
- Medallion lakehouse pipeline with Bronze, Silver, Enrichment, and Gold notebooks in Databricks.
- Typed FastAPI service backed by Neon Postgres fact tables.
- Next.js dashboard with animated cards, responsive layouts, and cached API calls.
- Music enrichment through YouTube metadata and MusicBrainz-style artist/genre normalization.
- Deployment split across Vercel for the frontend, Render for the API, and Neon for the serving database.
```mermaid
flowchart TD
    subgraph Data_Source ["Data Ingestion"]
        Takeout["Google Takeout (watch-history JSON)"]
    end
    subgraph Databricks_Medallion ["Databricks Medallion Lakehouse"]
        Bronze[("Bronze Layer\nRaw Data Landing")]
        Silver[("Silver Layer\nCleaned, Typed, Deduplicated")]
        Enrichment["Enrichment Engine\nYouTube Metadata & Music Mapping"]
        Gold[("Gold Layer\nAnalytics Fact Tables")]
    end
    subgraph Serving_Layer ["Serving & API Layer"]
        Neon[("Neon PostgreSQL\nServing Database")]
        FastAPI["FastAPI Backend\n(Hosted on Render)"]
    end
    subgraph Presentation_Layer ["Presentation Layer"]
        NextJS["Next.js Dashboard\n(Hosted on Vercel)"]
    end
    Takeout -->|"Ingest JSON"| Bronze
    Bronze -->|"Parse & Clean"| Silver
    Silver -->|"Add Metadata"| Enrichment
    Enrichment -->|"Aggregate Facts"| Gold
    Gold -->|"Load via Script"| Neon
    Neon -->|"SQL Query"| FastAPI
    FastAPI -->|"REST /api"| NextJS
    style Takeout fill:#ff0000,stroke:#333,stroke-width:2px,color:#fff
    style Bronze fill:#cd7f32,stroke:#333,stroke-width:2px,color:#fff
    style Silver fill:#c0c0c0,stroke:#333,stroke-width:2px,color:#000
    style Gold fill:#ffd700,stroke:#333,stroke-width:2px,color:#000
    style Neon fill:#336791,stroke:#333,stroke-width:2px,color:#fff
    style FastAPI fill:#009688,stroke:#333,stroke-width:2px,color:#fff
    style NextJS fill:#000000,stroke:#333,stroke-width:2px,color:#fff
```
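The "Parse & Clean" step between Bronze and Silver can be illustrated outside Databricks. The following is a minimal plain-Python sketch, not the project's notebook code (which uses PySpark). It assumes Takeout records carry `title`, `titleUrl`, `subtitles`, and `time` fields, and the function name `clean_watch_history` is hypothetical.

```python
from datetime import datetime
from urllib.parse import parse_qs, urlparse


def clean_watch_history(records):
    """Return cleaned, deduplicated watch events, oldest first.

    Assumes each record is a dict shaped like a Takeout watch-history
    entry; the field names used here are assumptions, not a documented schema.
    """
    seen = set()
    cleaned = []
    for rec in records:
        url = rec.get("titleUrl")
        time_str = rec.get("time")
        if not url or not time_str:
            continue  # skip entries without a video URL or timestamp
        video_id = parse_qs(urlparse(url).query).get("v", [None])[0]
        if video_id is None:
            continue
        # Timestamps are assumed to be ISO 8601 in UTC with a trailing "Z".
        watched_at = datetime.fromisoformat(time_str.replace("Z", "+00:00"))
        key = (video_id, watched_at)
        if key in seen:
            continue  # drop exact duplicate events
        seen.add(key)
        title = rec.get("title", "")
        if title.startswith("Watched "):
            title = title[len("Watched "):]
        subtitles = rec.get("subtitles") or [{}]
        cleaned.append(
            {
                "video_id": video_id,
                "title": title.strip(),
                "channel": subtitles[0].get("name"),
                "watched_at": watched_at,
            }
        )
    cleaned.sort(key=lambda row: row["watched_at"])
    return cleaned
```

Deduplicating on the pair of video ID and timestamp keeps genuine rewatches while dropping repeated rows for the same event. The real Silver notebook may use a different key.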
| Layer | Tools |
|---|---|
| Data source | Google Takeout YouTube watch history |
| Lakehouse | Databricks Free Edition, Unity Catalog, Delta Lake |
| Transformation | PySpark, Databricks notebooks, medallion architecture |
| Enrichment | YouTube Data API, music metadata classification |
| Serving database | Neon Postgres |
| API | FastAPI, SQLAlchemy, Uvicorn |
| Frontend | Next.js 16, React 19, TypeScript, Tailwind CSS 4, Recharts, Framer Motion |
| Deployment | Vercel, Render, Neon |
```text
youtube-wrapped/
|-- api/                      # FastAPI service
|   |-- app/
|   |   |-- main.py           # App setup, CORS, router registration
|   |   |-- database.py       # Neon/Postgres connection
|   |   |-- models.py         # Pydantic response models
|   |   `-- routers/          # Analytics endpoints
|   `-- requirements.txt
|-- frontend/                 # Next.js app
|   |-- src/app/              # App Router pages and global styles
|   |-- src/components/       # Dashboard cards and interactions
|   `-- src/lib/api.ts        # Typed API client
|-- notebooks/                # Databricks pipeline notebooks
|   |-- bronze/               # Raw ingestion
|   |-- silver/               # Clean, typed, deduplicated data
|   |-- enrichment/           # Metadata and music enrichment
|   `-- gold/                 # Analytics fact tables
|-- scripts/
|   `-- load_to_neon.py       # Loads gold CSV exports into Neon
|-- docs/screenshots/         # README showcase images
`-- render.yaml               # Render API deployment config
```
- Overview totals: total watches, music share, unique artists, days tracked.
- Main character artist: the artist that defined the listening year.
- Top artists, channels, and genres.
- Genre split across Desi, Western, and untagged music.
- Listening rhythm by hour and day.
- Binge sessions with video count and duration.
- Loyal artists ranked by listening span.
- Last pipeline run timestamp surfaced in the dashboard footer.
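To make the binge-session metric concrete, here is one way such sessions could be derived from watch timestamps. It is a sketch under stated assumptions: the 30-minute gap, the five-video minimum, and the name `find_binge_sessions` are illustrative choices, not values taken from the Gold notebook.

```python
from datetime import datetime, timedelta


def find_binge_sessions(watch_times, max_gap=timedelta(minutes=30), min_videos=5):
    """Group watch timestamps into sessions and keep the long ones.

    A new session starts whenever the gap to the previous watch exceeds
    max_gap. Both thresholds are assumed defaults for illustration.
    """
    sessions = []
    current = []
    for ts in sorted(watch_times):
        if current and ts - current[-1] > max_gap:
            sessions.append(current)
            current = []
        current.append(ts)
    if current:
        sessions.append(current)

    return [
        {
            "start": session[0],
            "end": session[-1],
            "video_count": len(session),
            # Measured from the first to the last watch start, so the
            # length of the final video is not included.
            "duration_minutes": (session[-1] - session[0]).total_seconds() / 60,
        }
        for session in sessions
        if len(session) >= min_videos
    ]
```

Because Takeout records when a video was started rather than how long it played, any duration computed this way is an estimate.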
```bash
cd api
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
uvicorn app.main:app --reload
```

Create an API environment file before running against Neon:

```env
NEON_CONNECTION_STRING=postgresql://user:password@host:port/database
```

The API runs at http://localhost:8000, with Swagger docs at http://localhost:8000/docs.
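A malformed `NEON_CONNECTION_STRING` is a common reason for the API failing at startup, so it can help to check the value before launching Uvicorn. The helper below is a hypothetical standard-library sketch and is not part of `api/app/database.py`. It reports the non-secret parts of the URL and falls back to port 5432 (the usual Postgres default) when none is given.

```python
import os
from urllib.parse import urlsplit


def describe_connection_string(dsn):
    """Validate a Postgres URL and return its non-secret components."""
    parts = urlsplit(dsn)
    if parts.scheme not in ("postgresql", "postgres"):
        raise ValueError("expected a postgresql:// connection string")
    database = parts.path.lstrip("/")
    if not parts.hostname or not database:
        raise ValueError("connection string needs a host and a database name")
    # parts.port raises ValueError if the port is not numeric, for example
    # when the literal placeholder "port" was left in the template.
    return {
        "host": parts.hostname,
        "port": parts.port or 5432,
        "database": database,
        "user": parts.username,
    }


def describe_from_env(var_name="NEON_CONNECTION_STRING"):
    """Read the connection string from the environment and describe it."""
    return describe_connection_string(os.environ[var_name])
```

The password is deliberately left out of the returned dictionary so the result is safe to print or log.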
```bash
cd frontend
npm install
npm run dev
```

Create `frontend/.env.local` if your API is not running at the default local URL:

```env
NEXT_PUBLIC_API_URL=http://localhost:8000
```

The dashboard runs at http://localhost:3000.
- Export YouTube watch history from Google Takeout.
- Run `notebooks/bronze/02_bronze_ingest.ipynb` to land the raw history in Databricks.
- Run `notebooks/silver/03_silver_clean.ipynb` to parse timestamps, clean titles, type columns, and deduplicate rows.
- Run `notebooks/enrichment/04_enrich_youtube.ipynb` to add video, channel, artist, and genre context.
- Run `notebooks/gold/05_gold_facts.ipynb` to produce dashboard-ready fact tables.
- Export `fact_*.csv` files into `data/gold_exports/`.
- Load the serving tables into Neon:

```bash
python scripts/load_to_neon.py
```

Raw exports, CSVs, and secrets are intentionally ignored by Git so personal watch history is not committed.
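The final load step starts by discovering the exported fact files. The sketch below shows only that discovery and parsing stage using the standard library. It assumes each table is named after its CSV file stem, and `read_gold_exports` is a hypothetical helper; the actual `scripts/load_to_neon.py` may be organized differently and also handles the database writes.

```python
import csv
from pathlib import Path


def read_gold_exports(export_dir="data/gold_exports"):
    """Yield (table_name, rows) for every fact_*.csv in export_dir.

    The table name is assumed to match the file stem. Rows are returned
    as dictionaries keyed by the CSV header, with all values as strings.
    """
    for csv_path in sorted(Path(export_dir).glob("fact_*.csv")):
        with csv_path.open(newline="", encoding="utf-8") as handle:
            rows = list(csv.DictReader(handle))
        yield csv_path.stem, rows
```

Files that do not match the `fact_*.csv` pattern are ignored, so stray exports in the same folder are not loaded by accident.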
The FastAPI app exposes read-only analytics endpoints under `/api`, including:

- `/api/overview`
- `/api/top-artists`
- `/api/top-channels`
- `/api/top-genres`
- `/api/genre-split`
- `/api/listening-by-hour`
- `/api/listening-by-dayofweek`
- `/api/timeline`
- `/api/main-character`
- `/api/binge-sessions`
- `/api/night-owl-score`
- `/api/loyal-artists`
- `/api/last-pipeline-run`
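Since these endpoints are read-only, they can be exercised with a few lines of standard-library Python. This is a minimal sketch with hypothetical helper names (`endpoint_url`, `fetch_json`). It assumes the endpoints return JSON and makes no assumption about the response fields, which are defined by the Pydantic models in `api/app/models.py`.

```python
import json
from urllib.request import urlopen


def endpoint_url(base_url, endpoint):
    """Build the full URL for an analytics endpoint such as "overview"."""
    return base_url.rstrip("/") + "/api/" + endpoint.strip("/")


def fetch_json(base_url, endpoint, timeout=10):
    """GET an endpoint and decode the JSON body (assumes a running API)."""
    with urlopen(endpoint_url(base_url, endpoint), timeout=timeout) as response:
        return json.load(response)
```

With the API running locally, `fetch_json("http://localhost:8000", "overview")` would request `/api/overview`.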
- Frontend: deployed on Vercel.
- Backend: deployed on Render using `render.yaml`.
- Database: hosted on Neon Postgres.
- Pipeline: run in Databricks, then loaded into Neon with `scripts/load_to_neon.py`.
MIT License. See LICENSE.




