MovieMap

See your film taste. Navigate it. Discover what to watch next.

MovieMap turns the MovieLens 25M dataset into an interactive galaxy of 2,500 films. Each node is a movie, positioned by thematic similarity using a blend of genome tags and collaborative filtering. Enter a Letterboxd username and your films light up across the map. Slide between content similarity and taste similarity to watch the entire graph reorganise in real time. Get personalised recommendations for solo viewing, or overlay two profiles and find the perfect film for a movie night.

The Problem

Recommendation algorithms are black boxes. They hand you a ranked list with no context for why something was chosen or how it relates to what you already love. You can't see the shape of your own taste or explore the space between genres.

The Solution

MovieMap makes film similarity spatial. A force-directed graph clusters movies by how they actually relate — thrillers near thrillers, arthouse near arthouse, with crossover films bridging the gaps. Your watched, rated, and liked films become a visible footprint on this map, and recommendations are points you haven't reached yet. A content/taste slider lets you morph the entire layout between what films are about and what audiences enjoy together.

Features

Interactive Film Graph --- 2,500 nodes rendered on a Canvas-based D3 force simulation at 60fps. Zoom, pan, hover for tooltips, click for full detail. Cluster labels crossfade between content and taste layouts as you slide. Edges glow between related films. Selected nodes pulse white with highlighted connections.

Dual Layout Interpolation --- Two complete UMAP projections: one from genome tag vectors (what films are about) and one from collaborative filtering vectors (what audiences enjoy together). The content/taste slider smoothly interpolates every node position, edge weight, and cluster label between these two layouts — the whole graph reorganises in real time with zero API calls.
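As a rough illustration of the interpolation, assuming each film carries one (x, y) pair per layout and that positions are blended linearly (the text does not state the exact easing), a Python sketch might look like this. The function name `interpolate_layout` and the sample coordinates are hypothetical.

```python
def interpolate_layout(content_xy, taste_xy, blend):
    """Linearly blend two 2D layouts: blend=0 is pure content, blend=1 is pure taste."""
    if not 0.0 <= blend <= 1.0:
        raise ValueError("blend must be in [0, 1]")
    if len(content_xy) != len(taste_xy):
        raise ValueError("both layouts must cover the same nodes")
    return [
        ((1 - blend) * cx + blend * tx, (1 - blend) * cy + blend * ty)
        for (cx, cy), (tx, ty) in zip(content_xy, taste_xy)
    ]


# Hypothetical coordinates for two films in each layout.
content = [(0.0, 0.0), (10.0, 4.0)]
taste = [(2.0, 8.0), (6.0, 0.0)]
midway = interpolate_layout(content, taste, 0.5)
```

Because both layouts are pre-computed, a blend like this needs only the slider value, which is consistent with the graph reorganising without any API calls.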

Letterboxd Integration --- Enter any public username. MovieMap scrapes rated, liked, watched, and watchlist data, resolves TMDB IDs in parallel (6 concurrent workers with Cloudflare bypass), and maps everything onto the graph. Your films glow amber; a second user glows teal.

Solo Mode --- Highlights your films and computes a weighted taste centroid from your ratings. Every unseen film gets a content score (genome similarity) and a taste score (CF similarity), blended in real time by the slider. Recommendations update instantly as you slide.

Duo Mode --- Overlay two users. See where your tastes overlap. A compatibility score combines centroid similarity, shared highly-rated films, and genre overlap. Duo recommendations are drawn from the intersection zone — films that appeal to both profiles but neither has seen.

Taste Profiler (No Letterboxd) --- No account? Pick up to 3 favourite films to seed your profile, then answer dynamically selected questions on a 0–5 scale. The system draws from a pool of 31 questions across 5 categories (mood, genre, style, tone, setting), with follow-up questions triggered by strong answers. The content/taste slider works identically to the Letterboxd flow — same real-time blending, same score format.

Content / Taste Slider --- All recommendation blending happens client-side. The backend returns separate content and taste scores per candidate; the frontend computes score = (1-blend) * contentScore + blend * tasteScore in a useMemo. Moving the slider simultaneously repositions all graph nodes via D3 force simulation interpolation, crossfades cluster labels, and re-ranks recommendations.
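The blending formula above runs in a React useMemo in the frontend; the following is only a Python rendering of the same arithmetic, to show how a slider value re-ranks candidates. The `title` field, the `top_n` parameter, and the sample scores are assumptions for illustration.

```python
def blend_and_rank(candidates, blend, top_n=10):
    """Blend per-candidate content and taste scores, then sort best-first.

    candidates: list of dicts with 'contentScore' and 'tasteScore' keys.
    blend: 0.0 means content only, 1.0 means taste only.
    """
    scored = [
        {**c, "score": (1 - blend) * c["contentScore"] + blend * c["tasteScore"]}
        for c in candidates
    ]
    return sorted(scored, key=lambda c: c["score"], reverse=True)[:top_n]


# Hypothetical candidates with made-up scores.
candidates = [
    {"title": "Film A", "contentScore": 0.9, "tasteScore": 0.2},
    {"title": "Film B", "contentScore": 0.4, "tasteScore": 0.8},
]
content_first = [c["title"] for c in blend_and_rank(candidates, blend=0.0)]
taste_first = [c["title"] for c in blend_and_rank(candidates, blend=1.0)]
```

With the slider at the content end Film A ranks first; at the taste end the order flips, with no extra request to the backend.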

Film Detail Panel --- Click any node to open a frosted glass panel (centre-right) showing TMDB poster, genres, genome tags with relevance scores, average rating, a generated description, similar films with similarity percentages, and links to IMDb, TMDB, and JustWatch.

Film Search --- Client-side search across all 2,500 films. Select a result and the graph pans to its node, opening the detail panel.

How It Works

1. Data Pipeline

The preprocessing script downloads MovieLens ml-25m and builds a blended similarity model from two complementary signals:

Genome tags (1,128 dimensions per film) — captures what a film is about: tone, theme, style, era. Pairwise cosine similarity tells you which films share thematic DNA.
Collaborative filtering (200 dimensions from SVD of the ratings matrix) — captures who watches these films together. Two films are similar if the same users tend to rate them similarly.

These are combined at a 60/40 genome/CF blend (with fallback to genome-only for films with sparse ratings). The blended matrix drives graph edges.
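A minimal sketch of the 60/40 blend for a single pair of films, assuming cosine similarity in both spaces and assuming that a film with sparse ratings is represented by a missing (None) CF vector. That convention and the function names are hypothetical; the real pipeline presumably works on whole matrices.

```python
import math

GENOME_WEIGHT = 0.6  # 60/40 genome/CF blend stated in the text
CF_WEIGHT = 0.4


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def blended_similarity(genome_a, genome_b, cf_a=None, cf_b=None):
    """Blend genome and CF similarity; fall back to genome-only without CF vectors."""
    genome_sim = cosine(genome_a, genome_b)
    if cf_a is None or cf_b is None:
        return genome_sim  # genome-only fallback for sparsely rated films
    return GENOME_WEIGHT * genome_sim + CF_WEIGHT * cosine(cf_a, cf_b)
```

For example, two films with identical genome vectors but orthogonal CF vectors would score 0.6 under these assumptions, and 1.0 if either CF vector were missing.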

Content UMAP projects the genome feature matrix to 2D coordinates, preserving global structure. Taste UMAP projects the CF vectors to a separate 2D layout (n_neighbors=15, min_dist=0.1, cosine metric). Both are normalised to the same coordinate range.
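The normalisation step could be sketched as a per-axis min-max rescale. The target range of [-1, 1] is an assumption (the text only says the two layouts share a range), and `normalise_layout` is a hypothetical name.

```python
def normalise_layout(points, lo=-1.0, hi=1.0):
    """Rescale each axis of a 2D layout independently into [lo, hi]."""
    # In the pipeline, `points` would be UMAP output, e.g. from umap-learn's
    # umap.UMAP(n_neighbors=15, min_dist=0.1, metric="cosine").fit_transform(...)
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)

    def rescale(value, v_min, v_max):
        if v_max == v_min:  # degenerate axis: place everything at the midpoint
            return (lo + hi) / 2
        return lo + (value - v_min) * (hi - lo) / (v_max - v_min)

    return [(rescale(x, x_min, x_max), rescale(y, y_min, y_max)) for x, y in points]


# Two made-up layouts on very different scales end up in the same range.
content_xy = normalise_layout([(3.0, 10.0), (5.0, 20.0), (7.0, 30.0)])
taste_xy = normalise_layout([(-40.0, 0.5), (0.0, 1.0), (40.0, 1.5)])
```

Putting both layouts on one scale is what allows node positions to be interpolated directly between them.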

Cluster labelling uses a TF-IDF-style contrast: subtracting the global mean genome vector from each cluster's mean vector reveals that cluster's most distinctive tags. Labels are deduplicated across clusters, and generic tags (e.g. "good", "classic", "oscar winner") are excluded.
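One way to picture the labelling step is the sketch below. It assumes deduplication is first-come, first-served in cluster order and that each cluster receives a fixed number of labels; neither detail is stated in the text, and all names and numbers here are hypothetical.

```python
GENERIC_TAGS = {"good", "classic", "oscar winner"}  # examples given in the text


def label_clusters(cluster_means, global_mean, tag_names, per_cluster=1):
    """Pick each cluster's most distinctive tags by mean-minus-global-mean lift.

    cluster_means: {cluster_id: mean genome vector for that cluster}
    Returns {cluster_id: [tag, ...]} with no tag reused across clusters.
    """
    used = set()
    labels = {}
    for cluster_id, mean_vec in cluster_means.items():
        lift = [m - g for m, g in zip(mean_vec, global_mean)]
        ranked = sorted(range(len(lift)), key=lambda i: lift[i], reverse=True)
        chosen = []
        for i in ranked:
            tag = tag_names[i]
            if tag in GENERIC_TAGS or tag in used:
                continue  # skip generic tags and labels already taken
            chosen.append(tag)
            used.add(tag)
            if len(chosen) == per_cluster:
                break
        labels[cluster_id] = chosen
    return labels


# Made-up example: both clusters lean towards "noir", but only one may claim it.
tags = ["good", "noir", "space", "heist"]
global_mean = [0.5, 0.2, 0.2, 0.2]
clusters = {0: [0.9, 0.8, 0.1, 0.6], 1: [0.9, 0.9, 0.8, 0.1]}
labels = label_clusters(clusters, global_mean, tags)
```

Here cluster 0 takes "noir", so cluster 1 falls through to its next most distinctive tag, "space", while "good" is never used.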

2. Letterboxd Scraping

Public Letterboxd profiles are scraped without authentication. The scraper:

Paginates through rated, liked, watched, and watchlist pages
Resolves Letterboxd film slugs to TMDB IDs via parallel HTTP requests (6 concurrent workers using cloudscraper for Cloudflare bypass)
Bridges TMDB IDs to MovieLens IDs via the dataset's links.csv
Caches resolved slugs to disk, so repeated lookups are instant
Supplements with RSS feeds for additional TMDB ID coverage
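The slug resolution, caching, and ID bridging steps above might be sketched as follows. The actual HTTP lookup is replaced by a caller-supplied function, since the scraper's request details are not given here; `resolve_slugs`, `fetch_tmdb_id`, and the JSON cache format are assumptions. The `movieId` and `tmdbId` column names are those of the MovieLens links.csv file.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

MAX_WORKERS = 6  # concurrency stated in the text


def resolve_slugs(slugs, fetch_tmdb_id, cache_path):
    """Resolve Letterboxd slugs to TMDB IDs, reusing a JSON cache on disk.

    fetch_tmdb_id is a hypothetical callable (slug -> int or None) standing in
    for the real HTTP lookup. Failed lookups are cached as None in this sketch.
    """
    cache_file = Path(cache_path)
    cache = json.loads(cache_file.read_text()) if cache_file.exists() else {}

    # Only fetch slugs not seen before; dict.fromkeys removes duplicates in order.
    missing = [s for s in dict.fromkeys(slugs) if s not in cache]
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        for slug, tmdb_id in zip(missing, pool.map(fetch_tmdb_id, missing)):
            cache[slug] = tmdb_id

    cache_file.write_text(json.dumps(cache))
    return {s: cache[s] for s in slugs}


def tmdb_to_movielens(links_rows):
    """Build a TMDB ID -> MovieLens ID map from csv.DictReader rows of links.csv."""
    return {int(r["tmdbId"]): int(r["movieId"]) for r in links_rows if r["tmdbId"]}
```

A second call for an already cached slug never reaches `fetch_tmdb_id`, which is the behaviour the "repeated lookups are instant" point describes.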

3. Recommendation Engine

Solo: Rating-weighted centroids are computed in both genome space (content) and CF space (taste). Every unseen film gets two scores: cosine similarity to the genome centroid and cosine similarity to the CF centroid. A multi-similarity bonus rewards candidates that resemble several of the user's highly-rated films rather than just one. Scores are min-max normalised independently, then blended client-side.
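The core of the solo scoring in one space can be sketched as below. It assumes raw ratings are used directly as centroid weights and leaves out the multi-similarity bonus, whose formula is not given; the function names and vectors are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def weighted_centroid(vectors, ratings):
    """Rating-weighted mean of the vectors of the films a user has rated."""
    total = sum(ratings)
    dims = len(vectors[0])
    return [sum(r * v[d] for v, r in zip(vectors, ratings)) / total for d in range(dims)]


def min_max(scores):
    """Rescale scores to [0, 1]; all-equal scores collapse to 0.0."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def score_unseen(seen_vectors, ratings, unseen_vectors):
    """Normalised cosine similarity of each unseen film to the user's centroid."""
    centroid = weighted_centroid(seen_vectors, ratings)
    return min_max([cosine(centroid, v) for v in unseen_vectors])


# Toy 2D vectors: the user strongly prefers the first axis.
scores = score_unseen([[1, 0], [0, 1]], [5, 1], [[1, 0], [0, 1], [1, 1]])
```

Running the same function once on genome vectors and once on CF vectors would yield the two independent score lists that the client then blends.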

Duo: Each user gets centroids in both spaces. Each film is scored by the lower of its similarities to the two users' centroids, so a film must appeal to both, not just one. Compatibility is a weighted blend of centroid cosine similarity, shared highly-rated film count, and genre Jaccard overlap.
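A sketch of the duo scoring and compatibility blend follows. The weights (0.5, 0.3, 0.2) and the cap used to turn the shared-film count into a 0 to 1 term are placeholders chosen for illustration; the text does not give the actual values.

```python
def duo_score(sim_to_a, sim_to_b):
    """A film is only as good as its appeal to the less interested viewer."""
    return min(sim_to_a, sim_to_b)


def jaccard(a, b):
    """Jaccard overlap of two collections, treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def compatibility(centroid_sim, shared_high_rated, genres_a, genres_b,
                  weights=(0.5, 0.3, 0.2), shared_cap=20):
    """Weighted blend of the three signals; weights and cap are assumptions."""
    w_centroid, w_shared, w_genre = weights
    shared_term = min(shared_high_rated, shared_cap) / shared_cap
    return (w_centroid * centroid_sim
            + w_shared * shared_term
            + w_genre * jaccard(genres_a, genres_b))


# Hypothetical pair of users.
example = compatibility(0.8, 10, {"Drama", "Thriller"}, {"Drama", "Comedy"})
```

Taking the minimum rather than the mean is what keeps a film that one user would love and the other would dislike out of the duo list.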

Taste profiler: Answers on a 0–5 scale are converted to weights in [-1, +1] and applied to genome tag indices. Optional seed movies contribute their averaged genome vector with a 2x anchor weight. For the taste score, a CF proxy profile is built from the seed movies' CF vectors (or from the top-20 genome-similar films if no seeds are provided). This gives meaningful content vs taste differentiation even without a Letterboxd profile.
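To make the content side of the profiler concrete, here is a sketch that assumes a linear mapping from the 0 to 5 scale onto [-1, +1] and assumes each question is tied to a list of genome tag indices. Both are assumptions, as are the names `answer_to_weight` and `build_profile`.

```python
SEED_ANCHOR_WEIGHT = 2.0  # the "2x anchor weight" from the text


def answer_to_weight(answer):
    """Map a 0-5 answer onto [-1, +1]; the linear mapping is an assumption."""
    if not 0 <= answer <= 5:
        raise ValueError("answer must be between 0 and 5")
    return (answer - 2.5) / 2.5


def build_profile(answers, question_tags, seed_vectors, n_tags):
    """Combine question answers and optional seed films into one genome profile.

    answers: {question_id: 0-5 score}
    question_tags: {question_id: [genome tag index, ...]} (hypothetical mapping)
    seed_vectors: genome vectors of the chosen favourite films (may be empty)
    """
    profile = [0.0] * n_tags
    for question_id, answer in answers.items():
        weight = answer_to_weight(answer)
        for idx in question_tags[question_id]:
            profile[idx] += weight
    if seed_vectors:
        for d in range(n_tags):
            mean = sum(v[d] for v in seed_vectors) / len(seed_vectors)
            profile[d] += SEED_ANCHOR_WEIGHT * mean
    return profile


# Made-up example: loves horror, dislikes romance, two seed films.
profile = build_profile(
    answers={"horror": 5, "romance": 0},
    question_tags={"horror": [0], "romance": [1]},
    seed_vectors=[[0.5, 0.0, 1.0], [0.5, 1.0, 0.0]],
    n_tags=3,
)
```

The resulting vector can be compared against film genome vectors in the same way as a Letterboxd-derived centroid.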

4. Dynamic Question Selection

The question pool contains 31 questions across 5 categories: mood (5), genre (10 including 3 follow-ups), style (6 including 1 follow-up), tone (5 including 1 follow-up), and setting (5). The frontend dynamically picks the next question by:

Checking for triggered follow-ups (e.g. "Do you prefer psychological horror over gory slashers?" only appears if the user scored horror >= 3)
Selecting from the least-covered category to ensure breadth
Stopping after 15 questions (submit unlocked after 5)
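The selection rules above could be sketched as follows. The data shape (a `trigger` field naming the parent question) and the reuse of the horror example's threshold of 3 for every follow-up are assumptions; the frontend's real structures may differ.

```python
MAX_QUESTIONS = 15
MIN_TO_SUBMIT = 5
FOLLOW_UP_THRESHOLD = 3  # taken from the horror example; assumed for all follow-ups


def next_question(pool, answers):
    """Return the next question dict, or None when the session should stop.

    pool: list of dicts with 'id', 'category' and an optional 'trigger' (parent id).
    answers: {question_id: 0-5 score} given so far.
    """
    if len(answers) >= MAX_QUESTIONS:
        return None
    unanswered = [q for q in pool if q["id"] not in answers]

    # 1. Triggered follow-ups take priority.
    for q in unanswered:
        parent = q.get("trigger")
        if parent is not None and answers.get(parent, 0) >= FOLLOW_UP_THRESHOLD:
            return q

    # 2. Otherwise pick a base question from the least-covered category.
    base = [q for q in unanswered if q.get("trigger") is None]
    if not base:
        return None
    by_id = {q["id"]: q for q in pool}
    coverage = {}
    for question_id in answers:
        category = by_id[question_id]["category"]
        coverage[category] = coverage.get(category, 0) + 1
    return min(base, key=lambda q: coverage.get(q["category"], 0))


def can_submit(answers):
    """Submission unlocks once enough questions have been answered."""
    return len(answers) >= MIN_TO_SUBMIT


# A tiny hypothetical pool with one follow-up.
pool = [
    {"id": "horror", "category": "genre"},
    {"id": "psych_horror", "category": "genre", "trigger": "horror"},
    {"id": "cosy", "category": "mood"},
]
```

In this pool, a horror score of 4 surfaces the follow-up next, while a score of 1 skips it and moves on to the uncovered mood category.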

Architecture

preprocess/          Offline pipeline: MovieLens -> similarity matrix -> UMAP -> clusters
backend/             FastAPI: scraping, recommendations, graph and film detail APIs
frontend/            React + D3.js Canvas graph, Tailwind CSS, Framer Motion

API

| Endpoint | Method | Description |
| --- | --- | --- |
| `/api/graph` | GET | Pre-computed graph (nodes, edges, content clusters, taste clusters, dual coords) |
| `/api/user/{username}` | GET | Scrape Letterboxd, match to graph, compute taste centroid |
| `/api/scores/solo` | POST | Content + taste scores for all unwatched candidates (real-time blending) |
| `/api/scores/duo` | POST | Duo candidate scores + compatibility |
| `/api/scores/taste` | POST | Candidate scores from taste profiler answers + seed movies |
| `/api/recommend/solo` | POST | Single-user recommendations with exploration control |
| `/api/recommend/duo` | POST | Two-user recommendations + compatibility score |
| `/api/recommend/taste` | POST | Recommendations from taste profiler answers |
| `/api/movie/{id}` | GET | Film detail: TMDB poster, tags, similar films, description, watch links |
| `/api/taste/questions` | GET | Full question pool with categories and follow-up triggers |
| `/api/films/popular` | GET | Genre-diverse popular films for live rating |

Tech Stack

Frontend --- React 18, Vite, D3.js (Canvas renderer), Tailwind CSS, Framer Motion

Backend --- FastAPI, NumPy, scikit-learn, aiohttp, BeautifulSoup4, cloudscraper

Data --- MovieLens 25M, UMAP, K-means, SciPy sparse matrices, TMDB API (posters)

Getting Started

Prerequisites

Python 3.11+
Node.js 18+
TMDB API token (optional, for film posters)
Pre-computed data files in backend/data/ (generated by the preprocessing pipeline)

Backend

cd backend
pip install -r requirements.txt
cp ../.env.example ../.env  # add your TMDB token if desired
uvicorn main:app --port 8000

Frontend

cd frontend
npm install
npm run dev

The frontend proxies /api requests to localhost:8000. Open http://localhost:5173.

Data Preprocessing (optional)

Only required to regenerate the graph from scratch. Downloads ~250MB of MovieLens data.

cd preprocess
pip install -r requirements.txt
python preprocess.py
python compute_taste_umap.py  # generates taste UMAP layout

Project Structure

backend/
  main.py              FastAPI app, endpoints, taste questions, TMDB integration
  recommender.py       Recommendation engine (solo, duo, taste profile, all-scores modes)
  scraper.py           Letterboxd scraper with parallel slug resolution
  models.py            Pydantic request/response schemas
  data/                Pre-computed graph, genome vectors, CF vectors, taste coords

frontend/
  src/
    App.jsx            Root component, state management, client-side score blending
    components/
      Graph.jsx        Canvas wrapper for the D3 force graph
      Sidebar.jsx      Search bar, mode toggle, user cards, blend slider, recommendations
      FilmDetail.jsx   Film detail panel with TMDB poster, tags, similar films, watch links
      LiveRating.jsx   Taste profiler (movie picker + dynamic question selection)
      UserInput.jsx    Letterboxd username entry
      RecommendationList.jsx  Scored recommendations with reasons
      CompatibilityCard.jsx   Duo mode compatibility stats
    hooks/
      useGraph.js      D3 simulation, Canvas rendering, dual layout interpolation, edge highlighting
      useApi.js        API client with all endpoint bindings
    styles/
      globals.css      Tailwind directives, custom animations, glass panel effects

preprocess/
  preprocess.py        Full pipeline: download, blend, UMAP, cluster, export
  compute_taste_umap.py  Taste UMAP from CF vectors
