See your film taste. Navigate it. Discover what to watch next.
MovieMap turns the MovieLens 25M dataset into an interactive galaxy of 2,500 films. Each node is a movie, positioned by similarity derived from genome tags (what a film is about) and collaborative filtering (what audiences enjoy together). Enter a Letterboxd username and your films light up across the map. Slide between content similarity and taste similarity to watch the entire graph reorganise in real time. Get personalised recommendations for solo viewing, or overlay two profiles and find the perfect film for a movie night.
Recommendation algorithms are black boxes. They hand you a ranked list with no context for why something was chosen or how it relates to what you already love. You can't see the shape of your own taste or explore the space between genres.
MovieMap makes film similarity spatial. A force-directed graph clusters movies by how they actually relate — thrillers near thrillers, arthouse near arthouse, with crossover films bridging the gaps. Your watched, rated, and liked films become a visible footprint on this map, and recommendations are points you haven't reached yet. A content/taste slider lets you morph the entire layout between what films are about and what audiences enjoy together.
Interactive Film Graph --- 2,500 nodes rendered on a Canvas-based D3 force simulation at 60fps. Zoom, pan, hover for tooltips, click for full detail. Cluster labels crossfade between content and taste layouts as you slide. Edges glow between related films. Selected nodes pulse white with highlighted connections.
Dual Layout Interpolation --- Two complete UMAP projections: one from genome tag vectors (what films are about) and one from collaborative filtering vectors (what audiences enjoy together). The content/taste slider smoothly interpolates every node position, edge weight, and cluster label between these two layouts — the whole graph reorganises in real time with zero API calls.
Letterboxd Integration --- Enter any public username. MovieMap scrapes rated, liked, watched, and watchlist data, resolves TMDB IDs in parallel (6 concurrent workers with Cloudflare bypass), and maps everything onto the graph. Your films glow amber; a second user's films glow teal.
Solo Mode --- Highlights your films and computes a weighted taste centroid from your ratings. Every unseen film gets a content score (genome similarity) and a taste score (CF similarity), blended in real time by the slider. Recommendations update instantly as you slide.
Duo Mode --- Overlay two users. See where your tastes overlap. A compatibility score combines centroid similarity, shared highly-rated films, and genre overlap. Duo recommendations are drawn from the intersection zone — films that appeal to both profiles but neither has seen.
Taste Profiler (No Letterboxd) --- No account? Pick up to 3 favourite films to seed your profile, then answer dynamically selected questions on a 0–5 scale. The system draws from a pool of 31 questions across 5 categories (mood, genre, style, tone, setting), with follow-up questions triggered by strong answers. The content/taste slider works identically to the Letterboxd flow — same real-time blending, same score format.
Content / Taste Slider --- All recommendation blending happens client-side. The backend returns separate content and taste scores per candidate; the frontend computes `score = (1 - blend) * contentScore + blend * tasteScore` in a `useMemo`. Moving the slider simultaneously repositions all graph nodes via D3 force simulation interpolation, crossfades cluster labels, and re-ranks recommendations.
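The blend is a per-candidate linear interpolation followed by a re-sort. MovieMap does this in JavaScript inside a `useMemo`; the Python sketch below only mirrors the arithmetic to show how a slider move re-ranks candidates. `Candidate` and `blend_scores` are hypothetical names, and both scores are assumed to arrive already normalised to [0, 1].

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Candidate:
    """Hypothetical per-film payload: one content score and one taste score."""
    movie_id: int
    content_score: float  # genome-similarity score, assumed normalised to [0, 1]
    taste_score: float    # CF-similarity score, assumed normalised to [0, 1]


def blend_scores(candidates: List[Candidate], blend: float) -> List[Tuple[int, float]]:
    """Return (movie_id, score) pairs sorted best-first for a slider value in [0, 1].

    blend = 0 ranks purely by content, blend = 1 purely by taste.
    """
    if not 0.0 <= blend <= 1.0:
        raise ValueError("blend must be between 0 and 1")
    scored = [
        (c.movie_id, (1.0 - blend) * c.content_score + blend * c.taste_score)
        for c in candidates
    ]
    # sorted() is stable, so tied films keep the order the backend returned.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    films = [Candidate(1, 0.9, 0.1), Candidate(2, 0.3, 0.9), Candidate(3, 0.6, 0.5)]
    for blend in (0.0, 0.5, 1.0):
        print(blend, [movie_id for movie_id, _ in blend_scores(films, blend)])
```

Only two numbers per film are needed, so re-ranking on every slider tick is one multiply-add per candidate with no round trip to the backend.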
Film Detail Panel --- Click any node to open a frosted glass panel (centre-right) showing TMDB poster, genres, genome tags with relevance scores, average rating, a generated description, similar films with similarity percentages, and links to IMDb, TMDB, and JustWatch.
Film Search --- Client-side search across all 2,500 films. Select a result and the graph pans to its node, opening the detail panel.
The preprocessing script downloads MovieLens ml-25m and builds a blended similarity model from two complementary signals:
- Genome tags (1,128 dimensions per film) — captures what a film is about: tone, theme, style, era. Pairwise cosine similarity tells you which films share thematic DNA.
- Collaborative filtering (200 dimensions from SVD of the ratings matrix) — captures who watches these films together. Two films are similar if the same users tend to rate them similarly.
These are combined at a 60/40 genome/CF blend (with fallback to genome-only for films with sparse ratings). The blended matrix drives graph edges.
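A minimal sketch of the 60/40 blend for one pair of films, assuming cosine similarity in both spaces. The `min_ratings` threshold for what counts as "sparse" is hypothetical (the README does not state the cutoff), and falling back to genome-only when either film of the pair is sparse is also an assumption.

```python
import math
from typing import Sequence

GENOME_WEIGHT = 0.6  # from the README's 60/40 genome/CF split
CF_WEIGHT = 0.4


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def blended_similarity(
    genome_a: Sequence[float],
    genome_b: Sequence[float],
    cf_a: Sequence[float],
    cf_b: Sequence[float],
    ratings_a: int,
    ratings_b: int,
    min_ratings: int = 50,  # hypothetical sparsity threshold
) -> float:
    """60/40 genome/CF similarity, genome-only if either film is sparsely rated."""
    genome_sim = cosine(genome_a, genome_b)
    if min(ratings_a, ratings_b) < min_ratings:
        return genome_sim  # CF vectors are unreliable for sparsely rated films
    return GENOME_WEIGHT * genome_sim + CF_WEIGHT * cosine(cf_a, cf_b)
```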
Content UMAP projects the genome feature matrix to 2D coordinates, preserving global structure. Taste UMAP projects the CF vectors to a separate 2D layout (n_neighbors=15, min_dist=0.1, cosine metric). Both are normalised to the same coordinate range.
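Because both layouts share one coordinate range, each node can move along a straight line between its content and taste positions as the slider moves. The sketch below assumes per-axis min-max scaling to [0, 1] and plain linear interpolation; `normalise` and `interpolate` are hypothetical helpers. The README describes the frontend driving this through its D3 force simulation, which the sketch does not model.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def normalise(coords: List[Point]) -> List[Point]:
    """Min-max scale each axis into [0, 1] (assumed shared range)."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    x_lo, x_hi, y_lo, y_hi = min(xs), max(xs), min(ys), max(ys)

    def scale(v: float, lo: float, hi: float) -> float:
        # A degenerate axis (all points equal) is centred instead of dividing by zero.
        return 0.5 if hi == lo else (v - lo) / (hi - lo)

    return [(scale(x, x_lo, x_hi), scale(y, y_lo, y_hi)) for x, y in coords]


def interpolate(content: List[Point], taste: List[Point], t: float) -> List[Point]:
    """Per-node linear interpolation: t = 0 gives the content layout, t = 1 the taste layout."""
    return [
        ((1.0 - t) * cx + t * tx, (1.0 - t) * cy + t * ty)
        for (cx, cy), (tx, ty) in zip(content, taste)
    ]
```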
Cluster labelling uses a TF-IDF-style score: subtracting the global mean genome vector from each cluster's mean genome vector reveals that cluster's most distinctive tags. Labels are deduplicated across clusters, and generic tags (e.g. "good", "classic", "oscar winner") are excluded.
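The labelling step can be sketched as follows, assuming each cluster is given as a list of genome vectors and that deduplication is greedy (a tag claimed by an earlier cluster is skipped by later ones). `GENERIC_TAGS` holds only the three examples the README names; the real exclusion list is not shown, and `label_clusters` is a hypothetical name.

```python
from typing import Dict, List, Sequence

# Only the examples named in the README; the real exclusion list is not shown.
GENERIC_TAGS = {"good", "classic", "oscar winner"}


def mean_vector(vectors: Sequence[Sequence[float]]) -> List[float]:
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def label_clusters(
    clusters: Dict[int, List[List[float]]], tag_names: List[str], top_n: int = 3
) -> Dict[int, List[str]]:
    """Pick each cluster's most distinctive tags: cluster mean minus global mean."""
    all_vectors = [v for vectors in clusters.values() for v in vectors]
    global_mean = mean_vector(all_vectors)
    used = set()  # tags already claimed by an earlier cluster (greedy dedup, assumed)
    labels = {}
    for cluster_id, vectors in clusters.items():
        lift = [c - g for c, g in zip(mean_vector(vectors), global_mean)]
        ranked = sorted(range(len(tag_names)), key=lambda i: lift[i], reverse=True)
        chosen = []
        for i in ranked:
            tag = tag_names[i]
            if tag in GENERIC_TAGS or tag in used:
                continue
            chosen.append(tag)
            used.add(tag)
            if len(chosen) == top_n:
                break
        labels[cluster_id] = chosen
    return labels
```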
Public Letterboxd profiles are scraped without authentication. The scraper:
- Paginates through rated, liked, watched, and watchlist pages
- Resolves Letterboxd film slugs to TMDB IDs via parallel HTTP requests (6 concurrent workers using cloudscraper for Cloudflare bypass)
- Bridges TMDB IDs to MovieLens IDs via the dataset's `links.csv`
- Caches resolved slugs to disk, so repeated lookups are instant
- Supplements with RSS feeds for additional TMDB ID coverage
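Two of these steps are easy to illustrate: bridging TMDB IDs to MovieLens IDs through `links.csv` (whose MovieLens columns are `movieId,imdbId,tmdbId`), and resolving slugs with 6 workers backed by an on-disk cache. The thread pool and the JSON cache file are assumptions, `resolve_one` stands in for the real cloudscraper page fetch, and all function names here are hypothetical.

```python
import csv
import io
import json
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Optional


def load_tmdb_to_movielens(links_csv: str) -> Dict[int, int]:
    """Map TMDB id -> MovieLens movieId from the text of links.csv."""
    mapping = {}
    for row in csv.DictReader(io.StringIO(links_csv)):
        tmdb = row["tmdbId"].strip()
        if tmdb:  # tmdbId can be empty for some rows
            mapping[int(tmdb)] = int(row["movieId"])
    return mapping


def resolve_slugs(
    slugs: Iterable[str],
    resolve_one: Callable[[str], Optional[int]],  # hypothetical fetcher: slug -> TMDB id
    cache_path: str,
    workers: int = 6,
) -> Dict[str, Optional[int]]:
    """Resolve Letterboxd slugs to TMDB ids, reusing and updating a JSON cache on disk."""
    slugs = list(slugs)
    cache: Dict[str, Optional[int]] = {}
    if os.path.exists(cache_path):
        with open(cache_path, encoding="utf-8") as f:
            cache = json.load(f)
    # Fetch only unseen slugs; dict.fromkeys de-duplicates while keeping order.
    missing = [s for s in dict.fromkeys(slugs) if s not in cache]
    if missing:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            for slug, tmdb_id in zip(missing, pool.map(resolve_one, missing)):
                cache[slug] = tmdb_id
        with open(cache_path, "w", encoding="utf-8") as f:
            json.dump(cache, f)
    return {s: cache[s] for s in slugs}
```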
Solo: Rating-weighted centroids are computed in both genome space (content) and CF space (taste). Every unseen film gets two scores: cosine similarity to the genome centroid and cosine similarity to the CF centroid. Multi-similarity bonuses reward films that appeal across multiple highly-rated references. Scores are min-max normalised independently, then blended client-side.
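A sketch of the solo scoring path, assuming raw star ratings are used directly as centroid weights (at least one must be positive) and that every film has both a genome and a CF vector. The multi-similarity bonus is omitted because its formula is not described. The two normalised scores are returned separately so the client can blend them; `solo_scores` is a hypothetical name.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def weighted_centroid(vectors: List[Sequence[float]], weights: List[float]) -> List[float]:
    """Rating-weighted mean of the given vectors."""
    total = sum(weights)
    dims = len(vectors[0])
    return [sum(w * v[i] for v, w in zip(vectors, weights)) / total for i in range(dims)]


def min_max(scores: Dict[int, float]) -> Dict[int, float]:
    """Rescale to [0, 1]; a constant score set maps to all zeros."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 0.0 for k in scores}
    return {k: (s - lo) / (hi - lo) for k, s in scores.items()}


def solo_scores(
    ratings: Dict[int, float],
    genome: Dict[int, List[float]],
    cf: Dict[int, List[float]],
) -> Dict[int, Tuple[float, float]]:
    """Return movie_id -> (content_score, taste_score) for every unseen film."""
    rated = list(ratings)
    weights = [ratings[m] for m in rated]  # raw star ratings as weights (assumption)
    genome_centroid = weighted_centroid([genome[m] for m in rated], weights)
    cf_centroid = weighted_centroid([cf[m] for m in rated], weights)
    unseen = [m for m in genome if m not in ratings]
    # Each space is normalised on its own before the client-side blend.
    content = min_max({m: cosine(genome[m], genome_centroid) for m in unseen})
    taste = min_max({m: cosine(cf[m], cf_centroid) for m in unseen})
    return {m: (content[m], taste[m]) for m in unseen}
```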
Duo: Each user gets centroids in both spaces. Films are scored by the minimum similarity to both centroids — a film must appeal to both, not just one. Compatibility is a weighted blend of centroid cosine similarity, shared highly-rated film count, and genre Jaccard overlap.
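The duo rules translate to a `min` over the two users' similarities and a weighted sum for compatibility. The weights `(0.5, 0.3, 0.2)` and the cap that turns the shared-film count into a [0, 1] term are placeholders, not MovieMap's values, and the function names are hypothetical.

```python
from typing import Dict, Set, Tuple


def duo_scores(sims_a: Dict[int, float], sims_b: Dict[int, float]) -> Dict[int, float]:
    """Score each candidate by its weaker appeal: min(similarity to A, similarity to B).

    Callers are expected to pass only films that neither user has seen.
    """
    return {m: min(sims_a[m], sims_b[m]) for m in sims_a.keys() & sims_b.keys()}


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def compatibility(
    centroid_sim: float,
    shared_high_rated: int,
    genres_a: Set[str],
    genres_b: Set[str],
    weights: Tuple[float, float, float] = (0.5, 0.3, 0.2),  # placeholder weights
    shared_cap: int = 10,  # placeholder: this many shared films scores the full term
) -> float:
    w_centroid, w_shared, w_genre = weights
    shared_term = min(shared_high_rated, shared_cap) / shared_cap
    return w_centroid * centroid_sim + w_shared * shared_term + w_genre * jaccard(genres_a, genres_b)
```

Taking the minimum means a film one user would love and the other would dislike ranks below a film both find reasonably appealing.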
Taste profiler: Answers on a 0–5 scale are converted to weights in [-1, +1] and applied to genome tag indices. Optional seed movies contribute their averaged genome vector with a 2x anchor weight. For the taste score, a CF proxy profile is built from the seed movies' CF vectors (or from the top-20 genome-similar films if no seeds are provided). This gives meaningful content vs taste differentiation even without a Letterboxd profile.
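A sketch of the profile construction under stated assumptions: answers map linearly onto [-1, +1] (0 to -1, 2.5 to 0, 5 to +1), each question is tied to a hypothetical list of genome tag indices, and the seed anchor is added as `2 * mean(seed vectors)`. The CF proxy follows the README's two cases (seeds, otherwise the top-20 genome-similar films). All names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mean_vector(vectors: Sequence[Sequence[float]]) -> List[float]:
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def answer_weight(answer: float) -> float:
    """Map a 0-5 answer linearly onto [-1, +1] (assumed mapping)."""
    if not 0 <= answer <= 5:
        raise ValueError("answers must be on a 0-5 scale")
    return (answer - 2.5) / 2.5


def content_profile(
    answers: Dict[str, float],
    question_tags: Dict[str, List[int]],  # hypothetical: question id -> genome tag indices
    dims: int,
    seed_genome: Sequence[Sequence[float]] = (),
    anchor: float = 2.0,  # the README's 2x seed anchor weight
) -> List[float]:
    profile = [0.0] * dims
    for qid, answer in answers.items():
        w = answer_weight(answer)
        for idx in question_tags[qid]:
            profile[idx] += w
    if seed_genome:
        # Seeds pull the profile toward the films the user explicitly named.
        seed_mean = mean_vector(seed_genome)
        profile = [p + anchor * s for p, s in zip(profile, seed_mean)]
    return profile


def cf_proxy(
    profile: List[float],
    genome: Dict[int, List[float]],
    cf: Dict[int, List[float]],
    seed_ids: Sequence[int] = (),
    k: int = 20,
) -> List[float]:
    """Taste-side profile: mean CF vector of the seeds, else of the top-k genome matches."""
    if seed_ids:
        return mean_vector([cf[m] for m in seed_ids])
    nearest = sorted(genome, key=lambda m: cosine(genome[m], profile), reverse=True)[:k]
    return mean_vector([cf[m] for m in nearest])
```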
The question pool contains 31 questions across 5 categories: mood (5), genre (10 including 3 follow-ups), style (6 including 1 follow-up), tone (5 including 1 follow-up), and setting (5). The frontend dynamically picks the next question by:
- Checking for triggered follow-ups (e.g. "Do you prefer psychological horror over gory slashers?" only appears if the user scored horror >= 3)
- Selecting from the least-covered category to ensure breadth
- Stopping after 15 questions (submission unlocks after 5 answers)
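The selection rules above can be sketched as a single function. The `Question` structure, the `(parent, threshold)` trigger encoding, and picking the first question in pool order on ties are all assumptions; the real logic lives in the React frontend.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

MAX_QUESTIONS = 15  # stop asking after this many answers
MIN_TO_SUBMIT = 5   # submission unlocks at this many answers


@dataclass(frozen=True)
class Question:
    qid: str
    category: str  # mood, genre, style, tone or setting
    # Hypothetical follow-up encoding: (parent question id, minimum parent answer).
    trigger: Optional[Tuple[str, int]] = None


def next_question(pool: List[Question], answers: Dict[str, int]) -> Optional[Question]:
    """Pick the next question, or None when the questionnaire is finished."""
    if len(answers) >= MAX_QUESTIONS:
        return None
    # 1. A follow-up whose parent answer met its threshold takes priority.
    for q in pool:
        if q.qid in answers or q.trigger is None:
            continue
        parent, threshold = q.trigger
        if parent in answers and answers[parent] >= threshold:
            return q
    # 2. Otherwise choose an unasked base question from the least-covered category.
    category_of = {q.qid: q.category for q in pool}
    coverage = Counter(category_of[qid] for qid in answers)
    candidates = [q for q in pool if q.qid not in answers and q.trigger is None]
    if not candidates:
        return None
    # min() returns the first candidate in pool order when categories tie.
    return min(candidates, key=lambda q: coverage[q.category])


def can_submit(answers: Dict[str, int]) -> bool:
    return len(answers) >= MIN_TO_SUBMIT
```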
```
preprocess/   Offline pipeline: MovieLens -> similarity matrix -> UMAP -> clusters
backend/      FastAPI: scraping, recommendations, graph and film detail APIs
frontend/     React + D3.js Canvas graph, Tailwind CSS, Framer Motion
```
| Endpoint | Method | Description |
|---|---|---|
| `/api/graph` | GET | Pre-computed graph (nodes, edges, content clusters, taste clusters, dual coords) |
| `/api/user/{username}` | GET | Scrape Letterboxd, match to graph, compute taste centroid |
| `/api/scores/solo` | POST | Content + taste scores for all unwatched candidates (real-time blending) |
| `/api/scores/duo` | POST | Duo candidate scores + compatibility |
| `/api/scores/taste` | POST | Candidate scores from taste profiler answers + seed movies |
| `/api/recommend/solo` | POST | Single-user recommendations with exploration control |
| `/api/recommend/duo` | POST | Two-user recommendations + compatibility score |
| `/api/recommend/taste` | POST | Recommendations from taste profiler answers |
| `/api/movie/{id}` | GET | Film detail: TMDB poster, tags, similar films, description, watch links |
| `/api/taste/questions` | GET | Full question pool with categories and follow-up triggers |
| `/api/films/popular` | GET | Genre-diverse popular films for live rating |
Frontend --- React 18, Vite, D3.js (Canvas renderer), Tailwind CSS, Framer Motion
Backend --- FastAPI, NumPy, scikit-learn, aiohttp, BeautifulSoup4, cloudscraper
Data --- MovieLens 25M, UMAP, K-means, SciPy sparse matrices, TMDB API (posters)
- Python 3.11+
- Node.js 18+
- TMDB API token (optional, for film posters)
- Pre-computed data files in `backend/data/` (generated by the preprocessing pipeline)
```bash
cd backend
pip install -r requirements.txt
cp ../.env.example ../.env  # add your TMDB token if desired
uvicorn main:app --port 8000
```

In a second terminal:

```bash
cd frontend
npm install
npm run dev
```

The frontend proxies `/api` requests to `localhost:8000`. Open http://localhost:5173.
Only required to regenerate the graph from scratch. Downloads ~250MB of MovieLens data.
```bash
cd preprocess
pip install -r requirements.txt
python preprocess.py
python compute_taste_umap.py  # generates taste UMAP layout
```

```
backend/
  main.py                     FastAPI app, endpoints, taste questions, TMDB integration
  recommender.py              Recommendation engine (solo, duo, taste profile, all-scores modes)
  scraper.py                  Letterboxd scraper with parallel slug resolution
  models.py                   Pydantic request/response schemas
  data/                       Pre-computed graph, genome vectors, CF vectors, taste coords
frontend/
  src/
    App.jsx                   Root component, state management, client-side score blending
    components/
      Graph.jsx               Canvas wrapper for the D3 force graph
      Sidebar.jsx             Search bar, mode toggle, user cards, blend slider, recommendations
      FilmDetail.jsx          Film detail panel with TMDB poster, tags, similar films, watch links
      LiveRating.jsx          Taste profiler (movie picker + dynamic question selection)
      UserInput.jsx           Letterboxd username entry
      RecommendationList.jsx  Scored recommendations with reasons
      CompatibilityCard.jsx   Duo mode compatibility stats
    hooks/
      useGraph.js             D3 simulation, Canvas rendering, dual layout interpolation, edge highlighting
      useApi.js               API client with all endpoint bindings
    styles/
      globals.css             Tailwind directives, custom animations, glass panel effects
preprocess/
  preprocess.py               Full pipeline: download, blend, UMAP, cluster, export
  compute_taste_umap.py       Taste UMAP from CF vectors
```