CineMatch is a content-based movie recommender system built as a research portfolio project. It pairs natural language processing (NLP) with an interactive web interface to demonstrate how vector similarity can drive information retrieval.
Live Demo: https://kram2006.github.io/cinematch/
- Methodology: Uses Cosine Similarity on high-dimensional movie metadata vectors.
- Tag Fusion: Combines plot overviews, genres, keywords, cast, and directors into unique textual signatures.
- 3D Manifold Visualization: Projects the 5,000-dimensional feature space down to three dimensions with Truncated SVD and renders the result as an interactive 3D plot for inspecting latent clusters.
- Transparency: Every recommendation includes a mathematical breakdown of why it was chosen.
- Feature Attribution: Visualizes shared metadata features between the query and recommended movies.
- Ablation Study: Benchmarks different vectorization strategies (Bag-of-Words, TF-IDF, and SBERT).
- Precision@10: Scores the top 10 recommendations for each query movie, using genre overlap with the query as a proxy for relevance.
- API Load Balancing: Rotates requests across multiple TMDB API keys to increase image-loading throughput.
- Resilient Fallbacks: Multi-stage poster fetching strategy (Direct ID -> Title Search -> High-end Placeholder).
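
As a rough illustration of the tag fusion, cosine similarity, and Precision@10 features listed above, the following is a minimal pure-Python sketch. It is not the project's pipeline: a simple bag-of-words count vector stands in for the real vectorizer, the movie records and field names (`overview`, `genres`, `keywords`, `cast`, `director`) are toy assumptions, and a recommendation is counted as relevant when it shares at least one genre with the query, which is only one possible reading of "genre overlap".

```python
import math
from collections import Counter

# Toy records with hypothetical field names; not taken from the TMDB dataset.
MOVIES = [
    {"title": "Star Drift", "overview": "crew fights alien in deep space",
     "genres": ["Science Fiction"], "keywords": ["alien", "spaceship"],
     "cast": ["Ann Lee"], "director": "Sam Roe"},
    {"title": "Void Runner", "overview": "pilot escapes alien fleet in space",
     "genres": ["Science Fiction", "Action"], "keywords": ["spaceship"],
     "cast": ["Ann Lee"], "director": "Kit Poe"},
    {"title": "Garden Party", "overview": "two friends plan a wedding",
     "genres": ["Comedy"], "keywords": ["wedding"],
     "cast": ["Bo Diaz"], "director": "Kit Poe"},
]

def fuse_tags(movie):
    """Merge all metadata fields into one lowercase token list (the 'tag' signature)."""
    tokens = []
    for field in ("overview", "genres", "keywords", "cast", "director"):
        value = movie.get(field, "")
        # List fields keep each entry as one token; strings are split on whitespace.
        parts = value if isinstance(value, list) else value.split()
        tokens.extend(p.lower().replace(" ", "") for p in parts)
    return tokens

def cosine(a, b):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def recommend(movies, query_title, k=10):
    """Return up to k (title, score) pairs, most similar first."""
    vectors = {m["title"]: Counter(fuse_tags(m)) for m in movies}
    query = vectors[query_title]
    scored = [(cosine(query, vec), title)
              for title, vec in vectors.items() if title != query_title]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # ties broken by title
    return [(title, score) for score, title in scored[:k]]

def precision_at_k(movies, query_title, k=10):
    """Fraction of returned recommendations sharing >= 1 genre with the query (assumed rule)."""
    genres = {m["title"]: set(m["genres"]) for m in movies}
    recs = recommend(movies, query_title, k)
    if not recs:
        return 0.0
    hits = sum(1 for title, _ in recs if genres[title] & genres[query_title])
    return hits / len(recs)
```

Calling `recommend(MOVIES, "Star Drift", k=2)` ranks "Void Runner" ahead of "Garden Party", since only the former shares tokens with the query. Swapping the `Counter` vectors for TF-IDF or sentence-embedding vectors would leave `recommend` and `precision_at_k` unchanged, which is what makes an ablation over vectorization strategies straightforward.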
├── index.html # Main Static Site (GitHub Pages)
├── app.js # Client-Side Application Logic (Core Engine)
├── style.css # Premium "Neon Tech" UI Design System
├── data/
│ ├── movies.json # Exported Movie List (4,806 titles)
│ ├── recommendations.json # Pre-computed Top-10 Recommendations
│ ├── manifold.json # 3D SVD Coordinates
│ └── evaluation.json # Pre-computed Benchmark Metrics
├── app_utils.py # API Hooks & Metadata Utilities
├── config.py # Configuration & API Key Rotation
├── preprocess.py # Research Pipeline (Vectorization -> SVD)
├── evaluation.py # Experimental Benchmarking Module
├── explainability.py # XAI Feature Attribution Module
├── export_data.py # Pickle -> JSON Data Export Tool
├── similarity.pkl # Pre-computed Similarity Matrix (Large)
├── movie_list.pkl # Processed Movie Metadata Pickle
├── cv.pkl # Fitted Vectorizer Model
└── requirements.txt # Python Dependencies (for pipeline execution)
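
The key rotation and multi-stage poster fallback attributed to `config.py` and `app_utils.py` could look roughly like the sketch below. Everything in it is hypothetical: the `KeyRotator` class, the `resolve_poster` function, the placeholder path, and the two fetcher callables (which stand in for real TMDB requests so the example makes no network calls). A strict round-robin policy is assumed for the rotation.

```python
import itertools

PLACEHOLDER_URL = "placeholder.png"  # hypothetical asset path

class KeyRotator:
    """Hands out API keys in round-robin order (assumed rotation policy)."""

    def __init__(self, keys):
        if not keys:
            raise ValueError("at least one API key is required")
        self._cycle = itertools.cycle(list(keys))

    def next_key(self):
        return next(self._cycle)

def resolve_poster(movie, rotator, fetch_by_id, search_by_title):
    """Try direct ID lookup, then title search, then fall back to a placeholder.

    fetch_by_id and search_by_title are caller-supplied callables taking
    (argument, api_key) and returning a poster URL or None.
    """
    stages = (
        (fetch_by_id, movie.get("id")),
        (search_by_title, movie.get("title")),
    )
    for fetch, arg in stages:
        if arg is None:
            continue  # nothing to look up at this stage
        try:
            url = fetch(arg, rotator.next_key())
        except Exception:
            url = None  # treat any lookup failure as a miss and try the next stage
        if url:
            return url
    return PLACEHOLDER_URL
```

Passing the fetchers in as arguments keeps the fallback order testable without live API keys; each stage draws a fresh key from the rotator, so consecutive requests spread across the available keys.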
The site is deployed automatically via GitHub Pages from the main branch. Simply push to main and the site updates.
If you need to regenerate the recommendation data from scratch:
- Download the TMDB 5000 Movie Dataset and place the CSV files in the repository root.
- Run the research pipeline:
    pip install -r requirements.txt
    python preprocess.py
    python export_data.py
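
For orientation, the export step turns a pre-computed similarity matrix into a static top-10 lookup that the client can read without a backend. The sketch below shows one way this could work; it is not the contents of `export_data.py`. The function names and the JSON layout (`title` mapped to a list of `{"title", "score"}` objects) are assumptions, and a plain list of lists stands in for the pickled matrix.

```python
import json

def top_k_recommendations(titles, similarity, k=10):
    """Map each title to its k most similar other titles.

    similarity is a square matrix (list of lists) aligned with titles.
    Duplicate titles would overwrite each other in the result dict.
    """
    recs = {}
    for i, title in enumerate(titles):
        # Rank every other movie by descending similarity to movie i.
        ranked = sorted(
            (j for j in range(len(titles)) if j != i),
            key=lambda j: similarity[i][j],
            reverse=True,
        )
        recs[title] = [
            {"title": titles[j], "score": round(float(similarity[i][j]), 4)}
            for j in ranked[:k]
        ]
    return recs

def export_json(recs, path):
    """Write the lookup table as UTF-8 JSON for the static site to fetch."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(recs, f, ensure_ascii=False)
```

Pre-computing the lookup this way trades disk space for zero server-side computation, which is what lets the site run entirely on GitHub Pages.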
K RAMA KRISHNA NARASIMHA CHOWDARY
Research Status: ONLINE
© 2026 CineMatch Research Project. Released under the MIT License.