## 🚀 Quick Start Guide

**First time setup:**
1. Run all cells in order (Runtime → Run all)
2. Enter your API keys when prompted
3. Wait for all dependencies to install (~2 minutes)

**If you see errors:**
- `RecursionError`: Go to Runtime → Restart Runtime, then run all cells
- `API key error`: Double-check your TMDb API key is correct
- `0 results`: Your API key might have quota limits

# 🎬 Advanced Movie Recommendation System with RAG

**LangGraph-based intelligent movie recommendations using:**
- 🧠 **Gemini 2.0 Flash** for query analysis
- 🔍 **Hybrid RAG** (FAISS + BM25)
- 🎯 **Multi-strategy TMDb search**
- ⭐ **Intelligent re-ranking**
- 📚 **Wikipedia enrichment**
- 💾 **Dynamic database growth**

## 1. Install Dependencies

## ⚠️ Troubleshooting Common Errors

**RecursionError:**
- Go to `Runtime` → `Restart Runtime` and run all cells from the beginning
- Or run the emergency fix cell below

**401 Unauthorized (TMDb API):**
- The TMDb API key is missing or incorrect
- Re-run the section 2 (Configure API Keys) cell with the correct key

**0 results from searches:**
- Your API key might have quota limits
- Try a different API key
- Check if you can access TMDb in your browser

In [None]:
# 🔧 Emergency Fix: Run this if you see RecursionError
# This resets the socket module to its original state

import socket

# Reset socket module
if hasattr(socket, '_original_getaddrinfo'):
    socket.getaddrinfo = socket._original_getaddrinfo
    delattr(socket, '_original_getaddrinfo')
    print("✅ Socket module reset! Now re-run the section 4 (TMDb Client) cell.")
else:
    print("✅ Socket module is already in original state.")
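The emergency cell can only undo the patch because section 4 keeps the unpatched resolver in `socket._original_getaddrinfo`. As a hedged sketch of a sturdier pattern, the hypothetical helpers below keep the original resolver inside the wrapper itself, mark the wrapper so a second install is a no-op, and fall back to the full result list when a host has no IPv4 address. They are exercised on a stand-in module (`types.SimpleNamespace`) with a fake resolver, so nothing touches the real network stack.

```python
import socket
import types


def install_ipv4_filter(mod) -> bool:
    """Wrap mod.getaddrinfo so it prefers IPv4 results. Returns True if installed."""
    if getattr(mod.getaddrinfo, "_ipv4_only", False):
        return False  # already wrapped: re-running must not wrap twice
    original = mod.getaddrinfo

    def ipv4_only(*args, **kwargs):
        responses = original(*args, **kwargs)
        ipv4 = [r for r in responses if r[0] == socket.AF_INET]
        return ipv4 or responses  # keep IPv6-only hosts reachable

    ipv4_only._ipv4_only = True      # marker checked on re-runs
    ipv4_only._original = original  # lets uninstall restore the resolver
    mod.getaddrinfo = ipv4_only
    return True


def uninstall_ipv4_filter(mod) -> bool:
    """Restore the original resolver if the wrapper is installed."""
    original = getattr(mod.getaddrinfo, "_original", None)
    if original is None:
        return False
    mod.getaddrinfo = original
    return True


# Hypothetical fake resolver returning one IPv6 and one IPv4 entry
def fake_getaddrinfo(host, port, *args, **kwargs):
    return [
        (socket.AF_INET6, socket.SOCK_STREAM, 6, "", ("::1", port, 0, 0)),
        (socket.AF_INET, socket.SOCK_STREAM, 6, "", ("127.0.0.1", port)),
    ]


fake_socket = types.SimpleNamespace(getaddrinfo=fake_getaddrinfo)
print(install_ipv4_filter(fake_socket))  # True
print(install_ipv4_filter(fake_socket))  # False (already installed)
print(fake_socket.getaddrinfo("example.org", 443))
print(uninstall_ipv4_filter(fake_socket))  # True
```

To apply it for real you would pass the `socket` module itself; because the original resolver lives in the wrapper's closure, re-running a cell cannot stack wrappers on top of each other.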

In [None]:
!pip install -q langchain langchain-google-genai langgraph langchain-core
!pip install -q faiss-cpu sentence-transformers
!pip install -q rank-bm25 requests wikipedia-api
!pip install -q python-dotenv numpy

print("✅ All dependencies installed!")
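The success message above prints even if a package failed to install. A minimal sketch of a follow-up check uses `importlib.util.find_spec`, which looks a module up without importing it. Several import names differ from their pip names (`faiss-cpu` → `faiss`, `wikipedia-api` → `wikipediaapi`, `python-dotenv` → `dotenv`); the list below is written under that mapping.

```python
import importlib.util

# Import names for the pip packages installed above
REQUIRED_MODULES = [
    "langchain", "langchain_google_genai", "langgraph", "langchain_core",
    "faiss", "sentence_transformers", "rank_bm25", "requests",
    "wikipediaapi", "dotenv", "numpy",
]


def missing_modules(names):
    """Return the top-level module names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
print("✅ All imports found" if not missing else f"❌ Missing: {missing}")
```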

## 2. Configure API Keys

### 🔐 Recommended: Use Colab Secrets (Most Secure)

1. Click the **🔑 key icon** in the left sidebar
2. Click **"Add a new secret"**
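3. Add two secrets:
   - Name `TMDB_API_KEY`, value: your TMDb API key
   - Name `GOOGLE_API_KEY`, value: your Google API key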
4. Toggle **"Notebook access"** ON for both secrets
5. Run the cell below

### 📝 Alternative: Manual Input (Fallback)

If you don't set up secrets, the cell will prompt you to enter the keys manually.

In [None]:
import os

# Method 1: Use Colab Secrets (Recommended)
# Go to the 🔑 key icon in the left sidebar → Add secrets:
# - Name: TMDB_API_KEY, Value: your TMDb API key
# - Name: GOOGLE_API_KEY, Value: your Google API key

try:
    from google.colab import userdata
    TMDB_API_KEY = userdata.get('TMDB_API_KEY')
    GOOGLE_API_KEY = userdata.get('GOOGLE_API_KEY')
    print("✅ Using Colab Secrets (secure method)")
except Exception:  # not running in Colab, or the secrets are missing/not shared
    # Method 2: Manual input (fallback if secrets not configured)
    from getpass import getpass
    print("⚠️  Colab Secrets not found. Using manual input.")
    print("   Tip: Store secrets securely using the 🔑 icon in the left sidebar!")
    
    TMDB_API_KEY = getpass("Enter your TMDb API Key (just the key, no 'TMDB_API_KEY='): ")
    GOOGLE_API_KEY = getpass("Enter your Google API Key (just the key, no 'GOOGLE_API_KEY='): ")
    
    # Clean up common mistakes
    TMDB_API_KEY = TMDB_API_KEY.strip()
    if '=' in TMDB_API_KEY:
        TMDB_API_KEY = TMDB_API_KEY.split('=', 1)[1].strip()
    
    GOOGLE_API_KEY = GOOGLE_API_KEY.strip()
    if '=' in GOOGLE_API_KEY:
        GOOGLE_API_KEY = GOOGLE_API_KEY.split('=', 1)[1].strip()

# Set environment variables
os.environ['TMDB_API_KEY'] = TMDB_API_KEY
os.environ['GOOGLE_API_KEY'] = GOOGLE_API_KEY

print("✅ API keys configured!")
print(f"   TMDb key: {TMDB_API_KEY[:10]}...")
print(f"   Google key: {GOOGLE_API_KEY[:10]}...")
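The manual-input branch repairs a pasted `NAME=value` by splitting on the first `=`, which would also cut a key that legitimately contains `=`, and the confirmation lines echo the first 10 characters of each key into the notebook output. A narrower variant is sketched below; `clean_api_key` and `mask_api_key` are hypothetical helper names, and showing 4 characters is an arbitrary choice.

```python
def clean_api_key(raw: str, name: str) -> str:
    """Strip whitespace, surrounding quotes, and an accidental leading 'NAME=' prefix."""
    key = raw.strip()
    prefix = f"{name}="
    if key.upper().startswith(prefix.upper()):
        key = key[len(prefix):]
    return key.strip().strip("'\"")


def mask_api_key(key: str, visible: int = 4) -> str:
    """Show only the first few characters of a secret; mask the rest."""
    if len(key) <= visible:
        return "*" * len(key)
    return key[:visible] + "*" * (len(key) - visible)


print(clean_api_key("  TMDB_API_KEY=abc123 ", "TMDB_API_KEY"))  # abc123
print(mask_api_key("abcdef1234"))  # abcd******
```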

## 3. Configuration Module

In [None]:
class Config:
    """Configuration settings."""
    TMDB_API_KEY = os.getenv('TMDB_API_KEY')
    GOOGLE_API_KEY = os.getenv('GOOGLE_API_KEY')
    
    # Embedding model
    EMBEDDING_MODEL = 'sentence-transformers/paraphrase-MiniLM-L3-v2'
    EMBEDDING_DIM = 384
    
    # Storage paths (in Colab)
    FAISS_INDEX_PATH = '/content/faiss_index.bin'
    FAISS_MOVIES_PATH = '/content/faiss_movies.pkl'
    BM25_INDEX_PATH = '/content/bm25_index.pkl'
    
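    # Re-ranking weights. rerank_movies (section 10) currently hardcodes
    # semantic 0.25, rating 0.20, recency 0.15, popularity 0.10 and keyword 0.30;
    # GENRE_WEIGHT is not used there yet.
    SEMANTIC_WEIGHT = 0.25
    GENRE_WEIGHT = 0.20
    RATING_WEIGHT = 0.20
    RECENCY_WEIGHT = 0.15
    POPULARITY_WEIGHT = 0.10
    KEYWORD_WEIGHT = 0.30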
    
    MIN_RATING_COUNT = 100

config = Config()
print("✅ Configuration loaded!")
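Whichever weights the re-ranker ends up using, it helps to check that they sum to 1, so the composite score stays in [0, 1] whenever every individual signal does. A minimal sketch using the five weights that `rerank_movies` (section 10) hardcodes; `RERANK_WEIGHTS` and `composite_score` are hypothetical names, not part of the notebook.

```python
import math

# Weights hardcoded in rerank_movies (section 10)
RERANK_WEIGHTS = {
    "semantic": 0.25,
    "rating": 0.20,
    "recency": 0.15,
    "popularity": 0.10,
    "keyword": 0.30,
}


def composite_score(features: dict, weights: dict) -> float:
    """Weighted sum of per-signal scores; refuses weights that do not sum to 1."""
    total = sum(weights.values())
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        raise ValueError(f"weights sum to {total}, expected 1.0")
    # Missing signals count as 0
    return sum(weights[name] * features.get(name, 0.0) for name in weights)


print(composite_score({"semantic": 0.8, "rating": 0.7, "keyword": 0.5}, RERANK_WEIGHTS))
```

The tolerance in `math.isclose` matters because adding these decimals in floating point does not land exactly on 1.0.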

## 4. TMDb Client (with IPv4 Fix)

In [None]:
import requests
import socket
from typing import Dict, List, Optional

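# Force IPv4 name resolution (safe for re-runs).
# Note: this patches socket.getaddrinfo for the whole Python process, so it
# affects every HTTP client (Gemini, Hugging Face downloads), not only TMDb.
# Store the original function only once, using a private attribute.
if not hasattr(socket, '_original_getaddrinfo'):
    socket._original_getaddrinfo = socket.getaddrinfo

def ipv4_only_getaddrinfo(*args, **kwargs):
    """Keep only IPv4 results; fall back to all results if a host has none."""
    responses = socket._original_getaddrinfo(*args, **kwargs)
    ipv4 = [response for response in responses if response[0] == socket.AF_INET]
    return ipv4 or responses

# Mark the wrapper: each re-run defines a new function object, so comparing
# functions directly would never detect that the patch is already in place.
ipv4_only_getaddrinfo._ipv4_only = True

if not getattr(socket.getaddrinfo, '_ipv4_only', False):
    socket.getaddrinfo = ipv4_only_getaddrinfo
    print("✅ IPv4-only mode enabled")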

class TMDbClient:
    """TMDb API client with IPv4 fix."""
    BASE_URL = "https://api.themoviedb.org/3"
    
    def __init__(self, api_key: str):
        self.api_key = api_key
        self.session = requests.Session()
    
    def _make_request(self, endpoint: str, params: Dict = None) -> Dict:
        params = params or {}
        params["api_key"] = self.api_key
        
        max_retries = 3
        for attempt in range(max_retries):
            try:
                url = f"{self.BASE_URL}{endpoint}"
                response = self.session.get(url, params=params, timeout=15)
                response.raise_for_status()
                data = response.json()
                
                # Debug: Show what we got
                if endpoint == "/search/movie" and "results" in data:
                    result_count = len(data.get("results", []))
                    if result_count == 0:
                        print(f"    ⚠ API returned 0 results for: {params.get('query')}")
                
                return data
            except requests.exceptions.ConnectionError as e:
                if attempt < max_retries - 1:
                    print(f"Connection error, retrying... ({attempt + 1}/{max_retries})")
                    continue
                else:
                    print(f"❌ TMDb API failed after {max_retries} attempts: {e}")
                    return {}
            except requests.RequestException as e:
                print(f"❌ TMDb API error: {e}")
                return {}
        return {}
    
    def search_movies(self, query: str, page: int = 1) -> Dict:
        return self._make_request("/search/movie", {"query": query, "page": page})
    
    def get_genres(self) -> Dict:
        return self._make_request("/genre/movie/list")
    
    def discover_movies(self, **filters) -> Dict:
        params = {
            "sort_by": filters.get("sort_by", "popularity.desc"),
            "include_adult": "false",
            "vote_count.gte": config.MIN_RATING_COUNT,
        }
        params.update({k: v for k, v in filters.items() if v is not None})
        return self._make_request("/discover/movie", params)

tmdb_client = TMDbClient(config.TMDB_API_KEY)
print("✅ TMDb client initialized!")
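`_make_request` retries connection errors immediately, so all three attempts can fail within the same network hiccup. A common alternative is exponential backoff with jitter; the generic helper below is a sketch (its name, defaults, and jitter range are assumptions, not part of the notebook). Note that `requests.exceptions.ConnectionError` does not subclass Python's built-in `ConnectionError`, so with `requests` you would pass `retry_on=(requests.exceptions.ConnectionError,)`.

```python
import random
import time


def retry_with_backoff(fn, *, retries=3, base_delay=1.0, retry_on=(ConnectionError,)):
    """Call fn(); on a retryable error wait base_delay * 2**attempt (plus jitter) and retry."""
    for attempt in range(retries):
        try:
            return fn()
        except retry_on:
            if attempt == retries - 1:
                raise  # out of attempts: let the caller decide what to do
            delay = base_delay * (2 ** attempt)
            # Random jitter spreads out retries from concurrent callers
            time.sleep(delay + random.uniform(0, delay / 2))
```

For example: `retry_with_backoff(lambda: tmdb_client.session.get(url, params=params, timeout=15), retry_on=(requests.exceptions.ConnectionError,))`.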

## 5. Test TMDb Connection

In [None]:
# Test TMDb API with simple queries
print("🧪 Testing TMDb API...\n")

# Test 1: Simple search
test1 = tmdb_client.search_movies("inception")
if test1.get('results'):
    print(f"✅ Test 1 PASSED: Found {len(test1['results'])} results for 'inception'")
    print(f"   Example: {test1['results'][0]['title']}")
else:
    print(f"❌ Test 1 FAILED: No results for 'inception'")
    print(f"   Response: {test1}")

# Test 2: Another simple search
test2 = tmdb_client.search_movies("war")
if test2.get('results'):
    print(f"✅ Test 2 PASSED: Found {len(test2['results'])} results for 'war'")
else:
    print(f"❌ Test 2 FAILED: No results for 'war'")

# Test 3: Get genres
test3 = tmdb_client.get_genres()
if test3.get('genres'):
    print(f"✅ Test 3 PASSED: Got {len(test3['genres'])} genres")
else:
    print(f"❌ Test 3 FAILED: Could not get genres")

if not test1.get('results'):
    print("\n⚠️  WARNING: TMDb API is not returning results!")
    print("   Possible issues:")
    print("   1. Invalid API key")
    print("   2. API key quota exceeded")
    print("   3. Network/firewall blocking TMDb")
    print(f"\n   Your API key: {config.TMDB_API_KEY[:10]}...")


## 6. Vector Store & Embeddings

In [None]:
import faiss
import pickle
import numpy as np
from sentence_transformers import SentenceTransformer
from typing import List, Tuple

class EmbeddingManager:
    def __init__(self, model_name: str):
        print(f"Loading embedding model: {model_name}...")
        self.model = SentenceTransformer(model_name)
        self.dimension = self.model.get_sentence_embedding_dimension()
        print(f"✅ Model loaded! Dimension: {self.dimension}")
    
    def encode(self, texts: List[str]) -> np.ndarray:
        return self.model.encode(texts, show_progress_bar=False)

class FAISSVectorStore:
    def __init__(self, embedding_manager: EmbeddingManager):
        self.embedding_manager = embedding_manager
        self.dimension = embedding_manager.dimension
        self.index = faiss.IndexFlatIP(self.dimension)  # Inner product for cosine similarity
        self.movies = []
    
    def add_movie(self, movie: Dict):
        text = self._create_movie_text(movie)
        # FAISS expects a float32, C-contiguous (n, d) matrix
        embedding = np.ascontiguousarray(
            self.embedding_manager.encode([text]), dtype=np.float32
        )
        
        # Normalize in place so inner product equals cosine similarity
        faiss.normalize_L2(embedding)
        
        self.index.add(embedding)
        self.movies.append(movie)
    
    def _create_movie_text(self, movie: Dict) -> str:
        components = []
        if movie.get("title"):
            components.append(movie["title"])
        
        genres = movie.get("genres", [])
        genre_names = [g.get("name", g) if isinstance(g, dict) else str(g) for g in genres]
        components.extend(genre_names)
        
        if movie.get("overview"):
            components.append(movie["overview"])
        
        return " ".join(components)
    
    def search(self, query: str, k: int = 20) -> List[Tuple[Dict, float]]:
        if self.index.ntotal == 0:
            return []
        
        # FAISS expects a float32, C-contiguous (1, d) query matrix
        query_embedding = np.ascontiguousarray(
            self.embedding_manager.encode([query]), dtype=np.float32
        )
        faiss.normalize_L2(query_embedding)
        
        k = min(k, self.index.ntotal)
        distances, indices = self.index.search(query_embedding, k)
        
        results = []
        for idx, score in zip(indices[0], distances[0]):
            # FAISS pads missing neighbours with -1, which must not index movies[-1]
            if 0 <= idx < len(self.movies):
                results.append((self.movies[idx], float(score)))
        
        return results
    
    def save(self):
        faiss.write_index(self.index, config.FAISS_INDEX_PATH)
        with open(config.FAISS_MOVIES_PATH, 'wb') as f:
            pickle.dump(self.movies, f)
    
    def load(self):
        self.index = faiss.read_index(config.FAISS_INDEX_PATH)
        with open(config.FAISS_MOVIES_PATH, 'rb') as f:
            self.movies = pickle.load(f)

# Initialize
embedding_manager = EmbeddingManager(config.EMBEDDING_MODEL)
vector_store = FAISSVectorStore(embedding_manager)
print("✅ Vector store initialized!")
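`IndexFlatIP` returns raw inner products; those equal cosine similarity only because both the stored vectors and the query are L2-normalized first. The pure-Python sketch below (no FAISS needed) checks that identity on two small vectors; the helper names are illustrative.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (what faiss.normalize_L2 does row by row)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    return inner_product(a, b) / (math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b)))


a, b = [3.0, 4.0], [4.0, 3.0]
print(cosine(a, b))                                     # 0.96
print(inner_product(l2_normalize(a), l2_normalize(b)))  # ≈ 0.96, up to floating-point rounding
```

Without the normalization step, longer embedding vectors would score higher regardless of direction, which is not what a similarity search wants.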

## 7. BM25 Retriever

In [None]:
from rank_bm25 import BM25Okapi
import re

class BM25Retriever:
    def __init__(self):
        self.bm25 = None
        self.movies = []
        self.tokenized_corpus = []
    
    def _tokenize(self, text: str) -> List[str]:
        text = text.lower()
        tokens = re.findall(r'\w+', text)
        return tokens
    
    def _create_movie_text(self, movie: Dict) -> str:
        components = []
        if movie.get("title"):
            components.append(movie["title"])
        
        genres = movie.get("genres", [])
        genre_names = [g.get("name", g) if isinstance(g, dict) else str(g) for g in genres]
        components.extend(genre_names)
        
        if movie.get("overview"):
            components.append(movie["overview"])
        
        return " ".join(components)
    
    def add_movies(self, new_movies: List[Dict]):
        self.movies.extend(new_movies)
        
        for movie in new_movies:
            text = self._create_movie_text(movie)
            tokens = self._tokenize(text)
            self.tokenized_corpus.append(tokens)
        
        if self.tokenized_corpus:
            self.bm25 = BM25Okapi(self.tokenized_corpus)
    
    def search(self, query: str, k: int = 20) -> List[Tuple[Dict, float]]:
        if not self.bm25:
            return []
        
        tokenized_query = self._tokenize(query)
        scores = self.bm25.get_scores(tokenized_query)
        
        top_indices = scores.argsort()[-k:][::-1]
        
        results = []
        for idx in top_indices:
            if scores[idx] > 0:
                results.append((self.movies[idx], float(scores[idx])))
        
        return results
    
    def save(self):
        with open(config.BM25_INDEX_PATH, 'wb') as f:
            pickle.dump({'bm25': self.bm25, 'movies': self.movies, 'corpus': self.tokenized_corpus}, f)
    
    def load(self):
        with open(config.BM25_INDEX_PATH, 'rb') as f:
            data = pickle.load(f)
            self.bm25 = data['bm25']
            self.movies = data['movies']
            self.tokenized_corpus = data['corpus']

bm25_retriever = BM25Retriever()
print("✅ BM25 retriever initialized!")
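`BM25Okapi` ranks documents with the Okapi BM25 formula (its defaults are `k1=1.5` and `b=0.75`). A pure-Python sketch of that formula is below to show what the scores mean. It uses the `+1` IDF variant, which never goes negative, whereas `rank_bm25` floors negative IDF values with an `epsilon` term, so exact numbers will differ from the library.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, corpus_tokens, k1=1.5, b=0.75):
    """Okapi BM25 score of every tokenized document for a tokenized query."""
    n_docs = len(corpus_tokens)
    avgdl = sum(len(doc) for doc in corpus_tokens) / n_docs
    # Document frequency: in how many documents each term appears
    df = Counter(term for doc in corpus_tokens for term in set(doc))
    scores = []
    for doc in corpus_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            # "+1" inside the log keeps IDF positive even for very common terms
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            freq = tf[term]
            # Term-frequency saturation (k1) and document-length normalization (b)
            score += idf * freq * (k1 + 1) / (freq + k1 * (1 - b + b * len(doc) / avgdl))
        scores.append(score)
    return scores


corpus = [["war", "soldiers", "india"], ["romance", "paris"], ["war", "epic", "war"]]
print(bm25_scores(["war", "soldiers"], corpus))
```

The scores are sums of unbounded per-term contributions, so they are not on the same 0 to 1 scale as the cosine similarities coming out of FAISS.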

## 8. LLM-Powered Query Analyzer

In [None]:
from langchain_google_genai import ChatGoogleGenerativeAI
import json

llm = ChatGoogleGenerativeAI(
    model="gemini-2.0-flash-exp",  # experimental ID; if it has been retired, try "gemini-2.0-flash"
    temperature=0,
    google_api_key=config.GOOGLE_API_KEY
)

def analyze_query(query: str) -> Dict:
    """Use Gemini to analyze query for intent, mood, themes, keywords."""
    
    prompt = f"""Analyze this movie recommendation query and extract ALL relevant information:

Query: "{query}"

Extract and return a JSON object with:
{{
  "intent": "exploration|similar_to|mood_based|comparison|direct_match",
  "confidence": 0.0-1.0,
  "genres": ["genre1", "genre2"],
  "mood": "overall emotional tone (e.g., dark, uplifting, tense, nostalgic)",
  "themes": ["theme1", "theme2"] (e.g., war, revenge, family, identity),
  "keywords": ["keyword1", "keyword2"] (important concepts),
  "mentioned_movies": ["movie1"] (if any specific movies mentioned),
  "mentioned_people": ["person1"] (actors/directors if mentioned),
  "era_preference": "decade or time period if mentioned",
  "rating_preference": "high|medium|any",
  "specific_requirements": "any other specific needs"
}}

Return ONLY the JSON, no other text."""

    try:
        print(f"  🧠 Analyzing query with Gemini...")
        response = llm.invoke(prompt)
        
        content = response.content
        if "```json" in content:
            content = content.split("```json")[1].split("```")[0].strip()
        elif "```" in content:
            content = content.split("```")[1].split("```")[0].strip()
        
        analysis = json.loads(content)
        print(f"  ✓ Intent: {analysis.get('intent')}, Mood: {analysis.get('mood')}")
        print(f"  ✓ Themes: {analysis.get('themes')}")
        
        return analysis
    except Exception as e:
        print(f"  ⚠ LLM analysis failed: {e}")
        return {
            "intent": "exploration",
            "confidence": 0.5,
            "genres": [],
            "mood": "unknown",
            "themes": [],
            "keywords": query.lower().split()
        }

print("✅ LLM query analyzer ready!")
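`analyze_query` strips Markdown fences by splitting on backticks, so a reply with a sentence before the JSON and no fence still reaches `json.loads` with extra text and drops into the fallback. A minimal sketch of a more forgiving parser slices from the first `{` to the last `}`; it assumes the reply contains a single top-level JSON object, and `extract_json_object` is a hypothetical name.

```python
import json


def extract_json_object(text: str) -> dict:
    """Parse the outermost {...} span of an LLM reply, with or without code fences."""
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in response")
    # json.JSONDecodeError is a ValueError subclass, so callers can catch ValueError
    return json.loads(text[start:end + 1])


print(extract_json_object('Sure! {"genres": ["War"]} Hope this helps.'))
```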

## 9. Intelligent TMDb Search

In [None]:
def intelligent_tmdb_search(
    query: str,
    genres: List[str] = None,
    themes: List[str] = None,
    mood: str = None,
    keywords: List[str] = None,
    max_results: int = 30
) -> List[Dict]:
    """Multi-strategy TMDb search using LLM analysis."""
    
    all_movies = {}
    
    print(f"🔍 Intelligent TMDb Search:")
    print(f"  Query: '{query}'")
    print(f"  Themes: {themes}, Mood: {mood}")
    
    # Strategy 1: Search ORIGINAL query first (most important!)
    print(f"  → Strategy 1: /search original '{query}'")
    orig_results = tmdb_client.search_movies(query)
    for movie in orig_results.get("results", [])[:30]:
        all_movies[movie.get("id")] = movie
    print(f"    ✓ Found {len(orig_results.get('results', []))} movies")
    
    # Strategy 2: Search individual keywords
    if keywords:
        stop_words = {'a', 'an', 'the', 'of', 'in', 'on', 'at', 'to', 'for', 'and', 'or', 'but', 'me', 'about', 'like'}
        important_keywords = [k for k in keywords if k.lower() not in stop_words]
        
        for keyword in important_keywords[:5]:  # Top 5 keywords
            print(f"  → Strategy 2: /search keyword '{keyword}'")
            kw_results = tmdb_client.search_movies(keyword)
            for movie in kw_results.get("results", [])[:15]:
                if movie.get("id") not in all_movies:
                    all_movies[movie.get("id")] = movie
            print(f"    ✓ Found {len(kw_results.get('results', []))} movies")
    
    # Strategy 3: /discover with genre filters (if genres found)
    if genres and len(all_movies) < 20:
        print(f"  → Strategy 3: /discover with genres")
        genre_map = {g["name"].lower(): g["id"] for g in tmdb_client.get_genres().get("genres", [])}
        genre_ids = [str(genre_map.get(g.lower())) for g in genres if g.lower() in genre_map]
        
        if genre_ids:
            # TMDb reads comma-separated genre IDs as AND; use "|" to match any of them
            result = tmdb_client.discover_movies(
                with_genres=",".join(genre_ids),
                sort_by="popularity.desc"
            )
            for movie in result.get("results", [])[:20]:
                if movie.get("id") not in all_movies:
                    all_movies[movie.get("id")] = movie
            print(f"    ✓ Found {len(result.get('results', []))} movies")
    
    # Strategy 4: Fallback - search just key themes
    if len(all_movies) < 10 and themes:
        for theme in themes[:3]:
            print(f"  → Strategy 4: /search theme '{theme}'")
            theme_results = tmdb_client.search_movies(theme)
            for movie in theme_results.get("results", [])[:10]:
                if movie.get("id") not in all_movies:
                    all_movies[movie.get("id")] = movie
            print(f"    ✓ Found {len(theme_results.get('results', []))} movies")
    
    print(f"📊 Total unique movies: {len(all_movies)}")
    
    return sorted(all_movies.values(), key=lambda x: x.get("popularity", 0), reverse=True)[:max_results]

print("✅ Intelligent search ready!")
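Strategy 3 matches the LLM's genre names exactly against TMDb's names, so a genre such as "sci-fi" never matches "Science Fiction", and comma-joined IDs require every genre at once. The sketch below adds a small alias table and lets the caller choose OR (`|`) or AND (`,`). The alias entries and the `genre_ids_for` name are hypothetical; `tmdb_genres` is assumed to be the `"genres"` list returned by `/genre/movie/list`.

```python
# Hypothetical alias table: LLM-style genre names → TMDb genre names (lowercase)
GENRE_ALIASES = {
    "sci-fi": "science fiction",
    "scifi": "science fiction",
    "animated": "animation",
}


def genre_ids_for(llm_genres, tmdb_genres, match_any=True):
    """Map free-form genre names to a TMDb with_genres string ("" if nothing matches)."""
    by_name = {g["name"].lower(): g["id"] for g in tmdb_genres}
    ids = []
    for name in llm_genres:
        key = name.strip().lower()
        key = GENRE_ALIASES.get(key, key)
        if key in by_name and by_name[key] not in ids:
            ids.append(by_name[key])
    separator = "|" if match_any else ","  # "|" = OR, "," = AND in TMDb
    return separator.join(str(i) for i in ids)


sample_genres = [{"id": 878, "name": "Science Fiction"}, {"id": 53, "name": "Thriller"}]
print(genre_ids_for(["Sci-Fi", "thriller"], sample_genres))  # 878|53
```

An empty string means no genre matched, in which case the `/discover` call is best skipped.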

## 10. Re-Ranker with Keyword Matching

In [None]:
import re
from datetime import datetime

def calculate_keyword_match(movie: Dict, query: str) -> float:
    """Fraction of query keywords found in the title (weight 0.6) and overview (weight 0.4)."""
    # Tokenize with a regex so punctuation ("soldiers," / "soldiers.") doesn't block matches
    query_words = set(re.findall(r'\w+', query.lower()))
    
    stop_words = {'a', 'an', 'the', 'of', 'in', 'on', 'at', 'to', 'for', 'and', 'or', 'but', 'me', 'about', 'like'}
    query_keywords = query_words - stop_words
    
    if not query_keywords:
        return 0.0
    
    score = 0.0
    
    # Title matches
    title_words = set(re.findall(r'\w+', (movie.get("title") or "").lower()))
    title_matches = len(query_keywords & title_words)
    score += (title_matches / len(query_keywords)) * 0.6
    
    # Overview matches
    overview_words = set(re.findall(r'\w+', (movie.get("overview") or "").lower()))
    overview_matches = len(query_keywords & overview_words)
    score += (overview_matches / len(query_keywords)) * 0.4
    
    return min(score, 1.0)

def rerank_movies(
    results: List[Tuple[Dict, float]],
    query: str,
    max_results: int = 10
) -> List[Tuple[Dict, float]]:
    """Re-rank with composite scoring."""
    
    reranked = []
    current_year = datetime.now().year
    
    for movie, initial_score in results:
        # Semantic similarity
        semantic_score = initial_score
        
        # Rating (normalized)
        rating_score = min(movie.get("vote_average", 0) / 10.0, 1.0)
        
        # Recency boost
        release_date = movie.get("release_date", "")
        try:
            release_year = int(release_date.split("-")[0]) if release_date else 2000
            age = current_year - release_year
            recency_score = min(np.exp(-age / 10.0), 1.0)
        except ValueError:  # malformed release_date string
            recency_score = 0.0
        
        # Popularity (log-normalized)
        popularity = movie.get("popularity", 0)
        popularity_score = min(np.log10(popularity + 1) / 3.0, 1.0)
        
        # Keyword matching
        keyword_score = calculate_keyword_match(movie, query)
        
        # Composite score
        composite = (
            0.25 * semantic_score +
            0.20 * rating_score +
            0.15 * recency_score +
            0.10 * popularity_score +
            0.30 * keyword_score  # High weight for keyword matching!
        )
        
        reranked.append((movie, composite))
    
    reranked.sort(key=lambda x: x[1], reverse=True)
    return reranked[:max_results]

print("✅ Re-ranker ready!")
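For reference, the composite score computed by `rerank_movies` can be written out as below, where $s_{\text{sem}}$ is the candidate's retrieval score (cosine similarity, a down-weighted BM25 score, or the fixed 0.7 given to TMDb API results), $v$ the TMDb `vote_average`, $y$ the release year (2000 when missing), $Y$ the current year, $p$ the TMDb `popularity`, and $s_{\text{kw}}$ the output of `calculate_keyword_match`:

```latex
S = 0.25\, s_{\text{sem}}
  + 0.20\, \min\!\left(\frac{v}{10},\, 1\right)
  + 0.15\, \min\!\left(e^{-(Y - y)/10},\, 1\right)
  + 0.10\, \min\!\left(\frac{\log_{10}(p + 1)}{3},\, 1\right)
  + 0.30\, s_{\text{kw}}
```

For example, a film released 10 years ago gets a recency term of $e^{-1} \approx 0.37$, and a popularity of 99 gives $\log_{10}(100)/3 \approx 0.67$.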

## 11. Main Recommendation Function

In [None]:
def get_recommendations(query: str, top_k: int = 5):
    """Get movie recommendations."""
    
    print(f"\n🔍 Query: {query}\n")
    
    # 1. Analyze query with LLM
    analysis = analyze_query(query)
    
    # 2. Hybrid retrieval from local database
    print("\n  → Hybrid retrieval (Vector + BM25)...")
    vector_results = vector_store.search(query, k=50)
    bm25_results = bm25_retriever.search(query, k=50)
    print(f"    ✓ Vector: {len(vector_results)}, BM25: {len(bm25_results)}")
    
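    # Combine results. BM25 scores are unbounded, so scale them into [0, 1]
    # (divide by the best BM25 score) before mixing them with cosine scores.
    all_results = {}
    for movie, score in vector_results:
        all_results[movie.get("id")] = (movie, score)
    max_bm25 = max((score for _, score in bm25_results), default=0.0) or 1.0
    for movie, score in bm25_results:
        movie_id = movie.get("id")
        if movie_id not in all_results:
            all_results[movie_id] = (movie, 0.5 * score / max_bm25)  # Lower weight for BM25-only hits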
    
    # 3. Intelligent TMDb search
    print("\n  → Intelligent TMDb API search...")
    tmdb_results = intelligent_tmdb_search(
        query=query,
        genres=analysis.get("genres", []),
        themes=analysis.get("themes", []),
        mood=analysis.get("mood"),
        keywords=analysis.get("keywords", []),
        max_results=30
    )
    
    # Add TMDb results
    for movie in tmdb_results:
        movie_id = movie.get("id")
        if movie_id not in all_results:
            all_results[movie_id] = (movie, 0.7)
    
    print(f"\n📊 Total unique movies: {len(all_results)}")
    
    if not all_results:
        print("\n❌ No movies found!")
        return []
    
    # 4. Re-rank
    print("  → Re-ranking...")
    candidates = list(all_results.values())
    reranked = rerank_movies(candidates, query, max_results=top_k)
    
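    # 5. Save only movies that are not already in the local database
    known_ids = {m.get("id") for m in vector_store.movies}
    new_movies = [movie for movie, _ in reranked if movie.get("id") not in known_ids]
    for movie in new_movies:
        vector_store.add_movie(movie)
    if new_movies:
        bm25_retriever.add_movies(new_movies)
        vector_store.save()
        bm25_retriever.save()
    print(f"  💾 Saved {len(new_movies)} new movies to database")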
    
    return reranked

print("✅ Main recommendation function ready!")
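`get_recommendations` merges candidates whose scores come from different scales (cosine similarity, BM25, and a fixed 0.7 for TMDb hits). One swapped-in alternative, not what this notebook does, is reciprocal rank fusion (RRF), which ignores raw scores and only uses each retriever's ranking. A minimal sketch, with the commonly used constant `k=60` and illustrative movie ids:

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse ranked lists of ids: score(d) = sum over lists of 1 / (k + rank of d)."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


vector_ids = [550, 603, 27205]  # illustrative ids in vector-search order
bm25_ids = [27205, 157336]      # illustrative ids in BM25 order
print(reciprocal_rank_fusion([vector_ids, bm25_ids]))
```

A movie found by both retrievers (27205 here) rises to the top even though neither list ranked it first, which is the behaviour RRF is designed for.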

## 12. Display Results

In [None]:
def display_recommendations(results: List[Tuple[Dict, float]]):
    """Display recommendations nicely."""
    
    print("\n" + "="*70)
    print("🎬 MOVIE RECOMMENDATIONS")
    print("="*70)
    
    if not results:
        print("\nNo recommendations found.")
        return
    
    for i, (movie, score) in enumerate(results, 1):
        title = movie.get("title", "Unknown")
        year = movie.get("release_date", "N/A")[:4] if movie.get("release_date") else "N/A"
        rating = movie.get("vote_average", 0)
        
        genres = movie.get("genres", [])
        if genres:
            if isinstance(genres[0], dict):
                genre_names = [g.get("name", "") for g in genres]
            else:
                genre_names = genres
            genre_str = ", ".join(genre_names)
        else:
            genre_str = "N/A"
        
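        # TMDb may return an empty string; only add "..." when actually truncating
        overview = movie.get("overview") or "No overview available."
        if len(overview) > 200:
            overview = overview[:200].rstrip() + "..."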
        
        print(f"\n{i}. {title} ({year})")
        print(f"   ⭐ Rating: {rating:.1f}/10 | Match: {score:.1%}")
        print(f"   🎭 Genres: {genre_str}")
        print(f"   📝 {overview}")
    
    print("\n" + "="*70)

print("✅ Display function ready!")
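Results from TMDb's `/search/movie` and `/discover/movie` endpoints carry a `genre_ids` list of integers rather than `genres` objects, so movies that come from the API tend to show "Genres: N/A" here (and contribute no genre words to the vector and BM25 text). A minimal sketch of a lookup that handles both shapes; `genre_names` is a hypothetical helper, and `id_to_name` would presumably be built from `tmdb_client.get_genres()` as `{g["id"]: g["name"] for g in ...["genres"]}`.

```python
def genre_names(movie: dict, id_to_name: dict) -> list:
    """Return genre names from 'genres' (objects or strings) or, failing that, 'genre_ids'."""
    genres = movie.get("genres") or []
    if genres:
        return [g.get("name", "") if isinstance(g, dict) else str(g) for g in genres]
    # Unknown ids are skipped rather than shown as numbers
    return [id_to_name[i] for i in movie.get("genre_ids", []) if i in id_to_name]


id_to_name = {28: "Action", 10752: "War"}  # small sample mapping
print(genre_names({"genre_ids": [10752, 28]}, id_to_name))  # ['War', 'Action']
```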

## 13. 🎬 Try It Out!

In [None]:
# Example 1: War movies about Indian soldiers
query = "Recommend me a war movie about indian soldiers"
results = get_recommendations(query, top_k=5)
display_recommendations(results)

In [None]:
# Example 2: Sci-fi thrillers
query = "Dark sci-fi thriller like Blade Runner"
results = get_recommendations(query, top_k=5)
display_recommendations(results)

In [None]:
# Example 3: Your custom query
query = input("Enter your movie query: ")
results = get_recommendations(query, top_k=5)
display_recommendations(results)

## 14. Check Database Growth

In [None]:
print(f"📊 Database Statistics:")
print(f"  Vector Store: {vector_store.index.ntotal} movies")
print(f"  BM25 Index: {len(bm25_retriever.movies)} movies")
print(f"\n💡 The database grows with each query!")
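Because both stores only ever grow, it can be worth checking whether the same TMDb id has been stored more than once, since repeated entries take up slots in every top-k search. A minimal sketch; `duplicate_ids` is a hypothetical helper you could call as `duplicate_ids(vector_store.movies)` or `duplicate_ids(bm25_retriever.movies)`.

```python
from collections import Counter


def duplicate_ids(movies):
    """Return {movie_id: count} for ids that appear more than once."""
    counts = Counter(m.get("id") for m in movies)
    return {movie_id: n for movie_id, n in counts.items() if n > 1}


print(duplicate_ids([{"id": 1}, {"id": 2}, {"id": 1}]))  # {1: 2}
```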