# **Problem Statement**  
## **26. Build a content-based movie recommender using cosine similarity.**

Build a content-based movie recommendation system that, given a movie, recommends the movies whose content is most similar to it, with similarity measured by cosine similarity.

The system should:
- Learn movie features (e.g., genres, tags, descriptions)
- Recommend similar movies based on content
- Support brute-force and optimized implementations

### Constraints & Example Inputs/Outputs

### Constraints
- Content features are text-based
- Sparse high-dimensional vectors
- No user interaction data required
- Cold-start friendly: a new movie can be recommended from its metadata alone, and no user history is needed

### Example Input:
```text
Movie   Genres
M1      Action|Adventure
M2      Action|Sci-Fi
M3      Romance|Drama
```

### Expected Output:
```text
Movies similar to "M1":
M2
```
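The example above can be reproduced with a short sketch. It assumes scikit-learn and pandas are available and that genres are pipe-separated; `split_genres` and `similar_to` are hypothetical helper names introduced here for illustration only.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy data from the example above (pipe-separated genres).
example = pd.DataFrame({
    "movie": ["M1", "M2", "M3"],
    "genres": ["Action|Adventure", "Action|Sci-Fi", "Romance|Drama"],
})

def split_genres(text):
    # Hypothetical helper: treat each pipe-separated genre as one token.
    return text.split("|")

# token_pattern=None because a custom tokenizer is supplied.
vec = TfidfVectorizer(tokenizer=split_genres, token_pattern=None)
sim = cosine_similarity(vec.fit_transform(example["genres"]))

def similar_to(movie, top_n=1):
    # Hypothetical helper: rank the other movies by similarity to `movie`.
    row = example.index[example["movie"] == movie][0]
    scores = pd.Series(sim[row], index=example["movie"])
    return scores.drop(movie).sort_values(ascending=False).head(top_n)

print(similar_to("M1").index.tolist())  # ['M2']
```

M1 and M2 share the `Action` genre, while M1 and M3 share nothing, so M2 is the only movie with a non-zero similarity to M1.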

### Solution Approach

**Step 1: Represent Movie Content**
- Convert movie metadata into text features

**Step 2: Vectorize Content**
- Use TF-IDF to convert text into vectors

**Step 3: Compute Similarity**
- Use cosine similarity between movie vectors

**Step 4: Generate Recommendations**
- Recommend top-N most similar movies
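
Step 3 rests on the formula cos(a, b) = (a · b) / (‖a‖ ‖b‖). As a minimal sketch of that formula in NumPy, the block below uses illustrative binary genre vectors rather than TF-IDF weights; `manual_cosine` is a hypothetical name, not part of the solution code.

```python
import numpy as np

def manual_cosine(a, b):
    # Cosine similarity: dot product divided by the product of the L2 norms.
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    # Treat similarity with an all-zero vector as 0.0 to avoid dividing by zero.
    return float(a @ b / denom) if denom else 0.0

# Illustrative binary vectors over [Action, Adventure, Sci-Fi, Romance, Drama].
m1 = [1, 1, 0, 0, 0]  # Action|Adventure
m2 = [1, 0, 1, 0, 0]  # Action|Sci-Fi
m3 = [0, 0, 0, 1, 1]  # Romance|Drama

print(manual_cosine(m1, m2))  # one shared genre out of two each: about 0.5
print(manual_cosine(m1, m3))  # no shared genres: 0.0
```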

### Solution Code

In [1]:
# Import Libraries
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


In [2]:
# Step 2: Create Sample Movie Dataset
movies = pd.DataFrame({
    "movie_id": [1, 2, 3, 4, 5],
    "title": ["Inception", "Interstellar", "The Notebook", "Titanic", "The Matrix"],
    "genres": [
        "Action Sci-Fi Thriller",
        "Sci-Fi Drama Space",
        "Romance Drama",
        "Romance Drama Disaster",
        "Action Sci-Fi"
    ]
})

movies


|   | movie_id | title | genres |
|---|---|---|---|
| 0 | 1 | Inception | Action Sci-Fi Thriller |
| 1 | 2 | Interstellar | Sci-Fi Drama Space |
| 2 | 3 | The Notebook | Romance Drama |
| 3 | 4 | Titanic | Romance Drama Disaster |
| 4 | 5 | The Matrix | Action Sci-Fi |


In [3]:
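# Approach 1: Brute Force Approach (Full Pairwise Cosine Similarity Matrix)
# Step 3: TF-IDF Vectorization
# Note: the default tokenizer splits "Sci-Fi" into "sci" and "fi",
# so the 7 distinct genre labels produce 8 features.
vectorizer = TfidfVectorizer(stop_words="english")
tfidf_matrix = vectorizer.fit_transform(movies["genres"])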

tfidf_matrix.shape


(5, 8)

In [4]:
# Step 4: Compute Cosine Similarity
cosine_sim = cosine_similarity(tfidf_matrix, tfidf_matrix)

cosine_sim_df = pd.DataFrame(
    cosine_sim,
    index=movies["title"],
    columns=movies["title"]
)

cosine_sim_df


| title | Inception | Interstellar | The Notebook | Titanic | The Matrix |
|---|---|---|---|---|---|
| Inception | 1.0 | 0.366935 | 0.0 | 0.0 | 0.77944 |
| Interstellar | 0.366935 | 1.0 | 0.2793 | 0.202117 | 0.470768 |
| The Notebook | 0.0 | 0.2793 | 1.0 | 0.723658 | 0.0 |
| Titanic | 0.0 | 0.202117 | 0.723658 | 1.0 | 0.0 |
| The Matrix | 0.77944 | 0.470768 | 0.0 | 0.0 | 1.0 |


In [5]:
# Step 5: Recommendation Function
def recommend_movies_bruteforce(movie_title, top_n=2):
    scores = cosine_sim_df[movie_title].sort_values(ascending=False)
    scores = scores.drop(movie_title)
    return scores.head(top_n)


### Alternative Solution

In [6]:
# Approach 2: Optimized Approach (Vectorized Similarity Lookup)
# Step 6: Optimized Recommendation
def recommend_movies_optimized(movie_title, top_n=2):
    # Row position of the query movie; raises IndexError for an unknown title
    idx = movies[movies["title"] == movie_title].index[0]
    # Rank all movies by similarity to the query, highest first
    order = np.argsort(-cosine_sim[idx], kind="stable")
    # Drop the query explicitly rather than assuming it sorts first
    # (a movie with identical genres would tie with it at 1.0)
    order = order[order != idx][:top_n]

    return movies.iloc[order][["title"]]


In [7]:
recommend_movies_optimized("The Matrix")

|   | title |
|---|---|
| 0 | Inception |
| 1 | Interstellar |


### Alternative Approaches

**Brute Force**
- Binary bag-of-words similarity
- Jaccard similarity on genres
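
As a rough illustration of the Jaccard option, the sketch below scores movies by the overlap of their genre sets, |A ∩ B| / |A ∪ B|. The genre sets mirror the sample dataset; `jaccard` and `recommend_jaccard` are hypothetical helpers, not part of the solution code.

```python
def jaccard(a, b):
    # Jaccard similarity: |A ∩ B| / |A ∪ B|; defined as 0.0 when both sets are empty.
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Genre sets taken from the sample dataset used in this notebook.
genre_sets = {
    "Inception": {"Action", "Sci-Fi", "Thriller"},
    "Interstellar": {"Sci-Fi", "Drama", "Space"},
    "The Notebook": {"Romance", "Drama"},
    "Titanic": {"Romance", "Drama", "Disaster"},
    "The Matrix": {"Action", "Sci-Fi"},
}

def recommend_jaccard(title, top_n=2):
    # Hypothetical helper: score every other movie against `title`, highest first.
    query = genre_sets[title]
    scores = {t: jaccard(query, g) for t, g in genre_sets.items() if t != title}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

print(recommend_jaccard("The Matrix"))
```

Unlike TF-IDF, Jaccard weights every genre equally, so a rare genre such as `Thriller` counts no more than a common one such as `Drama`.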

**Optimized**
- Word2Vec / FastText embeddings
- Sentence-BERT embeddings
- Approximate Nearest Neighbors (FAISS)
- Hybrid recommender (content + CF)
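
The nearest-neighbour idea can be sketched without FAISS. The block below uses scikit-learn's `NearestNeighbors` as a stand-in: it performs exact search, not approximate search, but it shows the same query pattern and avoids storing the full n × n similarity matrix. `recommend_knn` is a hypothetical helper, and an ANN library would replace the `index` object in a large catalogue.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestNeighbors

titles = ["Inception", "Interstellar", "The Notebook", "Titanic", "The Matrix"]
genres = [
    "Action Sci-Fi Thriller",
    "Sci-Fi Drama Space",
    "Romance Drama",
    "Romance Drama Disaster",
    "Action Sci-Fi",
]

tfidf = TfidfVectorizer().fit_transform(genres)

# Exact (brute-force) neighbour search with cosine distance = 1 - cosine similarity.
index = NearestNeighbors(metric="cosine", algorithm="brute").fit(tfidf)

def recommend_knn(title, top_n=2):
    # Hypothetical helper: fetch top_n + 1 neighbours, then drop the query itself.
    i = titles.index(title)
    _, neighbours = index.kneighbors(tfidf[i], n_neighbors=top_n + 1)
    return [titles[j] for j in neighbours[0] if j != i][:top_n]

print(recommend_knn("The Matrix"))  # ['Inception', 'Interstellar']
```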

### Test Case

In [8]:
# Test Case 1: Output Type Check
output = recommend_movies_bruteforce("Inception")
assert isinstance(output, pd.Series)
print("Test Case 1 Passed")


Test Case 1 Passed


In [9]:
# Test Case 2: Self-Recommendation Exclusion
assert "Inception" not in recommend_movies_bruteforce("Inception").index
print("Test Case 2 Passed")


Test Case 2 Passed


In [10]:
# Test Case 3: Top-N Constraint
assert len(recommend_movies_optimized("Titanic", top_n=1)) == 1
print("Test Case 3 Passed")


Test Case 3 Passed


In [11]:
# Test Case 4: Similar Genre Preference
rec = recommend_movies_bruteforce("The Matrix").index.tolist()
assert "Inception" in rec or "Interstellar" in rec
print("Test Case 4 Passed")


Test Case 4 Passed


In [12]:
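# Test Case 5: Invalid Movie Handling
try:
    recommend_movies_optimized("Unknown Movie")
except IndexError:
    print("Handled unknown movie correctly")
else:
    # Without this branch the test would pass silently if no error were raised
    raise AssertionError("Expected IndexError for an unknown movie")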


Handled unknown movie correctly


## Complexity Analysis

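### Brute Force
```text
Vectorization: O(n × d)
Similarity: O(n² × d)
Query Time: O(n log n)  (sort one column of the matrix)
Space: O(n²)
```

### Optimized 
```text
Precompute Similarity: O(n² × d)
Query Time: O(n log n)  (O(n) title lookup plus a sort of one row)
Space: O(n²)
```

Both approaches read from the same precomputed similarity matrix, so they differ only in how a single row is looked up and ranked.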

Where:
- n = number of movies
- d = feature dimension
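
Fully sorting one similarity row costs O(n log n). When only the top-N entries are needed, partial selection can lower the per-query cost. The sketch below, assuming NumPy only, uses `np.argpartition` to pick the N best candidates in linear time and then sorts just those; `top_n_indices` is a hypothetical helper.

```python
import numpy as np

def top_n_indices(scores, query_idx, top_n=2):
    # Hypothetical helper: indices of the top_n highest scores, excluding the query.
    scores = np.asarray(scores, dtype=float).copy()
    scores[query_idx] = -np.inf  # never recommend the query itself
    k = min(top_n, len(scores) - 1)
    if k <= 0:
        return np.array([], dtype=int)
    # argpartition moves the k largest values into the last k slots without a full sort.
    candidates = np.argpartition(scores, -k)[-k:]
    # Only those k candidates are sorted: O(k log k).
    return candidates[np.argsort(-scores[candidates])]

# Similarity row for "The Matrix", copied from the matrix shown earlier.
row = [0.77944, 0.470768, 0.0, 0.0, 1.0]
print(top_n_indices(row, query_idx=4))  # [0 1] -> Inception, Interstellar
```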
