# **Problem Statement**  
## **1. Build a mini movie recommender system using pandas and cosine similarity.**

### Problem Statement

Build a mini movie recommender system using content-based filtering, where movies are recommended based on cosine similarity of their features (such as genre and keywords), using pandas and NumPy.

### Constraints & Example Inputs/Outputs

### Constraints
- Dataset size: small to medium (demo-scale)
- Features are text-based
- No deep learning / heavy frameworks
- Similarity metric: Cosine Similarity
- Output: Top-N similar movies

### Example Input:
```text
Movie Name: "Inception"
Top-N Recommendations: 3
```

### Expected Output:
1. The Matrix
2. Interstellar
3. Shutter Island


### Solution Approach

### Real-World Analogy
“If you liked Inception, you may also like movies with similar themes, genres, or keywords.”

### Overall Approach
1. Create a movie dataset
2. Combine relevant text features
3. Convert text → numerical vectors
4. Compute cosine similarity
5. Recommend movies with highest similarity scores

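### Why Cosine Similarity?
- Measures the angle between two vectors (their direction), not their magnitude, so a movie with many tags is not favored over one with few
- Works well with sparse text features such as word counts or TF-IDF weights
- Widely used in content-based recommender systems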
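
To make the first point concrete, the short sketch below compares hypothetical count vectors that point in the same direction but differ in length. The vectors and the helper name `cosine` are illustrative assumptions and are not part of the dataset used later.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

short_doc = [1, 1, 0]   # hypothetical term counts
long_doc = [3, 3, 0]    # same proportions, three times as many terms
other_doc = [0, 0, 2]   # no terms in common with short_doc

same_direction = cosine(short_doc, long_doc)
orthogonal = cosine(short_doc, other_doc)
print(round(same_direction, 6), orthogonal)
```

Scaling a vector leaves its score unchanged, while vectors with no shared terms score zero.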

### Solution Code

### 1. Brute Force Solution (Manual cosine similarity, no sklearn)

```python
# Step 1: Create Dataset

import pandas as pd

movies = pd.DataFrame({
    "title": [
        "Inception", "Interstellar", "The Matrix",
        "Shutter Island", "The Dark Knight"
    ],
    "genre": [
        "Sci-Fi Action Thriller",
        "Sci-Fi Drama",
        "Sci-Fi Action",
        "Thriller Mystery",
        "Action Crime Drama"
    ]
})

movies
```


|     | title           | genre                  |
| --- | --------------- | ---------------------- |
| 0   | Inception       | Sci-Fi Action Thriller |
| 1   | Interstellar    | Sci-Fi Drama           |
| 2   | The Matrix      | Sci-Fi Action          |
| 3   | Shutter Island  | Thriller Mystery       |
| 4   | The Dark Knight | Action Crime Drama     |


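```python
# Step 2: Text to Vector (Simple Count Encoding)

from collections import Counter
import math  # used by the cosine similarity function in the next step

def text_to_vector(text):
    # Lowercase, split on whitespace, and count each token (bag of words).
    # Hyphenated genres such as "sci-fi" stay a single token.
    return Counter(text.lower().split())
```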


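```python
# Step 3: Manual Cosine Similarity

def cosine_similarity(vec1, vec2):
    # Only tokens present in both vectors contribute to the dot product.
    intersection = set(vec1.keys()) & set(vec2.keys())
    dot = sum(vec1[x] * vec2[x] for x in intersection)

    # Euclidean norms of the two count vectors.
    norm1 = math.sqrt(sum(v ** 2 for v in vec1.values()))
    norm2 = math.sqrt(sum(v ** 2 for v in vec2.values()))

    # An empty vector has no direction; treat its similarity as 0.
    if norm1 == 0 or norm2 == 0:
        return 0.0

    return dot / (norm1 * norm2)
```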
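
As a quick hand check of the manual function above, the sketch below works through the score for Inception against The Matrix step by step. It inlines the same arithmetic instead of calling the notebook's functions, so it runs on its own; the variable names are illustrative.

```python
import math
from collections import Counter

# Whitespace tokenization keeps "sci-fi" as a single token.
inception = Counter("sci-fi action thriller".split())
matrix = Counter("sci-fi action".split())

shared = set(inception) & set(matrix)                  # {"sci-fi", "action"}
dot = sum(inception[t] * matrix[t] for t in shared)    # 1*1 + 1*1 = 2
norm_inception = math.sqrt(sum(c * c for c in inception.values()))  # sqrt(3)
norm_matrix = math.sqrt(sum(c * c for c in matrix.values()))        # sqrt(2)

score = dot / (norm_inception * norm_matrix)
print(round(score, 4))  # 2 / sqrt(6), about 0.8165
```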


```python
# Step 4: Brute Force Recommender

def recommend_brute_force(movie_title, top_n=3):
    matches = movies[movies["title"] == movie_title]
    if matches.empty:
        raise ValueError(f"Movie not found: {movie_title!r}")
    target_vec = text_to_vector(matches.iloc[0]["genre"])

    scores = []
    # Score every other movie against the target.
    for _, row in movies.iterrows():
        if row["title"] == movie_title:
            continue
        score = cosine_similarity(target_vec, text_to_vector(row["genre"]))
        scores.append((row["title"], score))

    # Highest similarity first; returns (title, score) pairs.
    scores.sort(key=lambda x: x[1], reverse=True)
    return scores[:top_n]
```


### 2. Optimized Solution (TF-IDF + sklearn cosine similarity)

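```python
# Step 1: TF-IDF Vectorization

from sklearn.feature_extraction.text import TfidfVectorizer
# Aliased so it does not shadow the manual cosine_similarity defined above.
from sklearn.metrics.pairwise import cosine_similarity as sk_cosine_similarity

tfidf = TfidfVectorizer()
tfidf_matrix = tfidf.fit_transform(movies["genre"])
```

```python
# Step 2: Optimized Recommender

def recommend_optimized(movie_title, top_n=3):
    matches = movies.index[movies["title"] == movie_title]
    if len(matches) == 0:
        raise ValueError(f"Movie not found: {movie_title!r}")
    idx = matches[0]  # assumes the default RangeIndex, so label == row position

    # Similarity of the target row against every row (including itself).
    similarity_scores = sk_cosine_similarity(
        tfidf_matrix[idx], tfidf_matrix
    ).flatten()

    # Rank descending, drop the target itself, keep the top N.
    ranked = similarity_scores.argsort()[::-1]
    similar_indices = [i for i in ranked if i != idx][:top_n]

    return movies.iloc[similar_indices]["title"].tolist()
```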
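
The effect of TF-IDF weighting on this dataset can be sketched without scikit-learn. The snippet below assumes `TfidfVectorizer`'s default settings: tokens of two or more word characters (so "Sci-Fi" splits into `sci` and `fi`) and the smoothed formula idf(t) = ln((1 + n) / (1 + df(t))) + 1. The helper names are illustrative, and the snippet reproduces only the IDF part, not the full vectorizer.

```python
import math
import re
from collections import Counter

genres = [
    "Sci-Fi Action Thriller",
    "Sci-Fi Drama",
    "Sci-Fi Action",
    "Thriller Mystery",
    "Action Crime Drama",
]

def tokenize(text):
    # Assumed to mirror TfidfVectorizer's default token pattern.
    return re.findall(r"\b\w\w+\b", text.lower())

n_docs = len(genres)
# Document frequency: number of movies whose genre string contains the token.
doc_freq = Counter(tok for g in genres for tok in set(tokenize(g)))

# Smoothed inverse document frequency.
idf = {tok: math.log((1 + n_docs) / (1 + df)) + 1 for tok, df in doc_freq.items()}

for tok, weight in sorted(idf.items(), key=lambda kv: kv[1]):
    print(f"{tok:<10} df={doc_freq[tok]} idf={weight:.3f}")
```

Tokens shared by three movies (`sci`, `fi`, `action`) receive the lowest weight, while `mystery` and `crime`, which each appear once, receive the highest.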


### Alternative Approaches

| Approach                | Description         |
| ----------------------- | ------------------- |
| Content-based           | Uses item features  |
| Collaborative Filtering | Uses user behavior  |
| Hybrid                  | Combination of both |
| Embedding-based         | Word2Vec / BERT     |

### Testing the Code with Example Test Cases

```python
# Test Case 1: Optimized Recommendation
recommend_optimized("Inception", top_n=3)
```

```text
['The Matrix', 'Interstellar', 'Shutter Island']
```

```python
# Test Case 2: Different Movie
recommend_optimized("The Dark Knight", top_n=2)
```

```text
['Interstellar', 'The Matrix']
```

```python
# Test Case 3: Edge Case (top_n larger than the number of other movies)
recommend_optimized("Interstellar", top_n=10)
```

```text
['The Matrix', 'Inception', 'The Dark Knight', 'Shutter Island']
```

### Expected Outputs
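- The brute force and optimized recommenders should return largely the same titles, though the order can differ
- TF-IDF down-weights genres that appear in many movies, so matches on rarer genres count for more than with raw counts
- Requesting more recommendations than there are other movies returns all of them without an error (Test Case 3)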

## Complexity Analysis

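### Brute Force
- Time: O(n · f) per query to score all n movies (f = tokens per movie), plus O(n log n) to sort; O(n² · f) only if every pair of movies is compared
- Space: O(n) for the list of scores

### Optimized (TF-IDF)
- Time: a one-time vectorization pass over all tokens; each query is then a sparse matrix product plus an O(n log n) sort
- Space: O(n × features) in the worst case; the sparse matrix stores only non-zero entries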
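
Both recommenders rank all scores and then keep the first N. When N is much smaller than the number of movies, a heap-based selection avoids the full sort and runs in roughly O(n log N). A minimal sketch using the standard library follows; the scores and the helper name `top_n` are illustrative assumptions, not output from the code above.

```python
import heapq

# Illustrative (title, score) pairs; any iterable of scored items works.
scores = [
    ("Interstellar", 0.41),
    ("The Matrix", 0.82),
    ("Shutter Island", 0.40),
    ("The Dark Knight", 0.33),
]

def top_n(scored_items, n):
    """Return the n highest-scoring items in descending score order."""
    return heapq.nlargest(n, scored_items, key=lambda item: item[1])

best_two = top_n(scores, 2)
print(best_two)
```

For a demo-scale dataset the difference is negligible, so this matters mainly when the catalog grows.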
