# Recommender Systems: Hands-on Laboratory

We will build two types of recommenders:
1.  **Content-Based**: "Because you liked 'The Matrix'..."
2.  **Collaborative (SVD)**: "People like you also liked..."

In [None]:
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

# Sample Data (Tiny Movie Database)
data = {
    'title': ['The Matrix', 'Inception', 'The Notebook', 'Titanic', 'John Wick', '50 First Dates'],
    'description': [
        'Sci-Fi Action Neo Keanu',
        'Sci-Fi Dream Leonardo',
        'Romance Ryan Gosling',
        'Romance Ship Leonardo',
        'Action Gun Keanu',
        'Romance Comedy Adam Sandler'
    ]
}
df = pd.DataFrame(data)
df

## Part 1: Content-Based Filtering
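We measure how similar two movies are by comparing the words in their descriptions: each description becomes a TF-IDF vector, and movies whose vectors point in similar directions are treated as similar.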

In [None]:
# 1. Create TF-IDF Vectors
tfidf = TfidfVectorizer(stop_words='english')
tfidf_matrix = tfidf.fit_transform(df['description'])

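# 2. Calculate Cosine Similarity
# TfidfVectorizer L2-normalizes each row by default, so the plain dot product
# computed by linear_kernel equals the cosine similarity.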
cosine_sim = linear_kernel(tfidf_matrix, tfidf_matrix)

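# 3. Recommendation Function
def get_recommendations(title, cosine_sim=cosine_sim, top_n=3):
    """Return the top_n titles most similar to `title`, excluding the movie itself."""
    matches = df.index[df['title'] == title]
    if len(matches) == 0:
        raise ValueError(f"Title not found: {title!r}")
    idx = matches[0]
    # Pair each movie's position with its similarity to the query movie
    sim_scores = list(enumerate(cosine_sim[idx]))
    # Drop the query movie explicitly rather than assuming it sorts first
    sim_scores = [pair for pair in sim_scores if pair[0] != idx]
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)[:top_n]
    movie_indices = [i[0] for i in sim_scores]
    return df['title'].iloc[movie_indices]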

print("Recommendations for 'The Matrix':")
print(get_recommendations('The Matrix'))

print("\nRecommendations for 'The Notebook':")
print(get_recommendations('The Notebook'))
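
Step 2 above uses `linear_kernel` (a plain dot product) where one might expect a cosine similarity call. The following is a minimal sketch, assuming scikit-learn's default `norm='l2'` setting for `TfidfVectorizer`, that checks the two give the same matrix on the lab's descriptions. The variable names (`descriptions`, `matrix`, `dot_products`, `cosines`) are illustrative and not part of the lab code.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity, linear_kernel

# Same descriptions as the sample movie database above
descriptions = [
    'Sci-Fi Action Neo Keanu',
    'Sci-Fi Dream Leonardo',
    'Romance Ryan Gosling',
    'Romance Ship Leonardo',
    'Action Gun Keanu',
    'Romance Comedy Adam Sandler',
]

vectorizer = TfidfVectorizer(stop_words='english')
matrix = vectorizer.fit_transform(descriptions)

# With the default norm='l2', every non-empty row has unit length
row_norms = np.sqrt(np.asarray(matrix.multiply(matrix).sum(axis=1)).ravel())
print("All rows unit length:", np.allclose(row_norms, 1.0))

# For unit-length rows, the dot product and the cosine similarity coincide
dot_products = linear_kernel(matrix, matrix)
cosines = cosine_similarity(matrix, matrix)
print("linear_kernel matches cosine_similarity:", np.allclose(dot_products, cosines))
```

If the vectorizer were created with `norm=None`, the rows would no longer have unit length and `cosine_similarity` would be the safer call.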

## Part 2: Matrix Factorization (SVD)

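A full version of this step would train an SVD model with a library such as `scikit-surprise` (`pip install scikit-surprise`).
The cell below does not require it: it simulates the concept with hand-picked embeddings and a plain dot product.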

In [None]:
# Pseudo-SVD Demonstration (Manual Dot Product)

# Imagine we trained an SVD model and got these embeddings for Users and Items:
# Features (hand-labeled here for illustration): [Action affinity, Romance affinity]

users = pd.DataFrame(
    [[0.9, 0.1],  # User A: Loves Action, Hates Romance
     [0.1, 0.9],  # User B: Loves Romance, Hates Action
     [0.5, 0.5]], # User C: Likes Both
    index=['User A', 'User B', 'User C'], columns=['Action', 'Romance']
)

movies = pd.DataFrame(
    [[0.95, 0.05], # John Wick
     [0.05, 0.95], # Titanic
     [0.5, 0.5]],  # Avatar
    index=['John Wick', 'Titanic', 'Avatar'], columns=['Action', 'Romance']
)

# Dot Product to Predict Rating
predicted_ratings = users.dot(movies.T)
print("Predicted affinity scores (0-1):")
print(predicted_ratings)

# Notice how User A gets a high score for John Wick (0.9*0.95 + 0.1*0.05 = 0.86)
# and a low score for Titanic (0.9*0.05 + 0.1*0.95 = 0.14).
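
The embeddings above were written by hand. As a rough sketch of where such vectors could come from, the example below factorizes a small, made-up ratings matrix with NumPy's `np.linalg.svd` and keeps the top two factors. The ratings, the choice of `k = 2`, and the names `user_factors` and `item_factors` are assumptions for illustration, not output from a trained model.

```python
import numpy as np

# Hypothetical ratings on a 1-5 scale (rows: users, columns: movies)
#                   John Wick  Titanic  Avatar
ratings = np.array([
    [5.0, 1.0, 3.0],   # User A
    [1.0, 5.0, 3.0],   # User B
    [3.0, 3.0, 3.0],   # User C
])

# Full SVD: ratings = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(ratings, full_matrices=False)

# Keep the k strongest latent factors and split the singular values
# evenly between the user side and the item side
k = 2
user_factors = U[:, :k] * np.sqrt(s[:k])
item_factors = Vt[:k, :].T * np.sqrt(s[:k])

# Predicted ratings are dot products of user and item factors,
# the same operation as users.dot(movies.T) in the cell above
approx = user_factors @ item_factors.T
print(np.round(approx, 2))
```

User C's row is the average of the other two, so this matrix has rank 2 and two factors reproduce it exactly. Unlike the hand-labeled "Action" and "Romance" columns, factors found this way carry no names and may be negative. This sketch also assumes every rating is known; with real, sparse rating data, the factors are typically fitted only on the observed entries.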