<a href="https://colab.research.google.com/github/jayaprabhapalani/collaborative-filtering-movie-recommender/blob/main/SVD_based_Movie_Recommender_System.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**What's Happening**

1. Pivot ratings into a user-item matrix (users × movies)
2. Apply TruncatedSVD to get lower-dimensional embeddings
3. Take movie embeddings from `svd.components_` (this plays the role of $V_k^T$)
4. Compute cosine similarity between all pairs of movies
5. Recommend the top-n movies most similar to the target

In [2]:
#Load Dataset
import pandas as pd
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

# Load ratings
ratings = pd.read_csv("https://files.grouplens.org/datasets/movielens/ml-100k/u.data",
                      sep='\t', names=['user_id', 'movie_id', 'rating', 'timestamp'])

# Load movie titles
movies = pd.read_csv("https://files.grouplens.org/datasets/movielens/ml-100k/u.item",
                     sep='|', encoding='latin-1',
                     names=["movie_id", "title", "release_date", "video_release_date", "IMDb_URL",
                            "unknown", "Action", "Adventure", "Animation", "Children's", "Comedy",
                            "Crime", "Documentary", "Drama", "Fantasy", "Film-Noir", "Horror",
                            "Musical", "Mystery", "Romance", "Sci-Fi", "Thriller", "War", "Western"])

# Merge ratings with movie titles
data = pd.merge(ratings, movies[['movie_id', 'title']], on='movie_id')
data.head()


,user_id,movie_id,rating,timestamp,title
0,196,242,3,881250949,Kolya (1996)
1,186,302,3,891717742,L.A. Confidential (1997)
2,22,377,1,878887116,Heavyweights (1994)
3,244,51,2,880606923,Legends of the Fall (1994)
4,166,346,1,886397596,Jackie Brown (1997)


In [4]:
#Create User-Item Matrix
user_item_matrix = data.pivot_table(index='user_id', columns='title', values='rating').fillna(0)
user_item_matrix.head()


user_id,'Til There Was You (1997),1-900 (1994),101 Dalmatians (1996),12 Angry Men (1957),187 (1997),2 Days in the Valley (1996),"20,000 Leagues Under the Sea (1954)",2001: A Space Odyssey (1968),3 Ninjas: High Noon At Mega Mountain (1998),"39 Steps, The (1935)",...,Yankee Zulu (1994),Year of the Horse (1997),You So Crazy (1994),Young Frankenstein (1974),Young Guns (1988),Young Guns II (1990),"Young Poisoner's Handbook, The (1995)",Zeus and Roxanne (1997),unknown,Á köldum klaka (Cold Fever) (1994)
1,0.0,0.0,2.0,5.0,0.0,0.0,3.0,4.0,0.0,0.0,...,0.0,0.0,0.0,5.0,3.0,0.0,0.0,0.0,4.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,2.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,0.0,0.0,2.0,0.0,0.0,0.0,0.0,4.0,0.0,0.0,...,0.0,0.0,0.0,4.0,0.0,0.0,0.0,0.0,4.0,0.0
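
Most cells in this matrix are 0 because each user rates only a small share of the movies. As a rough illustration, the sketch below measures that share on a tiny made-up DataFrame in the same long format as `data`. The names `toy`, `toy_matrix`, and `density` are illustrative and not part of the notebook; the check relies on MovieLens ratings running from 1 to 5, so a 0 can only come from `fillna(0)`.

```python
import pandas as pd

# Toy ratings in the same long format as `data` (values are made up for illustration)
toy = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 3],
    "title":   ["A", "B", "B", "A", "C"],
    "rating":  [5, 3, 4, 2, 1],
})

# Same pivot as above: rows are users, columns are titles, unrated cells become 0
toy_matrix = toy.pivot_table(index="user_id", columns="title", values="rating").fillna(0)

# Fraction of cells that hold an actual rating (0 means "unrated" here)
density = (toy_matrix.values != 0).mean()
print(toy_matrix.shape, round(density, 3))
```

Applying the same expression to `user_item_matrix.values` would give the density of the real matrix.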


**What actually happens in SVD?**

Start with the user-item rating matrix A.

Shape: `[num_users × num_movies]`

It is usually sparse: each user rates only a small fraction of the movies, so most entries are missing (filled with 0 here).

Decompose A into:

$$A \approx U_k \cdot \Sigma_k \cdot V_k^T$$


Here:

$U_k$: Users represented in terms of the top $k$ latent features.

$\Sigma_k$: Diagonal matrix of the top $k$ singular values (the importance of those features).

$V_k^T$: Items represented in the same $k$-dimensional space.
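
To make the three factors concrete, here is a minimal NumPy sketch on a small made-up rating matrix. It uses `np.linalg.svd` rather than scikit-learn's `TruncatedSVD`, purely to expose the factors and their shapes; the matrix values and the variable names are assumptions for illustration.

```python
import numpy as np

# Small made-up rating matrix: 4 users x 3 movies (0 = unrated)
A = np.array([
    [5.0, 4.0, 0.0],
    [4.0, 5.0, 1.0],
    [0.0, 1.0, 5.0],
    [1.0, 0.0, 4.0],
])

k = 2  # number of latent features to keep

# Thin SVD; singular values come back sorted in descending order
U, s, Vt = np.linalg.svd(A, full_matrices=False)

U_k = U[:, :k]            # users in the k-dim latent space, shape (4, k)
Sigma_k = np.diag(s[:k])  # top-k singular values on the diagonal, shape (k, k)
Vt_k = Vt[:k, :]          # movies in the same space, shape (k, 3)

print(U_k.shape, Sigma_k.shape, Vt_k.shape)
```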

**Matrix Reconstruction:**

Multiply the reduced matrices back to get an approximation:

$$\hat{A} = U_k \cdot \Sigma_k \cdot V_k^T$$


Now $\hat{A}$ is a dense matrix of predicted ratings, with values even in the positions that were missing before.
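
The reconstruction can be sketched in a few lines of NumPy. The example below uses a small made-up matrix and a hypothetical helper `reconstruct(k)`, and shows that the Frobenius-norm error shrinks as $k$ grows. Keep in mind that when missing ratings are stored as 0, the factorization fits those zeros too, so the predicted values in those cells tend to be pulled toward 0.

```python
import numpy as np

# Small made-up rating matrix: 4 users x 3 movies (0 = unrated)
A = np.array([
    [5.0, 4.0, 0.0],
    [4.0, 5.0, 1.0],
    [0.0, 1.0, 5.0],
    [1.0, 0.0, 4.0],
])

U, s, Vt = np.linalg.svd(A, full_matrices=False)

def reconstruct(k):
    """Rank-k approximation U_k @ Sigma_k @ V_k^T (hypothetical helper)."""
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Frobenius norm of the residual for each rank
errors = [np.linalg.norm(A - reconstruct(k)) for k in (1, 2, 3)]
for k, err in zip((1, 2, 3), errors):
    print(k, round(float(err), 4))
```

With all three singular values kept, the product reproduces `A` up to floating-point error; dropping the smallest one leaves an error equal to that singular value.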

In [7]:
# Step 2: Apply Truncated SVD (Dimensionality Reduction)
svd = TruncatedSVD(n_components=20)  # Number of latent features k (can tune)
# fit_transform returns the user embeddings: one 20-dim row per user
latent_matrix = svd.fit_transform(user_item_matrix)
# Step 3: Compute similarity between items (movies)
# svd.components_ has shape (k, num_movies); transposing gives one row per movie
movie_features = svd.components_.T  # Each row is a movie vector in k-dim space
# Pairwise cosine similarity, shape (num_movies, num_movies)
movie_similarity = cosine_similarity(movie_features)
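
As a sanity check on what the cell above computes, the same two steps can be run on a tiny made-up matrix where the answer is easy to guess. In this sketch the first two movies are rated alike and the third is rated the opposite way, so the first pair should come out far more similar. The names `svd_toy`, `user_vecs`, `movie_vecs`, and `sim` are illustrative only.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

# Made-up ratings: movies 0 and 1 are liked by the same users, movie 2 by the others
A = np.array([
    [5.0, 4.0, 0.0],
    [4.0, 5.0, 1.0],
    [0.0, 1.0, 5.0],
    [1.0, 0.0, 4.0],
])

svd_toy = TruncatedSVD(n_components=2, random_state=0)
user_vecs = svd_toy.fit_transform(A)  # shape (4, 2): one row per user
movie_vecs = svd_toy.components_.T    # shape (3, 2): one row per movie

# Symmetric matrix with ones on the diagonal
sim = cosine_similarity(movie_vecs)
print(np.round(sim, 2))
```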


In [9]:
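# Step 4: Build lookup tables
# movie_id -> title (kept for reference; recommend_svd below does not use it)
movie_id_to_title = dict(zip(movies['movie_id'], movies['title']))
# title -> column position in user_item_matrix, which is also the row index in movie_similarity
title_to_index = {title: idx for idx, title in enumerate(user_item_matrix.columns)}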

In [16]:
# Step 5: Define a function to get recommendations
def recommend_svd(movie_title, n=10):
    """Return the n titles most similar to movie_title, or a message if the title is unknown."""
    movie_idx = title_to_index.get(movie_title)
    if movie_idx is None:
        return "Movie not found!"

    # Pair each movie index with its similarity to the target, highest first
    sim_scores = sorted(enumerate(movie_similarity[movie_idx]), key=lambda x: x[1], reverse=True)
    # Drop the target itself explicitly rather than assuming it sorts first
    top_indices = [i for i, _ in sim_scores if i != movie_idx][:n]

    # Translate column positions back to titles using user_item_matrix.columns
    return [user_item_matrix.columns[i] for i in top_indices]
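
`recommend_svd` looks titles up by exact string, including the year and the MovieLens convention of moving a leading article to the end (for example "Godfather, The (1972)"). A small search helper can make the exact spelling easier to find. The function `find_titles` below is a hypothetical addition, not part of the notebook, and `sample_titles` stands in for `user_item_matrix.columns`.

```python
def find_titles(query, titles):
    """Return the titles that contain `query`, ignoring case (hypothetical helper)."""
    q = query.lower()
    return [t for t in titles if q in str(t).lower()]

# Stand-in for user_item_matrix.columns, using MovieLens-style titles
sample_titles = [
    "Return of the Jedi (1983)",
    "Empire Strikes Back, The (1980)",
    "Godfather, The (1972)",
    "Star Wars (1977)",
]

matches = find_titles("godfather", sample_titles)
print(matches)
```

In the notebook, `find_titles("star wars", user_item_matrix.columns)` would list the candidate titles to pass to `recommend_svd`.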




In [17]:
recommend_svd("Star Wars (1977)")


['Return of the Jedi (1983)',
 'Empire Strikes Back, The (1980)',
 'Godfather, The (1972)',
 'Men in Black (1997)',
 'King of New York (1990)',
 "Boy's Life 2 (1997)",
 'Indiana Jones and the Last Crusade (1989)',
 'Toy Story (1995)',
 'Raiders of the Lost Ark (1981)',
 "My Best Friend's Wedding (1997)"]