# Movie Recommendation System using SVD
In this notebook, we'll build a simple movie recommendation system using **Singular Value Decomposition (SVD)**. We'll use the MovieLens dataset to find movies similar to a randomly selected movie based on user ratings.

## Import Libraries

In [29]:
import pandas as pd
import numpy as np
import random
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.decomposition import TruncatedSVD

## Load the Data
First, we'll load the user ratings and movie titles, and then merge them into a single dataframe.

In [30]:
# Load ratings data
column_names = ['user_id', 'item_id', 'rating', 'timestamp']
ratings = pd.read_csv('../data/movielens/u.data', sep='\t', names=column_names)

# Load movie titles
movie_titles = pd.read_csv('../data/movielens/u.item', sep='|', encoding='latin-1',
                           usecols=[0, 1], names=['item_id', 'title'])

# Merge the datasets
data = pd.merge(ratings, movie_titles, on='item_id')
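
As a quick illustration of what the merge does, the sketch below joins two tiny made-up tables shaped like `ratings` and `movie_titles`. The toy values and the `validate="many_to_one"` check are assumptions added for illustration, not part of the notebook. The check makes pandas raise an error if an `item_id` appears more than once in the titles table.

```python
import pandas as pd

# Made-up ratings: several users can rate the same item.
toy_ratings = pd.DataFrame({
    "user_id": [1, 2, 3],
    "item_id": [10, 10, 11],
    "rating": [4, 5, 3],
})

# Made-up titles: one row per item.
toy_titles = pd.DataFrame({
    "item_id": [10, 11],
    "title": ["Movie X (1990)", "Movie Y (1991)"],
})

# Inner join on item_id; validate guards against duplicate item_ids on the right.
toy_data = pd.merge(toy_ratings, toy_titles, on="item_id",
                    how="inner", validate="many_to_one")
print(toy_data)
```

Each rating row gains the title of its item, so the merged table has as many rows as the ratings table when every `item_id` has a title.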

## Explore the Data
Let's take a quick look at the data to understand its structure.

In [5]:
data.head()

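|   | user_id | item_id | rating | timestamp | title |
|---|---------|---------|--------|-----------|-------|
| 0 | 196 | 242 | 3 | 881250949 | Kolya (1996) |
| 1 | 63 | 242 | 3 | 875747190 | Kolya (1996) |
| 2 | 226 | 242 | 5 | 883888671 | Kolya (1996) |
| 3 | 154 | 242 | 3 | 879138235 | Kolya (1996) |
| 4 | 306 | 242 | 5 | 876503793 | Kolya (1996) |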


## Create the User-Item Ratings Matrix
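We pivot the data into a matrix where each row is a user and each column is a movie title. Each cell holds the rating that user gave that movie, or `NaN` if the user has not rated it. Because `pivot_table` aggregates with the mean by default, movies that share an identical title are merged into a single column.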

In [10]:
ratings_matrix = data.pivot_table(index='user_id', columns='title', values='rating')
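
To make the pivot concrete, here is a minimal sketch on a made-up long-format table with the same column names as `data`. The toy values are assumptions for illustration only.

```python
import pandas as pd

# Toy ratings in long format: one row per (user, movie) rating.
toy = pd.DataFrame({
    "user_id": [1, 1, 2, 3],
    "title": ["A", "B", "A", "C"],
    "rating": [5, 3, 4, 2],
})

# One row per user, one column per title; unrated pairs become NaN.
toy_matrix = toy.pivot_table(index="user_id", columns="title", values="rating")
print(toy_matrix)
```

User 2 rated only movie A, so the cells for B and C in that row are `NaN`.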

## Fill Missing Values
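Most users have rated only a small fraction of the movies, so the matrix consists mostly of missing values. We replace them with zeros so that SVD can run. Note that this treats "not rated" as a rating of 0, which is lower than the minimum real rating of 1, so it is a simplification rather than a neutral choice.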

In [31]:
ratings_matrix_filled = ratings_matrix.fillna(0)
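
The sketch below, on a small made-up matrix, measures how sparse a ratings matrix is and contrasts zero-filling with one common alternative: subtracting each movie's mean rating before filling. The mean-centering variant is an assumption shown for comparison; the notebook itself uses plain zero-filling.

```python
import numpy as np
import pandas as pd

# Made-up user x movie matrix with missing ratings.
toy_matrix = pd.DataFrame(
    [[5.0, 3.0, np.nan],
     [4.0, np.nan, np.nan],
     [np.nan, np.nan, 2.0]],
    index=[1, 2, 3], columns=["A", "B", "C"],
)

# Fraction of cells with no rating.
sparsity = toy_matrix.isna().to_numpy().mean()
print(f"Missing fraction: {sparsity:.3f}")

# Approach used in this notebook: missing ratings become 0.
zero_filled = toy_matrix.fillna(0)

# Alternative (assumption): center each movie on its mean rating first,
# so a filled-in 0 means "average" rather than "worst possible".
centered_filled = toy_matrix.sub(toy_matrix.mean(axis=0), axis=1).fillna(0)
print(centered_filled)
```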

## Perform Singular Value Decomposition (SVD)
We use SVD to reduce the dimensionality of the ratings matrix and uncover latent features that capture the underlying structure in the data.

In [32]:
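# Transpose so rows are movies, then reduce each movie to 20 latent features.
# random_state fixes the randomized solver so results are reproducible.
svd = TruncatedSVD(n_components=20, random_state=42)
svd_matrix = svd.fit_transform(ratings_matrix_filled.T)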
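
The number of components is a tunable choice. One way to sanity-check it, sketched below on a made-up random matrix rather than the MovieLens data, is to compare the share of variance retained for several values of `n_components` using the fitted model's `explained_variance_ratio_` attribute. The matrix shape and the candidate values are assumptions for illustration.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

# Made-up "movies x users" matrix of integer ratings in 0..5.
rng = np.random.default_rng(0)
toy = rng.integers(0, 6, size=(50, 30)).astype(float)

candidates = (2, 5, 10, 20)
ratios = []
for k in candidates:
    model = TruncatedSVD(n_components=k, random_state=0).fit(toy)
    # Sum of per-component ratios = share of total variance kept by k components.
    ratios.append(model.explained_variance_ratio_.sum())

for k, ratio in zip(candidates, ratios):
    print(f"n_components={k:>2}: retained variance = {ratio:.3f}")
```

The retained share grows with the number of components; a reasonable cutoff is where additional components stop adding much.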

### Understanding SVD

SVD factorizes a matrix $ A $ (here the transposed ratings matrix, with movies as rows and users as columns) into three matrices:

$$
A \approx U \Sigma V^T
$$

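- $ U $ : Left singular vectors (movies in latent feature space)
- $ \Sigma $ : Singular values (importance of each latent feature)
- $ V^T $ : Right singular vectors (users in latent feature space)

`fit_transform` returns $ U \Sigma $ restricted to the top 20 components, so each row of `svd_matrix` holds one movie's coordinates in the latent feature space.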

By reducing the number of components, we capture the most significant patterns in user ratings.
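
The factorization can be checked numerically. The sketch below uses a small made-up matrix and `numpy.linalg.svd` to build a rank-2 approximation, then compares it with what `TruncatedSVD.fit_transform` returns. The toy matrix, `k = 2`, and the `arpack` solver (chosen here so the comparison is numerically tight) are assumptions for illustration.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

# Made-up matrix: 8 "movies" x 6 "users".
rng = np.random.default_rng(0)
A = rng.integers(0, 6, size=(8, 6)).astype(float)

# Full (thin) SVD: A = U @ diag(s) @ Vt.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keep only the k largest singular values for a rank-k approximation.
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
error = np.linalg.norm(A - A_k)  # Frobenius norm of what was discarded
print(f"Rank-{k} approximation error: {error:.3f}")

# TruncatedSVD returns U_k * s_k: one row of latent coordinates per movie.
toy_svd = TruncatedSVD(n_components=k, algorithm="arpack")
embedding = toy_svd.fit_transform(A)

# Columns may differ in sign between the two routines, so compare magnitudes.
print(np.allclose(np.abs(embedding), np.abs(U[:, :k] * s[:k]), atol=1e-6))
```

The approximation error equals the square root of the sum of the squared discarded singular values, which is why dropping the smallest ones loses the least information.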

## Build the Recommendation Function

We define a function to find movies similar to a randomly selected movie based on cosine similarity in the reduced feature space.


In [33]:
def get_similar_movies_svd(num_movies=5):
    # Pick a random movie from the list of available titles
    random_movie = random.choice(ratings_matrix.columns)
    print(f"Randomly selected movie: {random_movie}")
    
    # Find the index of the movie in the original ratings matrix
    movie_idx = ratings_matrix.columns.get_loc(random_movie)
    
    # Get the vector for the specified movie in the reduced feature space
    movie_vec = svd_matrix[movie_idx].reshape(1, -1)
    
    # Compute cosine similarity with other movies in the reduced space
    cosine_sim = cosine_similarity(movie_vec, svd_matrix)[0]
    
    # Exclude the selected movie itself, then take the top matches
    cosine_sim[movie_idx] = -np.inf
    similar_idx = np.argsort(cosine_sim)[::-1][:num_movies]
    
    # Return the titles of the most similar movies
    return ratings_matrix.columns[similar_idx].tolist()
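
For repeatable lookups it can help to pass the movie title in rather than pick it at random. The sketch below is a hypothetical variant, `top_k_similar`, written with NumPy only and run on made-up embeddings; the function name, its parameters, and the toy data are assumptions, not part of the notebook. It excludes the query movie by index, so a second movie with an identical vector cannot push the query itself into the results.

```python
import numpy as np

def top_k_similar(embeddings, titles, query_title, k=5):
    """Return the k titles most cosine-similar to query_title (hypothetical helper)."""
    titles = list(titles)
    query_idx = titles.index(query_title)

    # Normalize rows to unit length; guard against all-zero rows.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.where(norms == 0, 1.0, norms)

    # Cosine similarity of every movie to the query movie.
    sims = unit @ unit[query_idx]

    # Remove the query itself from consideration, then rank descending.
    sims[query_idx] = -np.inf
    order = np.argsort(sims)[::-1][:k]
    return [titles[i] for i in order]

# Made-up 2-D embeddings; "A-duplicate" has exactly the same vector as "A".
toy_embeddings = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [1.0, 0.0]])
toy_titles = ["A", "B", "C", "A-duplicate"]
print(top_k_similar(toy_embeddings, toy_titles, "A", k=2))
# → ['A-duplicate', 'B']
```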


## Get Movie Recommendations
Now, let's get recommendations based on a randomly selected movie.

In [34]:
# Run the function to see recommendations for a random movie
recommended_movies = get_similar_movies_svd(5)
print("Recommended movies:", recommended_movies)

Randomly selected movie: Dumbo (1941)
Recommended movies: ['Snow White and the Seven Dwarfs (1937)', 'Pinocchio (1940)', 'Cinderella (1950)', 'Alice in Wonderland (1951)', 'Mary Poppins (1964)']
