#Simple Movie Recommender Using SVD

Given a movie title, we’ll use Singular Value Decomposition (SVD) to recommend other movies based on user ratings.

Filtering and recommending based on information given by other users is known as collaborative filtering. The assumption is that people with similar movie tastes are most likely to give similar movie ratings. So, if I’m looking for a new movie and I’ve watched The Matrix, this method will recommend movies that have a similar rating pattern to The Matrix across a set of users.

In [1]:

import numpy as np
import pandas as pd

Read the files with pandas

In [2]:

data = pd.io.parsers.read_csv('/content/ratings.dat',
    names=['user_id', 'movie_id', 'rating', 'time'],
    engine='python', delimiter='::')
movie_data = pd.read_csv('/content/movies.dat', names=['movie_id', 'title', 'genre'], engine='python', delimiter='::', encoding='ISO-8859-1')


In [3]:
data.head()

Unnamed: 0,user_id,movie_id,rating,time
0,1,1193,5,978300760
1,1,661,3,978302109
2,1,914,3,978301968
3,1,3408,4,978300275
4,1,2355,5,978824291


In [4]:
movie_data.head()

Unnamed: 0,movie_id,title,genre
0,1,Toy Story (1995),Animation|Children's|Comedy
1,2,Jumanji (1995),Adventure|Children's|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama
4,5,Father of the Bride Part II (1995),Comedy


 Create the ratings matrix of shape (m×u
) with rows as movies and columns as users

In [5]:
movie_data['movie_id'].unique()

array([   1,    2,    3, ..., 3950, 3951, 3952])

In [6]:
movie_data['movie_id'].nunique()


3883

In [7]:

ratings_mat = np.ndarray(
    shape=(np.max(data.movie_id.values), np.max(data.user_id.values)),
    dtype=np.uint8)
ratings_mat[data.movie_id.values-1, data.user_id.values-1] = data.rating.values

In [8]:
ratings_mat

array([[  5, 180,  94, ...,   0,   0,   3],
       [253,   2,   0, ...,   0,   0,   0],
       [ 13,   6,   0, ...,   0,   0,   0],
       ...,
       [  4,   0,   0, ...,   0,   0,   0],
       [  5,   0,   0, ...,   0,   0,   0],
       [  2,   0,   0, ...,   0,   0,   0]], dtype=uint8)

 Normalise matrix (subtract mean off)

In [9]:
normalised_mat = ratings_mat - np.asarray([(np.mean(ratings_mat, 1))]).T


In [10]:
normalised_mat

array([[ -6.73807947, 168.26192053,  82.26192053, ..., -11.73807947,
        -11.73807947,  -8.73807947],
       [238.06142384, -12.93857616, -14.93857616, ..., -14.93857616,
        -14.93857616, -14.93857616],
       [ -2.45463576,  -9.45463576, -15.45463576, ..., -15.45463576,
        -15.45463576, -15.45463576],
       ...,
       [  3.48476821,  -0.51523179,  -0.51523179, ...,  -0.51523179,
         -0.51523179,  -0.51523179],
       [  4.51572848,  -0.48427152,  -0.48427152, ...,  -0.48427152,
         -0.48427152,  -0.48427152],
       [  1.35960265,  -0.64039735,  -0.64039735, ...,  -0.64039735,
         -0.64039735,  -0.64039735]])

Compute SVD

In [11]:
A = normalised_mat.T / np.sqrt(ratings_mat.shape[0] - 1)
U, S, V = np.linalg.svd(A)

Calculate cosine similarity, sort by most similar and return the top N.

In [12]:
def top_cosine_similarity(data, movie_id, top_n=10):
    index = movie_id - 1 # Movie id starts from 1
    movie_row = data[index, :]
    magnitude = np.sqrt(np.einsum('ij, ij -> i', data, data))

    # Check for zero magnitude before performing division
    magnitude_product = magnitude[index] * magnitude
    magnitude_product[magnitude_product == 0] = 1.0  # Replace zeros with a non-zero value

    similarity = np.dot(movie_row, data.T) / magnitude_product
    sort_indexes = np.argsort(-similarity)
    return sort_indexes[:top_n]


# Helper function to print top N similar movies
def print_similar_movies(movie_data, movie_id, top_indexes):
    print('Recommendations for {0}: \n'.format(
    movie_data[movie_data.movie_id == movie_id].title.values[0]))
    for id in top_indexes + 1:
        print(movie_data[movie_data.movie_id == id].title.values[0])

Select k
 principal components to represent the movies, a movie_id to find recommendations and print the top_n results.

In [13]:
k = 50
movie_id = 1 # Grab an id from movies.dat
top_n = 10

sliced = V.T[:, :k] # representative data
indexes = top_cosine_similarity(sliced, movie_id, top_n)
print_similar_movies(movie_data, movie_id, indexes)

Recommendations for Toy Story (1995): 

Toy Story (1995)
Rain Man (1988)
Sixteen Candles (1984)
Grease (1978)
There's Something About Mary (1998)
Pleasantville (1998)
Christmas Vacation (1989)
Heathers (1989)
Little Mermaid, The (1989)
Stripes (1981)


We can change k and use different number of principal components to represent our dataset. This is essentially performing dimensionality reduction.

In [14]:
k = 50
movie_id = 3950 # Grab an id from movies.dat
top_n = 10

sliced = V.T[:, :k] # representative data
indexes = top_cosine_similarity(sliced, movie_id, top_n)
print_similar_movies(movie_data, movie_id, indexes)

Recommendations for Tigerland (2000): 

Tigerland (2000)
Adrenalin: Fear the Rush (1996)
Kids of Survival (1993)
Machine, The (1994)
Amityville Curse, The (1990)
Leather Jacket Love Story (1997)
Terror in a Texas Town (1958)
B*A*P*S (1997)
Falling in Love Again (1980)
Tickle in the Heart, A (1996)
