In [8]:
import pandas as pd
from sklearn.metrics import mean_squared_error
from scipy.sparse import csr_matrix
from sklearn.neighbors import NearestNeighbors


In [9]:
# Load the datasets
ratings_df = pd.read_csv("ratings.csv")
movies_df = pd.read_csv("movies.csv")

# Display the first few rows of each dataframe (movies first, then ratings)
(movies_df.head(), ratings_df.head())


(   movieId                               title  \
 0        1                    Toy Story (1995)   
 1        2                      Jumanji (1995)   
 2        3             Grumpier Old Men (1995)   
 3        4            Waiting to Exhale (1995)   
 4        5  Father of the Bride Part II (1995)   
 
                                         genres  
 0  Adventure|Animation|Children|Comedy|Fantasy  
 1                   Adventure|Children|Fantasy  
 2                               Comedy|Romance  
 3                         Comedy|Drama|Romance  
 4                                       Comedy  ,
    userId  movieId  rating  timestamp
 0       1        1     4.0  964982703
 1       1        3     4.0  964981247
 2       1        6     4.0  964982224
 3       1       47     5.0  964983815
 4       1       50     5.0  964982931)

In [10]:

# Merge ratings with movies to get movie titles
merged_df = pd.merge(ratings_df, movies_df, on='movieId')

# Create a pivot table
movie_user_matrix = merged_df.pivot_table(index='title', columns='userId', values='rating').fillna(0)

# Convert the pivot table to a sparse matrix for calculations
sparse_matrix = csr_matrix(movie_user_matrix.values)

# Display the first few rows of the pivot table
movie_user_matrix.head()


userId,1,2,3,4,5,6,7,8,9,10,...,601,602,603,604,605,606,607,608,609,610
title
'71 (2014),0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4.0
'Hellboy': The Seeds of Creation (2004),0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
'Round Midnight (1986),0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
'Salem's Lot (2004),0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
'Til There Was You (1997),0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [11]:
# Initialize the Nearest Neighbors model
model_knn = NearestNeighbors(metric='cosine', algorithm='brute', n_neighbors=20, n_jobs=-1)

# Fit the model on the sparse matrix
model_knn.fit(sparse_matrix)

# Function to recommend movies based on a movie title 
def recommend_movies(movie_title, movie_user_matrix, model_knn, n_recommendations=10):
  
    if movie_title not in movie_user_matrix.index:
        return "Movie not found in the dataset. Please try another title."
    
    # Query the model for the input movie's nearest neighbors
    # (one extra, because the closest match is the input movie itself)
    query_vector = movie_user_matrix.loc[movie_title, :].values.reshape(1, -1)
    distances, indices = model_knn.kneighbors(query_vector, n_neighbors=n_recommendations + 1)

    # Get the titles of the recommended movies, skipping the first hit (the input movie)
    recommendations = [movie_user_matrix.index[idx] for idx in indices.flatten()[1:]]
    
    return recommendations

# Test the function with a sample movie title
test_movie_title = 'Toy Story (1995)'
recommendations = recommend_movies(test_movie_title, movie_user_matrix, model_knn)
recommendations


['Toy Story 2 (1999)',
 'Jurassic Park (1993)',
 'Independence Day (a.k.a. ID4) (1996)',
 'Star Wars: Episode IV - A New Hope (1977)',
 'Forrest Gump (1994)',
 'Lion King, The (1994)',
 'Star Wars: Episode VI - Return of the Jedi (1983)',
 'Mission: Impossible (1996)',
 'Groundhog Day (1993)',
 'Back to the Future (1985)']

In [12]:
test_movie_title1 = 'Pocahontas (1995)'
recommendations1 = recommend_movies(test_movie_title1, movie_user_matrix, model_knn)
recommendations1

['Beauty and the Beast (1991)',
 'Casper (1995)',
 'Lion King, The (1994)',
 'Aladdin (1992)',
 'Snow White and the Seven Dwarfs (1937)',
 'Pinocchio (1940)',
 'Hunchback of Notre Dame, The (1996)',
 'Santa Clause, The (1994)',
 'Nightmare Before Christmas, The (1993)',
 'Indian in the Cupboard, The (1995)']

I chose to build the recommender system on the MovieLens dataset using an item-based collaborative filtering approach: a nearest-neighbors model (scikit-learn's NearestNeighbors) finds the movies most similar to the one a user inputs. I started by loading the two data files, ratings.csv and movies.csv, into pandas DataFrames and merged them on movieId to associate each rating with its movie title. I then created a pivot table (the movie-user matrix) with movie titles as rows, user IDs as columns, and ratings as values. Missing ratings were filled with 0 to mark that the user hasn't rated that movie; note that the model then treats these entries as literal zeros when comparing movies. Since the movie-user matrix is mostly zeros, I converted it into a sparse matrix using SciPy's csr_matrix.

Next I created the nearest-neighbors model with cosine distance as the metric, using a brute-force search. Cosine similarity is the cosine of the angle between two vectors, in this case the rating vectors of two movies, and cosine distance is 1 minus that similarity. A smaller angle (and thus a higher cosine similarity and a lower distance) indicates that the two movies were rated similarly by the same users. Finally, I fitted the model on the sparse movie-user matrix.
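
As a minimal sketch of the cosine measure described above (this is not part of the notebook itself, and the three rating vectors are made-up values rather than MovieLens data), the similarity and the corresponding distance can be computed by hand:

```python
from math import sqrt

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length rating vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical ratings from four users for three movies (0 = not rated)
movie_a = [5.0, 4.0, 0.0, 1.0]
movie_b = [4.0, 5.0, 0.0, 1.0]  # rated much like movie_a
movie_c = [0.0, 1.0, 5.0, 4.0]  # liked by a different set of users

sim_ab = cosine_similarity(movie_a, movie_b)
sim_ac = cosine_similarity(movie_a, movie_c)

# Cosine distance is 1 minus cosine similarity
dist_ab = 1.0 - sim_ab
dist_ac = 1.0 - sim_ac

print(f"similarity(a, b) = {sim_ab:.3f}, distance = {dist_ab:.3f}")
print(f"similarity(a, c) = {sim_ac:.3f}, distance = {dist_ac:.3f}")
```

Here movie_a and movie_b end up much closer to each other than movie_a and movie_c, because the same users rated them in a similar way.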

I created a function (recommend_movies) that takes a movie title as input and uses the fitted model to find its nearest neighbors, that is, the movies whose rating vectors have the smallest cosine distance to it. The function asks for one neighbor more than the number of recommendations requested, because the closest match is the input movie itself, and then returns the remaining titles. As a test, I used the function to recommend ten movies similar to "Toy Story (1995)", which produced a list of movies whose ratings across users most closely resemble those of "Toy Story".

In summary, the recommender system uses only user ratings to find and recommend movies similar to a user's input. This approach assumes that movies with similar rating patterns are likely to be similar in content or appeal, making them good recommendations. The key to this system is the collaborative filtering technique, which finds relationships between items (movies) based on user interactions rather than on the content of the items themselves.
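
To illustrate the lookup on something small enough to check by hand, here is a dependency-free sketch of the same idea. It is not the notebook's scikit-learn implementation: the titles, ratings, and the helper names cosine_distance and similar_movies are hypothetical, and unrated pairs are left out of each dictionary instead of being stored as 0 (for cosine distance this gives the same result, since zeros contribute nothing to the dot product or the norms).

```python
from math import sqrt

# Hypothetical toy data: title -> {userId: rating}; unrated pairs are absent
ratings = {
    "Movie A": {1: 5.0, 2: 4.0, 4: 1.0},
    "Movie B": {1: 4.0, 2: 5.0, 4: 1.0},
    "Movie C": {2: 1.0, 3: 5.0, 4: 4.0},
    "Movie D": {3: 4.0, 4: 5.0},
}

def cosine_distance(a, b):
    """1 - cosine similarity between two sparse rating vectors."""
    shared_users = a.keys() & b.keys()
    dot = sum(a[u] * b[u] for u in shared_users)
    norm_a = sqrt(sum(r * r for r in a.values()))
    norm_b = sqrt(sum(r * r for r in b.values()))
    return 1.0 - dot / (norm_a * norm_b)

def similar_movies(title, ratings, n=2):
    """Return the n titles with the smallest cosine distance to `title`."""
    if title not in ratings:
        raise KeyError(f"{title!r} not found in the dataset")
    query = ratings[title]
    # Exclude the query movie by name rather than by its position in the results
    scored = [
        (cosine_distance(query, vector), other)
        for other, vector in ratings.items()
        if other != title
    ]
    scored.sort()
    return [other for _, other in scored[:n]]

result = similar_movies("Movie A", ratings)
print(result)
```

Excluding the input movie by name, instead of dropping the first result, is a small design choice in this sketch: it stays correct even if another movie happens to have an identical rating vector.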