# Week 10 Recommendation Systems

## 10.2 Exercise: Recommender System

***Instructions)***

Using the small MovieLens data set, create a recommender system that allows users to input a movie they like (in the data set) and recommends ten other movies for them to watch. In your write-up, clearly explain the recommender system process and all steps performed. If you are using a method found online, be sure to reference the source.

You can use R or Python to complete this assignment. Submit your code and output to the submission link. Make sure to add comments to all of your code and to document your steps, process, and analysis.

***Answer)***

**0. Import the Data**

```python
# Load the necessary libraries
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix
from sklearn.neighbors import NearestNeighbors

import matplotlib.pyplot as plt
```

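```python
# Read the MovieLens movies and ratings files (adjust the paths to your local copies)
movies_df = pd.read_csv('C:/Users/ivan2/gitLocal/DSC630-SPRING2024/WK10-movies.csv')
ratings_df = pd.read_csv('C:/Users/ivan2/gitLocal/DSC630-SPRING2024/WK10-ratings.csv')

# Preview the first five rows of each DataFrame
movies_df.head(), ratings_df.head()
```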

```
(   movieId                               title  \
 0        1                    Toy Story (1995)   
 1        2                      Jumanji (1995)   
 2        3             Grumpier Old Men (1995)   
 3        4            Waiting to Exhale (1995)   
 4        5  Father of the Bride Part II (1995)   
 
                                         genres  
 0  Adventure|Animation|Children|Comedy|Fantasy  
 1                   Adventure|Children|Fantasy  
 2                               Comedy|Romance  
 3                         Comedy|Drama|Romance  
 4                                       Comedy  ,
    userId  movieId  rating  timestamp
 0       1        1     4.0  964982703
 1       1        3     4.0  964981247
 2       1        6     4.0  964982224
 3       1       47     5.0  964983815
 4       1       50     5.0  964982931)
```

```python
# Check the dimensions (rows, columns) of each DataFrame
movies_df.shape, ratings_df.shape
```

```
((9742, 3), (100836, 4))
```

**1. Building the Recommendation System**

We use one of the simplest yet effective methods: item-based collaborative filtering. This approach recommends movies based on how similarly users have rated them, so two movies are considered similar when the same users tend to rate them in a similar way.

We focus only on the movies.csv and ratings.csv files. The movies.csv file holds the metadata (title and genres) for each movie and serves as a lookup table that maps each movieId in ratings.csv to a title.
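As a minimal sketch of how movies.csv works as a lookup table, the snippet below attaches titles to a few ratings by merging on the shared movieId column. The three movie rows come from the head() output above; the ratings are invented for illustration.

```python
import pandas as pd

# Three rows from movies.csv (as shown in the head() output above)
movies = pd.DataFrame({
    'movieId': [1, 2, 3],
    'title': ['Toy Story (1995)', 'Jumanji (1995)', 'Grumpier Old Men (1995)'],
})

# Hypothetical ratings in the same layout as ratings.csv (values invented)
ratings = pd.DataFrame({
    'userId':  [10, 10, 11, 12],
    'movieId': [1, 3, 1, 2],
    'rating':  [4.0, 3.5, 5.0, 2.0],
})

# Look up each rating's title by matching on the shared movieId column;
# how='left' keeps every rating, in its original order
rated_titles = ratings.merge(movies, on='movieId', how='left')
print(rated_titles)
```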

The ratings are the core data for the collaborative filtering approach. By analyzing how users rate different movies, we can compute similarities between movies based on user preferences. The userId and movieId help map which user rated which movie, and the rating provides the value that influences the similarity calculations.
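To make "similarity based on user preferences" concrete, here is a small worked example of cosine similarity, the measure the model below uses. Each movie is a vector of ratings from the same four users (0 means not rated); the numbers are invented for illustration.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two rating vectors: (a . b) / (|a| * |b|)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical ratings from users 1-4 (0 means the user did not rate the movie)
movie_x = np.array([5.0, 4.0, 0.0, 1.0])
movie_y = np.array([4.0, 5.0, 0.0, 1.0])  # liked by the same users as movie_x
movie_z = np.array([0.0, 1.0, 5.0, 4.0])  # liked by different users

print(round(cosine_similarity(movie_x, movie_y), 3))  # → 0.976
print(round(cosine_similarity(movie_x, movie_z), 3))  # → 0.19
```

Movies rated highly by the same users point in nearly the same direction and score close to 1, which is exactly what makes them good recommendations for each other.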

tags.csv
- For pure item-based collaborative filtering based solely on rating patterns, tags are not required.

links.csv
- Maps each movieId to its IMDb and TMDb identifiers, so it does not contribute to the calculation of movie similarities from ratings within the MovieLens dataset.

***Next Steps***

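First, we reshape the ratings into a movie-by-user matrix: each row is a movie, each column is a user, and each cell holds that user's rating of that movie (0 where the user has not rated it). With the data in this form, we can compute similarities between movies from their ratings across many users.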

```python
# Create the movie-user ratings matrix: one row per movieId, one column per userId;
# movies a user has not rated are filled with 0
ratings_pivot = ratings_df.pivot_table(index='movieId', columns='userId', values='rating').fillna(0)
```
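A minimal sketch of the same reshaping step on a hypothetical long-format ratings table (same columns as ratings.csv, invented values), showing how pivot_table turns one row per rating into one row per movie and one column per user:

```python
import pandas as pd

# Hypothetical long-format ratings: one row per (user, movie, rating)
ratings = pd.DataFrame({
    'userId':  [1, 1, 2, 3, 3],
    'movieId': [10, 20, 10, 20, 30],
    'rating':  [4.0, 3.0, 5.0, 2.5, 4.5],
})

# Rows = movies, columns = users; cells a user never rated become 0
matrix = ratings.pivot_table(index='movieId', columns='userId', values='rating').fillna(0)
print(matrix)
```

Because pivot_table aggregates with the mean by default, a user who somehow rated the same movie twice would contribute their average rating rather than raising an error.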

User-movie ratings are typically sparse, meaning most users have not rated most movies, so most cells in the matrix are 0. A sparse matrix format like CSR (Compressed Sparse Row) stores only the non-zero elements, which significantly reduces memory usage.

```python
# Convert the pivot table to a CSR sparse matrix (rows keep the same movieId order)
ratings_matrix = csr_matrix(ratings_pivot.values)
```
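To illustrate why CSR saves memory, this sketch converts a small, mostly-zero matrix (invented values) and inspects what SciPy actually stores: only the non-zero values plus their column indices and row pointers.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical 4 movies x 6 users rating matrix; most cells are 0 (unrated)
dense = np.array([
    [4.0, 0.0, 0.0, 0.0, 5.0, 0.0],
    [0.0, 0.0, 3.0, 0.0, 0.0, 0.0],
    [0.0, 2.5, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 4.5, 0.0, 1.0],
])

sparse = csr_matrix(dense)

# CSR keeps three arrays: the non-zero values, their column indices,
# and row pointers marking where each row's values start and end
print(sparse.nnz)      # number of stored (non-zero) ratings
print(sparse.data)     # the stored ratings, row by row
print(sparse.indices)  # column index of each stored value
print(sparse.indptr)   # row i's values are data[indptr[i]:indptr[i + 1]]

# Share of cells that are empty (the matrix's sparsity)
sparsity = 1 - sparse.nnz / dense.size
print(f"{sparsity:.0%} of cells are zero")
```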

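Next, we instantiate and fit the model. We use scikit-learn's NearestNeighbors with the cosine metric, which compares the direction of two movies' rating vectors rather than their magnitude; its distance is 1 minus the cosine similarity. The brute-force algorithm computes the distance from the query movie to every movie in the matrix, and n_neighbors=11 is set because the closest neighbor of any movie is the movie itself, which leaves ten recommendations.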

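```python
# Instantiate a k-NN model that uses cosine distance (1 - cosine similarity);
# brute force computes the distance to every movie, and n_jobs=-1 uses all CPU cores
model_knn = NearestNeighbors(metric='cosine', algorithm='brute', n_neighbors=11, n_jobs=-1)

# Fit the model on the sparse movie-user matrix
model_knn.fit(ratings_matrix)
```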
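As a sanity check on what the fitted model returns, this sketch fits the same kind of NearestNeighbors model on a tiny invented movie-user matrix and compares its distances with 1 minus the cosine similarity computed by hand. It also shows why n_neighbors is one more than the number of recommendations: the query movie comes back as its own nearest neighbor at distance 0.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.neighbors import NearestNeighbors

# Hypothetical movie x user ratings (rows = movies, 0 = unrated)
ratings = np.array([
    [5.0, 4.0, 0.0, 1.0],  # movie 0
    [4.0, 5.0, 0.0, 1.0],  # movie 1: rated like movie 0
    [0.0, 1.0, 5.0, 4.0],  # movie 2: rated by different users
])
X = csr_matrix(ratings)

model = NearestNeighbors(metric='cosine', algorithm='brute', n_neighbors=3)
model.fit(X)

# Query movie 0; kneighbors returns neighbors sorted by increasing distance
distances, indices = model.kneighbors(X[0])
print(indices)    # movie 0 itself first, then movie 1, then movie 2
print(distances)

# Cosine distance is 1 - cosine similarity
a, b = ratings[0], ratings[1]
manual = 1 - a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
print(np.isclose(distances[0, 1], manual))
```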

With the model ready, we set up a variable that holds the movie title to search for and display a DataFrame of every movie whose title contains that text.

```python
# Variable that holds the movie title to search for
movie_to_search = "Pirates of the Caribbean"

# Find every movie whose title contains the search text (case-insensitive);
# regex=False treats characters such as '(' in titles literally
movies_df[movies_df['title'].str.contains(movie_to_search, case=False, na=False, regex=False)]
```

|      | movieId | title | genres |
|-----:|--------:|:------|:-------|
| 4427 | 6539 | Pirates of the Caribbean: The Curse of the Bla... | Action\|Adventure\|Comedy\|Fantasy |
| 6221 | 45722 | Pirates of the Caribbean: Dead Man's Chest (2006) | Action\|Adventure\|Fantasy |
| 6488 | 53125 | Pirates of the Caribbean: At World's End (2007) | Action\|Adventure\|Comedy\|Fantasy |
| 7608 | 86880 | Pirates of the Caribbean: On Stranger Tides (2... | Action\|Adventure\|Fantasy\|IMAX |
| 8687 | 122896 | Pirates of the Caribbean: Dead Men Tell No Tal... | (no genres listed) |
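The search cell can be wrapped in a small helper so a user can type part of a title and get back candidate movieIds. This is a minimal sketch: `find_movies` is a hypothetical name, and the demo DataFrame reuses a few titles shown earlier in this notebook.

```python
import pandas as pd

def find_movies(movies: pd.DataFrame, text: str) -> pd.DataFrame:
    """Return movies whose title contains `text` (case-insensitive, literal match)."""
    mask = movies['title'].str.contains(text, case=False, na=False, regex=False)
    return movies.loc[mask, ['movieId', 'title']]

# Demo table built from rows shown earlier in this notebook
movies = pd.DataFrame({
    'movieId': [1, 45722, 53125],
    'title': ['Toy Story (1995)',
              "Pirates of the Caribbean: Dead Man's Chest (2006)",
              "Pirates of the Caribbean: At World's End (2007)"],
})

print(find_movies(movies, 'pirates'))
print(find_movies(movies, 'Toy Story (1995)'))  # parentheses are matched literally
```

With the default regex=True, the parentheses in 'Toy Story (1995)' would be read as a regex group and the exact title would not match, which is why the literal match is used here.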


Using the search results above, we choose the movieId of the title we want recommendations for and assign it to the 'movie_id' variable.
Next, we execute the following steps:
- Use the k-NN model to find the 11 nearest neighbors of the specified movieId based on user ratings (the closest neighbor is the input movie itself).
- Extract the movieId values of these neighbors, excluding the input movie itself, which leaves ten recommendations.
- Retrieve the titles and genres of the recommended movies from the movies_df DataFrame.
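The three steps can be sketched as one reusable function. This is a minimal version under a few assumptions: `recommend` is a hypothetical name, the matrix rows are in the same order as `movie_ids` (as ratings_pivot.index is for ratings_matrix), and the toy data is invented. It drops the input movie by comparing ids rather than by position, and returns neighbors closest first.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.neighbors import NearestNeighbors

def recommend(model, matrix, movie_ids, movie_id, n_recs=10):
    """Return up to n_recs (movieId, cosine distance) pairs, closest first."""
    row = movie_ids.index(movie_id)                # row position of the input movie
    k = min(n_recs + 1, matrix.shape[0])           # +1 because the movie matches itself
    distances, indices = model.kneighbors(matrix[row], n_neighbors=k)
    recs = [(movie_ids[i], float(d))
            for i, d in zip(indices[0], distances[0])
            if movie_ids[i] != movie_id]           # exclude the input movie
    return recs[:n_recs]

# Hypothetical movie x user matrix; row order matches movie_ids
movie_ids = [101, 102, 103, 104]
matrix = csr_matrix(np.array([
    [5.0, 4.0, 0.0, 1.0],
    [4.0, 5.0, 0.0, 1.0],
    [0.0, 1.0, 5.0, 4.0],
    [5.0, 5.0, 1.0, 0.0],
]))

model = NearestNeighbors(metric='cosine', algorithm='brute').fit(matrix)
print(recommend(model, matrix, movie_ids, 101, n_recs=2))
```

Capping k at the number of movies keeps kneighbors from raising an error when fewer than n_recs + 1 movies exist.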

```python
# Get the recommendations for the movieId chosen from the search results above
movie_id = 6539
# Query the 11 nearest neighbors of that movie's rating vector (its row in ratings_pivot)
distances, indices = model_knn.kneighbors(ratings_pivot.loc[movie_id, :].values.reshape(1, -1), n_neighbors=11)

# Map neighbor row positions back to movieIds, excluding the input movie itself
recommended_movie_ids = [ratings_pivot.index[i] for i in indices.flatten() if ratings_pivot.index[i] != movie_id][:10]
# Look up titles and genres; isin keeps the row order of movies_df,
# so this table is not sorted by similarity
recommended_movies = movies_df[movies_df['movieId'].isin(recommended_movie_ids)]
recommended_movies
```

|      | movieId | title | genres |
|-----:|--------:|:------|:-------|
| 3194 | 4306 | Shrek (2001) | Adventure\|Animation\|Children\|Comedy\|Fantasy\|Ro... |
| 3614 | 4963 | Ocean's Eleven (2001) | Crime\|Thriller |
| 3638 | 4993 | Lord of the Rings: The Fellowship of the Ring,... | Adventure\|Fantasy |
| 4137 | 5952 | Lord of the Rings: The Two Towers, The (2002) | Adventure\|Fantasy |
| 4360 | 6377 | Finding Nemo (2003) | Adventure\|Animation\|Children\|Comedy |
| 4615 | 6874 | Kill Bill: Vol. 1 (2003) | Action\|Crime\|Thriller |
| 4800 | 7153 | Lord of the Rings: The Return of the King, The... | Action\|Adventure\|Drama\|Fantasy |
| 5374 | 8961 | Incredibles, The (2004) | Action\|Adventure\|Animation\|Children\|Comedy |
| 5917 | 33794 | Batman Begins (2005) | Action\|Crime\|IMAX |
| 6221 | 45722 | Pirates of the Caribbean: Dead Man's Chest (2006) | Action\|Adventure\|Fantasy |
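Because the isin filter returns rows in the order of movies_df, the table above does not show which recommendation is closest. A minimal sketch of one way to keep the ranking: build a small DataFrame of neighbor ids and distances in kneighbors order, then merge the titles onto it, since a left merge keeps the left frame's row order. The ids, distances, and titles below are invented placeholders, not this notebook's actual results.

```python
import pandas as pd

# Hypothetical neighbor ids and cosine distances, already sorted closest-first
neighbor_ids = [30, 10, 20]
neighbor_dists = [0.12, 0.25, 0.31]

# Hypothetical lookup table stored in movieId order, like movies.csv
movies = pd.DataFrame({
    'movieId': [10, 20, 30],
    'title': ['Movie A', 'Movie B', 'Movie C'],
})

# isin keeps the lookup table's order, so the ranking is lost
print(movies[movies['movieId'].isin(neighbor_ids)]['title'].tolist())

# A left merge keeps the ranked order of the neighbors frame
ranked = pd.DataFrame({'movieId': neighbor_ids, 'distance': neighbor_dists})
ranked = ranked.merge(movies, on='movieId', how='left')
ranked['rank'] = range(1, len(ranked) + 1)
print(ranked[['rank', 'movieId', 'title', 'distance']])
```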


**Sources**

Qutbuddin, M. (2020, March 7). *Comprehensive Guide on Item Based Collaborative Filtering.* Towards Data Science. Retrieved May 18, 2024, from https://towardsdatascience.com/comprehensive-guide-on-item-based-recommendation-systems-d67e40e2b75d

S. (2023, August 11). *Item-based Collaborative Filtering: Build Your own Recommender System!* Analytics Vidhya. Retrieved May 18, 2024, from https://www.analyticsvidhya.com/blog/2021/05/item-based-collaborative-filtering-build-your-own-recommender-system/