#### Name: Swapnil Chavan
#### Week : 10
#### Assignment: Recommender System
#### Mentor: Prof. Neugebauer

##### Problem Statement

Using the small MovieLens data set, create a recommender system that allows users to input a movie they like (in the data set) and recommends ten other movies for them to watch. In your write-up, clearly explain the recommender system process and all steps performed. If you are using a method found online, be sure to reference the source.

[Link to the dataset is here](https://grouplens.org/datasets/movielens/)

##### Solution

First thing is to read the data. For this I need pandas package. I will import pandas.

In [2]:
import pandas as pd

Read ratings data and movies data

In [4]:
ratings = pd.read_csv('MovieLenseDataset/ratings.csv')

In [5]:
ratings.head()

Unnamed: 0,userId,movieId,rating,timestamp
0,1,1,4.0,964982703
1,1,3,4.0,964981247
2,1,6,4.0,964982224
3,1,47,5.0,964983815
4,1,50,5.0,964982931


In [6]:
movies = pd.read_csv('MovieLenseDataset/movies.csv')

In [7]:
movies.head()

Unnamed: 0,movieId,title,genres
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,2,Jumanji (1995),Adventure|Children|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama|Romance
4,5,Father of the Bride Part II (1995),Comedy


My intention next is to create a user-item matrix that has each row representing a user and each column representong a movie. The values in the matrix will be the ratings that users have given to movies.

In [20]:
user_movie_ratings = ratings.pivot(index='userId', columns='movieId', values='rating')

In [21]:
user_movie_ratings.head()

movieId,1,2,3,4,5,6,7,8,9,10,...,193565,193567,193571,193573,193579,193581,193583,193585,193587,193609
userId,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
1,4.0,,4.0,,,4.0,,,,,...,,,,,,,,,,
2,,,,,,,,,,,...,,,,,,,,,,
3,,,,,,,,,,,...,,,,,,,,,,
4,,,,,,,,,,,...,,,,,,,,,,
5,4.0,,,,,,,,,,...,,,,,,,,,,


There are a lot of NaN values. I will replace those with 0s

In [22]:
user_movie_ratings = user_movie_ratings.fillna(0)

In [23]:
user_movie_ratings.head()

movieId,1,2,3,4,5,6,7,8,9,10,...,193565,193567,193571,193573,193579,193581,193583,193585,193587,193609
userId,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
1,4.0,0.0,4.0,0.0,0.0,4.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,4.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


Sparse Matrix is an effecient way of matrix that involves lot of 0s. I will do that now. I need to include necessary package for that.

In [24]:
from scipy.sparse import csr_matrix

In [25]:
user_movie_ratings_matrix = csr_matrix(user_movie_ratings.values)

The next step I will do here is to find cosine similarity.Cosine similarity is a mathematical metric used to measure the similarity between two vectors in a multi-dimensional space, particularly in high-dimensional spaces, by calculating the cosine of the angle between them

I will import the required package for cosisne similarity.

In [26]:
from sklearn.metrics.pairwise import cosine_similarity

In [27]:
cosine_sim = cosine_similarity(user_movie_ratings_matrix.T)

I will write a reusable code here i.e. a user defined function. This function will accepy a movie name as an imput and recommend us the top 10 similar movies.

In [28]:
def recommend_movies(movie_title):

    movie_idx = movies[movies['title'] == movie_title].index[0]
    sim_scores = list(enumerate(cosine_sim[movie_idx]))
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
    sim_scores = sim_scores[1:11]
    movie_indices = [i[0] for i in sim_scores]
    return movies['title'].iloc[movie_indices]

I will try running this for a couple of movies

In [34]:
in_movie = input("Enter a Movie Name")
recommended_movies = recommend_movies(in_movie)
print(f"\n You entered the movie named, '{in_movie}'") 
print(f"\n And the recomended movies with the similarities are,\n\n{recommended_movies}")

Enter a Movie Name Balto (1995)



 You entered the movie named, 'Balto (1995)'

 And the recomended movies with the similarities are,

1137                              Devil's Own, The (1997)
1399                               Lethal Weapon 4 (1998)
1166                                   Schizopolis (1996)
2545               Captain Horatio Hornblower R.N. (1951)
3284    Way of the Dragon, The (a.k.a. Return of the D...
3494                                       Glitter (2001)
4344                                Chorus Line, A (1985)
578                               Oliver & Company (1988)
49                                  Big Green, The (1995)
222                  Kid in King Arthur's Court, A (1995)
Name: title, dtype: object
