# Movie Recommender System (Content Based Recommender)

This was done as part of the __Machine Learning with Python__ course of the __IBM Data Science Professional Certificate__ on Coursera.

You can find the dataset used for this Content Based Movie Recommender System at https://grouplens.org/datasets/movielens/latest/ (small dataset).

#### Importing required library

In [1]:
import pandas as pd

#### Loading the data

In [2]:
movies = pd.read_csv('movies.csv')
ratings = pd.read_csv('ratings.csv')

In [3]:
movies.head()

Unnamed: 0,movieId,title,genres
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,2,Jumanji (1995),Adventure|Children|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama|Romance
4,5,Father of the Bride Part II (1995),Comedy


In [4]:
ratings.head()

Unnamed: 0,userId,movieId,rating,timestamp
0,1,1,4.0,964982703
1,1,3,4.0,964981247
2,1,6,4.0,964982224
3,1,47,5.0,964983815
4,1,50,5.0,964982931


#### Removing the year from 'title' column of movies dataframe and storing it in a separate 'year' column

In [5]:
# Capture the four digits inside the parentheses, e.g. '(1995)' -> '1995'
movies['year'] = movies.title.str.extract(r'\((\d{4})\)', expand=False)
# Drop the parenthesised year from the title, then trim leftover whitespace
movies['title'] = movies.title.str.replace(r'\(\d{4}\)', '', regex=True)
movies['title'] = movies['title'].str.strip()
movies.head()

Unnamed: 0,movieId,title,genres,year
0,1,Toy Story,Adventure|Animation|Children|Comedy|Fantasy,1995
1,2,Jumanji,Adventure|Children|Fantasy,1995
2,3,Grumpier Old Men,Comedy|Romance,1995
3,4,Waiting to Exhale,Comedy|Drama|Romance,1995
4,5,Father of the Bride Part II,Comedy,1995
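
As a small illustration of the step above, the following sketch runs the same extract/replace idea on a toy dataframe. The `toy` dataframe and its titles are made up for the example (they are not rows of movies.csv); one title has no year, to show that `year` is left missing in that case.

```python
import pandas as pd

# Hypothetical titles in the "Title (YYYY)" shape; the last one has no year.
toy = pd.DataFrame({"title": ["Toy Story (1995)", "Heat (1995) ", "Untitled Film"]})

# Raw strings keep the backslashes intact; the capture group holds only the digits.
toy["year"] = toy["title"].str.extract(r"\((\d{4})\)", expand=False)

# regex=True is passed explicitly so the pattern is not treated as literal text.
toy["title"] = toy["title"].str.replace(r"\(\d{4}\)", "", regex=True).str.strip()

print(toy)
```

Titles without a parenthesised year keep their text unchanged and get a missing value in `year`, so downstream code that relies on `year` may need to handle that case.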


#### Splitting the values in 'genres' column into a list

In [6]:
movies['genres'] = movies.genres.str.split('|')
movies.head()

Unnamed: 0,movieId,title,genres,year
0,1,Toy Story,"[Adventure, Animation, Children, Comedy, Fantasy]",1995
1,2,Jumanji,"[Adventure, Children, Fantasy]",1995
2,3,Grumpier Old Men,"[Comedy, Romance]",1995
3,4,Waiting to Exhale,"[Comedy, Drama, Romance]",1995
4,5,Father of the Bride Part II,[Comedy],1995


#### Using One Hot Encoding technique to convert the list of genres to a vector where each column corresponds to one possible value of the feature

In [7]:
movies_genres = movies.copy()

for index, row in movies.iterrows():
    for genre in row['genres']:
        movies_genres.at[index, genre] = 1

movies_genres = movies_genres.fillna(0)

movies_genres.head()

Unnamed: 0,movieId,title,genres,year,Adventure,Animation,Children,Comedy,Fantasy,Romance,...,Horror,Mystery,Sci-Fi,War,Musical,Documentary,IMAX,Western,Film-Noir,(no genres listed)
0,1,Toy Story,"[Adventure, Animation, Children, Comedy, Fantasy]",1995,1.0,1.0,1.0,1.0,1.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,2,Jumanji,"[Adventure, Children, Fantasy]",1995,1.0,0.0,1.0,0.0,1.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,3,Grumpier Old Men,"[Comedy, Romance]",1995,0.0,0.0,0.0,1.0,0.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,4,Waiting to Exhale,"[Comedy, Drama, Romance]",1995,0.0,0.0,0.0,1.0,0.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,5,Father of the Bride Part II,[Comedy],1995,0.0,0.0,0.0,1.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
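
The row-by-row loop above can probably be replaced by a vectorised call. The sketch below is one possible alternative, not the notebook's method: it joins each genre list back into a '|'-separated string and uses `Series.str.get_dummies`. The `toy` dataframe is a made-up stand-in for `movies`.

```python
import pandas as pd

# Hypothetical stand-in for the movies dataframe after the genres were split into lists.
toy = pd.DataFrame({
    "movieId": [1, 2, 3],
    "genres": [["Adventure", "Comedy"], ["Comedy"], ["Drama", "Comedy"]],
})

# Join each list into a '|'-separated string, then build one indicator column per genre.
one_hot = toy["genres"].str.join("|").str.get_dummies(sep="|")

# Attach the indicator columns next to the original columns.
toy_genres = pd.concat([toy, one_hot], axis=1)
print(toy_genres)
```

Two differences from the loop are worth noting: the indicator columns come out in alphabetical order rather than in order of first appearance, and they hold integers rather than the floats produced by `fillna(0)`.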


### Creating a content based recommender for user of userId 1

User details can be obtained from the ratings dataframe.

In [8]:
ratings.head()

Unnamed: 0,userId,movieId,rating,timestamp
0,1,1,4.0,964982703
1,1,3,4.0,964981247
2,1,6,4.0,964982224
3,1,47,5.0,964983815
4,1,50,5.0,964982931


#### Removing the 'timestamp' column of ratings dataframe as it isn't necessary for processing and it saves memory

In [9]:
# Pass the column by keyword; the positional axis argument is rejected by pandas 2.x
ratings = ratings.drop(columns='timestamp')
ratings.head()

Unnamed: 0,userId,movieId,rating
0,1,1,4.0
1,1,3,4.0
2,1,6,4.0
3,1,47,5.0
4,1,50,5.0


#### Getting the movie ratings of userId 1 with movie IDs and ratings

In [10]:
user1 = ratings[ratings['userId'] == 1]
user1.head()

Unnamed: 0,userId,movieId,rating
0,1,1,4.0
1,1,3,4.0
2,1,6,4.0
3,1,47,5.0
4,1,50,5.0


#### From movie IDs of userId 1, obtaining the movies' details from movies_genres dataframe

In [11]:
user1Movies = movies_genres[movies_genres['movieId'].isin(user1['movieId'].tolist())]
user1Movies.head()

Unnamed: 0,movieId,title,genres,year,Adventure,Animation,Children,Comedy,Fantasy,Romance,...,Horror,Mystery,Sci-Fi,War,Musical,Documentary,IMAX,Western,Film-Noir,(no genres listed)
0,1,Toy Story,"[Adventure, Animation, Children, Comedy, Fantasy]",1995,1.0,1.0,1.0,1.0,1.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,3,Grumpier Old Men,"[Comedy, Romance]",1995,0.0,0.0,0.0,1.0,0.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,6,Heat,"[Action, Crime, Thriller]",1995,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
43,47,Seven (a.k.a. Se7en),"[Mystery, Thriller]",1995,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
46,50,"Usual Suspects, The","[Crime, Mystery, Thriller]",1995,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


#### Removing the 'movieId', 'title', 'genres' and 'year' columns to create a 'user1GenreTable' dataframe

In [12]:
user1Movies = user1Movies.reset_index(drop=True)
user1GenreTable = user1Movies.drop(columns=['movieId', 'title', 'genres', 'year'])
user1GenreTable.head()

Unnamed: 0,Adventure,Animation,Children,Comedy,Fantasy,Romance,Drama,Action,Crime,Thriller,Horror,Mystery,Sci-Fi,War,Musical,Documentary,IMAX,Western,Film-Noir,(no genres listed)
0,1.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


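#### Obtaining the user profile of userId 1 as the dot product of the transposed user1GenreTable and the ratings of userId 1, so each genre's weight is the sum of the ratings of the movies in that genre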

In [13]:
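# Rows are paired by position, so this assumes user1Movies and user1 list the same movies
# in the same order (both appear sorted by movieId in the samples above)
user1Profile = user1GenreTable.transpose().dot(user1['rating'].reset_index(drop=True))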
user1Profile

Adventure             373.0
Animation             136.0
Children              191.0
Comedy                355.0
Fantasy               202.0
Romance               112.0
Drama                 308.0
Action                389.0
Crime                 196.0
Thriller              228.0
Horror                 59.0
Mystery                75.0
Sci-Fi                169.0
War                    99.0
Musical               103.0
Documentary             0.0
IMAX                    0.0
Western                30.0
Film-Noir               5.0
(no genres listed)      0.0
dtype: float64
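
To make the profile computation concrete, here is a minimal worked example on made-up data (the `toy_genres` and `toy_ratings` frames are hypothetical). It also shows one way to avoid depending on row order: merging on `movieId` pairs each rating with its own genre row before taking the dot product.

```python
import pandas as pd

# Hypothetical one-hot genre rows for three rated movies.
toy_genres = pd.DataFrame({
    "movieId": [10, 20, 30],
    "Action": [1, 0, 1],
    "Comedy": [0, 1, 1],
})

# Hypothetical ratings, deliberately listed in a different order than the movies.
toy_ratings = pd.DataFrame({"movieId": [30, 10, 20], "rating": [5.0, 4.0, 2.0]})

# Merging on movieId pairs each rating with its genre row regardless of row order.
rated = toy_ratings.merge(toy_genres, on="movieId")

# Profile weight per genre = sum of the ratings of the movies tagged with that genre.
toy_profile = rated[["Action", "Comedy"]].T.dot(rated["rating"])
print(toy_profile)
```

Here Action gets 4.0 + 5.0 = 9.0 (movies 10 and 30) and Comedy gets 2.0 + 5.0 = 7.0 (movies 20 and 30).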

#### Removing the movies that userId 1 already watched from movies_genres dataframe

In [14]:
movies_filtered = movies_genres[~movies_genres['movieId'].isin(user1['movieId'].tolist())]
movies_filtered.head()

Unnamed: 0,movieId,title,genres,year,Adventure,Animation,Children,Comedy,Fantasy,Romance,...,Horror,Mystery,Sci-Fi,War,Musical,Documentary,IMAX,Western,Film-Noir,(no genres listed)
1,2,Jumanji,"[Adventure, Children, Fantasy]",1995,1.0,0.0,1.0,0.0,1.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,4,Waiting to Exhale,"[Comedy, Drama, Romance]",1995,0.0,0.0,0.0,1.0,0.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,5,Father of the Bride Part II,[Comedy],1995,0.0,0.0,0.0,1.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,7,Sabrina,"[Comedy, Romance]",1995,0.0,0.0,0.0,1.0,0.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,8,Tom and Huck,"[Adventure, Children]",1995,1.0,0.0,1.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


#### Creating a genreTable dataframe containing only the genres of each movie to be recommended by removing 'movieId', 'title', 'genres' and 'year' columns 

In [15]:
# set_index('movieId') moves the column into the index, so it no longer needs dropping
genreTable = movies_filtered.set_index('movieId')
genreTable = genreTable.drop(columns=['title', 'genres', 'year'])
genreTable.head()

movieId,Adventure,Animation,Children,Comedy,Fantasy,Romance,Drama,Action,Crime,Thriller,Horror,Mystery,Sci-Fi,War,Musical,Documentary,IMAX,Western,Film-Noir,(no genres listed)
2,1.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,1.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


#### Obtaining the weighted average of each movie (the profile weights of its genres, summed and divided by the total of all profile weights)

In [16]:
recommendationTable = ((genreTable*user1Profile).sum(axis=1))/(user1Profile.sum())
recommendationTable

movieId
2         0.252805
4         0.255776
5         0.117162
7         0.154125
8         0.186139
            ...   
193581    0.357096
193583    0.228713
193585    0.101650
193587    0.173267
193609    0.117162
Length: 9510, dtype: float64
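
The scoring step can be checked by hand on a tiny example. The profile and candidate movies below are made up for illustration; the expression itself is the same one used above.

```python
import pandas as pd

# Hypothetical user profile: one weight per genre.
toy_profile = pd.Series({"Action": 9.0, "Comedy": 7.0})

# Hypothetical unseen movies, one-hot encoded and indexed by movieId.
candidates = pd.DataFrame(
    {"Action": [1, 0, 1], "Comedy": [0, 1, 1]},
    index=pd.Index([40, 50, 60], name="movieId"),
)

# Multiply each genre column by its profile weight, sum per movie,
# then divide by the total weight so that scores fall between 0 and 1.
scores = (candidates * toy_profile).sum(axis=1) / toy_profile.sum()
print(scores)
```

Movie 40 scores 9/16 = 0.5625, movie 50 scores 7/16 = 0.4375, and movie 60, which covers every genre in the profile, scores 1.0. A side effect visible here is that movies tagged with many genres tend to score higher.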

#### Sorting the weighted averages in descending order

In [17]:
recommendationTable = recommendationTable.sort_values(ascending=False)
recommendationTable

movieId
81132     0.666007
117646    0.612211
71999     0.587789
4956      0.582508
4719      0.568977
            ...   
127172    0.000000
69953     0.000000
166024    0.000000
165969    0.000000
8622      0.000000
Length: 9510, dtype: float64

#### Displaying the details of the top 5 movies as recommendations for userId 1 from weighted averages (rows appear in movies dataframe order, not ranking order)

In [18]:
movies[movies['movieId'].isin(recommendationTable.head(5).keys())]

Unnamed: 0,movieId,title,genres,year
3460,4719,Osmosis Jones,"[Action, Animation, Comedy, Crime, Drama, Roma...",2001
3608,4956,"Stunt Man, The","[Action, Adventure, Comedy, Drama, Romance, Th...",1980
7170,71999,Aelita: The Queen of Mars (Aelita),"[Action, Adventure, Drama, Fantasy, Romance, S...",1924
7441,81132,Rubber,"[Action, Adventure, Comedy, Crime, Drama, Film...",2010
8597,117646,Dragonheart 2: A New Beginning,"[Action, Adventure, Comedy, Drama, Fantasy, Th...",2000
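
Filtering with `isin` returns rows in the order of the movies dataframe, so the ranking is lost in the display. If the ranking should be kept, one option (a sketch on made-up data, not the notebook's method) is to merge the sorted scores with the movie details, since an inner merge preserves the order of the left-hand keys.

```python
import pandas as pd

# Hypothetical movie details and scores keyed by movieId.
toy_movies = pd.DataFrame({"movieId": [40, 50, 60], "title": ["A", "B", "C"]})
scores = pd.Series(
    [0.5625, 0.4375, 1.0], index=pd.Index([40, 50, 60], name="movieId")
)

# Keep the two best-scoring movies, highest first.
top = scores.sort_values(ascending=False).head(2)

# Turn the scores into a (movieId, score) frame and merge the details onto it;
# the rows stay in score order because the scores are the left side of the merge.
ranked = top.rename("score").reset_index().merge(toy_movies, on="movieId")
print(ranked)
```

The result lists movie 60 first and movie 40 second, with the score shown next to each title.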


### Creating a Recommender Function for a User

The function recommend_movies() takes a userId as input and returns the 5 highest-scoring unseen movies as a recommendation for that user. The returned rows are in movies dataframe order, not ranking order.

In [19]:
def recommend_movies(userId):
    # Ratings given by this user and the one-hot genre rows of the rated movies
    user = ratings[ratings['userId'] == userId]
    userMovies = movies_genres[movies_genres['movieId'].isin(user['movieId'].tolist())]
    userMovies = userMovies.reset_index(drop=True)
    userGenreTable = userMovies.drop(columns=['movieId', 'title', 'genres', 'year'])
    # User profile: sum of ratings per genre
    userProfile = userGenreTable.transpose().dot(user['rating'].reset_index(drop=True))
    # Candidate movies: everything the user has not rated yet
    movies_filtered = movies_genres[~movies_genres['movieId'].isin(user['movieId'].tolist())]
    genreTable = movies_filtered.set_index('movieId').drop(columns=['title', 'genres', 'year'])
    # Weighted average per movie, highest first
    recommendationTable = ((genreTable*userProfile).sum(axis=1))/(userProfile.sum())
    recommendationTable = recommendationTable.sort_values(ascending=False)
    return movies[movies['movieId'].isin(recommendationTable.head(5).keys())]
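
The function above reads the global dataframes and always returns five rows. The sketch below shows a possible variant that takes its data as arguments, lets the caller choose how many movies to return, keeps the ranking order, and returns an empty result for an unknown user. The name `recommend_top_n`, its parameters, and the toy data are all hypothetical.

```python
import pandas as pd

def recommend_top_n(user_id, ratings_df, genres_df, genre_cols, n=5):
    """Hypothetical variant of recommend_movies() that takes its data as arguments."""
    user = ratings_df[ratings_df["userId"] == user_id]
    if user.empty:
        # Unknown user: there is no profile to score against.
        return pd.DataFrame(columns=["movieId", "title", "score"])
    # Pair each rating with its genre row by movieId rather than by position.
    rated = user.merge(genres_df, on="movieId")
    profile = rated[genre_cols].T.dot(rated["rating"])
    # Score only the movies the user has not rated.
    unseen = genres_df[~genres_df["movieId"].isin(user["movieId"])]
    scores = unseen[genre_cols].mul(profile, axis=1).sum(axis=1) / profile.sum()
    result = unseen[["movieId", "title"]].assign(score=scores)
    return result.sort_values("score", ascending=False).head(n).reset_index(drop=True)

# Made-up data: four movies with two genre columns, and ratings from two users.
toy_genres = pd.DataFrame({
    "movieId": [1, 2, 3, 4],
    "title": ["A", "B", "C", "D"],
    "Action": [1, 0, 1, 1],
    "Comedy": [0, 1, 1, 0],
})
toy_ratings = pd.DataFrame({
    "userId": [1, 1, 2],
    "movieId": [1, 2, 3],
    "rating": [5.0, 1.0, 4.0],
})

recs = recommend_top_n(1, toy_ratings, toy_genres, ["Action", "Comedy"], n=2)
print(recs)
```

For user 1 the profile is Action 5.0 and Comedy 1.0, so movie C (both genres) scores 1.0 and movie D (Action only) scores 5/6, and they are returned in that order.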

#### Movie recommendations for userId 1

In [20]:
recommend_movies(1)

Unnamed: 0,movieId,title,genres,year
3460,4719,Osmosis Jones,"[Action, Animation, Comedy, Crime, Drama, Roma...",2001
3608,4956,"Stunt Man, The","[Action, Adventure, Comedy, Drama, Romance, Th...",1980
7170,71999,Aelita: The Queen of Mars (Aelita),"[Action, Adventure, Drama, Fantasy, Romance, S...",1924
7441,81132,Rubber,"[Action, Adventure, Comedy, Crime, Drama, Film...",2010
8597,117646,Dragonheart 2: A New Beginning,"[Action, Adventure, Comedy, Drama, Fantasy, Th...",2000


#### Movie recommendations for userId 5

In [21]:
recommend_movies(5)

Unnamed: 0,movieId,title,genres,year
1390,1907,Mulan,"[Adventure, Animation, Children, Comedy, Drama...",1998
3460,4719,Osmosis Jones,"[Action, Animation, Comedy, Crime, Drama, Roma...",2001
3608,4956,"Stunt Man, The","[Action, Adventure, Comedy, Drama, Romance, Th...",1980
7441,81132,Rubber,"[Action, Adventure, Comedy, Crime, Drama, Film...",2010
8349,108540,Ernest & Célestine (Ernest et Célestine),"[Adventure, Animation, Children, Comedy, Drama...",2012


#### Movie recommendations for userId 100

In [22]:
recommend_movies(100)

Unnamed: 0,movieId,title,genres,year
1390,1907,Mulan,"[Adventure, Animation, Children, Comedy, Drama...",1998
3460,4719,Osmosis Jones,"[Action, Animation, Comedy, Crime, Drama, Roma...",2001
3608,4956,"Stunt Man, The","[Action, Adventure, Comedy, Drama, Romance, Th...",1980
5476,26236,"White Sun of the Desert, The (Beloe solntse pu...","[Action, Adventure, Comedy, Drama, Romance, War]",1970
6094,42015,Casanova,"[Action, Adventure, Comedy, Drama, Romance]",2005
