<a id="ref1"></a>
# Content-Based Recommender System


Recommendation systems are a collection of algorithms that recommend items to users based on information collected about them. They have become ubiquitous and are commonly found in online stores, movie databases, and job search sites. In this notebook, we implement a simple content-based recommender using Python and the pandas library: it learns a user's genre preferences from the movies they rated and recommends movies with similar genres.


<a id="ref1"></a>
# 1. Acquiring the Data


The dataset comes from GroupLens. The cell below downloads the archive and extracts its files into the working directory.

In [None]:
!wget -O moviedataset.zip https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-ML0101EN-SkillsNetwork/labs/Module%205/data/moviedataset.zip
print('Unzipping dataset...')
!unzip -o -j moviedataset.zip

<a id="ref2"></a>
# 2. Preprocessing


In [None]:
# Dataframe manipulation library
import pandas as pd
# Math and numerical helpers
from math import sqrt
import numpy as np
# Plotting, rendered inline in the notebook
import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
movies = pd.read_csv('movies.csv')
ratings = pd.read_csv('ratings.csv')
movies.head()

In [None]:
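# Extract the four-digit year from the parentheses, e.g. "Toy Story (1995)" -> "1995"
movies['year'] = movies.title.str.extract(r'\((\d{4})\)', expand=False)
# Remove the trailing " (YYYY)" from the title; titles without a year are left unchanged
movies['title'] = movies.title.str.replace(r'\s*\(\d{4}\)\s*$', '', regex=True).str.strip()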
movies.head()
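
To see what the two patterns do, here is a small sketch on a few hand-written titles. The sample titles are illustrative, written in the "Title (Year)" format used by the dataset; the last one deliberately has no year to show what happens in that case.

```python
import pandas as pd

# Illustrative titles; the last one has no year on purpose
titles = pd.Series(["Toy Story (1995)", "Breakfast Club, The (1985)", "Babylon 5"])

# Capture the four digits inside the parentheses; titles without a year become NaN
years = titles.str.extract(r"\((\d{4})\)", expand=False)

# Strip a trailing " (YYYY)"; titles without a year pass through unchanged
clean = titles.str.replace(r"\s*\(\d{4}\)\s*$", "", regex=True).str.strip()

print(years.tolist())
print(clean.tolist())
```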

In [None]:
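# Split the pipe-separated genre string into a list, e.g. "Adventure|Comedy" -> ["Adventure", "Comedy"]
movies['genres'] = movies.genres.str.split('|')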
movies.head()

A list of genres is hard to compare across movies, so we use one-hot encoding to turn each movie's genre list into a vector: one column per genre, holding 1 if the movie has that genre and 0 otherwise.
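
As a minimal sketch of the same encoding on a toy frame (the movie IDs and genres are made up), `explode` produces one row per (movie, genre) pair and `pd.crosstab` counts those pairs into a 0/1 indicator table:

```python
import pandas as pd

# Toy frame with genres already split into lists, mirroring the movies dataframe
toy = pd.DataFrame({
    "movieId": [1, 2, 3],
    "genres": [["Adventure", "Animation"], ["Adventure", "Fantasy"], ["Comedy"]],
})

# One row per (movie, genre) pair
pairs = toy.explode("genres")

# Count pairs per movie and genre; each pair occurs once, so the table is 0/1
onehot = pd.crosstab(pairs["movieId"], pairs["genres"])
print(onehot)
```

Applied to the original pipe-separated genre strings (before the split above), `Series.str.get_dummies(sep='|')` builds an equivalent table in one call; the loop in the next cell spells out the same idea explicitly.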


In [None]:
genmovies = movies.copy()

# For every row in the dataframe, iterate through its list of genres and put a 1 in the matching column
for index, row in movies.iterrows():
    for genre in row['genres']:
        genmovies.at[index, genre] = 1

# Fill the remaining NaN values with 0 to mark genres a movie doesn't have
# (missing years also become 0; the year column is dropped before any scoring)
genmovies.fillna(0, inplace=True)
genmovies.head()

Every row in the ratings dataframe has a user ID, a movie ID, a rating, and a timestamp of when the rating was made. We won't need the timestamp column, so we drop it to save memory.


In [None]:
ratings.drop('timestamp', axis=1, inplace=True)
ratings.head()

<a id="ref3"></a>
# 3. Preparing the Training Data

Since this technique infers a user's favourite genres from the movies they have rated, we begin by defining an input user with a few rated movies:

In [None]:
userInput = [
    {'title': 'Breakfast Club, The', 'rating': 5},
    {'title': 'Toy Story', 'rating': 3.5},
    {'title': 'Jumanji', 'rating': 2},
    {'title': 'Pulp Fiction', 'rating': 5},
    {'title': 'Akira', 'rating': 4.5},
]
inputMovies = pd.DataFrame(userInput)
inputMovies

In [None]:
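# Look up the input titles to get their movieIds; titles must match the dataset's spelling exactly
inputmovieswithId = movies[movies['title'].isin(inputMovies['title'])]
# Merge on the shared 'title' column to attach the user's ratings
inputMovies = pd.merge(inputmovieswithId, inputMovies)
inputMovies.drop(['genres', 'year'], axis = 1, inplace=True)
inputMovies.head()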
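
Because `pd.merge` defaults to an inner join on the shared `title` column, any input title that is spelled differently from the dataset is silently dropped. A small sketch with a hypothetical three-movie catalogue (the IDs are illustrative) makes this visible:

```python
import pandas as pd

# Hypothetical catalogue standing in for `movies`
catalogue = pd.DataFrame({
    "movieId": [1, 2, 3],
    "title": ["Toy Story", "Jumanji", "Breakfast Club, The"],
})

# "The Breakfast Club" does not match the catalogue's "Breakfast Club, The"
user = pd.DataFrame([
    {"title": "Toy Story", "rating": 3.5},
    {"title": "The Breakfast Club", "rating": 5},
])

# Inner join on "title": only exact matches survive
matched = pd.merge(catalogue[catalogue["title"].isin(user["title"])], user)
print(matched)
```

Comparing `len(matched)` with `len(user)` is a quick way to notice titles that failed to match.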

In [None]:
usermovies = genmovies[genmovies['movieId'].isin(inputMovies['movieId'].tolist())]
usermovies.head()

In [None]:
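# Reset the index to 0..n-1; the rows are in the same movie order as inputMovies,
# so the dot product below lines each genre row up with the matching rating
usermovies = usermovies.reset_index(drop = True)
# Keep only the one-hot genre columns
usergenremtx = usermovies.drop(['movieId', 'title', 'genres', 'year'], axis = 1)
usergenremtx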

In [None]:
usergenremtx.shape

<a id="ref4"></a>
# 4. Training the Recommender Model (Creating a User Profile)

In [None]:
inputMovies.rating

In [None]:
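# Dot product of the transposed genre matrix with the ratings:
# each genre's weight is the sum of the ratings of the input movies that have that genre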
userprofile = usergenremtx.transpose().dot(inputMovies['rating'])
userprofile
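
A tiny worked example, using a made-up genre matrix and made-up ratings rather than the real data, shows what this dot product computes:

```python
import pandas as pd

# Hypothetical 3-movie x 3-genre indicator matrix (rows = movies the user rated)
genre_matrix = pd.DataFrame(
    [[1, 1, 0],   # movie A: Adventure, Animation
     [1, 0, 0],   # movie B: Adventure
     [0, 0, 1]],  # movie C: Drama
    columns=["Adventure", "Animation", "Drama"],
)
ratings = pd.Series([4.0, 2.0, 5.0])  # same row order as genre_matrix

# Each genre's weight is the sum of the ratings of the movies tagged with it
profile = genre_matrix.T.dot(ratings)
print(profile)
```

Here Adventure gets 4 + 2 = 6, Animation gets 4, and Drama gets 5, so genres that appear in several highly rated movies dominate the profile.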

<a id="ref5"></a>
# 5. Final Recommendations

In [None]:
# Now let's get the genre indicators of every movie in the original dataframe, indexed by movieId
genretable = genmovies.set_index('movieId')
genretable.drop(['title', 'year', 'genres'], axis = 1, inplace=True)
genretable.head()

In [None]:
genretable.shape

For every movie, we compute a weighted average of its genre indicators, using the user profile as the weights, and then recommend the twenty highest-scoring movies.
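
The scoring can be sketched on toy data. The profile weights and the three candidate movies below are hypothetical; the formula is the same one used in the next cell:

```python
import pandas as pd

# Hypothetical user profile (genre weights)
profile = pd.Series({"Adventure": 6.0, "Animation": 4.0, "Drama": 5.0})

# Hypothetical candidate movies as 0/1 genre indicators, indexed by movieId
candidates = pd.DataFrame(
    {"Adventure": [1, 0, 1], "Animation": [1, 0, 0], "Drama": [0, 1, 1]},
    index=pd.Index([10, 20, 30], name="movieId"),
)

# Multiplying aligns the profile with the genre columns; summing each row adds up
# the weights of the genres a movie has, and dividing by the total weight normalizes
scores = (candidates * profile).sum(axis=1) / profile.sum()
ranked = scores.sort_values(ascending=False)
print(ranked)
```

Since every score is divided by the same constant, `profile.sum()`, the division does not change the ranking; it only rescales the scores into the range 0 to 1.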


In [None]:
recs = ((genretable*userprofile).sum(axis = 1)) / userprofile.sum()
recs = recs.sort_values(ascending=False)
recs.head(20)

In [None]:
# Look up the top twenty movies, keeping them in ranked order
recmovies = movies.set_index('movieId').loc[recs.head(20).index].reset_index()
recmovies