# Content-Based Recommender System


## Aim of the Project

A newly launched online movie platform wants to recommend movies to its users. Because very few users log in, the platform cannot collect viewing habits per user, which rules out collaborative filtering. It can, however, tell which movies a visitor is watching from browser traces (*cookie id*). This project makes recommendations from that information: given a movie a visitor is watching, it suggests movies whose overview text is similar.

In [1]:
# libraries
import warnings
warnings.filterwarnings("ignore")

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

pd.set_option("display.max_columns",None)
pd.set_option("display.width",500)
sns.set(rc={"figure.figsize":(12,12)})

In [2]:
data = pd.read_csv("datas/movies_metadata.csv")
data.head()

Unnamed: 0,adult,belongs_to_collection,budget,genres,homepage,id,imdb_id,original_language,original_title,overview,popularity,poster_path,production_companies,production_countries,release_date,revenue,runtime,spoken_languages,status,tagline,title,video,vote_average,vote_count
0,False,"{'id': 10194, 'name': 'Toy Story Collection', ...",30000000,"[{'id': 16, 'name': 'Animation'}, {'id': 35, '...",http://toystory.disney.com/toy-story,862,tt0114709,en,Toy Story,"Led by Woody, Andy's toys live happily in his ...",21.946943,/rhIRbceoE9lR4veEXuwCC2wARtG.jpg,"[{'name': 'Pixar Animation Studios', 'id': 3}]","[{'iso_3166_1': 'US', 'name': 'United States o...",1995-10-30,373554033.0,81.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,,Toy Story,False,7.7,5415.0
1,False,,65000000,"[{'id': 12, 'name': 'Adventure'}, {'id': 14, '...",,8844,tt0113497,en,Jumanji,When siblings Judy and Peter discover an encha...,17.015539,/vzmL6fP7aPKNKPRTFnZmiUfciyV.jpg,"[{'name': 'TriStar Pictures', 'id': 559}, {'na...","[{'iso_3166_1': 'US', 'name': 'United States o...",1995-12-15,262797249.0,104.0,"[{'iso_639_1': 'en', 'name': 'English'}, {'iso...",Released,Roll the dice and unleash the excitement!,Jumanji,False,6.9,2413.0
2,False,"{'id': 119050, 'name': 'Grumpy Old Men Collect...",0,"[{'id': 10749, 'name': 'Romance'}, {'id': 35, ...",,15602,tt0113228,en,Grumpier Old Men,A family wedding reignites the ancient feud be...,11.7129,/6ksm1sjKMFLbO7UY2i6G1ju9SML.jpg,"[{'name': 'Warner Bros.', 'id': 6194}, {'name'...","[{'iso_3166_1': 'US', 'name': 'United States o...",1995-12-22,0.0,101.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,Still Yelling. Still Fighting. Still Ready for...,Grumpier Old Men,False,6.5,92.0
3,False,,16000000,"[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'nam...",,31357,tt0114885,en,Waiting to Exhale,"Cheated on, mistreated and stepped on, the wom...",3.859495,/16XOMpEaLWkrcPqSQqhTmeJuqQl.jpg,[{'name': 'Twentieth Century Fox Film Corporat...,"[{'iso_3166_1': 'US', 'name': 'United States o...",1995-12-22,81452156.0,127.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,Friends are the people who let you be yourself...,Waiting to Exhale,False,6.1,34.0
4,False,"{'id': 96871, 'name': 'Father of the Bride Col...",0,"[{'id': 35, 'name': 'Comedy'}]",,11862,tt0113041,en,Father of the Bride Part II,Just when George Banks has recovered from his ...,8.387519,/e64sOI48hQXyru7naBFyssKFxVd.jpg,"[{'name': 'Sandollar Productions', 'id': 5842}...","[{'iso_3166_1': 'US', 'name': 'United States o...",1995-02-10,76578911.0,106.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,Just When His World Is Back To Normal... He's ...,Father of the Bride Part II,False,5.7,173.0


In [3]:
data.shape

(45466, 24)

In [4]:
data["overview"].head(10)

0    Led by Woody, Andy's toys live happily in his ...
1    When siblings Judy and Peter discover an encha...
2    A family wedding reignites the ancient feud be...
3    Cheated on, mistreated and stepped on, the wom...
4    Just when George Banks has recovered from his ...
5    Obsessive master thief, Neil McCauley leads a ...
6    An ugly duckling having undergone a remarkable...
7    A mischievous young boy, Tom Sawyer, witnesses...
8    International action superstar Jean Claude Van...
9    James Bond must unmask the mysterious head of ...
Name: overview, dtype: object

In [5]:
data["overview"].isnull().sum()

954

# Creating the TF-IDF Matrix

## Cosine Similarity Calculator

In [7]:
def calculate_cos_sim(dataframe):
    
    '''
        Build a TF-IDF matrix from the "overview" column and return the
        pairwise cosine similarity between all movies.

        :param dataframe: dataframe with an "overview" column
        :type dataframe: pandas dataframe

        :returns: square array of cosine similarities, one row and one column per movie
    '''
    
    tfidf = TfidfVectorizer(stop_words="english")  # drop common English stop words
    # missing overviews become empty strings (this modifies the dataframe passed in)
    dataframe["overview"] = dataframe["overview"].fillna("")

    tfidf_matrix = tfidf.fit_transform(dataframe["overview"])
    cos_sim = cosine_similarity(tfidf_matrix, tfidf_matrix)
    
    return cos_sim
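
To make the output of `calculate_cos_sim` easier to picture, here is a minimal sketch on a made-up corpus. The four overview strings are assumptions for illustration only and are not taken from `movies_metadata.csv`; the last one is empty, standing in for a missing overview after `fillna("")`.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical mini-corpus standing in for the "overview" column.
overviews = [
    "A prisoner plans an escape from a brutal prison",
    "Two inmates form a friendship inside a prison",
    "Toys come to life when their owner leaves the room",
    "",  # a missing overview after fillna("")
]

tfidf = TfidfVectorizer(stop_words="english")
tfidf_matrix = tfidf.fit_transform(overviews)  # sparse matrix, one row per overview

cos_sim = cosine_similarity(tfidf_matrix)      # square matrix of pairwise scores

print(cos_sim.shape)
print(cos_sim.round(2))
```

The two prison overviews share a term, so their score is positive, while the toy overview shares no terms with the first one and scores zero against it. The empty overview has an all-zero TF-IDF row, so it scores zero against every movie, including itself.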

## Recommendation Based on Similarities

In [8]:
def content_based_recommender(title, cos_sim, dataframe):
    
    '''
        This function returns the 10 movies most similar to the given title,
        based on the precomputed cosine similarity

        :param title: the title of the movie
        :param cos_sim: array of precomputed cosine similarities
        :param dataframe: dataframe
        :type title: str
        :type cos_sim: numpy.ndarray
        :type dataframe: pandas dataframe

        :returns: pandas Series with the titles of the recommended movies
    '''
    
    # map each title to its row position
    indices = pd.Series(dataframe.index, index = dataframe["title"])
    indices = indices[~indices.index.duplicated(keep="last")]  # for duplicated titles, keep only the last row
    movie_index = indices[title]
    
    
    similarity_scores = pd.DataFrame(cos_sim[movie_index], columns=["score"])
    # position 0 after sorting is assumed to be the movie itself, so take positions 1 to 10
    movie_indices = similarity_scores.sort_values("score",ascending=False)[1:11].index
    
    return dataframe["title"].iloc[movie_indices]
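
The lookup and ranking steps inside `content_based_recommender` can be traced by hand on a tiny example. The sketch below uses invented titles and an invented similarity matrix (`toy` and `toy_sim` are hypothetical names, not part of the notebook) and keeps two recommendations instead of ten.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: the title "B" appears twice, as remakes do in a real catalogue.
toy = pd.DataFrame({"title": ["A", "B", "C", "B"]})

# Hypothetical symmetric similarity matrix for the four rows above.
toy_sim = np.array([
    [1.0, 0.1, 0.7, 0.4],
    [0.1, 1.0, 0.2, 0.0],
    [0.7, 0.2, 1.0, 0.3],
    [0.4, 0.0, 0.3, 1.0],
])

indices = pd.Series(toy.index, index=toy["title"])
indices = indices[~indices.index.duplicated(keep="last")]  # "B" now maps to row 3 only

movie_index = indices["B"]
scores = pd.DataFrame(toy_sim[movie_index], columns=["score"])
top = scores.sort_values("score", ascending=False)[1:3].index  # skip the movie itself

recommended = toy["title"].iloc[top].tolist()
print(movie_index, recommended)
```

With `keep="last"`, a duplicated title always resolves to its last occurrence, so the earlier "B" in row 1 can never be queried by title.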

In [13]:
cos_sim = calculate_cos_sim(data)
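
For the 45466 movies loaded above, the full similarity matrix has 45466 × 45466 entries, which is roughly 16.5 GB if stored as 64-bit floats. Where that does not fit in memory, one possible alternative is to keep only the sparse TF-IDF matrix and score a single movie on demand. The following is a sketch under that assumption, not the approach used in this notebook; `recommend_from_row` and the toy dataframe are hypothetical names introduced for illustration.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def recommend_from_row(title, dataframe, tfidf_matrix, top_n=10):
    """Hypothetical helper: score one movie against all others on demand."""
    indices = pd.Series(dataframe.index, index=dataframe["title"])
    indices = indices[~indices.index.duplicated(keep="last")]
    movie_index = int(indices[title])

    # A single 1 x n_movies row instead of the full n_movies x n_movies matrix.
    query_row = tfidf_matrix[movie_index:movie_index + 1]
    row_scores = cosine_similarity(query_row, tfidf_matrix).ravel()

    # Drop the queried movie by position, then keep the highest scores.
    best = pd.Series(row_scores).drop(movie_index).nlargest(top_n).index
    return dataframe["title"].iloc[best]


# Hypothetical toy data, not taken from movies_metadata.csv.
toy = pd.DataFrame({
    "title": ["Jail Break", "Cell Mates", "Toy Room"],
    "overview": [
        "A prisoner plans an escape from a brutal prison",
        "Two inmates form a friendship inside a prison",
        "Toys come to life when their owner leaves the room",
    ],
})
toy_matrix = TfidfVectorizer(stop_words="english").fit_transform(toy["overview"].fillna(""))

result = recommend_from_row("Jail Break", toy, toy_matrix, top_n=1)
print(result.tolist())
```

This variant removes the queried movie explicitly by its row position rather than assuming it sorts first, which also covers the case where another movie has an identical overview.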

In [14]:
content_based_recommender("The Green Mile",cos_sim,data)

6067                       Mr. North
45335              American Violence
19793                        Capital
31473                    The Big Ask
41677         Murder in a Blue World
34177                Cruel & Unusual
43512    The Diary of Ellen Rimbauer
31625       Lake Placid vs. Anaconda
14782                  Mad Detective
16430                       Rasputin
Name: title, dtype: object

In [15]:
content_based_recommender("Pulp Fiction",cos_sim,data)

34007                            From Mexico With Love
45423    The Fortunes and Misfortunes of Moll Flanders
14803           The First Day of the Rest of Your Life
1190                                         The Sting
640                                      Moll Flanders
32614                                Kill Your Friends
30706                                 Baby Face Nelson
19015                           Ladies They Talk About
3563                                    Prizzi's Honor
40700                                  Watch Your Left
Name: title, dtype: object

In [16]:
content_based_recommender("The Shawshank Redemption",cos_sim,data)

16947                   They Made Me a Fugitive
6548                                Civil Brand
39141                         Seven Times Seven
11327                               Brute Force
36701                            Women's Prison
17446                           Girls in Prison
9391                                    In Hell
34185                   Women's Prison Massacre
9225     Female Prisoner Scorpion: Jailhouse 41
41606                           Alcatraz Island
Name: title, dtype: object