# TMDB Open Source API to apply NLP Content-Based Recommendation System for Streaming App Data | Part 2

# Why Part 2?
The dataset from the previous post (Recommendation System Using (Rank, User-User, Matrix Factorization) on Streaming App) only takes us so far. Some of the things it lacks:
* the release year of each movie/TV show, to understand trends over the years
* full content details, such as overviews, which would help us apply content-based recommendations
* more genre tags, because a piece of content can belong to more than one genre
* metrics for each piece of content, such as ratings, votes, and reviews, which help identify the content with more value. STC did not include its own ratings, and the TMDB ratings are collected globally, so we cannot draw conclusions about local popularity from them. We can still add global rankings, for example, "Most Voted Movies Globally"

# Business Understanding
In the previous article, I had planned to add a content-based recommendation system, but that was not possible because the STC data lacked the data points needed for it. In this article, our main goal is to make that happen by enriching our data with another source, the TMDB open-source API. This will help us fetch the required data points such as release date, overview, and genre.

# Objective
Our goal here is simple: get as much information as possible to enrich our dataset. We have a list of movies and TV shows from STC JAWWY, and we want to use the TMDB API to get more information about them.
Steps:
1. Identify movie and TV show names from the STC dataset
2. Search for these movies and TV shows in the TMDB API
3. Merge the new variables into our main df
4. Build a content-based recommendation system based on cosine similarity

# Loading Libraries and API Package

You need to create a developer account (free) to use the API. Here are good sources to get started and register:
- https://github.com/AnthonyBloomer/tmdbv3api
- you will find your personal API key here: https://www.themoviedb.org/settings/api
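Since the API key is personal, one option is to keep it out of the notebook and read it from an environment variable. The sketch below is only an illustration: the variable name `TMDB_API_KEY` and the helper `load_tmdb_api_key` are assumptions made for this example, not part of tmdbv3api.

```python
import os


def load_tmdb_api_key(env=os.environ, var_name="TMDB_API_KEY"):
    """Return the API key stored in an environment variable.

    `TMDB_API_KEY` is a hypothetical variable name chosen for this sketch.
    """
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError("Set the {} environment variable first".format(var_name))
    return key


# Demo with a plain dict standing in for the real environment
print(load_tmdb_api_key({"TMDB_API_KEY": "demo-key"}))
```

In the notebook, the result could then be assigned with `tmdb.api_key = load_tmdb_api_key()`.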



In [1]:
from tmdbv3api import TMDb
from tmdbv3api import Movie
from tmdbv3api import TV
from tmdbv3api import Genre, Search

tmdb = TMDb()
tmdb.api_key = 'YOUR_API_KEY'  # placeholder: paste your personal TMDB API key here
tmdb.language = 'en'
tmdb.debug = True

movie = Movie()
tv = TV()
genre = Genre()

In [2]:
import pandas as pd
pd.options.mode.chained_assignment = None
import numpy as np
import json as json
import pickle

# Loading STC JAWWY Data

In [4]:
stc = pd.read_csv('Final_Dataset.txt', delimiter = ",")
stc = stc.drop('Unnamed: 0', axis=1)

In [5]:
df = stc.copy() # the data is heavy to reread, so this is our checkpoint

# creating unique id for videos
df['vid_id'] = df.groupby(['original_name']).ngroup()
df.head()

Unnamed: 0,date_,user_id_maped,program_name,duration_seconds,program_class,season,episode,program_desc,program_genre,series_title,hd,original_name,vid_id
0,2017-05-27,26138,100 treets,40,MOVIE,0,0,Drama Movie100 Streets,Drama,0,0,100 treets,2
1,2017-05-21,7946,Moana,17,MOVIE,0,0,Animation MovieMoana (HD),Animation,0,1,Moana,924
2,2017-08-10,7418,The Mermaid Princess,8,MOVIE,0,0,Animation MovieThe Mermaid Princess (HD),Animation,0,1,The Mermaid Princess,1524
3,2017-07-26,19307,The Mermaid Princess,76,MOVIE,0,0,Animation MovieThe Mermaid Princess (HD),Animation,0,1,The Mermaid Princess,1524
4,2017-07-07,15860,Churchill,87,MOVIE,0,0,Biography MovieChurchill (HD),Biography,0,1,Churchill,317


In [7]:
df['original_name'] = [" ".join(x.split()) for x in df['original_name']]

# get the names of TVs and movies
movie_orig = list(set(df[df['program_class'] == 'MOVIE']['original_name']))
tv_orig = list(set(df[df['program_class'] != 'MOVIE']['original_name']))

print("Number of TV Shows {}".format(len(tv_orig)))
print("Number of Movies {}".format(len(movie_orig)))

Number of TV Shows 276
Number of Movies 1525


# Problem: 
The STC JAWWY dataset does not have enough information about the content (movies and TV shows), such as release year, overview, genres, ratings, and votes. These variables are important for building a content-based recommendation system.

# Approach
The goal is to give the engine the name of a movie the user watched and get back a list of similar movies based on the content of that movie. For example, if I watched Avengers, I should get movies about superheroes and stuff!


## Find STC JAWWY Movies and TV Shows in TMDB API

Since we do not have a unique id for each movie and TV show in our database, we will use the movie and TV show names to search for them in the TMDB API, then fetch their ids with the function below.

In [8]:
def get_tmdb_id(names, types):
    """ function to get IDs of external movies and TV shows list from tmdb
    input
    names: list of item names (TV shows, Movies)
    types: tmdbv3api object used for the search (movie or tv)
    
    output: 
    item_api_id: a list of all found IDs for target item names
    item_name_not_found: a list of all names not found in tmdb
    item_dic: full details info about target item names from API
    
    """
    item_api_id = []
    item_name_not_found = []
    item_dic = []
    for name in names:
        search = types.search(name) # search api for names
        
        if len(search) != 0: # at least one result: keep the first hit
            search = search[0]
            item_dic.append(search) # add found items in list
            item_api_id.append(search.id)
        else: # no results means we do not have the name in tmdb
            item_name_not_found.append(name)
    # return the found IDs, the names not found, and the full records
    return item_api_id, item_name_not_found, item_dic

In [9]:
movie_api_id, movie_name_not_found, movie_dic = get_tmdb_id(movie_orig, movie)
tv_api_id, tv_name_not_found, tv_dic = get_tmdb_id(tv_orig, tv)

In [11]:
print(movie_dic[0])

{'adult': False, 'backdrop_path': '/18vHzORk7OZNUe8WALrUlKPrhF0.jpg', 'genre_ids': [10749, 35], 'id': 12620, 'original_language': 'en', 'original_title': 'The House Bunny', 'overview': 'Shelley is living a carefree life until a rival gets her tossed out of the Playboy Mansion. With nowhere to go, fate delivers her to the sorority girls from Zeta Alpha Zeta. Unless they can sign a new pledge class, the seven socially clueless women will lose their house to the scheming girls of Phi Iota Mu. In order to accomplish their goal, they need Shelley to teach them the ways of makeup and men; at the same time, Shelley needs some of what the Zetas have - a sense of individuality. The combination leads all the girls to learn how to stop pretending and start being themselves.', 'popularity': 24.452, 'poster_path': '/4oGGJ824vqIqDtyMvMuK44pDEmx.jpg', 'release_date': '2008-08-22', 'title': 'The House Bunny', 'video': False, 'vote_average': 5.7, 'vote_count': 1604}


Did you spot something in the dictionary? The genres are given as ids instead of actual genre names, so we need to fetch the names from the API.

In [12]:
# get all the genres ids list from tmdb api
genres_tv = genre.tv_list()
genres_movie = genre.movie_list()
genres_tv.extend(genres_movie)
all_genres = genres_tv

print("TV Shows in STC JAWWY     {}".format(len(tv_orig)))
print("TV Shows ID Found in TMDB {}".format(len(tv_api_id)))
print("TV Shows ID Not Found     {}".format(len(tv_name_not_found)))
print()
print("Movies in STC JAWWY       {}".format(len(movie_orig)))
print("Movies ID Found in TMDB   {}".format(len(movie_api_id)))
print("Movies ID Not Found       {}".format(len(movie_name_not_found)))

TV Shows in STC JAWWY     276
TV Shows ID Found in TMDB 134
TV Shows ID Not Found     142

Movies in STC JAWWY       1525
Movies ID Found in TMDB   1332
Movies ID Not Found       193


Our result came out well! We found about 49% of the TV shows (134 of 276) and about 87% of the movies (1332 of 1525) in the TMDB API.

Something to note is that JAWWY has Arabic movies and TV shows, and we can assume some of them are not in TMDB. Typos can be another issue: we cannot search for a typo in TMDB, so the name has to be the correct original name of the content.
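As a quick check on the counts printed above, the match rates can be computed directly. This is a small sketch; the helper name `match_rate` is made up for illustration.

```python
def match_rate(found, total):
    """Share of titles found in TMDB, as a percentage rounded to one decimal."""
    return round(100 * found / total, 1)


# Counts taken from the printed output above
tv_rate = match_rate(134, 276)
movie_rate = match_rate(1332, 1525)

print("TV shows matched: {}%".format(tv_rate))    # 48.6%
print("Movies matched:   {}%".format(movie_rate))  # 87.3%
```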

### Checkpoint, just like in games! So we do not need to repeat all of the above

## Save dictionary of movies and tv shows to CSV

In [13]:
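def export_list_dic(dic, filename):
    """ write a list of dictionaries to <filename>.csv, using the first item's keys as columns """
    import csv
    csv_columns = list(dic[0].keys())
    dict_data = dic
    csv_file = "{}.csv".format(filename)
    try:
        # newline='' keeps the csv module from writing blank rows on Windows
        with open(csv_file, 'w', newline='', encoding='utf-8') as csvfile:
            writer = csv.DictWriter(csvfile, fieldnames=csv_columns)
            writer.writeheader()
            for data in dict_data:
                writer.writerow(data)
    except IOError:
        print("I/O error")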

In [14]:
export_list_dic(tv_dic, 'tv_df')
export_list_dic(movie_dic, 'movie_df')
export_list_dic(all_genres, 'genre_df')

In [15]:
# load dfs
movie_df  = pd.read_csv('movie_df.csv')
tv_df     = pd.read_csv('tv_df.csv')
genre_df  = pd.read_csv('genre_df.csv')
genre_df  = genre_df.drop_duplicates()

# add a type in the df
movie_df['program_type'] = 'Movie'
tv_df['program_type'] = 'TV Show'

Notice that the genre_ids column holds a list of genre ids that refer to the genres API. After the CSV round trip these lists are read back as strings, so we first convert them to real lists, then convert the ids into the actual genre names for movies and TV shows.

In [16]:
def convert_string_list(df, column):
    """ convert dataframe column that is quoted "list" into [list] class (in place)
    input:
    df: dataframe
    column: str, name of the column that needs to be converted
    
    output: 
    None; the target column of df is converted from string-list to list
    """
    print('Before: {}, type: {}'.format(df[column][1], type(df[column][1])))
    
    # parse each JSON-style string, e.g. "[35, 18]" -> [35, 18]
    df[column] = df[column].apply(json.loads)
    
    print('After: {}, type: {}'.format(df[column][1], type(df[column][1])))

In [17]:
convert_string_list(movie_df, 'genre_ids')
convert_string_list(tv_df, 'genre_ids')

Before: [35], type: <class 'str'>
After: [35], type: <class 'list'>
Before: [35, 18], type: <class 'str'>
After: [35, 18], type: <class 'list'>


In [18]:
def id_to_genre(df):
    """ convert the ids in the genre_ids column into actual genre names (in place)
    
    input: dataframe with genre_ids column as [35,53,543]
    
    output: None; genre_ids becomes e.g. [action, comedy, horror]
    """
    print('Before: {}'.format(df['genre_ids'][1]))

    # look up each list of ids in genre_df (the dataframe loaded above)
    df['genre_ids'] = df['genre_ids'].apply(
        lambda ids: list(genre_df[genre_df['id'].isin(ids)]['name']))
          
    print('After: {}'.format(df['genre_ids'][1]))

In [19]:
id_to_genre(tv_df)
id_to_genre(movie_df)

Before: [35, 18]
After: ['Comedy', 'Drama']
Before: [35]
After: ['Comedy']
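The id-to-name conversion can also be pictured as a plain dictionary lookup. The sketch below is a simplified stand-in for `id_to_genre`, not the notebook's method: `ids_to_names` and `genre_lookup` are hypothetical names, and the three id/name pairs are the ones visible in the outputs above. Unlike the `isin` filter, which returns names in the order they appear in `genre_df`, a lookup like this keeps the order of the input ids.

```python
def ids_to_names(genre_ids, lookup):
    """Map genre ids to names, keeping the order of `genre_ids` and skipping unknown ids."""
    return [lookup[g] for g in genre_ids if g in lookup]


# Small lookup built from ids seen in the outputs above
genre_lookup = {35: "Comedy", 10749: "Romance", 18: "Drama"}

print(ids_to_names([10749, 35], genre_lookup))  # ['Romance', 'Comedy']
print(ids_to_names([35, 18], genre_lookup))     # ['Comedy', 'Drama']
```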


Now we have two dataframes with enough data points for content recommendation.

We will concatenate movie_df and tv_df into one final df, which we can then join to our original dataframe (stc).

In [20]:
tv_df.head(2)

Unnamed: 0,backdrop_path,first_air_date,genre_ids,id,name,origin_country,original_language,original_name,overview,popularity,poster_path,vote_average,vote_count,program_type
0,/r1e0lDIsezrklektX82AetUHr5m.jpg,2017-02-08,"[Action & Adventure, Sci-Fi & Fantasy]",67195,Legion,['US'],en,Legion,"David Haller, AKA Legion, is a troubled young ...",47.874,/vT0Zsbm4GWd7llNjgWEtwY0CqOv.jpg,7.6,1065,TV Show
1,/sSVmLcHMrOtZapIY0ip6M7shMjA.jpg,2014-10-13,"[Comedy, Drama]",61418,Jane the Virgin,['US'],en,Jane the Virgin,A comedy-drama following a chaste young woman ...,112.22,/DRRHgvsNEfBloMgIP8bBw4zi4E.jpg,8.0,668,TV Show


In [21]:
# rename columns to match movie_df for an accurate merge
tv_df = tv_df.rename(columns={'first_air_date': 'release_date',
                              'original_name': 'original_title'})

column_needed = ['id', 'original_title','release_date','program_type', 
                 'genre_ids', 'overview', 'popularity', 'vote_average', 'vote_count']

movie_df = movie_df[column_needed]
tv_df = tv_df[column_needed]

dfs = [movie_df, tv_df]
df  = pd.concat(dfs, ignore_index=True)

df.columns = ['movie_id', 'original_name','release_date',
              'program_type', 'genres', 'overview', 
              'popularity', 'vote_average', 'vote_count']

# remove [] from genres
df['genres'] = [", ".join(x) for x in df['genres']]
# get release year
df['release_date'] = [str(x)[:4] for x in df['release_date']]

df.head()

Unnamed: 0,movie_id,original_name,release_date,program_type,genres,overview,popularity,vote_average,vote_count
0,12620,The House Bunny,2008,Movie,"Comedy, Romance",Shelley is living a carefree life until a riva...,24.452,5.7,1604
1,64807,Grudge Match,2013,Movie,Comedy,A pair of aging boxing rivals are coaxed out o...,23.616,6.1,1060
2,98566,Teenage Mutant Ninja Turtles,2014,Movie,"Comedy, Action, Adventure, Science Fiction","When a kingpin threatens New York City, a grou...",179.376,5.9,5828
3,10895,Pinocchio,1940,Movie,"Animation, Family",Lonely toymaker Geppetto has his wishes answer...,114.983,7.1,4809
4,322903,Naz & Maalik,2015,Movie,"Drama, Romance",Two closeted Muslim teens hawk goods across Br...,2.092,6.6,11


Next, we take the movies/TV shows found in the API and match them against the real dataset.

In [22]:
# titles for which we now have full information from TMDB
items = list(df['original_name'].values)
# stc1: records with a TMDB match; stc_working: records without one
stc1 = stc[stc['original_name'].isin(items)]
stc_working = stc[~stc['original_name'].isin(items)]

In [23]:
print("Total records in STC JAWWY: {}".format(stc.shape[0]))
print("Total records in after getting full items info:  {}".format(stc1.shape[0]))
print("Total records that we did not get their record:  {}".format(stc_working.shape[0]))
print('Total Unique Contents lost from the process: {}'.format(len(set(stc_working['original_name']))))

Total records in STC JAWWY: 3598607
Total records in after getting full items info:  985509
Total records that we did not get their record:  2613098
Total Unique Contents lost from the process: 792


Sadly, we lose a lot of data by keeping only the movies and TV shows with full information. We still have almost 1 million records, covering a large share of the different items. The 792 lost contents make up about 2.6 million records of the STC JAWWY dataset.

In [24]:
# merge columns for detailed info about contents (original_name is the shared key)
full_df = pd.merge(stc1, df, how='left', on='original_name', indicator=True)
full_df.head()

Unnamed: 0,date_,user_id_maped,program_name,duration_seconds,program_class,season,episode,program_desc,program_genre,series_title,...,original_name,movie_id,release_date,program_type,genres,overview,popularity,vote_average,vote_count,_merge
0,2017-05-21,7946,Moana,17,MOVIE,0,0,Animation MovieMoana (HD),Animation,0,...,Moana,277834,2016,Movie,"Animation, Comedy, Family, Adventure","In Ancient Polynesia, when a terrible curse in...",42.705,7.6,10347,both
1,2017-07-07,15860,Churchill,87,MOVIE,0,0,Biography MovieChurchill (HD),Biography,0,...,Churchill,399790,2017,Movie,"Drama, History",A ticking-clock thriller following Winston Chu...,17.312,6.1,254,both
2,2018-03-29,6358,Coco,14,MOVIE,0,0,Animation MovieCoco (HD),Animation,0,...,Coco,354912,2017,Movie,"Animation, Comedy, Family, Adventure, Fantasy,...",Despite his family’s baffling generations-old ...,249.121,8.2,15529,both
3,2018-01-27,11660,Kidnap,85,MOVIE,0,0,Action MovieKidnap (HD),Action,0,...,Kidnap,293768,2017,Movie,"Drama, Thriller",A mother (in her Minivan) stops at nothing to ...,50.668,6.2,1098,both
4,2017-03-30,5155,The Accountant,42,MOVIE,0,0,Action MovieThe Accountant (HD),Action,0,...,The Accountant,302946,2016,Movie,"Crime, Drama, Thriller",As a math savant uncooks the books for a new c...,46.956,7.0,5001,both


# Content-Based Recommendation System Pipeline

Now we can combine the data points we got from TMDB with our original dataset. The result is richer, and we can do a lot with it.

In [25]:
def process_df(df):
    """ build one row per title with its metadata, total views and combined 'content' text """
    df.columns = ['stream_date', 'user_id', 'program_name', 'duration_seconds',
       'program_class', 'season', 'episode', 'program_desc', 'program_genre',
       'series_title', 'hd', 'original_name', 'program_id', 'release_date',
        'program_type', 'genres', 'overview', 'popularity', 'vote_average',
        'vote_count', 'merge']
    
    # total number of viewing records per title
    df['total_views'] = df.groupby('original_name')['user_id'].transform('count')
    # features used as content: ['overview', 'genres', 'release_date']
    df['genres'] = [x.replace(',', '') for x in df['genres']]
    # combine the metadata into one text column to be used as content
    df['content'] = df['genres']+ ' ' + df['release_date']+ ' ' + df['overview']
    
    # keep one row per title
    df = df.drop_duplicates(subset = 'original_name').reset_index(drop=True)

    df = df[['original_name', 'release_date', 'genres', 'overview','total_views', 'popularity', 'vote_average', 'vote_count', 'content']]
    return df

In [26]:
rec_df = process_df(full_df)

This is an important step in setting up the content-based recommendation. We combine all the metadata of the content into one column called 'content'. I used year, genre, and overview. This can be extended to give richer recommendations by adding more information such as cast, directors, etc.
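To make the 'content' column concrete, here is a minimal sketch of how one row's text is assembled. The helper `build_content` is hypothetical and the sample values are made up; it also skips missing pieces, which is an extra precaution not present in `process_df` (in pandas, filling missing overviews with an empty string before concatenating would play a similar role).

```python
def build_content(genres, year, overview):
    """Join the metadata fields into one string, skipping missing or empty pieces."""
    parts = [genres, year, overview]
    return " ".join(str(p).strip() for p in parts if p is not None and str(p).strip())


# Made-up sample values for illustration
print(build_content("Drama History", "2017", "A toy overview sentence."))
print(build_content("Horror", "2015", None))  # overview missing
```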

In [27]:
rec_df.head()

Unnamed: 0,original_name,release_date,genres,overview,total_views,popularity,vote_average,vote_count,content
0,Moana,2016,Animation Comedy Family Adventure,"In Ancient Polynesia, when a terrible curse in...",25683,42.705,7.6,10347,Animation Comedy Family Adventure 2016 In Anci...
1,Churchill,2017,Drama History,A ticking-clock thriller following Winston Chu...,2898,17.312,6.1,254,Drama History 2017 A ticking-clock thriller fo...
2,Coco,2017,Animation Comedy Family Adventure Fantasy Music,Despite his family’s baffling generations-old ...,10867,249.121,8.2,15529,Animation Comedy Family Adventure Fantasy Musi...
3,Kidnap,2017,Drama Thriller,A mother (in her Minivan) stops at nothing to ...,2263,50.668,6.2,1098,Drama Thriller 2017 A mother (in her Minivan) ...
4,The Accountant,2016,Crime Drama Thriller,As a math savant uncooks the books for a new c...,4157,46.956,7.0,5001,Crime Drama Thriller 2016 As a math savant unc...


In [28]:
rec_df.to_csv("full.csv")

# Feature Extraction
### Using sklearn Pipeline to get similar movies/TV shows based on term frequency-inverse document frequency (TF-IDF)

In [29]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import linear_kernel
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.pipeline import Pipeline

In [30]:
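def sim_pipe_df(text, df=rec_df):
    """ return an item-to-item cosine similarity dataframe for the given text column """
    content = np.array(df[text])
    
    # TfidfVectorizer already counts terms and applies TF-IDF weighting,
    # so a separate TfidfTransformer step is not needed
    pipe = Pipeline([('tfidf', TfidfVectorizer(stop_words='english'))])

    mat_pip = pipe.fit_transform(content)

    # pairwise cosine similarity, as a dataframe so we can see what it looks like
    df = pd.DataFrame(cosine_similarity(mat_pip, mat_pip))
    
    return df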

In [31]:
cos_sim = sim_pipe_df('content')
cos_sim.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,1026,1027,1028,1029,1030,1031,1032,1033,1034,1035
0,1.0,0.0,0.04905,0.0,0.018748,0.017812,0.006465,0.0,0.016153,0.00355,...,0.0,0.0,0.005574,0.015993,0.004399,0.023607,0.024664,0.042621,0.0,0.0
1,0.0,1.0,0.074245,0.081788,0.018337,0.007508,0.029908,0.024647,0.040548,0.0,...,0.00728,0.021398,0.0,0.0,0.005281,0.0,0.0,0.0,0.016747,0.013426
2,0.04905,0.074245,1.0,0.021789,0.0,0.0,0.016789,0.01052,0.022942,0.059269,...,0.005856,0.0,0.003469,0.048723,0.002738,0.057168,0.02372,0.024402,0.012058,0.0
3,0.0,0.081788,0.021789,1.0,0.023126,0.009468,0.037718,0.031084,0.051137,0.0,...,0.027514,0.026986,0.0,0.0,0.00666,0.0,0.0,0.0,0.021121,0.052175
4,0.018748,0.018337,0.0,0.023126,1.0,0.025185,0.028315,0.0,0.036377,0.008352,...,0.005477,0.016098,0.0,0.0,0.012887,0.0,0.0,0.0,0.012599,0.0101


Now we have a matrix that represents item-to-item similarity. Each column is an item, and each value is its similarity score against the item in the row index. For example, column 2 has a similarity of 0.074245 with index 1.
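To see where such a matrix comes from, here is a small standard-library sketch of TF-IDF weighting followed by cosine similarity on three made-up descriptions. It is not the notebook's pipeline: tokens are split on whitespace with no stop-word removal, so the numbers would differ from scikit-learn's. The weighting uses the smoothed form idf = ln((1 + n) / (1 + df)) + 1 with L2 normalisation, which is my understanding of the `TfidfVectorizer` defaults.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """TF-IDF with smoothed idf and L2 normalisation (whitespace tokens, no stop words)."""
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    # number of documents each term appears in
    doc_freq = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        counts = Counter(tokens)
        weights = {t: c * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1)
                   for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({t: w / norm for t, w in weights.items()})
    return vectors


def cosine(a, b):
    """Dot product of two L2-normalised sparse vectors."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


docs = [
    "horror haunted doll family",   # made-up toy descriptions
    "horror haunted house family",
    "comedy romance wedding",
]
vecs = tfidf_vectors(docs)
sim = [[round(cosine(a, b), 3) for b in vecs] for a in vecs]
for row in sim:
    print(row)
```

The diagonal is 1 (each item compared with itself), the matrix is symmetric, and the two horror descriptions score higher with each other than with the comedy, which shares no terms with them.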

We will now do the reverse mapping, so that entering any movie/TV show name returns recommendations, using one function:
1. when we input a name, we get back its id (row index) from the processed dataframe
2. once we have the id, we look up the most similar ids from the cosine similarity matrix, specifying how many we want
3. lastly, we map the list of ids back to their actual names from the processed dataframe

In [47]:
def get_recommendations(title, top, mat_df=cos_sim):
    # Map each title to its row index in rec_df
    indices = pd.Series(rec_df.index, index=rec_df['original_name']).drop_duplicates()

    idx = indices[title]

    # Get the similarity of the target item with every other item
    sim_ids = list(enumerate(mat_df[idx]))

    # Sort the items based on the similarity 
    sim_ids = sorted(sim_ids, key=lambda x: x[1], reverse=True)
    # Keep the `top` most similar items, skipping the first one (the item itself)
    sim_ids = sim_ids[1:top+1]

    # Get the item indices
    item_indices = [i[0] for i in sim_ids]
    
    item_rec = rec_df['original_name'].iloc[item_indices]

    # Return the `top` most similar items
    return item_rec

In [43]:
get_recommendations('Annabelle',10)

475            Poltergeist
92       The Devil's Candy
138    Annabelle: Creation
245        Finders Keepers
645    Justice League Dark
496       The Devil's Hand
570                Elektra
792            Devil's Due
591                 Oculus
107              Aftermath
Name: original_name, dtype: object

# Method 2: Adding Ranking Recommendation Based on Highest Views
Mapping from the actual dataset; good for retrieving all info about the movie/TV show.

This method lets us control how the returned list is ordered, based on the information we have.
For example, the previous function does the job, but we may want to order the returned recommendations by their total views.

In [48]:
#1
def get_target_id(name, df=rec_df):
    """ a function to get the id of the target item
    Input:
    name: tv/movie name entered by the user
    df:   the dataframe whose original_name column is searched
    
    Output:
    idx: the id (row index) of the target item
    """
    
    idx = df[df['original_name']== name].index[0]

    return idx

In [49]:
#2 
def get_similar_recs_ids(name, n):
    """ a function to get the ids of the items most similar to the target item. 
    It looks up the target's row in the similarity matrix
    
    Input:
    name: tv/movie name entered by the user
    n:   number of results to return 
    
    Output:
    ids_similar: list of similar ids
    """
    get_movie_name_id = get_target_id(name)
    ids_similar = cos_sim[cos_sim.index==get_movie_name_id].sort_values(by=get_movie_name_id, axis=1, ascending=False)
    ids_similar = list(ids_similar.columns[1:n+1])
    
    return ids_similar

In [50]:
def recommender(name, n):
    """ a recommender function that takes a name and returns a list of recommended movies/TV shows ranked by highest views
    Input
    name: tv/movie name entered by the user
    n:   number of results to return 
    
    Output
    df: a recommendation dataframe of similar movies/TV shows based on user input
    """
    idx = get_similar_recs_ids(name, n)
    df = rec_df[rec_df.index.isin(idx)].sort_values(by='total_views', ascending=False)
    return df

In [51]:
recommender('Annabelle', 5)

Unnamed: 0,original_name,release_date,genres,overview,total_views,popularity,vote_average,vote_count,content
138,Annabelle: Creation,2017,Mystery Horror Thriller,Several years after the tragic death of their ...,2779,106.302,6.6,4696,Mystery Horror Thriller 2017 Several years aft...
92,The Devil's Candy,2017,Drama Horror Thriller,A struggling painter is possessed by satanic f...,973,10.647,6.3,513,Drama Horror Thriller 2017 A struggling painte...
245,Finders Keepers,2014,Mystery Horror Thriller,A haunted doll teaches one little girl why chi...,973,28.508,4.7,141,Mystery Horror Thriller 2014 A haunted doll te...
475,Poltergeist,2015,Horror,A family's suburban home is invaded by angry s...,575,53.137,5.2,1982,Horror 2015 A family's suburban home is invade...
645,Justice League Dark,2017,Animation Action Fantasy Science Fiction Thriller,Beings with supernatural powers join together ...,526,30.542,7.4,718,Animation Action Fantasy Science Fiction Thril...
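The ranking logic of Method 2 has two stages: first pick the n most similar titles, then order only those by views. The sketch below shows that idea with plain dictionaries; `top_similar_by_views` and the toy scores and view counts are made up for illustration and are not the notebook's data.

```python
def top_similar_by_views(target, similarity, views, n):
    """Pick the n titles most similar to `target`, then order them by view count."""
    scores = similarity[target]
    # stage 1: n most similar titles, excluding the target itself
    candidates = sorted((t for t in scores if t != target),
                        key=lambda t: scores[t], reverse=True)[:n]
    # stage 2: re-order the selected titles by views, highest first
    return sorted(candidates, key=lambda t: views[t], reverse=True)


# Made-up similarity scores and view counts
similarity = {"A": {"A": 1.0, "B": 0.30, "C": 0.25, "D": 0.05}}
views = {"B": 100, "C": 900, "D": 5000}

print(top_similar_by_views("A", similarity, views, n=2))  # ['C', 'B']
```

Note that "D" has the most views but is not returned, because it does not make the similarity cut in the first stage.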


# Analysis & Conclusion:
In some examples, we get a very strong recommendation from the system: when we input 'Annabelle', 'Annabelle: Creation' appears among the top recommendations. We also got more recommendations about the devil. Or did the devil recommend that?

# Simulating all recommendations if you watched each piece of content
What would you get if you watch...

In [52]:
def recommendation_ls(df):
    res_rec_5 = []
    X = df['original_name']
    for i in X:
        item_rec = list(recommender(i, 5)['original_name'])
        item_rec.insert(0, i)
        res_rec_5.append(item_rec)
        
    recommendation_list = pd.DataFrame(res_rec_5, columns=['Watched','1 Recommendation','2 Recommendation',
                                                '3 Recommendation','4 Recommendation','5 Recommendation'])
    
    return recommendation_list

In [53]:
recommendation_ls(rec_df)

Unnamed: 0,Watched,1 Recommendation,2 Recommendation,3 Recommendation,4 Recommendation,5 Recommendation
0,Moana,Collateral Beauty,Swallows and Amazons,Roadkill,Rugrats Go Wild,The Maid's Room
1,Churchill,Phoenix Forgotten,Final Portrait,Ocean's Eleven,Pound of Flesh,Winston Churchill: Walking with Destiny
2,Coco,Surf's Up,The Princess and the Frog,The Polar Express,The Muppet Movie,Boychoir
3,Kidnap,Atomic Blonde,Inconceivable,We Need to Talk About Kevin,The Brave One,Just Wright
4,The Accountant,The Legend of Tarzan,Swiss Army Man,Miss You Already,Anna Karenina,Saving Mr. Banks
...,...,...,...,...,...,...
1031,The Jungle Book,Frozen,Norbit,Megamind,Dolphin Tale,Get Lucky
1032,Atlantis: The Lost Empire,Pressure,Avatar,Prometheus,Mission to Mars,Devil's Due
1033,The Love Guru,Swordfish,Analyze That,In Time,The Game of Their Lives,Kalamity
1034,Captain Phillips,Pressure,Ice Age: Continental Drift,Titanic,How I Live Now,The Fugitive
