# Netflix Movies Recommender System

## Objective

- Build a content-based recommender system for Netflix movies
- Content-based methods recommend items by the similarity of their attributes, whereas collaborative methods rely on user-item interactions

## Data Dictionary

#### This dataset has two files containing the titles (titles.csv) and the cast (credits.csv) for each title.

#### The titles file contains more than 5,000 unique titles on Netflix with 15 columns of information:

- id: The title ID on JustWatch.
- title: The name of the title.
- type: TV show (SHOW) or movie (MOVIE).
- description: A brief description.
- release_year: The release year.
- age_certification: The age certification.
- runtime: The length of the episode (SHOW) or movie.
- genres: A list of genres.
- production_countries: A list of countries that produced the title.
- seasons: Number of seasons if it's a SHOW.
- imdb_id: The title ID on IMDB.
- imdb_score: Score on IMDB.
- imdb_votes: Votes on IMDB.
- tmdb_popularity: Popularity on TMDB.
- tmdb_score: Score on TMDB.

#### The credits file contains more than 77,000 credits of actors and directors on Netflix titles with 5 columns of information:

- person_id: The person ID on JustWatch.
- id: The title ID on JustWatch.
- name: The actor or director's name.
- character: The character name.
- role: ACTOR or DIRECTOR.

In [1]:
# import the necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from ast import literal_eval
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import linear_kernel
from sklearn.feature_extraction.text import TfidfVectorizer
import nltk
from nltk.corpus import stopwords
from nltk.stem.snowball import SnowballStemmer
from nltk.stem import WordNetLemmatizer
  

# If the NLTK corpora are not installed yet, run the following commands once
#nltk.download('wordnet')
#nltk.download('stopwords')

In [2]:
#read the data
df_titles = pd.read_csv("netflix/titles.csv")
df_credits = pd.read_csv("netflix/credits.csv")

## Exploratory Data Analysis

In [3]:
# View the first five rows of the titles dataset
df_titles.head()

Unnamed: 0,id,title,type,description,release_year,age_certification,runtime,genres,production_countries,seasons,imdb_id,imdb_score,imdb_votes,tmdb_popularity,tmdb_score
0,ts300399,Five Came Back: The Reference Films,SHOW,This collection includes 12 World War II-era p...,1945,TV-MA,48,['documentation'],['US'],1.0,,,,0.6,
1,tm84618,Taxi Driver,MOVIE,A mentally unstable Vietnam War veteran works ...,1976,R,113,"['crime', 'drama']",['US'],,tt0075314,8.3,795222.0,27.612,8.2
2,tm127384,Monty Python and the Holy Grail,MOVIE,"King Arthur, accompanied by his squire, recrui...",1975,PG,91,"['comedy', 'fantasy']",['GB'],,tt0071853,8.2,530877.0,18.216,7.8
3,tm70993,Life of Brian,MOVIE,"Brian Cohen is an average young Jewish man, bu...",1979,R,94,['comedy'],['GB'],,tt0079470,8.0,392419.0,17.505,7.8
4,tm190788,The Exorcist,MOVIE,12-year-old Regan MacNeil begins to adapt an e...,1973,R,133,['horror'],['US'],,tt0070047,8.1,391942.0,95.337,7.7


In [4]:
# view the first five rows of the credits data set
df_credits.head()

Unnamed: 0,person_id,id,name,character,role
0,3748,tm84618,Robert De Niro,Travis Bickle,ACTOR
1,14658,tm84618,Jodie Foster,Iris Steensma,ACTOR
2,7064,tm84618,Albert Brooks,Tom,ACTOR
3,3739,tm84618,Harvey Keitel,Matthew 'Sport' Higgins,ACTOR
4,48933,tm84618,Cybill Shepherd,Betsy,ACTOR


In [5]:
# view the dimensions of the titles data set
print(f"There are {df_titles.shape[0]} rows and {df_titles.shape[1]} columns in the titles dataset")

There are 5806 rows and 15 columns in the titles dataset


In [6]:
# view the dimensions of the credits data set
print(f"There are {df_credits.shape[0]} rows and {df_credits.shape[1]} columns in the credits dataset")

There are 77213 rows and 5 columns in the credits dataset


In [7]:
# check for missing values
df_titles.isnull().sum()

id                         0
title                      1
type                       0
description               18
release_year               0
age_certification       2610
runtime                    0
genres                     0
production_countries       0
seasons                 3759
imdb_id                  444
imdb_score               523
imdb_votes               539
tmdb_popularity           94
tmdb_score               318
dtype: int64

## Merge the datasets

In [8]:
unique_ids = df_credits['id'].unique()
ids = df_credits['id']
roles = df_credits['role']
names = df_credits['name']

# Lists to hold the actors and directors of each title
actors = []
directors = []

# The scan below assumes all rows for a title are contiguous in credits.csv
start = 0

for id_unique in unique_ids:
    act = []
    director = []
    for i in range(start, len(ids)):
        if (ids[i] == id_unique):
            if roles[i] == 'ACTOR':
                act.append(names[i])
            elif roles[i] == 'DIRECTOR':
                director.append(names[i])
        else:
            start = i
            break
    actors.append(act)
    directors.append(director)
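
The nested loop above stops scanning at the first row whose id differs, so it depends on the row order of the credits file. A minimal sketch of an order-independent alternative using `groupby` is shown below. The miniature `credits` frame and the helper `names_by_role` are hypothetical and only stand in for `df_credits`.

```python
import pandas as pd

# Hypothetical miniature of credits.csv; rows for one title are deliberately not contiguous.
credits = pd.DataFrame({
    "id": ["tm1", "tm2", "tm1", "tm2", "tm1", "tm3"],
    "name": ["Ann", "Bob", "Cid", "Dee", "Eve", "Fay"],
    "role": ["ACTOR", "ACTOR", "DIRECTOR", "DIRECTOR", "ACTOR", "ACTOR"],
})


def names_by_role(frame, role):
    """Map each title id to the list of names that have the given role."""
    subset = frame[frame["role"] == role]
    return subset.groupby("id", sort=False)["name"].agg(list)


# One row per title, in order of first appearance.
grouped = pd.DataFrame({"id": credits["id"].unique()})
grouped["actors"] = grouped["id"].map(names_by_role(credits, "ACTOR"))
grouped["directors"] = grouped["id"].map(names_by_role(credits, "DIRECTOR"))

# Titles with no credit for a role come back as NaN; replace those with empty lists.
for col in ["actors", "directors"]:
    grouped[col] = grouped[col].apply(lambda v: v if isinstance(v, list) else [])

print(grouped)
```

The result has the same three columns (id, actors, directors) as the frame built from the loop, so it could be merged with the titles in the same way.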

In [9]:
data = {'id': unique_ids, 'actors': actors, 'directors': directors}
new_df = pd.DataFrame(data = data)

In [10]:
new_df.shape

(5434, 3)

In [11]:
# merge on id
df = df_titles.merge(new_df, how='inner', on='id')

# parse the string representation of each genres list into an actual Python list
df['genres'] = df['genres'].apply(literal_eval)

In [12]:
# shape of the resulting data set
df.shape

(5434, 17)

In [13]:
# view the five first rows of the merged data set
df.head()

Unnamed: 0,id,title,type,description,release_year,age_certification,runtime,genres,production_countries,seasons,imdb_id,imdb_score,imdb_votes,tmdb_popularity,tmdb_score,actors,directors
0,tm84618,Taxi Driver,MOVIE,A mentally unstable Vietnam War veteran works ...,1976,R,113,"[crime, drama]",['US'],,tt0075314,8.3,795222.0,27.612,8.2,"[Robert De Niro, Jodie Foster, Albert Brooks, ...",[Martin Scorsese]
1,tm127384,Monty Python and the Holy Grail,MOVIE,"King Arthur, accompanied by his squire, recrui...",1975,PG,91,"[comedy, fantasy]",['GB'],,tt0071853,8.2,530877.0,18.216,7.8,"[Graham Chapman, John Cleese, Eric Idle, Terry...","[Terry Jones, Terry Gilliam]"
2,tm70993,Life of Brian,MOVIE,"Brian Cohen is an average young Jewish man, bu...",1979,R,94,[comedy],['GB'],,tt0079470,8.0,392419.0,17.505,7.8,"[Graham Chapman, John Cleese, Terry Gilliam, E...",[Terry Jones]
3,tm190788,The Exorcist,MOVIE,12-year-old Regan MacNeil begins to adapt an e...,1973,R,133,[horror],['US'],,tt0070047,8.1,391942.0,95.337,7.7,"[Ellen Burstyn, Linda Blair, Max von Sydow, Le...",[William Friedkin]
4,ts22164,Monty Python's Flying Circus,SHOW,A British sketch comedy series with the shows ...,1969,TV-14,30,"[comedy, european]",['GB'],4.0,tt0063929,8.8,72895.0,12.919,8.3,"[Graham Chapman, Michael Palin, Terry Jones, E...",[]


In [14]:
# list the columns that will be used as features for the recommender system
columns = ["actors", "directors", "genres", "title", "description"]

In [15]:
# check the data types
df.dtypes

id                       object
title                    object
type                     object
description              object
release_year              int64
age_certification        object
runtime                   int64
genres                   object
production_countries     object
seasons                 float64
imdb_id                  object
imdb_score              float64
imdb_votes              float64
tmdb_popularity         float64
tmdb_score              float64
actors                   object
directors                object
dtype: object

In [16]:
# statistical summary of the merged data set

df.describe(include = "all").T

Unnamed: 0,count,unique,top,freq,mean,std,min,25%,50%,75%,max
id,5434.0,5434.0,tm84618,1.0,,,,,,,
title,5433.0,5386.0,Connected,3.0,,,,,,,
type,5434.0,2.0,MOVIE,3658.0,,,,,,,
description,5424.0,5421.0,"Away from school, during the winter holidays, ...",2.0,,,,,,,
release_year,5434.0,,,,2015.929886,7.410901,1953.0,2015.0,2018.0,2020.0,2022.0
age_certification,2951.0,11.0,TV-MA,757.0,,,,,,,
runtime,5434.0,,,,79.869157,38.979547,0.0,46.0,87.0,106.0,251.0
genres,5434.0,1577.0,[comedy],491.0,,,,,,,
production_countries,5434.0,445.0,['US'],1827.0,,,,,,,
seasons,1776.0,,,,2.234234,2.760135,1.0,1.0,1.0,3.0,42.0


In [17]:
# check missing values
df[columns].isnull().sum()

actors          0
directors       0
genres          0
title           1
description    10
dtype: int64

#### Observation:
- Among the important columns, actors, directors and genres have no missing values; title has 1 missing value and description has 10
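
The output above shows 1 missing title and 10 missing descriptions. One way to keep such gaps out of the text features is to replace them with empty strings before any string conversion. The sketch below uses a hypothetical two-row frame; the names `frame` and `text_columns` are illustrative only.

```python
import pandas as pd

# Hypothetical frame with one missing title and one missing description.
frame = pd.DataFrame({
    "title": ["taxi driver", None],
    "description": [None, "a brief description"],
})

text_columns = ["title", "description"]

# Replace missing text with an empty string so it contributes no tokens later on.
frame[text_columns] = frame[text_columns].fillna("")

print(frame[text_columns].isnull().sum())
```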

In [18]:
# converting important columns to string data types since we are going to do text processing
df[columns] = df[columns].astype('str')

In [19]:
df[columns].dtypes

actors         object
directors      object
genres         object
title          object
description    object
dtype: object

In [20]:
df[columns].head()

Unnamed: 0,actors,directors,genres,title,description
0,"['Robert De Niro', 'Jodie Foster', 'Albert Bro...",['Martin Scorsese'],"['crime', 'drama']",Taxi Driver,A mentally unstable Vietnam War veteran works ...
1,"['Graham Chapman', 'John Cleese', 'Eric Idle',...","['Terry Jones', 'Terry Gilliam']","['comedy', 'fantasy']",Monty Python and the Holy Grail,"King Arthur, accompanied by his squire, recrui..."
2,"['Graham Chapman', 'John Cleese', 'Terry Gilli...",['Terry Jones'],['comedy'],Life of Brian,"Brian Cohen is an average young Jewish man, bu..."
3,"['Ellen Burstyn', 'Linda Blair', 'Max von Sydo...",['William Friedkin'],['horror'],The Exorcist,12-year-old Regan MacNeil begins to adapt an e...
4,"['Graham Chapman', 'Michael Palin', 'Terry Jon...",[],"['comedy', 'european']",Monty Python's Flying Circus,A British sketch comedy series with the shows ...


## Data Preprocessing

In [21]:
# removing brackets from the strings
for column in columns:
    df[column] = df[column].str.strip('[]').astype(str)

### Converting columns to lower case

In [22]:
for column in columns:
    df[column] = df[column].str.lower()

In [23]:
df[columns].head()

Unnamed: 0,actors,directors,genres,title,description
0,"'robert de niro', 'jodie foster', 'albert broo...",'martin scorsese',"'crime', 'drama'",taxi driver,a mentally unstable vietnam war veteran works ...
1,"'graham chapman', 'john cleese', 'eric idle', ...","'terry jones', 'terry gilliam'","'comedy', 'fantasy'",monty python and the holy grail,"king arthur, accompanied by his squire, recrui..."
2,"'graham chapman', 'john cleese', 'terry gillia...",'terry jones','comedy',life of brian,"brian cohen is an average young jewish man, bu..."
3,"'ellen burstyn', 'linda blair', 'max von sydow...",'william friedkin','horror',the exorcist,12-year-old regan macneil begins to adapt an e...
4,"'graham chapman', 'michael palin', 'terry jone...",,"'comedy', 'european'",monty python's flying circus,a british sketch comedy series with the shows ...


#### Observations:
- Brackets have been eliminated
- All strings in the important columns are now lower cased
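
To see why the quotes and commas survive in the output above, the sketch below applies the same steps (string conversion, bracket stripping, lower casing) to a plain Python list. The second variant, which joins the names directly, is an alternative shown for comparison and is not what the notebook does.

```python
# A short list standing in for one cell of the actors column.
actors = ["Robert De Niro", "Jodie Foster"]

# Same steps as the notebook: stringify the list, strip the brackets, lower-case.
as_text = str(actors).strip("[]").lower()
print(as_text)  # 'robert de niro', 'jodie foster'

# Alternative: join the names directly, which leaves no quotes or commas behind.
joined = " ".join(name.lower() for name in actors)
print(joined)  # robert de niro jodie foster
```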

### Remove stop words from the description

In [24]:
stop = stopwords.words('english')

In [25]:
# Exclude stopwords with Python's list comprehension and pandas.DataFrame.apply.
df['description'] = df['description'].apply(lambda x: ' '.join([word for word in x.split() if word not in (stop)]))

### Check if stop words have been eliminated

In [26]:
# check the description in the original data set
df_titles["description"].head()

0    This collection includes 12 World War II-era p...
1    A mentally unstable Vietnam War veteran works ...
2    King Arthur, accompanied by his squire, recrui...
3    Brian Cohen is an average young Jewish man, bu...
4    12-year-old Regan MacNeil begins to adapt an e...
Name: description, dtype: object

In [27]:
# check the description in the new set
df["description"].head()

0    mentally unstable vietnam war veteran works ni...
1    king arthur, accompanied squire, recruits knig...
2    brian cohen average young jewish man, series r...
3    12-year-old regan macneil begins adapt explici...
4    british sketch comedy series shows composed su...
Name: description, dtype: object

#### Observation:
- stop words have been eliminated
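
Splitting on whitespace leaves punctuation attached to words (for example "man,"), so a stop word followed by a comma would not match the stop list. A minimal sketch of a punctuation-aware variant follows. The tiny `STOP` set, the regular expression, and the sample sentence are assumptions for illustration, not the NLTK list or the dataset text.

```python
import re

# Hypothetical subset of English stop words, for illustration only.
STOP = {"a", "an", "the", "is", "by", "his", "but"}


def remove_stop_words(text, stop_words=STOP):
    """Lower-case the text, extract word-like tokens, and drop stop words."""
    tokens = re.findall(r"[a-z0-9'-]+", text.lower())
    return " ".join(token for token in tokens if token not in stop_words)


cleaned = remove_stop_words("The knight, accompanied by his squire, is brave")
print(cleaned)  # knight accompanied squire brave
```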

### Applying Stemming on words in the description column
- Stemming is the process of reducing a word to its stem by stripping affixes (suffixes and prefixes). The stem need not be a valid word, unlike the dictionary form known as a lemma

In [28]:
# Use English stemmer.
stemmer = SnowballStemmer("english")
# Split the sentences to lists of words.
df['description'] = df['description'].str.split()
# Apply stemmer
df["description"] = df["description"].apply(lambda x: [stemmer.stem(y) for y in x])


In [29]:
# view how stemming has worked
df["description"].head()

0    [mental, unstabl, vietnam, war, veteran, work,...
1    [king, arthur,, accompani, squire,, recruit, k...
2    [brian, cohen, averag, young, jewish, man,, se...
3    [12-year-old, regan, macneil, begin, adapt, ex...
4    [british, sketch, comedi, seri, show, compos, ...
Name: description, dtype: object

### Lemmatization
- Lemmatization is the process of grouping together the different inflected forms of a word so they can be analyzed as a single item.
- Lemmatization is similar to stemming, but it maps each word to a dictionary form (its lemma) rather than only chopping off affixes.

In [30]:
lemmatizer = WordNetLemmatizer()

In [31]:
df["description"] = df["description"].apply(lambda x: [lemmatizer.lemmatize(y) for y in x])
df["description"].head()

0    [mental, unstabl, vietnam, war, veteran, work,...
1    [king, arthur,, accompani, squire,, recruit, k...
2    [brian, cohen, averag, young, jewish, man,, se...
3    [12-year-old, regan, macneil, begin, adapt, ex...
4    [british, sketch, comedi, seri, show, compos, ...
Name: description, dtype: object

In [32]:
# convert descriptions back to strings
df["description"] = df["description"].astype('str')

In [33]:
# remove brackets from description
df["description"] = df["description"].str.strip('[]').astype(str)
# view how the descriptions now look
df["description"].head()

0    'mental', 'unstabl', 'vietnam', 'war', 'vetera...
1    'king', 'arthur,', 'accompani', 'squire,', 're...
2    'brian', 'cohen', 'averag', 'young', 'jewish',...
3    '12-year-old', 'regan', 'macneil', 'begin', 'a...
4    'british', 'sketch', 'comedi', 'seri', 'show',...
Name: description, dtype: object

In [34]:
# create a function to combine values of important columns into one string
def get_important_features(data):
    important_features = []
    for i in range(0, data.shape[0]):
        important_features.append(data["actors"][i]+" "+data["directors"][i]+" "+data["genres"][i]+
                                  " "+data["title"][i]+" "+data["description"][i])
    return important_features
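
The function above looks rows up with `data["actors"][i]`, which is label-based and therefore relies on the default 0..n-1 index. A sketch of an index-independent alternative that joins the columns row by row is shown below; the two-row `frame` is hypothetical.

```python
import pandas as pd

# Hypothetical two-row frame with the same feature columns as the notebook.
frame = pd.DataFrame(
    {
        "actors": ["'ann lee'", "'bob ray'"],
        "directors": ["'cid poe'", ""],
        "genres": ["'crime'", "'comedy'"],
        "title": ["first title", "second title"],
        "description": ["'mental', 'unstabl'", "'british', 'sketch'"],
    },
    index=[10, 20],  # non-default index on purpose
)

feature_columns = ["actors", "directors", "genres", "title", "description"]

# Join the feature columns of each row with single spaces.
frame["important_features"] = frame[feature_columns].apply(" ".join, axis=1)

print(frame["important_features"].tolist())
```

An empty directors cell simply leaves two consecutive spaces, which a vectorizer that splits on non-word characters would ignore.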

In [35]:
# create column to hold combined string
df["important_features"] = get_important_features(df)

In [36]:
# view the first five rows of the data frame with the important_features column
df.head()

Unnamed: 0,id,title,type,description,release_year,age_certification,runtime,genres,production_countries,seasons,imdb_id,imdb_score,imdb_votes,tmdb_popularity,tmdb_score,actors,directors,important_features
0,tm84618,taxi driver,MOVIE,"'mental', 'unstabl', 'vietnam', 'war', 'vetera...",1976,R,113,"'crime', 'drama'",['US'],,tt0075314,8.3,795222.0,27.612,8.2,"'robert de niro', 'jodie foster', 'albert broo...",'martin scorsese',"'robert de niro', 'jodie foster', 'albert broo..."
1,tm127384,monty python and the holy grail,MOVIE,"'king', 'arthur,', 'accompani', 'squire,', 're...",1975,PG,91,"'comedy', 'fantasy'",['GB'],,tt0071853,8.2,530877.0,18.216,7.8,"'graham chapman', 'john cleese', 'eric idle', ...","'terry jones', 'terry gilliam'","'graham chapman', 'john cleese', 'eric idle', ..."
2,tm70993,life of brian,MOVIE,"'brian', 'cohen', 'averag', 'young', 'jewish',...",1979,R,94,'comedy',['GB'],,tt0079470,8.0,392419.0,17.505,7.8,"'graham chapman', 'john cleese', 'terry gillia...",'terry jones',"'graham chapman', 'john cleese', 'terry gillia..."
3,tm190788,the exorcist,MOVIE,"'12-year-old', 'regan', 'macneil', 'begin', 'a...",1973,R,133,'horror',['US'],,tt0070047,8.1,391942.0,95.337,7.7,"'ellen burstyn', 'linda blair', 'max von sydow...",'william friedkin',"'ellen burstyn', 'linda blair', 'max von sydow..."
4,ts22164,monty python's flying circus,SHOW,"'british', 'sketch', 'comedi', 'seri', 'show',...",1969,TV-14,30,"'comedy', 'european'",['GB'],4.0,tt0063929,8.8,72895.0,12.919,8.3,"'graham chapman', 'michael palin', 'terry jone...",,"'graham chapman', 'michael palin', 'terry jone..."


In [37]:
# Make sure we see the full column.
pd.set_option('display.max_colwidth', 1000)

# view the first cell of important features
df["important_features"].head(1)

0    'robert de niro', 'jodie foster', 'albert brooks', 'harvey keitel', 'cybill shepherd', 'peter boyle', 'leonard harris', 'diahnne abbott', 'gino ardito', 'martin scorsese', 'murray moston', 'richard higgs', 'bill minkin', 'bob maroff', 'victor argo', 'joe spinell', 'robinson frank adu', 'brenda dickson', 'norman matlock', 'harry northup', 'harlan cary poe', 'steven prince', 'peter savage', 'nicholas shields', 'ralph s. singleton', 'annie gagen', 'carson grant', 'mary-pat green', 'debbi morgan', 'don stroud', 'copper cunningham', 'garth avery', 'nat grant', 'billie perkins', 'catherine scorsese', 'charles scorsese' 'martin scorsese' 'crime', 'drama' taxi driver 'mental', 'unstabl', 'vietnam', 'war', 'veteran', 'work', 'night-tim', 'taxi', 'driver', 'new', 'york', 'citi', 'perceiv', 'decad', 'sleaz', 'feed', 'urg', 'violent', 'action,', 'attempt', 'save', 'preadolesc', 'prostitut', 'process.'
Name: important_features, dtype: object

# Recommendation System

## Recommender using Count vectorizer

In [38]:
df1 = df.copy()

In [39]:
# convert text to matrix of token counts

cm = CountVectorizer().fit_transform(df1["important_features"])


In [40]:
# Get cosine matrix from the count matrix
cs = cosine_similarity(cm)

In [41]:
# print similarity matrix
print(cs)

[[1.         0.00579294 0.02343822 ... 0.         0.01342675 0.02441931]
 [0.00579294 1.         0.33862588 ... 0.         0.02006734 0.00912415]
 [0.02343822 0.33862588 1.         ... 0.         0.0676603  0.        ]
 ...
 [0.         0.         0.         ... 1.         0.         0.        ]
 [0.01342675 0.02006734 0.0676603  ... 0.         1.         0.02114775]
 [0.02441931 0.00912415 0.         ... 0.         0.02114775 1.        ]]


In [42]:
# Get the shape of the cosine similarity matrix
cs.shape

(5434, 5434)
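
Each entry of this matrix is the cosine of the angle between two token-count vectors. The sketch below computes that by hand on three toy documents. The whitespace tokenizer and the helper names are assumptions and differ from CountVectorizer's default tokenization.

```python
import math
from collections import Counter


def count_vector(text):
    """Token counts for one document, using plain whitespace tokenization."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors stored as dicts."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


docs = ["comedy knights comedy", "comedy castle", "horror demon"]
vectors = [count_vector(doc) for doc in docs]

# Pairwise similarities: 1 on the diagonal, 0 for documents with no shared tokens.
similarity = [[cosine(a, b) for b in vectors] for a in vectors]
for row in similarity:
    print([round(value, 3) for value in row])
```

The first two documents share only the token "comedy", so their similarity is 2 / (sqrt(5) * sqrt(2)), roughly 0.632.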

In [43]:
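# Construct a reverse map from movie title to row index.
# Series.drop_duplicates() compares the values (row indices, all unique), so repeated
# titles are dropped through the index instead, keeping the first occurrence.
indices = pd.Series(df1.index, index=df1['title'])
indices = indices[~indices.index.duplicated(keep='first')]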

In [44]:
# Function that takes in movie title as input and outputs most similar movies
def get_recommendations(title, cs=cs):
    # Get the index of the movie that matches the title
    idx = indices[title]

    # Pair each row index with its similarity score to the chosen title
    scores = list(enumerate(cs[idx]))

    # Sort the movie list based on the similarity scores, highest first
    sorted_scores = sorted(scores, key=lambda x: x[1], reverse=True)

    # Skip the first entry (the title itself) and keep the 10 most similar movies
    sorted_scores = sorted_scores[1:11]

    # Print the ten most similar movies with their rank
    print("The 10 most recommended movies to ", title, "are:\n")
    for rank, (movie_idx, _) in enumerate(sorted_scores, start=1):
        print(rank, df1.loc[movie_idx, "title"])
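
Slicing from position 1 assumes the queried title always sorts first. If another row ties at a similarity of 1.0, the queried title may not be the entry that gets skipped. The sketch below excludes the queried row explicitly; the function name `top_n_similar` and the sample row are hypothetical.

```python
def top_n_similar(similarity_row, self_index, n=10):
    """Return the n (index, score) pairs with the highest scores, excluding self_index."""
    candidates = [
        (i, score) for i, score in enumerate(similarity_row) if i != self_index
    ]
    # sort() is stable, so ties keep their original row order.
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return candidates[:n]


# Row 3 is the queried title; row 1 ties with it at a similarity of 1.0.
row = [0.2, 1.0, 0.9, 1.0, 0.0]
best = top_n_similar(row, self_index=3, n=2)
print(best)  # [(1, 1.0), (2, 0.9)]
```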

### Let us try out the recommender system using 4 movies

In [45]:
get_recommendations("fatherhood")

The 10 most recommended movies to  fatherhood are:

1 good sam
2 pee-wee's big holiday
3 kevin hart's guide to black history
4 true story
5 war of the worlds
6 legend
7 horse girl
8 silver linings playbook
9 dad stop embarrassing me!
10 saving private ryan


In [46]:
get_recommendations("inception")

The 10 most recommended movies to  inception are:

1 dunkirk
2 teenage mutant ninja turtles ii: the secret of the ooze
3 argo
4 triple threat
5 to all the boys: p.s. i still love you
6 forrest gump
7 body of lies
8 the imitation game
9 6 underground
10 top gun


In [47]:
get_recommendations("dunkirk")

The 10 most recommended movies to  dunkirk are:

1 inception
2 the imitation game
3 darkest hour
4 the king
5 django unchained
6 talking tom and friends
7 the liberator
8 the irishman
9 rebellion
10 the blind side


In [48]:
get_recommendations("cocomelon")

The 10 most recommended movies to  cocomelon are:

1 go! go! cory carson
2 rhyme time town
3 blippi wonders
4 ask the storybots
5 sofia the first
6 barbie: life in the dreamhouse
7 luna petunia
8 charlie's colorforms city
9 44 cats
10 rainbow high


## Recommender using TFIDF Vectorizer

In [49]:
# make a copy of the preprocessed data frame
df2 = df.copy()

In [50]:
# initialize the tfidf vectorizer
tfidf = TfidfVectorizer()
# convert text to a matrix of TF-IDF weights
tfidf_matrix = tfidf.fit_transform(df2["important_features"])

In [51]:
#print shape of matrix
tfidf_matrix.shape

(5434, 58601)

In [52]:
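# Compute the cosine similarity matrix. TfidfVectorizer L2-normalizes each row by
# default, so the plain dot product from linear_kernel equals the cosine similarity.
cs2 = linear_kernel(tfidf_matrix, tfidf_matrix)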

In [53]:
# print shape of cosine similarity matrix
cs2.shape

(5434, 5434)
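
To make the weighting concrete, the sketch below builds TF-IDF vectors by hand with the smoothed formula idf = ln((1 + n) / (1 + df)) + 1 followed by L2 normalization, which matches the default settings of scikit-learn's TfidfVectorizer. The whitespace tokenizer, the toy documents, and the helper names are assumptions.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """L2-normalized TF-IDF vectors (as dicts) using a smoothed idf."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents that contain each token.
    doc_freq = Counter(token for tokens in tokenized for token in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights = {
            token: count * (math.log((1 + n_docs) / (1 + doc_freq[token])) + 1)
            for token, count in counts.items()
        }
        # Guard against empty documents, whose norm would be zero.
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({token: w / norm for token, w in weights.items()})
    return vectors


def dot(a, b):
    """Dot product of two sparse vectors; equals cosine similarity for unit vectors."""
    return sum(a[token] * b[token] for token in a.keys() & b.keys())


docs = ["comedy knights comedy", "comedy castle", "horror demon"]
tfidf_vecs = tfidf_vectors(docs)
print(round(dot(tfidf_vecs[0], tfidf_vecs[1]), 3))
```

In the second document "castle" receives a higher weight than "comedy", because "comedy" also occurs in the first document. This down-weighting of common tokens is what separates TF-IDF from raw counts.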

In [54]:
print(cs2)

[[1.00000000e+00 3.36031687e-03 1.16232521e-02 ... 0.00000000e+00
  7.84297292e-04 7.95893051e-03]
 [3.36031687e-03 1.00000000e+00 2.27474604e-01 ... 0.00000000e+00
  7.77655049e-03 4.03166975e-03]
 [1.16232521e-02 2.27474604e-01 1.00000000e+00 ... 0.00000000e+00
  2.79992635e-02 0.00000000e+00]
 ...
 [0.00000000e+00 0.00000000e+00 0.00000000e+00 ... 1.00000000e+00
  0.00000000e+00 0.00000000e+00]
 [7.84297292e-04 7.77655049e-03 2.79992635e-02 ... 0.00000000e+00
  1.00000000e+00 4.92416052e-03]
 [7.95893051e-03 4.03166975e-03 0.00000000e+00 ... 0.00000000e+00
  4.92416052e-03 1.00000000e+00]]


In [55]:
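# Construct a reverse map from movie title to row index, using df2 for both sides.
# Repeated titles are dropped through the index, keeping the first occurrence.
indices1 = pd.Series(df2.index, index=df2['title'])
indices1 = indices1[~indices1.index.duplicated(keep='first')]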

In [56]:
# Function that takes in movie title as input and outputs most similar movies
def get_recommendations_tfidf(title, cs=cs2):
    # Get the index of the movie that matches the title
    idx = indices1[title]

    # Pair each row index with its similarity score to the chosen title
    scores = list(enumerate(cs[idx]))

    # Sort the movie list based on the similarity scores, highest first
    sorted_scores = sorted(scores, key=lambda x: x[1], reverse=True)

    # Skip the first entry (the title itself) and keep the 10 most similar movies
    sorted_scores = sorted_scores[1:11]

    # Print the ten most similar movies with their rank
    print("The 10 most recommended movies to ", title, "are:\n")
    for rank, (movie_idx, _) in enumerate(sorted_scores, start=1):
        print(rank, df2.loc[movie_idx, "title"])

### Let us try out the recommender system using 4 movies

In [57]:
get_recommendations_tfidf("fatherhood")

The 10 most recommended movies to  fatherhood are:

1 good sam
2 kevin hart's guide to black history
3 bad trip
4 kevin hart: don't f**k this up
5 kevin hart: irresponsible
6 war of the worlds
7 she's gotta have it
8 dad stop embarrassing me!
9 murder mystery
10 grey's anatomy


In [58]:
get_recommendations_tfidf("inception")

The 10 most recommended movies to  inception are:

1 dunkirk
2 the paper tigers
3 the ryan white story
4 to all the boys: p.s. i still love you
5 triple threat
6 body of lies
7 argo
8 dragons: dawn of the dragon racers
9 talking tom and friends
10 they'll love me when i'm dead


In [59]:
get_recommendations_tfidf("dunkirk")

The 10 most recommended movies to  dunkirk are:

1 inception
2 darkest hour
3 the imitation game
4 the king
5 iboy
6 tmnt
7 lego dc comics super heroes: batman be-leaguered
8 the witcher: nightmare of the wolf
9 legend
10 the liberator


In [60]:
get_recommendations_tfidf("cocomelon")

The 10 most recommended movies to  cocomelon are:

1 rhyme time town
2 go! go! cory carson
3 blippi wonders
4 how to get over a breakup
5 the boss baby: back in business
6 ask the storybots
7 greenhouse academy
8 sofia the first
9 anchor baby
10 fun mom dinner


## Comparison of the recommenders

- For each input title, the two recommenders share at least 4 of their 10 recommendations
- The count vectorizer gave better results for the cocomelon show: all 10 of its recommendations are kids' titles, whereas the TF-IDF list includes titles that do not look like kids' content, such as "how to get over a breakup", "anchor baby" and "fun mom dinner"
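
One way to quantify how much the two recommenders agree is the overlap between their top-10 lists. The sketch below copies the two "fatherhood" lists from the outputs above; the helper name `overlap` is an illustrative assumption.

```python
# Top-10 lists for "fatherhood", copied from the two outputs above.
count_recs = [
    "good sam", "pee-wee's big holiday", "kevin hart's guide to black history",
    "true story", "war of the worlds", "legend", "horse girl",
    "silver linings playbook", "dad stop embarrassing me!", "saving private ryan",
]
tfidf_recs = [
    "good sam", "kevin hart's guide to black history", "bad trip",
    "kevin hart: don't f**k this up", "kevin hart: irresponsible",
    "war of the worlds", "she's gotta have it", "dad stop embarrassing me!",
    "murder mystery", "grey's anatomy",
]


def overlap(first, second):
    """Titles that appear in both lists, in the order of the first list."""
    second_set = set(second)
    return [title for title in first if title in second_set]


shared = overlap(count_recs, tfidf_recs)
print(len(shared), shared)
```

For this title the two lists share 4 of 10 recommendations.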