# Movie Recommendation Model Built Using Cosine Similarity

This is a recommendation model used to made using Cosine Similarity ALgorithm.

## Import Libraries and data

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn import model_selection
import warnings
warnings.filterwarnings('ignore')

import plotly_express as px
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

In [2]:
data = pd.read_csv('netflix_movies_datatset.csv')

In [3]:
data.head()

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
0,s1,TV Show,3%,,"João Miguel, Bianca Comparato, Michel Gomes, R...",Brazil,"August 14, 2020",2020,TV-MA,4 Seasons,"International TV Shows, TV Dramas, TV Sci-Fi &...",In a future where the elite inhabit an island ...
1,s2,Movie,7:19,Jorge Michel Grau,"Demián Bichir, Héctor Bonilla, Oscar Serrano, ...",Mexico,"December 23, 2016",2016,TV-MA,93 min,"Dramas, International Movies",After a devastating earthquake hits Mexico Cit...
2,s3,Movie,23:59,Gilbert Chan,"Tedd Chan, Stella Chung, Henley Hii, Lawrence ...",Singapore,"December 20, 2018",2011,R,78 min,"Horror Movies, International Movies","When an army recruit is found dead, his fellow..."
3,s4,Movie,9,Shane Acker,"Elijah Wood, John C. Reilly, Jennifer Connelly...",United States,"November 16, 2017",2009,PG-13,80 min,"Action & Adventure, Independent Movies, Sci-Fi...","In a postapocalyptic world, rag-doll robots hi..."
4,s5,Movie,21,Robert Luketic,"Jim Sturgess, Kevin Spacey, Kate Bosworth, Aar...",United States,"January 1, 2020",2008,PG-13,123 min,Dramas,A brilliant group of students become card-coun...


## Understanding the features

In [4]:
print("The data has {} rows and {} columns".format(data.shape[0], data.shape[1]))

The data has 7787 rows and 12 columns


Next we need to check the types of data contained

In [5]:
for i in data.columns:
    print(i, ":", data.dtypes[i])

show_id : object
type : object
title : object
director : object
cast : object
country : object
date_added : object
release_year : int64
rating : object
duration : object
listed_in : object
description : object


All the features are object variables except release_year.

Convert date_added column to date

In [6]:
data['date_added'] = pd.to_datetime(data['date_added'])
print(data.date_added.dtypes)

datetime64[ns]


### Duplicate Rows

In [7]:
data[data.duplicated()]

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description


This shows that the data has not duplicated rows. However, what if we ignore the movie_id column which is meant to make the rows unique? It could be possible to have duplicate content. Let's explore

In [8]:
data_d0 = data.iloc[:, 1:] #filters the show_id column
data_d0[data_d0.duplicated()]

Unnamed: 0,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description


In [9]:
data[data["cast"].duplicated()][data.cast.notna()] # Filtering data with duplicate casts and excluding null values

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
139,s140,TV Show,72 Dangerous Animals: Latin America,,Bob Brisbane,"Australia, United States",2017-12-22,2017,TV-14,1 Season,"Docuseries, International TV Shows, Science & ...","Powerful cats, indestructible arachnids and fl..."
387,s388,Movie,Ali Wong: Hard Knock Wife,Jay Karas,Ali Wong,United States,2018-05-13,2018,TV-MA,64 min,Stand-Up Comedy,"Two years after the hit ""Baby Cobra,"" Ali Wong..."
483,s484,Movie,Amy Schumer: The Leather Special,Amy Schumer,Amy Schumer,United States,2017-03-07,2017,TV-MA,57 min,Stand-Up Comedy,"Comic sensation Amy Schumer riffs on sex, dati..."
510,s511,Movie,Andhakaaram,V Vignarajan,"Vinoth Kishan, Arjun Das, Pooja Ramachandran, ...",India,2020-11-24,2020,TV-14,171 min,"Horror Movies, International Movies, Thrillers","As a blind librarian, dispirited cricketer and..."
525,s526,Movie,Angu Vaikuntapurathu (Malayalam),Trivikram Srinivas,"Allu Arjun, Pooja Hegde, Tabu, Sushanth, Nivet...",,2020-03-05,2020,TV-14,162 min,"Action & Adventure, Comedies, Dramas",After growing up enduring criticism from his f...
...,...,...,...,...,...,...,...,...,...,...,...,...
7428,s7429,Movie,Vir Das: Losing It,Marcus Raboy,Vir Das,United States,2018-12-11,2018,TV-MA,68 min,Stand-Up Comedy,"The world's got a lot of problems, but Vir Das..."
7429,s7430,Movie,Vir Das: Outside In - The Lockdown Special,Vir Das,Vir Das,India,2020-12-16,2020,TV-MA,50 min,Stand-Up Comedy,Stage banter takes on a different — deeper — m...
7522,s7523,TV Show,Weird Wonders of the World,,Chris Packham,United Kingdom,2017-03-31,2016,TV-PG,2 Seasons,"British TV Shows, Docuseries, Science & Nature TV",From animal oddities and bizarre science to me...
7604,s7605,Movie,Whitney Cummings: Money Shot,John Fortenberry,Whitney Cummings,United States,2019-01-01,2010,TV-MA,48 min,Stand-Up Comedy,Comedy Central roast veteran Whitney Cummings ...


In [10]:
data.cast.loc[510] # display the details of row 7716 that has duplicate content

'Vinoth Kishan, Arjun Das, Pooja Ramachandran, Kumar Natarajan, Misha Ghoshal, Arul Vincent, Chenthu Mohan, Pradeep Kalipurayath'

In [11]:
data[data['cast'] == data.cast.loc[510]] #comparing two rows with duplicate casts

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
509,s510,Movie,Andhaghaaram,V Vignarajan,"Vinoth Kishan, Arjun Das, Pooja Ramachandran, ...",,2020-11-24,2020,TV-14,171 min,"Horror Movies, International Movies, Thrillers","As a blind librarian, dispirited cricketer and..."
510,s511,Movie,Andhakaaram,V Vignarajan,"Vinoth Kishan, Arjun Das, Pooja Ramachandran, ...",India,2020-11-24,2020,TV-14,171 min,"Horror Movies, International Movies, Thrillers","As a blind librarian, dispirited cricketer and..."


In [12]:
data.cast.loc[7716] # display the details of row 7716 that has duplicate content

'Eileen Stevens, Alyson Leigh Rosenfeld, Sarah Natochenny, H.D. Quinn'

In [13]:
data[data['cast'] == 'Eileen Stevens, Alyson Leigh Rosenfeld, Sarah Natochenny, H.D. Quinn'] #comparing two rows with duplicate casts

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
7715,s7716,TV Show,Yoko,,"Eileen Stevens, Alyson Leigh Rosenfeld, Sarah ...",,2018-06-23,2016,TV-Y,1 Season,Kids' TV,"Friends Mai, Oto and Vik's games at the park b..."
7716,s7717,Movie,Yoko and His Friends,,"Eileen Stevens, Alyson Leigh Rosenfeld, Sarah ...","Russia, Spain",2018-06-23,2015,TV-Y,78 min,Children & Family Movies,"Vik meets new friends in a new city, where the..."


Comparison of two different sets of duplicate values show the possibility of having duplicate and non-duplicate rows if the casts are identical. How about the description column?

In [14]:
data[data["description"].duplicated()][data.description.notna()] # Filtering data with duplicate casts and excluding null values

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
510,s511,Movie,Andhakaaram,V Vignarajan,"Vinoth Kishan, Arjun Das, Pooja Ramachandran, ...",India,2020-11-24,2020,TV-14,171 min,"Horror Movies, International Movies, Thrillers","As a blind librarian, dispirited cricketer and..."
525,s526,Movie,Angu Vaikuntapurathu (Malayalam),Trivikram Srinivas,"Allu Arjun, Pooja Hegde, Tabu, Sushanth, Nivet...",,2020-03-05,2020,TV-14,162 min,"Action & Adventure, Comedies, Dramas",After growing up enduring criticism from his f...
1287,s1288,Movie,Chashme Buddoor,David Dhawan,"Ali Zafar, Siddharth, Divyendu Sharma, Tapsee ...",India,2017-05-01,2013,TV-PG,121 min,"Comedies, International Movies, Music & Musicals",When pretty new neighbor Seema falls for their...
1377,s1378,TV Show,ChuChu TV Nursery Rhymes & Kids Songs (Hindi),,,India,2020-04-18,2019,TV-Y,1 Season,Kids' TV,This educational series for tiny tots features...
1486,s1487,Movie,Consequences,Ozan Açıktan,"Nehir Erdoğan, Tardu Flordun, İlker Kaleli, Se...",Turkey,2019-10-25,2014,TV-MA,106 min,"Dramas, International Movies, Thrillers",Secrets bubble to the surface after a sensual ...
2343,s2344,Movie,Game Over (Tamil Version),Ashwin Saravanan,"Taapsee Pannu, Vinodhini, Parvathi T, Ramya Su...","India, Turkey",2019-08-21,2019,TV-MA,98 min,"Horror Movies, International Movies, Thrillers","As a series of murders hit close to home, a vi..."
4441,s4442,Movie,Nee Enge En Anbe,Sekhar Kammula,"Nayantara, Vaibhav Reddy, Pasupathy, Harshvard...",,2020-09-17,2014,TV-14,137 min,"International Movies, Thrillers",As a woman scours Hyderabad for her missing hu...
4594,s4595,Movie,Oh! Baby (Malayalam),B. V. Nandini Reddy,"Samantha Ruth Prabhu, Lakshmi, Rajendraprasad,...",,2019-09-25,2019,TV-14,146 min,"Comedies, International Movies, Music & Musicals",A surly septuagenarian gets another chance at ...
4595,s4596,Movie,Oh! Baby (Tamil),B. V. Nandini Reddy,"Samantha Ruth Prabhu, Lakshmi, Rajendraprasad,...",,2019-09-25,2019,TV-14,146 min,"Comedies, International Movies, Music & Musicals",A surly septuagenarian gets another chance at ...
4839,s4840,Movie,Petta (Telugu Version),Karthik Subbaraj,"Rajnikanth, Vijay Sethupathi, M. Sasikumar, Na...",,2019-04-07,2019,TV-14,170 min,"Action & Adventure, Comedies, Dramas","An affable, newly appointed college warden pro..."


There are 18 rows with duplicate rows. It is very difficult to have 2 movies to have identical description, and thus these are duplicate values. Details on 3 possible duplicates are compared to establish similarity.

In [15]:
data[data.description == data.description.loc[510]] #comparing two rows with duplicate casts

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
509,s510,Movie,Andhaghaaram,V Vignarajan,"Vinoth Kishan, Arjun Das, Pooja Ramachandran, ...",,2020-11-24,2020,TV-14,171 min,"Horror Movies, International Movies, Thrillers","As a blind librarian, dispirited cricketer and..."
510,s511,Movie,Andhakaaram,V Vignarajan,"Vinoth Kishan, Arjun Das, Pooja Ramachandran, ...",India,2020-11-24,2020,TV-14,171 min,"Horror Movies, International Movies, Thrillers","As a blind librarian, dispirited cricketer and..."


In [16]:
data[data.description == data.description.loc[1287]] #comparing two rows with duplicate casts

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
1286,s1287,Movie,Chashme Baddoor,David Dhawan,"Rishi Kapoor, Ali Zafar, Taapsee Pannu, Siddha...",India,2020-07-05,2013,TV-14,121 min,"Comedies, International Movies, Music & Musicals",When pretty new neighbor Seema falls for their...
1287,s1288,Movie,Chashme Buddoor,David Dhawan,"Ali Zafar, Siddharth, Divyendu Sharma, Tapsee ...",India,2017-05-01,2013,TV-PG,121 min,"Comedies, International Movies, Music & Musicals",When pretty new neighbor Seema falls for their...


In [17]:
data[data.description == data.description.loc[4594]]

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
4593,s4594,Movie,Oh! Baby,B. V. Nandini Reddy,"Samantha Ruth Prabhu, Lakshmi, Rajendraprasad,...",India,2019-09-14,2019,TV-14,157 min,"Comedies, International Movies, Music & Musicals",A surly septuagenarian gets another chance at ...
4594,s4595,Movie,Oh! Baby (Malayalam),B. V. Nandini Reddy,"Samantha Ruth Prabhu, Lakshmi, Rajendraprasad,...",,2019-09-25,2019,TV-14,146 min,"Comedies, International Movies, Music & Musicals",A surly septuagenarian gets another chance at ...
4595,s4596,Movie,Oh! Baby (Tamil),B. V. Nandini Reddy,"Samantha Ruth Prabhu, Lakshmi, Rajendraprasad,...",,2019-09-25,2019,TV-14,146 min,"Comedies, International Movies, Music & Musicals",A surly septuagenarian gets another chance at ...


A close look at 3 movies with duplicate description shows that they are actually duplicate movies. Row 4594 is the same movie but in different language. These can therefore be removed as duplicates.

In [18]:
data0 = data.drop_duplicates(subset=['description'],keep="first")
data0

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
0,s1,TV Show,3%,,"João Miguel, Bianca Comparato, Michel Gomes, R...",Brazil,2020-08-14,2020,TV-MA,4 Seasons,"International TV Shows, TV Dramas, TV Sci-Fi &...",In a future where the elite inhabit an island ...
1,s2,Movie,7:19,Jorge Michel Grau,"Demián Bichir, Héctor Bonilla, Oscar Serrano, ...",Mexico,2016-12-23,2016,TV-MA,93 min,"Dramas, International Movies",After a devastating earthquake hits Mexico Cit...
2,s3,Movie,23:59,Gilbert Chan,"Tedd Chan, Stella Chung, Henley Hii, Lawrence ...",Singapore,2018-12-20,2011,R,78 min,"Horror Movies, International Movies","When an army recruit is found dead, his fellow..."
3,s4,Movie,9,Shane Acker,"Elijah Wood, John C. Reilly, Jennifer Connelly...",United States,2017-11-16,2009,PG-13,80 min,"Action & Adventure, Independent Movies, Sci-Fi...","In a postapocalyptic world, rag-doll robots hi..."
4,s5,Movie,21,Robert Luketic,"Jim Sturgess, Kevin Spacey, Kate Bosworth, Aar...",United States,2020-01-01,2008,PG-13,123 min,Dramas,A brilliant group of students become card-coun...
...,...,...,...,...,...,...,...,...,...,...,...,...
7782,s7783,Movie,Zozo,Josef Fares,"Imad Creidi, Antoinette Turk, Elias Gergi, Car...","Sweden, Czech Republic, United Kingdom, Denmar...",2020-10-19,2005,TV-MA,99 min,"Dramas, International Movies",When Lebanon's Civil War deprives Zozo of his ...
7783,s7784,Movie,Zubaan,Mozez Singh,"Vicky Kaushal, Sarah-Jane Dias, Raaghav Chanan...",India,2019-03-02,2015,TV-14,111 min,"Dramas, International Movies, Music & Musicals",A scrappy but poor boy worms his way into a ty...
7784,s7785,Movie,Zulu Man in Japan,,Nasty C,,2020-09-25,2019,TV-MA,44 min,"Documentaries, International Movies, Music & M...","In this documentary, South African rapper Nast..."
7785,s7786,TV Show,Zumbo's Just Desserts,,"Adriano Zumbo, Rachel Khoo",Australia,2020-10-31,2019,TV-PG,1 Season,"International TV Shows, Reality TV",Dessert wizard Adriano Zumbo looks for the nex...


Data0 has no duplicate values.

In [19]:
len(data0[data0["description"].duplicated()][data0.description.notna()])

0

### Missing Values

Next, we look for percentage of empty rows in each feature. This will inform the variables that can be considered in model building. 

Features with more than 5% empty rows may not be suitable for model building

In [20]:
for i in data0.columns:
    print(i, " column has {}% empty rows".format(int(data0[i].isnull().sum()/len(data0)*100)))
    

show_id  column has 0% empty rows
type  column has 0% empty rows
title  column has 0% empty rows
director  column has 30% empty rows
cast  column has 9% empty rows
country  column has 6% empty rows
date_added  column has 0% empty rows
release_year  column has 0% empty rows
rating  column has 0% empty rows
duration  column has 0% empty rows
listed_in  column has 0% empty rows
description  column has 0% empty rows


The features: director, cast and country have more than 5% empty values, and may not suitable in model building.

Next we look at how many distinct values exist in a row. This informs more on the features that can be used in model building. The fewer the unique values the unique values the more suitable a feature is. From the outputs below and above, the following features are good candidates: type, rating and listed_in.

In [21]:
data0.nunique()

show_id         7769
type               2
title           7769
director        4049
cast            6828
country          680
date_added      1512
release_year      73
rating            14
duration         216
listed_in        492
description     7769
dtype: int64

Next we look at the frequencies of the modal values in each feature. This gives insight to each feature's modal frequencies.

In [22]:
data0.describe(include='all')

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
count,7769,7769,7769,5384,7054,7268,7759,7769.0,7762,7769,7769,7769
unique,7769,2,7769,4049,6828,680,1512,,14,216,492,7769
top,s1,Movie,3%,"Raúl Campos, Jan Suter",David Attenborough,United States,2020-01-01 00:00:00,,TV-MA,1 Season,Documentaries,In a future where the elite inhabit an island ...
freq,1,5361,1,18,18,2553,119,,2860,1606,334,1
first,,,,,,,2008-01-01 00:00:00,,,,,
last,,,,,,,2021-01-16 00:00:00,,,,,
mean,,,,,,,,2013.92573,,,,
std,,,,,,,,8.763353,,,,
min,,,,,,,,1925.0,,,,
25%,,,,,,,,2013.0,,,,


### Text EDA

To better understand how data is captured under various features, we group the desired features and observe how the datapoints are captured. 

In [23]:
data0.groupby(data0.type)['show_id'].count()

type
Movie      5361
TV Show    2408
Name: show_id, dtype: int64

In [24]:
data0.groupby(data0.rating)['show_id'].count()

rating
G             39
NC-17          3
NR            84
PG           246
PG-13        385
R            665
TV-14       1922
TV-G         193
TV-MA       2860
TV-PG        805
TV-Y         278
TV-Y7        271
TV-Y7-FV       6
UR             5
Name: show_id, dtype: int64

In [25]:
data0.groupby(data0.country)['show_id'].count()

country
Argentina                                              50
Argentina, Brazil, France, Poland, Germany, Denmark     1
Argentina, Chile                                        1
Argentina, Chile, Peru                                  1
Argentina, France                                       1
                                                       ..
Venezuela                                               1
Venezuela, Colombia                                     1
Vietnam                                                 5
West Germany                                            1
Zimbabwe                                                1
Name: show_id, Length: 680, dtype: int64

In [26]:
data0.groupby(data0.listed_in)['show_id'].count()

listed_in
Action & Adventure                                              99
Action & Adventure, Anime Features, Children & Family Movies     3
Action & Adventure, Anime Features, Classic Movies               1
Action & Adventure, Anime Features, Horror Movies                1
Action & Adventure, Anime Features, International Movies        28
                                                                ..
TV Horror, TV Mysteries, Teen TV Shows                           1
TV Horror, Teen TV Shows                                         1
TV Sci-Fi & Fantasy, TV Thrillers                                1
TV Shows                                                        12
Thrillers                                                       49
Name: show_id, Length: 492, dtype: int64

## Text Pre-Processing

We select the features to use build our prediction model based on the understanding of the data0.

In [27]:
new_cols = ['show_id', 'type', 'cast', 'rating', 'listed_in', 'description'] 

In [28]:
data0[new_cols].head()

Unnamed: 0,show_id,type,cast,rating,listed_in,description
0,s1,TV Show,"João Miguel, Bianca Comparato, Michel Gomes, R...",TV-MA,"International TV Shows, TV Dramas, TV Sci-Fi &...",In a future where the elite inhabit an island ...
1,s2,Movie,"Demián Bichir, Héctor Bonilla, Oscar Serrano, ...",TV-MA,"Dramas, International Movies",After a devastating earthquake hits Mexico Cit...
2,s3,Movie,"Tedd Chan, Stella Chung, Henley Hii, Lawrence ...",R,"Horror Movies, International Movies","When an army recruit is found dead, his fellow..."
3,s4,Movie,"Elijah Wood, John C. Reilly, Jennifer Connelly...",PG-13,"Action & Adventure, Independent Movies, Sci-Fi...","In a postapocalyptic world, rag-doll robots hi..."
4,s5,Movie,"Jim Sturgess, Kevin Spacey, Kate Bosworth, Aar...",PG-13,Dramas,A brilliant group of students become card-coun...


For the selected variables, a function is written that converts all string values to lower cases and remove spaces, in preparation for the use of cosine similarity algorithm. This is then applied to the selected columns.

In [29]:
def prep_d(x):
        return str.lower(x.replace(" ", ""))

In [30]:
data1 = data0[new_cols]
data1 = data1.astype('str')

for new_col in new_cols:
    data1.loc[:, new_col] = data1.loc[:, new_col].apply(prep_d)

data1.head(2)

Unnamed: 0,show_id,type,cast,rating,listed_in,description
0,s1,tvshow,"joãomiguel,biancacomparato,michelgomes,rodolfo...",tv-ma,"internationaltvshows,tvdramas,tvsci-fi&fantasy",inafuturewheretheeliteinhabitanislandparadisef...
1,s2,movie,"demiánbichir,héctorbonilla,oscarserrano,azalia...",tv-ma,"dramas,internationalmovies","afteradevastatingearthquakehitsmexicocity,trap..."


We create a new feature that combines all features. this is the feature that will be used in making prediction.

In [31]:
data1['tags'] = data1.type + ' ' + data1.cast + ' ' + data1.rating + ' ' + data1.listed_in + ' ' + data1.description
data1.head(3)

Unnamed: 0,show_id,type,cast,rating,listed_in,description,tags
0,s1,tvshow,"joãomiguel,biancacomparato,michelgomes,rodolfo...",tv-ma,"internationaltvshows,tvdramas,tvsci-fi&fantasy",inafuturewheretheeliteinhabitanislandparadisef...,"tvshow joãomiguel,biancacomparato,michelgomes,..."
1,s2,movie,"demiánbichir,héctorbonilla,oscarserrano,azalia...",tv-ma,"dramas,internationalmovies","afteradevastatingearthquakehitsmexicocity,trap...","movie demiánbichir,héctorbonilla,oscarserrano,..."
2,s3,movie,"teddchan,stellachung,henleyhii,lawrencekoh,tom...",r,"horrormovies,internationalmovies","whenanarmyrecruitisfounddead,hisfellowsoldiers...","movie teddchan,stellachung,henleyhii,lawrencek..."


## Model Building

We create an index of the show_id feature to be used for reference in the model

In [32]:
indices = pd.Series(data1.index, index=data1['show_id'])
indices

show_id
s1          0
s2          1
s3          2
s4          3
s5          4
         ... 
s7783    7782
s7784    7783
s7785    7784
s7786    7785
s7787    7786
Length: 7769, dtype: int64

We compile a cosine similarity algo 

In [33]:
count = CountVectorizer(stop_words='english')
count_mat = count.fit_transform(data1['tags'])
cosine_sim1 = cosine_similarity(count_mat, count_mat)

create a function that uses the cosine similarity to recommend movies/tv shows.

In [34]:
result = 0
def get_suggestion(show_id, cosine_sim):
    global result    
    wid = indices[show_id]
    # Get the pairwsie similarity scores of all movies with that movie
    sim_scores = list(enumerate(cosine_sim[wid]))
    # Sort the movies based on the similarity scores
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
    # Get the scores of the 30 most similar movies
    sim_scores = sim_scores[1:31]
    # Get the movie indices
    movie_indices = [i[0] for i in sim_scores]
    # Return the top 10 most similar movies
    result =  data.iloc[movie_indices]
    #result.reset_index(inplace = True)
    return result

## Testing The Recommendation Model

We test the model based on content in various categories.

In [35]:
picked = 'International'

In [36]:
df = pd.DataFrame()

show_ids = ['s10', 's19', 's441']

for i in show_ids:
    get_suggestion(i, cosine_sim1)
    for i in picked:
        df = pd.concat([result[result['listed_in'].str.count(picked) > 0], df], ignore_index=True)
df.drop_duplicates(keep = 'first', inplace = True)
df.sort_values(by = 'show_id', ascending = False, inplace = True)

In [37]:
print("The model has recommended {} movies/tv shows.".format(df.shape[0]))

The model has recommended 43 movies/tv shows.


In [38]:
df.sort_values('release_year', ascending=[False]) #Sort recommended movies by 'release_year'

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
338,s28,Movie,#Alive,Cho Il,"Yoo Ah-in, Park Shin-hye",South Korea,2020-09-08,2020,TV-MA,99 min,"Horror Movies, International Movies, Thrillers","As a grisly virus rampages a city, a lone man ..."
8,s6463,TV Show,The Hook Up Plan,,"Marc Ruchmann, Zita Hanrot, Sabrina Ouazani, J...",France,2019-10-11,2020,TV-MA,2 Seasons,"International TV Shows, Romantic TV Shows, TV ...","When Parisian Elsa gets hung up on her ex, her..."
4,s427,TV Show,Almost Happy,Hernán Guerschuny,"Sebastián Wainraich, Natalie Pérez, Santiago K...",Argentina,2020-05-02,2020,TV-MA,1 Season,"International TV Shows, Spanish-Language TV Sh...","Sebastián is a radio show host of modest fame,..."
347,s411,Movie,All Good Ones Get Away,Víctor García,"Claire Forlani, Jake Abel, Titus Welliver, Mel...","Spain, Italy",2019-07-30,2019,TV-MA,83 min,"International Movies, Thrillers",When a mysterious figure blackmails an adulter...
136,s7019,TV Show,The World's Most Extraordinary Homes,,"Piers Taylor, Caroline Quentin",United Kingdom,2019-01-18,2019,TV-G,3 Seasons,"British TV Shows, Docuseries, International TV...",Award-winning architect Piers Taylor and actre...
351,s2897,Movie,I Am Mother,Grant Sputore,"Clara Rugaard, Rose Byrne, Hilary Swank, Luke ...",Australia,2019-06-07,2019,TV-PG,114 min,"International Movies, Sci-Fi & Fantasy, Thrillers","Following humanity's mass extinction, a teen r..."
141,s3290,TV Show,Kakegurui,,"Minami Hamabe, Mahiro Takasugi, Aoi Morikawa",,2019-07-04,2019,TV-MA,2 Seasons,"International TV Shows, TV Dramas, TV Thrillers",Yumeko Jabami enrolls at Hyakkaou Private Acad...
353,s5862,Movie,Stunt School,Ali Yorgancıoğlu,"Aslı İnandık, Toygan Avanoğlu, Tuna Orhan, Can...",Turkey,2020-04-16,2019,TV-G,95 min,"Children & Family Movies, Comedies, Internatio...",An aspiring actress is admitted to a prestigio...
340,s3295,Movie,"Kalel, 15",Jun Lana,"Elijah Canlas, Eddie Garcia, Jaclyn Jose, Gabb...",Philippines,2020-12-09,2019,TV-MA,105 min,"Dramas, Independent Movies, International Movies","Surrounded by tensions and secrets, a teenage ..."
133,s1347,TV Show,Chocolate,,"Ha Ji-won, Yoon Kye-sang, Jang Seung-jo, Kang ...",South Korea,2019-11-30,2019,TV-14,1 Season,"International TV Shows, Korean TV Shows, Roman...",Brought together by meaningful meals in the pa...


In [39]:
len(df) == len(df[df['listed_in'].str.contains(picked)]) #Check if each row has the 'international' category

True

Let's test with a different category from a different feature, say 'TV-MA' from the 'rating' feature.

In [40]:
picked1 = result.iloc[1,-4]
picked1

'TV-MA'

In [41]:
df = pd.DataFrame()

show_ids = ['s10', 's19', 's441']

for i in show_ids:
    get_suggestion(i, cosine_sim1)
    for i in picked1:
        df = pd.concat([result[result['rating'].str.count(picked1) > 0], df], ignore_index=True)
df.drop_duplicates(keep = 'first', inplace = True)
df.sort_values(by = 'show_id', ascending = False, inplace = True)

In [42]:
print("The model has recommended {} movies/tv shows.".format(df.shape[0]))

The model has recommended 34 movies/tv shows.


In [43]:
df.sort_values('release_year', ascending=[False])  #Sort recommended movies by 'release_year'

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
116,s6702,TV Show,The Netflix Afterparty,,"David Spade, London Hughes, Fortune Feimster",United States,2021-01-02,2021,TV-MA,1 Season,"Stand-Up Comedy & Talk Shows, TV Comedies","Hosts David Spade, Fortune Feimster and London..."
2,s4054,TV Show,Messiah,,"Michelle Monaghan, Mehdi Dehbi, John Ortiz, To...",United States,2020-01-01,2020,TV-MA,1 Season,"TV Dramas, TV Thrillers",A wary CIA officer investigates a charismatic ...
3,s427,TV Show,Almost Happy,Hernán Guerschuny,"Sebastián Wainraich, Natalie Pérez, Santiago K...",Argentina,2020-05-02,2020,TV-MA,1 Season,"International TV Shows, Spanish-Language TV Sh...","Sebastián is a radio show host of modest fame,..."
110,s1496,TV Show,Cooked with Cannabis,,"Kelis, Leather Storrs",United States,2020-04-20,2020,TV-MA,1 Season,Reality TV,Chefs compete to get the hosts and special gue...
8,s6463,TV Show,The Hook Up Plan,,"Marc Ruchmann, Zita Hanrot, Sabrina Ouazani, J...",France,2019-10-11,2020,TV-MA,2 Seasons,"International TV Shows, Romantic TV Shows, TV ...","When Parisian Elsa gets hung up on her ex, her..."
105,s28,Movie,#Alive,Cho Il,"Yoo Ah-in, Park Shin-hye",South Korea,2020-09-08,2020,TV-MA,99 min,"Horror Movies, International Movies, Thrillers","As a grisly virus rampages a city, a lone man ..."
111,s2652,TV Show,Haunted,,,"United States, Czech Republic",2019-10-11,2019,TV-MA,2 Seasons,"Reality TV, TV Horror, TV Thrillers",Real people sit down with friends and family t...
59,s3290,TV Show,Kakegurui,,"Minami Hamabe, Mahiro Takasugi, Aoi Morikawa",,2019-07-04,2019,TV-MA,2 Seasons,"International TV Shows, TV Dramas, TV Thrillers",Yumeko Jabami enrolls at Hyakkaou Private Acad...
107,s3295,Movie,"Kalel, 15",Jun Lana,"Elijah Canlas, Eddie Garcia, Jaclyn Jose, Gabb...",Philippines,2020-12-09,2019,TV-MA,105 min,"Dramas, Independent Movies, International Movies","Surrounded by tensions and secrets, a teenage ..."
112,s411,Movie,All Good Ones Get Away,Víctor García,"Claire Forlani, Jake Abel, Titus Welliver, Mel...","Spain, Italy",2019-07-30,2019,TV-MA,83 min,"International Movies, Thrillers",When a mysterious figure blackmails an adulter...


In [44]:
len(df) == len(df[df['rating'].str.contains(picked1)]) #Check if each row has the 'TV-MA' category

True