# Content-Based Recommendation System Based on the Genres of Movies and Series

In this notebook we implement a content-based recommendation system based on genres, using the Netflix dataset obtained from Kaggle:
https://www.kaggle.com/shivamb/netflix-shows

-------------------------------------------------------------------------------------------------------------------------------------------------------------

## 0. Import basic libraries

In [11]:
import warnings
warnings.filterwarnings("ignore")

import numpy as np
import pandas as pd 

from sklearn.preprocessing import MultiLabelBinarizer

import random

print("Libraries imported!!")

Libraries imported!!


----------------------------------------------------------------------------------------
## 1. Load and read the dataset

Here we read the cleaned dataset and display it to inspect its size and column names.

In [2]:
df = pd.read_csv('netflix_data_cleaned.csv')
df

Unnamed: 0,show_id,type,title,cast,country,release_year,rating,duration,listed_in,description,day_added,month_added,year_added
0,s1,TV Show,3%,"João Miguel, Bianca Comparato, Michel Gomes, R...",Brazil,2020,TV-MA,4 Seasons,"International TV Shows, TV Dramas, TV Sci-Fi &...",In a future where the elite inhabit an island ...,14,August,2020
1,s2,Movie,7:19,"Demián Bichir, Héctor Bonilla, Oscar Serrano, ...",Mexico,2016,TV-MA,93 min,"Dramas, International Movies",After a devastating earthquake hits Mexico Cit...,23,December,2016
2,s3,Movie,23:59,"Tedd Chan, Stella Chung, Henley Hii, Lawrence ...",Singapore,2011,R,78 min,"Horror Movies, International Movies","When an army recruit is found dead, his fellow...",20,December,2018
3,s4,Movie,9,"Elijah Wood, John C. Reilly, Jennifer Connelly...",United States,2009,PG-13,80 min,"Action & Adventure, Independent Movies, Sci-Fi...","In a postapocalyptic world, rag-doll robots hi...",16,November,2017
4,s5,Movie,21,"Jim Sturgess, Kevin Spacey, Kate Bosworth, Aar...",United States,2008,PG-13,123 min,Dramas,A brilliant group of students become card-coun...,1,January,2020
...,...,...,...,...,...,...,...,...,...,...,...,...,...
6638,s7781,Movie,Zoo,"Shashank Arora, Shweta Tripathi, Rahul Kumar, ...",India,2018,TV-MA,94 min,"Dramas, Independent Movies, International Movies",A drug dealer starts having doubts about his t...,1,July,2018
6639,s7782,Movie,Zoom,"Tim Allen, Courteney Cox, Chevy Chase, Kate Ma...",United States,2006,PG,88 min,"Children & Family Movies, Comedies","Dragged from civilian life, a former superhero...",11,January,2020
6640,s7783,Movie,Zozo,"Imad Creidi, Antoinette Turk, Elias Gergi, Car...","Sweden, Czech Republic, United Kingdom, Denmar...",2005,TV-MA,99 min,"Dramas, International Movies",When Lebanon's Civil War deprives Zozo of his ...,19,October,2020
6641,s7784,Movie,Zubaan,"Vicky Kaushal, Sarah-Jane Dias, Raaghav Chanan...",India,2015,TV-14,111 min,"Dramas, International Movies, Music & Musicals",A scrappy but poor boy worms his way into a ty...,2,March,2019


## 2. Preprocess the dataset

In this case we are only interested in the genres of the Movies and TV series.

In [3]:
#get the columns we are interested in (copy so that new columns can be added safely later)
df_genres = df[['show_id','title','type','listed_in']].copy()
df_genres.head()

Unnamed: 0,show_id,title,type,listed_in
0,s1,3%,TV Show,"International TV Shows, TV Dramas, TV Sci-Fi &..."
1,s2,7:19,Movie,"Dramas, International Movies"
2,s3,23:59,Movie,"Horror Movies, International Movies"
3,s4,9,Movie,"Action & Adventure, Independent Movies, Sci-Fi..."
4,s5,21,Movie,Dramas


### Pre-process the dataset to one-hot encode the genres

In [6]:
#convert the genres of each row to a list
df_genres['genre'] = df_genres['listed_in'].apply(lambda x :  x.replace(' ,',',').replace(', ',',').split(','))
#collect the unique genres across all Netflix content
genres = []
for i in df_genres['genre']: genres += i
genres = sorted(list(set(genres)))
print('In total there are', len(genres), 'genres:')
for g in genres :
    print('-', g)

In total there are 42 genres:
- Action & Adventure
- Anime Features
- Anime Series
- British TV Shows
- Children & Family Movies
- Classic & Cult TV
- Classic Movies
- Comedies
- Crime TV Shows
- Cult Movies
- Documentaries
- Docuseries
- Dramas
- Faith & Spirituality
- Horror Movies
- Independent Movies
- International Movies
- International TV Shows
- Kids' TV
- Korean TV Shows
- LGBTQ Movies
- Movies
- Music & Musicals
- Reality TV
- Romantic Movies
- Romantic TV Shows
- Sci-Fi & Fantasy
- Science & Nature TV
- Spanish-Language TV Shows
- Sports Movies
- Stand-Up Comedy
- Stand-Up Comedy & Talk Shows
- TV Action & Adventure
- TV Comedies
- TV Dramas
- TV Horror
- TV Mysteries
- TV Sci-Fi & Fantasy
- TV Shows
- TV Thrillers
- Teen TV Shows
- Thrillers


In [8]:
#initialize a multilabel binarizer
mlb = MultiLabelBinarizer()
#one-hot encode the genres of each movie and tv series, keeping the row index of df_genres
genres_df2 = pd.DataFrame(mlb.fit_transform(df_genres['genre']), columns=mlb.classes_, index=df_genres.index)
#concatenate the two datasets
genres_df = pd.concat([df_genres, genres_df2], axis=1)
genres_df = genres_df.drop(['listed_in','genre'],axis=1)
genres_df.head()

Unnamed: 0,show_id,title,type,Action & Adventure,Anime Features,Anime Series,British TV Shows,Children & Family Movies,Classic & Cult TV,Classic Movies,...,TV Action & Adventure,TV Comedies,TV Dramas,TV Horror,TV Mysteries,TV Sci-Fi & Fantasy,TV Shows,TV Thrillers,Teen TV Shows,Thrillers
0,s1,3%,TV Show,0,0,0,0,0,0,0,...,0,0,1,0,0,1,0,0,0,0
1,s2,7:19,Movie,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,s3,23:59,Movie,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,s4,9,Movie,1,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,s5,21,Movie,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
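
To make the encoding concrete, here is a minimal sketch on a toy table. The titles and the `toy` DataFrame are made up for illustration; only `MultiLabelBinarizer` and the `listed_in` column name come from the notebook.

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Toy catalogue (hypothetical titles) with comma-separated genres
toy = pd.DataFrame({
    "title": ["A", "B", "C"],
    "listed_in": ["Dramas, International Movies", "Comedies", "Dramas, Comedies"],
})

# Split each genre string into a list, stripping spaces around the commas
toy["genre"] = toy["listed_in"].apply(lambda s: [g.strip() for g in s.split(",")])

# One column per genre, 1 if the title has that genre and 0 otherwise
mlb = MultiLabelBinarizer()
encoded = pd.DataFrame(
    mlb.fit_transform(toy["genre"]),
    columns=mlb.classes_,
    index=toy.index,
)

toy_encoded = pd.concat([toy[["title"]], encoded], axis=1)
print(toy_encoded)
```

Each row becomes a binary genre vector, which is the representation the recommender works with in the next section.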


In [10]:
genres_df.groupby(['type']).sum()

type,Action & Adventure,Anime Features,Anime Series,British TV Shows,Children & Family Movies,Classic & Cult TV,Classic Movies,Comedies,Crime TV Shows,Cult Movies,...,TV Action & Adventure,TV Comedies,TV Dramas,TV Horror,TV Mysteries,TV Sci-Fi & Fantasy,TV Shows,TV Thrillers,Teen TV Shows,Thrillers
Movie,700,55,0,0,466,0,95,1423,0,57,...,0,0,0,0,0,0,0,0,0,479
TV Show,0,0,134,184,0,25,0,0,339,0,...,139,449,618,65,85,71,4,46,57,0


## 3. Content based recommendation system

Now, let's look at how to implement a content-based (item-item) recommendation system. This technique tries to work out which aspects of an item a user likes most, and then recommends items that share those aspects. In our case, we try to infer the user's favourite genres from the titles they rated and the ratings they gave.
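
The idea can be sketched on toy data as follows. This is one common way to build a genre profile (a rating-weighted sum of genre vectors), not necessarily the exact formula used later in this notebook. The names `items`, `ratings`, `profile` and `scores` are hypothetical.

```python
import pandas as pd

# Toy one-hot genre table (hypothetical titles A-D)
items = pd.DataFrame(
    {"Dramas": [1, 0, 1, 0], "Comedies": [0, 1, 1, 0], "Thrillers": [0, 0, 0, 1]},
    index=["A", "B", "C", "D"],
)

# Ratings the user gave to the titles they have seen
ratings = pd.Series({"A": 5.0, "B": 1.0})

# User profile: each rated title's genre vector weighted by its rating, then summed
profile = items.loc[ratings.index].mul(ratings, axis=0).sum()

# Score unseen titles by the share of the profile weight their genres cover
candidates = items.drop(index=ratings.index)
scores = candidates.mul(profile, axis=1).sum(axis=1) / profile.sum()

recommendations = scores.sort_values(ascending=False)
print(profile)
print(recommendations)
```

Here the user rated a drama highly and a comedy poorly, so title C (drama and comedy) scores 1.0 while title D (thriller only) scores 0.0.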

### 3.1. Create the user data randomly
Let's begin by creating an input user to recommend content to. The user is generated by sampling random movies and TV shows from the Netflix dataset and assigning each a random rating.

In [43]:
n = 20
user_data = dict()

content_list = genres_df['title'].tolist()

user_data['title'] = random.sample(content_list, n)
user_data['rating'] = random.sample(np.arange(0,5.5,0.5).tolist() + np.arange(0,5.5,0.5).tolist(), n)
user_data = pd.DataFrame(data=user_data)
user_data

Unnamed: 0,title,rating
0,Loving,1.0
1,Dr. Seuss' The Grinch,2.5
2,"Crouching Tiger, Hidden Dragon: Sword of Destiny",4.5
3,Asees,0.5
4,Jane The Virgin,4.0
5,Surga Yang Tak Dirindukan 2,3.0
6,Anarkali of Aarah,1.0
7,Woody Woodpecker,2.5
8,Mike Birbiglia: The New One,4.5
9,The Saint,0.0
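
Because the user is sampled at random, re-running the cell produces a different set of titles and ratings. A minimal sketch of a reproducible variant is shown below. The helper `make_random_user` and the seed value are hypothetical. Unlike the cell above, it draws ratings with replacement via `choices`.

```python
import random

import numpy as np
import pandas as pd


def make_random_user(titles, n, seed=None):
    """Sample n distinct titles and give each a random rating from 0 to 5 in steps of 0.5."""
    rng = random.Random(seed)  # local generator, so the global random state is untouched
    rating_scale = np.arange(0, 5.5, 0.5).tolist()
    return pd.DataFrame({
        "title": rng.sample(titles, n),            # without replacement: no repeated titles
        "rating": rng.choices(rating_scale, k=n),  # with replacement: ratings may repeat
    })


# Toy title list (hypothetical); in the notebook this would be content_list
toy_titles = [f"Title {i}" for i in range(50)]
user_a = make_random_user(toy_titles, n=10, seed=42)
user_b = make_random_user(toy_titles, n=10, seed=42)
print(user_a.equals(user_b))
```

Fixing the seed keeps the later outputs consistent with the sampled user across runs.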


Now, let's add the content id (`show_id`) and type to the table.

In [47]:
#filter the catalogue to the titles the user rated
input_id = genres_df[genres_df['title'].isin(user_data['title'].tolist())]
#merge with the user's ratings on the shared 'title' column
user_data = pd.merge(input_id, user_data)
#select only the columns of interest
user_data = user_data[['show_id','title','type','rating']]
user_data

Unnamed: 0,show_id,title,type,rating
0,s330,After We Collided,Movie,5.0
1,s502,Anarkali of Aarah,Movie,1.0
2,s595,Asees,Movie,0.5
3,s1548,"Crouching Tiger, Hidden Dragon: Sword of Destiny",Movie,4.5
4,s1717,Desperados,Movie,0.0
5,s1843,Dr. Seuss' The Grinch,Movie,2.5
6,s2015,Enter the Warriors Gate,Movie,0.5
7,s2092,"Faith, Hope & Love",Movie,1.5
8,s2324,G-Force,Movie,4.0
9,s3123,Jane The Virgin,TV Show,4.0
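
With no `on` argument, `pd.merge` joins on the columns the two frames share, which here is only `title`. The sketch below makes that key explicit on toy data and adds `validate="one_to_one"`, which raises an error if a title appears more than once on either side. The `catalog` and `user` frames are hypothetical.

```python
import pandas as pd

# Toy catalogue and user ratings (hypothetical values)
catalog = pd.DataFrame({
    "show_id": ["s1", "s2", "s3"],
    "title": ["Alpha", "Beta", "Gamma"],
    "type": ["Movie", "TV Show", "Movie"],
})
user = pd.DataFrame({"title": ["Gamma", "Alpha"], "rating": [4.5, 2.0]})

# Keep only catalogue rows for titles the user rated
matched = catalog[catalog["title"].isin(user["title"])]

# Explicit join key; validate guards against duplicated titles
merged = pd.merge(matched, user, on="title", how="inner", validate="one_to_one")
merged = merged[["show_id", "title", "type", "rating"]]
print(merged)
```

The result follows the catalogue's row order rather than the order in which the user rated the titles, which is why the table above is sorted by `show_id`.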
