A recommender system is essentially a system, model, or algorithm that suggests relevant items to users. 
These items can be movies, music, and much more. Generally speaking, wherever users meet service providers, or buyers meet e-commerce platforms, recommendations are in high demand. 
In the end, decent recommendations are a win-win: users become more loyal because they get what they want, and service providers gain insight that can be turned into profit. 
You may wonder how impactful those recommendations are. In fact, the impact is massive.

According to McKinsey, recommendations play a crucial role in:

40% of app installs on Google Play
60% of watch time on YouTube
35% of purchases on Amazon
75% of movies watched on Netflix

So I think it's not a bad idea to build one, right?

In [None]:
%pip install colorama


In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
import matplotlib.image as mpimg
from sklearn.impute import SimpleImputer

import math
import plotly.express as px
import plotly.graph_objects as go
import missingno as msno
from matplotlib.offsetbox import (TextArea, DrawingArea, OffsetImage,
                                  AnnotationBbox)
from plotly.colors import n_colors
from plotly.subplots import make_subplots
from IPython.display import Image
from colorama import Fore, Back, Style
m_ = Fore.MAGENTA
b_ = Fore.BLUE
y_ = Fore.YELLOW
r_ = Fore.RED
g_ = Fore.GREEN

sr_ = Style.RESET_ALL

Exploratory Data Analysis

First things first: EDA gives us a sense of the data we are dealing with. 
It is also useful for drawing out insights and information, and for spotting mistakes in the data.

In [None]:
data = pd.read_csv(r"C:\Users\vivih\Downloads\MoviesOnStreamingPlatforms_updated.csv")
data_tv = pd.read_csv(r"C:\Users\vivih\Downloads\tv_shows.csv")

In [None]:
data.head()

In [None]:
#Remove "Unnamed" 
data=data.drop("Unnamed: 0",axis=1)
data.head()

In [None]:
data.describe(include='all')

In [None]:
data.info()

In [None]:
#Check the total number of null values in each column 
data.isna().sum()

In [None]:
#Dropping the rows where the target value ("IMDb") is missing
data = data[data['IMDb'].notna()]

In [None]:
#Visualizing the amount of missing data
msno.bar(data ,color='grey', figsize=(10, 4))

In [None]:
#Dropping "Rotten Tomatoes" because too many of its values are missing 
data = data.drop(['Rotten Tomatoes'], axis=1)

#"Title" and "Type" are dropped as they are irrelevant for our prediction.
data = data.drop(['Title','Type'], axis=1)
data.isna().sum()

In [None]:
#Checking unique values in "Age" 
data['Age'].unique()

In [None]:
#Imputing null values in "Age" with "all"
imputer = SimpleImputer(missing_values=np.nan, strategy='constant', fill_value="all")
data["Age"] = imputer.fit_transform(data[["Age"]]).ravel()

data.isna().sum()
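
For a single text column, the same constant fill can also be written with pandas alone. This is a minimal sketch on a made-up frame; the `toy` DataFrame and its values are assumptions for illustration, not rows from the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the real "Age" column.
toy = pd.DataFrame({"Age": ["18+", np.nan, "7+", np.nan]})

# Replace every missing value with the constant label "all".
toy["Age"] = toy["Age"].fillna("all")

# No nulls should remain in the column after the fill.
print(toy["Age"].tolist())
print(toy["Age"].isna().sum())
```

`SimpleImputer` is the better fit when the same fill has to be learned once and reapplied to new data inside a scikit-learn pipeline; for a one-off cleanup like this one, `fillna` is shorter.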

In [None]:
#Dropping all the rows (entries) that contain cells with no data 
data.dropna(axis=0, how='any',inplace=True)

data.isna().sum()

In [None]:
#Visualizing whether any missing data remains
msno.bar(data ,color='grey', figsize=(10, 4))

In [None]:
data.info()

In [None]:
data_tv.head()

In [None]:
#Remove "Unnamed" 
data_tv=data_tv.drop("Unnamed: 0",axis=1)
data_tv.head()

In [None]:
data_tv.info()

In [None]:
data_tv.isna().sum()

In [None]:
#Dropping all the rows (entries) that contain cells with no data 
data_tv.dropna(axis=0, how='any',inplace=True)

data_tv.isna().sum()

In [None]:
#Visualizing whether any missing data remains
msno.bar(data_tv ,color='grey', figsize=(10, 4))

Feature Generation

In [None]:
def splitting(dataframe,col):
    result = dataframe[col].str.get_dummies(',')
    print('Done!')
    return result

In [None]:
m_genres = splitting(data,'Genres')
m_lang = splitting(data,'Language')
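
To see what `splitting` produces, here is a small sketch of `Series.str.get_dummies` on made-up rows. The `toy` frame and its genre labels are assumptions for illustration; the real "Genres" column is treated here as comma-separated labels, as the code above implies.

```python
import pandas as pd

# Hypothetical toy data; each cell holds comma-separated genre labels.
toy = pd.DataFrame({"Genres": ["Action,Comedy", "Drama", "Comedy,Drama"]})

# One indicator column per distinct label, 1 where the row contains that label.
dummies = toy["Genres"].str.get_dummies(",")

print(dummies.columns.tolist())
# Column sums give the number of titles carrying each genre.
print(dummies.sum().to_dict())
```

Each title can carry several labels, so a row may have more than one 1, and the column sums can add up to more than the number of rows.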

TV Shows and Movies per Platform

In [None]:
def val_sum(data,c):
    return data[c].sum(axis=0)

In [None]:
val_counts = []
datas = [data,data_tv]
cols = ['Netflix','Hulu','Prime Video','Disney+']

for x in datas:
    for y in cols:
        val_counts.append(val_sum(x,y))

In [None]:
val_counts
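
Because `val_counts` is a flat list, positions 0 to 3 hold the movie counts and positions 4 to 7 the TV show counts, in the order of `cols`. The sketch below shows one way to keep the labels attached. It assumes the platform columns are 0/1 flags, which is what summing them implies; the toy frames are made up.

```python
import pandas as pd

cols = ["Netflix", "Hulu", "Prime Video", "Disney+"]

# Hypothetical toy frames with 0/1 availability flags per platform.
movies = pd.DataFrame([[1, 0, 1, 0], [0, 0, 1, 0], [1, 1, 0, 0]], columns=cols)
shows = pd.DataFrame([[1, 0, 0, 0], [1, 1, 0, 1]], columns=cols)

# Summing 0/1 flags column-wise counts the titles available on each platform.
counts = pd.DataFrame({"Movies": movies[cols].sum(), "TV shows": shows[cols].sum()})

print(counts)
```

With a labeled table like `counts`, the values for a chart can be looked up by platform name instead of by list position.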

Custom Color Palette

In [None]:
custom_colors = ["#ff6B89","#95d5ad","#a2bbff","#c6f8f1"]
#set_palette returns None, so keep the palette object itself in customPalette
customPalette = sns.color_palette(custom_colors)
sns.set_palette(customPalette)

In [None]:
sns.palplot(sns.color_palette(custom_colors),size=1)

In [None]:
netflix_p = sns.light_palette(custom_colors[0], reverse=True)
sns.palplot(sns.color_palette(netflix_p),size=1)

In [None]:
hulu_p = sns.light_palette(custom_colors[1], reverse=True)
sns.palplot(sns.color_palette(hulu_p),size=1)

In [None]:
prime_p = sns.light_palette(custom_colors[2], reverse=True)
sns.palplot(sns.color_palette(prime_p),size=1)

In [None]:
disney_p = sns.dark_palette(custom_colors[3], reverse=True)
sns.palplot(sns.color_palette(disney_p),size=1)

In [None]:
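def donut(i,data,sizes,title):
    #Draws one donut chart in subplot position i (e.g. 121)
    #Relies on the global names explode, labels and colors; the data argument is unused
    plt.subplot(i)
    plt.pie(sizes, explode=explode, labels=labels, colors=colors,
                autopct='%1.1f%%', shadow=True)

    #A white circle in the centre turns the pie into a donut
    centre_circle = plt.Circle((0,0),0.5,color='black', fc='white',linewidth=1.25)
    fig = plt.gcf()
    fig.gca().add_artist(centre_circle)
    plt.title(title)
    plt.axis('equal')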

In [None]:
#plt.subplots would return a (figure, axes) tuple and add an extra empty axes
fig = plt.figure(figsize=(16, 8))
labels = 'Netflix', 'Hulu','Prime','Disney+'
sizes1 = [val_counts[0], val_counts[1],val_counts[2],val_counts[3]]
sizes2 = [val_counts[4], val_counts[5],val_counts[6],val_counts[7]]
colors = custom_colors
explode = (0, 0, 0, 0) 

donut(121,data,sizes1,'Movies')
donut(122,data_tv,sizes2,'TV shows')
plt.show()

Prime Video has the most movies and Netflix has the most TV shows!

In [None]:
fig = make_subplots(rows=1, cols=2)

fig.add_trace(go.Indicator(
    mode = "number",
    value = 4113,
    number={'font':{'color': custom_colors[2],'size':100}},
    title = {"text": "Movies<br><span style='font-size:0.8em;color:gray'>On Prime Video</span>"},
    domain = {'x': [0, 0.5], 'y': [0.6, 1]}))

fig.add_trace(go.Indicator(
    mode = "number",
    value = 1971,
    number={'font':{'color': custom_colors[0],'size':100}},
    title = {"text": "TV Shows<br><span style='font-size:0.8em;color:gray'>On Netflix</span>"},
    domain = {'x': [0.5, 1], 'y': [0, 0.4]}))

fig.show()