# **TV shows on Netflix, Prime Video, Hulu and Disney+: A Data Analysis**
## **Content**

The scraped dataset is a comprehensive list of TV shows available on various streaming platforms.

## **Aim**
To find the best streaming platform among Netflix, Prime Video, Hulu and Disney+

## **Inspiration**

1) Which streaming platform(s) can I find this TV show on?

2) What is the IMDb rating of a TV show?

3) Target age group of TV shows vs. the streaming platform they can be found on

4) The year in which a TV show was produced and the streaming platform it can be found on
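
The first question can be answered with a small lookup over the 0/1 platform columns. The sketch below is illustrative only: `platforms_for` is a hypothetical helper, and the sample rows are made up so the block runs without the CSV file. It assumes each platform column holds 1 when a show is available there, as in `tv_shows.csv`.

```python
import pandas as pd

PLATFORMS = ['Netflix', 'Hulu', 'Prime Video', 'Disney+']

# Made-up rows for illustration; the real data comes from tv_shows.csv
sample = pd.DataFrame({
    'Title': ['Show A', 'Show B', 'Show C'],
    'Netflix': [1, 0, 1],
    'Hulu': [0, 1, 1],
    'Prime Video': [0, 0, 1],
    'Disney+': [0, 0, 0],
})


def platforms_for(data, title):
    """Return the platforms flagged with 1 for the first row matching title."""
    rows = data[data['Title'].str.lower() == title.lower()]
    if rows.empty:
        return []
    flags = rows.iloc[0][PLATFORMS]
    return [p for p in PLATFORMS if flags[p] == 1]


print(platforms_for(sample, 'show c'))
```

The match is case-insensitive and returns an empty list for unknown titles; partial matching would need something like `str.contains` instead.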

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
import opendatasets as od
dataset_url = 'https://www.kaggle.com/ruchi798/tv-shows-on-netflix-prime-video-hulu-and-disney'
od.download(dataset_url)

In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

%matplotlib inline

In [None]:
df = pd.read_csv('/content/drive/MyDrive/project eda/tv_shows.csv')
df.head()

# Understanding the data

In [None]:
df.shape

In [None]:
df.info()

In [None]:
df['type'].value_counts()

## **Observation**
1) Data is missing in the IMDb, Age and Rotten Tomatoes columns

2) The data contains only a single type of entry, TV shows
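
`df.info()` shows the non-null counts, but the amount of missing data can also be tabulated directly. The following is a minimal sketch on a made-up frame (`toy` and its values are assumptions, not rows from the dataset) that counts missing values per column and expresses them as a percentage.

```python
import numpy as np
import pandas as pd

# Made-up frame with the same column names as the columns discussed above
toy = pd.DataFrame({
    'Title': ['A', 'B', 'C', 'D'],
    'Age': ['16+', None, '7+', None],
    'IMDb': [8.1, np.nan, 6.5, 7.0],
    'Rotten Tomatoes': [None, None, None, '90%'],
})

# isna() marks missing cells; sum() counts them, mean() gives their share
missing = pd.DataFrame({
    'missing': toy.isna().sum(),
    'percent': (toy.isna().mean() * 100).round(1),
})
print(missing)
```

Running the same two expressions on `df` would give the figures behind the observation above.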

# Filling the Missing Data

In [None]:
df['Age'].value_counts()

Most TV shows are rated 16+, so the missing values are assumed to be 16+ as well

In [None]:
# Assign the result back; fillna(inplace=True) on a selected column is deprecated in recent pandas
df['Age'] = df['Age'].fillna('16+')
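
The most frequent category does not have to be hard-coded. As a sketch, assuming the same idea of filling with the most common value, the mode can be computed from the column itself. The `ages` series here is made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up age ratings with two missing entries
ages = pd.Series(['16+', '7+', np.nan, '16+', 'all', np.nan], name='Age')

# mode() ignores missing values and returns the most frequent label(s)
most_common = ages.mode().iloc[0]
filled = ages.fillna(most_common)

print(most_common)
print(filled.value_counts())
```

Note that this kind of fill inflates the share of the most common category, which matters when age groups are compared later.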


In [None]:
df['IMDb'].describe().to_frame()

The 50th percentile (median) of IMDb is 7.3, so the missing ratings are filled with that value

In [None]:
# Assign the result back; fillna(inplace=True) on a selected column is deprecated in recent pandas
df['IMDb'] = df['IMDb'].fillna(7.3)
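
A median fill can also be computed rather than typed in, and it is worth checking how much of the column ends up imputed. The sketch below uses a made-up `ratings` series (an assumption, not dataset values) to show the fill and its side effect on the spread.

```python
import numpy as np
import pandas as pd

# Made-up ratings with two missing entries
ratings = pd.Series([8.0, 7.0, np.nan, 6.0, np.nan], name='IMDb')

median = ratings.median()              # computed on the non-missing values
share_imputed = ratings.isna().mean()  # fraction of rows that will be filled
filled = ratings.fillna(median)

print(median, share_imputed)
print(ratings.std(), filled.std())     # the spread shrinks after the fill
```

Because every filled row gets the same value, the standard deviation drops and averages are pulled toward the median, which is worth keeping in mind when platforms are compared by rating.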


1) There is a large number of null values in **'Rotten Tomatoes'**, hence dropping the column

2) There is only one type of data, TV shows, hence dropping the type column

In [None]:
df.drop('Rotten Tomatoes', axis = 1,inplace=True)


In [None]:
df.drop('type',axis = 1, inplace = True)

# Overview of the Data

In [None]:
df.info()

In [None]:
df.describe()

**Observations**

1) This dataset contains TV shows from 1901 to 2020

2) The highest IMDb rating is 9.6 and the lowest is 1.0

3) 38.21% of the TV shows are on Prime Video
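
The percentage in the last observation comes from the `mean` row of `df.describe()`: for a column of 0/1 flags, the mean is the share of rows set to 1. A small sketch with made-up flags (`toy` is an assumption) shows the relationship.

```python
import pandas as pd

# Made-up availability flags: 1 means the show is on that platform
toy = pd.DataFrame({
    'Netflix': [1, 0, 1, 0],
    'Hulu': [0, 1, 0, 0],
    'Prime Video': [1, 1, 1, 0],
    'Disney+': [0, 0, 0, 1],
})

# Mean of a 0/1 column equals the fraction of ones
share = (toy.mean() * 100).round(2)
print(share)

# The same number appears in the 'mean' row of describe()
print(toy.describe().loc['mean'])
```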


# **Analytical Approach : A Detailed data analysis**

In [None]:
import matplotlib

sns.set_style('dark')
matplotlib.rcParams['font.size'] = 14
matplotlib.rcParams['figure.figsize'] = (9, 5)
matplotlib.rcParams['figure.facecolor'] = '#00000000'

In [None]:
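# distplot is deprecated in seaborn; histplot with a KDE overlay is a close replacement
sns.histplot(df['Year'], kde=True, stat='density')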

**Observation**

From the beginning of the 21st century there is a tremendous increase in the number of TV shows
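
The trend read off the plot can be backed with numbers by grouping years into decades. This is a sketch on made-up years (the `years` series is an assumption); applied to `df['Year']` it would quantify the increase.

```python
import pandas as pd

# Made-up production years
years = pd.Series([1955, 1998, 2003, 2015, 2018, 2019], name='Year')

# Integer division maps each year to the start of its decade
decade = (years // 10) * 10
per_decade = decade.value_counts().sort_index()

# Share of shows produced in 2000 or later
share_since_2000 = (years >= 2000).mean()

print(per_decade)
print(round(share_since_2000, 3))
```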


In [None]:
top_imdb_rated = df[['Title','IMDb']]
top_imdb_rated = top_imdb_rated.sort_values(by = 'IMDb',ascending=False)[:10]
top_imdb_rated

In [None]:
sns.barplot(x='IMDb', y = 'Title',data = top_imdb_rated)

In [None]:
xtop_imdb_rated = df[['Title','IMDb']]
xtop_imdb_rated = xtop_imdb_rated.sort_values(by = 'IMDb', ascending = True)[:10]
xtop_imdb_rated

**Observation**

1) *Destiny* is the top IMDb rated TV show, with a rating of 9.6

2) *Be with you* is the lowest IMDb rated TV show, with a rating of 1.0
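
Sorting the whole frame and slicing works, but pandas also offers `nlargest` and `nsmallest` for this kind of top-N query. A minimal sketch on made-up titles and ratings (`shows` is an assumption):

```python
import pandas as pd

# Made-up titles and ratings
shows = pd.DataFrame({
    'Title': ['A', 'B', 'C', 'D'],
    'IMDb': [9.1, 4.2, 7.3, 1.5],
})

top = shows.nlargest(2, 'IMDb')      # highest ratings first
bottom = shows.nsmallest(2, 'IMDb')  # lowest ratings first

print(top)
print(bottom)
```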

In [None]:
age = df['Age'].value_counts()
age

In [None]:
# A Series pie chart needs no x argument; the index supplies the labels
age.plot.pie(autopct='%1.2f%%')

In [None]:
platforms = df[['Netflix', 'Hulu', 'Prime Video', 'Disney+']].apply(pd.Series.value_counts).reset_index()
platforms = platforms.T
platforms.drop('index',inplace=True)
platforms

In [None]:
platforms.plot.bar()

**Observation**

A large number of TV shows are available on Prime Video (as mentioned earlier): 2144 TV shows, to be precise
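
The per-platform counts can also be obtained by summing the 0/1 columns. The sketch below uses made-up flags (`toy` is an assumption) and additionally counts shows listed on more than one platform, since such shows are counted once per platform.

```python
import pandas as pd

PLATFORMS = ['Netflix', 'Hulu', 'Prime Video', 'Disney+']

# Made-up availability flags
toy = pd.DataFrame({
    'Netflix': [1, 0, 1, 0, 0],
    'Hulu': [0, 1, 1, 0, 0],
    'Prime Video': [1, 1, 1, 1, 0],
    'Disney+': [0, 0, 0, 0, 1],
})

# Column sums give the number of shows per platform
counts = toy[PLATFORMS].sum().sort_values(ascending=False)

# Row sums above 1 mark shows available on several platforms
on_several = int((toy[PLATFORMS].sum(axis=1) > 1).sum())

print(counts)
print(on_several)
```

In this toy frame the platform counts add up to more than the number of rows, which is why the totals should not be read as a partition of the catalogue.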

In [None]:
ratingdic = {}
for i in ['Netflix', 'Hulu', 'Prime Video', 'Disney+']:
  ratingdic['r_'+i] = (df[df[i]==1].IMDb.sum())/(df[df[i]==1][i].sum())


In [None]:
rating = pd.DataFrame.from_dict(ratingdic,orient='index',columns=['Rating'])
rating 

In [None]:
sns.barplot(x='Rating', y =rating.index, data = rating)

In [None]:
rating.plot.line()

**Observation**

Prime Video has the highest average IMDb rating, followed by Netflix, Hulu and Disney+
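
The quantity compared here is the mean IMDb rating of the shows available on each platform. As a sketch on made-up data (`toy` and its values are assumptions), the same figure can be computed with `.mean()` and reported next to the number of shows it is based on.

```python
import pandas as pd

PLATFORMS = ['Netflix', 'Hulu', 'Prime Video', 'Disney+']

# Made-up ratings and availability flags
toy = pd.DataFrame({
    'IMDb': [8.0, 6.0, 7.0, 9.0],
    'Netflix': [1, 0, 1, 0],
    'Hulu': [0, 1, 1, 0],
    'Prime Video': [1, 0, 0, 1],
    'Disney+': [0, 1, 0, 0],
})

# One row per platform: number of shows and their mean rating
summary = pd.DataFrame({
    p: {
        'shows': int(toy[p].sum()),
        'mean_imdb': toy.loc[toy[p] == 1, 'IMDb'].mean(),
    }
    for p in PLATFORMS
}).T.sort_values('mean_imdb', ascending=False)

print(summary)
```

Showing the count alongside the mean helps judge how much weight each average deserves, especially for a small catalogue.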

In [None]:
platform_agedict = {}
for i in ['Netflix', 'Hulu', 'Prime Video', 'Disney+']:
  platform_agedict['a_'+i] = df[df[i]==1].Age.value_counts()
platform_agedict

In [None]:
platform_age = pd.DataFrame.from_dict(platform_agedict,orient='index')
platform_age 

In [None]:
platform_age.info()

In [None]:
platform_age.fillna(0,inplace = True)
platform_age

In [None]:
platform_age.plot.bar()

In [None]:
# platform_aget was used below without being defined; it is assumed to be the
# transpose of platform_age, so each platform is a column and each age group a row
platform_aget = platform_age.T

fig = plt.figure()

ax1 = fig.add_axes([0, 0, 0.5, 0.5], aspect = 1)
ax1.set_title('Age of Disney+')
ax1.pie(platform_aget['a_Disney+'], labels = platform_aget.index,autopct='%1.1f%%')

ax2 = fig.add_axes([0.3, 0, 0.5, 0.5], aspect = 1)
ax2.set_title('Age of Hulu')
ax2.pie(platform_aget['a_Hulu'], labels = platform_aget.index,autopct='%1.1f%%')

ax3 = fig.add_axes([0.6, 0, 0.5, 0.5], aspect = 1)
ax3.set_title('Age of Netflix')
ax3.pie(platform_aget['a_Netflix'], labels = platform_aget.index,autopct='%1.1f%%')

ax4 = fig.add_axes([0.9, 0, 0.5, 0.5], aspect = 1)
ax4.set_title('Age of Prime Video')
ax4.pie(platform_aget['a_Prime Video'], labels = platform_aget.index,autopct='%1.1f%%')

**Observation**

1) Most TV shows on Disney+ are accessible to all ages, which makes it a good platform for kids

2) Hulu is a good streaming platform for children above the age of 7

3) Netflix and Prime Video, in contrast, mostly offer adult content
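
Raw counts per age group depend on how large each catalogue is, so the comparison is easier to read as shares within each platform. The following sketch uses made-up rows and only two platforms for brevity (`toy` is an assumption); `value_counts(normalize=True)` returns proportions that sum to 1 per platform.

```python
import pandas as pd

# Made-up age ratings and availability flags
toy = pd.DataFrame({
    'Age': ['all', 'all', '7+', '16+', '16+', '18+'],
    'Netflix': [0, 0, 0, 1, 1, 1],
    'Disney+': [1, 1, 1, 0, 0, 0],
})

# One column per platform, one row per age group, values are proportions
age_share = pd.DataFrame({
    p: toy.loc[toy[p] == 1, 'Age'].value_counts(normalize=True)
    for p in ['Netflix', 'Disney+']
}).fillna(0)

print(age_share.round(2))
```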

In [None]:
yearwise = {}
for i in ['Netflix', 'Hulu', 'Prime Video', 'Disney+']:  
  yearwise[i+'_year'] = df[df[i] == 1]['Year'].value_counts()

dfyearwise = pd.DataFrame.from_dict(yearwise,orient='index')
dfyearwise = dfyearwise.T

In [None]:
dfyearwise.fillna(0,inplace = True)
dfyearwise.info()

In [None]:
dfyearwise

In [None]:
dfyearwise['Netflix_year'].max()

In [None]:
dfyearwise['Hulu_year'].max()

In [None]:
dfyearwise['Prime Video_year'].max()

In [None]:
dfyearwise['Disney+_year'].max()

**Observation**

1) Prime Video has the largest number of TV shows released in a single year, followed by Netflix.

2) In terms of new releases, Hulu and Disney+ show a poor performance.
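
The `max()` calls give the largest yearly count per platform but not the year it belongs to. As a sketch, assuming a table shaped like `dfyearwise` (years as index, one column per platform), `idxmax()` returns that year. The counts in `toy_yearwise` are made up.

```python
import pandas as pd

# Made-up yearly counts, indexed by year, one column per platform
toy_yearwise = pd.DataFrame({
    'Netflix_year': {2017: 3, 2018: 5, 2019: 4},
    'Hulu_year': {2017: 2, 2018: 1, 2019: 3},
})

# idxmax() gives the index label (year) of each column's maximum
peak = pd.DataFrame({
    'peak_year': toy_yearwise.idxmax(),
    'shows': toy_yearwise.max(),
})
print(peak)
```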

# **Conclusion**
1) The dataset covers TV shows from 1901 to 2020, with a tremendous increase in the number of TV shows from the beginning of the 21st century

2) Prime Video is the highest rated streaming platform. It streams mostly adult content (16+, to be precise), along with some kids' content.

3) Following Prime Video, Netflix is the second highest rated. Most of the TV shows on Netflix are also adult content, but a few are accessible to everyone

4) Hulu is an average rated streaming platform, which streams kids' TV shows

5) Disney+ is a poorly rated streaming platform, which mostly streams content accessible to everyone.

In short, all the streaming platforms have pros and cons, but the TV shows rated highest by the public are mostly streamed by Prime Video