In [None]:
import numpy as np
import pandas as pd
import seaborn as sb
import matplotlib.pyplot as plt
from numpy import mean
plt.style.use('ggplot')

In [None]:
data = pd.read_csv('../input/netflix-shows/netflix_titles_nov_2019.csv')
data.head()

In [None]:
data.columns

In [None]:
data.isnull().sum()

In [None]:
data.nunique()

In [None]:
data.dtypes

Let's look at how TV Shows and Movies are represented in our dataset...

In [None]:
sb.countplot(x="type", data=data)

As shown, there are roughly twice as many movies as TV shows in our dataset.
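
To put a number on the bar chart above, the share of each type can be computed directly. This is a minimal sketch: `type_share` is a helper name introduced here, and `toy` is a made-up three-row frame used only to show the call. In the notebook you would pass `data` instead.

```python
import pandas as pd

def type_share(frame: pd.DataFrame) -> pd.Series:
    """Return the fraction of rows for each value of the 'type' column."""
    return frame['type'].value_counts(normalize=True)

# Made-up frame for illustration only; replace with `data` in the notebook.
toy = pd.DataFrame({'type': ['Movie', 'Movie', 'TV Show']})
shares = type_share(toy)
print(shares)
```

A Movie share close to 0.67 would match the "twice as many" reading of the chart.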

I want to see how the ratings of TV shows compare to those of movies. Let's start with a pie chart of all the ratings in our dataset.

In [None]:
plt.figure(figsize=(20,20))
data['rating'].value_counts().plot.pie(autopct="%1.1f%%")

Now let's look at a pie chart of the ratings for TV shows only, followed by one for movies only.

In [None]:
plt.figure(figsize=(10,10))
tv = data.loc[data['type']=="TV Show"]
movie = data.loc[data['type']=="Movie"]
tv['rating'].value_counts().plot.pie(autopct="%1.1f%%", label="TV Shows")


In [None]:
plt.figure(figsize=(10,10))
movie['rating'].value_counts().plot.pie(autopct="%1.1f%%", label="Movies")

Based on these two pie charts, it's still hard to compare the two. Seaborn's catplot paints a clearer picture.

In [None]:
sb.catplot(x="type", col="rating", kind="count", col_wrap=3, data=data)
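
The same comparison can also be read off a table. The sketch below uses `pd.crosstab` with `normalize='columns'`, so each column (one per type) sums to 1 and the ratings become directly comparable despite the different group sizes. `rating_share_by_type` and the `toy` frame are names and data made up for illustration; in the notebook you would pass `data`.

```python
import pandas as pd

def rating_share_by_type(frame: pd.DataFrame) -> pd.DataFrame:
    """Share of each rating within each type (every column sums to 1)."""
    return pd.crosstab(frame['rating'], frame['type'], normalize='columns')

# Made-up frame for illustration only; replace with `data` in the notebook.
toy = pd.DataFrame({
    'type':   ['Movie', 'Movie', 'TV Show', 'TV Show'],
    'rating': ['R', 'TV-MA', 'TV-MA', 'TV-14'],
})
table = rating_share_by_type(toy)
print(table.round(2))
```

Normalizing per type matters here because the raw counts in the catplot are dominated by movies, which outnumber TV shows.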

Next, I want to look at which countries contribute most to the releases on Netflix.
Here is a bar graph of the 10 countries with the most releases...

In [None]:
plt.figure(figsize=(15,10))
data['country'].value_counts().nlargest(10).plot.bar()
# Label the axis after plotting so the plot call cannot overwrite it
plt.ylabel("Number of Releases")

As shown, the US leads in number of releases by a large margin, followed by India and the UK.
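
One caveat about the counts above: `value_counts()` treats each distinct string in `country` as its own category. Assuming some rows list several countries in one comma-separated string (worth checking with `data['country'].unique()`), co-productions would be counted under the combined string rather than under each country. A rough sketch of a per-country count under that assumption, with `count_countries` and `toy` being illustrative names and data:

```python
import pandas as pd

def count_countries(frame: pd.DataFrame) -> pd.Series:
    """Count releases per country, splitting comma-separated entries."""
    countries = (
        frame['country']
        .dropna()            # skip rows with no country
        .str.split(',')      # "A, B" -> ["A", " B"]
        .explode()           # one row per listed country
        .str.strip()         # remove the leftover spaces
    )
    countries = countries[countries != '']  # drop empties from trailing commas
    return countries.value_counts()

# Made-up frame for illustration only; replace with `data` in the notebook.
toy = pd.DataFrame({'country': ['United States', 'United States, India', 'India', None]})
counts = count_countries(toy)
print(counts)
```

With this counting, a title credited to two countries contributes once to each of them.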

It'd be interesting to see the proportion of TV show vs. movie releases among these countries.
Let's set the hue to 'type' on this graph to show this...

In [None]:
top10 = data['country'].value_counts().nlargest(10)
plt.figure(figsize=(20,15))
df = data[data['country'].isin(top10.index)]
sb.countplot(x='country', hue='type', data=df)
plt.xticks(fontsize=16)
plt.yticks(fontsize=16)
plt.xlabel("Country", fontsize=20)
plt.ylabel("Number of Releases", fontsize=20)

One interesting observation from this chart is India. As shown, India has a highly disproportionate number of Movie releases compared to TV Show releases, most likely due to Bollywood, the famous Hindi cinema industry. Another observation is the TV industry in Japan and South Korea, where TV Shows make up a much larger share of releases. I did a little research and learned that a TV genre known as "Korean Drama" has gained worldwide popularity, and the story is similar for Japanese Anime.
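
The imbalance described above can be put in numbers by counting titles per country and type and then dividing. This is only a sketch: `movie_share_by_country` and the `toy` frame are made up for illustration, and the function assumes the `type` column contains the value 'Movie'.

```python
import pandas as pd

def movie_share_by_country(frame: pd.DataFrame) -> pd.DataFrame:
    """Counts per (country, type) plus the fraction of titles that are movies."""
    counts = frame.groupby(['country', 'type']).size().unstack(fill_value=0)
    # The row total is computed before the new column is attached
    counts['movie_share'] = counts['Movie'] / counts.sum(axis=1)
    return counts.sort_values('movie_share', ascending=False)

# Made-up frame for illustration only; in the notebook, pass the top-10 subset.
toy = pd.DataFrame({
    'country': ['India'] * 4 + ['Japan'] * 3,
    'type': ['Movie', 'Movie', 'Movie', 'TV Show', 'Movie', 'TV Show', 'TV Show'],
})
share = movie_share_by_country(toy)
print(share)
```

A `movie_share` near 1 means a country's catalogue is almost entirely movies; values well below 0.5 would point to TV-heavy countries.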

Based on this, I thought it would be interesting to look at the average length of movies amongst these countries...

In [None]:
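plt.figure(figsize=(20,15))
# Copy the subsets so the column assignment below does not raise SettingWithCopyWarning
df = data.loc[data['type'] == 'Movie'].copy()
top10 = df['country'].value_counts().nlargest(10)
df = df[df['country'].isin(top10.index)].copy()
# Remove the " min" suffix, then convert the remaining digits to a number
df['duration'] = pd.to_numeric(df['duration'].str.replace(' min', '', regex=False))
# ci=None turns off the error bars (seaborn 0.12 and later prefer errorbar=None)
sb.barplot(x='country', y='duration', estimator=mean, data=df, ci=None)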
plt.xticks(fontsize=16)
plt.yticks(fontsize=16)
plt.xlabel("Country", fontsize=20)
plt.ylabel("Average Movie Length (Min)", fontsize=20)

Not only does India have a disproportionately large number of Movie releases, but its movies are also longer on average than those of any other country in the top 10. Curiously, Turkey comes second...
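
To check the ranking without reading bar heights, the averages can be printed as a sorted table. A minimal sketch, assuming movie durations look like "90 min"; `mean_movie_minutes` and the `toy` frame are illustrative names and data, not part of the dataset.

```python
import pandas as pd

def mean_movie_minutes(frame: pd.DataFrame) -> pd.Series:
    """Average movie length in minutes per country, longest first."""
    movies = frame.loc[frame['type'] == 'Movie'].copy()
    # Pull out the leading digits ("120 min" -> "120"); non-matches become NaN
    digits = movies['duration'].str.extract(r'(\d+)', expand=False)
    movies['minutes'] = pd.to_numeric(digits, errors='coerce')
    return movies.groupby('country')['minutes'].mean().sort_values(ascending=False)

# Made-up frame for illustration only; replace with `data` in the notebook.
toy = pd.DataFrame({
    'type': ['Movie', 'Movie', 'Movie', 'TV Show'],
    'country': ['India', 'India', 'Turkey', 'India'],
    'duration': ['120 min', '140 min', '110 min', '2 Seasons'],
})
means = mean_movie_minutes(toy)
print(means)
```

Called on the full dataset, this covers every country, so filtering to the top 10 first (as in the cell above) keeps the comparison to countries with enough movies for the average to mean something.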