
---

<h1 style="text-align: center;font-size: 40px;">Netflix Data Analysis & Visualization</h1>

---

<center><img src="https://media.giphy.com/media/xRItaX70EufZK/giphy.gif"></center>

---

>><h4>Netflix, Inc. is an American media-services provider and production company headquartered in Los Gatos, California, founded in 1997 by Reed Hastings and Marc Randolph in Scotts Valley, California. The company's primary business is its subscription-based streaming service, which offers online streaming of a library of films and television programs, including those produced in-house. As of April 2020, Netflix had over 193 million paid subscriptions worldwide, including 73 million in the United States. It is available worldwide except in the following: mainland China (due to local restrictions), Iran, Syria, North Korea, and Crimea (due to U.S. sanctions). The company also has offices in France, the United States, the United Kingdom, Brazil, the Netherlands, India, Japan, and South Korea. Netflix is a member of the Motion Picture Association of America (MPAA). Today, the company produces and distributes content from countries all over the globe.</h4>

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 5GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

In [None]:
import pandas_profiling
import missingno as msno
import matplotlib.pyplot as plt
import plotly.express as px
import plotly.io as pio
import seaborn as sns


In [None]:
df = pd.read_csv("/kaggle/input/netflix-shows/netflix_titles.csv")
df.head()

In [None]:
df.info()

In [None]:
profile = pandas_profiling.ProfileReport(df)
profile

In [None]:
n = msno.bar(df,color='purple')

><div class="alert alert-block alert-info">
<b></b> The director, cast, country, rating, and date_added columns contain null values, so let's deal with them before visualizing.
</div>
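
The missing-value bar chart can be backed up with exact numbers. The sketch below counts nulls per column and reports them as a share of all rows. It runs on a small made-up frame (`toy` is hypothetical, not the Netflix data); the same two lines should apply to `df` unchanged.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the Netflix data (values are made up for illustration)
toy = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "director": [np.nan, "X", np.nan, np.nan],
    "country": ["India", np.nan, "United States", "Japan"],
})

# Count nulls per column, keep only columns that have any, largest first
missing = toy.isnull().sum()
missing = missing[missing > 0].sort_values(ascending=False)

# Share of rows that are missing, as a percentage
missing_pct = (missing / len(toy) * 100).round(1)
print(pd.DataFrame({"missing": missing, "percent": missing_pct}))
```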

><div class="alert alert-block alert-info">
<b></b> We don't need the director and cast columns for these visualizations, and since both have missing values, we can drop them.
</div>

In [None]:
df.drop(["director","cast"],axis =1,inplace=True)
df.head()

In [None]:
df['country'].value_counts()

><h4>The country column is important for our visualization. Since it has some null values, we can replace them with United States, because the United States has the largest number of shows and Netflix was also founded there.</h4>
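
Filling with the most frequent value can also be done without hard-coding that value. This is a minimal sketch, assuming a hypothetical helper named `fill_with_mode` and a made-up series; it is not part of the original notebook.

```python
import numpy as np
import pandas as pd

def fill_with_mode(series: pd.Series) -> pd.Series:
    """Return a copy with missing values replaced by the most frequent value."""
    most_common = series.mode(dropna=True)
    if most_common.empty:  # every value is missing, so there is nothing to fill with
        return series.copy()
    return series.fillna(most_common.iloc[0])

# Made-up sample values for illustration
toy_country = pd.Series(["United States", "India", np.nan, "United States", np.nan])
filled = fill_with_mode(toy_country)
print(filled.tolist())
```

The same helper could be applied to any categorical column with a few gaps, such as rating. Note that mode imputation inflates the count of the most common category, so later country totals are slightly biased toward it.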

In [None]:
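# Assign the result back instead of using inplace=True on a column selection,
# which is unreliable in recent pandas versions
df['country'] = df['country'].fillna("United States")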

In [None]:
df['date_added'].value_counts()

><h4>Since we already have release_year, we don't need the year from the date_added column. The month is important for visualizing our data, though, so let's extract the month from date_added and replace the null values with 0.</h4>
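
As an alternative to splitting the string, the month can be obtained by parsing the column as a date. The sketch below assumes the dates look like `September 9, 2019` (format `%B %d, %Y`), possibly with leading spaces; the sample values and the `"Unknown"` placeholder are assumptions for illustration.

```python
import pandas as pd

# Made-up sample values, including a leading space and a missing entry
toy_dates = pd.Series(["September 9, 2019", " April 1, 2018", None])

# Strip whitespace, then parse; anything unparseable becomes NaT
parsed = pd.to_datetime(toy_dates.str.strip(), format="%B %d, %Y", errors="coerce")

# Month name per row, with a placeholder where the date was missing
month = parsed.dt.month_name().fillna("Unknown")
print(month.tolist())
```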

In [None]:
netflix_date = df[['date_added']].replace(np.nan,'Not Added')
netflix_date["release_month"] = netflix_date['date_added'].apply(lambda x: x.lstrip().split(" ")[0])
netflix_date.head()

In [None]:
netflix_date["release_month"].value_counts()

In [None]:
# 'Not' is the first word of the 'Not Added' placeholder; map it to 0
netflix_date['release_month'] = netflix_date['release_month'].replace('Not', 0)


In [None]:
netflix_date["release_month"].value_counts()

In [None]:
netflix_date.drop("date_added",axis=1,inplace=True)
netflix_date.head()

In [None]:
netflix = pd.concat([df,netflix_date],axis=1)
netflix.head()

In [None]:
netflix.drop("date_added",axis=1,inplace=True)
netflix.head()

In [None]:
netflix["rating"].value_counts()

In [None]:
netflix["rating"].isnull().sum()

><h4>Since the rating column has only 10 null values, let's replace them with TV-MA, which is the most common rating.</h4>

In [None]:
netflix["rating"] = netflix["rating"].fillna("TV-MA")
netflix.isnull().sum()

><h4>We have now removed all the null values, so we can visualize our data.</h4>

In [None]:
netflix.head()

><h4>Let's find out the number of Movies & TV Shows</h4>

In [None]:
sns.set()
sns.countplot(x="type",data=netflix)
plt.show()

><h4>So Netflix has around 4,500 Movies & almost 2,000 TV Shows</h4>
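
Rather than reading approximate values off the bar chart, the exact counts and shares can be computed directly. A minimal sketch on made-up values (`toy_types` is hypothetical); with the real data the same calls would go on `netflix["type"]`.

```python
import pandas as pd

# Made-up sample of the "type" column
toy_types = pd.Series(["Movie", "TV Show", "Movie", "Movie"], name="type")

counts = toy_types.value_counts()                                   # absolute counts
shares = toy_types.value_counts(normalize=True).mul(100).round(1)   # percentages

print(pd.DataFrame({"count": counts, "percent": shares}))
```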

In [None]:
plt.figure(figsize=(12,9))
sns.countplot(x="rating",data=netflix,order= netflix['rating'].value_counts().index[0:14])

><h4>So the most common rating is TV-MA, followed by TV-14</h4>

In [None]:
sns.set()
plt.figure(figsize=(30,9))
sns.countplot(x="release_year",data= netflix,order = netflix['release_year'].value_counts().index[0:40])
plt.xticks(rotation=45)
plt.show()

><h4>So the highest number of Movies & TV Shows were released in 2018</h4>

><h3>Let's see in which month most Movies & TV Shows were added to Netflix</h3>

In [None]:
sns.set()
plt.figure(figsize=(20,8))
sns.countplot(x="release_month",data= netflix,order = netflix['release_month'].value_counts().index[0:12])
plt.xticks(rotation=45)
plt.show()

><h4>Most Movies & TV Shows were added in December, possibly because December is a holiday month.</h4>
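
The chart above is ordered by frequency. To look for a seasonal pattern it can help to put the months in calendar order instead. This is a sketch on made-up values, assuming month names are in English as produced by the split on date_added; `month_order` is a name introduced here.

```python
import calendar
import pandas as pd

# January ... December (calendar.month_name[0] is an empty string, so skip it)
month_order = list(calendar.month_name)[1:]

# Made-up sample of the release_month column
toy_months = pd.Series(["December", "January", "December", "July"])

# Count per month, reindexed to calendar order; months with no titles get 0
per_month = toy_months.value_counts().reindex(month_order, fill_value=0)
print(per_month)
```

The same list could be passed as `order=month_order` to `sns.countplot`, which would also leave out the 0 placeholder used for missing dates.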

In [None]:
sns.set()
plt.figure(figsize=(25,9))
sns.countplot(x="rating",data= netflix,hue= "type",order = netflix['rating'].value_counts().index[0:15])
plt.xticks(rotation=45)
plt.show()

><h4>For both Movies & TV Shows, TV-MA is the most common rating, but for TV Shows TV-14 is almost as common as TV-MA</h4>

In [None]:
netflix["country"].value_counts().head()

In [None]:
sns.set()
plt.figure(figsize=(25,9))
sns.countplot(x="country",data= netflix,hue= "type",order = netflix['country'].value_counts().index[0:15])
plt.xticks(rotation=45)
plt.show()

> <h4>So the United States has the highest number of Movies & TV Shows, and India is in 2nd place with the next-highest number of Movies</h4>

In [None]:
top = netflix['country'].value_counts()[0:8]
top.index

In [None]:
# top is already aggregated, so pass its values and index directly
fig = px.pie(values=top.values, names=top.index)
fig.update_traces(textposition ='inside',textinfo='percent+label')
fig.show()

><h4>The United States has the highest number of Movies & TV Shows</h4>

> <h3> WordCloud</h3>

<h4>Word Cloud of Countries</h4>

In [None]:
from wordcloud import WordCloud
wordcloud = WordCloud(background_color = "black",width=1730,height=970).generate(" ".join(netflix.country))
plt.figure(figsize=(15,10))
plt.imshow(wordcloud,interpolation = 'bilinear')
plt.axis("off")
plt.show()

><h4>WordCloud of Titles</h4>

In [None]:
wordcloud = WordCloud(background_color = "white",width=1730,height=970).generate(" ".join(netflix.title))
plt.figure(figsize=(15,10))
plt.imshow(wordcloud,interpolation = 'bilinear')
plt.axis("off")
plt.show()

><h3>Let's find out which Genres of Movies & TV Shows Netflix provides the most</h3>

In [None]:
top_listed_in=netflix["listed_in"].value_counts()[0:25]
top_listed_in.head()

In [None]:
sns.set()
plt.figure(figsize=(30,15))
sns.countplot(x='listed_in',data = netflix,order =netflix["listed_in"].value_counts().index[0:25])
plt.xticks(rotation = 90)
plt.show()

In [None]:
# top_listed_in is already aggregated, so pass its values and index directly
fig = px.pie(values=top_listed_in.values, names=top_listed_in.index)
fig.update_traces(textposition ='inside',textinfo='percent+label')
fig.show()

> <h4>Netflix has the most Movies & TV Shows listed under "Documentaries", followed by Stand-Up Comedy in 2nd place</h4>
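
The counts above treat each full `listed_in` string as one category. If a title can carry several comma-separated genres, counting each genre separately may give a different ranking. A minimal sketch on made-up values, assuming a comma is the separator:

```python
import pandas as pd

# Made-up sample of the listed_in column
toy_listed_in = pd.Series([
    "Documentaries",
    "Dramas, International Movies",
    "Documentaries, International Movies",
    "Stand-Up Comedy",
])

# Split each entry into a list, give every genre its own row, trim spaces, count
genre_counts = (
    toy_listed_in.str.split(",")
    .explode()
    .str.strip()
    .value_counts()
)
print(genre_counts)
```

The same split-and-explode approach would apply to the country column if it holds more than one country per title.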

> <h3>Let's find out which ratings are most common within the top genres of Movies/TV Shows</h3>

In [None]:
sns.set()
plt.figure(figsize=(30,15))
sns.countplot(x='listed_in',hue='rating',data = netflix,order =netflix["listed_in"].value_counts().index[0:10])
plt.xticks(rotation = 30)
plt.show()

> <h4>Most Drama movies are rated TV-14 & most Stand-Up Comedy titles are rated TV-MA</h4>
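
The genre-by-rating relationship can also be read from a table instead of a dense grouped bar chart. The sketch below builds a cross-tabulation on made-up rows and picks the most frequent rating per genre; `toy`, `table`, and `top_rating` are names introduced for illustration.

```python
import pandas as pd

# Made-up rows standing in for the Netflix data
toy = pd.DataFrame({
    "listed_in": ["Dramas", "Dramas", "Stand-Up Comedy", "Dramas", "Stand-Up Comedy"],
    "rating": ["TV-14", "TV-MA", "TV-MA", "TV-14", "TV-MA"],
})

# Rows are genres, columns are ratings, cells are title counts
table = pd.crosstab(toy["listed_in"], toy["rating"])

# For each genre, the rating with the largest count
top_rating = table.idxmax(axis=1)
print(table)
print(top_rating)
```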

> <h4>Let's see a list of the oldest Movies on Netflix</h4>

In [None]:
old = netflix.sort_values("release_year",ascending=True)
old[["title","type","country","release_year"]].head(20)

> <h4>All of the oldest Movies & TV Shows on Netflix are from the United States</h4>

> <h3>List of Kids TV Shows on Netflix</h3>

In [None]:
kids_show=netflix[netflix["listed_in"] == "Kids' TV"].reset_index()
kids_show[["title","country","release_year"]].head(10)

> <div class="alert alert-block alert-info">
<b></b> Let's see whether Bangladesh has any Movies on Netflix
</div>


In [None]:
netflix[netflix["country"] == "Bangladesh"]
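
The filter above matches only rows whose country field is exactly one name. If a title lists several countries in one field, a substring match would catch those as well. A minimal sketch on made-up rows:

```python
import numpy as np
import pandas as pd

# Made-up rows: one exact match, one co-production, one missing country
toy = pd.DataFrame({
    "title": ["A", "B", "C"],
    "country": ["Bangladesh", "India, Bangladesh", np.nan],
})

# Exact equality finds only the single-country row
exact = toy[toy["country"] == "Bangladesh"]

# Substring match also finds co-productions; na=False treats missing values as no match
mask = toy["country"].str.contains("Bangladesh", na=False, regex=False)
anywhere = toy[mask]
print(anywhere)
```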

In [None]:
# Build a two-column frame (country, TotalShows) while keeping numeric counts
Country = netflix["country"].value_counts().rename_axis("country").reset_index(name="TotalShows")
Country.head()

> <h3>Countries with Netflix Movies/TV Shows</h3>

In [None]:
fig = px.choropleth(
    locations=Country["country"],
    locationmode='country names',
    color=Country["TotalShows"].astype(int),  # shade each country by its number of titles
    labels={"TotalShows": "Total shows", "country": "Country"},
)
fig.show()