# **Netflix Exploratory Data Analysis**

![](https://pmcvariety.files.wordpress.com/2019/03/netflix-logo-n-icon.png)

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 5GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

### **About NETFLIX**
> **Netflix has been leading the way for digital content since 1997**


Netflix is the world's leading streaming entertainment service with 193 million paid memberships in over 190 countries enjoying TV series, documentaries and feature films across a wide variety of genres and languages. Members can watch as much as they want, anytime, anywhere, on any internet-connected screen. Members can play, pause and resume watching, all without commercials or commitments.

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import seaborn as sns #importing our visualization library
import matplotlib.pyplot as plt
import plotly.express as px
%matplotlib inline


In [None]:
df = pd.read_csv('/kaggle/input/netflix-shows/netflix_titles.csv')
print(df.shape)

**Getting to know the data**

In [None]:
df.head()

*checking for null values*

In [None]:
# Heatmap of missing values: each highlighted cell marks a null entry
sns.set()
plt.figure(figsize=(15,8))
sns.heatmap(df.isnull(),cmap = 'viridis')
plt.show()


In [None]:
df.isnull().sum()

**Five columns contain null values:**
* `director`
* `cast`
* `country`
* `date_added`
* `rating`
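
To put these gaps in perspective, the null counts can be paired with the share of rows affected in each column. The snippet below is a minimal sketch, not part of the original analysis: `null_summary` is a hypothetical helper, and the toy frame with made-up values stands in for `df`.

```python
import numpy as np
import pandas as pd

def null_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Return null counts and percentages for columns that have any nulls."""
    counts = frame.isnull().sum()
    summary = pd.DataFrame({
        "null_count": counts,
        "null_pct": (counts / len(frame) * 100).round(2),
    })
    # Keep only columns with missing values, most affected first
    return summary[summary["null_count"] > 0].sort_values("null_count", ascending=False)

# Toy data standing in for the Netflix frame (values are made up for illustration)
toy = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "director": ["X", np.nan, np.nan, "Y"],
    "country": ["India", np.nan, "Japan", "Japan"],
})
print(null_summary(toy))
```

On the real data, `null_summary(df)` would list the same five columns as above, ordered by how many rows are missing.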

In [None]:
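# Fill missing values by assigning back to the column; an in-place replace on a
# selected column may not update df in recent pandas versions
df['country'] = df['country'].fillna('United States')
df['rating'] = df['rating'].fillna('TV-MA')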
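
The two fill values above are hard-coded. If they are meant to be the most frequent value of each column (an assumption; the notebook does not say so), they can be derived from the data instead. The sketch below uses a hypothetical helper, `fill_with_mode`, and a toy frame in place of `df`.

```python
import numpy as np
import pandas as pd

def fill_with_mode(frame: pd.DataFrame, columns: list) -> pd.DataFrame:
    """Return a copy where nulls in the given columns are replaced by each column's mode."""
    filled = frame.copy()
    for col in columns:
        modes = filled[col].mode(dropna=True)
        if not modes.empty:  # a column that is entirely null has no mode
            filled[col] = filled[col].fillna(modes.iloc[0])
    return filled

# Toy data standing in for the Netflix frame (values are made up for illustration)
toy = pd.DataFrame({
    "country": ["United States", np.nan, "India", "United States"],
    "rating": ["TV-MA", "TV-14", np.nan, "TV-MA"],
})
filled = fill_with_mode(toy, ["country", "rating"])
print(filled)
```

Returning a copy keeps the original frame untouched, which makes it easy to compare the data before and after imputation.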



> Since we already have `release_year`, `date_added` is not needed for this analysis, so we can drop it.

In [None]:
df.drop('date_added',axis=1,inplace=True)
df.head()

## Missing `country` and `rating` values are filled (`director` and `cast` are left as-is), so let's start visualizing the data

In [None]:
sns.countplot(x='type',data=df)

> *Netflix has around 4,500 movies and almost 2,000 TV shows*

In [None]:
plt.figure(figsize =(12,9))
sns.countplot(x='rating',data=df,hue='type',order=df['rating'].value_counts().index[0:50])

> *Most titles, both movies and TV series, are rated TV-MA or TV-14.*

## Rating Distribution

- [How does Netflix decide the maturity rating on TV shows and movies? (US ver.)](https://help.netflix.com/en/node/2064/us)

> Each TV show and movie on Netflix is assigned a maturity rating to help members make informed choices for themselves and their children. Maturity ratings are either determined by Netflix or by a local standards organization. Netflix determines maturity ratings by the frequency and impact of mature content in a TV show or movie. TV show ratings reflect the overall maturity level of the whole series.


|Little Kids | Older Kids | Teens | Mature|
|-|-|-|-|
|G, TV-Y, TV-G | PG, TV-Y7, TV-Y7-FV, TV-PG | PG-13, TV-14 | R, NC-17, TV-MA|
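
The table above can be turned into a lookup so that each title's rating maps to an audience group. This is a minimal sketch: the `audience` column name and the `add_audience_group` helper are hypothetical, and ratings not listed in the table (an `NR` value is used here as an example) are labeled `Unknown`.

```python
import pandas as pd

# Audience groups copied from the table above
RATING_GROUPS = {
    "Little Kids": ["G", "TV-Y", "TV-G"],
    "Older Kids": ["PG", "TV-Y7", "TV-Y7-FV", "TV-PG"],
    "Teens": ["PG-13", "TV-14"],
    "Mature": ["R", "NC-17", "TV-MA"],
}
# Invert to rating -> group so it can be used with Series.map
RATING_TO_GROUP = {rating: group for group, ratings in RATING_GROUPS.items() for rating in ratings}

def add_audience_group(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with an 'audience' column derived from 'rating'."""
    out = frame.copy()
    out["audience"] = out["rating"].map(RATING_TO_GROUP).fillna("Unknown")
    return out

# Toy data standing in for the Netflix frame
toy = pd.DataFrame({"rating": ["TV-MA", "PG", "TV-14", "NR"]})
grouped = add_audience_group(toy)
print(grouped["audience"].value_counts())
```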

---

### Rating System

>  [Motion Picture Association of America film rating system](https://en.wikipedia.org/wiki/Motion_Picture_Association_of_America_film_rating_system)

|Rating|Meaning|
|-|-|
|G|General Audiences|
|PG|Parental Guidance Suggested|
|PG-13|Parents Strongly Cautioned|
|R|Restricted|
|NC-17|Adults Only|



In [None]:
sns.set(style='darkgrid')
plt.figure(figsize=(25,10))
sns.countplot(x='country',data=df,hue='type',order=df['country'].value_counts().index[0:10])
plt.xticks(rotation=90)
plt.show()

> Most movies and TV series come from:
* United States - *58.4%*
* India - *18.1%*
* United Kingdom - *8.11%*
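
These percentages appear to be shares within the most frequent countries shown in the pie chart cell that follows, rather than shares of the whole catalogue (an assumption based on that cell taking only the top 8 values). The sketch below shows how such a share can be computed; `top_share` is a hypothetical helper and the toy series is made up.

```python
import pandas as pd

def top_share(series: pd.Series, n: int = 8) -> pd.Series:
    """Percentage share of each of the n most frequent values, relative to those n only."""
    top = series.value_counts().head(n)
    return (top / top.sum() * 100).round(2)

# Toy data standing in for df['country']
toy_country = pd.Series(
    ["United States"] * 6 + ["India"] * 3 + ["Japan"] * 1 + ["France"] * 2
)
shares = top_share(toy_country, n=3)
print(shares)
```

Dividing by `len(series)` instead of `top.sum()` would give each country's share of all titles, which is usually the less surprising number to report.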

In [None]:
# Share of titles among the 8 most frequent country values
top = df['country'].value_counts()[0:8]
fig=px.pie(values=top.values, names=top.index)
fig.update_traces(textposition='inside',textinfo='percent+label')
fig.show()

#### **Release of movies & tv-series per year**

In [None]:
sns.set()
plt.figure(figsize=(10,10))
sns.countplot(x="release_year",data= df,order = df['release_year'].value_counts().index[0:40])
plt.xticks(rotation=90)
plt.title('Number of Movies & Tv-shows per year')
plt.show()

> *The highest number of movies and TV shows was released in 2018*

In [None]:
# Share of titles among the 25 most frequent genre labels
top_listed=df['listed_in'].value_counts()[0:25]

fig=px.pie(values=top_listed.values,names=top_listed.index)
fig.update_traces(textposition='inside',textinfo='percent+label')
fig.show()

Most common genres on Netflix:
* *Documentaries*
* *Stand-Up Comedy*
* *Drama*
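
The counts above treat each `listed_in` value as a single label. If a title can list several genres in one string separated by commas (an assumption about the data format), individual genres can be counted by splitting first. A minimal sketch with a hypothetical helper, `genre_counts`, and made-up values:

```python
import pandas as pd

def genre_counts(listed_in: pd.Series) -> pd.Series:
    """Count individual genre labels, assuming they are separated by commas."""
    genres = listed_in.dropna().str.split(",").explode().str.strip()
    return genres.value_counts()

# Toy data standing in for df['listed_in']
toy_listed = pd.Series([
    "Documentaries",
    "Dramas, International Movies",
    "Documentaries, International Movies",
    "Stand-Up Comedy",
])
counts = genre_counts(toy_listed)
print(counts)
```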

#### **Genre Vs Rating**

In [None]:
sns.set(style='white')
plt.figure(figsize=(15,15))
sns.countplot(x='listed_in',hue='rating',data = df,order =df["listed_in"].value_counts().index[0:10])
plt.xticks(rotation = 90)
plt.show()

***Drama titles are mostly rated TV-14, and most Stand-Up Comedy titles are rated TV-MA.***

### Duration

In [None]:
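# For TV shows, keep the leading number of the duration as the season count (empty for movies)
df['season_count'] = df.apply(lambda x : x['duration'].split(" ")[0] if "Season" in x['duration'] else "", axis = 1)
# For movies, keep the leading number of the duration (empty for TV shows)
df['duration'] = df.apply(lambda x : x['duration'].split(" ")[0] if "Season" not in x['duration'] else "", axis = 1)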
df['season_count'].value_counts()
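
The split above leaves both columns as text, so numeric comparisons and sorting need a conversion later. A vectorized alternative that yields numeric columns directly is sketched below. It assumes the raw `duration` strings look like `'90 min'` or `'2 Seasons'`, and it must run on the raw column, before the cell above overwrites it. `split_duration` and its column names `minutes` and `seasons` are hypothetical.

```python
import pandas as pd

def split_duration(duration: pd.Series) -> pd.DataFrame:
    """Split strings like '90 min' or '2 Seasons' into numeric minutes and seasons."""
    # Leading number of each entry; missing or unparsable entries become NaN
    number = pd.to_numeric(duration.str.extract(r"(\d+)", expand=False), errors="coerce")
    is_season = duration.str.contains("Season", na=False)
    return pd.DataFrame({
        "minutes": number.where(~is_season),  # NaN for TV shows
        "seasons": number.where(is_season),   # NaN for movies
    })

# Toy data standing in for the raw df['duration'] column
toy_duration = pd.Series(["90 min", "1 Season", "15 Seasons", "125 min", None])
parsed = split_duration(toy_duration)
print(parsed)
print(parsed["seasons"].max())
```

With numeric columns, the largest season count can be looked up with `max()` instead of being typed in by hand.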

#### **TV shows with the most seasons**

In [None]:
display(df[df['season_count'] == '15'][['title','director', 'cast','country','release_year']])

![](https://cdn1.edgedatg.com/aws/v2/abc/GreysAnatomy/showimages/64f1df58afb7276e875ee449e76e8635/1200x627-Q80_64f1df58afb7276e875ee449e76e8635.jpg)

![](https://filmdaily.co/wp-content/uploads/2020/05/NCIS_lede.jpg)

#### **Longest documentaries on Netflix**

In [None]:
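# Documentaries running at least 99 minutes. duration holds text, so convert it to
# numbers first: compared as strings, '100' would sort below '99'
long_docs=df.loc[df['listed_in'] == 'Documentaries',:]
long_docs=long_docs.loc[pd.to_numeric(long_docs['duration'], errors='coerce') >= 99,:]
long_docs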


#### List of Kids' TV shows

In [None]:
kids_show=df.loc[df['listed_in'] == "Kids' TV",:].reset_index()
kids_show[["title","country","release_year"]].head(10)

**Number of movies/TV shows available on Netflix by country**

In [None]:
# Count titles per country value, keeping the counts numeric
Country = df["country"].value_counts().rename_axis("country").reset_index(name="TotalShows")
Country.head()

# Color each country by its number of titles; featureidkey is only needed with GeoJSON input
fig = px.choropleth(
    Country,
    locationmode='country names',
    locations="country",
    color="TotalShows"
)
fig.show()
fig.show()