**NETFLIX**
---
**Netflix, Inc.** is an American over-the-top content platform and production company. **Netflix is a subscription-based streaming service that allows members to watch TV shows and movies without commercials on an internet-connected device.** Members can also download TV shows and movies to an iOS, Android, or Windows 10 device and watch without an internet connection.



![](https://assets.nflxext.com/ffe/siteui/allow-robots/contentSampling/seo-watch-free-link-preview.jpg)


The goal of this notebook is to present information in an understandable format with minimal code.
---
**Note: In programming, one task can be done in multiple ways. Your approach may differ from mine, but ultimately the result will be the same.**


Let's start.
---

**Importing Libraries**
---

In [None]:
import numpy as np
import pandas as pd

import io
import missingno

import warnings
warnings.filterwarnings("ignore")

import seaborn as sns
import matplotlib.pyplot as plt
# Apply seaborn's default look; plt.style.use('seaborn') fails on Matplotlib 3.8+
sns.set_theme()



**Importing dataset**
---

In [None]:
df = pd.read_csv("../input/netflix-shows/netflix_titles.csv")

For exploratory analysis, we should first understand the basic information about the dataset, such as the **shape of the data, missing data, number of observations, etc.**

Let's begin by getting to know the data.
---

In [None]:
# A notebook cell only displays its last expression, so print each summary
print(df.shape)
print(df.describe())
df.info()
print(df.isna().sum())
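
The raw counts from `isna().sum()` are often easier to judge as a share of all rows. A minimal sketch on a small made-up frame (the `toy` data below is hypothetical, not taken from the Netflix file):

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the Netflix data
toy = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "director": ["X", np.nan, np.nan, "Y"],
    "rating": ["TV-MA", "R", np.nan, "TV-14"],
})

# isna().mean() gives the fraction of missing values per column
missing_pct = (toy.isna().mean() * 100).round(1)
print(missing_pct.sort_values(ascending=False))
```

The same expression should work on `df` directly, which helps decide whether to fill a column or drop its incomplete rows.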


**Handling missing values**

I have handled missing values by inserting the placeholder 'missing' wherever data is not available.

In [None]:
# Assign the result back: fillna(inplace=True) on a column accessor may not update df
df['director'] = df['director'].fillna('missing')
df['cast'] = df['cast'].fillna('missing')
df['country'] = df['country'].fillna('missing')
df.isna().sum()
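
The same placeholder fill can also be written as a single call by passing a dictionary of column names to `DataFrame.fillna`. A small sketch, assuming a made-up frame (`toy` is hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with gaps in several columns
toy = pd.DataFrame({
    "director": ["X", np.nan],
    "cast": [np.nan, "Someone"],
    "country": [np.nan, np.nan],
    "rating": ["R", np.nan],
})

# Fill only the listed columns; 'rating' keeps its NaN
toy = toy.fillna({"director": "missing", "cast": "missing", "country": "missing"})
print(toy)
```

Columns not named in the dictionary are left untouched, so they can still be handled separately.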

**Dropping records that have a missing date_added or a missing rating value.**

In [None]:
df.dropna(subset=['date_added'], inplace = True)
df.dropna(subset=['rating'], inplace = True)
df.isna().sum()
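
Note that `dropna` keeps the original index labels, so the index has gaps afterwards. A minimal sketch of this on hypothetical data, including how `reset_index` could restore a contiguous index if one is wanted:

```python
import pandas as pd

# Hypothetical toy frame: rows 0 and 2 each miss one required field
toy = pd.DataFrame({
    "date_added": [None, "May 5, 2019", "June 1, 2018", "July 7, 2020"],
    "rating": ["R", "PG", None, "TV-MA"],
})

# Both columns can be passed in one subset list
cleaned = toy.dropna(subset=["date_added", "rating"])
print(cleaned.index.tolist())

# Optional: renumber the rows from 0 again
renumbered = cleaned.reset_index(drop=True)
print(renumbered.index.tolist())
```

This matters for any later code that mixes label-based and position-based access.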

In [None]:
df.head()

---
Now the dataset is ready for **Feature Engineering**.
---

We can extract the **day, month, and year** from the **date_added field.**

We can extract the **categories each show is listed under** from the **listed_in field.**

Let's begin with the **date_added field**.

In [None]:
df['added_month'] = np.nan
df['added_date'] = np.nan
df['added_year'] = np.nan

In [None]:
# Split "Month day, year" in one vectorized pass. The result is aligned on the
# index, so gaps left by dropna do not matter. Strip stray whitespace first so
# a leading space does not shift the pieces.
parts = df['date_added'].str.strip().str.split(' ', expand=True)
df['added_month'] = parts[0]
df['added_date'] = parts[1].str.rstrip(',')
df['added_year'] = parts[2]

Since we have extracted the day, month, and year, we no longer need the date_added field.

In [None]:
df.drop('date_added', axis=1, inplace=True)
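
As an alternative to splitting strings, the date could be parsed with `pd.to_datetime` and the parts read from the `.dt` accessor. This is only a sketch on made-up values; the format string `"%B %d, %Y"` assumes every entry looks like "Month day, year":

```python
import pandas as pd

# Hypothetical sample values, one with a stray leading space
s = pd.Series(["September 9, 2019", " August 4, 2017", "December 31, 2016"])

# Strip whitespace, then parse with an explicit format
parsed = pd.to_datetime(s.str.strip(), format="%B %d, %Y")

out = pd.DataFrame({
    "added_month": parsed.dt.month_name(),
    "added_date": parsed.dt.day,
    "added_year": parsed.dt.year,
})
print(out)
```

One difference from the string approach is that the day and year come back as integers rather than text.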

---
From the listed_in feature we have to extract categories.
---

*   First, we split each listing.
*   Then we count each category.
*   Lastly, we sort the counts and save them as a dictionary.

In [None]:
listed_in = []
for i in range(len(df)):
    # Strip whitespace so " Dramas" and "Dramas" count as the same category
    listed_in.extend(c.strip() for c in df.listed_in.iloc[i].split(','))

In [None]:
from collections import Counter

# Counter tallies every category in one pass instead of calling
# list.count() once per element
listed_dic = dict(Counter(listed_in))

print(listed_dic)

In [None]:
listed_dic = sorted(listed_dic.items(), key=lambda item: item[1], reverse=True)
print(listed_dic)

In [None]:
listed_dic = dict(listed_dic)
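
The split, count, and sort steps can also be expressed with pandas alone. A minimal sketch on hypothetical rows (the `toy` frame is made up; only the `listed_in` column name comes from the dataset):

```python
import pandas as pd

# Hypothetical toy rows in the same "A, B" format as listed_in
toy = pd.DataFrame({
    "listed_in": ["Dramas, International Movies", "Comedies, Dramas", "Dramas"]
})

# Split each listing, put one category per row, trim spaces, then count.
# value_counts() already sorts from most to least frequent.
category_counts = (
    toy["listed_in"].str.split(",").explode().str.strip().value_counts()
)

listed = category_counts.to_dict()
print(listed)
```

Because `value_counts()` returns the counts sorted, no separate sorting step is needed before converting to a dictionary.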

---
Now the dataset is ready for **Exploratory Data Analysis.**
---
Exploratory data analysis is an approach to analyzing data sets to summarize their main characteristics.

The purpose of exploratory data analysis is to:


1. Check for **missing data and other mistakes**. 

2. Gain **maximum insight** into the data set and its underlying structure. 

3. **Uncover a parsimonious model**, one which explains the data with a minimum number of predictor variables.

Commonly used **data visualization libraries** include:

1.  Matplotlib                     
2.  Seaborn 
3.  ggplot
4.  Bokeh
5.  pygal
6.  Plotly
7.  geoplotlib
8.  Gleam
9.  missingno
10. Leather

I am using Matplotlib and seaborn for visualization.
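
One pitfall when turning counts into bar charts: `unique()` lists values in order of first appearance, while `value_counts()` sorts by frequency, so pairing the two can attach counts to the wrong labels. A small sketch with made-up data:

```python
import pandas as pd

# Hypothetical column where the less frequent value appears first
toy = pd.Series(["TV Show", "Movie", "Movie", "Movie", "TV Show"], name="type")

labels_by_appearance = list(toy.unique())  # order of first appearance
counts = toy.value_counts()                # sorted by frequency

print(labels_by_appearance)
print(counts)
```

Taking both the labels (`counts.index`) and the bar lengths (`counts.values`) from the same Series keeps them paired.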

In [None]:
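# Take labels and bar lengths from the same Series so they stay paired
type_counts = df.type.value_counts()
plt.barh(type_counts.index, type_counts.values, color = "#003f5c")

plt.title('Most Uploaded Content',fontsize=20, fontweight='bold') 

# Show graphic
plt.show()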

In [None]:
# Count movies per director, excluding the 'missing' placeholder
movies = df[(df.type == 'Movie') & (df.director != 'missing')]
data = movies.director.value_counts().head(19)

plt.barh(data.index, data.values, color= "#003f5c")

# Plot title
plt.title('Most Popular Director for Movies',fontsize=20, fontweight='bold') 

# Show graphic
plt.show()

In [None]:
# Count TV shows per director, excluding the 'missing' placeholder
tv_shows = df[(df.type == 'TV Show') & (df.director != 'missing')]
data = tv_shows.director.value_counts().head(19)

plt.barh(data.index, data.values, color= "#bc5090")

# Plot title
plt.title('Most Popular Director for TV shows',fontsize=20, fontweight='bold') 

# Show graphic
plt.show()

In [None]:
height = list(listed_dic.values())[:10]

bars = list(listed_dic.keys())[:10]
y_pos = np.arange(len(bars))

# Create Graph
plt.bar(y_pos, height,color = "#003f5c")

# Create names on the x-axis, rotated so they do not overlap
plt.xticks(y_pos, bars, rotation=90)

# x-axis label
plt.xlabel('Categories') 

# y-axis label
plt.ylabel('Records found') 

# Plot title
plt.title('Most Popular Categories',fontsize=20, fontweight='bold') 

# Show graphic
plt.show()


In [None]:
# Create horizontal bars
plt.barh(y_pos, height,color = "#bc5090")

# Create names on the y-axis
plt.yticks(y_pos, bars)

# y-axis label
plt.ylabel('Categories') 

# x-axis label
plt.xlabel('Records found') 

# Plot title
plt.title('Most Popular Categories',fontsize=20, fontweight='bold') 

# Show graphic
plt.show()

In [None]:
# Labels and counts come from the same Series, so each bar gets the right month
month_counts = df.added_month.value_counts()

plt.barh(month_counts.index, month_counts.values, color = "#003f5c" )

plt.title('Content uploaded in a particular month',fontsize=20, fontweight='bold') 
plt.show()

In [None]:
movie = df.groupby('type')['release_year'].value_counts()['Movie']

# Keep the 20 release years with the most movies
movie = movie.sort_values(ascending=False).iloc[:20]

plt.bar(movie.index, movie.values, color = "#003f5c" )

plt.title('Movies released in a particular year',fontsize=20, fontweight='bold') 
plt.show()

In [None]:
TV = df.groupby('type')['release_year'].value_counts()['TV Show']

# Keep the 20 release years with the most TV shows
TV = TV.sort_values(ascending=False).iloc[:20]

plt.bar(TV.index, TV.values, color = "#bc5090" )

plt.title('TV content released in a particular year' ,fontsize=20, fontweight='bold') 
plt.show()

In [None]:
# Labels and counts come from the same Series, so each bar gets the right rating
rating_counts = df.rating.value_counts()

plt.barh(rating_counts.index, rating_counts.values, color = "#003f5c")

plt.title('Content by Rating',fontsize=20, fontweight='bold') 
plt.show()

In [None]:
# Five most common ratings among TV shows
data = df.groupby('type')['rating'].value_counts()['TV Show'].iloc[:5]

plt.barh(data.index, data.values, color= "#bc5090")

# Plot title
plt.title('Most Common Ratings for TV Shows',fontsize=20, fontweight='bold') 

# Show graphic
plt.show()

In [None]:
# Five most common ratings among movies
data = df.groupby('type')['rating'].value_counts()['Movie'].iloc[:5]

plt.barh(data.index, data.values, color= "#003f5c")

# Plot title
plt.title('Most Common Ratings for Movies',fontsize=20, fontweight='bold') 

# Show graphic
plt.show()

In [None]:
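duration_counts = df.duration.value_counts()

# Plot the 4th to 20th most common durations (the top three are skipped)
plt.plot(duration_counts.index[3:20], duration_counts.values[3:20], color='#003f5c')
plt.xticks(rotation=90)  # rotation must be a number, not the string '90'
plt.title('Duration of Content', fontsize=20, fontweight='bold')
plt.show()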


In [None]:
plt.figure(figsize=(8, 6))
# Take the labels from the data instead of hard-coding them
top_durations = df.duration.value_counts().iloc[:3]
_, _, texts = plt.pie(top_durations, labels=top_durations.index, autopct='%1.2f%%', startangle=90, 
                      explode=(0.0, 0.1, 0.2), colors=['#003f5c', '#bc5090', '#ffa600'])
plt.axis('equal')
plt.title('Seasons Available on Netflix', fontsize=20, fontweight='bold')
# Make the percentage labels readable on the dark wedges
for text in texts:
    text.set_color('white')
plt.show()

**Initial Findings On Netflix Dataset**
---
1. Most of the content added to the Netflix platform is of the **Movie type**.

2. The most popular director on the Netflix platform is **Alastair Fothergill**.
3. We can see that **from 2013 to 2020 the volume of content** on the Netflix platform **increased for TV shows**, whereas there is a **decrease in the volume of movie content from 2017**. This may signal the start of a change in users' viewing behaviour.

4. Most of the content is listed under **International TV Shows** and has a duration of 1 Season.
5. Netflix **has the most content rated TV-MA, followed by TV-14, for both types**, i.e. for movies as well as TV series.
6. Interestingly, **the R rating ranks third for movies, but very few TV shows are published with an R rating.**
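
Findings 5 and 6 compare ratings across the two types; one way to view both side by side is a cross-tabulation. A sketch with hypothetical rows (the counts below are made up and do not reproduce the Netflix figures):

```python
import pandas as pd

# Hypothetical toy rows; only the column names come from the dataset
toy = pd.DataFrame({
    "type": ["Movie", "Movie", "Movie", "TV Show", "TV Show", "TV Show"],
    "rating": ["TV-MA", "R", "R", "TV-MA", "TV-14", "TV-MA"],
})

# Rows are content types, columns are ratings, cells are title counts
table = pd.crosstab(toy["type"], toy["rating"])
print(table)
```

Running `pd.crosstab(df["type"], df["rating"])` on the real data would give one table from which both findings can be checked.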

Thank you for your valuable time.
---
If you have come this far, please upvote or fork the notebook.
---