## MEDIA ANALYSIS FOR NETFLIX

### INTRODUCTION

- The "netflix_titles.csv" dataset used in this analysis is structured data about Netflix shows and movies released between 1925 and 2021.

- The data is a CSV file.

- This Jupyter notebook is used only to clean and simplify the data as needed.

- It generates a new data file (netflix_titles_processed.csv), which is then used for the Tableau dashboard.

### SCOPE

- **Data filename:** netflix_titles.csv
- **Data type:** Structured data
- **Processes used:** Downloading, Preparing, Evaluation, Processing

### STEP-1: Data Import and preparation

This step covers data import, including the libraries needed to load the data for further cleansing and trimming.

In [42]:
import pandas as pd
import numpy as np

import warnings
warnings.filterwarnings('ignore')

In [43]:
# Values that should be treated as null when reading the file
na_vals = [' ', 'nan', 'Nan', 'NaN', '?', 'missing', 'Missing']

# Local folder holding the dataset (machine-specific)
path = 'C:\\Users\\R5 EJ274t\\Desktop\\DATA-PROJECTS\\Netflix-analysis\\_dataset\\'

df = pd.read_csv(path + 'netflix_titles.csv', na_values=na_vals)
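The effect of `na_values` is easiest to see on a small in-memory file. The following is a minimal sketch, assuming a hypothetical three-row sample (`sample_csv`) that is not taken from netflix_titles.csv; only the `na_vals` list comes from the cell above.

```python
import io

import pandas as pd

# Hypothetical sample data, used only to illustrate the na_values behaviour.
sample_csv = io.StringIO(
    "show_id,director,country\n"
    "s1,Kirsten Johnson,United States\n"
    "s2,?,missing\n"
    "s3,Missing,India\n"
)

# Same placeholder list as in the notebook cell.
na_vals = [' ', 'nan', 'Nan', 'NaN', '?', 'missing', 'Missing']

sample = pd.read_csv(sample_csv, na_values=na_vals)

# Placeholders such as '?' and 'missing' are read as NaN,
# so they show up in the null counts.
null_counts = sample.isnull().sum()
print(null_counts)
```

Declaring the placeholders at read time means later steps such as `isnull()` and `fillna()` see them as ordinary missing values.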

In [44]:
df.shape

(8807, 12)

The dataset has 8807 rows and 12 columns.

In [45]:
df.columns

Index(['show_id', 'type', 'title', 'director', 'cast', 'country', 'date_added',
       'release_year', 'rating', 'duration', 'listed_in', 'description'],
      dtype='object')

These are the column names.

In [46]:
df.dtypes

show_id         object
type            object
title           object
director        object
cast            object
country         object
date_added      object
release_year     int64
rating          object
duration        object
listed_in       object
description     object
dtype: object

These are the data types of each column.

### STEP-2: Data Processing/Cleaning

In [47]:
# First, check the number of null values to handle.

df.isnull().sum()

show_id            0
type               0
title              0
director        2634
cast             825
country          831
date_added        10
release_year       0
rating             4
duration           3
listed_in          0
description        0
dtype: int64
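To judge how much of each column is affected, the counts can be expressed as a share of all rows. This is a minimal sketch that uses only the figures reported above (8807 rows and the six non-zero null counts); the names `total_rows`, `null_counts` and `null_share` are introduced here for illustration.

```python
import pandas as pd

# Row count and null counts copied from the notebook outputs above.
total_rows = 8807
null_counts = pd.Series(
    {
        "director": 2634,
        "cast": 825,
        "country": 831,
        "date_added": 10,
        "rating": 4,
        "duration": 3,
    }
)

# Share of rows with a missing value, in percent, largest first.
null_share = (null_counts / total_rows * 100).round(2).sort_values(ascending=False)
print(null_share)
```

On the live frame, `df.isnull().mean() * 100` should give the same shares without copying the counts by hand.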

In [52]:
# Handle the null values with placeholder labels
df['director'] = df['director'].fillna('No Director')
df['cast'] = df['cast'].fillna('No Cast')
df['country'] = df['country'].fillna('Country Unavailable')

# Drop all rows with no date added or no rating
df.dropna(subset=["date_added", "rating"], inplace=True)
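The same two-stage cleanup (fill some columns with labels, drop rows missing others) can be tried on a toy frame first. This is a sketch under stated assumptions: `toy`, `placeholders` and `cleaned` are hypothetical names, the rows are made up, and passing a dictionary to `fillna` is one alternative way to fill several columns at once, not the notebook's own approach.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with gaps in each of the five columns.
toy = pd.DataFrame(
    {
        "director": ["A", np.nan, "C", np.nan],
        "cast": [np.nan, "X", "Y", "Z"],
        "country": ["India", np.nan, "Japan", "Spain"],
        "date_added": ["2021-09-25", "2021-09-24", np.nan, "2019-07-01"],
        "rating": ["PG-13", "TV-MA", "R", np.nan],
    }
)

# Column-specific labels, matching the ones used in the notebook.
placeholders = {
    "director": "No Director",
    "cast": "No Cast",
    "country": "Country Unavailable",
}

# Fill the labelled columns, then drop rows missing date_added or rating.
cleaned = toy.fillna(value=placeholders).dropna(subset=["date_added", "rating"])
print(cleaned)
```

Note that this step leaves the `duration` column alone, so the 3 missing `duration` values counted earlier may still be present unless they sit in rows that were dropped.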

In [53]:
# Check the number of duplicate rows in the data
df.duplicated().sum()

0

There are no duplicate rows in the data.
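`df.duplicated()` compares entire rows, so a unique `show_id` can hide a title that is listed twice. The sketch below shows the difference on a hypothetical toy frame; the choice of `type`, `title` and `release_year` as the comparison key is an assumption, and whether such repeats exist in the real dataset is not established here.

```python
import pandas as pd

# Hypothetical rows: s1 and s2 differ only in show_id.
toy = pd.DataFrame(
    {
        "show_id": ["s1", "s2", "s3"],
        "type": ["Movie", "Movie", "TV Show"],
        "title": ["Zoom", "Zoom", "Zoom"],
        "release_year": [2006, 2006, 2006],
    }
)

# Whole-row comparison: the unique show_id makes every row distinct.
full_row_dupes = int(toy.duplicated().sum())

# Comparison on a subset of columns ignores show_id.
content_dupes = int(toy.duplicated(subset=["type", "title", "release_year"]).sum())

print(full_row_dupes, content_dupes)
```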

In [73]:
df.date_added.dtype

dtype('<M8[ns]')

In [74]:
df.rating.dtype

dtype('O')

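*date_added is loaded as type object (see the df.dtypes output in Step 1), so it is converted to datetime below. The `<M8[ns]` shown above is datetime64[ns]; judging by the cell numbers, that cell was re-run after the conversion.*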

In [61]:
# Convert the date_added column to the datetime data type
df['date_added'] = pd.to_datetime(df['date_added'])

In [62]:
df.date_added.head(5)

0   2021-09-25
1   2021-09-24
2   2021-09-24
3   2021-09-24
4   2021-09-24
Name: date_added, dtype: datetime64[ns]
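If the raw strings are not perfectly uniform, a slightly more defensive conversion can help. This is a minimal sketch, assuming the raw values look like `September 25, 2021` and may carry stray whitespace; the sample series `raw` and the format string are assumptions, not something confirmed by the outputs above.

```python
import pandas as pd

# Hypothetical raw values: one clean, one with a leading space, one invalid.
raw = pd.Series(["September 25, 2021", " August 4, 2017", "not a date"])

# Strip whitespace, parse with an explicit format, and turn anything
# that does not match into NaT instead of raising an error.
parsed = pd.to_datetime(raw.str.strip(), format="%B %d, %Y", errors="coerce")

print(parsed)
print("unparsed values:", int(parsed.isna().sum()))
```

Counting the `NaT` values afterwards shows how many entries failed to parse, which is safer than letting them pass silently.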

### STEP-3: Checking data for any other discrepancies

In [81]:
df.head(5)

| | show_id | type | title | director | cast | country | date_added | release_year | rating | duration | listed_in | description |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | s1 | Movie | Dick Johnson Is Dead | Kirsten Johnson | No Cast | United States | 2021-09-25 | 2020 | PG-13 | 90 min | Documentaries | As her father nears the end of his life, filmm... |
| 1 | s2 | TV Show | Blood & Water | No Director | Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban... | South Africa | 2021-09-24 | 2021 | TV-MA | 2 Seasons | International TV Shows, TV Dramas, TV Mysteries | After crossing paths at a party, a Cape Town t... |
| 2 | s3 | TV Show | Ganglands | Julien Leclercq | Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi... | Country Unavailable | 2021-09-24 | 2021 | TV-MA | 1 Season | Crime TV Shows, International TV Shows, TV Act... | To protect his family from a powerful drug lor... |
| 3 | s4 | TV Show | Jailbirds New Orleans | No Director | No Cast | Country Unavailable | 2021-09-24 | 2021 | TV-MA | 1 Season | Docuseries, Reality TV | Feuds, flirtations and toilet talk go down amo... |
| 4 | s5 | TV Show | Kota Factory | No Director | Mayur More, Jitendra Kumar, Ranjan Raj, Alam K... | India | 2021-09-24 | 2021 | TV-MA | 2 Seasons | International TV Shows, Romantic TV Shows, TV ... | In a city of coaching centers known to train I... |


In [83]:
df

| | show_id | type | title | director | cast | country | date_added | release_year | rating | duration | listed_in | description |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | s1 | Movie | Dick Johnson Is Dead | Kirsten Johnson | No Cast | United States | 2021-09-25 | 2020 | PG-13 | 90 min | Documentaries | As her father nears the end of his life, filmm... |
| 1 | s2 | TV Show | Blood & Water | No Director | Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban... | South Africa | 2021-09-24 | 2021 | TV-MA | 2 Seasons | International TV Shows, TV Dramas, TV Mysteries | After crossing paths at a party, a Cape Town t... |
| 2 | s3 | TV Show | Ganglands | Julien Leclercq | Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi... | Country Unavailable | 2021-09-24 | 2021 | TV-MA | 1 Season | Crime TV Shows, International TV Shows, TV Act... | To protect his family from a powerful drug lor... |
| 3 | s4 | TV Show | Jailbirds New Orleans | No Director | No Cast | Country Unavailable | 2021-09-24 | 2021 | TV-MA | 1 Season | Docuseries, Reality TV | Feuds, flirtations and toilet talk go down amo... |
| 4 | s5 | TV Show | Kota Factory | No Director | Mayur More, Jitendra Kumar, Ranjan Raj, Alam K... | India | 2021-09-24 | 2021 | TV-MA | 2 Seasons | International TV Shows, Romantic TV Shows, TV ... | In a city of coaching centers known to train I... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 8802 | s8803 | Movie | Zodiac | David Fincher | Mark Ruffalo, Jake Gyllenhaal, Robert Downey J... | United States | 2019-11-20 | 2007 | R | 158 min | Cult Movies, Dramas, Thrillers | A political cartoonist, a crime reporter and a... |
| 8803 | s8804 | TV Show | Zombie Dumb | No Director | No Cast | Country Unavailable | 2019-07-01 | 2018 | TV-Y7 | 2 Seasons | Kids' TV, Korean TV Shows, TV Comedies | While living alone in a spooky town, a young g... |
| 8804 | s8805 | Movie | Zombieland | Ruben Fleischer | Jesse Eisenberg, Woody Harrelson, Emma Stone, ... | United States | 2019-11-01 | 2009 | R | 88 min | Comedies, Horror Movies | Looking to survive in a world taken over by zo... |
| 8805 | s8806 | Movie | Zoom | Peter Hewitt | Tim Allen, Courteney Cox, Chevy Chase, Kate Ma... | United States | 2020-01-11 | 2006 | PG | 88 min | Children & Family Movies, Comedies | Dragged from civilian life, a former superhero... |


### STEP-4: Data export for Visualisation

In [84]:
df.to_csv(path+'netflix_titles_processed.csv', index=False)
print('done.... 100%')

done.... 100%
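A quick round-trip check can confirm that the exported file reads back as expected. The sketch below is illustrative only: it writes a hypothetical two-row frame (`toy`) to a temporary folder rather than to the notebook's `path`, and the file name `toy_processed.csv` is made up.

```python
import os
import tempfile

import pandas as pd

# Hypothetical frame standing in for the cleaned data.
toy = pd.DataFrame(
    {
        "show_id": ["s1", "s2"],
        "date_added": pd.to_datetime(["2021-09-25", "2021-09-24"]),
        "release_year": [2020, 2021],
    }
)

with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "toy_processed.csv")

    # index=False keeps the row index out of the file, as in the notebook.
    toy.to_csv(out_path, index=False)

    # CSV stores dates as text, so ask pandas to parse them on the way in.
    restored = pd.read_csv(out_path, parse_dates=["date_added"])

same_shape = restored.shape == toy.shape
dates_kept = pd.api.types.is_datetime64_any_dtype(restored["date_added"])
print(same_shape, dates_kept)
```

Because CSV has no date type, any tool reading the exported file, including a dashboard, has to parse `date_added` again on its side.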


### ACKNOWLEDGEMENT

- **Author:** Samwaran Banerjee
- This data has been taken from: [Kaggle](https://www.kaggle.com/datasets/shivamb/netflix-shows)
- The dataset has been processed with full transparency.
- The visualization dashboard can be found here: [Dashboard](https://public.tableau.com/app/profile/samwaran/viz/NetflixDataVisualized/Dashboard2)
- A further case study on the same data can be found on [my website](https://samwaran.github.io)