# Netflix Data Analysis

## Introduction
In this project, we clean and analyze a Netflix titles dataset to draw insights into release and catalog trends, content preferences, and other metrics. The analysis is performed in Python with the `pandas` library.

## Requirements
To run this project, you need:
- Python 3.9 or later (required by pandas 2.2)
- pandas 2.2.2
- The dataset file `netflix1.csv` in the working directory

In [47]:
# Importing necessary libraries

import pandas as pd

### Checking the pandas version

In [48]:
pd.__version__

'2.2.2'

In [49]:
# Loading the dataset

data = pd.read_csv("netflix1.csv")
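
If `netflix1.csv` is not at hand, the loading step can be tried on a small inline sample. This is a minimal sketch: the three rows are copied from the preview output below, and `io.StringIO` stands in for the file path (the real file has many more rows).

```python
import io

import pandas as pd

# A few rows copied from the dataset preview, in the same column order as netflix1.csv
sample_csv = """show_id,type,title,director,country,date_added,release_year,rating,duration,listed_in
s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,United States,9/25/2021,2020,PG-13,90 min,Documentaries
s6,TV Show,Midnight Mass,Mike Flanagan,United States,9/24/2021,2021,TV-MA,1 Season,"TV Dramas, TV Horror, TV Mysteries"
s8,Movie,Sankofa,Haile Gerima,United States,9/24/2021,1993,TV-MA,125 min,"Dramas, Independent Movies, International Movies"
"""

# read_csv accepts any file-like object, so StringIO can replace the file path;
# quoted fields keep their internal commas
sample = pd.read_csv(io.StringIO(sample_csv))
print(sample.shape)  # (3, 10)
```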

In [50]:
# Previewing the Data

data.head()

| | show_id | type | title | director | country | date_added | release_year | rating | duration | listed_in |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | s1 | Movie | Dick Johnson Is Dead | Kirsten Johnson | United States | 9/25/2021 | 2020 | PG-13 | 90 min | Documentaries |
| 1 | s3 | TV Show | Ganglands | Julien Leclercq | France | 9/24/2021 | 2021 | TV-MA | 1 Season | Crime TV Shows, International TV Shows, TV Act... |
| 2 | s6 | TV Show | Midnight Mass | Mike Flanagan | United States | 9/24/2021 | 2021 | TV-MA | 1 Season | TV Dramas, TV Horror, TV Mysteries |
| 3 | s14 | Movie | Confessions of an Invisible Girl | Bruno Garotti | Brazil | 9/22/2021 | 2021 | TV-PG | 91 min | Children & Family Movies, Comedies |
| 4 | s8 | Movie | Sankofa | Haile Gerima | United States | 9/24/2021 | 1993 | TV-MA | 125 min | Dramas, Independent Movies, International Movies |


In [51]:
# Understanding the Data Structure with `info()`

data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8790 entries, 0 to 8789
Data columns (total 10 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   show_id       8790 non-null   object
 1   type          8790 non-null   object
 2   title         8790 non-null   object
 3   director      8790 non-null   object
 4   country       8790 non-null   object
 5   date_added    8790 non-null   object
 6   release_year  8790 non-null   int64 
 7   rating        8790 non-null   object
 8   duration      8790 non-null   object
 9   listed_in     8790 non-null   object
dtypes: int64(1), object(9)
memory usage: 686.8+ KB


The cleaning steps are:

- Delete redundant columns.
- Rename the columns.
- Drop the duplicates.
- Remove the NaN values from the dataset.
- Check for further transformations.

### Deleting redundant columns

In [53]:
data.head(1)

| | show_id | type | title | director | country | date_added | release_year | rating | duration | listed_in |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | s1 | Movie | Dick Johnson Is Dead | Kirsten Johnson | United States | 9/25/2021 | 2020 | PG-13 | 90 min | Documentaries |


In [54]:
data.columns  # Identifying Redundant Columns

Index(['show_id', 'type', 'title', 'director', 'country', 'date_added',
       'release_year', 'rating', 'duration', 'listed_in'],
      dtype='object')

In [55]:
data.drop(columns="rating", inplace=True)  # Dropping the 'rating' column

In [56]:
data.head()

| | show_id | type | title | director | country | date_added | release_year | duration | listed_in |
|---|---|---|---|---|---|---|---|---|---|
| 0 | s1 | Movie | Dick Johnson Is Dead | Kirsten Johnson | United States | 9/25/2021 | 2020 | 90 min | Documentaries |
| 1 | s3 | TV Show | Ganglands | Julien Leclercq | France | 9/24/2021 | 2021 | 1 Season | Crime TV Shows, International TV Shows, TV Act... |
| 2 | s6 | TV Show | Midnight Mass | Mike Flanagan | United States | 9/24/2021 | 2021 | 1 Season | TV Dramas, TV Horror, TV Mysteries |
| 3 | s14 | Movie | Confessions of an Invisible Girl | Bruno Garotti | Brazil | 9/22/2021 | 2021 | 91 min | Children & Family Movies, Comedies |
| 4 | s8 | Movie | Sankofa | Haile Gerima | United States | 9/24/2021 | 1993 | 125 min | Dramas, Independent Movies, International Movies |


In [57]:
data.columns

Index(['show_id', 'type', 'title', 'director', 'country', 'date_added',
       'release_year', 'duration', 'listed_in'],
      dtype='object')
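
`inplace=True` mutates `data` and returns `None`, and re-running that cell raises `KeyError` because the column is already gone. A common alternative, sketched here on a toy frame (values taken from the preview above), is to reassign the result of `drop`; passing `errors="ignore"` makes a repeated drop a harmless no-op.

```python
import pandas as pd

# Toy frame with two of the dataset's columns
frame = pd.DataFrame({
    "title": ["Dick Johnson Is Dead", "Ganglands"],
    "rating": ["PG-13", "TV-MA"],
})

# drop() returns a new DataFrame; reassigning avoids inplace=True
frame = frame.drop(columns="rating")

# Running the same drop again would raise KeyError; errors="ignore" skips missing labels
frame = frame.drop(columns="rating", errors="ignore")
print(frame.columns.tolist())  # ['title']
```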

### Renaming the columns

In [59]:
data.columns

Index(['show_id', 'type', 'title', 'director', 'country', 'date_added',
       'release_year', 'duration', 'listed_in'],
      dtype='object')

#### Capitalizing column names

In [60]:
# Capitalize the first letter of each column name
new_column_names = [column.capitalize() for column in data.columns]

In [61]:
new_column_names

['Show_id',
 'Type',
 'Title',
 'Director',
 'Country',
 'Date_added',
 'Release_year',
 'Duration',
 'Listed_in']

In [62]:
data.columns = new_column_names

In [63]:
data.head()

| | Show_id | Type | Title | Director | Country | Date_added | Release_year | Duration | Listed_in |
|---|---|---|---|---|---|---|---|---|---|
| 0 | s1 | Movie | Dick Johnson Is Dead | Kirsten Johnson | United States | 9/25/2021 | 2020 | 90 min | Documentaries |
| 1 | s3 | TV Show | Ganglands | Julien Leclercq | France | 9/24/2021 | 2021 | 1 Season | Crime TV Shows, International TV Shows, TV Act... |
| 2 | s6 | TV Show | Midnight Mass | Mike Flanagan | United States | 9/24/2021 | 2021 | 1 Season | TV Dramas, TV Horror, TV Mysteries |
| 3 | s14 | Movie | Confessions of an Invisible Girl | Bruno Garotti | Brazil | 9/22/2021 | 2021 | 91 min | Children & Family Movies, Comedies |
| 4 | s8 | Movie | Sankofa | Haile Gerima | United States | 9/24/2021 | 1993 | 125 min | Dramas, Independent Movies, International Movies |


In [64]:
data.columns

Index(['Show_id', 'Type', 'Title', 'Director', 'Country', 'Date_added',
       'Release_year', 'Duration', 'Listed_in'],
      dtype='object')
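
The loop above can also be written without building the list by hand. A minimal sketch of two equivalent options on an empty toy frame: the vectorized `Index.str.capitalize()` string method, and `rename()` with a function, which returns a new DataFrame.

```python
import pandas as pd

# Empty toy frame that only carries some of the dataset's column names
original = pd.DataFrame(columns=["show_id", "type", "release_year", "listed_in"])

# Option 1: apply str.capitalize to every label of the column Index at once
capitalized = original.columns.str.capitalize()
print(list(capitalized))  # ['Show_id', 'Type', 'Release_year', 'Listed_in']

# Option 2: rename() calls the function on each column label
renamed = original.rename(columns=str.capitalize)
print(renamed.columns.tolist())  # ['Show_id', 'Type', 'Release_year', 'Listed_in']
```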

### Dropping the duplicates

In [66]:
data.duplicated()  # Identifying Duplicate Rows

0       False
1       False
2       False
3       False
4       False
        ...  
8785    False
8786    False
8787    False
8788    False
8789    False
Length: 8790, dtype: bool

In [67]:
data.duplicated().sum()  # Counting duplicate rows

np.int64(0)
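
Because the count is 0, there is nothing to drop in this dataset. If a future version of the file did contain repeated rows, `drop_duplicates` would remove them. A minimal sketch on a toy frame with a deliberately repeated row (made up for illustration):

```python
import pandas as pd

# Toy frame: the third row repeats the first one
frame = pd.DataFrame({
    "Title": ["Midnight Mass", "Sankofa", "Midnight Mass"],
    "Release_year": [2021, 1993, 2021],
})

print(frame.duplicated().sum())  # 1

# keep="first" (the default) retains the first occurrence of each duplicate group
deduped = frame.drop_duplicates(keep="first").reset_index(drop=True)
print(len(deduped))  # 2
```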

### Removing the NaN values from the dataset

In [69]:
data.isna()  # Identifying Missing Values

| | Show_id | Type | Title | Director | Country | Date_added | Release_year | Duration | Listed_in |
|---|---|---|---|---|---|---|---|---|---|
| 0 | False | False | False | False | False | False | False | False | False |
| 1 | False | False | False | False | False | False | False | False | False |
| 2 | False | False | False | False | False | False | False | False | False |
| 3 | False | False | False | False | False | False | False | False | False |
| 4 | False | False | False | False | False | False | False | False | False |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 8785 | False | False | False | False | False | False | False | False | False |
| 8786 | False | False | False | False | False | False | False | False | False |
| 8787 | False | False | False | False | False | False | False | False | False |
| 8788 | False | False | False | False | False | False | False | False | False |


In [70]:
data.isna().sum()  # Counting missing values in each column

Show_id         0
Type            0
Title           0
Director        0
Country         0
Date_added      0
Release_year    0
Duration        0
Listed_in       0
dtype: int64
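
`isna()` only counts real missing markers such as `NaN` and `None`. In this file, missing directors appear to be stored as the string `"Not Given"` (see the last rows of the later table output, e.g. *Yunus Emre*), so they are not counted above. A minimal sketch of turning that placeholder into `NaN` so `isna()` sees it; that `"Not Given"` is the only placeholder used is an assumption worth checking on the full data.

```python
import numpy as np
import pandas as pd

# Toy rows modeled on the dataset, where missing directors read "Not Given"
frame = pd.DataFrame({
    "Title": ["Midnight Mass", "Yunus Emre", "Zak Storm"],
    "Director": ["Mike Flanagan", "Not Given", "Not Given"],
})

print(frame.isna().sum().sum())  # 0: the placeholder is an ordinary string

# Assumption: "Not Given" always means "value missing"
cleaned = frame.replace("Not Given", np.nan)
print(cleaned["Director"].isna().sum())  # 2
```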

### Checking for further transformations

In [72]:
data.head()

| | Show_id | Type | Title | Director | Country | Date_added | Release_year | Duration | Listed_in |
|---|---|---|---|---|---|---|---|---|---|
| 0 | s1 | Movie | Dick Johnson Is Dead | Kirsten Johnson | United States | 9/25/2021 | 2020 | 90 min | Documentaries |
| 1 | s3 | TV Show | Ganglands | Julien Leclercq | France | 9/24/2021 | 2021 | 1 Season | Crime TV Shows, International TV Shows, TV Act... |
| 2 | s6 | TV Show | Midnight Mass | Mike Flanagan | United States | 9/24/2021 | 2021 | 1 Season | TV Dramas, TV Horror, TV Mysteries |
| 3 | s14 | Movie | Confessions of an Invisible Girl | Bruno Garotti | Brazil | 9/22/2021 | 2021 | 91 min | Children & Family Movies, Comedies |
| 4 | s8 | Movie | Sankofa | Haile Gerima | United States | 9/24/2021 | 1993 | 125 min | Dramas, Independent Movies, International Movies |
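
The `Duration` column mixes two units: minutes for movies (`90 min`) and seasons for TV shows (`1 Season`, `3 Seasons`). One possible further transformation, sketched here as an option this notebook does not perform, splits it into a number and a unit with `str.extract`. The column names `Duration_value` and `Duration_unit` are hypothetical.

```python
import pandas as pd

frame = pd.DataFrame({"Duration": ["90 min", "1 Season", "125 min", "3 Seasons"]})

# Capture the leading integer and the unit word that follows it
parts = frame["Duration"].str.extract(r"^(\d+)\s+(\w+)$")

# Hypothetical new columns: numeric amount and a normalized unit label
frame["Duration_value"] = parts[0].astype(int)
frame["Duration_unit"] = parts[1].str.lower().str.rstrip("s")  # "Seasons" -> "season"
print(frame["Duration_unit"].tolist())  # ['min', 'season', 'min', 'season']
```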


In [73]:
data["Show_id"].unique()  # Inspecting the Show IDs: each one carries an "s" prefix

array(['s1', 's3', 's6', ..., 's8801', 's8784', 's8786'], dtype=object)

In [74]:
data["Show_id"] = data["Show_id"].apply(lambda x: x.split("s")[1])  # Keeping only the digits after the "s" prefix

In [75]:
data

| | Show_id | Type | Title | Director | Country | Date_added | Release_year | Duration | Listed_in |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Movie | Dick Johnson Is Dead | Kirsten Johnson | United States | 9/25/2021 | 2020 | 90 min | Documentaries |
| 1 | 3 | TV Show | Ganglands | Julien Leclercq | France | 9/24/2021 | 2021 | 1 Season | Crime TV Shows, International TV Shows, TV Act... |
| 2 | 6 | TV Show | Midnight Mass | Mike Flanagan | United States | 9/24/2021 | 2021 | 1 Season | TV Dramas, TV Horror, TV Mysteries |
| 3 | 14 | Movie | Confessions of an Invisible Girl | Bruno Garotti | Brazil | 9/22/2021 | 2021 | 91 min | Children & Family Movies, Comedies |
| 4 | 8 | Movie | Sankofa | Haile Gerima | United States | 9/24/2021 | 1993 | 125 min | Dramas, Independent Movies, International Movies |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 8785 | 8797 | TV Show | Yunus Emre | Not Given | Turkey | 1/17/2017 | 2016 | 2 Seasons | International TV Shows, TV Dramas |
| 8786 | 8798 | TV Show | Zak Storm | Not Given | United States | 9/13/2018 | 2016 | 3 Seasons | Kids' TV |
| 8787 | 8801 | TV Show | Zindagi Gulzar Hai | Not Given | Pakistan | 12/15/2016 | 2012 | 1 Season | International TV Shows, Romantic TV Shows, TV ... |
| 8788 | 8784 | TV Show | Yoko | Not Given | Pakistan | 6/23/2018 | 2016 | 1 Season | Kids' TV |


In [76]:
type(data["Show_id"][0])

str

In [77]:
data["Show_id"] = data["Show_id"].astype(int)  # Converting Show_id to integer

In [78]:
data

| | Show_id | Type | Title | Director | Country | Date_added | Release_year | Duration | Listed_in |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Movie | Dick Johnson Is Dead | Kirsten Johnson | United States | 9/25/2021 | 2020 | 90 min | Documentaries |
| 1 | 3 | TV Show | Ganglands | Julien Leclercq | France | 9/24/2021 | 2021 | 1 Season | Crime TV Shows, International TV Shows, TV Act... |
| 2 | 6 | TV Show | Midnight Mass | Mike Flanagan | United States | 9/24/2021 | 2021 | 1 Season | TV Dramas, TV Horror, TV Mysteries |
| 3 | 14 | Movie | Confessions of an Invisible Girl | Bruno Garotti | Brazil | 9/22/2021 | 2021 | 91 min | Children & Family Movies, Comedies |
| 4 | 8 | Movie | Sankofa | Haile Gerima | United States | 9/24/2021 | 1993 | 125 min | Dramas, Independent Movies, International Movies |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 8785 | 8797 | TV Show | Yunus Emre | Not Given | Turkey | 1/17/2017 | 2016 | 2 Seasons | International TV Shows, TV Dramas |
| 8786 | 8798 | TV Show | Zak Storm | Not Given | United States | 9/13/2018 | 2016 | 3 Seasons | Kids' TV |
| 8787 | 8801 | TV Show | Zindagi Gulzar Hai | Not Given | Pakistan | 12/15/2016 | 2012 | 1 Season | International TV Shows, Romantic TV Shows, TV ... |
| 8788 | 8784 | TV Show | Yoko | Not Given | Pakistan | 6/23/2018 | 2016 | 1 Season | Kids' TV |
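
The `apply(lambda ...)` and `astype(int)` steps above can also be done with vectorized string methods. A minimal sketch that first checks every ID matches the `s<digits>` pattern (an assumption based on the `unique()` output above), then strips the prefix with `str.removeprefix`, available since pandas 1.4.

```python
import pandas as pd

# Sample IDs taken from the unique() output above
ids = pd.Series(["s1", "s3", "s8801", "s8784"])

# Guard: every ID should be an "s" followed only by digits
assert ids.str.fullmatch(r"s\d+").all(), "unexpected Show_id format"

# Remove the leading "s" and convert the remaining digits to integers
numeric_ids = ids.str.removeprefix("s").astype(int)
print(numeric_ids.tolist())  # [1, 3, 8801, 8784]
```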


In [79]:
data["Date_added"] = data["Date_added"].str.replace("/", "-")  # Replacing forward slashes with hyphens in dates

In [80]:
data

| | Show_id | Type | Title | Director | Country | Date_added | Release_year | Duration | Listed_in |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Movie | Dick Johnson Is Dead | Kirsten Johnson | United States | 9-25-2021 | 2020 | 90 min | Documentaries |
| 1 | 3 | TV Show | Ganglands | Julien Leclercq | France | 9-24-2021 | 2021 | 1 Season | Crime TV Shows, International TV Shows, TV Act... |
| 2 | 6 | TV Show | Midnight Mass | Mike Flanagan | United States | 9-24-2021 | 2021 | 1 Season | TV Dramas, TV Horror, TV Mysteries |
| 3 | 14 | Movie | Confessions of an Invisible Girl | Bruno Garotti | Brazil | 9-22-2021 | 2021 | 91 min | Children & Family Movies, Comedies |
| 4 | 8 | Movie | Sankofa | Haile Gerima | United States | 9-24-2021 | 1993 | 125 min | Dramas, Independent Movies, International Movies |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 8785 | 8797 | TV Show | Yunus Emre | Not Given | Turkey | 1-17-2017 | 2016 | 2 Seasons | International TV Shows, TV Dramas |
| 8786 | 8798 | TV Show | Zak Storm | Not Given | United States | 9-13-2018 | 2016 | 3 Seasons | Kids' TV |
| 8787 | 8801 | TV Show | Zindagi Gulzar Hai | Not Given | Pakistan | 12-15-2016 | 2012 | 1 Season | International TV Shows, Romantic TV Shows, TV ... |
| 8788 | 8784 | TV Show | Yoko | Not Given | Pakistan | 6-23-2018 | 2016 | 1 Season | Kids' TV |
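
Replacing `/` with `-` changes only the separator: `Date_added` is still stored as text (`object` dtype), so it cannot be sorted chronologically or filtered by year. If real dates are wanted, `pd.to_datetime` can parse the strings. A minimal sketch, assuming every value follows the month-day-year pattern seen above; `errors="coerce"` could be added to turn malformed values into `NaT` instead of raising.

```python
import pandas as pd

# Sample values in the hyphenated month-day-year form shown above
added = pd.Series(["9-25-2021", "1-17-2017", "12-15-2016"])

# Explicit format: month-day-year; single-digit months such as "9" are accepted
dates = pd.to_datetime(added, format="%m-%d-%Y")
print(dates.dt.year.tolist())  # [2021, 2017, 2016]
print(dates.min())  # 2016-12-15 00:00:00
```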


In [81]:
# Count country occurrences

df = data["Country"].value_counts().reset_index()

In [82]:
df[df["Country"] == "India"]

| | Country | count |
|---|---|---|
| 1 | India | 1057 |
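
Filtering the counts table works, but a single country's count can also be read straight from the `value_counts()` Series, which is indexed by country. A minimal sketch on toy data; `.get` returns the default when a country is absent instead of raising `KeyError`.

```python
import pandas as pd

# Toy Country column (values made up for illustration)
countries = pd.Series(["India", "United States", "India", "France", "India", "United States"])

counts = countries.value_counts()  # sorted from most to least frequent
print(counts.get("India", 0))    # 3
print(counts.get("Germany", 0))  # 0

# The most frequent countries come first
print(counts.head(2).to_dict())  # {'India': 3, 'United States': 2}
```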


In [83]:
data

| | Show_id | Type | Title | Director | Country | Date_added | Release_year | Duration | Listed_in |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Movie | Dick Johnson Is Dead | Kirsten Johnson | United States | 9-25-2021 | 2020 | 90 min | Documentaries |
| 1 | 3 | TV Show | Ganglands | Julien Leclercq | France | 9-24-2021 | 2021 | 1 Season | Crime TV Shows, International TV Shows, TV Act... |
| 2 | 6 | TV Show | Midnight Mass | Mike Flanagan | United States | 9-24-2021 | 2021 | 1 Season | TV Dramas, TV Horror, TV Mysteries |
| 3 | 14 | Movie | Confessions of an Invisible Girl | Bruno Garotti | Brazil | 9-22-2021 | 2021 | 91 min | Children & Family Movies, Comedies |
| 4 | 8 | Movie | Sankofa | Haile Gerima | United States | 9-24-2021 | 1993 | 125 min | Dramas, Independent Movies, International Movies |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 8785 | 8797 | TV Show | Yunus Emre | Not Given | Turkey | 1-17-2017 | 2016 | 2 Seasons | International TV Shows, TV Dramas |
| 8786 | 8798 | TV Show | Zak Storm | Not Given | United States | 9-13-2018 | 2016 | 3 Seasons | Kids' TV |
| 8787 | 8801 | TV Show | Zindagi Gulzar Hai | Not Given | Pakistan | 12-15-2016 | 2012 | 1 Season | International TV Shows, Romantic TV Shows, TV ... |
| 8788 | 8784 | TV Show | Yoko | Not Given | Pakistan | 6-23-2018 | 2016 | 1 Season | Kids' TV |


In [84]:
# Save data to a CSV file

data.to_csv("Cleaned_Netflix_Data_CSV.csv", index=False)
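
A quick way to confirm the export is to read the CSV back and compare it with the frame that was written. This is a minimal sketch on a toy frame inside a temporary directory; the file name is a placeholder. Writing with `index=False` is what keeps an extra `Unnamed: 0` column from appearing on reload.

```python
import tempfile
from pathlib import Path

import pandas as pd

frame = pd.DataFrame({"Show_id": [1, 3], "Title": ["Dick Johnson Is Dead", "Ganglands"]})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "cleaned_sample.csv"  # placeholder file name
    frame.to_csv(path, index=False)
    reloaded = pd.read_csv(path)

# Same columns, values, and dtypes after the round trip; raises AssertionError otherwise
pd.testing.assert_frame_equal(frame, reloaded)
print(reloaded.columns.tolist())  # ['Show_id', 'Title']
```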