# **About Dataset**

This Netflix dataset contains information about the TV shows and movies available on Netflix.

It provides metadata such as the type of content, cast, genres, country of origin, and release details. This makes it useful for content analysis, recommendation system development, and trend studies.

The data was collected from Flixable, a third-party Netflix search engine.

# **Using this dataset, we answered the following questions with Python in our project.**

Q. 1) For 'House of Cards', what is the Show Id and who is the director?

Q. 2) In which year were the most TV Shows & Movies released? Show with a bar graph.

Q. 3) How many Movies & TV Shows are in the dataset? Show with a bar graph.

Q. 4) Show all the Movies that were released in the year 2000.

Q. 5) Show only the titles of all TV Shows that were released in India only.

Q. 6) Show the top 10 directors who contributed the most TV Shows & Movies to Netflix.

Q. 7) Show all the records where "Category is Movie and Type is Comedies" or "Country is United Kingdom".

Q. 8) In how many movies/shows was Tom Cruise cast?

Q. 9) What are the different ratings defined by Netflix?

Q. 9.1) How many Movies got the 'TV-14' rating in Canada?

Q. 9.2) How many TV Shows got the 'R' rating after the year 2018?

Q. 10) What is the maximum duration of a Movie/Show on Netflix?

Q. 11) Which individual country has the highest number of TV Shows?

Q. 12) How can we sort the dataset by year?

Q. 13) Find all the instances where Category is 'Movie' and Type is 'Dramas', or Category is 'TV Show' and Type is 'Kids' TV'.

# **1- Import Required Libraries**

In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline


# **2- Load the Dataset**

In [None]:
# import dataset
df = pd.read_csv('Netflix Dataset.csv')

In [None]:
# To show all columns
pd.set_option('display.max_columns', None)


# **3- Basic Overview of the Dataset**

In [None]:
# show first 5 rows
df.head()

In [None]:
# last 5 rows
df.tail()

In [None]:
# Show column names
df.columns

In [47]:
# Shape of dataset
df.shape

(7789, 11)

In [55]:
# Total number of elements (rows × columns)
df.size

85679
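
As a quick sanity check on the two outputs above, `df.size` is the row count multiplied by the column count, so a shape of (7789, 11) gives 85679. The sketch below illustrates this relationship on a small stand-in frame (`sample` is a hypothetical name, not part of the notebook).

```python
import pandas as pd

# Hypothetical 3 x 2 stand-in frame, used only to illustrate shape vs. size.
sample = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

rows, cols = sample.shape
print(sample.size == rows * cols)  # size counts every cell, including NaN cells

# The same arithmetic applied to the shape reported for the Netflix data.
expected_size = 7789 * 11
print(expected_size)  # 85679
```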

In [57]:
# Show the data type of each column
df.dtypes

Show_Id         object
Category        object
Title           object
Director        object
Cast            object
Country         object
Release_Date    object
Rating          object
Duration        object
Type            object
Description     object
dtype: object
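
Every column, including `Release_Date`, is stored as `object` (strings). Questions that filter or sort by year are easier once that column is a real datetime. The following is a minimal sketch of one way to convert it, shown on a small stand-in frame whose values mimic the date strings in this dataset. The `sample` frame and the `Year` column are assumptions made for illustration, not part of the original notebook.

```python
import pandas as pd

# Stand-in frame; the date strings follow the "Month D, YYYY" pattern seen in the data.
sample = pd.DataFrame({
    "Title": ["3%", "07:19", "23:59", "Missing date"],
    "Release_Date": ["August 14, 2020", "December 23, 2016", "December 20, 2018", None],
})

# Strip stray whitespace in case any is present, then parse.
# errors="coerce" turns missing or unparseable values into NaT instead of raising.
sample["Release_Date"] = pd.to_datetime(
    sample["Release_Date"].str.strip(), errors="coerce"
)

# Hypothetical helper column holding only the year (NaN where the date is missing).
sample["Year"] = sample["Release_Date"].dt.year

print(sample.dtypes)
print(sample)
```

Applied to the full data, the same two assignments would run on `df` instead of `sample`.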

# **4- Check Dataset Information (Data Types & Null Values)**

In [51]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7789 entries, 0 to 7788
Data columns (total 11 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   Show_Id       7789 non-null   object
 1   Category      7789 non-null   object
 2   Title         7789 non-null   object
 3   Director      5401 non-null   object
 4   Cast          7071 non-null   object
 5   Country       7282 non-null   object
 6   Release_Date  7779 non-null   object
 7   Rating        7782 non-null   object
 8   Duration      7789 non-null   object
 9   Type          7789 non-null   object
 10  Description   7789 non-null   object
dtypes: object(11)
memory usage: 669.5+ KB


In [52]:
df.isnull().sum()

Show_Id            0
Category           0
Title              0
Director        2388
Cast             718
Country          507
Release_Date      10
Rating             7
Duration           0
Type               0
Description        0
dtype: int64

# **5- Basic Statistical Summary**

In [53]:
df.describe()

,Show_Id,Category,Title,Director,Cast,Country,Release_Date,Rating,Duration,Type,Description
count,7789,7789,7789,5401,7071,7282,7779,7782,7789,7789,7789
unique,7787,2,7787,4050,6831,681,1565,14,216,492,7769
top,s6621,Movie,The Lost Okoroshi,"Raúl Campos, Jan Suter",David Attenborough,United States,"January 1, 2020",TV-MA,1 Season,Documentaries,A surly septuagenarian gets another chance at ...
freq,2,5379,2,18,18,2556,118,2865,1608,334,3


# **6- Handling Missing Values**

In [54]:
df.head()

,Show_Id,Category,Title,Director,Cast,Country,Release_Date,Rating,Duration,Type,Description
0,s1,TV Show,3%,,"João Miguel, Bianca Comparato, Michel Gomes, R...",Brazil,"August 14, 2020",TV-MA,4 Seasons,"International TV Shows, TV Dramas, TV Sci-Fi &...",In a future where the elite inhabit an island ...
1,s2,Movie,07:19,Jorge Michel Grau,"Demián Bichir, Héctor Bonilla, Oscar Serrano, ...",Mexico,"December 23, 2016",TV-MA,93 min,"Dramas, International Movies",After a devastating earthquake hits Mexico Cit...
2,s3,Movie,23:59,Gilbert Chan,"Tedd Chan, Stella Chung, Henley Hii, Lawrence ...",Singapore,"December 20, 2018",R,78 min,"Horror Movies, International Movies","When an army recruit is found dead, his fellow..."
3,s4,Movie,9,Shane Acker,"Elijah Wood, John C. Reilly, Jennifer Connelly...",United States,"November 16, 2017",PG-13,80 min,"Action & Adventure, Independent Movies, Sci-Fi...","In a postapocalyptic world, rag-doll robots hi..."
4,s5,Movie,21,Robert Luketic,"Jim Sturgess, Kevin Spacey, Kate Bosworth, Aar...",United States,"January 1, 2020",PG-13,123 min,Dramas,A brilliant group of students become card-coun...


In [63]:
df.isnull().sum()

Show_Id            0
Category           0
Title              0
Director        2388
Cast             718
Country          507
Release_Date      10
Rating             7
Duration           0
Type               0
Description        0
dtype: int64
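
Raw null counts are easier to judge as a share of all rows, and the counts alone do not yet handle anything. The sketch below shows, on a small stand-in frame, how the percentage of missing values per column could be computed, along with two common ways to deal with the gaps. The `sample` frame, the `"Unknown"` placeholder, and the choice of which columns to fill or drop are illustrative assumptions, not decisions taken in the notebook.

```python
import numpy as np
import pandas as pd

# Stand-in frame with gaps in two columns.
sample = pd.DataFrame({
    "Title": ["A", "B", "C", "D"],
    "Director": ["X", np.nan, np.nan, "Y"],
    "Rating": ["TV-MA", "R", np.nan, "PG-13"],
})

# Percentage of missing values per column: the mean of a boolean mask is a fraction.
missing_pct = sample.isnull().mean().mul(100).round(2)
print(missing_pct)

# Option 1: keep every row and label the gaps with an explicit placeholder.
filled = sample.fillna({"Director": "Unknown"})

# Option 2: drop only the rows where a rarely-missing column is empty.
trimmed = sample.dropna(subset=["Rating"])

print(filled)
print(trimmed)
```

Filling suits columns with many gaps (such as `Director` here), while dropping rows is usually only reasonable when very few rows are affected.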

In [64]:
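# sns.heatmap() needs 2-D data. Passing the 1-D Series from df.isnull().sum() raised:
#   IndexError: Inconsistent shape between the condition and the input (got (11, 1) and (11,))
# Plot the boolean null mask instead (one cell per value, highlighted where missing).
sns.heatmap(df.isnull(), cbar=False)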

# **7- Check and Remove Duplicate Records**

In [61]:
df[df.duplicated()]

,Show_Id,Category,Title,Director,Cast,Country,Release_Date,Rating,Duration,Type,Description
6300,s684,Movie,Backfire,Dave Patten,"Black Deniro, Byron ""Squally"" Vinson, Dominic ...",United States,"April 5, 2019",TV-MA,97 min,"Dramas, Independent Movies, Thrillers",When two would-be robbers accidentally kill a ...
6622,s6621,Movie,The Lost Okoroshi,Abba T. Makama,"Seun Ajayi, Judith Audu, Tope Tedela, Ifu Enna...",Nigeria,"September 4, 2020",TV-MA,94 min,"Comedies, Dramas, Independent Movies",A disillusioned security guard transforms into...


In [67]:
# The two rows above are not duplicates of each other: apart from Category ('Movie')
# and Rating ('TV-MA'), their columns differ. Each one is, however, an exact copy of an
# earlier row in the dataset, which is why df.duplicated() returns it. By default,
# pandas flags a row as a duplicate only if ALL of its column values match a previous row.
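
The section heading mentions removal, but the cells above only inspect the duplicates. A minimal sketch of the removal step follows, using a small stand-in frame. The names `sample` and `deduped` are hypothetical, and keeping the first occurrence is an assumption about the intended behaviour.

```python
import pandas as pd

# Stand-in frame in which the row for "s2" appears twice.
sample = pd.DataFrame({
    "Show_Id": ["s1", "s2", "s2", "s3"],
    "Title": ["3%", "07:19", "07:19", "23:59"],
})

# keep=False marks every member of a duplicate group, so both copies are visible.
print(sample[sample.duplicated(keep=False)])

# Drop repeated rows, keeping the first occurrence, and rebuild a clean 0..n-1 index.
deduped = sample.drop_duplicates().reset_index(drop=True)

print(deduped.shape)
print(deduped.duplicated().sum())
```

On the real data, the equivalent call would be `df = df.drop_duplicates().reset_index(drop=True)`. Given the two rows returned by `df[df.duplicated()]`, it would be expected to reduce the frame by two rows.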
