# Netflix - Movies and Shows Analysis

### Objectives

Extract insights from the data

### Dataset summary

##### **Main Dataset**: [Netflix Movies and TV Shows](https://www.kaggle.com/shivamb/netflix-shows)
    
This dataset consists of TV shows and movies available on Netflix as of 2019. It was collected from Flixable, a third-party Netflix search engine.

In 2018, Flixable released a report showing that the number of TV shows on Netflix has nearly tripled since 2010, while the number of movies has decreased by more than 2,000 titles. It will be interesting to see what other insights can be obtained from the same dataset.

12 columns:
- show_id (string): unique identifier for each title
- type (string): TV Show or Movie
- title (string): title of the record
- director (string): director of the record
- cast (string): principal actors
- country (string): country where the title was filmed
- date_added (date): date the title was added to Netflix
- release_year (int): year of release
- rating (string): audience rating
- duration (string): duration of the record
- listed_in (string): descriptive tags (genres)
- description (string): description of the record
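The column list above can double as a quick sanity check after loading the file. The following is only a sketch: `EXPECTED_COLUMNS`, `check_schema`, and the `toy` frame are hypothetical names introduced here for illustration, not part of the original notebook.

```python
import pandas as pd

# Column names copied from the list above.
EXPECTED_COLUMNS = [
    "show_id", "type", "title", "director", "cast", "country",
    "date_added", "release_year", "rating", "duration", "listed_in", "description",
]

def check_schema(frame: pd.DataFrame) -> list:
    """Return the expected columns that are missing from `frame` (hypothetical helper)."""
    return [col for col in EXPECTED_COLUMNS if col not in frame.columns]

# Toy frame with only two of the columns, to show what the check reports.
toy = pd.DataFrame({"show_id": ["s1"], "type": ["TV Show"]})
missing = check_schema(toy)
print(missing)
```

On the real data, `check_schema(df)` should return an empty list if the file matches the description.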






In [2]:
# import basic packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [3]:
df = pd.read_csv("datasets/netflix_titles.csv")
df.shape

(7787, 12)

In [5]:
df.head()

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
0,s1,TV Show,3%,,"João Miguel, Bianca Comparato, Michel Gomes, R...",Brazil,"August 14, 2020",2020,TV-MA,4 Seasons,"International TV Shows, TV Dramas, TV Sci-Fi &...",In a future where the elite inhabit an island ...
1,s2,Movie,7:19,Jorge Michel Grau,"Demián Bichir, Héctor Bonilla, Oscar Serrano, ...",Mexico,"December 23, 2016",2016,TV-MA,93 min,"Dramas, International Movies",After a devastating earthquake hits Mexico Cit...
2,s3,Movie,23:59,Gilbert Chan,"Tedd Chan, Stella Chung, Henley Hii, Lawrence ...",Singapore,"December 20, 2018",2011,R,78 min,"Horror Movies, International Movies","When an army recruit is found dead, his fellow..."
3,s4,Movie,9,Shane Acker,"Elijah Wood, John C. Reilly, Jennifer Connelly...",United States,"November 16, 2017",2009,PG-13,80 min,"Action & Adventure, Independent Movies, Sci-Fi...","In a postapocalyptic world, rag-doll robots hi..."
4,s5,Movie,21,Robert Luketic,"Jim Sturgess, Kevin Spacey, Kate Bosworth, Aar...",United States,"January 1, 2020",2008,PG-13,123 min,Dramas,A brilliant group of students become card-coun...


## Normalizing data

To avoid silly mistakes, I'm going to normalize and standardize the data. The first step is to convert each column to its proper data type: looking at the dtypes across the dataset, every column is of type object except the year column. The idea is to convert the date column into a datetime type.

In [9]:
df.dtypes

show_id         object
type            object
title           object
director        object
cast            object
country         object
date_added      object
release_year     int64
rating          object
duration        object
listed_in       object
description     object
dtype: object

In [24]:
# Parse strings such as "August 14, 2020" into datetime64 values
df["date_added"] = pd.to_datetime(df["date_added"])
df.dtypes

show_id                 object
type                    object
title                   object
director                object
cast                    object
country                 object
date_added      datetime64[ns]
release_year             int64
rating                  object
duration                object
listed_in               object
description             object
dtype: object
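Other columns could be typed in the same spirit. As a rough sketch (not something the notebook itself does), the `duration` strings can be split into a number and a unit, and a low-cardinality column such as `type` can be stored as `category`. The `sample` frame and the `duration_value` / `duration_unit` column names are assumptions made for illustration.

```python
import pandas as pd

# Toy rows that mimic the `type` and `duration` columns of the dataset.
sample = pd.DataFrame({
    "type": ["TV Show", "Movie", "Movie", "TV Show"],
    "duration": ["4 Seasons", "93 min", "78 min", "1 Season"],
})

# Split "93 min" / "4 Seasons" into a numeric part and a unit part.
parts = sample["duration"].str.extract(
    r"^(?P<duration_value>\d+)\s+(?P<duration_unit>\w+)$"
)
sample["duration_value"] = pd.to_numeric(parts["duration_value"]).astype("Int64")

# Drop a trailing "s" so "Seasons" and "Season" become the same unit.
sample["duration_unit"] = (
    parts["duration_unit"].str.replace(r"s$", "", regex=True).astype("category")
)

# Columns with only a few distinct values also fit the category dtype.
sample["type"] = sample["type"].astype("category")

print(sample.dtypes)
```

Keeping the unit next to the number matters here because minutes and seasons are not comparable, so any later aggregation should be done per unit.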

Now let's check which columns have missing (NaN) values.

In [28]:
df.isna().any()

show_id         False
type            False
title           False
director         True
cast             True
country          True
date_added       True
release_year    False
rating           True
duration        False
listed_in       False
description     False
dtype: bool
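`isna().any()` only says whether a column has missing values, not how many. A minimal sketch of a per-column count and percentage is shown below; the `sample` frame and its placeholder values are made up for illustration, and on the real data the same calls would be applied to `df`.

```python
import numpy as np
import pandas as pd

# Toy frame with placeholder values; NaN marks a missing entry.
sample = pd.DataFrame({
    "director": [np.nan, "Director A", "Director B", np.nan],
    "cast": ["Actor A", "Actor B", np.nan, "Actor C"],
    "country": ["Brazil", "Mexico", "Singapore", "United States"],
})

# Number of missing values per column.
missing_count = sample.isna().sum()
# Share of missing values per column, as a percentage.
missing_pct = (sample.isna().mean() * 100).round(1)

missing_report = pd.DataFrame({"missing": missing_count, "percent": missing_pct})
# Keep only the columns that actually have gaps, largest first.
missing_report = missing_report[missing_report["missing"] > 0].sort_values(
    "missing", ascending=False
)
print(missing_report)
```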

### Getting dummies

There are many categorical variables, so converting them to dummy (indicator) variables is a good way to handle them.

In [34]:
# Rows where both cast and director are missing
df[df["cast"].isna() & df["director"].isna()]

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
26,s27,TV Show,(Un)Well,,,United States,2020-08-12,2020,TV-MA,1 Season,Reality TV,This docuseries takes a deep dive into the luc...
52,s53,Movie,100 Days Of Solitude,,,Spain,2018-07-06,2018,TV-MA,93 min,"Documentaries, International Movies",Spanish photographer José Díaz spends 100 days...
130,s131,TV Show,60 Days In,,,United States,2020-11-01,2019,TV-MA,1 Season,Reality TV,"Recruited by a sheriff, volunteers infiltrate ..."
134,s135,TV Show,7 Days Out,,,United States,2018-12-21,2018,TV-PG,1 Season,Docuseries,Witness the excitement and drama behind the sc...
137,s138,TV Show,72 Cutest Animals,,,Australia,2016-06-01,2016,TV-PG,1 Season,"Docuseries, International TV Shows, Science & ...",This series examines the nature of cuteness an...
...,...,...,...,...,...,...,...,...,...,...,...,...
7653,s7654,TV Show,Women Behind Bars,,,United States,2016-11-01,2010,TV-14,3 Seasons,"Crime TV Shows, Docuseries",This reality series recounts true stories of w...
7661,s7662,TV Show,Word Party Songs,,,United States,2020-08-07,2020,TV-Y,1 Season,Kids' TV,"Sing along and dance with Bailey, Franny, Kip,..."
7670,s7671,TV Show,World's Most Wanted,,,United States,2020-08-05,2020,TV-14,1 Season,"Crime TV Shows, Docuseries, International TV S...","Suspected of heinous crimes, they’ve avoided c..."
7766,s7767,TV Show,Zig & Sharko,,,France,2017-12-01,2016,TV-Y7,1 Season,"Kids' TV, TV Comedies","Zig, an island-bound hyena, will do anything t..."
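The dummy encoding mentioned at the start of this section could look roughly like the sketch below. It assumes that single-valued columns such as `type` go through `pd.get_dummies`, while comma-separated columns such as `listed_in` are split with `Series.str.get_dummies`. The `sample` frame is a toy stand-in for `df`, and the variable names are illustrative.

```python
import pandas as pd

# Toy rows using `type` / `listed_in` values that appear in the outputs above.
sample = pd.DataFrame({
    "type": ["Movie", "Movie", "Movie", "TV Show"],
    "listed_in": [
        "Dramas, International Movies",
        "Horror Movies, International Movies",
        "Dramas",
        "Reality TV",
    ],
})

# One indicator column per category for a single-valued column.
type_dummies = pd.get_dummies(sample["type"], prefix="type")

# One indicator column per tag for a comma-separated, multi-valued column.
genre_dummies = sample["listed_in"].str.get_dummies(sep=", ")

encoded = pd.concat([type_dummies, genre_dummies], axis=1)
print(encoded)
```

Columns with very many distinct values, such as `cast` or `director`, would produce a very wide table with this approach, so it is probably best limited to columns like `type`, `rating`, and `listed_in`.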
