# Netflix and Chill

The dataset originally consists of 6234 rows × 12 columns. We use an existing Netflix dataset from Kaggle and try to gain interesting insights and draw conclusions from it. The analysis uses pandas DataFrames, with matplotlib and seaborn as visual aids.

## Downloading the Dataset

##### The Kaggle dataset of Netflix movies and shows is downloaded using the URL for further use

Let's begin by downloading the data, and listing the files within the dataset.

In [60]:
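from urllib.request import urlretrieve
# This URL is the dataset's Kaggle page, so the call below stores that page in a
# temporary file; the CSV itself is read later from the Kaggle input directory.
urlretrieve('https://www.kaggle.com/shivamb/netflix-shows/notebooks')
import pandas as pd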

Let us save and upload our work to Jovian before continuing.

In [61]:
project_name = "netflix-and-chill"

In [62]:
!pip install jovian --upgrade -q

In [63]:
import jovian

In [64]:
jovian.commit(project=project_name)

[jovian] Attempting to save notebook..
[jovian] Detected Kaggle notebook...
[jovian] Uploading notebook to https://jovian.ml/maktubsomya16/netflix-and-chill

## Data Preparation and Cleaning

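* Data is read from a CSV file using pandas
* Data is explored with `info()` and `describe()`
* Missing `director`, `cast`, and `country` values are filled with a placeholder; rows that still contain a NULL value are then removed
* An additional column is added with the number of days each show or movie has been on Netflix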
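
The steps listed above can be sketched on a tiny, made-up frame. The rows, the name `toy_df`, and the column name `days_on_netflix` are illustrative assumptions, not values from the Kaggle file; the placeholder string and the reference date mirror the ones used in this notebook.

```python
import pandas as pd

# Hypothetical toy rows (not taken from the real dataset).
toy_df = pd.DataFrame({
    "title": ["A", "B", "C"],
    "director": ["Jane Doe", None, None],
    "date_added": ["September 9, 2019", None, "April 1, 2016"],
    "rating": ["TV-PG", "TV-MA", "TV-Y"],
})

# 1. Fill gaps in a descriptive column with a placeholder instead of dropping rows.
toy_df["director"] = toy_df["director"].fillna("Un-important")

# 2. Drop rows that still contain a missing value (here: title "B", no date_added).
toy_df = toy_df.dropna(how="any").copy()

# 3. Parse the dates and compute the days up to a fixed reference date.
toy_df["date_added"] = pd.to_datetime(toy_df["date_added"])
reference_date = pd.Timestamp("2020-10-03")
toy_df["days_on_netflix"] = (reference_date - toy_df["date_added"]).dt.days

print(toy_df[["title", "director", "days_on_netflix"]])
```

Filling before dropping matters: if the order were reversed, every title without a director would be removed as well.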


In [65]:
#import pandas as pd
import matplotlib.pyplot as plt

In [66]:
#read data
netflix_df = pd.read_csv('../input/netflix-shows/netflix_titles.csv')
netflix_df


Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
0,81145628,Movie,Norm of the North: King Sized Adventure,"Richard Finn, Tim Maltby","Alan Marriott, Andrew Toth, Brian Dobson, Cole...","United States, India, South Korea, China","September 9, 2019",2019,TV-PG,90 min,"Children & Family Movies, Comedies",Before planning an awesome wedding for his gra...
1,80117401,Movie,Jandino: Whatever it Takes,,Jandino Asporaat,United Kingdom,"September 9, 2016",2016,TV-MA,94 min,Stand-Up Comedy,Jandino Asporaat riffs on the challenges of ra...
2,70234439,TV Show,Transformers Prime,,"Peter Cullen, Sumalee Montano, Frank Welker, J...",United States,"September 8, 2018",2013,TV-Y7-FV,1 Season,Kids' TV,"With the help of three human allies, the Autob..."
3,80058654,TV Show,Transformers: Robots in Disguise,,"Will Friedle, Darren Criss, Constance Zimmer, ...",United States,"September 8, 2018",2016,TV-Y7,1 Season,Kids' TV,When a prison ship crash unleashes hundreds of...
4,80125979,Movie,#realityhigh,Fernando Lebrija,"Nesta Cooper, Kate Walsh, John Michael Higgins...",United States,"September 8, 2017",2017,TV-14,99 min,Comedies,When nerdy high schooler Dani finally attracts...
...,...,...,...,...,...,...,...,...,...,...,...,...
6229,80000063,TV Show,Red vs. Blue,,"Burnie Burns, Jason Saldaña, Gustavo Sorola, G...",United States,,2015,NR,13 Seasons,"TV Action & Adventure, TV Comedies, TV Sci-Fi ...","This parody of first-person shooter games, mil..."
6230,70286564,TV Show,Maron,,"Marc Maron, Judd Hirsch, Josh Brener, Nora Zeh...",United States,,2016,TV-MA,4 Seasons,TV Comedies,"Marc Maron stars as Marc Maron, who interviews..."
6231,80116008,Movie,Little Baby Bum: Nursery Rhyme Friends,,,,,2016,,60 min,Movies,Nursery rhymes and original music for children...
6232,70281022,TV Show,A Young Doctor's Notebook and Other Stories,,"Daniel Radcliffe, Jon Hamm, Adam Godley, Chris...",United Kingdom,,2013,TV-MA,2 Seasons,"British TV Shows, TV Comedies, TV Dramas","Set during the Russian Revolution, this comic ..."


In [67]:
#explore data
netflix_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6234 entries, 0 to 6233
Data columns (total 12 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   show_id       6234 non-null   int64 
 1   type          6234 non-null   object
 2   title         6234 non-null   object
 3   director      4265 non-null   object
 4   cast          5664 non-null   object
 5   country       5758 non-null   object
 6   date_added    6223 non-null   object
 7   release_year  6234 non-null   int64 
 8   rating        6224 non-null   object
 9   duration      6234 non-null   object
 10  listed_in     6234 non-null   object
 11  description   6234 non-null   object
dtypes: int64(2), object(10)
memory usage: 584.6+ KB


In [68]:
netflix_df.describe()

Unnamed: 0,show_id,release_year
count,6234.0,6234.0
mean,76703680.0,2013.35932
std,10942960.0,8.81162
min,247747.0,1925.0
25%,80035800.0,2013.0
50%,80163370.0,2016.0
75%,80244890.0,2018.0
max,81235730.0,2020.0


In [69]:
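# Fill missing descriptive fields with a placeholder so these rows are kept
netflix_df['director'] = netflix_df['director'].fillna('Un-important')
netflix_df['cast'] = netflix_df['cast'].fillna('Un-important')
netflix_df['country'] = netflix_df['country'].fillna('Un-important')
# Remaining missing values per column
netflix_df.isnull().sum()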

show_id          0
type             0
title            0
director         0
cast             0
country          0
date_added      11
release_year     0
rating          10
duration         0
listed_in        0
description      0
dtype: int64

In [70]:
# Drop the rows that still have a missing date_added or rating
netflix_df.dropna(how='any', inplace=True, axis=0)
netflix_df

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
0,81145628,Movie,Norm of the North: King Sized Adventure,"Richard Finn, Tim Maltby","Alan Marriott, Andrew Toth, Brian Dobson, Cole...","United States, India, South Korea, China","September 9, 2019",2019,TV-PG,90 min,"Children & Family Movies, Comedies",Before planning an awesome wedding for his gra...
1,80117401,Movie,Jandino: Whatever it Takes,Un-important,Jandino Asporaat,United Kingdom,"September 9, 2016",2016,TV-MA,94 min,Stand-Up Comedy,Jandino Asporaat riffs on the challenges of ra...
2,70234439,TV Show,Transformers Prime,Un-important,"Peter Cullen, Sumalee Montano, Frank Welker, J...",United States,"September 8, 2018",2013,TV-Y7-FV,1 Season,Kids' TV,"With the help of three human allies, the Autob..."
3,80058654,TV Show,Transformers: Robots in Disguise,Un-important,"Will Friedle, Darren Criss, Constance Zimmer, ...",United States,"September 8, 2018",2016,TV-Y7,1 Season,Kids' TV,When a prison ship crash unleashes hundreds of...
4,80125979,Movie,#realityhigh,Fernando Lebrija,"Nesta Cooper, Kate Walsh, John Michael Higgins...",United States,"September 8, 2017",2017,TV-14,99 min,Comedies,When nerdy high schooler Dani finally attracts...
...,...,...,...,...,...,...,...,...,...,...,...,...
6218,80162994,TV Show,Talking Tom and Friends,Un-important,"Colin Hanks, Tom Kenny, James Adomian, Lisa Sc...","Cyprus, Austria, Thailand","April 10, 2019",2017,TV-G,2 Seasons,"Kids' TV, TV Comedies",Full of funny one-liners and always ready for ...
6219,80186475,TV Show,Pokémon the Series,Un-important,"Sarah Natochenny, Laurie Hymes, Jessica Paquet...",Japan,"April 1, 2019",2019,TV-Y7-FV,2 Seasons,"Anime Series, Kids' TV",Ash and his Pikachu travel to the Alola region...
6220,70272742,TV Show,Justin Time,Un-important,"Gage Munroe, Scott McCord, Jenna Warren",Canada,"April 1, 2016",2012,TV-Y,2 Seasons,Kids' TV,"In Justin's dreams, he and his imaginary frien..."
6221,80067942,TV Show,Terrace House: Boys & Girls in the City,Un-important,"You, Reina Triendl, Ryota Yamasato, Yoshimi To...",Japan,"April 1, 2016",2016,TV-14,2 Seasons,"International TV Shows, Reality TV",A new set of six men and women start their liv...


In [71]:
# Add a column with the number of days each title has been on Netflix, measured up to October 3, 2020
netflix_df['date_added'] = pd.to_datetime(netflix_df.date_added)
netflix_df['date_today'] = pd.to_datetime('October 3,2020')
netflix_df['Presence on netflix']=(netflix_df['date_today']-netflix_df['date_added']).dt.days
netflix_df

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,date_today,Presence on netflix
0,81145628,Movie,Norm of the North: King Sized Adventure,"Richard Finn, Tim Maltby","Alan Marriott, Andrew Toth, Brian Dobson, Cole...","United States, India, South Korea, China",2019-09-09,2019,TV-PG,90 min,"Children & Family Movies, Comedies",Before planning an awesome wedding for his gra...,2020-10-03,390
1,80117401,Movie,Jandino: Whatever it Takes,Un-important,Jandino Asporaat,United Kingdom,2016-09-09,2016,TV-MA,94 min,Stand-Up Comedy,Jandino Asporaat riffs on the challenges of ra...,2020-10-03,1485
2,70234439,TV Show,Transformers Prime,Un-important,"Peter Cullen, Sumalee Montano, Frank Welker, J...",United States,2018-09-08,2013,TV-Y7-FV,1 Season,Kids' TV,"With the help of three human allies, the Autob...",2020-10-03,756
3,80058654,TV Show,Transformers: Robots in Disguise,Un-important,"Will Friedle, Darren Criss, Constance Zimmer, ...",United States,2018-09-08,2016,TV-Y7,1 Season,Kids' TV,When a prison ship crash unleashes hundreds of...,2020-10-03,756
4,80125979,Movie,#realityhigh,Fernando Lebrija,"Nesta Cooper, Kate Walsh, John Michael Higgins...",United States,2017-09-08,2017,TV-14,99 min,Comedies,When nerdy high schooler Dani finally attracts...,2020-10-03,1121
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
6218,80162994,TV Show,Talking Tom and Friends,Un-important,"Colin Hanks, Tom Kenny, James Adomian, Lisa Sc...","Cyprus, Austria, Thailand",2019-04-10,2017,TV-G,2 Seasons,"Kids' TV, TV Comedies",Full of funny one-liners and always ready for ...,2020-10-03,542
6219,80186475,TV Show,Pokémon the Series,Un-important,"Sarah Natochenny, Laurie Hymes, Jessica Paquet...",Japan,2019-04-01,2019,TV-Y7-FV,2 Seasons,"Anime Series, Kids' TV",Ash and his Pikachu travel to the Alola region...,2020-10-03,551
6220,70272742,TV Show,Justin Time,Un-important,"Gage Munroe, Scott McCord, Jenna Warren",Canada,2016-04-01,2012,TV-Y,2 Seasons,Kids' TV,"In Justin's dreams, he and his imaginary frien...",2020-10-03,1646
6221,80067942,TV Show,Terrace House: Boys & Girls in the City,Un-important,"You, Reina Triendl, Ryota Yamasato, Yoshimi To...",Japan,2016-04-01,2016,TV-14,2 Seasons,"International TV Shows, Reality TV",A new set of six men and women start their liv...,2020-10-03,1646
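
As a quick sanity check of the day counts, the subtraction can be repeated with the standard library. This is only a sketch: the `samples` list is copied by hand from the output table above, and the variable names are illustrative.

```python
from datetime import date

reference_date = date(2020, 10, 3)  # the fixed "today" used in this notebook

# (date_added, expected days) pairs copied from the output table above.
samples = [
    (date(2019, 9, 9), 390),
    (date(2016, 9, 9), 1485),
    (date(2018, 9, 8), 756),
    (date(2016, 4, 1), 1646),
]

for added, expected in samples:
    days = (reference_date - added).days
    print(added, days, days == expected)
```

Because the reference date is hard-coded, the column measures presence up to October 3, 2020, not up to the day the notebook is re-run.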


In [72]:
import jovian

In [73]:
jovian.commit(project=project_name)

[jovian] Attempting to save notebook..
[jovian] Detected Kaggle notebook...
[jovian] Uploading notebook to https://jovian.ml/maktubsomya16/netflix-and-chill

## Exploratory Analysis and Visualization

Let us gain some interesting insights through visualisation.

Let's begin by importing `matplotlib.pyplot` and `seaborn`.

In [None]:
import seaborn as sns
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline

sns.set_style('darkgrid')
matplotlib.rcParams['font.size'] = 14
matplotlib.rcParams['figure.figsize'] = (10, 10)
matplotlib.rcParams['figure.facecolor'] = '#00000000'

In [None]:
# Summary statistics for the numerical columns
netflix_df.describe()

***Insights*** - According to this dataset, 2018 is the release year with the most TV shows and movies, as the histogram shows.

In [None]:
plt.figure(figsize=(12,15))
plt.title("Release data")
plt.hist(netflix_df['release_year'],bins=100);
plt.xlabel('Year of release');
plt.ylabel('No. of releases');

***Insights*** - Movies make up a large majority of the Netflix catalogue; TV shows are comparatively few. Represented by a pie chart.

In [None]:
types=netflix_df['type'].value_counts().index
count=netflix_df['type'].value_counts()
plt.pie(count,labels=types,radius=1)
plt.show()

***Insights*** - The most common genre listing on Netflix is Documentaries. Shown by a line chart of the counts per `listed_in` value.

In [None]:
# Count titles per distinct `listed_in` value and plot the counts as a line
netflix_df['listed_in'].value_counts().plot()
plt.xlabel('Genres')
plt.ylabel('Count')
plt.title("Genres vs Count")
plt.xticks(rotation=90);

***Insights*** - Most content on Netflix is rated TV-MA (intended for mature audiences). Line plot representation.

In [None]:
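plt.figure(figsize=(15,10))
# size() gives one count per rating, so a single line is drawn
plt.plot(netflix_df.groupby('rating').size());
plt.xlabel('Rating');
plt.ylabel('No. of shows or movies');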

***Insights*** - Most content on Netflix is rated TV-MA, and the least is rated NC-17. Represented by a bar plot.

In [None]:
netflix_df['rating'].value_counts().plot(kind='bar');
plt.xlabel('Rating');
plt.ylabel('No. of shows or movies');

Let us save and upload our work to Jovian before continuing.

In [None]:
import jovian

In [None]:
jovian.commit(project=project_name)

## Asking and Answering Questions

Our notebook answers several questions and gives some interesting information about the TV shows and movies available on Netflix.

#### Q1: What year had the maximum number of releases?

In [None]:
plt.figure(figsize=(12,15))
plt.title("Release data")
plt.hist(netflix_df['release_year'],bins=100);
plt.xlabel('Year of release');
plt.ylabel('No. of releases');

With each bin representing a year, 2018 had the maximum number of releases.
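
Reading the tallest bar off a histogram can be confirmed numerically. The sketch below uses a short, made-up Series in place of `netflix_df['release_year']`; the names `release_years`, `per_year`, and `peak_year` are illustrative.

```python
import pandas as pd

# Hypothetical release years standing in for netflix_df['release_year'].
release_years = pd.Series([2016, 2017, 2018, 2018, 2018, 2019, 2019])

per_year = release_years.value_counts()  # number of titles per release year
peak_year = per_year.idxmax()            # year with the highest count
print(peak_year, per_year.max())
```

Applied to the real column, the same two calls would report the peak year without depending on how the histogram bins are drawn.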

#### Q2: Which show/movie has been on Netflix the longest?

In [None]:
# idxmax() returns the index label of the row with the most days on Netflix
oldest = netflix_df['Presence on netflix'].idxmax()

In [None]:
print("""Type: {}
      Title: {}
      Director: {}
      Cast: {}
      Description: {}
      Genre: {}""".format(netflix_df['type'][oldest] , netflix_df['title'][oldest] , netflix_df['director'][oldest] , netflix_df['cast'][oldest] , netflix_df['description'][oldest] , netflix_df['listed_in'][oldest]))

To and From New York is the title that has been on Netflix the longest. Its details are printed above.
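
When several fields of one title are needed, it can be convenient to pull the whole row once with `.loc` and read the fields from it. A minimal sketch on made-up rows; only the column name `Presence on netflix` comes from this notebook, everything else is hypothetical.

```python
import pandas as pd

# Hypothetical rows with non-default index labels.
toy_df = pd.DataFrame(
    {
        "title": ["X", "Y", "Z"],
        "Presence on netflix": [390, 1646, 756],
    },
    index=[10, 20, 30],
)

longest_label = toy_df["Presence on netflix"].idxmax()  # index label, not position
longest_row = toy_df.loc[longest_label]                 # the matching row as a Series
print(longest_row["title"], longest_row["Presence on netflix"])
```

Each field read from `longest_row` is a plain value, so it formats cleanly inside a string.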

#### Q3: Which genre dominates Netflix?

In [None]:
# Count titles per distinct `listed_in` value and plot the counts as a line
netflix_df['listed_in'].value_counts().plot()
plt.xlabel('Genres')
plt.ylabel('Count')
plt.title("Genres vs Count")
plt.xticks(rotation=90);

Documentaries is the most common genre listing on Netflix.
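
The chart above counts each full `listed_in` string, so a title tagged with several genres contributes only to its combined label. A hedged sketch of counting individual genres instead, on made-up values in the same comma-separated style; this is an alternative view, not the method used in this notebook, and its ranking on the real data may differ.

```python
import pandas as pd

# Hypothetical values in the same comma-separated style as `listed_in`.
listed_in = pd.Series([
    "Children & Family Movies, Comedies",
    "Stand-Up Comedy",
    "Kids' TV",
    "Kids' TV, TV Comedies",
    "Comedies",
])

genre_counts = (
    listed_in.str.split(", ")  # each string becomes a list of genre labels
    .explode()                 # one row per (title, genre) pair
    .value_counts()            # titles per individual genre
)
print(genre_counts)
```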

#### Q4: Which maturity rating is most common on Netflix?

In [None]:
types=netflix_df['rating'].value_counts().index
count=netflix_df['rating'].value_counts()
plt.pie(count,labels=types,radius=1)
plt.show()

Most Netflix content is rated TV-MA, so the catalogue is best suited to mature audiences.

#### Q5: Compare the number of movies and TV shows on Netflix.

In [None]:
plt.figure(figsize=(5,8))
plt.hist(netflix_df['type']);

There are approximately 4250 movies on Netflix, considerably more than the roughly 1900 TV shows.
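
Bar heights give only approximate numbers. Exact counts and shares can be read directly with `value_counts`; the sketch below uses a short, made-up `type` column, and the names `types`, `counts`, and `shares` are illustrative.

```python
import pandas as pd

# Hypothetical `type` column; the real one has thousands of rows.
types = pd.Series(["Movie", "Movie", "TV Show", "Movie", "TV Show"])

counts = types.value_counts()                # absolute count per type
shares = types.value_counts(normalize=True)  # fractions that sum to 1
print(counts.to_dict())
print(shares.round(2).to_dict())
```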

Let us save and upload our work to Jovian before continuing.

In [None]:
import jovian

In [None]:
jovian.commit(project=project_name)

## Inferences and Conclusion

We have worked with data on the TV shows and movies available on Netflix, and gained some valuable insights through pandas DataFrame operations and visualisations built with matplotlib and seaborn.

In [None]:
import jovian

In [None]:
jovian.commit(project=project_name)

## References and Future Work

**Future work** - We plan to combine the present dataset with others that provide user ratings and viewing frequency, and then draw more meaningful analysis from the result.

**Links to useful resources** - 
* https://pandas.pydata.org/docs/user_guide/index.html#user-guide
* https://matplotlib.org/users/index.html
* https://seaborn.pydata.org/introduction.html

In [None]:
import jovian

In [None]:
jovian.commit(project=project_name)