In [1]:
# importing libraries
import pandas as pd
import numpy as np
import plotly.express as ex

In [2]:
# reading dataset 
data=pd.read_csv('/home/parth/Desktop/netflix_titles.csv')
data.head(10)

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
0,s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,"September 25, 2021",2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmm..."
1,s2,TV Show,Blood & Water,,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...",South Africa,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town t..."
2,s3,TV Show,Ganglands,Julien Leclercq,"Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi...",,"September 24, 2021",2021,TV-MA,1 Season,"Crime TV Shows, International TV Shows, TV Act...",To protect his family from a powerful drug lor...
3,s4,TV Show,Jailbirds New Orleans,,,,"September 24, 2021",2021,TV-MA,1 Season,"Docuseries, Reality TV","Feuds, flirtations and toilet talk go down amo..."
4,s5,TV Show,Kota Factory,,"Mayur More, Jitendra Kumar, Ranjan Raj, Alam K...",India,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, Romantic TV Shows, TV ...",In a city of coaching centers known to train I...
5,s6,TV Show,Midnight Mass,Mike Flanagan,"Kate Siegel, Zach Gilford, Hamish Linklater, H...",,"September 24, 2021",2021,TV-MA,1 Season,"TV Dramas, TV Horror, TV Mysteries",The arrival of a charismatic young priest brin...
6,s7,Movie,My Little Pony: A New Generation,"Robert Cullen, José Luis Ucha","Vanessa Hudgens, Kimiko Glenn, James Marsden, ...",,"September 24, 2021",2021,PG,91 min,Children & Family Movies,Equestria's divided. But a bright-eyed hero be...
7,s8,Movie,Sankofa,Haile Gerima,"Kofi Ghanaba, Oyafunmike Ogunlano, Alexandra D...","United States, Ghana, Burkina Faso, United Kin...","September 24, 2021",1993,TV-MA,125 min,"Dramas, Independent Movies, International Movies","On a photo shoot in Ghana, an American model s..."
8,s9,TV Show,The Great British Baking Show,Andy Devonshire,"Mel Giedroyc, Sue Perkins, Mary Berry, Paul Ho...",United Kingdom,"September 24, 2021",2021,TV-14,9 Seasons,"British TV Shows, Reality TV",A talented batch of amateur bakers face off in...
9,s10,Movie,The Starling,Theodore Melfi,"Melissa McCarthy, Chris O'Dowd, Kevin Kline, T...",United States,"September 24, 2021",2021,PG-13,104 min,"Comedies, Dramas",A woman adjusting to life after a loss contend...


In [3]:
# checking null values in dataset
data.isnull().sum()

show_id            0
type               0
title              0
director        2634
cast             825
country          831
date_added        10
release_year       0
rating             4
duration           3
listed_in          0
description        0
dtype: int64
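
Raw null counts are easier to compare when expressed as a share of all rows. The sketch below shows one way to do that; the `sample` frame and its values are made up for illustration and are not taken from the Netflix file.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the Netflix data (values are invented).
sample = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "director": ["X", np.nan, np.nan, "Y"],
    "country": ["India", "France", np.nan, "Canada"],
})

# isnull().mean() is the fraction of missing values per column;
# multiply by 100 for a percentage and sort the worst columns first.
missing_pct = (sample.isnull().mean() * 100).round(1).sort_values(ascending=False)
print(missing_pct)
```

On the real data the same expression would be applied to `data` instead of `sample`.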

In [4]:
# Checking datatypes 
data.dtypes

show_id         object
type            object
title           object
director        object
cast            object
country         object
date_added      object
release_year     int64
rating          object
duration        object
listed_in       object
description     object
dtype: object

### Finding the oldest title on Netflix

In [5]:
data['release_year'].min()

1925

In [6]:
data[data['release_year']==1925]
# the oldest title (a TV Show) has release year 1925

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
4250,s4251,TV Show,Pioneers: First Women Filmmakers*,,,,"December 30, 2018",1925,TV-14,1 Season,TV Shows,This collection restores films from women who ...


#### Changing datatype of date_added column from object to datetime

In [7]:
data['date_added'] = pd.to_datetime(data['date_added'])
# explicit-format alternative; the dates look like "September 25, 2021", i.e. '%B %d, %Y'
# data['date_added'] = pd.to_datetime(data['date_added'], format='%B %d, %Y')
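
A slightly more defensive version of the conversion might look like the sketch below. It assumes the column can contain stray whitespace and missing values; the `raw` series is invented for illustration, and whether the real file has such entries is not confirmed here.

```python
import pandas as pd

# Made-up sample in the same "Month D, YYYY" style as date_added.
# The leading space and the missing value are assumptions for illustration.
raw = pd.Series(["September 25, 2021", " August 4, 2017", None])

# Strip whitespace first, then parse with an explicit format.
# errors="coerce" turns anything unparseable into NaT instead of raising.
parsed = pd.to_datetime(raw.str.strip(), format="%B %d, %Y", errors="coerce")
print(parsed)
```

Using `errors="coerce"` hides bad rows rather than reporting them, so it is worth checking `parsed.isna().sum()` against the original null count afterwards.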

### Visualising the number of Movies/TV Shows released per year

In [9]:
ex.histogram(data,x='release_year',labels={'release_year':'release year'},color_discrete_sequence=['turquoise'])


### Removing invalid rating values (74 min, 84 min, 66 min) and NA values from the rating column

In [38]:
data['rating'].unique()

array(['PG-13', 'TV-MA', 'PG', 'TV-14', 'TV-PG', 'TV-Y', 'TV-Y7', 'R',
       'TV-G', 'G', 'NC-17', '74 min', '84 min', '66 min', 'NR', nan,
       'TV-Y7-FV', 'UR'], dtype=object)

In [39]:
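# replace only within the rating column, so genuine durations such as '74 min' in other columns are left intact
data['rating'] = data['rating'].replace(['74 min', '84 min', '66 min'], np.nan)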


In [40]:
data['rating'].unique()

array(['PG-13', 'TV-MA', 'PG', 'TV-14', 'TV-PG', 'TV-Y', 'TV-Y7', 'R',
       'TV-G', 'G', 'NC-17', nan, 'NR', 'TV-Y7-FV', 'UR'], dtype=object)

In [41]:
data.dropna(subset=['rating'],inplace=True)
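
The null check earlier shows 3 missing values in `duration`, and exactly 3 rating values look like durations. One plausible reading (an assumption, not verified against the file) is that those durations landed in the wrong column. Under that assumption, the values could be moved rather than discarded, roughly as sketched below on an invented `sample` frame.

```python
import numpy as np
import pandas as pd

# Made-up rows: the second one has a duration sitting in the rating column.
sample = pd.DataFrame({
    "title": ["A", "B", "C"],
    "rating": ["PG-13", "74 min", "TV-MA"],
    "duration": ["90 min", np.nan, "2 Seasons"],
})

# Flag ratings that look like durations (assumption: they end in " min").
misplaced = sample["rating"].str.endswith(" min", na=False)

# Move the value into duration only where duration is missing,
# then blank out the invalid rating.
fill = misplaced & sample["duration"].isna()
sample.loc[fill, "duration"] = sample.loc[fill, "rating"]
sample.loc[misplaced, "rating"] = np.nan
print(sample)
```

This keeps the duration information while still leaving the rating as missing, so a later `dropna(subset=['rating'])` behaves the same as in the notebook.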

### Counts by Ratings

In [42]:
ex.histogram(data,x='rating',labels={'rating':'ratings'},color_discrete_sequence=['turquoise'])

### Count by Country

In [43]:
country_count = data.copy()
country_count = pd.concat([country_count, data['country'].str.split(",", expand=True)], axis=1)
country_count = country_count.melt(id_vars=["type","title"], value_vars=range(12), value_name="Country")
country_count = country_count[country_count["Country"].notna()]
country_count["Country"] = country_count["Country"].str.strip()
country_count.head(5)

Unnamed: 0,type,title,variable,Country
0,Movie,Dick Johnson Is Dead,0,United States
1,TV Show,Blood & Water,0,South Africa
4,TV Show,Kota Factory,0,India
7,Movie,Sankofa,0,United States
8,TV Show,The Great British Baking Show,0,United Kingdom


In [44]:
country_count['Country'].nunique()

123

In [45]:
country_count.Country.value_counts().head(5)

United States     3687
India             1046
United Kingdom     806
Canada             445
France             393
Name: Country, dtype: int64
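
The split, concat and melt steps above produce one row per title-country pair. A shorter route to the same shape, sketched here on an invented `sample` frame, is `Series.str.split` followed by `DataFrame.explode`; it avoids having to know the maximum number of countries per title in advance.

```python
import pandas as pd

# Made-up rows; the third title has no country listed.
sample = pd.DataFrame({
    "type": ["Movie", "TV Show", "Movie"],
    "title": ["A", "B", "C"],
    "country": ["United States, Ghana", "India", None],
})

exploded = (
    sample.assign(Country=sample["country"].str.split(","))  # string -> list of names
          .explode("Country")                                # one row per list element
          .dropna(subset=["Country"])                        # drop titles with no country
          .reset_index(drop=True)
          .assign(Country=lambda d: d["Country"].str.strip())  # remove padding after commas
)
counts = exploded["Country"].value_counts()
print(counts)
```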

#### Count by Movie and TV Show

In [46]:
ex.histogram(country_count,x='Country',color='type',color_discrete_sequence=ex.colors.sequential.Burg).update_xaxes(
        categoryorder="total descending",range=(0, 30))

In [47]:
country_count.type.value_counts().head(5)

Movie      7375
TV Show    2638
Name: type, dtype: int64

In [48]:
ex.histogram(data,x="type",color_discrete_sequence=['turquoise'])

### Cast Count

In [49]:
cast_count = data.copy()
cast_count = pd.concat([cast_count, data['cast'].str.split(",", expand=True)], axis=1)
cast_count = cast_count.melt(id_vars=["type","title"], value_vars=range(44), value_name="Cast_name")
cast_count = cast_count[cast_count["Cast_name"].notna()]
cast_count["Cast_name"] = cast_count["Cast_name"].str.strip()
cast_count.head(5)

Unnamed: 0,type,title,variable,Cast_name
1,TV Show,Blood & Water,0,Ama Qamata
2,TV Show,Ganglands,0,Sami Bouajila
4,TV Show,Kota Factory,0,Mayur More
5,TV Show,Midnight Mass,0,Kate Siegel
6,Movie,My Little Pony: A New Generation,0,Vanessa Hudgens


In [50]:
cast_count.Cast_name.value_counts().head(5)

Anupam Kher         43
Shah Rukh Khan      35
Julie Tejwani       33
Naseeruddin Shah    32
Takahiro Sakurai    32
Name: Cast_name, dtype: int64

#### Visualizing count by Cast name 

In [51]:
ex.histogram(cast_count,x="Cast_name",
             color=cast_count['type'],color_discrete_sequence=ex.colors.sequential.Burg).update_xaxes(categoryorder="total descending",range=(0, 30))

### Genre Count

In [52]:
genre_count = data.copy()
# split once and reuse the resulting column labels, so melt() is only asked for columns that exist
genre_split = data['listed_in'].str.split(",", expand=True)
genre_count = pd.concat([genre_count, genre_split], axis=1)
genre_count = genre_count.melt(id_vars=["type","title"], value_vars=list(genre_split.columns), value_name="Genre")
genre_count = genre_count[genre_count["Genre"].notna()]
genre_count["Genre"] = genre_count["Genre"].str.strip()

In [53]:
genre_count.head(5)

Unnamed: 0,type,title,variable,Genre
0,Movie,Dick Johnson Is Dead,0,Documentaries
1,TV Show,Blood & Water,0,International TV Shows
2,TV Show,Ganglands,0,Crime TV Shows
3,TV Show,Jailbirds New Orleans,0,Docuseries
4,TV Show,Kota Factory,0,International TV Shows


In [54]:
genre_count['Genre'].nunique()

42

#### Visualizing count by the different genres available on Netflix

In [55]:
ex.histogram(genre_count,x="Genre",
             color_discrete_sequence=ex.colors.sequential.Burg).update_xaxes(categoryorder="total descending",range=(0, 30))

### Visualizing the trend in titles added per year by Movie and TV Show

In [56]:
# Trend over years by Movie and TV show
ex.histogram(data,x=data['date_added'].dt.year,
             color=data['type'],
             color_discrete_sequence=ex.colors.sequential.Burg,
            title="Number of Movies and TV Shows added to Netflix per year")
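
The histogram shows the trend, but exact per-year figures are easier to read from a table. A minimal sketch using `pd.crosstab` follows; the `sample` frame is invented, and on the real data the same call would take `data['date_added'].dt.year` and `data['type']`.

```python
import pandas as pd

# Made-up rows with an already-parsed date_added column.
sample = pd.DataFrame({
    "type": ["Movie", "Movie", "TV Show", "Movie"],
    "date_added": pd.to_datetime(
        ["2019-01-05", "2019-06-30", "2019-03-01", "2020-02-02"]
    ),
})

# Rows = year added, columns = type, cells = number of titles.
per_year = pd.crosstab(sample["date_added"].dt.year, sample["type"])
print(per_year)
```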

### Findings

#### About Netflix and its Content

- Movies (7,375) outnumber TV Shows (2,638). These counts come from `country_count`, so a title listed under several countries is counted once per country.
- From 2016 onward, Netflix sharply increased the amount of content it added to its platform.
- In 2019 Netflix added 1,424 Movies and 592 TV Shows, more than in any other year.
- The United States accounts for the most titles (3,687), followed by India (1,046) and the United Kingdom (806).
- International Movies, Dramas and Comedies are the most common genres in the catalogue. This reflects what Netflix offers, not what viewers watch.
- Anupam Kher is the cast member with the most titles (43). He is an Indian actor, director, and producer who has appeared in over 500 movies.
- The oldest content on Netflix is Pioneers: First Women Filmmakers, a collection of restored films with a release year of 1925.
- Netflix lists more than 40 distinct genres (42 in this dataset).
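
The Movie and TV Show totals quoted above are taken from `country_count`, which holds one row per title-country pair rather than one row per title. The sketch below, on an invented two-row `sample`, illustrates how the two ways of counting can diverge.

```python
import pandas as pd

# Made-up rows: one movie co-produced in three countries, one single-country show.
sample = pd.DataFrame({
    "type": ["Movie", "TV Show"],
    "title": ["A", "B"],
    "country": ["United States, Ghana, India", "India"],
})

# One row per title.
per_title = sample["type"].value_counts()

# One row per title-country pair, as in country_count.
per_pair = (
    sample.assign(country=sample["country"].str.split(","))
          .explode("country")["type"]
          .value_counts()
)
print(per_title)
print(per_pair)
```

For per-title totals on the real data, `data['type'].value_counts()` would be the call to use.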
