## Data Analysis of Netflix Movies and TV Shows

1. Understanding what content is available in different countries.
2. Identifying similar content by matching text-based features.
3. Network analysis of actors and directors to find interesting insights.
4. Does Netflix have an increasing focus on TV rather than movies in recent years?

In [12]:
import pandas as pd
pd.__version__

'0.20.1'

### Read Data

In [2]:
netflix = pd.read_csv('netflix_titles.csv')

In [6]:
# Inspecting the first five rows
netflix.head()

| | show_id | type | title | director | cast | country | date_added | release_year | rating | duration | listed_in | description |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 81145628 | Movie | Norm of the North: King Sized Adventure | Richard Finn, Tim Maltby | Alan Marriott, Andrew Toth, Brian Dobson, Cole... | United States, India, South Korea, China | September 9, 2019 | 2019 | TV-PG | 90 min | Children & Family Movies, Comedies | Before planning an awesome wedding for his gra... |
| 1 | 80117401 | Movie | Jandino: Whatever it Takes | | Jandino Asporaat | United Kingdom | September 9, 2016 | 2016 | TV-MA | 94 min | Stand-Up Comedy | Jandino Asporaat riffs on the challenges of ra... |
| 2 | 70234439 | TV Show | Transformers Prime | | Peter Cullen, Sumalee Montano, Frank Welker, J... | United States | September 8, 2018 | 2013 | TV-Y7-FV | 1 Season | Kids' TV | With the help of three human allies, the Autob... |
| 3 | 80058654 | TV Show | Transformers: Robots in Disguise | | Will Friedle, Darren Criss, Constance Zimmer, ... | United States | September 8, 2018 | 2016 | TV-Y7 | 1 Season | Kids' TV | When a prison ship crash unleashes hundreds of... |
| 4 | 80125979 | Movie | #realityhigh | Fernando Lebrija | Nesta Cooper, Kate Walsh, John Michael Higgins... | United States | September 8, 2017 | 2017 | TV-14 | 99 min | Comedies | When nerdy high schooler Dani finally attracts... |


In [13]:
# rows, columns
netflix.shape

(6234, 12)

In [5]:
netflix.dtypes

show_id          int64
type            object
title           object
director        object
cast            object
country         object
date_added      object
release_year     int64
rating          object
duration        object
listed_in       object
description     object
dtype: object
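
`date_added` is read in as `object` (plain strings), which would get in the way of the question about TV shows versus movies over time. Here is a minimal sketch of parsing it, using a small hand-made frame in place of `netflix`. The `%B %d, %Y` format is inferred from the rows shown by `head()` and is an assumption about the rest of the column. `year_added` is a hypothetical column name.

```python
import pandas as pd

# Toy stand-in for the netflix frame; values copied from the head() output above.
sample = pd.DataFrame({
    "title": [
        "Norm of the North: King Sized Adventure",
        "Jandino: Whatever it Takes",
        "Transformers Prime",
    ],
    "date_added": ["September 9, 2019", "September 9, 2016", "September 8, 2018"],
})

# Strip stray whitespace, then parse; values that do not match the format become NaT.
sample["date_added"] = pd.to_datetime(
    sample["date_added"].str.strip(), format="%B %d, %Y", errors="coerce"
)

# Hypothetical helper column for grouping by the year a title was added.
sample["year_added"] = sample["date_added"].dt.year
print(sample.dtypes)
```

With `errors="coerce"`, malformed dates show up as missing values and can be counted with `isnull()` instead of stopping the parse.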

In [10]:
netflix.isnull().sum()

show_id            0
type               0
title              0
director        1969
cast             570
country          476
date_added        11
release_year       0
rating            10
duration           0
listed_in          0
description        0
dtype: int64
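
Raw counts are easier to compare as a share of all rows. The sketch below reuses the counts printed above together with the 6,234 rows reported by `netflix.shape`. On the real frame, `netflix.isnull().mean() * 100` should give the same figures directly.

```python
import pandas as pd

total_rows = 6234  # from netflix.shape above

# Missing-value counts copied from the isnull().sum() output above.
missing = pd.Series({
    "director": 1969,
    "cast": 570,
    "country": 476,
    "date_added": 11,
    "rating": 10,
})

# Percentage of rows with a missing value, per column.
missing_pct = (missing / total_rows * 100).round(2)
print(missing_pct)
```

`director` is missing in roughly 32% of rows, far more than any other column.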

If we drop every row that has **any** missing value, this many rows remain:

In [16]:
netflix.dropna(how='any').shape

(3774, 12)

That leaves 3,774 of the original 6,234 rows. What percentage of the data is retained?

In [18]:
3774 / 6234 * 100

60.53897978825794
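
Dropping on every column keeps only about 60.5% of the rows. A possible alternative, sketched here on a made-up four-row frame rather than on `netflix`, is to drop rows only when a column needed for a given question is missing, using the `subset` argument of `dropna`. The frame `toy` and its values are invented for illustration.

```python
import pandas as pd

# Invented toy data: None marks a missing value.
toy = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "director": ["X", None, None, "Y"],
    "country": ["United States", "India", None, "Japan"],
})

# Require every column to be present.
all_complete = toy.dropna(how="any")

# Require only the column the country analysis depends on.
country_known = toy.dropna(subset=["country"])

print(len(all_complete), len(country_known))
```

In the toy frame the second approach keeps one more row, because a missing `director` no longer disqualifies a title from the country analysis.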

In [19]:
netflix.type.describe()

count      6234
unique        2
top       Movie
freq       4265
Name: type, dtype: object

In [20]:
netflix.type.value_counts()

Movie      4265
TV Show    1969
Name: type, dtype: int64

In [21]:
netflix.type.value_counts(normalize=True)

Movie      0.684151
TV Show    0.315849
Name: type, dtype: float64

In [22]:
pd.crosstab(netflix.type, netflix.country)

| type (rows) / country (columns) | Argentina | Argentina, Brazil, France, Poland, Germany, Denmark | Argentina, Chile | Argentina, Chile, Peru | Argentina, France | Argentina, France, Germany | Argentina, Italy | Argentina, Spain | Argentina, United States | Argentina, United States, Mexico | ... | United States, United Kingdom, Spain, South Korea | United States, Uruguay | United States, Venezuela | Uruguay | Uruguay, Argentina, Spain | Uruguay, Spain, Mexico | Venezuela | Venezuela, Colombia | Vietnam | West Germany |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Movie | 26 | 1 | 1 | 1 | 1 | 1 | 2 | 6 | 1 | 0 | ... | 1 | 1 | 1 | 2 | 1 | 1 | 1 | 1 | 4 | 1 |
| TV Show | 12 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |


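The `country` column needs to be cleaned up: many fields list several comma-separated countries, so the crosstab treats every combination as its own category. Each field should hold a single country. Once that is done, a `category` dtype may be worth trying.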
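
One way to get a single country per field is to split each value on the comma separator and give every country its own row. The sketch below does this on an invented toy frame, not on `netflix`. It assumes the separator is always `", "`, as in the output above, and it relies on `DataFrame.explode`, which was added in pandas 0.25 and so is not available in the 0.20.1 version printed at the top of this notebook. A `category` dtype on its own would not separate combined values; it would only change how they are stored.

```python
import pandas as pd

# Invented toy data mimicking the multi-country fields seen above.
toy = pd.DataFrame({
    "type": ["Movie", "Movie", "TV Show", "Movie"],
    "country": ["United States, India", "Argentina, Spain", "United States", None],
})

# Drop rows without a country, split on ", ", then emit one row per country.
per_country = (
    toy.dropna(subset=["country"])
       .assign(country=lambda df: df["country"].str.split(", "))
       .explode("country")
)
# Guard against stray whitespace around country names.
per_country["country"] = per_country["country"].str.strip()

counts = pd.crosstab(per_country["type"], per_country["country"])
print(counts)
```

After this step a title produced in several countries is counted once for each of them, so column totals no longer add up to the number of titles.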