# **Project: Investigating Netflix Movies**

![Netflix](datasets/netflix_image.jpg)

**Netflix**! What started in 1997 as a DVD rental service has since exploded into one of the largest entertainment and media companies.

Given the large number of movies and series available on the platform, it is a perfect opportunity to flex your exploratory data analysis skills and dive into the entertainment industry.

You work for a production company that specializes in nostalgic styles. You want to do some research on movies released in the 1990s. You'll delve into Netflix data and perform exploratory data analysis to better understand this awesome movie decade!

You have been supplied with the dataset `netflix_data.csv`, along with the following table detailing the column names and descriptions. Feel free to experiment further after submitting!

## The data
### **netflix_data.csv**
| Column | Description |
|--------|-------------|
| `show_id` | The ID of the show |
| `type` | Type of show |
| `title` | Title of the show |
| `director` | Director of the show |
| `cast` | Cast of the show |
| `country` | Country of origin |
| `date_added` | Date added to Netflix |
| `release_year` | Year the show was released |
| `duration` | Duration of the show in minutes |
| `description` | Description of the show |
| `genre` | Show genre |

## TASK: Perform exploratory data analysis on the netflix_data.csv data to understand more about movies from the 1990s decade.

What was the most frequent movie duration in the 1990s? Save an approximate answer as an integer called `duration` (use 1990 as the decade's start year).

A movie is considered short if it is less than 90 minutes long. Count the number of short action movies released in the 1990s and save this integer as `short_movie_count`.

Feel free to experiment after submitting the project!
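
The two questions above can be sketched as follows, on made-up rows rather than the real dataset. The sketch assumes action movies are labeled `Action` in `genre` and reads "most frequent" as the statistical mode via `Series.mode`; on the real data, a histogram may be a better route to an approximate answer. The names `toy_df`, `movies_90s`, `is_90s_movie` and `is_short_action` are illustrative only.

```python
import pandas as pd

# Made-up rows for illustration; not taken from netflix_data.csv
toy_df = pd.DataFrame(
    {
        "type": ["Movie", "Movie", "Movie", "Movie", "TV Show", "Movie"],
        "genre": ["Action", "Dramas", "Action", "Action", "Action", "Action"],
        "release_year": [1991, 1995, 1998, 1999, 1996, 2005],
        "duration": [85, 100, 100, 89, 1, 80],
    }
)

# Movies released from 1990 through 1999
is_90s_movie = (
    (toy_df["type"] == "Movie")
    & (toy_df["release_year"] >= 1990)
    & (toy_df["release_year"] < 2000)
)
movies_90s = toy_df[is_90s_movie]

# Most frequent duration; mode() can return several values, so take the first
duration = int(movies_90s["duration"].mode().iloc[0])

# Short action movies: Action genre and strictly under 90 minutes
is_short_action = (movies_90s["genre"] == "Action") & (movies_90s["duration"] < 90)
short_movie_count = int(is_short_action.sum())

print(duration, short_movie_count)
```

Both results are cast to plain `int` because the task asks for integers, while pandas returns NumPy integer types.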

In [5]:
# Loading in required libraries

import os
import pandas as pd
import seaborn as sns
import numpy as np

# Reading in the Netflix data
# Build a helper that reads a CSV file from a folder under the project directory
folder_name = 'datasets'
base_dir = r'C:\Users\mcaba\OneDrive\Escritorio\Data Science\Datacamp_Projects\DataCamp_Projects'

def read_csv_fun(folder_name, file_name, base_dir):
    # Build the full file path instead of changing the working directory
    path = os.path.join(base_dir, folder_name, '{}.csv'.format(file_name))
    df = pd.read_csv(path, sep=',', low_memory=False, on_bad_lines='skip')
    return df

netflix_df = read_csv_fun(folder_name, 'netflix_data', base_dir)

# Taking a look at the first several rows
netflix_df.head(6)

| | show_id | type | title | director | cast | country | date_added | release_year | duration | description | genre |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | s2 | Movie | 7:19 | Jorge Michel Grau | Demián Bichir, Héctor Bonilla, Oscar Serrano, ... | Mexico | December 23, 2016 | 2016 | 93 | After a devastating earthquake hits Mexico Cit... | Dramas |
| 1 | s3 | Movie | 23:59 | Gilbert Chan | Tedd Chan, Stella Chung, Henley Hii, Lawrence ... | Singapore | December 20, 2018 | 2011 | 78 | When an army recruit is found dead, his fellow... | Horror Movies |
| 2 | s4 | Movie | 9 | Shane Acker | Elijah Wood, John C. Reilly, Jennifer Connelly... | United States | November 16, 2017 | 2009 | 80 | In a postapocalyptic world, rag-doll robots hi... | Action |
| 3 | s5 | Movie | 21 | Robert Luketic | Jim Sturgess, Kevin Spacey, Kate Bosworth, Aar... | United States | January 1, 2020 | 2008 | 123 | A brilliant group of students become card-coun... | Dramas |
| 4 | s6 | TV Show | 46 | Serdar Akar | Erdal Beşikçioğlu, Yasemin Allen, Melis Birkan... | Turkey | July 1, 2017 | 2016 | 1 | A genetics professor experiments with a treatm... | International TV |
| 5 | s7 | Movie | 122 | Yasir Al Yasiri | Amina Khalil, Ahmed Dawood, Tarek Lotfy, Ahmed... | Egypt | June 1, 2020 | 2019 | 95 | After an awful accident, a couple admitted to ... | Horror Movies |


1. Glimpse of the data

In [6]:
# Missing values per column
print(netflix_df.isna().sum())

# Number of rows
print(len(netflix_df))

# Column dtypes and non-null counts; info() prints its report and returns None,
# so it is called directly rather than wrapped in display()
netflix_df.info()

display(
    netflix_df.value_counts("type")
)

display(
    netflix_df.value_counts("country")
)

display(
    netflix_df.value_counts("genre")
)

display(
    netflix_df.describe()
)

show_id         0
type            0
title           0
director        0
cast            0
country         0
date_added      0
release_year    0
duration        0
description     0
genre           0
dtype: int64

4812

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4812 entries, 0 to 4811
Data columns (total 11 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   show_id       4812 non-null   object
 1   type          4812 non-null   object
 2   title         4812 non-null   object
 3   director      4812 non-null   object
 4   cast          4812 non-null   object
 5   country       4812 non-null   object
 6   date_added    4812 non-null   object
 7   release_year  4812 non-null   int64 
 8   duration      4812 non-null   int64 
 9   description   4812 non-null   object
 10  genre         4812 non-null   object
dtypes: int64(2), object(9)
memory usage: 413.7+ KB

type
Movie      4677
TV Show     135
dtype: int64

country
United States     1886
India              864
United Kingdom     311
Canada             155
France             133
                  ... 
Paraguay             1
Guatemala            1
Namibia              1
Iran                 1
Zimbabwe             1
Length: 72, dtype: int64

genre
Dramas                  1343
Comedies                1029
Action                   696
Children                 421
Documentaries            352
Stand-Up                 283
Horror Movies            239
International Movies     100
Classic Movies            69
Thrillers                 49
International TV          39
Crime TV                  30
Uncategorized             25
British TV                20
Independent Movies        20
Anime Features            18
Music                     14
Cult Movies               11
Sci-Fi                    11
Kids                      10
Anime Series               9
Docuseries                 7
TV Shows                   4
Romantic Movies            3
TV Comedies                3
TV Action                  2
LGBTQ Movies               1
Reality TV                 1
Classic                    1
TV Horror                  1
Romantic TV                1
dtype: int64

| | release_year | duration |
|---|---|---|
| count | 4812.0 | 4812.0 |
| mean | 2012.711554 | 99.566708 |
| std | 9.517978 | 30.889305 |
| min | 1942.0 | 1.0 |
| 25% | 2011.0 | 88.0 |
| 50% | 2016.0 | 99.0 |
| 75% | 2018.0 | 116.0 |
| max | 2021.0 | 253.0 |
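
The task concerns movies only, and the preview includes a TV show with a `duration` of 1, so it may be worth restricting to `type == "Movie"` before summarizing durations. A minimal sketch on made-up rows (`toy_df` and `movies_only` are illustrative names, not part of the notebook):

```python
import pandas as pd

# Made-up rows for illustration; not taken from netflix_data.csv
toy_df = pd.DataFrame(
    {
        "type": ["Movie", "TV Show", "Movie", "Movie"],
        "duration": [93, 1, 78, 123],
    }
)

# Keep only movies so TV show rows do not distort the duration summary
movies_only = toy_df[toy_df["type"] == "Movie"]

print(movies_only["duration"].describe())
```

Whether the real dataset's very low durations all come from TV shows is an assumption to check, for example by comparing `describe()` before and after the filter.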


2. Filter for the desired years (1990s)

In [16]:

years = netflix_df.value_counts("release_year")
print(years.head(40))


release_year
2017    646
2018    624
2016    562
2019    488
2020    379
2015    340
2014    224
2013    183
2012    150
2010    125
2011    119
2009    101
2008     99
2006     75
2007     69
2005     59
2004     50
2003     39
2002     39
2001     32
2000     30
1997     26
1999     26
1998     26
1995     16
1993     16
1992     16
1996     15
1990     15
1994     14
1982     14
1991     14
1988     14
1989     12
1973     10
1979      9
1980      9
1986      8
1984      8
1983      8
dtype: int64
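
To see counts like these grouped by decade rather than by single year, one option is integer division on `release_year`. A sketch on made-up years (`toy_years`, `decade` and `decade_counts` are names introduced here for illustration):

```python
import pandas as pd

# Made-up release years for illustration; not taken from netflix_data.csv
toy_years = pd.Series([1990, 1994, 1999, 2000, 2016, 2017], name="release_year")

# Floor each year to the start of its decade, e.g. 1994 -> 1990
decade = (toy_years // 10) * 10

# Count titles per decade, ordered chronologically
decade_counts = decade.value_counts().sort_index()
print(decade_counts)
```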


In [21]:
# Each comparison needs its own parentheses: & binds more tightly than >= and <,
# so the unparenthesized version raises
# "ValueError: The truth value of a Series is ambiguous."
netflix_df_90s = netflix_df[(netflix_df["release_year"] >= 1990) & (netflix_df["release_year"] < 2000)]
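
Combining two comparisons in a pandas mask requires wrapping each comparison in parentheses, because `&` has higher precedence than `>=` and `<`. A minimal sketch on made-up rows, which also shows `Series.between` as an equivalent way to select the decade (`toy_df`, `mask` and the `toy_90s*` names are illustrative):

```python
import pandas as pd

# Made-up rows for illustration; not taken from netflix_data.csv
toy_df = pd.DataFrame(
    {
        "title": ["A", "B", "C", "D"],
        "release_year": [1989, 1990, 1999, 2000],
    }
)

# Option 1: parenthesize each comparison before combining with &
mask = (toy_df["release_year"] >= 1990) & (toy_df["release_year"] < 2000)
toy_90s = toy_df[mask]

# Option 2: Series.between is inclusive on both ends by default
toy_90s_between = toy_df[toy_df["release_year"].between(1990, 1999)]

print(toy_90s["title"].tolist())
```

Both options select the same rows here; `between` avoids the precedence pitfall altogether.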

In [12]:
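# netflix_df_1990 is not defined in the cells shown; the output below
# (15 rows, all with release_year 1990) suggests it holds only titles released in 1990.
# info() prints its report and returns None, which display() then shows.
display(
    netflix_df_1990.info()
)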

<class 'pandas.core.frame.DataFrame'>
Int64Index: 15 entries, 240 to 4566
Data columns (total 11 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   show_id       15 non-null     object
 1   type          15 non-null     object
 2   title         15 non-null     object
 3   director      15 non-null     object
 4   cast          15 non-null     object
 5   country       15 non-null     object
 6   date_added    15 non-null     object
 7   release_year  15 non-null     int64 
 8   duration      15 non-null     int64 
 9   description   15 non-null     object
 10  genre         15 non-null     object
dtypes: int64(2), object(9)
memory usage: 1.4+ KB


None

In [13]:
display(
    netflix_df_1990.describe()
)

| | release_year | duration |
|---|---|---|
| count | 15.0 | 15.0 |
| mean | 1990.0 | 109.666667 |
| std | 0.0 | 49.710112 |
| min | 1990.0 | 1.0 |
| 25% | 1990.0 | 91.0 |
| 50% | 1990.0 | 104.0 |
| 75% | 1990.0 | 154.0 |
| max | 1990.0 | 174.0 |
