## Exercise04 - Investigating Netflix Movies

**Netflix!** What started in 1997 as a DVD rental service has since exploded into one of the largest entertainment and media companies.

Given the large number of movies and series available on the platform, it is a perfect opportunity to flex your exploratory data analysis skills and dive into the entertainment industry. Your friend has also been brushing up on their Python skills and has taken a first crack at a CSV file containing Netflix data. They believe that the average duration of movies has been declining. Using your friend's initial research, you'll delve into the Netflix data to see whether movie lengths are actually getting shorter and, if so, explain some of the contributing factors.

You have been supplied with the dataset `netflix_data.csv`, along with the following table detailing the column names and descriptions. This data does contain null values and some outliers, but handling these is out of scope for the project. Feel free to experiment after submitting!

### 💾 The data: netflix_data.csv

| Column | Description |
|--------|-------------|
| `show_id` | The ID of the show |
| `type` | Type of show |
| `title` | Title of the show |
| `director` | Director of the show |
| `cast` | Cast of the show |
| `country` | Country of origin |
| `date_added` | Date added to Netflix |
| `release_year` | Year of Netflix release |
| `duration` | Duration of the show in minutes |
| `description` | Description of the show |
| `genre` | Show genre |

### Tasks:
Your friend suspects that movies are getting shorter, and they've found some initial evidence of this. With your interest piqued, you will perform exploratory data analysis on the `netflix_data.csv` data to understand what may be contributing to movies getting shorter over time. Your analysis will follow these steps:
1. Filter the data to remove TV shows.
2. Investigate and subset the Netflix movie data, keeping only the columns `"title"`, `"country"`, `"genre"`, `"release_year"`, and `"duration"`, and save the result in a new DataFrame, for example `netflix_movies`.
3. Filter `netflix_movies` to find the movies that are strictly shorter than 60 minutes, saving the resulting DataFrame; inspect the result to find possible contributing factors.
4. Using a for loop and if/elif statements, iterate through the rows of `netflix_movies` and assign colors of your choice to four genre groups ("Children", "Documentaries", "Stand-Up", and "Other" for everything else). Save the results in a colors list. Initialize a matplotlib figure object called fig and create a scatter plot for movie duration by release year using the colors list to color the points and using the labels `"Release year"` for the x-axis, `"Duration (min)"` for the y-axis, and the title `"Movie Duration by Year of Release"`.
5. After inspecting the plot, answer the question `"Are we certain that movies are getting shorter?"`.
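The if/elif logic from step 4 can be sketched on its own, before touching the DataFrame. This is a minimal sketch, not the exercise solution: the helper name `genre_to_color`, the color choices, and the sample list are all assumptions made for illustration.

```python
# Hypothetical helper; the color names are arbitrary choices, not part of the task.
def genre_to_color(genre):
    """Map a genre string to the color of one of the four genre groups."""
    if genre == "Children":
        return "red"
    elif genre == "Documentaries":
        return "blue"
    elif genre == "Stand-Up":
        return "green"
    else:
        # Everything else, including missing values, falls into "Other".
        return "black"

# Made-up values standing in for the "genre" column.
sample_genres = ["Dramas", "Children", "Stand-Up", "Documentaries", "Action"]

# Build the colors list one entry per row, in row order.
colors = []
for genre in sample_genres:
    colors.append(genre_to_color(genre))

print(colors)
```

With the real data, the same loop would read the genre from each row of `netflix_movies` (for example via `iterrows()`), and the resulting list can be passed to the scatter plot through matplotlib's `c` argument. The list must stay in row order so that each color lines up with its point.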


In [7]:
# Importing pandas and matplotlib
import pandas as pd
import matplotlib.pyplot as plt

# Start coding!
netflix_data = pd.read_csv('netflix_data.csv')

In [8]:
netflix_data.head()

| Unnamed: 0 | show_id | type | title | director | cast | country | date_added | release_year | duration | description | genre |
|--------|--------|--------|--------|--------|--------|--------|--------|--------|--------|--------|--------|
| 0 | s1 | TV Show | 3% | | João Miguel, Bianca Comparato, Michel Gomes, R... | Brazil | August 14, 2020 | 2020 | 4 | In a future where the elite inhabit an island ... | International TV |
| 1 | s2 | Movie | 7:19 | Jorge Michel Grau | Demián Bichir, Héctor Bonilla, Oscar Serrano, ... | Mexico | December 23, 2016 | 2016 | 93 | After a devastating earthquake hits Mexico Cit... | Dramas |
| 2 | s3 | Movie | 23:59 | Gilbert Chan | Tedd Chan, Stella Chung, Henley Hii, Lawrence ... | Singapore | December 20, 2018 | 2011 | 78 | When an army recruit is found dead, his fellow... | Horror Movies |
| 3 | s4 | Movie | 9 | Shane Acker | Elijah Wood, John C. Reilly, Jennifer Connelly... | United States | November 16, 2017 | 2009 | 80 | In a postapocalyptic world, rag-doll robots hi... | Action |
| 4 | s5 | Movie | 21 | Robert Luketic | Jim Sturgess, Kevin Spacey, Kate Bosworth, Aar... | United States | January 1, 2020 | 2008 | 123 | A brilliant group of students become card-coun... | Dramas |


In [13]:
netflix_data.shape

# Option 1: count rows with a boolean mask and len()
num_tv_show = len(netflix_data[netflix_data['type'] == 'TV Show'])
# num_movie = len(netflix_data[netflix_data['type'] == 'Movie'])

# Option 2: sum a boolean Series (each True counts as 1)
# num_tv_show_sum = (netflix_data['type'] == 'TV Show').sum()
# num_movie_sum = (netflix_data['type'] == 'Movie').sum()

# Option 3: value_counts() with a default
type_counts = netflix_data['type'].value_counts()
num_movies_vc = type_counts.get('Movie', 0)  # returns 0 if no row has type 'Movie'

print('total data: ', netflix_data.shape[0])
print('total tv show: ', num_tv_show)
print('total movies: ', num_movies_vc)


total data:  7787
total tv show:  2410
total movies:  5377


### Answer 1

In [17]:
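# Option 1: keep rows whose type is not 'TV Show' using a boolean mask
netflix_data = netflix_data[netflix_data['type'] != 'TV Show']

# Option 2: drop the TV show rows by their index labels
# netflix_data = netflix_data.drop(netflix_data[netflix_data['type'] == 'TV Show'].index)

# Option 3: filter with a query string
# netflix_data = netflix_data.query("type != 'TV Show'")

# Check that the filter worked: no TV shows should remain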
num_tv_show_sum = (netflix_data['type'] == 'TV Show').sum()
print('total tv show: ', num_tv_show_sum)
print('total data: ', netflix_data.shape[0])

total tv show:  0
total data:  5377
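Filtering with `!= 'TV Show'` and filtering with `== 'Movie'` give the same 5377 rows here, because the counts printed earlier (2410 + 5377 = 7787) show that every row is one or the other. The two filters are not equivalent in general, though. The sketch below uses a made-up four-row DataFrame (the name `toy` and its contents are assumptions, not rows from `netflix_data.csv`) to show where they differ.

```python
import pandas as pd

# Made-up rows for illustration; not taken from netflix_data.csv.
toy = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "type": ["Movie", "TV Show", "Movie", None],
})

# Exclusion filter: drops TV shows but keeps the row with a missing type.
not_tv = toy[toy["type"] != "TV Show"]

# Inclusion filter: keeps only rows explicitly labeled as movies.
# .copy() returns an independent DataFrame, which avoids
# SettingWithCopyWarning if columns are assigned on it later.
movies_only = toy[toy["type"] == "Movie"].copy()

print(len(not_tv), len(movies_only))
```

If a dataset might contain missing or unexpected `type` values, the inclusion filter is the safer way to be sure only movies remain.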


### Answer 2 