# Assignment 01: Git Essentials and Python Notebook for Analysis

**Dataset**: Kaggle "Netflix Movies and TV Shows" (netflix_titles.csv)

**Research Questions**:
1. How has the volume of content added to Netflix changed over time?
2. Do the two content types in the 'type' column (Movie vs. TV Show) follow the same trend?


### Imports

In [5]:
import pandas as pd
import matplotlib.pyplot as plt
import os
from pathlib import Path

print("Libraries loaded successfully.")

Libraries loaded successfully.


### Loading Data Using Relative Path

In [13]:
Data_Dir = Path("data/netflix")
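# f-string so the path is interpolated instead of printed literally
print(f"Data directory set to: {Data_Dir}")

CSV_PATH = Data_Dir / "netflix_titles.csv"

# Fail early with a readable message if the file is not where we expect it
if not CSV_PATH.exists():
    raise FileNotFoundError(
        f"Missing {CSV_PATH}.\n"
        "Ensure 'netflix_titles.csv' is inside the 'data/netflix' folder of your project directory."
    )

df = pd.read_csv(CSV_PATH)
df.head()


Data directory set to: data/netflix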


| | show_id | type | title | director | cast | country | date_added | release_year | rating | duration | listed_in | description |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | s1 | Movie | Dick Johnson Is Dead | Kirsten Johnson | NaN | United States | September 25, 2021 | 2020 | PG-13 | 90 min | Documentaries | As her father nears the end of his life, filmm... |
| 1 | s2 | TV Show | Blood & Water | NaN | Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban... | South Africa | September 24, 2021 | 2021 | TV-MA | 2 Seasons | International TV Shows, TV Dramas, TV Mysteries | After crossing paths at a party, a Cape Town t... |
| 2 | s3 | TV Show | Ganglands | Julien Leclercq | Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi... | NaN | September 24, 2021 | 2021 | TV-MA | 1 Season | Crime TV Shows, International TV Shows, TV Act... | To protect his family from a powerful drug lor... |
| 3 | s4 | TV Show | Jailbirds New Orleans | NaN | NaN | NaN | September 24, 2021 | 2021 | TV-MA | 1 Season | Docuseries, Reality TV | Feuds, flirtations and toilet talk go down amo... |
| 4 | s5 | TV Show | Kota Factory | NaN | Mayur More, Jitendra Kumar, Ranjan Raj, Alam K... | India | September 24, 2021 | 2021 | TV-MA | 2 Seasons | International TV Shows, Romantic TV Shows, TV ... | In a city of coaching centers known to train I... |
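
A relative path such as `data/netflix` is interpreted against the current working directory, which for a notebook is usually (though not always) the folder the notebook was started from. The sketch below shows one way to check where the file is actually being looked for. The lowercase names `data_dir`, `csv_path`, and `resolved` are illustrative and not part of the notebook.

```python
from pathlib import Path

data_dir = Path("data/netflix")           # same relative path as in the loading cell
csv_path = data_dir / "netflix_titles.csv"

# A relative path carries no root of its own, so it is interpreted
# relative to the current working directory at the moment it is used.
print(csv_path.is_absolute())             # False

# Joining with the working directory shows the full location being checked.
resolved = Path.cwd() / csv_path
print(resolved)
```

If the `FileNotFoundError` from the loading cell appears even though the folder exists, printing `resolved` like this is a quick way to see whether the notebook is running from a different directory than expected.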


### Data Dictionary
**show_id**: Unique identifier for each movie or TV show

**type**: Content classification ("Movie" or "TV Show")

**title**: The official name of the content

**director**: The person who directed the content (missing for some titles)

**cast**: The primary actors, actresses or voice talent

**country**: The country where the production took place

**date_added**: The date the content was added to the Netflix library, stored as text (e.g. "September 25, 2021")

**release_year**: The original year the content was released

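**rating**: The age-based content rating (e.g. PG-13, TV-MA)

**duration**: Length of the content, in minutes for movies (e.g. "90 min") and in seasons for TV shows (e.g. "2 Seasons")

**listed_in**: The genre or genres the content is listed under, separated by commas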

**description**: A brief summary of the content's plot
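
Because `duration` mixes two units, it cannot be compared across types as a plain number. The sketch below splits such strings into a number and a unit, based only on the formats visible in the preview rows ("90 min", "2 Seasons", "1 Season"). `split_duration` is a hypothetical helper and not part of the notebook, and other formats may exist in the full file.

```python
import re


def split_duration(text):
    """Split a string like '90 min' or '2 Seasons' into (number, unit)."""
    match = re.fullmatch(r"\s*(\d+)\s+(min|Seasons?)\s*", text)
    if match is None:
        return None  # unexpected format; left for the caller to handle
    return int(match.group(1)), match.group(2)


# Values taken from the preview rows above
for value in ["90 min", "2 Seasons", "1 Season"]:
    print(value, "->", split_duration(value))
```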

### Getting Data Ready for Analysis Using a Custom Function

Here we use a custom function to group the values in "year_added" into five-year intervals (for example, 2021 falls in "2020-2024"), which we will later use to analyze content volume over time.
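
"year_added" is not one of the columns in the loaded file, so it presumably has to be derived from `date_added` first. The sketch below shows one way that derivation and the binning arithmetic could look on a few made-up rows. The sample frame, the date format string, and the helper name `interval_label` are assumptions for illustration, not the notebook's own code.

```python
import pandas as pd

# Hypothetical sample rows; the real values come from netflix_titles.csv.
sample = pd.DataFrame(
    {"date_added": ["September 25, 2021", " August 4, 2017", None, "January 1, 2008"]}
)

# Assumption: year_added is the calendar year parsed from date_added.
# str.strip() guards against stray leading/trailing whitespace, if any;
# errors="coerce" turns missing or unparseable dates into NaT instead of raising.
parsed = pd.to_datetime(
    sample["date_added"].str.strip(), format="%B %d, %Y", errors="coerce"
)
sample["year_added"] = parsed.dt.year.astype("Int64")  # nullable integer column


def interval_label(year, width=5):
    """Return the 'start-end' label of the width-year bin containing year."""
    if pd.isna(year):
        return "Unknown"
    start = (int(year) // width) * width  # floor to a multiple of width
    return f"{start}-{start + width - 1}"


sample["interval"] = sample["year_added"].apply(interval_label)
print(sample[["year_added", "interval"]])
```

Floor division keeps the bins aligned to multiples of five, so 2015 through 2019 share one label and 2020 starts the next. Rows without a usable date end up as "Unknown" and are not dropped.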


In [14]:
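def five_year_interval(year):
    """Return the five-year bin containing `year` as 'start-end', or 'Unknown'."""
    # Missing values (NaN, None, pd.NA) cannot be binned
    if pd.isna(year):
        return "Unknown"
    try:
        year = int(year)
    except (ValueError, TypeError):
        # Input that cannot be converted to an integer, such as "abc"
        return "Unknown"
    # Floor to the nearest multiple of 5, e.g. 2021 -> 2020
    start_year = (year // 5) * 5
    end_year = start_year + 4
    return f"{start_year}-{end_year}"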