Project Title:
StreamScope – Netflix Content Strategy Analyzer: Insights into Global Streaming Trends
1. Project Scope
StreamScope is a data analytics project based on the Netflix titles dataset from Kaggle (netflix_titles.csv).
The goal is to analyze the Netflix catalog and identify trends and patterns such as:
Distribution of Movies vs TV Shows
Content growth over the years
Most popular genres
Country-wise content production
Rating distribution
The project aims to prepare a clean dataset that can later be used for data visualization and recommendation systems.
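
As a rough illustration of how the questions listed above map onto pandas operations, here is a minimal sketch. It uses a small made-up frame (the name toy and its values are assumptions, not rows taken from the real file) with the same column names as the dataset.

```python
import pandas as pd

# Toy rows standing in for netflix_titles.csv (values are illustrative only).
toy = pd.DataFrame({
    "type": ["Movie", "TV Show", "Movie", "Movie"],
    "release_year": [2020, 2021, 2021, 2021],
    "rating": ["PG-13", "TV-MA", "TV-MA", "PG"],
})

# Distribution of Movies vs TV Shows.
type_counts = toy["type"].value_counts()

# Content growth: number of titles per release year, in chronological order.
titles_per_year = toy["release_year"].value_counts().sort_index()

# Rating distribution as a share of all titles.
rating_share = toy["rating"].value_counts(normalize=True)

print(type_counts)
print(titles_per_year)
print(rating_share)
```

The same three calls should work unchanged on the full DataFrame once it is loaded.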

2. Success Metrics
The project will be considered successful if:
The dataset is properly cleaned and free of duplicate records
Missing values are handled appropriately
Categorical features (genre, rating, country) are normalized
The dataset is ready for further analysis
Meaningful insights can be extracted from the cleaned data
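
Some of these criteria can be checked automatically. The sketch below is one possible way to do that; the helper name check_success_metrics and its arguments are hypothetical and not part of the project code, and "normalized" is interpreted here simply as "lowercase".

```python
import pandas as pd

def check_success_metrics(frame, required_cols, categorical_cols):
    """Return pass/fail flags for a few of the cleaning criteria."""
    # A categorical column counts as normalized here if it is already lowercase.
    cats_ok = all(
        (frame[col] == frame[col].str.lower()).all() for col in categorical_cols
    )
    return {
        "no_duplicates": not frame.duplicated().any(),
        "no_missing_required": not frame[required_cols].isnull().any().any(),
        "categoricals_lowercase": bool(cats_ok),
    }

# Made-up rows for demonstration only.
toy = pd.DataFrame({
    "title": ["A", "B"],
    "type": ["Movie", "TV Show"],
    "rating": ["pg-13", "tv-ma"],
    "country": ["india", "unknown"],
})
report = check_success_metrics(toy, ["title", "type"], ["rating", "country"])
print(report)
```

Running such a check at the end of the cleaning step gives a quick signal if a later change reintroduces duplicates or missing values.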

In [2]:
import pandas as pd
df = pd.read_csv("netflix_titles.csv")
df.head()


,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
0,s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,"September 25, 2021",2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmm..."
1,s2,TV Show,Blood & Water,,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...",South Africa,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town t..."
2,s3,TV Show,Ganglands,Julien Leclercq,"Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi...",,"September 24, 2021",2021,TV-MA,1 Season,"Crime TV Shows, International TV Shows, TV Act...",To protect his family from a powerful drug lor...
3,s4,TV Show,Jailbirds New Orleans,,,,"September 24, 2021",2021,TV-MA,1 Season,"Docuseries, Reality TV","Feuds, flirtations and toilet talk go down amo..."
4,s5,TV Show,Kota Factory,,"Mayur More, Jitendra Kumar, Ranjan Raj, Alam K...",India,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, Romantic TV Shows, TV ...",In a city of coaching centers known to train I...


In [3]:
print("Step 3.1 — Checking Missing Values")
print(df.isnull().sum())


Step 3.1 — Checking Missing Values
show_id            0
type               0
title              0
director        2634
cast             825
country          831
date_added        10
release_year       0
rating             4
duration           3
listed_in          0
description        0
dtype: int64
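
Raw counts are easier to judge as a share of all rows. For instance, the 2634 missing director values amount to roughly 30% of the 8807 rows reported later in this notebook. A minimal sketch of that calculation on made-up data (the frame toy is an assumption used only for illustration):

```python
import pandas as pd

# Made-up rows; None marks a missing value.
toy = pd.DataFrame({
    "director": ["Kirsten Johnson", None, "Julien Leclercq", None],
    "country": ["United States", None, None, None],
    "title": ["a", "b", "c", "d"],
})

# isnull().mean() gives the fraction of missing values per column.
missing_pct = (toy.isnull().mean() * 100).round(1).sort_values(ascending=False)
print(missing_pct)
```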


In [4]:
print("\nStep 3.2 — Removing Duplicate Rows")
before_duplicates = df.shape[0]
df.drop_duplicates(inplace=True)
after_duplicates = df.shape[0]
print("Duplicates Removed:", before_duplicates - after_duplicates)


Step 3.2 — Removing Duplicate Rows
Duplicates Removed: 0


In [5]:
print("\nStep 3.3 — Handling Missing Values")
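# Fill missing text fields with explicit placeholder labels.
df['director'] = df['director'].fillna("Unknown")
df['cast'] = df['cast'].fillna("Not Available")
df['country'] = df['country'].fillna("Unknown")
df['rating'] = df['rating'].fillna("Not Rated")
df['duration'] = df['duration'].fillna("Unknown")
# Parse dates; errors='coerce' turns any value that fails to parse into NaT.
df['date_added'] = pd.to_datetime(df['date_added'], errors='coerce')
# Drop rows that lack a title or type, then renumber the index.
df.dropna(subset=['title', 'type'], inplace=True)
df.reset_index(drop=True, inplace=True)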



Step 3.3 — Handling Missing Values


In [6]:
print("\nStep 3 Completed")
print("Final Shape After Cleaning:", df.shape)

print("\nMissing Values After Cleaning:")
print(df.isnull().sum())


Step 3 Completed
Final Shape After Cleaning: (8807, 12)

Missing Values After Cleaning:
show_id          0
type             0
title            0
director         0
cast             0
country          0
date_added      98
release_year     0
rating           0
duration         0
listed_in        0
description      0
dtype: int64
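
Note that date_added goes from 10 missing values before cleaning to 98 afterwards, which means 88 non-empty strings were turned into NaT by errors='coerce'. One plausible cause, which is an assumption and not something the output above confirms, is stray leading or trailing whitespace in some date strings. A minimal sketch on made-up values, assuming the "Month day, year" format seen in the preview:

```python
import pandas as pd

# Made-up examples: a clean date, a date with a leading space, and a missing value.
raw = pd.Series(["September 25, 2021", " August 4, 2017", None])

# Strip whitespace first, then parse with an explicit format.
parsed = pd.to_datetime(raw.str.strip(), format="%B %d, %Y", errors="coerce")

# Rows that had text but still failed to parse deserve a manual look.
unparsed_mask = parsed.isna() & raw.notna()

print(parsed)
print("Unparsed non-empty rows:", int(unparsed_mask.sum()))
```

Inspecting the rows selected by a mask like unparsed_mask on the real data would show whether whitespace is actually the cause before any fix is applied.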


In [7]:

# Lowercase the categorical text columns so values compare consistently.
df['listed_in'] = df['listed_in'].str.lower()
df['rating'] = df['rating'].str.lower()
df['country'] = df['country'].str.lower()


In [8]:
# One 0/1 indicator column per genre; a title can belong to several genres.
genre_dummies = df['listed_in'].str.get_dummies(sep=', ')
df = pd.concat([df, genre_dummies], axis=1)
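
Because each genre becomes a 0/1 column, summing the indicator columns gives the number of titles per genre, which is one way to approach the "most popular genres" question. A minimal sketch on made-up listed_in values (the frame toy is an assumption for illustration):

```python
import pandas as pd

# Made-up genre strings in the same comma-separated style as listed_in.
toy = pd.DataFrame({
    "listed_in": ["documentaries", "tv dramas, tv mysteries", "tv dramas"],
})

# Split on ', ' and build one indicator column per genre.
genre_dummies = toy["listed_in"].str.get_dummies(sep=", ")

# Column sums are title counts per genre.
genre_counts = genre_dummies.sum().sort_values(ascending=False)
print(genre_counts)
```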


In [9]:
# Integer code per distinct rating; codes follow sorted category order.
df['rating_encoded'] = df['rating'].astype('category').cat.codes


In [11]:
# Integer code per distinct country string.
df['country_encoded'] = df['country'].astype('category').cat.codes
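
The integer codes produced by cat.codes follow the sorted order of the category labels, so they carry no ranking (a higher code does not mean a more restrictive rating). Keeping the code-to-label mapping makes the encoded columns interpretable later. A minimal sketch on made-up ratings; the names toy and code_to_label are assumptions for illustration:

```python
import pandas as pd

# Made-up, already-lowercased rating values.
toy = pd.DataFrame({"rating": ["pg-13", "tv-ma", "tv-ma", "not rated"]})

rating_cat = toy["rating"].astype("category")
toy["rating_encoded"] = rating_cat.cat.codes

# Lookup table from integer code back to the original label.
code_to_label = dict(enumerate(rating_cat.cat.categories))

print(toy)
print(code_to_label)
```

The same caveat applies to country_encoded: any distinct string in the country column, including a combined multi-country value if the data contains one, gets its own code.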


In [12]:
print(df.head())
print("Final Shape:", df.shape)

  show_id     type                  title         director  \
0      s1    Movie   Dick Johnson Is Dead  Kirsten Johnson   
1      s2  TV Show          Blood & Water          Unknown   
2      s3  TV Show              Ganglands  Julien Leclercq   
3      s4  TV Show  Jailbirds New Orleans          Unknown   
4      s5  TV Show           Kota Factory          Unknown   

                                                cast        country  \
0                                      Not Available  united states   
1  Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...   south africa   
2  Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi...        unknown   
3                                      Not Available        unknown   
4  Mayur More, Jitendra Kumar, Ranjan Raj, Alam K...          india   

  date_added  release_year rating   duration  ... tv action & adventure  \
0 2021-09-25          2020  pg-13     90 min  ...                     0   
1 2021-09-24          2021  tv-ma  2 Seasons  ... 