# 🎬 Netflix Content Insight Project

## 📊 Business Goal:
_What content trends can Netflix leverage to optimize user engagement and market expansion?_  

👉 Identify popular genres and content formats (TV vs Movie)  
👉 Understand country-specific content trends  
👉 Provide actionable insights for content strategy  

---

## 🚀 Project Steps:

1️⃣ **Setup & Import Libraries**  
➡️ Purpose: Prepare Python tools for analysis  
➡️ Skill: Python, Pandas, Matplotlib, Seaborn  

2️⃣ **Load & Clean Data**  
➡️ Purpose: Ensure data quality for analysis  
➡️ Skill: Data wrangling (handling missing values, parsing dates)  

3️⃣ **Explore & Visualize Data**  
➡️ Purpose: Uncover trends in type, country, genre  
➡️ Skill: EDA, Visualization, Storytelling  

4️⃣ **Insight Generation & Conclusion**  
➡️ Purpose: Communicate actionable recommendations  
➡️ Skill: Critical thinking, stakeholder communication  

---

## 1️⃣ Setup & Import Libraries

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Set visualization style
sns.set_theme(style='whitegrid')

## 2️⃣ Load & Clean Data

**Purpose:** Prepare the dataset for analysis by handling missing values and extracting useful features.

In [None]:
# Load dataset
df = pd.read_csv('../data/netflix_titles.csv')

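# Display basic info (print, because a notebook cell only shows its last expression)
print(df.shape)
print(df.columns.tolist())

# Check missing values per column
print(df.isnull().sum())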

# Fill missing country values
df['country'] = df['country'].fillna('Unknown')

# Convert date_added to datetime
df['date_added'] = df['date_added'].str.strip()
df['date_added'] = pd.to_datetime(df['date_added'], errors='coerce')

# Extract year of addition
df['year_added'] = df['date_added'].dt.year

# Create list of genres
df['genre_list'] = df['listed_in'].str.split(',')
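
The cleaning steps above can be sketched as a reusable function and checked on a few made-up rows. This is a minimal sketch, assuming the same column names as `netflix_titles.csv`; the helper name `clean` and the toy rows are hypothetical. It makes two small departures from the cell above: genre names are stripped at split time, and `year_added` is stored as a nullable integer so missing dates do not turn the years into floats.

```python
import pandas as pd

# Toy rows (made up for illustration) mimicking the columns used in the cleaning cell
raw = pd.DataFrame({
    'country': ['United States', None, 'India, United Kingdom'],
    'date_added': [' September 25, 2021', 'January 1, 2020', None],
    'listed_in': ['Documentaries', 'Dramas, International Movies', 'Comedies, Dramas'],
})

def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper applying the notebook's cleaning steps to a copy of df."""
    out = df.copy()
    out['country'] = out['country'].fillna('Unknown')
    # Strip stray whitespace, then parse; missing or unparseable values become NaT
    out['date_added'] = pd.to_datetime(out['date_added'].str.strip(), errors='coerce')
    # Nullable integer keeps missing years as <NA> instead of forcing floats
    out['year_added'] = out['date_added'].dt.year.astype('Int64')
    # Split on commas and strip each genre name in one step
    out['genre_list'] = out['listed_in'].str.split(',').apply(lambda gs: [g.strip() for g in gs])
    return out

cleaned = clean(raw)
print(cleaned[['country', 'year_added', 'genre_list']])
```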

## 3️⃣ Explore & Visualize Data

### 3.1 Content Type Distribution (TV Shows vs Movies)

In [None]:
type_counts = df['type'].value_counts()

plt.figure(figsize=(6,4))
sns.barplot(x=type_counts.index, y=type_counts.values)
plt.title('Content Type Distribution')
plt.ylabel('Number of Titles')
plt.savefig('../output/type_trend.png')
plt.show()
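
To read the content-type chart as proportions rather than raw counts, `value_counts(normalize=True)` gives each type's share of the catalog. A minimal sketch on a made-up `type` column (not the real data):

```python
import pandas as pd

# Toy data (illustrative only); in the notebook, use df['type'] from the real dataset
df = pd.DataFrame({'type': ['Movie', 'Movie', 'TV Show', 'Movie']})

# normalize=True returns proportions; scale to percent and round for display
type_share = df['type'].value_counts(normalize=True).mul(100).round(1)
print(type_share)
```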

### 3.2 Top Content-Producing Countries

In [None]:
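# Titles can list several countries ("United States, India"): count each one
# separately and leave out the 'Unknown' placeholder filled in during cleaning
country_counts = (
    df['country'].str.split(',').explode().str.strip()
    .loc[lambda s: s != 'Unknown']
    .value_counts()
    .head(10)
)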

plt.figure(figsize=(8,5))
sns.barplot(y=country_counts.index, x=country_counts.values)
plt.title('Top Content-Producing Countries')
plt.xlabel('Number of Titles')
plt.savefig('../output/country_trend.png')
plt.show()
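
The country chart shows totals only. To check whether non-US markets are actually growing, one option is a country-by-year table of titles added. This sketch assumes the cleaned `country` and `year_added` columns from step 2; the toy rows are made up, and crediting every country listed on a co-production is an assumption about how to count.

```python
import pandas as pd

# Toy rows (made up for illustration); the notebook would use the cleaned df
df = pd.DataFrame({
    'country': ['United States', 'France, United States', 'South Korea', 'South Korea', 'Unknown'],
    'year_added': [2019, 2020, 2019, 2020, 2020],
})

# One row per (title, country) so co-productions credit every listed country
by_country = df.assign(country=df['country'].str.split(',')).explode('country', ignore_index=True)
by_country['country'] = by_country['country'].str.strip()
by_country = by_country[by_country['country'] != 'Unknown']

# Rows: countries, columns: year added, values: number of titles
country_year = pd.crosstab(by_country['country'], by_country['year_added'])
print(country_year)
```

Rows with a missing `year_added` are dropped by `crosstab` by default, so titles with no parseable date do not distort the yearly counts.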

### 3.3 Top Genres on Netflix

In [None]:
df_exploded = df.explode('genre_list')
df_exploded['genre_list'] = df_exploded['genre_list'].str.strip()

genre_counts = df_exploded['genre_list'].value_counts().head(10)

plt.figure(figsize=(8,5))
sns.barplot(y=genre_counts.index, x=genre_counts.values)
plt.title('Top 10 Genres on Netflix')
plt.xlabel('Number of Titles')
plt.savefig('../output/genre_trend.png')
plt.show()
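
Genre totals likewise do not show whether a genre is rising. A hedged sketch: for each `year_added`, compute the share of newly added titles that carry each genre. It assumes each title has a unique `show_id` (present in the common Kaggle export of this dataset, but verify) and uses made-up rows shaped like `df_exploded`.

```python
import pandas as pd

# Toy exploded rows (illustrative only): one row per (title, genre), like df_exploded
df_exploded = pd.DataFrame({
    'show_id': ['s1', 's1', 's2', 's3', 's4', 's4'],
    'year_added': [2019, 2019, 2019, 2020, 2020, 2020],
    'genre_list': ['Documentaries', 'Dramas', 'Dramas', 'Documentaries', 'Reality TV', 'Documentaries'],
})

# Titles added per year (unique titles, not genre rows)
titles_per_year = df_exploded.groupby('year_added')['show_id'].nunique()

# Titles per year carrying each genre; genres absent in a year get 0
genre_year = (
    df_exploded.groupby(['year_added', 'genre_list'])['show_id']
    .nunique()
    .unstack(fill_value=0)
)

# Share of each year's new titles that carry each genre
genre_share = genre_year.div(titles_per_year, axis=0)
print(genre_share.round(2))
```

Plotting a few columns as lines (for example `genre_share[['Documentaries', 'Reality TV']].plot()`) would show the trend over time.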

## 4️⃣ Conclusion for Stakeholders

**Summary of Insights:**

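✅ TV Shows form a substantial part of the catalog alongside Movies (3.1): expanding serialized content is an opportunity to retain subscribers.  
✅ The US dominates content production (3.2), and non-US markets (South Africa, France, Korea) also contribute; counting titles per country by `year_added` would confirm whether these markets are growing.  
✅ Genres like "Documentaries" and "Reality TV" are candidates for high-ROI content; tracking their share of new titles by `year_added` would confirm whether they are on the rise.  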

**Recommendations:**

- Increase focus on diverse international markets to balance US-heavy catalog.  
- Expand "Reality TV" and "Docuseries" to engage younger viewers and improve regional retention.  
- Adjust the release calendar: use the month of `date_added` to find underutilized periods (e.g., summer releases).  
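
The release-calendar recommendation can be grounded in data already loaded: counting titles by the month of `date_added` shows which months see fewer additions. A minimal sketch on made-up dates; note that `date_added` records when a title was added to Netflix, which may differ from its original release date.

```python
import pandas as pd

# Toy dates (illustrative only); in the notebook, use the parsed df['date_added']
date_added = pd.to_datetime(pd.Series(['2021-07-01', '2021-07-15', '2020-12-01', None]))

# Count additions per calendar month (1 = January ... 12 = December), skipping NaT
monthly = (
    date_added.dropna().dt.month
    .value_counts()
    .reindex(range(1, 13), fill_value=0)  # keep empty months visible as 0
)
print(monthly)
```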

**Next Steps:**
- Analyze user engagement data (watch time, ratings) to validate these trends.  
- Cross-analyze with regional subscription trends to drive future content investment.  

---

_Prepared by: [Your Name]_  
_Date: [Insert Date]_