# 🎬 Netflix Data Storytelling Project
This notebook walks through data loading, preprocessing, cleaning, and storytelling using the Netflix dataset loaded directly from the web.

## 📥 Step 1: Load Dataset from Web

In [None]:
import pandas as pd

# Load dataset from GitHub
url = "https://raw.githubusercontent.com/sagnik1511/netflix-data-analysis/main/netflix_titles.csv"
df = pd.read_csv(url)
df.head()

## 🔍 Step 2: Initial Data Inspection

In [None]:
df.info()
df.isnull().sum()
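Raw missing-value counts are easier to compare across columns as percentages of all rows. Below is a minimal sketch that uses a small hypothetical frame (`toy`, with invented values) in place of `df`:

```python
import pandas as pd

# Hypothetical toy frame standing in for the Netflix data (values invented)
toy = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "director": [None, "X", None, "Y"],
    "country": ["United States", None, "India", "India"],
})

# Share of missing values per column as a percentage, largest first
missing_pct = toy.isna().mean().mul(100).sort_values(ascending=False)
print(missing_pct)
```

On the notebook's data, the same chain is `df.isna().mean().mul(100).sort_values(ascending=False)`. The columns at the top of that list are the ones Step 3 has to fill or drop.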

## 🧹 Step 3: Data Cleaning

In [None]:
# Fill missing values with placeholder labels; assign the result back because
# chained calls like df["col"].fillna(..., inplace=True) are deprecated in recent pandas
df = df.fillna({
    "country": "Unknown",
    "rating": "Not Rated",
    "director": "Not Listed",
    "cast": "Not Listed",
})

# Drop rows with missing date_added or duration
df.dropna(subset=["date_added", "duration"], inplace=True)
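It is worth knowing how many rows the `dropna` call discards before reading anything into the charts. The sketch below runs the same cleaning logic on a hypothetical three-row frame (invented values, same column names) and counts the rows that were removed:

```python
import pandas as pd

# Hypothetical toy rows (invented values) using the notebook's column names
toy = pd.DataFrame({
    "country": [None, "India", "United States"],
    "rating": ["TV-MA", None, "PG"],
    "director": [None, None, "Z"],
    "cast": ["P", None, None],
    "date_added": ["September 25, 2021", None, "August 4, 2017"],
    "duration": ["90 min", "2 Seasons", None],
})

rows_before = len(toy)
# Fill the descriptive columns with placeholder labels in one call
toy = toy.fillna({"country": "Unknown", "rating": "Not Rated",
                  "director": "Not Listed", "cast": "Not Listed"})
# Drop rows that are missing either date_added or duration
toy = toy.dropna(subset=["date_added", "duration"])
print(f"Dropped {rows_before - len(toy)} of {rows_before} rows")
```

On the real data, recording `len(df)` before and after the `dropna` line gives the same audit.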

## 🛠️ Step 4: Feature Engineering

In [None]:
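# Convert date_added to datetime; strip stray whitespace before parsing and
# turn any string pandas cannot parse into NaT instead of raising an error
df["date_added"] = pd.to_datetime(df["date_added"].str.strip(), errors="coerce")
df["year_added"] = df["date_added"].dt.year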

# Extract main genre
df["main_genre"] = df["listed_in"].str.split(",").str[0]

# Copy type into a simpler column
df["content_type"] = df["type"]
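The date parsing and genre extraction can be checked on a couple of sample strings before they run on the full columns. The values below are hypothetical. They are written in an assumed `Month D, YYYY` date format and a comma-separated genre list, and one date carries a leading space, which can trip up pandas' format inference if it is not stripped:

```python
import pandas as pd

# Hypothetical sample values in the assumed text formats of the dataset
toy = pd.DataFrame({
    "date_added": ["September 25, 2021", " August 4, 2017"],
    "listed_in": ["Docuseries", "Dramas, International Movies"],
})

# Strip whitespace before parsing; unparseable strings become NaT
toy["date_added"] = pd.to_datetime(toy["date_added"].str.strip(), errors="coerce")
toy["year_added"] = toy["date_added"].dt.year

# Keep only the first listed genre, trimmed of surrounding spaces
toy["main_genre"] = toy["listed_in"].str.split(",").str[0].str.strip()
print(toy[["year_added", "main_genre"]])
```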

## 📊 Step 5: Import Visualization Libraries

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

sns.set(style="whitegrid")
custom_palette = sns.color_palette("Set2")

## 📖 Story 1: Netflix is Adding More Content Over Time

In [None]:
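# hue mirrors x so the palette is applied without seaborn's (>= 0.13) deprecation warning
sns.countplot(x="year_added", data=df, hue="year_added", palette="viridis", legend=False)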
plt.title("Netflix Content Additions by Year")
plt.xticks(rotation=45)
plt.xlabel("Year Added")
plt.ylabel("Number of Titles")
plt.show()

> 🧠 Insight: Most titles were added after 2015, when yearly additions rose sharply.
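The countplot shows the trend. A table of yearly counts with year-over-year growth quantifies the "sharp increase". Here is a minimal sketch on a hypothetical `year_added` series (invented values):

```python
import pandas as pd

# Hypothetical toy column of year_added values (invented for illustration)
year_added = pd.Series([2015, 2016, 2016, 2017, 2017, 2017, 2017])

# Titles added per year, in chronological order
per_year = year_added.value_counts().sort_index()
# Year-over-year change in additions, as a percentage (first year has no predecessor)
growth_pct = per_year.pct_change().mul(100)
print(pd.DataFrame({"titles": per_year, "growth_%": growth_pct}))
```

On the notebook's data, substitute `df["year_added"]`. Note that `pct_change` compares consecutive rows, so a year with no additions would be absent from the table rather than shown as zero.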

## 📖 Story 2: Movies vs TV Shows

In [None]:
sns.countplot(x="content_type", data=df, hue="content_type", palette=custom_palette, legend=False)
plt.title("Content Type Distribution")
plt.xlabel("Type")
plt.ylabel("Number of Titles")
plt.show()

> 🧠 Insight: Netflix has more movies than TV shows in its content library.
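To say *how much* larger the movie catalog is, the counts can be normalized into shares. Below is a minimal sketch on a hypothetical `content_type` series (invented values):

```python
import pandas as pd

# Hypothetical toy column standing in for df["content_type"]
content_type = pd.Series(["Movie", "Movie", "Movie", "TV Show"])

# Fraction of titles per type, expressed as a percentage
type_share = content_type.value_counts(normalize=True).mul(100)
print(type_share)
```

With the notebook's data, `df["content_type"].value_counts(normalize=True).mul(100)` gives the real split.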

## 📖 Story 3: Most Popular Genres

In [None]:
df["main_genre"].value_counts().head(10).plot(kind="bar", color="skyblue")
plt.title("Top 10 Most Common Netflix Genres")
plt.xlabel("Genre")
plt.ylabel("Number of Titles")
plt.xticks(rotation=45)
plt.show()

> 🧠 Insight: Dramas and Comedies are the most common first-listed genres on Netflix.
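`main_genre` keeps only the first entry of `listed_in`, so a title tagged "Comedies, Dramas" counts only as a comedy. The sketch below compares that with a tally of every listed genre, using hypothetical `listed_in` values:

```python
import pandas as pd

# Hypothetical toy listed_in values (invented); a title may list several genres
listed_in = pd.Series([
    "Dramas, International Movies",
    "Comedies, Dramas",
    "Documentaries",
])

# First-listed genre only, as main_genre does
first_only = listed_in.str.split(",").str[0].value_counts()
# Every listed genre: explode puts each list element on its own row
all_genres = listed_in.str.split(",").explode().str.strip().value_counts()
print(all_genres)
```

This is an alternative tally, not what the chart above plots. A genre that often appears second (such as "Dramas" in the second toy row) gains weight under the all-genres count.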

## 📖 Story 4: Top Contributing Countries

In [None]:
df["country"].value_counts().head(10).plot(kind="bar", color="tomato")
plt.title("Top 10 Countries Producing Netflix Content")
plt.xlabel("Country")
plt.ylabel("Number of Titles")
plt.xticks(rotation=45)
plt.show()

> 🧠 Insight: The US contributes the most titles, followed by India and the UK.
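The chart above counts whole `country` strings, so a co-production listed as "United States, India" forms its own bar, and the "Unknown" placeholder from Step 3 competes with real countries. Below is a minimal sketch that splits comma-separated entries and drops the placeholder, on hypothetical values:

```python
import pandas as pd

# Hypothetical toy country values (invented); co-productions list several countries
country = pd.Series([
    "United States",
    "United States, India",
    "Unknown",
    "India",
    "United Kingdom",
])

# One row per country, trimmed of the space after each comma
countries = country.str.split(",").explode().str.strip()
# Count each country and remove the "Unknown" placeholder if present
country_counts = countries.value_counts().drop("Unknown", errors="ignore")
print(country_counts)
```

On the notebook's data, `country_counts.head(10).plot(kind="bar", color="tomato")` would plot per-country totals instead of per-string totals.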

## 📖 Story 5: Rating Distribution of Netflix Titles

In [None]:
top_ratings = df["rating"].value_counts().head(10)
sns.barplot(x=top_ratings.values, y=top_ratings.index, hue=top_ratings.index, palette="magma", legend=False)
plt.title("Most Common Netflix Content Ratings")
plt.xlabel("Number of Titles")
plt.ylabel("Rating")
plt.show()

> 🧠 Insight: TV-MA and TV-14 are the most common ratings, so the catalog skews toward adult and older-teen viewers.
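The share of titles rated TV-MA or TV-14 supports the "skews mature" reading with a single number. Below is a minimal sketch on a hypothetical `rating` series (invented values):

```python
import pandas as pd

# Hypothetical toy column standing in for df["rating"]
rating = pd.Series(["TV-MA", "TV-MA", "TV-14", "PG", "TV-MA", "R", "TV-14", "TV-Y"])

# isin gives a boolean mask; its mean is the fraction of matching titles
mature_share = rating.isin(["TV-MA", "TV-14"]).mean() * 100
print(f"{mature_share:.1f}% of titles are rated TV-MA or TV-14")
```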

## 📖 Story 6: Content Trends by Type Over the Years

In [None]:
sns.histplot(data=df, x="year_added", hue="content_type", multiple="stack", discrete=True, palette="coolwarm")
plt.title("Movies vs TV Shows Over Time")
plt.xlabel("Year Added")
plt.ylabel("Number of Titles")
plt.show()

> 🧠 Insight: TV show additions grow year over year, but movies still make up most of the titles added.
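A year-by-type table makes the stacked histogram's trend explicit, and each year's TV share shows whether shows are gaining ground. Below is a minimal sketch on a hypothetical frame (invented values):

```python
import pandas as pd

# Hypothetical toy frame standing in for df[["year_added", "content_type"]]
toy = pd.DataFrame({
    "year_added": [2018, 2018, 2018, 2019, 2019, 2019, 2019],
    "content_type": ["Movie", "Movie", "TV Show", "Movie", "Movie", "TV Show", "TV Show"],
})

# Rows: year added; columns: content type; cells: number of titles
by_year = pd.crosstab(toy["year_added"], toy["content_type"])
# TV shows as a percentage of each year's additions
tv_share = by_year["TV Show"] / by_year.sum(axis=1) * 100
print(by_year.assign(tv_share_pct=tv_share.round(1)))
```

On the notebook's data, `pd.crosstab(df["year_added"], df["content_type"])` builds the same table.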

## ✅ Project Summary
- Netflix has scaled content aggressively since 2015
- Dramas and Comedies lead the first-listed genre counts
- The US contributes the most titles
- TV show additions have risen steadily in recent years