# 📊 Netflix Dataset Analysis
### Complete Project — Content Trends Analysis for Strategic Recommendations
---
This notebook performs data cleaning, analysis, and visualization on the Netflix dataset to understand how its content has changed over time.
We'll compare movies with TV shows and look at genres, country-wise distribution, ratings, and release trends.


In [None]:
# 📦 Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

# Set visual style
sns.set(style='whitegrid', palette='pastel', font_scale=1.1)


In [None]:
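# 📥 Load the dataset (this cell only works in Google Colab)
import io
from google.colab import files

uploaded = files.upload()  # opens a file picker in the browser

# Read the first uploaded file into a DataFrame
file_name = next(iter(uploaded))
df = pd.read_csv(io.BytesIO(uploaded[file_name]))
df.head()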
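The upload cell above depends on `google.colab`, so it fails in a local Jupyter session. Below is a minimal sketch for loading the CSV from disk instead. The function name `load_netflix` and the default file name `netflix_dataset.csv` are hypothetical, so point `path` at your own copy. The function also strips stray whitespace from the column headers so that names like `Release_Date` match exactly.

```python
import pandas as pd

def load_netflix(path="netflix_dataset.csv"):
    """Load the Netflix CSV from a local path (the default file name is hypothetical)."""
    df = pd.read_csv(path)
    # Remove leading/trailing spaces from headers so column lookups match exactly
    df.columns = df.columns.str.strip()
    return df

# Usage (adjust the path to wherever the CSV lives):
# df = load_netflix("netflix_dataset.csv")
```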

## 🧹 Data Cleaning

In [None]:
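# Check dataset information (dtypes and non-null counts)
df.info()

# Convert 'Release_Date' to datetime; strip stray spaces first, unparseable values become NaT
df['Release_Date'] = pd.to_datetime(df['Release_Date'].str.strip(), errors='coerce')
df['Year'] = df['Release_Date'].dt.year  # float dtype, because NaT rows give NaN years

# Fill missing categorical values with a placeholder.
# Assign back rather than using inplace=True on a column, which is unreliable under pandas Copy-on-Write.
df['Country'] = df['Country'].fillna('Unknown')
df['Rating'] = df['Rating'].fillna('Unknown')

# Summary statistics for the first 10 columns
df.describe(include='all').T.head(10)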
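The cleaning cell fills `Country` and `Rating` without first showing how much is actually missing. A small helper can report missing counts and percentages per column. The name `missing_report` is hypothetical and not part of the original notebook. Run it on the freshly loaded `df`, before the `fillna` calls, because after filling, those two columns show no gaps.

```python
import pandas as pd

def missing_report(df):
    """Return the count and percentage of missing values per column, most-missing first."""
    counts = df.isna().sum()
    pct = (counts / len(df) * 100).round(1)
    report = pd.DataFrame({"missing": counts, "percent": pct})
    # Keep only columns that actually have gaps, sorted from most to least missing
    return report[report["missing"] > 0].sort_values("missing", ascending=False)

# Usage, right after loading and before any fillna:
# missing_report(df)
```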

## 🎬 Distribution of Movies vs TV Shows

In [None]:
plt.figure(figsize=(6,4))
sns.countplot(data=df, x='Category', palette='Set2')
plt.title('Movies vs TV Shows on Netflix')
plt.xlabel('Category')
plt.ylabel('Count')
plt.show()
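The count plot shows absolute numbers. For the written insights, the share of each category is often easier to quote. Here is a minimal sketch using `value_counts(normalize=True)`. The helper name `category_share` is hypothetical, and the sketch assumes the category labels live in the `Category` column used above.

```python
import pandas as pd

def category_share(df, column="Category"):
    """Return the percentage of titles in each category (e.g. Movie vs TV Show)."""
    # normalize=True turns raw counts into fractions that sum to 1
    return (df[column].value_counts(normalize=True) * 100).round(1)

# Usage:
# category_share(df)
```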


## 📈 Content Growth Over the Years

In [None]:
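# Keep titles with a parseable date and cast Year to int so ticks read 2019 rather than 2019.0
plot_df = df.dropna(subset=['Year']).astype({'Year': int})

plt.figure(figsize=(10,5))
sns.countplot(data=plot_df, x='Year', hue='Category', palette='muted')
plt.title('Content Added to Netflix Over the Years')
plt.xticks(rotation=45)
plt.xlabel('Year')
plt.ylabel('Count')
plt.legend(title='Category')
plt.show()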
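The chart shows growth visually. To put numbers on it, one option is a year-by-category table from `pd.crosstab`, followed by `pct_change()` for year-over-year growth. The helper `yearly_growth` below is a hypothetical sketch. Note that `pct_change` compares each row with the previous row in the table, so a year with no titles at all is skipped rather than treated as zero. A jump from 0 titles is reported as NaN instead of infinity.

```python
import numpy as np
import pandas as pd

def yearly_growth(df, year_col="Year", cat_col="Category"):
    """Return per-year title counts by category and their year-over-year % change."""
    # Rows are years, columns are categories, cells are title counts
    counts = pd.crosstab(df[year_col], df[cat_col])
    # Each year is compared with the previous row; the first year has no baseline (NaN)
    growth = counts.pct_change() * 100
    # Growth from 0 titles gives inf, which is not a meaningful rate
    growth = growth.replace([np.inf, -np.inf], np.nan).round(1)
    return counts, growth

# Usage:
# counts, growth = yearly_growth(df)
# growth.tail()
```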

## 🌍 Top 10 Countries by Content

In [None]:
top_countries = df['Country'].value_counts().head(10)
plt.figure(figsize=(8,5))
sns.barplot(x=top_countries.values, y=top_countries.index, palette='coolwarm')
plt.title('Top 10 Countries with Most Content on Netflix')
plt.xlabel('Number of Titles')
plt.ylabel('Country')
plt.show()
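If your copy of the dataset lists co-productions in a single `Country` cell, separated by commas (for example `United States, India`), then the raw `value_counts()` above counts each combination as its own "country" and undercounts the individual countries. The sketch below assumes that comma-separated format, mirroring how the genre cell treats `Type`. It splits each entry into one row per country and drops the `Unknown` placeholder added during cleaning. The function name `top_countries_split` is hypothetical.

```python
import pandas as pd

def top_countries_split(df, column="Country", n=10, exclude=("Unknown",)):
    """Count titles per individual country, splitting comma-separated co-productions."""
    countries = (
        df[column]
        .dropna()
        .astype(str)
        .str.split(",")  # one list of countries per title
        .explode()       # one row per (title, country) pair
        .str.strip()
    )
    # Drop placeholder values and empty strings left by trailing commas
    countries = countries[~countries.isin(exclude) & (countries != "")]
    return countries.value_counts().head(n)

# Usage: a drop-in replacement for top_countries in the bar plot above
# top_countries = top_countries_split(df)
```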

## 🎭 Most Common Genres

In [None]:
# In this dataset the 'Type' column holds a comma-separated list of genres per title.
# Split it into one row per genre, trim whitespace, and count the most common ones.
genres = df['Type'].dropna().astype(str).str.split(',').explode().str.strip()
top_genres = genres.value_counts().head(10)

plt.figure(figsize=(8,5))
sns.barplot(x=top_genres.values, y=top_genres.index, palette='mako')
plt.title('Top 10 Genres on Netflix')
plt.xlabel('Number of Titles')
plt.ylabel('Genre')
plt.show()
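The overall genre ranking mixes movie genres and TV genres together. Splitting it by `Category` shows which genres drive each side of the catalog. The sketch below is a hypothetical helper, `top_genres_by_category`. It applies the same split-and-explode idea to `Type` and keeps the top `n` genres within each category.

```python
import pandas as pd

def top_genres_by_category(df, n=5, genre_col="Type", cat_col="Category"):
    """Return the n most common genres within each category (e.g. Movie, TV Show)."""
    pairs = df[[cat_col, genre_col]].dropna().copy()
    # Turn "Dramas, Comedies" into a list, then one row per (category, genre) pair
    pairs[genre_col] = pairs[genre_col].astype(str).str.split(",")
    pairs = pairs.explode(genre_col, ignore_index=True)
    pairs[genre_col] = pairs[genre_col].str.strip()
    # value_counts sorts genres by count within each category; head(n) keeps the top n per category
    return (
        pairs.groupby(cat_col)[genre_col]
        .value_counts()
        .groupby(level=0)
        .head(n)
    )

# Usage:
# top_genres_by_category(df, n=5)
```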

## ⭐ Ratings Distribution

In [None]:
plt.figure(figsize=(10,5))
sns.countplot(data=df, x='Rating', order=df['Rating'].value_counts().index[:10], palette='rocket')
plt.title('Distribution of Ratings on Netflix')
plt.xlabel('Rating')
plt.ylabel('Count')
plt.xticks(rotation=45)
plt.show()
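The ratings chart pools movies and TV shows, even though the two often use different rating systems (for example, `R` versus `TV-MA`). A normalized cross-tabulation shows what fraction of each category carries each rating. The helper `rating_share_by_category` below is a hypothetical sketch. It restricts the table to the `n` most common ratings, so each row sums to 100% over those ratings only.

```python
import pandas as pd

def rating_share_by_category(df, rating_col="Rating", cat_col="Category", n=5):
    """Percent of each category's titles per rating, limited to the n most common ratings."""
    top = df[rating_col].value_counts().index[:n]
    subset = df[df[rating_col].isin(top)]
    # normalize="index" makes each row (category) sum to 1 across the selected ratings
    return (pd.crosstab(subset[cat_col], subset[rating_col], normalize="index") * 100).round(1)

# Usage:
# rating_share_by_category(df, n=8)
```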

## 💡 Insights & Strategic Recommendations
- Netflix has a higher volume of **Movies** than **TV Shows**, but the growth rate of TV Shows is increasing.
- The **United States** dominates in content count, followed by **India** and the **United Kingdom**.
- Popular genres include **International Dramas**, **Comedies**, and **Documentaries**.
- Netflix should continue investing in **regional content** and **TV Shows**, as these categories are gaining traction.
- Diverse representation across countries can help Netflix expand its **global market share**.
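The first insight says TV Shows are growing faster than Movies. One way to check this in the data is to compute the TV Show share of each year's titles: a rising share is consistent with the claim. The sketch below assumes the cleaned `df` with its `Year` column, and assumes that the TV label in `Category` is exactly `TV Show`. Adjust `tv_label` if your copy differs. The function name `tv_share_by_year` is hypothetical.

```python
import pandas as pd

def tv_share_by_year(df, year_col="Year", cat_col="Category", tv_label="TV Show"):
    """Percentage of each year's titles that fall in the TV category."""
    counts = pd.crosstab(df[year_col], df[cat_col])
    # Divide the TV column by the row total to get that year's TV share
    return (counts[tv_label] / counts.sum(axis=1) * 100).round(1)

# Usage:
# tv_share_by_year(df).tail(10)
```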
