# 📺 Netflix Data Analysis

Analyzing Netflix content trends using Python and visualizations.

## 📌 Introduction
Netflix has grown into a global platform with a diverse catalog. This project explores how many titles were added over time, how genres are distributed, and which countries contribute the most content.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from wordcloud import WordCloud

# Set styles
sns.set(style="darkgrid")
plt.rcParams['figure.figsize'] = (10, 6)

# Load the dataset
df = pd.read_csv("https://raw.githubusercontent.com/abhijeetburman/Netflix-Data-Analysis/master/netflix_titles.csv")
df.head()

## 🧹 Data Cleaning

In [None]:
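# Convert date_added to datetime. Values may carry stray whitespace,
# so strip it first and coerce anything unparseable to NaT
df['date_added'] = pd.to_datetime(df['date_added'].str.strip(), errors='coerce')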

# Fill missing values where appropriate. Assign the result back instead of
# calling fillna(inplace=True) on a single column, which newer pandas warns about
fill_cols = ['country', 'cast', 'director', 'rating']
df[fill_cols] = df[fill_cols].fillna('Unknown')

# Drop exact duplicate rows, then count the remaining missing values per column
df.drop_duplicates(inplace=True)
df.isnull().sum()
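
The cleaning steps above can be checked on a tiny frame before running them on the full dataset. This is a minimal sketch: the `sample` rows are made up to imitate the dataset's column formats (including a leading space in one date), and the `%B %d, %Y` format string is an assumption about how `date_added` is written.

```python
import pandas as pd

# Hypothetical sample rows; not taken from the real dataset.
sample = pd.DataFrame({
    "title": ["A", "B", "C"],
    "date_added": ["September 9, 2019", " August 4, 2017", None],
    "country": ["India", None, "United States"],
})

# Strip stray whitespace before parsing; unparseable or missing values become NaT.
sample["date_added"] = pd.to_datetime(
    sample["date_added"].str.strip(), format="%B %d, %Y", errors="coerce"
)

# Assign back rather than calling fillna(inplace=True) on one column.
sample["country"] = sample["country"].fillna("Unknown")

# Share of missing values per column, in percent.
missing_pct = sample.isnull().mean().mul(100).round(1)
print(missing_pct)
```

Reporting missing values as a percentage rather than a raw count makes it easier to judge whether a column is still usable after cleaning.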

## 📊 Exploratory Data Analysis (EDA)

In [None]:
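# Content added over the years (rows without a valid date are skipped)
df['year_added'] = df['date_added'].dt.year
# Cast to int so the x-axis shows 2016 rather than 2016.0
df_year = df['year_added'].dropna().astype(int).value_counts().sort_index()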

df_year.plot(kind='bar', color='salmon')
plt.title('Content Added Per Year')
plt.xlabel('Year')
plt.ylabel('Number of Titles')
plt.tight_layout()
plt.show()
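
The yearly totals can also be split by content type, which shows whether growth came from movies or TV shows. A minimal sketch using `pd.crosstab`, run here on made-up rows (the `sample` frame is hypothetical; on the real data you would pass `df` instead):

```python
import pandas as pd

# Hypothetical sample rows; not taken from the real dataset.
sample = pd.DataFrame({
    "type": ["Movie", "TV Show", "Movie", "Movie", "TV Show"],
    "date_added": pd.to_datetime(
        ["2016-01-01", "2016-05-10", "2017-03-02", "2017-07-19", None]
    ),
})

# Keep only rows with a valid date, then count titles per (year, type).
dated = sample.dropna(subset=["date_added"])
year_added = dated["date_added"].dt.year.rename("year_added")
per_year_by_type = pd.crosstab(year_added, dated["type"])
print(per_year_by_type)
```

Calling `per_year_by_type.plot(kind='bar', stacked=True)` on the result would give a stacked version of the chart above.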

In [None]:
# Content type distribution
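# Passing hue avoids the seaborn >= 0.13 deprecation warning for palette without hue
sns.countplot(data=df, x='type', hue='type', palette='Set2', legend=False)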
plt.title('Distribution of Content Type (Movie vs TV Show)')
plt.show()

In [None]:
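# Top countries producing Netflix content
# 'Unknown' is the placeholder filled in during cleaning, so exclude it here.
# Note: co-productions such as "United States, India" are counted as their own entry.
top_countries = df.loc[df['country'] != 'Unknown', 'country'].value_counts().head(10)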
top_countries.plot(kind='bar', color='skyblue')
plt.title('Top 10 Countries by Content')
plt.ylabel('Number of Titles')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
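
The `country` column can list several countries in one comma-separated string, so counting the raw strings treats each co-production as its own entry. One way to credit every listed country is to split and explode the column first. A minimal sketch on made-up rows (the `sample` frame is hypothetical, and the comma separator is an assumption about the column format):

```python
import pandas as pd

# Hypothetical sample rows; not taken from the real dataset.
sample = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "country": ["United States, India", "India", "Unknown", "United States"],
})

# One row per (title, country) pair, with surrounding whitespace removed.
countries = sample["country"].str.split(",").explode().str.strip()

# Drop the placeholder used for missing values and any empty fragments.
countries = countries[(countries != "Unknown") & (countries != "")]

country_counts = countries.value_counts()
print(country_counts)
```

With this approach a co-production counts once for each participating country, so the totals can exceed the number of titles.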

In [None]:
# WordCloud for most common genres
from collections import Counter
genre_series = df['listed_in'].dropna().str.split(', ')
genre_list = [genre for sublist in genre_series for genre in sublist]
genre_freq = dict(Counter(genre_list))

wc = WordCloud(width=1000, height=500, background_color='black').generate_from_frequencies(genre_freq)
plt.imshow(wc, interpolation='bilinear')
plt.axis('off')
plt.title("Most Common Genres on Netflix")
plt.show()
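
A word cloud gives a quick impression, but the exact ranking is easier to read from the counts themselves. A minimal sketch using `Counter.most_common`, with a made-up `listed_in` list standing in for the column values:

```python
from collections import Counter

# Hypothetical genre strings; not taken from the real dataset.
listed_in = ["Dramas, International Movies", "Comedies, Dramas", "Dramas"]

# Split each entry on commas and count every genre once per title.
genre_counts = Counter(
    genre.strip() for entry in listed_in for genre in entry.split(",")
)

# The n most frequent genres as (genre, count) pairs.
top_genres = genre_counts.most_common(2)
print(top_genres)
```

On the real data, `genre_freq` from the cell above already holds these counts, so `Counter(genre_freq).most_common(10)` would list the ten most frequent genres.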

## 🧠 Conclusion
- Most titles were added to Netflix after 2015, reflecting rapid catalog growth.
- The US dominates content production, followed by India and the UK.
- Movies outnumber TV Shows; the count plot shows the exact split.
- Popular genres include Dramas, International Movies, and Comedies.

> 🎯 **Next Steps**: Try filtering by genre, analyzing directors, or comparing ratings across countries!
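
As a starting point for these next steps, here is a minimal sketch of all three ideas on made-up rows. The `sample` frame and its values are hypothetical; the column names match the dataset used in this notebook.

```python
import pandas as pd

# Hypothetical sample rows; not taken from the real dataset.
sample = pd.DataFrame({
    "country": ["United States", "United States", "India", "India", "India"],
    "rating": ["TV-MA", "PG-13", "TV-14", "TV-14", "TV-MA"],
    "director": ["Unknown", "Dir A", "Dir B", "Dir B", "Unknown"],
    "listed_in": ["Dramas", "Comedies", "Dramas, Comedies", "Dramas", "Thrillers"],
})

# 1. Filter by genre: titles whose genre list mentions "Dramas".
dramas = sample[sample["listed_in"].str.contains("Dramas", regex=False)]

# 2. Analyze directors: most frequent directors, ignoring the placeholder.
top_directors = sample.loc[sample["director"] != "Unknown", "director"].value_counts()

# 3. Compare ratings across countries: each row sums to 1.
rating_share = pd.crosstab(sample["country"], sample["rating"], normalize="index")

print(dramas, top_directors, rating_share, sep="\n\n")
```

Normalizing by row (`normalize="index"`) compares the rating mix within each country instead of raw counts, which would otherwise be dominated by the countries with the most titles.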