# 🎬 Netflix Data Analysis

Welcome to my analysis of the Netflix Movies and TV Shows dataset. This project is part of my portfolio for the Train IT Datathon 2025.

Author: **moriadim**

In [None]:
# 📦 Importing Libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

# Set visual style
sns.set(style='whitegrid')

In [None]:
# 📁 Load Dataset
# The dataset can be downloaded from Kaggle or other open data sources
df = pd.read_csv('netflix_titles.csv')
df.head()

In [None]:
# 🔍 Data Overview
df.info()
df.describe(include='all')

In [None]:
# 🧹 Data Cleaning
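# Check for nulls (printed explicitly, because a cell only displays its last expression)
print(df.isnull().sum())

# Drop rows with nulls in 'type', 'title', or 'country'
df.dropna(subset=['type', 'title', 'country'], inplace=True)
# Fill every remaining null with a placeholder. This also turns missing
# 'date_added' values into 'Unknown', which become NaT once dates are parsed.
df.fillna('Unknown', inplace=True)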
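
Before dropping or filling anything, it can help to see how much of each column is actually missing. The sketch below is a minimal illustration on a made-up four-row frame (the `sample` data and the `director` values are hypothetical, not taken from `netflix_titles.csv`). It fills only a text column rather than the whole frame, which is one possible alternative to a blanket `fillna`.

```python
import pandas as pd

# Hypothetical stand-in for a few rows of the dataset (values are made up).
sample = pd.DataFrame({
    "type": ["Movie", "TV Show", "Movie", None],
    "title": ["A", "B", "C", "D"],
    "country": ["India", None, "Japan", "Brazil"],
    "director": [None, None, "X", "Y"],
})

# Share of missing values per column, as a percentage.
missing_pct = sample.isnull().mean().mul(100).round(1)

# Drop rows missing a key column, then fill only the chosen text column.
cleaned = sample.dropna(subset=["type", "title", "country"]).copy()
cleaned["director"] = cleaned["director"].fillna("Unknown")

print(missing_pct)
print(cleaned)
```

Filling column by column keeps date and numeric columns free of placeholder strings, at the cost of listing the columns explicitly.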

In [None]:
# 📊 Content Type Distribution
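# Assigning hue avoids the palette-without-hue deprecation warning (assumes seaborn >= 0.13)
sns.countplot(x='type', hue='type', data=df, palette='Blues', legend=False)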
plt.title('Distribution of Content Type on Netflix')
plt.xlabel('Type')
plt.ylabel('Count')
plt.show()
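
The count plot shows absolute numbers. To state the split as percentages, `value_counts(normalize=True)` is one option. This is a minimal sketch on a made-up `sample` frame, so the printed shares are illustrative only and say nothing about the real dataset.

```python
import pandas as pd

# Hypothetical sample; the real notebook would use df['type'] instead.
sample = pd.DataFrame({"type": ["Movie", "Movie", "TV Show", "Movie"]})

# Relative frequency of each content type, as a percentage.
type_share = sample["type"].value_counts(normalize=True).mul(100).round(1)

print(type_share)
```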

In [None]:
# 🌍 Top 10 Countries with Most Content
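# Note: 'country' can hold several comma-separated countries in one string,
# and value_counts() treats each distinct string as its own category.
top_countries = df['country'].value_counts().head(10)
# Assigning hue avoids the palette-without-hue deprecation warning (assumes seaborn >= 0.13)
sns.barplot(x=top_countries.values, y=top_countries.index, hue=top_countries.index, palette='coolwarm', legend=False)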
plt.title('Top 10 Countries with Most Netflix Content')
plt.xlabel('Count')
plt.ylabel('Country')
plt.show()
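
Because a title can list several countries in one comma-separated string, counting the raw strings credits co-productions to a combined label rather than to each country. One way to count per country is to split and explode the column first. The sketch below assumes that comma-separated layout and uses a made-up `sample` frame.

```python
import pandas as pd

# Hypothetical sample with one co-production (values are made up).
sample = pd.DataFrame({
    "title": ["A", "B", "C"],
    "country": ["United States, India", "India", "United States"],
})

# Split each entry on commas, give every country its own row, and trim spaces.
countries = sample["country"].str.split(",").explode().str.strip()

# Discard empty strings left behind by stray commas.
countries = countries[countries != ""]

country_counts = countries.value_counts()
print(country_counts)
```

With this approach a title is counted once for every country it lists, so the totals can exceed the number of titles.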

In [None]:
# 🗓 Content Added Over Time
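# Strip stray whitespace first; otherwise some entries may fail to parse and be coerced to NaT
df['date_added'] = pd.to_datetime(df['date_added'].str.strip(), errors='coerce')
# Nullable integers keep the axis labels as 2016 rather than 2016.0 when some dates are NaT
df['year_added'] = df['date_added'].dt.year.astype('Int64')
df['year_added'].value_counts().sort_index().plot(kind='bar', figsize=(10,5))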
plt.title('Content Added per Year')
plt.xlabel('Year')
plt.ylabel('Number of Titles')
plt.show()
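
The yearly counts can also be split by content type. The following is a minimal sketch on a made-up `sample` frame. It assumes dates written like `September 25, 2021`, hence the explicit `format` string, and it drops rows whose date cannot be parsed.

```python
import pandas as pd

# Hypothetical sample, including a leading space and a missing date.
sample = pd.DataFrame({
    "type": ["Movie", "TV Show", "Movie", "Movie"],
    "date_added": ["September 25, 2021", " August 4, 2017", "January 1, 2017", None],
})

# Trim whitespace, then parse with an explicit format; failures become NaT.
parsed = pd.to_datetime(
    sample["date_added"].str.strip(), format="%B %d, %Y", errors="coerce"
)

# Keep only rows with a valid year and store the year as a plain integer.
valid = (
    sample.assign(year_added=parsed.dt.year)
    .dropna(subset=["year_added"])
    .astype({"year_added": int})
)

# Rows are years, columns are content types, cells are title counts.
per_year = valid.groupby(["year_added", "type"]).size().unstack(fill_value=0)
print(per_year)
```

Calling `per_year.plot(kind='bar', stacked=True)` on the result would draw the same kind of chart as above, with one colour per type.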

## 📌 Conclusion
- Movies make up the majority of titles on Netflix, ahead of TV Shows.
- The United States has more titles than any other country in the top 10.
- The number of titles added per year rises noticeably after 2016.

This notebook was created by **moriadim** as part of his portfolio for the Train IT Datathon 2025.