# Netflix Content Analysis (Beginner)
**Goal:** Clean, explore (EDA), and visualize a small Netflix sample dataset.
This notebook is self-contained and includes a tiny sample dataset so you can run it immediately.

In [None]:
# !pip install pandas matplotlib seaborn python-dateutil


In [None]:
import pandas as pd

data = [
    ["s1","Movie","The First Sample","Jane Doe","Actor A, Actor B","United States","January 1, 2018",2018,"PG-13","90 min","Dramas","A sample drama about beginnings."],
    ["s2","TV Show","Sample Series","John Smith","Actor C, Actor D","United States, Canada","March 5, 2019",2019,"TV-MA","2 Seasons","Comedies","A comedic series about samples."],
    ["s3","Movie","International Tale","A. Director","","India","June 12, 2017",2017,"R","110 min","International Movies","An international story of culture."],
]
cols = ["show_id","type","title","director","cast","country","date_added","release_year","rating","duration","listed_in","description"]
df = pd.DataFrame(data, columns=cols)
df.head()

In [None]:
df = df.copy()
# normalize column names
df.columns = [c.strip().lower().replace(' ', '_') for c in df.columns]
# parse dates
df['date_added'] = pd.to_datetime(df['date_added'], errors='coerce')
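# standardize categorical values (strip whitespace only; .str.title() would turn "TV Show" into "Tv Show")
df['type'] = df['type'].str.strip().astype('string')
# split multi-country strings into a list of countries (original 'country' column is kept for reference)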
df['country_list'] = df['country'].fillna('').str.split(',').apply(lambda lst: [c.strip() for c in lst if c.strip()])
df['first_country'] = df['country_list'].apply(lambda lst: lst[0] if lst else None)
df
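
The `duration` column mixes two units ("90 min" for movies, "2 Seasons" for shows), so it cannot be summarized numerically as-is. A minimal sketch of one way to split it follows; the column names `duration_value` and `duration_unit` are assumptions introduced here, not part of the original notebook, and the tiny frame simply repeats the three sample values.

```python
import pandas as pd

# Stand-in frame with the same duration values as the sample dataset.
durations = pd.DataFrame({
    "type": ["Movie", "TV Show", "Movie"],
    "duration": ["90 min", "2 Seasons", "110 min"],
})

# Pull the leading integer and the unit word out of strings like "90 min".
parts = durations["duration"].str.extract(r"^(?P<value>\d+)\s*(?P<unit>\w+)")

# Hypothetical column names: numeric amount and a normalized unit label.
durations["duration_value"] = pd.to_numeric(parts["value"], errors="coerce")
# Lowercase and drop a trailing "s" so "Season" and "Seasons" share one label.
durations["duration_unit"] = parts["unit"].str.lower().str.rstrip("s")

print(durations)
```

Keeping the unit in its own column makes it possible to average minutes for movies and seasons for shows separately, rather than mixing the two.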

In [None]:
# summary counts
print("Total rows:", len(df))
print("\nMovies vs TV Shows:\n", df['type'].value_counts())
print("\nTop countries (first listed country):\n", df['first_country'].value_counts())
print("\nRelease years:\n", df['release_year'].value_counts().sort_index())
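
The country counts above use only the first listed country, so co-producing countries are ignored. As a rough sketch, every listed country can be counted by splitting and exploding the column; the frame below is a stand-in that repeats the sample's three `country` values, and `all_countries` and `country_counts` are names chosen for illustration.

```python
import pandas as pd

# Stand-in for the notebook's `country` column (same three values as the sample).
sample = pd.DataFrame({
    "show_id": ["s1", "s2", "s3"],
    "country": ["United States", "United States, Canada", "India"],
})

# Split on commas, give each country its own row, and trim whitespace.
all_countries = (
    sample["country"]
    .fillna("")
    .str.split(",")
    .explode()
    .str.strip()
)
# Drop empty strings left behind by missing values.
all_countries = all_countries[all_countries != ""]

country_counts = all_countries.value_counts()
print(country_counts)
```

With this approach a title with two countries contributes to both counts, so the totals can exceed the number of rows.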

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_theme(style='whitegrid')  # set_theme is the current name; sns.set is an alias for it

# bar: movies vs tv
plt.figure(figsize=(5,4))
sns.countplot(data=df, x='type')
plt.title('Movies vs TV Shows')
plt.tight_layout()
plt.show()

# top countries
plt.figure(figsize=(6,4))
df['first_country'].value_counts().plot(kind='bar')
plt.title('Top Countries (first country listed)')
plt.ylabel('count')
plt.tight_layout()
plt.show()

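# releases by year
plt.figure(figsize=(6,3))
year_counts = df['release_year'].value_counts().sort_index()
ax = year_counts.plot(kind='line', marker='o')
ax.set_xticks(year_counts.index)  # whole-year ticks instead of fractional ones
plt.title('Releases by Year')
plt.ylabel('count')
plt.tight_layout()
plt.show()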

## Quick Findings (from sample)
- Movies outnumber TV Shows in this tiny sample (2 vs. 1).
- First-listed countries are the United States (2 titles) and India (1 title); Canada appears only as a second country.
- Releases in sample cover 2017–2019.

> Next steps (for a full dataset): expand to the full Kaggle dataset, compute top genres, and build an interactive dashboard (Streamlit/Tableau).
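
As a starting point for the top-genres step, here is a minimal sketch that assumes `listed_in` may hold several comma-separated genres per title, the same way `country` does. The helper `top_genres` is hypothetical, and the fourth demo row is made up purely to show a multi-genre value; the first three mirror the sample.

```python
import pandas as pd


def top_genres(frame: pd.DataFrame, n: int = 10) -> pd.Series:
    """Count genres in `listed_in`, treating commas as separators (hypothetical helper)."""
    genres = (
        frame["listed_in"]
        .fillna("")
        .str.split(",")
        .explode()
        .str.strip()
    )
    # Drop empty strings produced by missing values.
    genres = genres[genres != ""]
    return genres.value_counts().head(n)


# Illustrative rows only: the last value is invented to demonstrate splitting.
demo = pd.DataFrame({
    "listed_in": [
        "Dramas",
        "Comedies",
        "International Movies",
        "Dramas, International Movies",
    ]
})

genre_counts = top_genres(demo)
print(genre_counts)
```

On a full dataset the same call could be applied to `df` directly, with `n` controlling how many genres are kept for plotting.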