In [None]:

import pandas as pd

# Load the cleaned dataset
df = pd.read_csv("../data/processed/netflix_tudum_top10_cleaned.csv")

# Preview the data
df.head()


In [None]:

import matplotlib.pyplot as plt

# Sort by Total Watch Hours (just in case)
df_sorted = df.sort_values(by="Total Watch Hours", ascending=False)

# Plot
plt.figure(figsize=(10, 6))
bars = plt.barh(df_sorted["Title"], df_sorted["Total Watch Hours"], color="skyblue")
plt.xlabel("Total Watch Hours")
plt.title("Top 10 Netflix Titles by Watch Hours (This Week)")
plt.gca().invert_yaxis()

# Label each bar with its total; the fixed 1e6 offset assumes values in the millions
for bar in bars:
    width = bar.get_width()
    plt.text(width + 1e6, bar.get_y() + bar.get_height()/2, f"{int(width):,}", va='center')

plt.tight_layout()
plt.show()


In [None]:

# Scatter plot: Runtime (Minutes) vs Total Watch Hours
plt.figure(figsize=(8, 6))
plt.scatter(df["Runtime (Minutes)"], df["Total Watch Hours"], color="crimson", s=100, edgecolor="k", alpha=0.8)

# Annotate each point with its title
for runtime, hours, title in zip(df["Runtime (Minutes)"], df["Total Watch Hours"], df["Title"]):
    plt.text(runtime + 1, hours, title, fontsize=9, alpha=0.8)

plt.xlabel("Runtime (Minutes)")
plt.ylabel("Total Watch Hours")
plt.title("Runtime vs Total Watch Hours")
plt.grid(True)
plt.tight_layout()
plt.show()


In [None]:

# Scatter plot: Views vs Weeks in Top 10
plt.figure(figsize=(8, 6))
plt.scatter(df["Total Views"], df["Weeks"], color="darkgreen", s=100, edgecolor="k", alpha=0.8)

# Annotate each point with its title
for views, weeks, title in zip(df["Total Views"], df["Weeks"], df["Title"]):
    plt.text(views + 100_000, weeks, title, fontsize=9, alpha=0.8)

plt.xlabel("Total Views")
plt.ylabel("Weeks in Top 10")
plt.title("Views vs Weeks in Netflix Top 10")
plt.grid(True)
plt.tight_layout()
plt.show()



## 🎬 Netflix Top 10 Analysis – Week of [Insert Week]

In this short analysis, I explored Netflix's global Top 10 using data extracted from its official Tudum website. After cleaning and transforming the data, I visualized a few key metrics to surface patterns:

### 🔹 Top Titles by Total Watch Hours
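Long-form dramas like *The Life List* and *The Core* dominated total watch time, suggesting deep engagement and possibly repeat viewings. One caveat: Netflix counts a view as hours viewed divided by runtime, so longer titles accumulate more hours per view by construction.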
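
To separate length from reach, one option is to convert hours back into views. This is a minimal sketch, assuming the cleaned CSV uses the `Total Watch Hours` (hours) and `Runtime (Minutes)` columns from the plots above; `add_implied_views` is a hypothetical helper and the sample rows are made-up numbers, not Netflix data.

```python
import pandas as pd


def add_implied_views(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with an 'Implied Views' column.

    Assumes 'Total Watch Hours' is in hours and 'Runtime (Minutes)' in minutes,
    and mirrors Netflix's definition: views = hours viewed / runtime in hours.
    """
    out = df.copy()
    runtime_hours = out["Runtime (Minutes)"] / 60
    out["Implied Views"] = out["Total Watch Hours"] / runtime_hours
    return out


# Hypothetical sample rows, not real Netflix figures
sample = pd.DataFrame({
    "Title": ["Long Film", "Short Film"],
    "Runtime (Minutes)": [120, 90],
    "Total Watch Hours": [20_000_000, 9_000_000],
})

result = add_implied_views(sample)
print(result[["Title", "Implied Views"]])
```

Ranking by `Implied Views` (or by the reported `Total Views`) instead of raw hours shows whether a long title is reaching more viewers or mostly gaining hours from its length.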

### 🔹 Does Runtime Impact Watch Hours?
A scatter plot of runtime against watch hours showed no clear correlation: shorter films like *Sniper: Rogue Mission* still drew large watch-hour totals.
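
With only ten points, a correlation coefficient is a useful complement to eyeballing the scatter plot. A minimal sketch, assuming the same column names as the plot: Pearson measures linear association, while Spearman (Pearson computed on ranks) is less sensitive to one or two outliers. `runtime_hours_correlation` is a hypothetical helper, the toy data is invented, and with n = 10 either coefficient carries wide uncertainty.

```python
import pandas as pd


def runtime_hours_correlation(df: pd.DataFrame) -> dict:
    """Pearson and Spearman correlation between runtime and watch hours."""
    x = df["Runtime (Minutes)"]
    y = df["Total Watch Hours"]
    pearson = x.corr(y)  # Series.corr defaults to Pearson
    # Spearman = Pearson correlation of the ranks (ties get average ranks)
    spearman = x.rank().corr(y.rank())
    return {"pearson": pearson, "spearman": spearman}


# Hypothetical toy data, not real Netflix figures
toy = pd.DataFrame({
    "Runtime (Minutes)": [90, 100, 110, 120],
    "Total Watch Hours": [4.0, 3.0, 2.0, 1.0],
})
print(runtime_hours_correlation(toy))
```

In the notebook, the same call would be `runtime_hours_correlation(df)` on the loaded data.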

### 🔹 Views vs Staying Power
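The final plot showed titles like *Kraven the Hunter* staying in the Top 10 for multiple weeks, while others reached large view counts within their first week or two. A single weekly snapshot can't yet show which of those newer entries will drop off quickly.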
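
For the current week, the split between long-running titles and new entries can be summarized directly. A minimal sketch, assuming `Weeks` holds weeks in the Top 10 and `Total Views` holds the views reported for this week; `label_chart_run` and its 3-week cutoff are hypothetical choices, and the sample rows are invented.

```python
import pandas as pd


def label_chart_run(df: pd.DataFrame, min_weeks: int = 3) -> pd.DataFrame:
    """Tag each title as 'long-running' or 'new entry' by weeks in the Top 10.

    min_weeks is an arbitrary, hypothetical cutoff; tune it to the data.
    """
    out = df.copy()
    out["Chart Run"] = out["Weeks"].ge(min_weeks).map(
        {True: "long-running", False: "new entry"}
    )
    return out


# Hypothetical sample rows, not real Netflix figures
sample = pd.DataFrame({
    "Title": ["A", "B", "C", "D"],
    "Weeks": [1, 5, 2, 4],
    "Total Views": [9_000_000, 3_000_000, 7_000_000, 2_000_000],
})

labeled = label_chart_run(sample)
# Median views for each group in this single snapshot
summary = labeled.groupby("Chart Run")["Total Views"].median()
print(summary)
```

This only describes one week; telling "spikes" apart from "steady" titles still needs the historical data listed in the next steps.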

---

**Next Steps:**
- Add historical data to track trends across weeks
- Merge with IMDb/Rotten Tomatoes ratings to compare quality vs popularity
- Push results to a dashboard, or automate weekly updates with Airflow
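
The first two steps can be sketched in pandas. A minimal sketch, assuming each week's cleaned CSV shares the `Title` and `Total Views` columns: stack the weekly snapshots with a `Week` label, compute each title's change from its previous charted week with `groupby(...).diff()`, then left-join a ratings table on `Title`. The helper names, week labels, and `Rating` column are hypothetical, and joining on titles alone can miss titles whose names differ between sources.

```python
import pandas as pd


def stack_weeks(snapshots: dict) -> pd.DataFrame:
    """Combine weekly snapshots into one long table with a 'Week' column.

    snapshots maps a week label (here an ISO date string, so it sorts
    chronologically) to that week's DataFrame.
    """
    frames = [snap.assign(Week=week) for week, snap in snapshots.items()]
    history = pd.concat(frames, ignore_index=True)
    return history.sort_values(["Title", "Week"], ignore_index=True)


def add_week_over_week(history: pd.DataFrame) -> pd.DataFrame:
    """Add the change in Total Views versus the title's previous charted week."""
    out = history.copy()
    out["Views Change"] = out.groupby("Title")["Total Views"].diff()
    return out


# Hypothetical weekly snapshots, not real Netflix figures
week1 = pd.DataFrame({"Title": ["A", "B"], "Total Views": [5_000_000, 2_000_000]})
week2 = pd.DataFrame({"Title": ["A", "C"], "Total Views": [3_000_000, 4_000_000]})

history = add_week_over_week(stack_weeks({"2025-01-06": week1, "2025-01-13": week2}))

# Hypothetical ratings table; real IMDb / Rotten Tomatoes data needs its own source
ratings = pd.DataFrame({"Title": ["A", "B", "C"], "Rating": [7.1, 6.4, 8.0]})
merged = history.merge(ratings, on="Title", how="left", validate="many_to_one")
print(merged)
```

`validate="many_to_one"` makes the merge raise an error if the ratings table accidentally lists a title twice.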

> 📌 Built with Python, pandas, and matplotlib, with the full pipeline deployed to Google Cloud Storage.

🧠 *Want the code, or want to follow more breakdowns? Check out [github.com/ma2003x](https://github.com/ma2003x)*
